Can AI Create a White Painting?
First, Some of the Human Art.
Frequently cited as the inspiration for many future monochrome works.
White on White, 1918 by Kazimir Malevich, Image Credit: MoMA
White Painting [three panel], 1951 by Robert Rauschenberg, Image Credit: SFMOMA
Not white, but monochrome in the same spirit.
Untitled (IKB 79), 1959 by Yves Klein, Image Credit: Tate
Untitled, 1965 by Robert Ryman, Image Credit: Dia
White Panel II, 1985 by Ellsworth Kelly, Image Credit: High Museum of Art
A perfectly smooth, featureless expanse of light. Colorless and all colors at the same time. A white canvas in a white frame in a white room lit by white light. Nothing, not darkness, not even absence. Perhaps, by necessity, your awareness focused on this idea, but only as a gateway into the space where the most subtle, ancient, and remote details would loom large if they were there at all.
“I called them clocks. If one were sensitive enough that you could read it, that you would know how many people were in the room, what time it was, and what the weather was like outside.”
- Robert Rauschenberg. White Painting, 1951
So maybe this is a test? The long path of civilization has passed through a point where human genius had a thought that spurred into existence art that was a solid white canvas. Not that white canvases had never existed before, but now some were created as the result of ideas to be expressed. The canvas was not a background to be painted over, but the foreground, the focus, the result. Now, can AI do that?
First Prompt Experiments with DALL-E
Let’s start by seeing if an image generation model can produce an image that is nothing but the color white.
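For reference, the non-AI baseline is trivial: the target image is just an array of maxed-out pixel values. A minimal sketch with NumPy (the 512x512 size is an arbitrary choice):

```python
import numpy as np

# A "perfect" white painting, the boring way: every pixel, every channel, 255.
white = np.full((512, 512, 3), 255, dtype=np.uint8)
```

Save it out with Pillow's `Image.fromarray(white).save("white.jpg")` and you have the banner image of this page, but none of the meaning. That gap is the whole question.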
Higher Concept
We’re close, we have a simple blank canvas, but ideally there wouldn’t even be the canvas. And we’re still saying what we want to see, not what meaning creates the image. Let’s try some different approaches.
SDXL
Okay, that was a lot with DALL-E 3; let’s try a different model, Stable Diffusion XL. We can use image-to-prompt models designed to generate prompts for Stable Diffusion models.
CLIP-interrogator lets us generate prompts for Stable Diffusion 1, 2, and XL models. If we prompt it with the white.jpg that is the banner of this page, we get some bizarre responses.
CLIP ViT-L-14/openai (for Stable Diffusion 1)
there is a man riding a surfboard on the beach, 144p, - signature, laughing, minimalist svg, abcdefghijklmnopqrstuvwxyz, $100000000, minimalist, college, on 16k, 5 4 s
CLIP ViT-H-14/laion2b-s32b-b79k (for Stable Diffusion 2)
there is a man riding a surfboard on the beach, naver fanpop, vines. tiffany blue, 7 feet tall, high contract, frontshot, high quality photos, incredible hd detail, 4 legs, splotch, 2 5 year old
CLIP ViT-bigG-14/laion2b_s39b_b160k (for Stable Diffusion XL)
fails - runs out of memory
Img2prompt yields a similar prompt for Stable Diffusion 1:
a man riding a wave on top of a surfboard, a screenshot by Nicomachus of Thebes, behance, postminimalism, behance hd, quantum wavetracing, furaffinity
Gradio-client-demos/comparing-captioning-models generates captions for an image from multiple image-to-prompt models. It seems BLIP-large also sees a surfer in our white.jpg. None of them seem particularly promising.
So let’s try that surfer prompt and some of the others we’ve tried with DALL-E 3.
Discussion So Far
Ask for a white background and, if something like that was in the model’s training data, maybe you get something indistinguishable from the art we are looking for. Ask for a white canvas on a white wall with no other details visible, and the models struggle - details creep in. They are trained on details, trained to match details large and small. Ask for a minimalist image of a white surface and we get a picture of a canvas, not the image itself. Ask for Rauschenberg’s White Painting and they hit their copyright guardrails. None of those results matter, though.
We want these images to be the result of an idea of what they mean, not what they are. Rauschenberg’s White Painting was not just a clock, but also a reaction to the abstract expressionism of the time. He painted religious triptychs of them. He insisted that the creator of these paintings doesn’t matter and had them painted over and re-painted by others. There is deep, layered meaning encoded into all the monochrome paintings shown at the start of this article. Can modern neural networks, with billions of parameters linked by adaptable connections, produce a field of ones across the board, every output perfectly maximized, every output enlightened? What idea does that for them, without telling them to do it explicitly? Will those ideas be like ours? Probably they will; the models are trained on our ideas. All they have ever seen is our ideas. All they have ever seen, so far.
So let’s dig deeper.
Both Bull and Self Transcended
Quite apart from the Minimalist art of the 20th century, Zen Buddhists started creating paintings of nothing much earlier - in the 12th century A.D. The eighth of the ten ox-herding pictures, “Both Bull and Self Transcended”, depicts nothing, on purpose. The ideas that led to the creation of that image are incredibly deep and layered. Here is one of the oldest surviving examples, from 1278 A.D.:
Both Bull and Self Transcended, 1278, Image Credit: metmuseum.org
So what happens when we try the translated text as prompt?
More Context
What if we use the opening paragraph of this article as a prompt?
What if we use this entire article? Then DALL-E returns a long discussion that sounds like it understands.
Let’s give the prompt from our chat with DALL-E 3 to SDXL and see what happens.
So close in form and coming from all the ideas we’ve discussed so far, but still, no.
So Close (more discussion)
We wanted to see if a couple of the current models can generate an image without telling them explicitly what the image is. Like getting them to draw an apple without saying apple (“An image of a red fruit that grows on a tree, commonly used in pies.”). It is necessary in the case of white paintings because explicitly, there is nothing (An image of “”, please?). The white canvas is just the shadow of the actual thing, the finger pointing at the moon.
Perhaps with more time it would be useful to try some of the automatic, iterative prompt generation tools that are being developed. Or to get our hands dirty in the code and do some gradient descent towards those sweet spots in the latent space that create all white images, and see what prompts take us there. Models like CLIP are used to do just that, but they gave us the surfer prompt, so there is clearly room for improvement.
Industrialization
Let’s ask ChatGPT for prompts, take the best and add them back and ask for more.
For a simple measure of how well we are doing, we’ll compute each image’s average absolute difference from its mean value, taken over all RGB channels. Lower is better; the range is 0 to 255.
If you want to follow along in code, see this colab notebook.
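Both pieces are simple enough to sketch here. This is a minimal version, assuming one reading of the metric (deviation from a single global mean rather than per-channel means); `generate_image` and `propose_variations` are hypothetical stand-ins for the image-model call and the ChatGPT variation request:

```python
import numpy as np

def flatness_score(img):
    """Average absolute difference from the image's mean value, over all
    RGB channels. 0 = perfectly uniform; the maximum possible is 255."""
    arr = np.asarray(img, dtype=np.float64)
    return float(np.abs(arr - arr.mean()).mean())

def refine(seed_prompts, generate_image, propose_variations, rounds=5, keep=2):
    """Sketch of the round-by-round loop: score every prompt's image,
    keep the flattest few, and ask the chat model for variations on those."""
    prompts = list(seed_prompts)
    best = None
    for _ in range(rounds):
        scored = sorted(prompts, key=lambda p: flatness_score(generate_image(p)))
        best = scored[0]
        prompts = propose_variations(scored[:keep])
    return best
```

Note that a uniform mid-gray scores a perfect 0 under this metric, which is why the too-dark images below still need a judgment call on brightness.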
Round 0
“Can you give me ten wildly different prompts that you think would give an image that contains nothing at all?”
That second image is pretty even, just too dark. The third isn’t too bad either.
Round 1
Let’s take the best prompt from round 0 and generate variations on that. We want to focus on the color white.
Can you generate ten wildly different variations on the following prompt, that you think would give the most perfect, blank, white image? Try incorporating abstract, artistic, philosophical, mathematical or other concepts if you think they might help.
‘An empty, pure white space with no light, details, or elements, representing the absence of everything.’
The fifth image isn’t terrible.
Round 2
Let’s use the best 3 prompts from the first two rounds to see if we can get anything better.
Getting there, the 6th and 10th images aren’t terrible.
Round 3
Let’s try variations on the five best prompts we’ve seen, as well as the prompt that got us so close with SDXL.
The fourth and seventh images win this round.
Round 4
Let’s try variations on the two best prompts from round 3.
The seventh image wins this round. The best bright image we’ve seen yet.
When I asked ChatGPT for prompts, it was so excited it just went ahead and gave me these images instead. I see why it was excited: our first results under 20 in the industrial phase, and two of them!
Wait a minute, let’s review.
Let’s check the scores of all the real and manually prompted images from the first parts of this article.
Real Images
The human artifacts are the only images getting under 5, so far.
Manually Prompted Images from DALL-E
The rabbit in a snowstorm comes in at 18!!!
The only other images close to 20 or less are low/no concept images.
SDXL Images
The SDXL background image (the fourth one) is actually an 8 - there really is a gradient there. The last SDXL image is also an 8. By far the lowest score we’ve seen from a high concept image.
Round 5
Let’s try subtle variations of the five best prompts we’ve seen.
meh
Back to SDXL one more time
Let’s take the best prompt from the industrial phase (13.5, from round 4) and see how SDXL handles it.
Final Results and Thoughts
Still room for improvement.
The goal was to see if AI could create a white painting. Not just a pure white image, but something that is the result of some inspiration, idea, or concept.
For a much more accessible (and video!) history of white paintings and the problems they cause, I recommend this Vox article.
For all ten images of the ox-herder’s parable, with translated text and history, I recommend The Met Museum’s page.
Site Repo: codyznash/white_paintings
Colab for Prompt Development: Notebook
March 18th, 2024
Can AI Create a White Painting? by CZ Nash is licensed under CC BY-NC-SA 4.0