I have to admit, I am really impressed with the recent advances in artificial image generation. It feels like just yesterday that I was playing with funny and interesting toys where you could describe in plain text what you wanted to see and then watch AI algorithms slowly, very slowly iterate through trippy abstract images to distil and perfect whatever they “thought” was the thing you wanted to see. It was a painfully slow process that rendered more or less abstract images that were clearly computer generated: images that a sane painter (or one who wants to get paid for the work) would not paint.
Boy, how times have changed in those… um… huh, one year? AI artists of today are as far removed from the old algorithms as an art student is from a preschool kid when it comes to composition and acuity.
Although there are a couple of incredibly advanced generators, one of them caught my eye: Midjourney. It’s a commercial service, but everyone gets a free trial, a chance to generate a small number of images to see if the service is worth their money. The way Midjourney operates is a little bit unusual: you have to join their Discord server to be able to use the AI by talking to their bot. It does sound complicated if you’ve never used Discord, yet it isn’t as bad as it sounds: Discord can be used like any chat platform you are familiar with (yes, even ICQ). Once you join their server and pick a newbie channel in Newcomer Rooms (creating a Discord account might be necessary), all you have to do is enter a simple command that starts with “/imagine”. So I did:
/imagine baby floating in air lying on a cloud, with butterflies
The first result wasn’t exactly that, but it looked pretty good and was way, way faster than the previous toys I’ve played with:
This AI always presents you with four chosen renderings and lets you pick one for further development, or to make variations on it. Now, this first iteration may not look like much to you, but for me it was like witnessing a quantum leap from the previous generation of AI artists: it quickly generates not one but four variations, and right from the start those variations vaguely resemble what I meant. There’s a hint of a baby face in the first image, so let’s see if we can ask the AI to “reimagine” my request (make another set of variations)… Lo and behold, the third iteration has created what is definitely a baby in a cloud:
The fourth iteration was even more successful:
While this does not look like a breathtakingly (sur)realistic image, those were my baby steps. It turns out that being a little more descriptive “helps” the AI produce more focused results; not a direct interpretation of whatever came to my mind, but still amusingly close to it, and always with small surprising twists.
You see, this thing cannot read your mind. At the moment you’re entering the description in the Discord channel, you inevitably have at least a vague picture in your mind of what you want to get from the AI engine. The engine, however, “has a mind of its own”, and just as a real artist might draw a vastly different picture from the one in your head even if you gave them very detailed instructions, this engine will internally produce a large number of representations, select four of them and present them to you. They can be close to what you wanted, or they can be unlike anything you imagined you would get.
But how does it work? The mathematical principles behind AI-generated images are seriously complex (and I’m not going to pretend that I understand them). If you want to know more, take a look at this paper (or at least at the figures in that paper; they will give you an overview of the internal machinery).
It isn’t easy to explain in layman’s terms what is going on there, but on an overarching and simplistic level, two things happen:
- first, you describe what you want to see; the AI takes your sentence and breaks it into semantic chunks that roughly describe your request; for example, “white elephant in room, painted like Picasso would paint it” will be translated into three semantic chunks: “object: elephant, colour: white”, “object: room”, “style: Picasso”;
- from there, an AI that has been trained on huge sets of real images will seek representations of an elephant and a room, put them together in a meaningful way if possible (elephant in room, not room in elephant), colour the animal and apply an artistic style learned from Picasso’s paintings; the AI will generate a large number of possible images with one of its built-in parts and check their validity with another (a setup known as a Generative Adversarial Network), a kind of internal tug-of-war, somewhat akin to how we deliberate over something and weigh different options; once the process is over, four of the “best” images are selected and presented to the user.
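For the curious, here is a toy Python sketch of the “generate many, keep the best four” idea described above. It is emphatically not how Midjourney works internally: the “images” are just lists of random numbers, and `generate_candidates`, `critic_score` and `best_four` are made-up names with a made-up scoring rule, there only to show the tug-of-war between one part that proposes and another that judges.

```python
import random

def generate_candidates(prompt, count, rng):
    """Hypothetical generator: each 'image' is just a list of 8 random numbers.
    The prompt is ignored here; a real generator would be conditioned on it."""
    return [[rng.random() for _ in range(8)] for _ in range(count)]

def critic_score(prompt, image):
    """Hypothetical critic: an invented rule that prefers images whose
    average value is close to 0.5. A real critic would be a trained network."""
    mean = sum(image) / len(image)
    return 1.0 - abs(mean - 0.5)

def best_four(prompt, count=32, seed=0):
    """Generate `count` candidates and keep the four the critic likes most."""
    rng = random.Random(seed)
    candidates = generate_candidates(prompt, count, rng)
    ranked = sorted(candidates,
                    key=lambda image: critic_score(prompt, image),
                    reverse=True)
    return ranked[:4]

picks = best_four("white elephant in room, painted like Picasso would paint it")
for image in picks:
    print(round(critic_score("", image), 3))
```

The fixed `seed` only makes the toy repeatable; change it and you get a different set of four, much like asking the bot to try again.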
The user can then, as is the case with Midjourney, select one of the images and tell the AI to make another set of four images using the selected image as a “seed”, thus steering the internal algorithms towards more specific results; or the user can tell the bot to “upscale” an image: take it and work more on exactly that image, adding details and improving the quality. While the first process can take as many iterations as one wants, the second seems to be limited to just a few iterations, after which the images (at least for me) become too busy and way off. But still impressive.
Here’s an example of me starting with a simple command and making a couple of variations:
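To make the difference between “variations” and “upscale” a bit more concrete, here is another toy Python sketch. Again, the “image” is only a short list of numbers, and `make_variations` and `upscale` are hypothetical helpers of my own invention, not anything the Midjourney bot exposes: the first returns four slightly nudged copies of the picked image, the second adds detail to a single image by filling in values between neighbours.

```python
import random

def make_variations(seed_image, rng, count=4, jitter=0.1):
    """Return `count` copies of the picked image, each value nudged by
    at most `jitter` up or down (a stand-in for 'make variations')."""
    return [[value + rng.uniform(-jitter, jitter) for value in seed_image]
            for _ in range(count)]

def upscale(image):
    """Add detail to one non-empty image by inserting the midpoint between
    each pair of neighbours (a stand-in for 'upscale'); n values become 2n - 1."""
    result = []
    for left, right in zip(image, image[1:]):
        result.extend([left, (left + right) / 2])
    result.append(image[-1])
    return result

rng = random.Random(42)
picked = [0.2, 0.4, 0.6, 0.8]              # the image chosen from the first set
variations = make_variations(picked, rng)  # another set of four, close to the pick
detailed = upscale(variations[0])          # one of them, with more "pixels"
print(len(variations), len(detailed))
```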
/imagine glass skeletons in fiery cave
As you can see, each of the four iterations provided different results, but all of them were interpreted fairly consistently. The level of abstraction is high, likely because the AI was not trained on many images of skeletons and fiery caves. Yet again, really impressive, don’t you agree?
The results get much better once we move from abstract requests to something more definite (glass skeletons in a fiery cave might produce quite a vivid, sharp and well-defined image in your head, but let’s be honest: the human mind is still light years ahead of AI when it comes to imagination):
/imagine all the people
Way, way more impressive, even if the request was quite vague and Lennon-ish. The first two sets are variations, and the third image is an upscaled version of the second image from the second set. You can see how much more detailed it is; in my opinion, this image can hold its own against the work of a real artist. Of course, it is still pretty abstract, but it could be used as an illustration or artwork.
I could not resist:
/imagine bored ape token Trump
And here’s one upscaled token; notice how detailed it is, and how close the resemblance:
Don’t tell me that you aren’t impressed.
It seems that this AI really loves nature (or is well trained on such images). Here’s one:
/imagine storm over borderlands
… and one upscaled image:
To cut a long story short, here are some more examples that I made after I got the hang of the interface:
/imagine computer hacker Hieronymus Bosch
/imagine computer chip painted by Salvador Dali
/imagine sunset over alien sea, crabs
… and, of course, the inevitable: what would happen if I try… if I just try…
/imagine Radoslav Dejanović
Apparently, I am a Russian aristocrat, maybe a writer or some high official. This AI is politically incorrect.
Last, but not least, if you think there’s potential for creative play with this toy, head to the Community Feed first to witness things way more impressive than those I’ve created.
If you decide to bite the bullet, monthly subscriptions start at $10 for a decent amount of rendering time, enough for about 200 images. The price might seem a little steep, but it has to cover the hardware doing all the thinking and rendering. Given that advances in hardware are as fast as those in image-generating AI, it is reasonable to think that the price of the lowest tier might be halved in the near future. And at $5/month, that would be a present you would not dare refuse your child (real or inner).