Originally presented at IGDA Ann Arbor on October 27, 2022 by Nathan Hanish
Flashback to last month: Stable Diffusion for Sprite Collage
Which describes a manual process for using Stable Diffusion to take 'Sprite Collage' style game art from the 8-bit era and bring it into the modern era while maintaining authenticity and readability. As an example, here is a screen from the 1984 C64 gem Below the Root run through the process.
The initial version of this process boiled down to manually chopping up the screen into clusters based upon sprite color in GIMP, feeding the clusters into Stable Diffusion to get a collage of images, and finally compositing them all back together manually in GIMP. The results look compelling, but the process takes about an hour per screen. As such, automation is the word of the day.
Automate! Segment and Save raw bytes
As a first step, I took my custom 'Windham Classics' screen renderer, built on top of ScummVM, and added support for quickly segmenting screens with the mouse and keyboard, and for exporting those segment clusters as 'RAW' image data.
Here is an initial screen from Below the Root
With a set of segments selected for export as a unit of raw image data, labelled .cls for 'cluster'. This process takes 1-2 minutes per screen. Pretty fast.
Which can then be opened manually in GIMP
And with some futzing around with the width and height and format
Can be exported out for further manipulation as a PNG.
This process is pretty tedious and error-prone. There has to be a better way. And there is: ImageMagick!
Automate! ImageMagick to convert to PNG
ImageMagick is a great open-source library for automated image manipulation, including some easy-to-use command-line tools.
With a little internet sleuthing and trial-and-error, I managed to come up with a template for taking the raw image data, upscaling, setting the background to black, and exporting as the particular flavor of PNG that suited my needs. Here is an example:
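Sketched here as a small Python wrapper around ImageMagick rather than a bare command line, so a whole directory of clusters converts in one pass. The 320x200 size, raw RGBA pixel layout, 400% upscale, and file names are assumptions, so adjust them to match your own cluster data (and note that ImageMagick 7 spells the tool 'magick' rather than 'convert').

# Rough sketch: batch-convert raw .cls cluster dumps to PNG via ImageMagick.
# Assumes ImageMagick's 'convert' is on the PATH and that clusters are raw
# 8-bit RGBA at a known size; adjust SIZE and SCALE to match your data.
import glob
import os
import subprocess

SIZE = "320x200"   # assumed cluster dimensions
SCALE = "400%"     # integer upscale (pixel replication, no smoothing)

for cls_path in sorted(glob.glob("clusters/*.cls")):
    png_path = os.path.splitext(cls_path)[0] + ".png"
    subprocess.run([
        "convert",
        "-size", SIZE, "-depth", "8",
        "rgba:" + cls_path,        # interpret the raw bytes as 8-bit RGBA
        "-scale", SCALE,           # upscale without blurring the pixels
        "-background", "black",
        "-alpha", "remove",        # composite transparency onto black
        "PNG24:" + png_path,       # force a plain 24-bit PNG
    ], check=True)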
With this new tool in hand, going from an original screen to clusters is a 3-4 minute process.
Automate! GIMP + Python to add Noise
One thing I discovered was that in order to get good results out of Stable Diffusion, I really needed to add noise to the sprite clusters. Again, a pretty tedious process: select black, select inverse, grow 8 pixels, run the noise filter, export under a new name. Time for more automation. It turns out you can script GIMP from the command line using Python. Here is what I came up with.
My Python script, which follows the steps described above.
A command line to run the script for a directory of images.
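A rough sketch of both, using GIMP 2.10's Python-Fu API. The file names and noise settings below are assumptions (the 8-pixel grow is the real value), and the headless command line is included as a comment at the top.

# noisify_clusters.py: rough sketch of a GIMP Python-Fu batch script that
# selects the black background, inverts the selection, grows it 8 pixels,
# adds noise, and exports each cluster under a new name.
#
# Run headless from the command line with something like:
#   gimp -idf --batch-interpreter python-fu-eval \
#     -b "import sys; sys.path.append('/path/to/scripts'); import noisify_clusters; noisify_clusters.run('/path/to/clusters')" \
#     -b "pdb.gimp_quit(1)"

import glob
import os
from gimpfu import *  # brings in pdb and constants like CHANNEL_OP_REPLACE

def noisify(path):
    image = pdb.gimp_file_load(path, os.path.basename(path))
    drawable = pdb.gimp_image_get_active_drawable(image)

    # Select black, then invert so the sprite pixels are selected instead
    pdb.gimp_image_select_color(image, CHANNEL_OP_REPLACE, drawable, (0, 0, 0))
    pdb.gimp_selection_invert(image)

    # Grow the selection 8 pixels so noise spills past the sprite edges
    pdb.gimp_selection_grow(image, 8)

    # Add noise inside the selection (HSV noise here; settings are a guess)
    pdb.plug_in_hsv_noise(image, drawable, 2, 0, 0, 60)

    # Flatten and export under a new name next to the original
    pdb.gimp_image_flatten(image)
    out_path = os.path.splitext(path)[0] + "_noise.png"
    pdb.file_png_save(image, pdb.gimp_image_get_active_drawable(image),
                      out_path, os.path.basename(out_path), 0, 9, 1, 1, 1, 1, 1)
    pdb.gimp_image_delete(image)

def run(directory):
    for path in sorted(glob.glob(os.path.join(directory, "*.png"))):
        noisify(path)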
Which takes a clean cluster to a noisy cluster.
Now we move on to actually using the img2img facilities in Stable Diffusion. I'm using the Automatic1111 web UI and definitely recommend it.
Automate! Install Stable Diffusion UI
First, go to the Automatic1111 GitHub site.
And the simplest approach is to 'download as zip'
Unzip that on a local drive
You will also need to grab a Stable Diffusion model. For the sake of this discussion, an SD model is a several-gigabyte file that encodes a bunch of 'neural network' weights derived from 'atomizing' a training set of images. Here is a link for SD1.5, but there are many models (trained on different data sets) available, and depending on your project you might want to use a different one.
Put that file in the appropriate place in your unzipped directory (for the Automatic1111 webUI, that's the models/Stable-diffusion folder).
Then launch the webUI. Basically the program launches a local webserver which feeds commands into the actual Stable Diffusion solver program.
Once that's up and running, you can access the UI using a local URL.
So open the browser of your choice and key in localhost:7860
Now let's actually use the WebUI
Automate! Run img2img on noisy segmented PNG
There are a whole lot of parameters here. You can put in a prompt describing your desired output image. Set the resolution appropriately. The 'denoising' value basically goes left to right from 'just use my source image' to 'just make something up'. This is the dial I have to tinker with the most.
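As an aside for the automation-minded: recent builds of the webUI can be launched with an --api flag, which exposes the same img2img knobs over HTTP. Purely as a sketch (the prompt, denoising value, seed, and file names are placeholders, and the field names follow the webUI API, which can change between versions):

# Rough sketch: call the webUI's img2img endpoint directly (requires launching
# the webUI with the --api flag). The prompt, denoising, seed, and file names
# are placeholders; the point is just which knob maps to which field.
import base64
import requests

with open("grund_noise.png", "rb") as f:          # a noisy cluster PNG
    init_image = base64.b64encode(f.read()).decode()

payload = {
    "init_images": [init_image],
    "prompt": "giant fantasy tree, detailed digital painting",  # made-up prompt
    "denoising_strength": 0.55,   # 0 = just use my source image, 1 = make something up
    "width": 1280,
    "height": 640,
    "seed": 12345,                # fixed seed so runs are repeatable
    "steps": 30,
    "cfg_scale": 7,
}

resp = requests.post("http://localhost:7860/sdapi/v1/img2img", json=payload)
resp.raise_for_status()

with open("grund_baked.png", "wb") as f:
    f.write(base64.b64decode(resp.json()["images"][0]))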
So here is an example of taking a noisy segment cluster of a 'grund' (a really big tree) and feeding it through img2img. The 'bake' takes about 30 seconds on an Nvidia 1080 Ti at this resolution (1280x640).
The output gets dumped in a subdirectory of the WebUI installation
And here is a pretty compelling result that matches my goals.
Composing a whole screen together this way works pretty well. However...
Technique falls apart for adjacent screens!
This approach so far can't handle multiple adjacent screens very well.
So how the heck do we solve that problem?! Turns out there's this cool technique called 'Inpainting'.
We’re going to need inpainting! Start with a Seed
The basic idea is to pick a screen as the 'seed' screen. Pad that screen out by a row or two of sprites and 'bake'. Then copy the overlap into the adjacent screen and mask off the 'prebaked' portion. The 'prebaked' portion then acts like a crystal seed when used with a matching set of options (prompt, denoising, random seed number).
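Scripted, the padding step might look something like the sketch below, using Pillow here; the pad height and file names are assumptions, and whether the pad gets filled with black, noise, or the neighboring screen's first sprite rows is a judgment call.

# Rough sketch: pad the 'seed' screen with extra rows so img2img has room to
# invent the overlap region. Pad height, fill, and file names are assumptions.
from PIL import Image

PAD = 128  # assumed pixel height of a row or two of sprites after upscaling

screen = Image.open("seed_screen.png").convert("RGB")
w, h = screen.size

# New canvas with the screen at the top and blank rows below; pad whichever
# edge faces the adjacent screen, and fill with noise instead of black if it helps
padded = Image.new("RGB", (w, h + PAD), "black")
padded.paste(screen, (0, 0))
padded.save("seed_screen_padded.png")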
Here is the padded 'seed' image.
Fed through img2img
and the result
Inpainting! Masks and adjacencies
For the adjacent screen we copy some rows
Mask off the part we don't want to change (there is a white row at the top of the image below)
And use the same prompt and random number seed as the first screen.
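Purely as a sketch, the copy-and-mask prep can also be scripted (Pillow again; the overlap height and file names are assumptions). The white strip in the mask marks the pre-baked rows to leave alone, matching the white row described above; how the mask gets interpreted depends on the UI's 'inpaint masked / inpaint not masked' setting.

# Rough sketch: copy the overlapping rows from the baked seed screen into the
# adjacent screen's source image, and build a mask whose white strip marks the
# pre-baked rows that inpainting should leave untouched.
from PIL import Image

OVERLAP = 128  # assumed pixel height of the shared sprite rows

seed_bake = Image.open("seed_screen_baked.png").convert("RGB")
adjacent = Image.open("adjacent_screen_noise.png").convert("RGB")
w, h = adjacent.size

# Paste the bottom rows of the seed bake onto the top of the adjacent screen
strip = seed_bake.crop((0, seed_bake.height - OVERLAP, w, seed_bake.height))
adjacent.paste(strip, (0, 0))
adjacent.save("adjacent_screen_seeded.png")

# Mask: white over the copied strip, black everywhere else
mask = Image.new("L", (w, h), 0)
mask.paste(255, (0, 0, w, OVERLAP))
mask.save("adjacent_screen_mask.png")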
Run it through 'inpainting'
And now when we put the screens together we get something sensible
Now let's put it all together to get 2 adjacent screens with all the segment clusters converted
Now let’s put it ALL together!
Here is a stack of screens from the original game.
And a reimagined version using the techniques described above.
My asks…..
So here are my asks for you, the humble reader...
Your thoughts on further automation and data organization
Your thoughts on AI Art direction
Your thoughts on applying these techniques for animation
How could you apply this to your projects?
Seriously though, go try it out yourself