Originally presented at IGDA Ann Arbor on October 27, 2022 by Nathan Hanish
Flashback to last month: Stable Diffusion for Sprite Collage
Which describes a manual process for using Stable Diffusion to take 'Sprite Collage' style game art from the 8-bit era and bring it into the modern era while maintaining authenticity and readability. As an example, here is a screen from the 1984 C64 gem Below the Root run through the process.
The initial version of this process boiled down to manually chopping up the screen into clusters based upon sprite color in GIMP, feeding the clusters into Stable Diffusion to get a collage of images, and finally compositing them all back together manually in GIMP. The results look compelling, but the process takes about an hour per screen. As such, automation is the word of the day.
Automate! Segment and Save raw bytes
As a first step, I took my custom 'Windham Classics' screen renderer, built on top of ScummVM, and added support for quickly segmenting screens with the mouse and keyboard, and for exporting those segment clusters as 'RAW' image data.
Here is an initial screen from Below the Root
With a set of segments selected for export as a unit of raw image data, labelled .cls for 'cluster'. This process takes 1-2 minutes per screen. Pretty fast.
Which can then be opened manually in GIMP
And with some futzing around with the width and height and format
Can be exported out for further manipulation as a PNG.
This process is pretty tedious and error-prone. There has to be a better way. And there is: ImageMagick!
Automate! ImageMagick to convert to PNG
ImageMagick is a great open-source library for automated image manipulation, including some easy-to-use command-line tools.
With a little internet sleuthing and trial-and-error, I managed to come up with a template for taking the raw image data, upscaling, setting the background to black, and exporting as the particular flavor of PNG that suited my needs. Here is an example:
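Sketched here as a small Python wrapper around ImageMagick rather than a bare command line, so a whole directory of clusters converts in one pass. The 320x200 size, raw RGBA pixel layout, 400% upscale, and file names are assumptions, so adjust them to match your own cluster data (and note that ImageMagick 7 spells the tool 'magick' rather than 'convert').

# Rough sketch: batch-convert raw .cls cluster dumps to PNG via ImageMagick.
# Assumes ImageMagick's 'convert' is on the PATH and that clusters are raw
# 8-bit RGBA at a known size; adjust SIZE and SCALE to match your data.
import glob
import os
import subprocess

SIZE = "320x200"   # assumed cluster dimensions
SCALE = "400%"     # integer upscale (pixel replication, no smoothing)

for cls_path in sorted(glob.glob("clusters/*.cls")):
    png_path = os.path.splitext(cls_path)[0] + ".png"
    subprocess.run([
        "convert",
        "-size", SIZE, "-depth", "8",
        "rgba:" + cls_path,        # interpret the raw bytes as 8-bit RGBA
        "-scale", SCALE,           # upscale without blurring the pixels
        "-background", "black",
        "-alpha", "remove",        # composite transparency onto black
        "PNG24:" + png_path,       # force a plain 24-bit PNG
    ], check=True)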
With this new tool in hand, going from an original screen to clusters is a 3-4 minute process.
Automate! GIMP + Python to add Noise
One thing I discovered was that in order to get good results out of Stable Diffusion, I really needed to add noise to the sprite clusters. Again, a pretty tedious process: select black, select inverse, grow 8 pixels, run the noise filter, export under a new name. Time for more automation. It turns out you can script GIMP from the command line using Python. Here is what I came up with.
My Python script, which follows the steps described above.
A command line to run the script for a directory of images.
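A rough sketch of both, using GIMP 2.10's Python-Fu API. The file names and noise settings below are assumptions (the 8-pixel grow is the real value), and the headless command line is included as a comment at the top.

# noisify_clusters.py: rough sketch of a GIMP Python-Fu batch script that
# selects the black background, inverts the selection, grows it 8 pixels,
# adds noise, and exports each cluster under a new name.
#
# Run headless from the command line with something like:
#   gimp -idf --batch-interpreter python-fu-eval \
#     -b "import sys; sys.path.append('/path/to/scripts'); import noisify_clusters; noisify_clusters.run('/path/to/clusters')" \
#     -b "pdb.gimp_quit(1)"

import glob
import os
from gimpfu import *  # brings in pdb and constants like CHANNEL_OP_REPLACE

def noisify(path):
    image = pdb.gimp_file_load(path, os.path.basename(path))
    drawable = pdb.gimp_image_get_active_drawable(image)

    # Select black, then invert so the sprite pixels are selected instead
    pdb.gimp_image_select_color(image, CHANNEL_OP_REPLACE, drawable, (0, 0, 0))
    pdb.gimp_selection_invert(image)

    # Grow the selection 8 pixels so noise spills past the sprite edges
    pdb.gimp_selection_grow(image, 8)

    # Add noise inside the selection (HSV noise here; settings are a guess)
    pdb.plug_in_hsv_noise(image, drawable, 2, 0, 0, 60)

    # Flatten and export under a new name next to the original
    pdb.gimp_image_flatten(image)
    out_path = os.path.splitext(path)[0] + "_noise.png"
    pdb.file_png_save(image, pdb.gimp_image_get_active_drawable(image),
                      out_path, os.path.basename(out_path), 0, 9, 1, 1, 1, 1, 1)
    pdb.gimp_image_delete(image)

def run(directory):
    for path in sorted(glob.glob(os.path.join(directory, "*.png"))):
        noisify(path)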
Which takes a clean cluster to a noisy cluster.
Now we move on to actually using the img2img facilities in Stable Diffusion. I'm using the Automatic1111 web UI and definitely recommend it.
Automate! Install Stable Diffusion UI
First, go to the Automatic1111 GitHub site.
And the simplest approach is to 'download as zip'
Unzip that on a local drive
You will also need to grab a Stable Diffusion model. For the sake of this discussion, an SD model is a several-gigabyte file that encodes a bunch of 'neural network' weights derived from 'atomizing' a training set of images. Here is a link for SD1.5, but there are many models (trained on different data sets) available, and depending on your project you might want to use a different one.
Put that file in the appropriate place in your unzipped directory (for the Automatic1111 webUI, that's the models/Stable-diffusion folder).
Then launch the webUI. Basically the program launches a local webserver which feeds commands into the actual Stable Diffusion solver program.
Once that's up and running, you can access the UI using a local URL.
So open the browser of your choice and key in localhost:7860
Now let's actually use the WebUI
Automate! Run img2img on noisy segmented PNG
There are a whole lot of parameters here. You can put in a prompt describing your desired output image. Set the resolution appropriately. The 'denoising' value basically goes left to right from 'just use my source image' to 'just make something up'. This is the dial I have to tinker with the most.
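As an aside for the automation-minded: recent builds of the webUI can be launched with an --api flag, which exposes the same img2img knobs over HTTP. Purely as a sketch (the prompt, denoising value, seed, and file names are placeholders, and the field names follow the webUI API, which can change between versions):

# Rough sketch: call the webUI's img2img endpoint directly (requires launching
# the webUI with the --api flag). The prompt, denoising, seed, and file names
# are placeholders; the point is just which knob maps to which field.
import base64
import requests

with open("grund_noise.png", "rb") as f:          # a noisy cluster PNG
    init_image = base64.b64encode(f.read()).decode()

payload = {
    "init_images": [init_image],
    "prompt": "giant fantasy tree, detailed digital painting",  # made-up prompt
    "denoising_strength": 0.55,   # 0 = just use my source image, 1 = make something up
    "width": 1280,
    "height": 640,
    "seed": 12345,                # fixed seed so runs are repeatable
    "steps": 30,
    "cfg_scale": 7,
}

resp = requests.post("http://localhost:7860/sdapi/v1/img2img", json=payload)
resp.raise_for_status()

with open("grund_baked.png", "wb") as f:
    f.write(base64.b64decode(resp.json()["images"][0]))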
So here is an example of taking a noisy segment cluster of a 'grund' (a really big tree) and feeding it through img2img. The 'bake' takes about 30 seconds on an Nvidia 1080 Ti at this resolution (1280x640).
The output gets dumped in a subdirectory of the WebUI installation
And here is a pretty compelling result that matches my goals.
Composing a whole screen together this way works pretty well. However...
Technique falls apart for adjacent screens!
This approach so far can't handle multiple adjacent screens very well.
So how the heck do we solve that problem?! Turns out there's this cool technique called 'Inpainting'.
We’re going to need inpainting! Start with a Seed
The basic idea is to pick a screen as the 'seed' screen. Pad that screen out by a row or two of sprites and 'bake'. Then copy the overlap into the adjacent screen and mask off the 'prebaked' portion. The 'prebaked' portion then acts like a crystal seed when used with a matching set of options (prompt, denoising, random seed number).
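Scripted, the padding step might look something like the sketch below, using Pillow here; the pad height and file names are assumptions, and whether the pad gets filled with black, noise, or the neighboring screen's first sprite rows is a judgment call.

# Rough sketch: pad the 'seed' screen with extra rows so img2img has room to
# invent the overlap region. Pad height, fill, and file names are assumptions.
from PIL import Image

PAD = 128  # assumed pixel height of a row or two of sprites after upscaling

screen = Image.open("seed_screen.png").convert("RGB")
w, h = screen.size

# New canvas with the screen at the top and blank rows below; pad whichever
# edge faces the adjacent screen, and fill with noise instead of black if it helps
padded = Image.new("RGB", (w, h + PAD), "black")
padded.paste(screen, (0, 0))
padded.save("seed_screen_padded.png")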
Here is the padded 'seed' image.
Fed through img2img
and the result
Inpainting! Masks and adjacencies
For the adjacent screen we copy some rows
Mask off the part we don't want to change (there is a white row at the top of the image below)
And use the same prompt and random number seed as the first screen.
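Purely as a sketch, the copy-and-mask prep can also be scripted (Pillow again; the overlap height and file names are assumptions). The white strip in the mask marks the pre-baked rows to leave alone, matching the white row described above; how the mask gets interpreted depends on the UI's 'inpaint masked / inpaint not masked' setting.

# Rough sketch: copy the overlapping rows from the baked seed screen into the
# adjacent screen's source image, and build a mask whose white strip marks the
# pre-baked rows that inpainting should leave untouched.
from PIL import Image

OVERLAP = 128  # assumed pixel height of the shared sprite rows

seed_bake = Image.open("seed_screen_baked.png").convert("RGB")
adjacent = Image.open("adjacent_screen_noise.png").convert("RGB")
w, h = adjacent.size

# Paste the bottom rows of the seed bake onto the top of the adjacent screen
strip = seed_bake.crop((0, seed_bake.height - OVERLAP, w, seed_bake.height))
adjacent.paste(strip, (0, 0))
adjacent.save("adjacent_screen_seeded.png")

# Mask: white over the copied strip, black everywhere else
mask = Image.new("L", (w, h), 0)
mask.paste(255, (0, 0, w, OVERLAP))
mask.save("adjacent_screen_mask.png")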
Run it through 'inpainting'
And now when we put the screens together we get something sensible
Now let's put it all together to get 2 adjacent screens with all the segment clusters converted
Now let’s put it ALL together!
Here is a stack of screens from the original game.
And a reimagined version using the techniques described above.
My asks…..
So here are my asks for you, the humble reader...
Your thoughts on further automation and data organization
Your thoughts on AI Art direction
Your thoughts on applying these techniques for animation
How could you apply this to your projects?
Seriously though, go try it out yourself