Originally presented as a short talk to IGDA Ann Arbor on January 26, 2023
Part 4 of my series on remastering a classic 8-bit sprite game (Below the Root) using free AI and open source technologies. Please see the previous three posts discussing:
In this discussion I scale up the process, present results, and share lessons learned.
One of my goals here is to create a blueprint for other developers wanting to remaster older titles, so if you're reading this please feel free to share these techniques and let me know.
Previous results
Previously I described all the technical details to convert a few screens of sprites into photographic collage style as seen below:
But now for the whole map.
I'm showing about 1/3 here. There are roughly 500 screens total.
High level process:
Chop up every screen into clusters
Upload to Google Drive and Google Sheets (the 'Master Control Program')
Convert clusters into PNGs
Apply image processing to add noise around each cluster PNG
Apply Stable Diffusion Image-to-Image per cluster
Composite all clusters from screen into single image
Manage continuity between screens using masks and adjacency
Direct editorial process, locking and reprocessing
Manage Google API limitations
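The noise step above deserves a quick illustration. A minimal sketch using Pillow and NumPy (function name and padding amount are mine, not from the pipeline): pad the cluster PNG and fill the fully transparent pixels with random RGB noise, so image-to-image has texture to work with instead of flat emptiness around the sprite.

```python
import numpy as np
from PIL import Image

def add_noise_border(cluster: Image.Image, pad: int = 16, seed: int = 0) -> Image.Image:
    """Pad a transparent-background cluster PNG and fill the fully
    transparent pixels with random RGB noise (made opaque), leaving
    the sprite pixels untouched."""
    rgba = cluster.convert("RGBA")
    w, h = rgba.size
    canvas = Image.new("RGBA", (w + 2 * pad, h + 2 * pad), (0, 0, 0, 0))
    canvas.paste(rgba, (pad, pad), rgba)

    arr = np.array(canvas)
    rng = np.random.default_rng(seed)
    noise = rng.integers(0, 256, size=arr.shape[:2] + (3,), dtype=np.uint8)
    mask = arr[:, :, 3] == 0              # fully transparent pixels only
    arr[mask, :3] = noise[mask]
    arr[mask, 3] = 255                    # make the noise visible to img2img
    return Image.fromarray(arr)
```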
Results!
Run Stable Diffusion Image to Image per cluster:
Pull parameters from screen spreadsheet
InvokeAI command line interface
Several steps, each with a long duration → use a multi-step CLI interface
Setting up environment
Loading the Stable Diffusion model (~6 GB of weights)
Applying Image2Image
Had to add ‘output filename’ parameter
And a simple example of results for a given cluster, tagged 'candles sconce'.
Multistep CLI in Python
Here are some technical details for getting Python to talk to the Windows command line for multistep commands using message pipes.
Notably I had to stop using Automatic1111's otherwise excellent Stable Diffusion front end and switch over to InvokeAI because InvokeAI has a command line interface. Here is how I passed parameters into InvokeAI and had Python wait around until the output image was generated.
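A minimal sketch of that pattern with `subprocess` pipes. The `-I`/`-f`/`-S` flags follow the legacy InvokeAI interactive CLI, and `q` is its quit command; the `--outfile` flag stands in for the output-filename parameter I had to patch in, so treat its name as hypothetical.

```python
import os
import subprocess
import time

def invoke_command(prompt, init_img, strength=0.75, seed=42, outfile=None):
    """Build one line for the InvokeAI interactive prompt.
    Flag names follow the legacy InvokeAI CLI (-I init image,
    -f img2img strength, -S seed)."""
    cmd = f'"{prompt}" -I {init_img} -f {strength} -S {seed}'
    if outfile:
        cmd += f" --outfile {outfile}"   # hypothetical name for the custom patch
    return cmd

def run_cluster(invoke_exe, prompt, init_img, outfile, timeout=300):
    """Start the CLI once, pipe in one command, then poll until the
    output image appears on disk (or we time out)."""
    proc = subprocess.Popen(
        invoke_exe, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    )
    proc.stdin.write(invoke_command(prompt, init_img, outfile=outfile) + "\n")
    proc.stdin.flush()
    deadline = time.time() + timeout
    while time.time() < deadline:
        if os.path.exists(outfile):
            break
        time.sleep(2)
    proc.stdin.write("q\n")   # quit the interactive session
    proc.stdin.flush()
    proc.wait()
```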
Parameterize InvokeAI
In order to drive the parameters and input image into InvokeAI, I modified the screen cluster format to include columns for text prompt, various other parameters, and results.
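A sketch of that mapping, assuming rows come back as dicts (e.g. from gspread's `get_all_records`); the column names here are illustrative stand-ins for whatever the sheet actually defines.

```python
def cluster_params(row, defaults=None):
    """Map one spreadsheet row (column name -> cell text) to
    image-to-image parameters, falling back to defaults for
    blank cells."""
    defaults = defaults or {"strength": 0.75, "seed": 42}
    params = dict(defaults)
    if row.get("prompt"):
        params["prompt"] = row["prompt"]
    if row.get("strength"):
        params["strength"] = float(row["strength"])
    if row.get("seed"):
        params["seed"] = int(row["seed"])
    # Sheets checkboxes serialize as the strings TRUE/FALSE
    params["locked"] = row.get("locked", "").strip().upper() == "TRUE"
    return params
```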
Once all of the sprite clusters for a screen have been converted, I run an automated process to composite them into a single image. This ended up being a real pain to do with free and open source solutions because:
1) Handling of transparency remains highly idiosyncratic in many image programs
2) Much of the ImageMagick documentation online is not quite correct
3) Much of the Python-Fu (GIMP scripting in Python) documentation online is not quite correct
In the end I had to resort to a hybrid solution partially recommended by ChatGPT. Here is an example of compositing a screen from discrete converted sprite clusters.
Composition with GIMP + ImageMagick
For the sake of completeness, here is the GIMP + ImageMagick hybrid solution I ended up with to get all the images in an alpha friendly format and merged down in the prescribed manner.
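The ImageMagick half of a hybrid like this can be driven from Python. A sketch (the GIMP side, exporting alpha-clean PNGs, is not shown here): each cluster is merged down onto the screen canvas at its pixel offset using the classic `composite` tool's `-geometry` flag.

```python
import subprocess

def composite_cmd(base, overlay, x, y, out):
    """One ImageMagick call that merges `overlay` onto `base`
    at pixel offset (x, y), preserving alpha."""
    return ["magick", "composite", "-geometry", f"+{x}+{y}", overlay, base, out]

def build_screen(canvas_png, clusters, out_png):
    """`clusters` is a list of (png_path, x, y). Merge them down in
    order; later clusters land on top of earlier ones."""
    current = canvas_png
    for path, x, y in clusters:
        subprocess.run(composite_cmd(current, path, x, y, out_png), check=True)
        current = out_png
```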
Editorial process
Here are some details on how I direct the editorial process, with locking and reprocessing.
On the master page I have columns for the original and converted composite screens, along with columns for 'has this been reviewed since last process' and 'is this screen locked'.
Then when I have identified a screen that needs reprocessing, I can drill in, modify prompts and seeds, and lock/unlock individual clusters.
As shown here, the masking and adjacency got a bit out of control and will have to be reworked.
Limitations of Google APIs
While converting ~500 screens, I ran into several limitations of the Google APIs:
Maximum 10M cells in spreadsheet
→ Manually reset sizes of existing sheets
→ Limit rows on new sheets
Limits on download burst rate
→ Sleep(10) all over the place
Sometimes not all images load in browser
→ Edge™ works better than Chrome for this
Arbitrary subsheet order
Sorting Google sheets
Here is how I was able to add a custom sort button to my Google sheet and get the subsheets into a sensible order.
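The button itself lives in Apps Script, but the same reordering can be done from the Python side. A sketch assuming gspread (whose `Spreadsheet.reorder_worksheets` method takes the worksheets in their desired order), with a natural-sort key so 'Screen 2' lands before 'Screen 10':

```python
import re

def natural_key(title):
    """Split a title into text and number runs so embedded numbers
    compare numerically rather than lexically."""
    return [int(t) if t.isdigit() else t.lower() for t in re.split(r"(\d+)", title)]

def sort_subsheets(spreadsheet):
    """Reorder all subsheets of a gspread Spreadsheet by natural
    title order."""
    sheets = sorted(spreadsheet.worksheets(), key=lambda ws: natural_key(ws.title))
    spreadsheet.reorder_worksheets(sheets)
```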
First pass, whole map!
So now I have a first pass on converting around 500 screens made up of around 2500 discrete elements.
The Future
Looking to the future, there are several pathways moving forward.
Optimize Stable Diffusion step
Stop reloading environment and weights
Investigate faster versions of algorithm
Run on more computers
AI Art refinement
Scale and translate generated PNGs
Generate more variants
Organize prompts to swap art styles and tags
Rework adjacency to only consider matching tag clusters
Coherence between elements, NVIDIA eDiff-I?
Build out playable version
Python? HTML5? ScummVM?
Define gameplay