Originally presented as a short talk to IGDA Ann Arbor on January 26, 2023
Part 4 of my series on remastering a classic 8-bit sprite game (Below the Root) using free AI and open source technologies. Please see the previous three posts discussing:
In this discussion I scale up the process, present results, and share lessons learned.
One of my goals here is to create a blueprint for other developers wanting to remaster older titles, so if you're reading this please feel free to share these techniques and let me know.
Previous results
Previously I described all the technical details to convert a few screens of sprites into photographic collage style as seen below:
But now for the whole map.
I'm showing about 1/3 here. There are roughly 500 screens total.
High level process:
Chop up every screen into clusters
Upload to Google Drive and Google Sheets (the 'Master Control Program')
Convert clusters into PNGs
Apply image processing to add noise around each cluster PNG
Apply Stable Diffusion Image-to-Image per cluster
Composite all clusters from screen into single image
Manage continuity between screens using masks and adjacency
Direct editorial process, locking and reprocessing
Manage Google API limitations
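The noise step above deserves a quick illustration. A minimal sketch using Pillow and NumPy (function name and padding amount are mine, not from the pipeline): pad the cluster PNG and fill the fully transparent pixels with random RGB noise, so image-to-image has texture to work with instead of flat emptiness around the sprite.

```python
import numpy as np
from PIL import Image

def add_noise_border(cluster: Image.Image, pad: int = 16, seed: int = 0) -> Image.Image:
    """Pad a transparent-background cluster PNG and fill the fully
    transparent pixels with random RGB noise (made opaque), leaving
    the sprite pixels untouched."""
    rgba = cluster.convert("RGBA")
    w, h = rgba.size
    canvas = Image.new("RGBA", (w + 2 * pad, h + 2 * pad), (0, 0, 0, 0))
    canvas.paste(rgba, (pad, pad), rgba)

    arr = np.array(canvas)
    rng = np.random.default_rng(seed)
    noise = rng.integers(0, 256, size=arr.shape[:2] + (3,), dtype=np.uint8)
    mask = arr[:, :, 3] == 0              # fully transparent pixels only
    arr[mask, :3] = noise[mask]
    arr[mask, 3] = 255                    # make the noise visible to img2img
    return Image.fromarray(arr)
```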
Results!
Run Stable Diffusion Image to Image per cluster:
Pull parameters from screen spreadsheet
InvokeAI command line interface
Several steps, each with a long duration → use a multi-step CLI interface
Setting up environment
Loading the Stable Diffusion model (~6 GB of weights)
Applying Image2Image
Had to add ‘output filename’ parameter
And a simple example of results for a given cluster, tagged 'candles sconce'.
Multistep CLI in Python
Here are some technical details for getting Python to talk to the Windows command line for multistep commands using message pipes.
Notably I had to stop using Automatic1111's otherwise excellent Stable Diffusion front end and switch over to InvokeAI because InvokeAI has a command line interface. Here is how I passed parameters into InvokeAI and had Python wait around until the output image was generated.
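A minimal sketch of that pattern with `subprocess` pipes. The `-I`/`-f`/`-S` flags follow the legacy InvokeAI interactive CLI, and `q` is its quit command; the `--outfile` flag stands in for the output-filename parameter I had to patch in, so treat its name as hypothetical.

```python
import os
import subprocess
import time

def invoke_command(prompt, init_img, strength=0.75, seed=42, outfile=None):
    """Build one line for the InvokeAI interactive prompt.
    Flag names follow the legacy InvokeAI CLI (-I init image,
    -f img2img strength, -S seed)."""
    cmd = f'"{prompt}" -I {init_img} -f {strength} -S {seed}'
    if outfile:
        cmd += f" --outfile {outfile}"   # hypothetical name for the custom patch
    return cmd

def run_cluster(invoke_exe, prompt, init_img, outfile, timeout=300):
    """Start the CLI once, pipe in one command, then poll until the
    output image appears on disk (or we time out)."""
    proc = subprocess.Popen(
        invoke_exe, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    )
    proc.stdin.write(invoke_command(prompt, init_img, outfile=outfile) + "\n")
    proc.stdin.flush()
    deadline = time.time() + timeout
    while time.time() < deadline:
        if os.path.exists(outfile):
            break
        time.sleep(2)
    proc.stdin.write("q\n")   # quit the interactive session
    proc.stdin.flush()
    proc.wait()
```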
Parameterize InvokeAI
In order to drive the parameters and input image into InvokeAI, I modified the screen cluster format to include columns for text prompt, various other parameters, and results.
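A sketch of that mapping, assuming rows come back as dicts (e.g. from gspread's `get_all_records`); the column names here are illustrative stand-ins for whatever the sheet actually defines.

```python
def cluster_params(row, defaults=None):
    """Map one spreadsheet row (column name -> cell text) to
    image-to-image parameters, falling back to defaults for
    blank cells."""
    defaults = defaults or {"strength": 0.75, "seed": 42}
    params = dict(defaults)
    if row.get("prompt"):
        params["prompt"] = row["prompt"]
    if row.get("strength"):
        params["strength"] = float(row["strength"])
    if row.get("seed"):
        params["seed"] = int(row["seed"])
    # Sheets checkboxes serialize as the strings TRUE/FALSE
    params["locked"] = row.get("locked", "").strip().upper() == "TRUE"
    return params
```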
Once all of the sprite clusters for a screen have been converted, I run an automated process to composite them into a single image. This ended up being a real pain to do with free and open source solutions because:
1) Handling of transparency remains highly idiosyncratic in many image programs
2) Much of the ImageMagick documentation online is not quite correct
3) Much of the Python-Fu (GIMP scripting in Python) documentation online is not quite correct
In the end I had to resort to a hybrid solution partially recommended by ChatGPT. Here is an example of compositing a screen from discrete converted sprite clusters.
Composition with GIMP + ImageMagick
For the sake of completeness, here is the GIMP + ImageMagick hybrid solution I ended up with to get all the images in an alpha friendly format and merged down in the prescribed manner.
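The ImageMagick half of a hybrid like this can be driven from Python. A sketch (the GIMP side, exporting alpha-clean PNGs, is not shown here): each cluster is merged down onto the screen canvas at its pixel offset using the classic `composite` tool's `-geometry` flag.

```python
import subprocess

def composite_cmd(base, overlay, x, y, out):
    """One ImageMagick call that merges `overlay` onto `base`
    at pixel offset (x, y), preserving alpha."""
    return ["magick", "composite", "-geometry", f"+{x}+{y}", overlay, base, out]

def build_screen(canvas_png, clusters, out_png):
    """`clusters` is a list of (png_path, x, y). Merge them down in
    order; later clusters land on top of earlier ones."""
    current = canvas_png
    for path, x, y in clusters:
        subprocess.run(composite_cmd(current, path, x, y, out_png), check=True)
        current = out_png
```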
Editorial process
Here are some details on how I direct the editorial process, with locking and reprocessing.
On the master page I have columns for the original and converted composite screens, along with columns for 'has this been reviewed since last process' and 'is this screen locked'.
Then when I have identified a screen that needs reprocessing, I can drill in, modify prompts and seeds, and lock/unlock individual clusters.
As shown here, the masking and adjacency got a bit out of control and will have to be reworked.
Limitations of Google APIs
While converting ~500 screens, I ran into several limitations of the Google APIs:
Maximum 10M cells in spreadsheet
→ Manually reset sizes of existing sheets
→ Limit rows on new sheets
Limits on download burst rate
→ Sleep(10) all over the place
Sometimes not all images load in browser
→ Edge™ works better than Chrome for this
Arbitrary subsheet order
Sorting Google sheets
Here is how I was able to add a custom sort button to my Google sheet and get the subsheets into a sensible order.
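The button itself lives in Apps Script, but the same reordering can be done from the Python side. A sketch assuming gspread (whose `Spreadsheet.reorder_worksheets` method takes the worksheets in their desired order), with a natural-sort key so 'Screen 2' lands before 'Screen 10':

```python
import re

def natural_key(title):
    """Split a title into text and number runs so embedded numbers
    compare numerically rather than lexically."""
    return [int(t) if t.isdigit() else t.lower() for t in re.split(r"(\d+)", title)]

def sort_subsheets(spreadsheet):
    """Reorder all subsheets of a gspread Spreadsheet by natural
    title order."""
    sheets = sorted(spreadsheet.worksheets(), key=lambda ws: natural_key(ws.title))
    spreadsheet.reorder_worksheets(sheets)
```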
First pass, whole map!
So now I have a first pass on converting around 500 screens made up of around 2500 discrete elements.
The Future
Looking to the future, there are several pathways moving forward.
Optimize Stable Diffusion step
Stop reloading environment and weights
Investigate faster versions of algorithm
Run on more computers
AI Art refinement
Scale and translate generated PNGs
Generate more variants
Organize prompts to swap art styles and tags
Rework adjacency to only consider matching tag clusters
Coherence between elements, NVIDIA eDiff-I?
Build out playable version
Python? HTML5? ScummVM?
Define gameplay