hanishn

Large Scale AI Art Automation and Direction

Originally presented as a short talk to IGDA Ann Arbor on January 26, 2023


Part 4 of my series on remastering a classic 8-bit sprite game (Below the Root) using free AI and open source technologies. Please see the previous three posts for background.


In this discussion I scale up the process, present results, and share lessons learned.

One of my goals here is to create a blueprint for other developers wanting to remaster older titles, so if you're reading this please feel free to share these techniques and let me know.


Previous results


Previously I described the technical details of converting a few screens of sprites into a photographic collage style, as seen below:














But now for the whole map.


I'm showing about 1/3 here. There are roughly 500 screens total.




High level process:

  • Chop up every screen into clusters

  • Upload to Google Drive and Google Sheets, Master Control Program

  • Convert clusters into PNGs

  • Apply image processing to add noise around each cluster PNG

  • Apply Stable Diffusion Image-to-Image per cluster

  • Composite all clusters from screen into single image

  • Manage continuity between screens using masks and adjacency

  • Direct editorial process, locking and reprocessing

  • Manage Google API limitations

  • Results!
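As an illustration of the "add noise around each cluster" step, here is a minimal sketch in pure Python. It assumes the cluster is available as rows of (r, g, b, a) tuples and that the goal is to give Image-to-Image something other than flat transparency to work with. The function name, margin size, and the choice of uniform random noise are all assumptions for this example, not the settings used in the project.

```python
import random

def pad_with_noise(pixels, margin, seed=0):
    """Place an RGBA image (rows of (r, g, b, a) tuples) on a canvas of
    random noise that extends `margin` pixels beyond every edge."""
    rng = random.Random(seed)  # seeded so a cluster can be regenerated identically
    height, width = len(pixels), len(pixels[0])
    out = []
    for y in range(height + 2 * margin):
        row = []
        for x in range(width + 2 * margin):
            sy, sx = y - margin, x - margin
            inside = 0 <= sy < height and 0 <= sx < width
            if inside and pixels[sy][sx][3] > 0:
                row.append(pixels[sy][sx])  # keep visible sprite pixels
            else:
                # Margin and fully transparent pixels become opaque noise.
                row.append((rng.randrange(256), rng.randrange(256),
                            rng.randrange(256), 255))
        out.append(row)
    return out

# Hypothetical 2x2 sprite: two visible pixels, two transparent ones.
sprite = [[(255, 0, 0, 255), (0, 0, 0, 0)],
          [(0, 0, 0, 0), (0, 255, 0, 255)]]
padded = pad_with_noise(sprite, margin=2)
```

In practice an image library would do this far faster; the nested loops are only meant to make the idea explicit.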


Run Stable Diffusion Image-to-Image per cluster:

  • Pull parameters from screen spreadsheet

  • InvokeAI command line interface

  • N steps, each with its own duration → use a multi-step CLI session

    • Setting up environment

    • Loading the Stable Diffusion model (~6 GB of weights)

    • Applying Image2Image

  • Had to add ‘output filename’ parameter

Here is a simple example of results for a given cluster, in this case 'candles sconce'.


Multistep CLI in Python


Here are some technical details for getting Python to talk to the Windows command line for multistep commands using message pipes.
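The general pattern can be sketched as follows: start one long-lived child process, keep its stdin and stdout pipes open, and send it one command per line. This is a hedged illustration rather than the original script. To keep it runnable anywhere, the child here is a tiny stand-in Python program that answers each line; a real run would launch the interactive tool instead, and its output would need tool-specific parsing.

```python
import subprocess
import sys

# Stand-in for an interactive CLI: answers every input line with one output line.
CHILD_CODE = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    print('done: ' + line.strip().upper(), flush=True)\n"
)

def run_session(commands):
    """Start one child process and feed it several commands over pipes."""
    replies = []
    with subprocess.Popen(
        [sys.executable, "-u", "-c", CHILD_CODE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,  # line buffered, so each command is delivered promptly
    ) as proc:
        for command in commands:
            proc.stdin.write(command + "\n")
            proc.stdin.flush()
            # Block until the child has answered this command.
            replies.append(proc.stdout.readline().strip())
        proc.stdin.close()  # end of input lets the child exit cleanly
    return replies

replies = run_session(["first prompt", "second prompt"])
```

Because the child stays alive between commands, any expensive startup work happens once per session rather than once per command.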





Notably I had to stop using Automatic1111's otherwise excellent Stable Diffusion front end and switch over to InvokeAI because InvokeAI has a command line interface. Here is how I passed parameters into InvokeAI and had Python wait around until the output image was generated.
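One simple way to have Python "wait around" is to poll for the output file until it exists and its size has stopped changing. The sketch below assumes the generator writes to a known path; the function name, timeout, and polling interval are illustrative choices, and the demo uses a temporary file in place of a generated image.

```python
import os
import tempfile
import time

def wait_for_file(path, timeout=600.0, poll=0.5):
    """Return True once `path` exists with a stable, non-zero size,
    or False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    last_size = -1
    while time.monotonic() < deadline:
        if os.path.exists(path):
            size = os.path.getsize(path)
            # Two identical consecutive sizes suggest the writer has finished.
            if size > 0 and size == last_size:
                return True
            last_size = size
        time.sleep(poll)
    return False

# Demo with a temporary file standing in for a generated image.
with tempfile.TemporaryDirectory() as folder:
    finished = os.path.join(folder, "cluster_out.png")
    with open(finished, "wb") as handle:
        handle.write(b"fake image bytes")
    found = wait_for_file(finished, timeout=5.0, poll=0.01)
    missing = wait_for_file(os.path.join(folder, "never.png"),
                            timeout=0.05, poll=0.01)
```

The size check guards against picking up a file that is still being written.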




Parameterize InvokeAI

In order to drive the parameters and input image into InvokeAI, I modified the screen cluster format to include columns for text prompt, various other parameters, and results.
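To make the idea concrete, here is a rough sketch of turning one spreadsheet row into a single prompt line. The column names are hypothetical, and the switches are assumptions that should be checked against the InvokeAI version in use; in particular, "--outfile" is only a placeholder name for the custom output filename parameter mentioned earlier. The sketch also assumes file paths contain no spaces.

```python
# Column name -> CLI switch. All of these are assumptions for illustration;
# "--outfile" is a hypothetical name for the custom output filename parameter.
SWITCHES = {
    "init_image": "-I",
    "strength": "-f",
    "steps": "-s",
    "seed": "-S",
    "cfg_scale": "-C",
    "output": "--outfile",
}

def build_prompt_line(row):
    """Turn one spreadsheet row (a dict of column -> value) into a prompt line."""
    parts = ['"' + row["prompt"].replace('"', "") + '"']
    for column, switch in SWITCHES.items():
        value = row.get(column)
        if value not in (None, ""):  # skip empty cells, keep zeros
            parts += [switch, str(value)]
    return " ".join(parts)

# Hypothetical row for one cluster.
row = {
    "prompt": "candles sconce, photographic collage",
    "init_image": "cluster_012.png",
    "strength": 0.6,
    "steps": 30,
    "seed": 42,
    "cfg_scale": "",
    "output": "cluster_012_out.png",
}
line = build_prompt_line(row)
```

Keeping the column-to-switch mapping in one table means a new spreadsheet column only needs one new entry.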


Once all of the sprite clusters for a screen have been converted, I run an automated process to composite them into a single image. This ended up being a real pain to do with free and open source solutions because:

1) Handling of transparency remains highly idiosyncratic in many image programs

2) Much of the ImageMagick documentation online is not quite correct

3) Much of the online Python-Fu (GIMP scripting in Python) documentation is not quite correct

In the end I had to resort to a hybrid solution partially recommended by ChatGPT. Here is an example of compositing a screen from discrete converted sprite clusters.




















Composition with GIMP + ImageMagick


For the sake of completeness, here is the GIMP + ImageMagick hybrid solution I ended up with to get all the images in an alpha friendly format and merged down in the prescribed manner.
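For readers who want a starting point, the ImageMagick half of such a pipeline can be approximated with a single command that starts from a transparent canvas and composites each cluster at its offset. This is a sketch under assumptions, not the hybrid script itself: it presumes ImageMagick 7 (the `magick` executable), and the canvas size, file names, and offsets are hypothetical.

```python
def composite_command(canvas_size, layers, output_path):
    """Build an ImageMagick argument list that stacks RGBA layers in order.

    `layers` is a list of (path, x, y) tuples; later layers draw on top.
    """
    width, height = canvas_size
    # "xc:none" creates a fully transparent canvas of the requested size.
    args = ["magick", "-size", f"{width}x{height}", "xc:none"]
    for path, x, y in layers:
        # "+X+Y" geometry offsets the layer; "-composite" merges it down.
        args += [path, "-geometry", f"{x:+d}{y:+d}", "-composite"]
    args.append(output_path)
    return args

# Hypothetical screen made of two converted clusters.
cmd = composite_command(
    (320, 200),
    [("cluster_001.png", 10, 20), ("cluster_002.png", 150, 0)],
    "screen_composite.png",
)
```

With ImageMagick installed, the list could be passed to `subprocess.run(cmd, check=True)`. Building the arguments as a list avoids shell quoting problems on Windows.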





Editorial process


Here are some details on how I direct the editorial process, with locking and reprocessing.


On the master page I have columns for original and converted composite screens, along with a column for 'has this been reviewed since last process' and 'is this screen locked'.

Then when I have identified a screen that needs reprocessing, I can drill in and modify prompts and seeds and lock/unlock individual clusters.
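The locking rules lend themselves to a small selection function. The sketch below assumes each screen and each cluster carries a boolean lock flag and that anything locked is skipped on the next pass; the field names and data layout are hypothetical stand-ins for the spreadsheet columns.

```python
def clusters_to_reprocess(screens):
    """Yield (screen_id, cluster_id) pairs eligible for another pass."""
    for screen in screens:
        if screen["locked"]:
            continue  # a locked screen is considered finished
        for cluster in screen["clusters"]:
            if not cluster["locked"]:
                yield screen["id"], cluster["id"]

# Hypothetical data: screen A is locked, screen B has one locked cluster.
screens = [
    {"id": "A", "locked": True,
     "clusters": [{"id": 1, "locked": False}]},
    {"id": "B", "locked": False,
     "clusters": [{"id": 1, "locked": True}, {"id": 2, "locked": False}]},
]
todo = list(clusters_to_reprocess(screens))
```

Locking at both levels means a good cluster can be kept while its neighbors on the same screen are regenerated.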



As shown here, the masking and adjacency got a bit out of control and will have to be reworked.


Limitations of Google APIs


While converting ~500 screens, I ran into several limitations in the Google APIs:

  • Maximum 10M cells in spreadsheet

    • → Manually reset sizes of existing sheets

    • → Limit rows on new sheets

  • Limits on download burst rate

    • Sleep(10) all over the place

  • Sometimes not all images load in browser

    • Edge™ works better than Chrome for this

  • Arbitrary subsheet order
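As an alternative to fixed sleeps, burst-rate limits are often handled with retries and exponential backoff. The following sketch is a generic version of that idea, not the code used here. It catches every exception for brevity, which a real script would narrow to the rate-limit error its API client raises, and the flaky download function is a made-up stand-in.

```python
import random
import time

def call_with_backoff(func, retries=5, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 1, 2, 4, ... and retry."""
    for attempt in range(retries):
        try:
            return func()
        except Exception:  # assumption: narrow this to the rate-limit error
            if attempt == retries - 1:
                raise  # out of retries, surface the last error
            # A little random jitter keeps parallel workers from retrying in step.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))

# Stand-in download that fails twice before succeeding.
attempts = {"count": 0}

def flaky_download():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("rate limit exceeded")
    return "image bytes"

delays = []  # record the waits instead of actually sleeping
result = call_with_backoff(flaky_download, sleep=delays.append)
```

Backoff only waits when a call actually fails, so a run that stays under the limit pays no delay at all.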


Sorting Google sheets


Here is how I was able to add a custom sort button to my Google sheet and get the subsheets into a sensible order.
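The ordering logic itself is independent of how the button is wired up. As a sketch, assuming subsheets are named with a text prefix and a number (the names below are hypothetical), a natural sort key puts "screen2" before "screen10". Applying the resulting order to the actual spreadsheet would still need a call to the Sheets API or a script bound to the sheet.

```python
import re

def natural_key(name):
    """Split a name into text and integer chunks for human-friendly sorting."""
    # re.split with a capturing group alternates text, digits, text, ...
    return [int(chunk) if chunk.isdigit() else chunk.lower()
            for chunk in re.split(r"(\d+)", name)]

# Hypothetical subsheet names in the arbitrary order an API might return them.
names = ["screen10", "screen2", "Master", "screen1"]
ordered = sorted(names, key=natural_key)
```

A plain alphabetical sort would place "screen10" ahead of "screen2", which is rarely the order anyone wants.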



First pass, whole map!


So now I have a first pass on converting around 500 screens made up of around 2500 discrete elements.



The Future


Looking ahead, there are several paths forward.

  • Optimize Stable Diffusion step

    • Stop reloading environment and weights

    • Investigate faster versions of algorithm

    • Run on more computers

  • AI Art refinement

    • Scale and translate generated PNGs

    • Generate more variants

    • Organize prompts to swap art styles and tags

    • Rework adjacency to only consider matching tag clusters

    • Coherence between elements, NVIDIA eDiff-I?

  • Build out playable version

    • Python? HTML5? ScummVM?

    • Define gameplay
