A key takeaway was that every model does one thing really well, so it's important to know what that one thing is.
Datasets don't have to be massive. In fact, most "style transfer" can use just 2 images (a style/source and a target).
|Method|Dataset size (# of images)|
|---|---|
|Next Frame Prediction|1 video|
|MUNIT|~250 x 2|
|Pix2Pix|500 x 2|
That said, for projects that train on many images (like StyleGAN), the bigger the dataset, the better the output quality (assuming it's well curated).
Derrick did a great job of comparing the methods above. For now I'm mostly interested in StyleGAN2-ADA. What makes StyleGAN unique is that you can feed it a (large) dataset and it will basically deconstruct the images into the patterns it finds. This "deconstruction" is represented by a 512-dimensional vector (think of each "dimension" as a parameter for fine details of hair, color, lighting, etc.).
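To make the 512-dimensional vector a bit more concrete, here is a minimal sketch using NumPy. It only draws two random latent vectors and blends between them; it does not include a generator. The function names are made up for illustration, and sampling from a standard normal distribution is an assumption about how such vectors are usually drawn.

```python
import numpy as np

LATENT_DIM = 512  # size of the latent vector described above


def sample_latent(seed: int, dim: int = LATENT_DIM) -> np.ndarray:
    """Draw one latent vector from a standard normal distribution."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal(dim)


def interpolate(z_a: np.ndarray, z_b: np.ndarray, steps: int) -> np.ndarray:
    """Linearly blend from z_a to z_b; returns an array of shape (steps, dim)."""
    weights = np.linspace(0.0, 1.0, steps)[:, None]
    return (1.0 - weights) * z_a + weights * z_b


z_start = sample_latent(seed=0)
z_end = sample_latent(seed=1)
path = interpolate(z_start, z_end, steps=5)
print(path.shape)
```

Each row of `path` is one point in the latent space. Feeding the rows to a trained generator one after another is roughly how the morphing videos between two images get made.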
Ideally you want ~1,000 images as a starting point, but after doing some curating I was left with just over 440 images of my dog.
Here's a sampling of them:
And since "transfer learning" (meaning, building on a previously trained model) is better than starting from scratch, I chose the ffhq1024 network to start from. Here's a sampling from that:
In very simple terms, the code will start with the image of a face and turn it into an image of Zelda. Each attempt is a "tick", and after every tick the code evaluates how well it's doing and adjusts itself accordingly.
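For reference, a transfer-learning run like this is typically launched through the `train.py` script in NVIDIA's StyleGAN2-ADA repository. The sketch below only assembles the command as a string and does not run anything. The file paths are hypothetical placeholders, `build_train_command` is a helper invented for this example, and the flag names are written from memory of that script, so check them against `python train.py --help` for the version you use.

```python
import shlex


def build_train_command(data_path: str, outdir: str, resume: str = "ffhq1024",
                        mirror: bool = True, snap: int = 10) -> str:
    """Assemble a train.py invocation for transfer learning (not executed here)."""
    args = [
        "python", "train.py",
        f"--outdir={outdir}",        # where snapshots and sample grids are written
        f"--data={data_path}",       # the curated dataset
        f"--resume={resume}",        # pretrained network to start from
        f"--mirror={int(mirror)}",   # also train on horizontally flipped copies
        f"--snap={snap}",            # save a snapshot every `snap` ticks
    ]
    return shlex.join(args)


# Hypothetical paths, for illustration only.
print(build_train_command("datasets/zelda-1024.zip", "results"))
```

Mirroring is worth considering with a small dataset like this one, since it effectively doubles the number of training images.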
Training with Google Colab
Initially I tried training on my local machine, but after a few "ticks" I would get an unhelpful "ILLEGAL MEMORY ACCESS" error. Instead of getting derailed into a driver/memory troubleshooting tangent, I took that as an opportunity to use Google Colab.
Specifically, using the Python custom training notebook, I mounted a drive, uploaded a .zip of these 440 "zelda-1024" images, and started the training. Here are some examples that were generated in the first training session of ~9 hours total:
Probably the best one so far.
The way the kid's collar turns into a blanket is interesting.
This one looks great, and shows some mirrored training.
The blue and brown vertical bands from the girl's hair and shirt turn into a striped towel.
There's something a bit pathetic about this one. The designs on the blanket are interesting too.
Top three from this set, easily. It looks like it's going somewhere realistic.
There's something about the large eyes on this one. And a fancy Halloween-themed shirt?
Not much of a dog, but the "metal tags" that start to form were unexpected.
Looks like a dog balled up and sleeping, but a few frames at the very start show a dog-human hybrid I named "Susan".