>>100864859
NTA but this is something I've been playing with and looking into recently. There's a phenomenon I've come to call Latent Concept Composition (ChatGPT came up with the name) where a model can merge two concepts it understands to come up with something "new" (that doesn't actually exist in its training data, or exists in too small a way to be created using just the token). For example, there is no tag for 'lace bracelet' on Danbooru, Rule34, or E621, and there are exactly 2 examples of it on Gelbooru. There's no reason to manually include something so specific in the training data for Illustrious (when there are more interesting things yet to be added) so it is not a token that exists, yet you can use 'lace bracelet' as a token and generate a bracelet made of lace. The model understands the concept of lace as a material, it understands the concept of bracelet as an accessory, and it understands how these two concepts can be combined. You remember all those early AI example images of things like an avocado with tiger stripes? Same kinda deal, but a bit more granular.
Using Anon's RRAT example, it has the tags "1other, hakos_baelz_(rat)," and if the model understood that tag then it would be able to gen it with just that, but Anon had to add "mouse ears, mouse tail, monster, rat, furrification, red fur, multicolored hair, streaked hair, white hair, black hair" to solidify the concept, because hakos_baelz_(rat) only has 25 examples on Danbooru to draw from and that alone is not enough (also they're all very different in appearance). The model followed the instructions given and used the implications from Hakos Baelz to abstract the rest. Think of it like brute forcing concepts together.
Or to put it another way, the AI doesn't need to have a picture of Okayu doing the splits in order to make an image of Okayu doing the splits, it just needs to understand what Okayu looks like and what a character doing the splits looks like; this is the same shit, just with more steps. And it goes without saying that doing this kinda thing is a lot more finicky than just relying on stable tags.
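If it helps to see the RRAT "brute forcing" written out, here's a rough Python sketch of the idea: if a tag is too rare to stand on its own, pad the prompt with tags the model does know. The count table, the MIN_EXAMPLES cutoff, and the function names are all made up for illustration (only the 25 for hakos_baelz_(rat) comes from the Danbooru count mentioned above); this is not how any UI or model actually works internally.

```python
# Hypothetical table of tag counts. Only the 25 is from the post above.
TAG_COUNTS = {"hakos_baelz_(rat)": 25}

# Assumed cutoff for "the model probably learned this tag"; not a real number.
MIN_EXAMPLES = 1000


def is_weak(tag, counts, min_examples):
    # Tags missing from the table are assumed to be well learned.
    return counts.get(tag, min_examples) < min_examples


def build_prompt(base_tags, support_tags, counts=TAG_COUNTS, min_examples=MIN_EXAMPLES):
    """Return a comma-separated prompt, adding support tags only when needed."""
    tags = list(base_tags)
    if any(is_weak(t, counts, min_examples) for t in base_tags):
        # Spell the concept out with tags the model already understands.
        tags += [t for t in support_tags if t not in tags]
    return ", ".join(tags)


base = ["1other", "hakos_baelz_(rat)"]
support = [
    "mouse ears", "mouse tail", "monster", "rat", "furrification",
    "red fur", "multicolored hair", "streaked hair", "white hair", "black hair",
]

prompt = build_prompt(base, support)
print(prompt)
```

A prompt made only of common tags comes back unchanged; the padding only kicks in when one of the base tags is below the cutoff.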