>>67486613
You can see the settings in the lora metadata; there's nothing special about them.
If you are using sd-scripts, the only difference from small lora training is that instead of using repeat folders like 1_name, 2_name, etc. you drop everything into one folder and generate metadata with 'python sd-scripts/finetune/merge_captions_to_metadata.py $image_folder $image_folder"/metadata_cap.json" --caption_extension ".txt" --recursive --full_path'
Then, when launching the training script, you add '--in_json "$image_folder/metadata_cap.json"'
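Putting those two steps together, a minimal sketch (paths are placeholders, and the trainer args past --train_data_dir are whatever you normally use):

```shell
# assumption: $image_folder holds all images, each with a matching .txt caption file
image_folder=/path/to/dataset

# build the finetune-style metadata json from the per-image .txt captions
python sd-scripts/finetune/merge_captions_to_metadata.py \
    "$image_folder" "$image_folder/metadata_cap.json" \
    --caption_extension ".txt" --recursive --full_path

# then point the trainer at it instead of using 1_name/2_name repeat folders
python sd-scripts/train_network.py \
    --in_json "$image_folder/metadata_cap.json" \
    --train_data_dir "$image_folder"
```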
I run it like this:
https://litter.catbox.moe/zjdk8c.png
You'll obviously use different settings for SDXL. You should have '--resolution="1024,1024" --min_bucket_reso=512 --max_bucket_reso=2048', no --bucket_no_upscale, and vpred/zsnr if your base model needs them.
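So the SDXL launch line would change roughly like this (a sketch; double-check the exact vpred / zero-terminal-SNR flag names against your sd-scripts version and add them on top if your base model wants them):

```shell
# SDXL variant: 1024 training resolution, wider bucket range,
# and deliberately NO --bucket_no_upscale
python sd-scripts/sdxl_train_network.py \
    --in_json "$image_folder/metadata_cap.json" \
    --train_data_dir "$image_folder" \
    --resolution="1024,1024" \
    --enable_bucket \
    --min_bucket_reso=512 \
    --max_bucket_reso=2048
```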
48GB is fine.
For a start, try grabbing 10 or 20 artists and training a regular high-dim lora for 10 epochs using prodigy.
If it works, it works. If not, you'll have to experiment with settings and with tagging / tag ordering.
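For the prodigy part, the relevant sd-scripts flags look roughly like this (the optimizer_args and dims here are my usual starting point, not settings from the lora in question; prodigy picks its own step size, so lr stays at 1.0):

```shell
# "high dim lora for 10 epochs using prodigy" as concrete flags
python sd-scripts/sdxl_train_network.py \
    --optimizer_type="Prodigy" \
    --learning_rate=1.0 \
    --optimizer_args "decouple=True" "weight_decay=0.01" "use_bias_correction=True" \
    --network_dim=64 \
    --network_alpha=64 \
    --max_train_epochs=10
```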
ponyXL was trained on a mix of tags and natural language, so maybe you should add natural-language captions too. They used LLaVA, but you can try better models like
https://huggingface.co/01-ai/Yi-VL-6B#why-yi-vl instead.
Qwen-VL and CogAgent (picrelated) are good too, but CogAgent is so slow it's unusable.
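Whichever VLM you pick, the output just needs to land in per-image .txt files so merge_captions_to_metadata.py can pick it up. A minimal sketch of that plumbing (caption_image is a stand-in for your actual model call):

```python
from pathlib import Path

def caption_image(image_path: Path) -> str:
    # stand-in: replace with a real VLM call (LLaVA, Yi-VL, Qwen-VL, ...)
    return "a placeholder natural-language caption"

def write_captions(image_folder: str,
                   extensions=(".png", ".jpg", ".jpeg", ".webp")) -> int:
    """Write/extend a .txt caption next to every image; returns count."""
    written = 0
    for img in sorted(Path(image_folder).rglob("*")):
        if img.suffix.lower() not in extensions:
            continue
        txt = img.with_suffix(".txt")
        if txt.exists():
            # keep existing booru tags, append the NL caption on a new line
            txt.write_text(txt.read_text() + "\n" + caption_image(img))
        else:
            txt.write_text(caption_image(img))
        written += 1
    return written
```

If you already have tag files, this appends the natural-language caption after them instead of overwriting, which is the cheap way to get the pony-style tags+NL mix.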