>>77986385
Nah, it's due to being trained on LLM captions. You can just go to ChatGPT and ask it to take a short sentence and output a prompt fit for a text2image model.
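A minimal sketch of that workflow with the openai Python client — the model name and system prompt here are my own guesses for illustration, not anything any trainer actually used:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def expand_prompt(short_caption: str) -> str:
    # Ask the LLM to rewrite a terse ALT-style caption as the long,
    # descriptive kind of caption newer t2i models are trained on.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any chat model works here
        messages=[
            {"role": "system", "content": "Rewrite the user's short caption "
             "as a detailed text-to-image prompt: subject, setting, lighting, "
             "camera, style. One paragraph, no lists."},
            {"role": "user", "content": short_caption},
        ],
    )
    return resp.choices[0].message.content

print(expand_prompt("woman eating ice cream"))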
Previous models were trained on ALT text, which is meant for blind people or for when an image doesn't load. Those tooltips that popped up when you held your cursor over an image and said "woman eating ice cream"? That's why previous models worked with shorter prompts. That format would be a waste for T5; you might as well drop the third TE in that case.
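To make the gap concrete, here's a hypothetical pair of captions for the same image (the long one is my own invention, just to show the style difference):

ALT-style: "woman eating ice cream"
LLM-style: "A candid photo of a young woman in a sunlit park, smiling as she eats a strawberry ice cream cone, shallow depth of field, warm afternoon light."

The first barely uses CLIP, let alone T5; the second is the kind of dense description T5 is actually there for.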
Cya vtai!