>>65756196
I can probably code the reflection as a two-pass call over the completion API: save the first caption, then repeat inference with the reflect question and the existing caption. But as you mentioned, it already takes me 6-8 hours to caption 5k images, and as of now I don't even know if it will work. I need some proof first. Thank you for suggesting cog. I'll look into it as well.
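
Rough sketch of the two-step reflection described above. Everything here is an assumption for illustration: `complete(image_path, prompt)` is a hypothetical callable wrapping whatever completion API is in use, and the prompts are placeholders. It also assumes the second pass costs about as much as the first, so the total run time roughly doubles.

```python
from typing import Callable, Tuple

# Hypothetical signature: (image_path, prompt) -> generated text.
# Wrap the real completion API call in a function of this shape.
CompleteFn = Callable[[str, str], str]

# Placeholder prompts, not taken from any real pipeline.
CAPTION_PROMPT = "Describe this image in detail."
REFLECT_QUESTION = (
    "Check the caption against the image, fix any mistakes, "
    "and reply with the corrected caption only."
)
REFLECT_TEMPLATE = "Here is a caption for this image:\n{caption}\n\n{question}"


def caption_with_reflection(
    complete: CompleteFn,
    image_path: str,
    question: str = REFLECT_QUESTION,
) -> Tuple[str, str]:
    """Run two inference passes and return (first_caption, revised_caption)."""
    # Pass 1: plain caption, saved so it can be compared later.
    first = complete(image_path, CAPTION_PROMPT).strip()
    # Pass 2: same image, with the existing caption plus the reflect question.
    prompt = REFLECT_TEMPLATE.format(caption=first, question=question)
    revised = complete(image_path, prompt).strip()
    return first, revised


def estimated_hours(hours_one_pass: float, passes: int = 2) -> float:
    """Naive estimate: every pass is assumed to cost the same as the first."""
    return hours_one_pass * passes
```

Under that assumption, 6-8 hours for one pass over 5k images becomes roughly 12-16 hours for two. Since both captions are returned, running this on a small sample first and comparing the pairs by eye might be enough proof before committing to the full set.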