>>56108636
For that you'd probably want to turn all of her content and dialogue into text with as much context as possible (the comments she's responding to, image recognition on what's on screen, analysis of the video in general and of what's happening at each moment...) and use that as the dataset for a finetune of llama 70b. The more info the better, but you might get 90% of the way there with just the transcription.
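
Rough sketch of what the dataset prep could look like, assuming you already have the transcript split into utterances and some per-utterance context. The `Segment` fields, the `[scene]`/`[chat]` tags, and the prompt/completion JSONL layout are all made up for illustration; whatever trainer you use will have its own expected format, so adjust to match.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One transcribed utterance plus whatever context was captured for it.

    Hypothetical structure, not tied to any particular tool.
    """
    transcript: str                                          # what she said (the training target)
    chat_comments: list[str] = field(default_factory=list)   # comments she is responding to
    scene_description: str = ""                              # e.g. output of an image-captioning pass


def build_example(seg: Segment) -> dict:
    """Pack the context into a prompt and the transcript into the completion."""
    parts = []
    if seg.scene_description:
        parts.append(f"[scene] {seg.scene_description}")
    for comment in seg.chat_comments:
        parts.append(f"[chat] {comment}")
    return {"prompt": "\n".join(parts), "completion": seg.transcript}


def to_jsonl(segments: list[Segment]) -> str:
    """Serialize one training example per line."""
    return "\n".join(
        json.dumps(build_example(s), ensure_ascii=False) for s in segments
    )


if __name__ == "__main__":
    demo = [
        Segment("hello chat", ["say hi"], "streamer waves at the camera"),
        Segment("just talking to myself here"),  # transcription only, no context
    ]
    print(to_jsonl(demo))
```

The transcription-only case just ends up with an empty prompt, so you can start with plain transcripts and bolt the extra context on later without changing the format.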