>>45297986
well the problem really is doing it live, like what Patra did, and FBK recently. It has to listen to a live audio stream and keep up. It works better with context, so you have to send it whole spoken phrases as audio, one at a time.
I found it's kind of hard to do that. I tried a common solution (the `speech_recognition` lib), but I'm not using a mic: I'm rerouting an audio output to an input, and that bugged the audio I was listening to (severe, fast cutting out).
Since it's live, I just settled for sending what I can as soon as it's ready, even if that means re-translating some things.
(I tried using whisper itself since it separates and times text segments, but I overcomplicated things quite a bit)
It does miss sometimes, but it's surprisingly good.
Sometimes it's schizophrenic and hears things when there's only BGM playing, which is kinda funny.
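
Rough sketch of the "send whole phrases, but flush as soon as something is ready" idea from above. This is not the setup described in the post, just one way it could look, assuming the rerouted audio arrives as small frames of 16-bit PCM and that a plain loudness threshold is good enough to spot pauses. `rms`, `split_phrases`, `threshold`, `max_silence` and `max_frames` are made-up names and values for illustration.

```python
import math
from array import array
from typing import Iterable, Iterator


def rms(frame: bytes) -> float:
    """Root-mean-square level of one frame of 16-bit signed PCM (native byte order)."""
    samples = array("h", frame)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def split_phrases(
    frames: Iterable[bytes],
    threshold: float = 500.0,  # assumed loudness cutoff, needs tuning per source
    max_silence: int = 10,     # quiet frames in a row that count as a pause
    max_frames: int = 250,     # hard cap so a long phrase cannot stall the stream
) -> Iterator[bytes]:
    """Yield one chunk of audio per phrase.

    A chunk is emitted after `max_silence` consecutive quiet frames, or
    force-flushed once it reaches `max_frames`, whichever comes first.
    """
    buf = []
    quiet = 0
    for frame in frames:
        loud = rms(frame) >= threshold
        if not buf and not loud:
            continue  # skip silence before a phrase starts
        buf.append(frame)
        quiet = 0 if loud else quiet + 1
        if quiet >= max_silence or len(buf) >= max_frames:
            yield b"".join(buf)
            buf, quiet = [], 0
    if buf:
        yield b"".join(buf)  # whatever is left when the stream ends
```

Each yielded chunk would then go to whatever does the transcription or translation. The `max_frames` cap is the "send what I can as soon as it's ready" part: a phrase that runs long gets cut and sent anyway, at the cost of sometimes splitting a sentence. Note that a loudness threshold won't tell speech from BGM, so it does nothing for the hearing-things-over-music problem.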