Messing around with VibeVoice from MS. It's a hefty boy (8 GB of VRAM), but it can do multiple speakers at once and apparently 45 minutes in one go. Cloning quality seems pretty good to me. I think it's unintended, but you can have it add music when doing something like a podcast intro, or just say "music start" lol
https://files.catbox.moe/7ccr07.mp3
https://files.catbox.moe/ulsrgg.mp3
https://github.com/wildminder/ComfyUI-VibeVoice
>>103974182
https://files.catbox.moe/59gyy1.png
>>104083012
Whoa that's dedication, looking forward to it. Out of curiosity have you seen a spike in ram usage on comfy (if you updated)?
>>104084540
Lol love that Fauna
>>104085211
Cute Tummy.