https://huggingface.co/nvidia/Cosmos-1.0-Diffusion-14B-Video2World
>Given a text description, predict an output video of 121 frames.
>Given a text description and an image as the first frame, predict the future 120 frames.
open-source and local text-to-video and image-to-video
I'll test it in the next few days
However, it is trust-and-safety'd, though each of the guardrails is also open-source:
https://huggingface.co/nvidia/Cosmos-1.0-Guardrail