I've got the workflow down now! It does require specific prompting whenever the frame changes too much (duh), so it can be quite time-consuming. The background and foreground should be separated and then combined in post (with a static background), which I did not do here. The final step would require me to pay $300 for DaVinci Resolve to get its deflicker, which I won't do.
https://files.catbox.moe/zdp63s.mp4
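
For anyone who wants to try the deflicker step without paying, here's a rough sketch of the simplest kind of deflicker I know of: pull each frame's average brightness toward a moving average of its neighbours. This is not what DaVinci Resolve does internally (I have no idea what their filter uses), and `deflicker_gains` and `radius` are names I made up for illustration. It only evens out global brightness, so it probably won't fix flicker in texture or detail.

```python
def deflicker_gains(frame_means, radius=2):
    """Return one brightness gain per frame.

    frame_means: average luminance of each frame (any consistent scale).
    radius: hypothetical window half-width, in frames, for the moving average.
    """
    gains = []
    n = len(frame_means)
    for i, mean in enumerate(frame_means):
        # Average brightness over the neighbouring frames (clipped at the ends).
        lo = max(0, i - radius)
        hi = min(n, i + radius + 1)
        window = frame_means[lo:hi]
        target = sum(window) / len(window)
        # Multiplying the frame by this gain moves its mean onto the target.
        gains.append(target / mean if mean > 0 else 1.0)
    return gains


# Made-up example: brightness alternating between 100 and 120.
means = [100.0, 120.0, 100.0, 120.0, 100.0]
gains = deflicker_gains(means, radius=1)
corrected = [m * g for m, g in zip(means, gains)]
```

You would multiply every pixel of frame `i` by `gains[i]` (and clip to the valid range) in whatever tool you use to read and write the frames. A bigger `radius` smooths harder but will also flatten real lighting changes.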