What is VOID?
VOID is a video object removal framework that aims for physically plausible inpainting results, particularly when the removed object interacts significantly with its environment.
How does VOID work?
VOID uses a vision-language model to identify the regions of the scene that are affected by removing an object, then uses those regions to guide a video diffusion model toward a consistent counterfactual outcome, that is, the scene as it would look had the object never been there. A two-pass refinement process then improves the quality of the output.
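The control flow described above can be sketched in a few lines of Python. This is a toy illustration, not VOID's implementation: `remove_object`, `find_affected`, and `inpaint` are hypothetical names, the vision-language model and the video diffusion model are replaced by trivial stand-ins, and the second pass is assumed to simply re-run generation on the first pass's output.

```python
from typing import Callable, List

Frame = List[List[float]]  # toy grayscale frame: rows of pixel values
Mask = List[List[bool]]    # True marks a pixel to regenerate


def union(a: Mask, b: Mask) -> Mask:
    """Pixel-wise OR of two masks of the same shape."""
    return [[x or y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def remove_object(
    frames: List[Frame],
    object_masks: List[Mask],
    find_affected: Callable[[List[Frame], List[Mask]], List[Mask]],
    inpaint: Callable[[List[Frame], List[Mask]], List[Frame]],
    passes: int = 2,
) -> List[Frame]:
    """Hypothetical pipeline: find affected regions, then regenerate them."""
    # Stand-in for the vision-language model: which regions besides the
    # object itself change when the object is removed?
    affected = find_affected(frames, object_masks)
    edit_masks = [union(o, a) for o, a in zip(object_masks, affected)]
    # Stand-in for the video diffusion model, applied in two passes
    # (assumption: the second pass refines the first pass's output).
    result = frames
    for _ in range(passes):
        result = inpaint(result, edit_masks)
    return result


def toy_find_affected(frames: List[Frame], object_masks: List[Mask]) -> List[Mask]:
    """Toy rule: the object casts a shadow one pixel to its right."""
    return [[[False] + row[:-1] for row in mask] for mask in object_masks]


def toy_inpaint(frames: List[Frame], masks: List[Mask]) -> List[Frame]:
    """Toy fill: replace masked pixels with the mean of the unmasked ones."""
    out = []
    for frame, mask in zip(frames, masks):
        keep = [p for fr, mr in zip(frame, mask) for p, m in zip(fr, mr) if not m]
        fill = sum(keep) / len(keep)
        out.append([[fill if m else p for p, m in zip(fr, mr)]
                    for fr, mr in zip(frame, mask)])
    return out


# One frame, one row: background 0, object 9, its shadow 5.
video = [[[0.0, 9.0, 5.0, 0.0]]]
masks = [[[False, True, False, False]]]
clean = remove_object(video, masks, toy_find_affected, toy_inpaint)
```

In this toy run, masking only the object would leave the shadow pixel behind. Widening the edit region to include the affected area is what lets both disappear.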
What datasets are used to train VOID?
VOID is trained on a new paired dataset generated from Kubric (synthetic) and HUMOTO (human motion), built to supervise counterfactual object removal.
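As a rough illustration of what "paired" could mean here, one training example might be represented as below. The class `PairedSample` and all of its field names are hypothetical, as is the assumption that each pair consists of a video with the object, a counterfactual video without it, and per-frame object masks; the actual dataset layout is not described in this FAQ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PairedSample:
    """Hypothetical record for one counterfactual training pair."""
    source: str                # e.g. "kubric" or "humoto"
    with_object: List[str]     # frame ids of the video containing the object
    without_object: List[str]  # frame ids of the same scene without the object
    object_mask: List[str]     # per-frame mask ids for the object to remove

    def __post_init__(self) -> None:
        # All three sequences must be aligned frame by frame.
        lengths = {len(self.with_object), len(self.without_object),
                   len(self.object_mask)}
        if len(lengths) != 1:
            raise ValueError("with_object, without_object and object_mask "
                             "must have the same number of frames")

    @property
    def num_frames(self) -> int:
        return len(self.with_object)


sample = PairedSample(
    source="kubric",
    with_object=["in_000", "in_001"],
    without_object=["gt_000", "gt_001"],
    object_mask=["mask_000", "mask_001"],
)
```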
How does VOID compare to other video object removal methods?
VOID is reported to outperform previous methods: it better preserves scene dynamics and produces more realistic results after an object is removed.