Video stabilization remains a challenging problem, particularly for handheld footage with large motion and dynamic content. Traditional 2D methods often suffer from distortion and cropping, while existing 3D methods struggle with robustness in complex environments. We present a novel 3D Multi-frame Fusion framework that leverages scene geometry and temporal consistency to generate high-quality stabilized video. By integrating a dense depth prior with a multi-frame optimization strategy, our method effectively decouples camera motion from scene dynamics. We introduce a differentiable warping module that synthesizes full-frame stabilized views by fusing information from adjacent frames, significantly reducing the need for cropping. Extensive experiments on public benchmarks demonstrate that our approach outperforms state-of-the-art methods in both stability and visual quality metrics.
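To make the fusion idea concrete, the sketch below fills warp-induced holes in a stabilized frame with pixels from its temporal neighbors. This is a classical stand-in for our learned differentiable module, not the module itself: it assumes stabilizing homographies into the target view have already been computed from the estimated trajectory, and the helper name `fuse_neighbors` is a hypothetical placeholder.

```python
# Minimal hole-filling sketch: warp temporal neighbors into the
# stabilized view of frame t and copy their pixels wherever the
# warped center frame left holes. Illustrative only; our actual
# module performs this fusion with learned, differentiable blending.
import cv2
import numpy as np

def fuse_neighbors(frames, Hs, t, radius=2):
    """frames: list of HxWx3 uint8 images.
    Hs[i]:  3x3 homography mapping frame i into the stabilized
            view of frame t (assumed precomputed).
    Returns the fused frame and its validity mask."""
    h, w = frames[t].shape[:2]
    out = cv2.warpPerspective(frames[t], Hs[t], (w, h))
    valid = cv2.warpPerspective(np.ones((h, w), np.uint8), Hs[t], (w, h))
    # Visit neighbors in order of temporal distance; closer frames win.
    for d in sorted(range(-radius, radius + 1), key=abs):
        i = t + d
        if d == 0 or not (0 <= i < len(frames)):
            continue
        warped = cv2.warpPerspective(frames[i], Hs[i], (w, h))
        mask = cv2.warpPerspective(np.ones((h, w), np.uint8), Hs[i], (w, h))
        hole = (valid == 0) & (mask == 1)   # pixels still missing
        out[hole] = warped[hole]
        valid[hole] = 1
    return out, valid
```

In the full model, these fixed homography warps and hard hole masks are replaced by the differentiable warping module and learned blending weights described above.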
Our pipeline consists of three main stages: (1) Robust 3D Trajectory Estimation using a learned depth prior to handle dynamic objects; (2) Optimal Path Planning that smooths the camera trajectory while respecting field-of-view constraints; and (3) Neural Multi-frame Rendering, which fuses pixels from temporal neighbors to fill in missing regions caused by stabilization warping, ensuring full-frame output without cropping.
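Stage (2) can be illustrated with a toy smoother: Gaussian-filter the estimated camera path, then clamp each frame's correction so the warped view retains enough field of view. Our actual planner optimizes a full 3D trajectory; this 2D-translation sketch, with the hypothetical helper `smooth_path` and hand-picked parameters, only conveys the smoothing-under-constraint structure.

```python
# Toy path-planning sketch: smooth an Nx2 camera trajectory with a
# Gaussian kernel, then cap the per-frame correction so the
# stabilized crop window stays inside the original field of view.
import numpy as np

def smooth_path(path, sigma=8.0, max_shift=40.0):
    """path: Nx2 array of per-frame camera translations (pixels).
    Returns a smoothed path whose deviation from the input never
    exceeds max_shift pixels on either axis."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    # Edge-pad so the filtered path has the same length as the input.
    padded = np.pad(path, ((radius, radius), (0, 0)), mode='edge')
    smoothed = np.stack(
        [np.convolve(padded[:, c], kernel, mode='valid') for c in range(2)],
        axis=1)
    # Field-of-view constraint: clamp the deviation from the real path.
    correction = np.clip(smoothed - path, -max_shift, max_shift)
    return path + correction
```

Feeding the cumulative inter-frame translations through such a smoother yields per-frame corrections that the rendering stage then realizes by warping and multi-frame fusion.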
[Interactive demo: slider comparison of the input video vs. our stabilized result.]
@inproceedings{johnson20243dmultiframe,
  title={3D Multi-frame Fusion for Video Stabilization},
  author={Johnson, Alex and Chen, Sarah and Roberts, Michael and Davis, Emily and Zhang, David},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2024}
}