MEMFOF: High-Resolution Training for Memory-Efficient Multi-Frame Optical Flow Estimation

¹Lomonosov Moscow State University, ²MSU Institute for Artificial Intelligence

Native-Resolution Results on Spring

MEMFOF demonstrates that multi-frame processing enhances temporal coherence and that native-resolution input helps retain details

Highlights

  • MEMFOF delivers high-accuracy optical flow estimation for Full HD video while significantly reducing GPU memory usage: inference requires just 2.09 GB, enabling native 1080p processing without cropping or downsampling
  • By combining multi-frame estimation, scalable correlation volumes, and resolution-aware training, it achieves state-of-the-art results across multiple benchmarks with lower resource demands
  • The method ranks first on Spring and Sintel (Clean) benchmarks and shows strong performance on KITTI-2015, combining superior accuracy with efficiency

High-Resolution Data

Using 2D histograms, we analyze and compare motion patterns across optical flow datasets to uncover the full range and distribution of movements. We identify gaps in existing data and enhance the training set through upsampling to address them

Histogram panels: TartanAir, FlyingThings, KITTI-2015, HD1K, Sintel, Spring, Combined, Combined at 2x resolution

Color intensity indicates the number of motion vectors per bin, with borders marking each dataset’s maximum motion range. Large motions in the Spring dataset, missing from other training sets, are captured after 2x upsampling of the combined data
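The histogram analysis and the effect of 2x upsampling can be illustrated with a minimal sketch. It assumes flow is given as a list of (u, v) displacements in pixels; the function names, the bin size, and the toy values are hypothetical and are not taken from the MEMFOF code. The point it shows is that resizing frames by 2x doubles every displacement, which widens the motion range covered by the training data.

```python
from collections import Counter
import math

def flow_histogram(flow_vectors, bin_size=32.0):
    """Count motion vectors (u, v), in pixels, per square bin of side bin_size."""
    hist = Counter()
    for u, v in flow_vectors:
        hist[(math.floor(u / bin_size), math.floor(v / bin_size))] += 1
    return hist

def upsample_flow_2x(flow_vectors):
    """Displacements after resizing the frames by 2x: every vector doubles.

    Only the motion values are modeled here; a real pipeline would also
    interpolate the flow field to twice its width and height.
    """
    return [(2.0 * u, 2.0 * v) for u, v in flow_vectors]

def max_motion(flow_vectors):
    """Largest absolute displacement along either axis."""
    return max(max(abs(u), abs(v)) for u, v in flow_vectors)

# Toy vectors (hypothetical values, not taken from any dataset).
vectors = [(3.0, -4.0), (40.0, 10.0), (-100.0, 60.0)]
doubled = upsample_flow_2x(vectors)

print("max motion before:", max_motion(vectors))
print("max motion after 2x:", max_motion(doubled))
print("occupied bins after 2x:", sorted(flow_histogram(doubled)))
```

Plotting the bin counts as color intensity and drawing a border at the maximum motion would give a figure of the kind described in the caption.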

Memory Efficiency

By reducing the resolution of correlation volumes, we free up enough memory to implement a multi-frame method that improves temporal stability. Our method requires just 2.09 GB of memory for inference, enabling native Full HD training

Comparison panels: SEA-RAFT, MEMFOF

Reducing correlation volume resolution lowers memory use but can degrade quality. Our three-frame approach compensates for this, restoring accuracy while keeping efficiency and enabling native Full HD processing
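To see why correlation volume resolution dominates memory use, the following back-of-the-envelope sketch estimates the size of dense all-pairs correlation volumes at Full HD. Several things are assumptions rather than details stated on this page: the 1/8 and 1/16 feature strides, 32-bit floats, and a single pyramid level. The figures cover the correlation volumes only, not total inference memory.

```python
import math

def corr_volume_bytes(height, width, stride, num_pairs=1, bytes_per_value=4):
    """Memory of dense all-pairs correlation volumes at 1/stride resolution.

    Each volume stores one value for every pair of feature-map positions,
    so its size grows with the square of the number of positions.
    """
    positions = math.ceil(height / stride) * math.ceil(width / stride)
    return num_pairs * positions * positions * bytes_per_value

# Assumed setting 1: two frames (one pair), features at 1/8 resolution.
two_frame_1_8 = corr_volume_bytes(1080, 1920, stride=8)

# Assumed setting 2: three frames (two pairs), features at 1/16 resolution.
three_frame_1_16 = corr_volume_bytes(1080, 1920, stride=16, num_pairs=2)

print(f"1/8 stride, one pair:   {two_frame_1_8 / 2**30:.2f} GiB")
print(f"1/16 stride, two pairs: {three_frame_1_16 / 2**30:.2f} GiB")
```

Because the volume size scales with the fourth power of the linear feature resolution, halving that resolution cuts each volume by roughly 16x, which leaves room for the second frame pair that a three-frame method needs.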

1px Error vs. Memory Usage on Spring

MEMFOF demonstrates superior memory efficiency and the lowest error among all methods. Speed and peak memory usage were measured on an NVIDIA RTX 3090

Results are sourced from the official leaderboard of the Spring benchmark.
"w/o ft" denotes methods that were not fine-tuned on the Spring dataset
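For readers unfamiliar with the metrics, here is a minimal sketch of endpoint error and a 1px outlier rate as they are commonly defined: the endpoint error is the Euclidean distance between predicted and ground-truth flow vectors, and the 1px error is the percentage of pixels whose endpoint error exceeds one pixel. The function names and toy values are hypothetical, and the official benchmark evaluation may differ in details such as masking or the resolution at which errors are computed.

```python
import math

def endpoint_errors(pred, gt):
    """Per-pixel Euclidean distance between predicted and ground-truth flow."""
    return [math.hypot(pu - gu, pv - gv) for (pu, pv), (gu, gv) in zip(pred, gt)]

def mean_epe(pred, gt):
    """Average endpoint error over all pixels."""
    errors = endpoint_errors(pred, gt)
    return sum(errors) / len(errors)

def outlier_rate(pred, gt, threshold_px=1.0):
    """Percentage of pixels whose endpoint error exceeds threshold_px."""
    errors = endpoint_errors(pred, gt)
    return 100.0 * sum(e > threshold_px for e in errors) / len(errors)

# Toy flow vectors (u, v) for four pixels; the values are illustrative only.
gt = [(0.0, 0.0), (10.0, 0.0), (0.0, -5.0), (2.0, 2.0)]
pred = [(0.0, 0.0), (13.0, 4.0), (0.0, -5.5), (2.0, 2.0)]

print("mean EPE:", mean_epe(pred, gt))
print("1px error (%):", outlier_rate(pred, gt))
```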

Performance on Sintel Benchmark

MEMFOF produces consistent motion across large deformations and occlusions in challenging cinematic scenes

EndPoint Error on Sintel

MEMFOF achieves competitive performance on both Sintel splits, sharing first place with the five-frame version of VideoFlow on the clean pass and outperforming SEA-RAFT (L) by 32% on the final pass

Results are sourced from the official leaderboard of the Sintel benchmark.
Methods are sorted by performance on the Sintel Clean split

Performance in Real-World KITTI Scenes

MEMFOF excels in real-world driving scenes, showing high accuracy and stability across challenging motions and lighting

Fl-all on KITTI-2015

MEMFOF achieves state-of-the-art performance among all non-scene flow methods, outperforming both SEA-RAFT and VideoFlow

Results are sourced from the official leaderboard of the KITTI benchmark.
Only non-scene-flow methods are shown
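As a reference for the Fl-all metric, the sketch below follows the commonly used KITTI-2015 outlier rule: a pixel counts as an outlier when its endpoint error exceeds both 3 px and 5% of the ground-truth flow magnitude, and Fl-all is the percentage of such pixels. The function name and toy values are hypothetical, and the inputs are assumed to be already restricted to pixels with valid ground truth.

```python
import math

def fl_all(pred, gt, abs_thresh_px=3.0, rel_thresh=0.05):
    """Percentage of pixels that are flow outliers.

    A pixel is an outlier when its endpoint error exceeds both
    abs_thresh_px and rel_thresh times the ground-truth magnitude.
    """
    outliers = 0
    for (pu, pv), (gu, gv) in zip(pred, gt):
        err = math.hypot(pu - gu, pv - gv)
        mag = math.hypot(gu, gv)
        if err > abs_thresh_px and err > rel_thresh * mag:
            outliers += 1
    return 100.0 * outliers / len(gt)

# Toy flow vectors (u, v) for four pixels; the values are illustrative only.
gt = [(100.0, 0.0), (10.0, 0.0), (0.0, 0.0), (200.0, 0.0)]
pred = [(104.0, 0.0), (14.0, 0.0), (1.0, 0.0), (204.0, 0.0)]

# Only the second pixel is an outlier: its 4 px error is above 3 px
# and above 5% of its 10 px ground-truth magnitude.
print("Fl-all (%):", fl_all(pred, gt))
```

The relative threshold means that a 4 px error on a 100 px motion is tolerated while the same error on a 10 px motion is not, so the metric does not penalize large motions disproportionately.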

BibTeX

@article{bargatin2025memfof,
  title = {MEMFOF: High-Resolution Training for Memory-Efficient Multi-Frame Optical Flow Estimation},
  author = {Bargatin, Vladislav and Chistov, Egor and Yakovenko, Alexander and Vatolin, Dmitriy},
  journal = {arXiv preprint arXiv:2506.23151},
  year = {2025}
}
