Evaluation Methodology of MSU Video Frame Interpolation Benchmark
Problem definition
Video Frame Interpolation (VFI) algorithms synthesize intermediate frames between adjacent frames of a video, aiming to provide a smooth and consistent visual experience.
Our benchmark ranks these algorithms by interpolation quality to determine which performs best.
Dataset
We present a new dataset for this comparison to ensure that neural-network-based methods do not benefit from having been trained on data that could appear in our test sample.
Details about the dataset's characteristics and processing can be found in the Dataset tab.
Metrics
PSNR
PSNR is a commonly used metric based on pixel-wise similarity. For metric calculation, we use the implementation from MSU VQMT [1]. A higher metric value indicates better quality.
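For reference, PSNR follows directly from the mean squared error between frames; below is a minimal NumPy sketch for 8-bit images (an illustration only, not the VQMT implementation used in the benchmark):

```python
import numpy as np

def psnr(reference: np.ndarray, distorted: np.ndarray, max_value: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio (in dB) between two same-sized 8-bit images."""
    mse = np.mean((reference.astype(np.float64) - distorted.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)
```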
SSIM
SSIM is another commonly used metric, based on structural similarity. For metric calculation, we use the implementation from MSU VQMT [1]. A higher metric value indicates better quality.
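A quick way to compute comparable SSIM scores is the scikit-image implementation, shown below as a stand-in for illustration (the benchmark itself uses VQMT):

```python
import numpy as np
from skimage.metrics import structural_similarity

# Stand-in frames; in practice these are the ground-truth and interpolated frames.
ground_truth = np.random.randint(0, 256, (1080, 1920, 3), dtype=np.uint8)
interpolated = ground_truth.copy()

score = structural_similarity(ground_truth, interpolated, channel_axis=2)
print(score)  # 1.0 for identical images; higher is better
```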
MS-SSIM
Multiscale SSIM (MS-SSIM) evaluates structural similarity over multiple scales through successive stages of sub-sampling. We use the implementation from PyTorch MS-SSIM [2]. A higher metric value indicates better quality.
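A minimal usage sketch of the PyTorch MS-SSIM package cited above (tensor shape and value-range conventions follow that package):

```python
import torch
from pytorch_msssim import ms_ssim

# Frames as float tensors in [0, 1] with shape (N, C, H, W);
# the benchmark's frames are 1920×1080, a smaller size is used here for brevity.
ground_truth = torch.rand(1, 3, 256, 256)
interpolated = ground_truth.clone()

score = ms_ssim(interpolated, ground_truth, data_range=1.0)
print(score.item())  # 1.0 for identical inputs; higher is better
```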
VMAF
VMAF is a perceptual video quality assessment algorithm developed by Netflix. In our benchmark, we calculate VMAF on the Y component in the YUV colorspace. For metric calculation, we use MSU VQMT [1]. A higher metric value indicates better quality.
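Since the metric is computed on the luma plane only, the sketch below illustrates extracting Y from an RGB frame; the BT.709 weights are an assumption for illustration, as VQMT handles the colorspace conversion internally:

```python
import numpy as np

def rgb_to_y_bt709(rgb: np.ndarray) -> np.ndarray:
    """Extract the luma (Y) plane from an RGB frame using BT.709 weights."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.2126 * r + 0.7152 * g + 0.0722 * b
```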
LPIPS
LPIPS (Learned Perceptual Image Patch Similarity) evaluates the distance between image patches: higher values mean the patches are more different, lower values mean they are more similar. To calculate LPIPS, we use the Perceptual Similarity Metric implementation proposed in "The Unreasonable Effectiveness of Deep Features as a Perceptual Metric" [3].
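A minimal usage sketch of the official LPIPS package that accompanies [3]; the 'alex' backbone is that package's default and an assumption here, since the benchmark does not state which network it uses:

```python
import torch
import lpips

# Official implementation from [3]; 'alex' is its default backbone.
loss_fn = lpips.LPIPS(net='alex')

# Inputs are (N, 3, H, W) tensors scaled to [-1, 1].
img0 = torch.zeros(1, 3, 256, 256)
img1 = torch.zeros(1, 3, 256, 256)

distance = loss_fn(img0, img1)
print(distance.item())  # 0.0 for identical images; lower means more similar
```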
Computational complexity
The tests were performed in Google Colab; a sketch of the timing protocol follows the list. Main characteristics:
- GPU: NVIDIA K80
- image reading and writing are excluded from the measurements
- 1920×1080 resolution
- one frame is interpolated between the same pair of adjacent frames
- the reported result is the 3rd minimum over 100 runs
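Below is a sketch of this timing protocol, assuming a GPU-based model wrapped in a hypothetical `interpolate` callable (PyTorch and CUDA synchronization are assumptions; the exact harness may differ):

```python
import time
import torch

def measure_runtime(interpolate, frame0, frame1, runs=100):
    """Time the interpolation of one frame between a fixed pair of frames.

    `interpolate` is a hypothetical callable wrapping the tested VFI model;
    image reading and writing happen outside the timed region, matching
    the protocol above.
    """
    timings = []
    for _ in range(runs):
        torch.cuda.synchronize()           # ensure prior GPU work has finished
        start = time.perf_counter()
        interpolate(frame0, frame1)        # synthesize one intermediate frame
        torch.cuda.synchronize()           # wait for the GPU to finish
        timings.append(time.perf_counter() - start)
    return sorted(timings)[2]              # 3rd minimum over 100 runs
```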
Subjective comparison
For the subjective comparison, we slow down the algorithms' outputs by a factor of 4. Participants are shown 2-second videos:
- 30 fps for gaming samples
- 60 fps for others
Each of the 413 participants saw 32 video pairs and had to choose which of the two looked smoother (an "indistinguishable" option was also available). Two verification questions protected against random answers and bots. We used the valid answers to estimate the ranking with the Bradley-Terry model.
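A minimal sketch of how Bradley-Terry scores can be fitted from such pairwise answers using the classic minorization-maximization (MM) updates; the `wins` matrix and the handling of "indistinguishable" answers are assumptions here, and the benchmark's exact fitting procedure may differ:

```python
import numpy as np

def bradley_terry(wins: np.ndarray, iters: int = 100) -> np.ndarray:
    """Fit Bradley-Terry scores from a matrix of pairwise win counts.

    wins[i, j] = how many times method i was preferred over method j
    (counting "indistinguishable" answers as half a win for each side
    is one possible convention and an assumption here).
    """
    n = wins.shape[0]
    p = np.ones(n)                          # initial scores
    games = wins + wins.T                   # comparisons per pair
    for _ in range(iters):
        for i in range(n):
            others = np.arange(n) != i
            denom = np.sum(games[i, others] / (p[i] + p[others]))
            p[i] = wins[i].sum() / denom
        p /= p.sum()                        # normalize for identifiability
    return p                                # higher score = ranked smoother
```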
References
[1] MSU VQMT (MSU Video Quality Measurement Tool)
[2] PyTorch MS-SSIM
[3] Zhang et al., "The Unreasonable Effectiveness of Deep Features as a Perceptual Metric", CVPR 2018