The methodology of the MSU Video Upscalers Benchmark
The goal of our benchmark is to find the algorithms that produce the most visually pleasant image possible.
The test video
Our test video consists of 15 clips and contains:
- animated and camera-shot segments
- spatially complex and simple segments (textures, small details)
- temporally complex and simple segments (camera and object movement)
- segments with high and medium bitrate sources
- a segment with stripes to check for moire artifacts
- several letterboxed segments
- one segment with text overlay
- several segments with human faces
We use different downscaling methods:
- Bicubic downscaling on complex segments to make them easier to process, as we noticed that upscalers perform better with bicubic downscaling than with Gaussian downscaling
- Gaussian downscaling on simple segments, as it more closely resembles real camera footage
- We did not use Gaussian downscaling on animated segments, as it is not a practical use case
- For one of the segments, we simulated the debayering process that happens in cameras (Menon 2007)
Our test video is slightly compressed to simulate a practical use case. The resolution of our test video is 480×270, which is ¼ of the most popular modern video resolution, 1920×1080, in each dimension. One could argue that this resolution is too small for a real use case, but small details (which significantly affect the perceived restoration quality) in a 1920×1080 video are comparable to medium-sized objects in a 480×270 video. Therefore, using a higher resolution would be a waste of computational resources.
When preparing our test video, we extracted PNG frames from the MP4 sources, so additional YUV⇔RGB conversions will not lead to any precision loss.
All test segments are combined into a single test video file for your convenience. We separate the segments with 5 black frames so that upscalers do not behave badly because of scene changes. There are no scene changes inside the segments themselves. Additionally, the test video starts and ends with 5 black frames.
Each segment is 48 frames long, but we exclude the first 18 frames of each segment from our subjective rank calculation. This improves the results of algorithms that use temporal information (for example, motion estimation) and take a few frames to “warm up”.
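The frame layout described above can be written out as simple arithmetic. The following is a minimal sketch, assuming 0-based frame indices and that the black runs are exactly 5 frames everywhere, as stated; the function names are hypothetical and the exact frame positions in the released file are not confirmed here.

```python
SEGMENTS = 15      # number of clips in the test video
SEGMENT_LEN = 48   # frames per segment
BLACK_LEN = 5      # black frames before, between, and after segments
WARMUP = 18        # leading frames of each segment excluded from scoring


def segment_ranges(segments=SEGMENTS, seg_len=SEGMENT_LEN, black=BLACK_LEN):
    """Return (start, end) frame indices per segment, 0-based, end exclusive."""
    ranges = []
    start = black  # the video opens with a run of black frames
    for _ in range(segments):
        ranges.append((start, start + seg_len))
        start += seg_len + black  # skip the black run that follows the segment
    return ranges


def scored_ranges(warmup=WARMUP):
    """Drop the warm-up frames at the head of every segment."""
    return [(start + warmup, end) for start, end in segment_ranges()]


def total_frames(segments=SEGMENTS, seg_len=SEGMENT_LEN, black=BLACK_LEN):
    """Segments plus one black run before each segment and one at the end."""
    return segments * seg_len + (segments + 1) * black
```

Under these assumptions the combined video is 800 frames long, and each segment contributes its last 30 frames to the subjective rank calculation.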
To make sure that our test video covers algorithm behavior well while containing few redundant clips, we ran the algorithms on a larger set of clips and then clustered the clips by the Pearson correlation of their objective scores. Our selected set of clips showed the highest variety of super-resolution algorithm behavior.
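One way to picture the clip selection is the sketch below. It is only an illustration: the text does not name the clustering algorithm, the objective metrics, or any threshold, so the greedy grouping, the 0.9 cutoff, and the clip names and scores are all assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant score vector carries no ranking information
    return cov / sqrt(var_x * var_y)


def cluster_clips(scores, threshold=0.9):
    """Greedy grouping (an assumed method): a clip joins the first cluster
    whose representative it correlates with at or above `threshold`,
    otherwise it starts a new cluster.

    `scores` maps clip name -> objective scores, one per algorithm,
    in the same algorithm order for every clip.
    """
    clusters = []  # element 0 of each cluster is its representative
    for name, vec in scores.items():
        for cluster in clusters:
            if pearson(vec, scores[cluster[0]]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


# Hypothetical scores of four algorithms on four candidate clips.
scores = {
    "clip_a": [30.1, 31.5, 28.0, 33.2],
    "clip_b": [29.8, 31.0, 27.5, 32.9],  # ranks the algorithms like clip_a
    "clip_c": [33.0, 28.2, 31.9, 27.4],  # ranks them very differently
    "clip_d": [32.5, 28.0, 31.0, 27.9],  # behaves like clip_c
}
clusters = cluster_clips(scores)
representatives = [cluster[0] for cluster in clusters]
```

Clips in the same cluster rank the algorithms in nearly the same way, so keeping one representative per cluster removes redundancy while preserving the variety of algorithm behavior.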
The subjective comparison and metrics
To calculate the subjective ranking of the upscalers, we conducted a crowd-sourced subjective comparison with over 4300 valid participants. Participants were shown pairs of clips, each clip being the result of an upscaler on our test video, and were asked to choose the more visually appealing clip in each pair. Frames from the clips shown to the participants are available in the “Visualizations” sections. We used the Bradley–Terry model to calculate the subjective quality values in the tables.
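To illustrate how pairwise votes turn into quality values, here is a minimal sketch of fitting a Bradley–Terry model with the standard iterative (minorization-maximization) update. The function name, the iteration count, and the normalization of strengths to sum to 1 are assumptions; the scaling of the values reported in the tables is not described here.

```python
def bradley_terry(wins, iterations=200):
    """Estimate Bradley-Terry strengths from a pairwise win matrix.

    wins[i][j] is how many times item i was preferred over item j.
    Returns strengths normalised to sum to 1. Assumes every item has
    at least one win and the comparison graph is connected.
    """
    n = len(wins)
    p = [1.0 / n] * n
    for _ in range(iterations):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            # Each opponent contributes (number of comparisons) / (p_i + p_j).
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i
            )
            new_p.append(total_wins / denom if denom > 0 else p[i])
        norm = sum(new_p)
        p = [v / norm for v in new_p]
    return p
```

For example, with three hypothetical upscalers and `wins = [[0, 70, 90], [30, 0, 60], [10, 40, 0]]`, the fitted strengths are ordered first > second > third, matching the vote counts. Under the model, the probability that item i is preferred over item j is `p[i] / (p[i] + p[j])`.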
Upscalers often slightly displace object borders. This is not noticeable subjectively, but it decreases the values of some objective metrics. To compensate, we search for the best values of the metrics over pixel shifts of the upscalers’ results with ¼-pixel precision. We do this for each segment individually, over all of its frames at once. Graphs showing the correlation of these shifted metrics with the subjective scores are presented below. The metrics are calculated on the Y color component.
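The shift search can be sketched as a brute-force scan over a quarter-pixel grid. This is a simplified illustration, not the benchmark's code: PSNR stands in for the unspecified metrics, bilinear interpolation is an assumed resampling method, and the ±1 pixel search range and the ignored 2-pixel border are assumptions. Frames are plain lists of rows of Y values.

```python
from math import floor, log10


def shift_frame(frame, dx, dy):
    """Resample `frame` at positions offset by (dx, dy) using bilinear
    interpolation, clamping coordinates at the frame edges."""
    h, w = len(frame), len(frame[0])

    def px(y, x):
        return frame[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    out = []
    for y in range(h):
        y0 = floor(y + dy)
        fy = y + dy - y0
        row = []
        for x in range(w):
            x0 = floor(x + dx)
            fx = x + dx - x0
            top = px(y0, x0) * (1 - fx) + px(y0, x0 + 1) * fx
            bottom = px(y0 + 1, x0) * (1 - fx) + px(y0 + 1, x0 + 1) * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


def psnr(frames_a, frames_b, border=2, peak=255.0):
    """PSNR pooled over all frames, ignoring `border` pixels on every side
    so that edge clamping does not distort the comparison."""
    se, count = 0.0, 0
    for fa, fb in zip(frames_a, frames_b):
        rows = zip(fa[border:len(fa) - border], fb[border:len(fb) - border])
        for ra, rb in rows:
            cols = zip(ra[border:len(ra) - border], rb[border:len(rb) - border])
            for a, b in cols:
                se += (a - b) ** 2
                count += 1
    mse = se / count
    return float("inf") if mse == 0 else 10 * log10(peak * peak / mse)


def best_shift(reference, upscaled, max_shift=1.0, step=0.25):
    """Try every (dx, dy) on the grid, applying the same shift to all frames
    of a segment, and return (best score, dx, dy)."""
    steps = int(round(max_shift / step))
    offsets = [i * step for i in range(-steps, steps + 1)]
    best = (float("-inf"), 0.0, 0.0)
    for dy in offsets:
        for dx in offsets:
            shifted = [shift_frame(f, dx, dy) for f in upscaled]
            score = psnr(reference, shifted)
            if score > best[0]:
                best = (score, dx, dy)
    return best
```

Calling `best_shift(reference_frames, upscaled_frames)` once per segment yields a single shift for that segment, which mirrors the "for all frames at once" wording above. A production version would use an array library for speed; the pure-Python loops here are only meant to make the procedure explicit.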