The methodology of the MSU Video Upscalers Benchmark
The goal of our benchmark is to identify the algorithms that produce the most visually pleasing image.
Our test video contains 14 clips from Vimeo and 1 from Pixabay. The clips include:
- animated and camera-shot segments
- spatially complex and simple segments (textures, small details)
- temporally complex and simple segments (camera and object movement)
- segments with high and medium bitrate sources
- a segment with stripes to check for moiré artifacts
- several letterboxed segments
- one segment with text overlay
- several segments with human faces
We used different downscaling methods:
- Bicubic downscaling on complex segments, to make them easier to process: we noticed that upscalers perform better with bicubic downscaling than with Gaussian downscaling (both methods are sketched after this list)
- Gaussian downscaling on simple segments, as it more closely resembles real camera footage
- We did not use Gaussian downscaling on animated segments, as it is not a practical use case
- For one of the segments, we simulated the debayering process that happens in cameras (Menon, 2007)
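The following sketch illustrates the two downscaling methods, assuming OpenCV; the Gaussian sigma and the sampling grid are our assumptions, since the benchmark does not publish the exact kernels:

```python
import cv2

def bicubic_downscale(frame, factor=4):
    """Bicubic downscaling, used for the spatially complex segments."""
    h, w = frame.shape[:2]
    return cv2.resize(frame, (w // factor, h // factor),
                      interpolation=cv2.INTER_CUBIC)

def gauss_downscale(frame, factor=4, sigma=1.5):
    """Gaussian blur followed by point sampling, used for the simple
    segments; sigma=1.5 is an assumed value."""
    blurred = cv2.GaussianBlur(frame, ksize=(0, 0), sigmaX=sigma)
    return blurred[::factor, ::factor]
```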
Our test video is slightly compressed to simulate a practical use case. Its resolution is 480×270: ¼ of the most popular modern video resolution, 1920×1080, along each axis. One could argue that this resolution is too small for a real use case, but small details (which significantly affect perceived restoration quality) in a 1920×1080 video are comparable in size to medium-sized objects in a 480×270 video. Therefore, using a bigger resolution would waste computational resources.
Upscalers often slightly displace object borders. This displacement is not noticeable subjectively, but it decreases the values of some objective metrics. To compensate, we search for the best values of full-reference metrics such as PSNR over pixel shifts of each upscaler's result with ¼-pixel precision. We perform this search for each segment individually, applying a single shift to all of its frames at once.
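A minimal sketch of this shift search, assuming NumPy and SciPy; the search radius is our assumption, since the benchmark does not state it:

```python
import numpy as np
from scipy.ndimage import shift as subpixel_shift

def psnr(a, b, peak=255.0):
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return np.inf if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def best_shifted_psnr(result, reference, max_shift=1.0, step=0.25):
    """Search over quarter-pixel shifts of the upscaler's result
    (shape: frames x height x width) and return the best PSNR against
    the reference. One shift is applied to all frames of a segment;
    max_shift=1.0 is an assumed search radius."""
    best = -np.inf
    offsets = np.arange(-max_shift, max_shift + step / 2, step)
    for dy in offsets:
        for dx in offsets:
            # bilinear interpolation handles the sub-pixel part of the shift
            shifted = subpixel_shift(result.astype(np.float64),
                                     (0.0, dy, dx), order=1, mode="nearest")
            best = max(best, psnr(shifted, reference))
    return best
```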
While preparing our test video, we extracted PNG frames from the MP4 sources, so additional YUV⇔RGB conversions will not lead to any precision loss.
To ensure that our test video covers a wide range of algorithm behavior while keeping the number of redundant clips low, we ran the algorithms on a larger set of clips and then clustered the clips by the Pearson correlation between the objective scores the algorithms achieved on them. The selected set of clips showed the highest variety of super-resolution algorithm behavior.
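A minimal sketch of such a selection procedure, assuming SciPy's hierarchical clustering; the distance definition, linkage method, and cluster count are our assumptions:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def select_clips(scores, n_clusters=15):
    """scores: (n_clips, n_algorithms) matrix of objective metric values.
    Clips on which the algorithms score similarly end up in one cluster;
    one representative clip per cluster is kept."""
    corr = np.corrcoef(scores)          # Pearson correlation between clips
    dist = 1.0 - corr                   # turn similarity into a distance
    np.fill_diagonal(dist, 0.0)
    condensed = squareform(dist, checks=False)
    labels = fcluster(linkage(condensed, method="average"),
                      n_clusters, criterion="maxclust")
    # keep the first clip of each cluster as its representative
    return [int(np.where(labels == c)[0][0]) for c in np.unique(labels)]
```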
All test segments are combined into a single test video file for your convenience. We separate segments with 5 black frames to keep upscalers from behaving badly at scene changes; there are no scene changes inside the segments themselves. Additionally, the test video starts and ends with 5 black frames.
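A minimal sketch of the assembly step, assuming each segment is a NumPy array of frames:

```python
import numpy as np

def assemble_test_video(segments, pad=5):
    """Join segments (each shaped (frames, height, width, 3)) with `pad`
    black frames between them, plus `pad` black frames at both ends."""
    h, w = segments[0].shape[1:3]
    black = np.zeros((pad, h, w, 3), dtype=segments[0].dtype)
    parts = [black]
    for seg in segments:
        parts += [seg, black]
    return np.concatenate(parts)
```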
Each segment is 48 frames long, but we exclude the first 18 frames from our subjective rank calculation. This improves the results of algorithms that use temporal information (for example, motion estimation) and need a few frames to “warm up”.
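In code terms, the scored portion of each segment is a simple slice (a sketch; `segment_frames` is a hypothetical per-segment frame array):

```python
WARMUP = 18       # frames excluded from the subjective rank calculation
SEGMENT_LEN = 48  # total frames per segment

def scored_frames(segment_frames):
    """Frames 18..47 of each 48-frame segment enter the rank calculation."""
    return segment_frames[WARMUP:SEGMENT_LEN]
```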
Additionally, we perform a number of checks to detect whether an upscaler has been trained on our dataset, to prevent cheating.