The methodology of the MSU Video Upscalers Benchmark

The goal of our benchmark is to find the algorithms that produce the most visually pleasing image possible.

Our test video contains 14 clips from Vimeo and 1 from Pixabay. The clips include:

We used different downscaling methods:

Our test video is slightly compressed to simulate a practical use case. Its resolution is 480×270, which is ¼ of the most popular modern video resolution, 1920×1080. One could argue that this resolution is too small for a real use case, but small details (which significantly affect the perceived restoration quality) in a 1920×1080 video are comparable to medium-sized objects in a 480×270 video. Therefore, using a larger resolution would be a waste of computational resources.

Upscalers often slightly displace object borders. This is not noticeable subjectively, but it decreases the values of some objective metrics. To compensate, we search for the best values of full-reference metrics such as PSNR by going over pixel shifts of the upscalers’ results with ¼-pixel precision. We do this for each segment individually, over all of its frames at once.
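
A minimal sketch of such a search, assuming NumPy, SciPy, and scikit-image; the shift range, interpolation method, and data layout here are illustrative assumptions, not the exact implementation:

    import numpy as np
    from scipy.ndimage import shift as subpixel_shift
    from skimage.metrics import peak_signal_noise_ratio as psnr

    def best_shifted_psnr(result, reference, max_shift=1.0, step=0.25):
        """Search sub-pixel shifts of an upscaler's result on a 1/4-pixel grid
        and return the best PSNR against the reference segment.
        Both arrays are assumed to be shaped (frames, height, width, channels)."""
        result = result.astype(np.float64)
        reference = reference.astype(np.float64)
        offsets = np.arange(-max_shift, max_shift + step, step)
        best = -np.inf
        for dy in offsets:
            for dx in offsets:
                # Shift every frame of the segment by the same (dy, dx) offset
                # using bilinear interpolation (order=1).
                shifted = subpixel_shift(result, (0, dy, dx, 0), order=1, mode='nearest')
                best = max(best, psnr(reference, shifted, data_range=255))
        return best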

When preparing our test video, we extracted PNG frames from the MP4 sources, so additional YUV⇔RGB conversions will not lead to any precision loss.
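
For illustration, a hypothetical sketch of such frame extraction with OpenCV; the paths, naming scheme, and decoder are assumptions, not the tool actually used:

    import cv2  # OpenCV video decoder

    def extract_png_frames(video_path, out_dir):
        """Decode an MP4 source frame by frame and store each frame as a lossless PNG."""
        capture = cv2.VideoCapture(video_path)
        index = 0
        while True:
            ok, frame = capture.read()  # frame is a BGR uint8 array
            if not ok:
                break
            cv2.imwrite(f"{out_dir}/{index:05d}.png", frame)
            index += 1
        capture.release()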

We made sure that our test video covers a wide range of algorithm behavior while keeping the number of redundant clips low: we ran the algorithms on a larger set of clips and then clustered the clips according to the Pearson correlation between the objective scores of the algorithms’ results on different clips. The selected set of clips showed the highest variety of super-resolution algorithm behavior.
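
A rough sketch of this selection step, assuming a precomputed score matrix of shape (algorithms × clips); the hierarchical clustering and the choice of one representative per cluster are illustrative assumptions:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    def pick_representative_clips(scores, n_clusters):
        """scores: array of shape (num_algorithms, num_clips) with objective scores.
        Cluster clips whose score profiles correlate strongly and keep one per cluster."""
        corr = np.corrcoef(scores.T)          # Pearson correlation between clips
        dist = 1.0 - corr                     # strongly correlated clips become "close"
        condensed = dist[np.triu_indices_from(dist, k=1)]
        labels = fcluster(linkage(condensed, method='average'),
                          t=n_clusters, criterion='maxclust')
        # Keep the first clip of every cluster as its representative.
        return [int(np.flatnonzero(labels == c)[0]) for c in np.unique(labels)]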

All test segments are combined into a single test video file for your convenience. We separate segments with 5 black frames so that upscalers do not behave badly because of scene changes; there are no scene changes inside the segments themselves. Additionally, the test video starts and ends with 5 black frames.
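
As a toy illustration of this layout only (the real assembly pipeline is not described here), assuming each segment is available as a NumPy frame stack:

    import numpy as np

    def assemble_test_video(segments, n_black=5):
        """segments: list of uint8 arrays shaped (frames, height, width, 3).
        Returns a single frame stack: 5 black frames, then the segments
        separated by 5 black frames, then 5 trailing black frames."""
        height, width = segments[0].shape[1:3]
        black = np.zeros((n_black, height, width, 3), dtype=np.uint8)
        parts = [black]
        for segment in segments:
            parts.extend([segment, black])
        return np.concatenate(parts, axis=0)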

Each segment is 48 frames long, but we do not account for the first 18 frames in our subjective rank calculation. This improves the results of algorithms that use temporal information and take a few frames to “warm up” (for example, for motion estimation).
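
The trimming itself amounts to evaluating only the last 30 frames of every 48-frame segment; below is a hypothetical helper, using a generic per-frame scoring function as a stand-in for the actual ranking procedure:

    def segment_score(result_frames, reference_frames, frame_metric, warmup=18):
        """Average a per-frame quality score over a segment, ignoring the first
        `warmup` frames so temporal methods have time to accumulate information."""
        pairs = zip(result_frames[warmup:], reference_frames[warmup:])
        scores = [frame_metric(res, ref) for res, ref in pairs]
        return sum(scores) / len(scores)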

Additionally, we perform a number of checks to detect whether an upscaler has been trained on our dataset, in order to prevent cheating.
