The methodology of the MSU Video Upscalers Benchmark

The goal of our benchmark is to find the algorithms that produce the most visually pleasing results.

The test video

Our test video consists of 15 clips and contains:

  • animated and camera-shot segments
  • spatially complex and simple segments (textures, small details)
  • temporally complex and simple segments (camera and object movement)
  • segments with high and medium bitrate sources
  • a segment with stripes to check for moiré artifacts
  • several letterboxed segments
  • one segment with text overlay
  • several segments with human faces

We use different downscaling methods:

  • Bicubic downscaling on complex segments to make them easier to process, as we noticed that upscalers perform better with bicubic downscaling than with Gaussian downscaling
  • Gaussian downscaling on simple segments, as it more closely resembles real camera footage
  • No Gaussian downscaling on animated segments, as it is not a practical use case for them
  • For one of the segments we simulated the debayering process that happens in cameras (Menon 2007 [1]); a sketch of this preprocessing appears after this list
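The sketch below illustrates how such preprocessing can be implemented in Python, assuming OpenCV for downscaling and the colour-demosaicing package [1] for the Menon 2007 simulation; the Gaussian sigma shown is an illustrative value, not our exact parameter:

    import numpy as np
    import cv2  # OpenCV, for resizing and Gaussian filtering
    from colour_demosaicing import (
        mosaicing_CFA_Bayer,
        demosaicing_CFA_Bayer_Menon2007,
    )

    def downscale_bicubic(frame: np.ndarray, factor: int = 4) -> np.ndarray:
        """Bicubic downscaling (used for spatially complex segments)."""
        h, w = frame.shape[:2]
        return cv2.resize(frame, (w // factor, h // factor),
                          interpolation=cv2.INTER_CUBIC)

    def downscale_gauss(frame: np.ndarray, factor: int = 4,
                        sigma: float = 1.5) -> np.ndarray:
        """Gaussian blur followed by decimation (simple, camera-shot
        segments); sigma = 1.5 is an illustrative value."""
        blurred = cv2.GaussianBlur(frame, ksize=(0, 0), sigmaX=sigma)
        return blurred[::factor, ::factor]

    def simulate_debayering(rgb: np.ndarray) -> np.ndarray:
        """Round trip through a Bayer mosaic and Menon (2007)
        demosaicing [1]; expects a float RGB image in [0, 1]."""
        cfa = mosaicing_CFA_Bayer(rgb, pattern="RGGB")
        return demosaicing_CFA_Bayer_Menon2007(cfa, pattern="RGGB")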

Our test video is slightly compressed to simulate a practical use case. Its resolution is 480×270, one quarter of the width and height of the most popular modern video resolution, 1920×1080. One could argue that this resolution is too small for a real use case, but small details (which significantly affect perceived restoration quality) in a 1920×1080 video are comparable in size to medium-sized objects in a 480×270 video. Therefore, using a bigger resolution would be a waste of computational resources.

While preparing our test video, we extracted PNG frames from the MP4 sources, so the YUV→RGB conversion happens only once, and additional YUV⇔RGB conversions will not lead to any precision loss.

All test segments are combined into a single test video file for your convenience. We separate segments with 5 black frames so that upscalers do not misbehave on scene changes; there are no scene changes inside the segments themselves. Additionally, the test video starts and ends with 5 black frames.

Each segment is 48 frames long, but we exclude the first 18 frames of each segment from our subjective rank calculation to avoid penalizing algorithms that use temporal information and need a few frames to “warm up” (for example, for motion estimation).
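Assuming zero-based frame indices and the layout described above (5 black frames, then 48-frame segments each followed by 5 black frames), the scored frame ranges can be computed as in this illustrative sketch:

    BLACK = 5      # black frames before, between, and after segments
    SEG_LEN = 48   # frames per segment
    WARM_UP = 18   # leading frames excluded from the subjective ranking

    def scored_frames(segment_index: int) -> range:
        """Indices, in the combined test video, of the frames of the
        given segment that count toward the subjective rank."""
        start = BLACK + segment_index * (SEG_LEN + BLACK)
        return range(start + WARM_UP, start + SEG_LEN)

    # Only the last 30 frames of each segment are scored; for the
    # first segment these are frames 23..52.
    print(list(scored_frames(0))[:3])  # -> [23, 24, 25]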

To ensure that our test video covers a wide variety of algorithm behavior without redundant clips, we ran the algorithms on a larger set of clips and then clustered the clips by the Pearson correlation of their objective scores. The selected set of clips showed the highest variety of super-resolution algorithm behavior.
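A minimal sketch of this clustering step, with placeholder scores and an arbitrary cluster count (the methodology fixes only Pearson correlation as the similarity measure; average-linkage hierarchical clustering is an illustrative choice here):

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    # scores[i, j]: objective score of algorithm/metric j on clip i
    # (placeholder data for illustration).
    scores = np.random.rand(40, 12)

    # Pearson correlation between clips; 1 - r serves as a distance.
    dist = 1.0 - np.corrcoef(scores)

    # Condensed upper-triangle distances for SciPy's hierarchical clustering.
    condensed = dist[np.triu_indices_from(dist, k=1)]
    labels = fcluster(linkage(condensed, method="average"),
                      t=15, criterion="maxclust")

    # Keep one representative clip per cluster to drop redundant clips.
    representatives = [int(np.where(labels == c)[0][0])
                       for c in np.unique(labels)]
    print(representatives)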

The subjective comparison and metrics

To calculate the subjective ranking of the upscalers, we conducted a crowd-sourced subjective comparison with over 4300 valid participants. In each pair, participants chose the more visually appealing clip; the clips were the upscalers’ results on our test video. Frames from the clips shown to the participants are available in the “Visualizations” sections. The subjective quality values in the tables were computed with the Bradley–Terry model.
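For reference, a minimal sketch of fitting Bradley–Terry scores from pairwise win counts with the standard MM iteration; the win matrix here is placeholder data:

    import numpy as np

    def bradley_terry(wins: np.ndarray, iters: int = 200) -> np.ndarray:
        """Fit Bradley-Terry strengths with the classic MM update.
        wins[i, j] = number of times clip i was preferred over clip j."""
        n = wins.shape[0]
        games = wins + wins.T                 # comparisons per pair
        p = np.ones(n)
        for _ in range(iters):
            totals = p[:, None] + p[None, :]
            np.fill_diagonal(totals, np.inf)  # exclude self-pairs
            p = wins.sum(axis=1) / (games / totals).sum(axis=1)
            p /= p.sum()                      # fix the arbitrary scale
        return p

    # Placeholder data for 4 upscalers; higher score = preferred more often.
    wins = np.array([[0, 7, 9, 8],
                     [3, 0, 6, 7],
                     [1, 4, 0, 5],
                     [2, 3, 5, 0]], dtype=float)
    print(bradley_terry(wins))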

Upscalers often slightly displace object borders. This is not noticeable subjectively but lowers the values of some objective metrics. To compensate, we search for the best metric values over pixel shifts of the upscalers’ results with ¼-pixel precision. We do this for each segment individually, applying a single shift to all of its frames at once. Graphs showing the correlation of these shifted metrics with the subjective scores are presented below. The metrics are calculated on the Y color component.
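A minimal sketch of this shift search, assuming PSNR on the Y channel as the metric and bicubic interpolation for subpixel shifts; the ±2-pixel search range is an illustrative assumption:

    import numpy as np
    import cv2

    def psnr_y(a, b) -> float:
        """PSNR between the Y (luma) planes of two 8-bit BGR frame lists."""
        ya = np.stack([cv2.cvtColor(f, cv2.COLOR_BGR2YCrCb)[..., 0] for f in a])
        yb = np.stack([cv2.cvtColor(f, cv2.COLOR_BGR2YCrCb)[..., 0] for f in b])
        mse = np.mean((ya.astype(np.float64) - yb.astype(np.float64)) ** 2)
        return float("inf") if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)

    def best_shifted_metric(result, reference,
                            max_shift: float = 2.0, step: float = 0.25) -> float:
        """Apply one global subpixel shift to every frame of a segment
        and keep the best metric value over all candidate shifts."""
        h, w = result[0].shape[:2]
        best = -np.inf
        for dx in np.arange(-max_shift, max_shift + step, step):
            for dy in np.arange(-max_shift, max_shift + step, step):
                m = np.float32([[1, 0, dx], [0, 1, dy]])  # translation
                shifted = [cv2.warpAffine(f, m, (w, h),
                                          flags=cv2.INTER_CUBIC)
                           for f in result]
                best = max(best, psnr_y(shifted, reference))
        return best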

References

  1. Menon 2007 demosaicing, as implemented in the colour-demosaicing package: https://pypi.org/project/colour-demosaicing/