The Methodology of the Subjective QRS Benchmark
Video Sources
Building a representative dataset starts with collecting a large pool of videos that meet basic technical requirements (length, license, etc.). We then draw a representative subsample using feature extraction and clustering. Distortions will then be applied to the selected videos, after which the set is ready for subjective comparison.
Requirements for Video Sources
- License: CC-BY or CC-0 only (both permit redistribution).
- Duration: at least 10 seconds.
- Frame rate: at least 10 FPS.
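The three requirements above amount to a simple metadata filter. The sketch below is illustrative only: the `VideoMeta` record and its field names are hypothetical, and the license strings are assumed to be spelled exactly as in the list.

```python
from dataclasses import dataclass

ALLOWED_LICENSES = {"CC-BY", "CC-0"}  # licenses named in the requirements
MIN_DURATION_S = 10.0                 # minimum duration in seconds
MIN_FPS = 10.0                        # minimum frame rate


@dataclass
class VideoMeta:
    # Hypothetical metadata record; field names are illustrative.
    url: str
    license: str
    duration_s: float
    fps: float


def meets_requirements(v: VideoMeta) -> bool:
    """Return True if the video satisfies all three source requirements."""
    return (
        v.license.upper() in ALLOWED_LICENSES
        and v.duration_s >= MIN_DURATION_S
        and v.fps >= MIN_FPS
    )
```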
Selected Video Sources
- Vimeo
- FineVideo
- YouTube
Video Pre-processing
- No more than 2 slices are selected per video from the list of scenes detected by AutoShot.
- Each slice contains the smallest number of scenes whose total length is at least 15 seconds.
- Slices are chosen to be equidistant from each other and from the beginning and end of the video; each final fragment is trimmed to its central 15 seconds.
- Encoding (FFmpeg):
  -vcodec libx264 -pix_fmt yuv420p -crf 0 -preset medium -sn -dn
- Audio (if present):
  -acodec aac -audio_bitrate 256k
- Frames are resized to 1080 pixels on the smaller side while keeping the aspect ratio, so the final resolution is 1920×1080 (landscape) or 1080×1920 (portrait).
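The timing and resizing arithmetic of the pre-processing steps can be sketched as follows. This is a minimal illustration rather than the actual pipeline: the function names are hypothetical, "equidistant" is assumed to mean anchors at 1/3 and 2/3 of the video for two slices, and rounding dimensions to even values is an assumption made because yuv420p requires even frame sizes.

```python
from typing import List, Tuple

FRAGMENT_S = 15.0  # fragment length stated in the text
SHORT_SIDE = 1080  # target size of the smaller side


def slice_anchors(duration_s: float, n_slices: int = 2) -> List[float]:
    """Timestamps equidistant from each other and from both ends of the video."""
    step = duration_s / (n_slices + 1)
    return [step * (i + 1) for i in range(n_slices)]


def center_trim(start_s: float, end_s: float,
                length_s: float = FRAGMENT_S) -> Tuple[float, float]:
    """Keep the central `length_s` seconds of the slice [start_s, end_s]."""
    if end_s - start_s < length_s:
        raise ValueError("slice is shorter than the target fragment length")
    mid = (start_s + end_s) / 2
    return mid - length_s / 2, mid + length_s / 2


def _to_even(value: float) -> int:
    """Round to the nearest even integer (assumed requirement for yuv420p)."""
    return int(round(value / 2)) * 2


def resized_dims(width: int, height: int,
                 short_side: int = SHORT_SIDE) -> Tuple[int, int]:
    """Scale so the smaller side equals `short_side`, keeping the aspect ratio."""
    scale = short_side / min(width, height)
    return _to_even(width * scale), _to_even(height * scale)
```

For example, a 90-second video gets anchors at 30 s and 60 s, and a 3840×2160 source is scaled to 1920×1080.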
Feature Extraction
To select videos for representative subsampling, we extracted the following groups of features.
General IQA/VQA Metrics
- Luminance histogram features: luminance quantiles, dark and bright pixel ratios, noise ratio, and entropy
- Hasler–Suesstrunk
- Std-Luminance
- CLIP-IQA+
- TOPIQ
- LAION Aesthetic
- PaQ-2-PiQ
- StableVQA
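Two of the simpler features in this list can be computed directly from pixel values. The sketch below is an assumption-laden illustration, not the benchmark's code: the BT.709 luma weights, the dark/bright thresholds (16 and 235) and the chosen quantiles are all assumed, and the noise ratio is omitted because its definition is not given here. The colorfulness function follows the published Hasler–Suesstrunk formula.

```python
import numpy as np


def luminance(rgb: np.ndarray) -> np.ndarray:
    """Luma of an RGB frame (H, W, 3) with values in [0, 255]; BT.709 weights assumed."""
    rgb = rgb.astype(np.float64)
    return 0.2126 * rgb[..., 0] + 0.7152 * rgb[..., 1] + 0.0722 * rgb[..., 2]


def luminance_features(rgb: np.ndarray, dark_thr: float = 16.0,
                       bright_thr: float = 235.0) -> dict:
    """Histogram-based luminance statistics; thresholds are assumed values."""
    y = luminance(rgb)
    hist, _ = np.histogram(y, bins=256, range=(0.0, 256.0))
    p = hist / hist.sum()
    nz = p[p > 0]  # skip empty bins so that log2 is defined
    return {
        "q05": float(np.quantile(y, 0.05)),
        "q50": float(np.quantile(y, 0.50)),
        "q95": float(np.quantile(y, 0.95)),
        "dark_ratio": float(np.mean(y < dark_thr)),
        "bright_ratio": float(np.mean(y > bright_thr)),
        "entropy": float(-(nz * np.log2(nz)).sum()),
        "std": float(y.std()),
    }


def colorfulness(rgb: np.ndarray) -> float:
    """Hasler-Suesstrunk colorfulness: sigma_rgyb + 0.3 * mu_rgyb."""
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    rg = r - g                # red-green opponent channel
    yb = 0.5 * (r + g) - b    # yellow-blue opponent channel
    sigma = np.hypot(rg.std(), yb.std())
    mu = np.hypot(rg.mean(), yb.mean())
    return float(sigma + 0.3 * mu)
```

Both functions operate on a single frame; per-video values would be obtained by aggregating over frames.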
VQMT Metrics
- SI and TI
- Blurring
- Blocking
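For reference, spatial information (SI) and temporal information (TI) can be sketched per frame as below. This follows the ITU-T P.910 definitions (standard deviation of the Sobel-filtered frame and of the frame difference) and is an assumption about the general idea; the VQMT implementation may differ in filtering, border handling, and aggregation. P.910 takes the maximum over time, whereas the values here are left per frame so they can be summarized in other ways.

```python
import numpy as np


def sobel_magnitude(y: np.ndarray) -> np.ndarray:
    """Gradient magnitude from 3x3 Sobel kernels, computed on the valid region only."""
    y = y.astype(np.float64)
    # Horizontal gradient: weighted right column minus weighted left column.
    gx = (y[:-2, 2:] + 2 * y[1:-1, 2:] + y[2:, 2:]) \
        - (y[:-2, :-2] + 2 * y[1:-1, :-2] + y[2:, :-2])
    # Vertical gradient: weighted bottom row minus weighted top row.
    gy = (y[2:, :-2] + 2 * y[2:, 1:-1] + y[2:, 2:]) \
        - (y[:-2, :-2] + 2 * y[:-2, 1:-1] + y[:-2, 2:])
    return np.hypot(gx, gy)


def si_per_frame(frames: np.ndarray) -> np.ndarray:
    """SI for each luminance frame in an array of shape (T, H, W)."""
    return np.array([sobel_magnitude(f).std() for f in frames])


def ti_per_frame(frames: np.ndarray) -> np.ndarray:
    """TI for each pair of consecutive frames; returns T - 1 values."""
    frames = np.asarray(frames, dtype=np.float64)
    diffs = frames[1:] - frames[:-1]
    return diffs.reshape(len(diffs), -1).std(axis=1)
```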
Semantic Features and Labeling
- YOLOv12-L
- Places365
- SigLIP-2
- InternVL3
Feature-Based Video Clustering
Clustering Techniques
- Each video is represented by a unified feature vector combining numerical quality metrics, categorical semantic descriptors, and InternVideo embeddings.
- Numerical features are summarized by the mean and standard deviation of frame-level quality metrics.
- Categorical features are constructed from annotations using occurrence or confidence-weighted counts, then log-scaled and ℓ2-normalized: \(x_c \leftarrow \dfrac{\log(1+x_c)}{\left\lVert \log(1+x_c)\right\rVert_2 + \varepsilon}\)
- Embeddings are normalized to unit length: \(x_e \leftarrow \dfrac{x_e}{\left\lVert x_e\right\rVert_2 + \varepsilon}\)
- All components are concatenated and globally ℓ2-normalized to obtain the final representation \(x\).
- Clustering: k-means in FAISS with k = 1000 clusters and 50 training iterations, minimizing squared Euclidean distance in the normalized feature space.
- After clustering, each video is assigned to its nearest centroid; within each cluster, the video closest to the centroid is selected as the representative, yielding a diverse and balanced subset of videos.
- The result is a set of 1000 videos used for method inference and for subjective comparison using a quality rating system (QRS).
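The steps in this list can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the benchmark's implementation: the function names, the value of ε, and the random seed are hypothetical, and the categorical block follows the textual description (log-scale, then ℓ2-normalize). `faiss` is imported lazily so that the normalization and representative-selection helpers run without it.

```python
import numpy as np

EPS = 1e-12  # assumed value of epsilon


def l2_normalize(x, eps: float = EPS) -> np.ndarray:
    """Row-wise l2 normalization with epsilon in the denominator."""
    x = np.asarray(x, dtype=np.float32)
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def build_features(numeric, counts, embeddings) -> np.ndarray:
    """Concatenate numeric, categorical, and embedding blocks; normalize globally.

    Each argument is an (N, d_i) array for N videos.
    """
    cat = l2_normalize(np.log1p(np.asarray(counts, dtype=np.float32)))
    emb = l2_normalize(embeddings)
    num = np.asarray(numeric, dtype=np.float32)
    x = np.concatenate([num, cat, emb], axis=1)
    return np.ascontiguousarray(l2_normalize(x))


def kmeans_faiss(x: np.ndarray, k: int = 1000, niter: int = 50,
                 seed: int = 0) -> np.ndarray:
    """Train k-means with FAISS and return the (k, d) centroids."""
    import faiss  # lazy import: only needed for this step

    km = faiss.Kmeans(x.shape[1], k, niter=niter, seed=seed)
    km.train(x)
    return km.centroids


def pick_representatives(x: np.ndarray, centroids: np.ndarray):
    """Assign each video to its nearest centroid and pick the closest video per cluster."""
    x = np.asarray(x, dtype=np.float64)
    c = np.asarray(centroids, dtype=np.float64)
    # Squared Euclidean distances between all videos and all centroids, shape (N, k).
    d2 = (x ** 2).sum(1)[:, None] - 2.0 * x @ c.T + (c ** 2).sum(1)[None, :]
    labels = d2.argmin(axis=1)
    dist = d2[np.arange(len(x)), labels]
    reps = {}
    for cluster in np.unique(labels):
        members = np.flatnonzero(labels == cluster)
        reps[int(cluster)] = int(members[dist[members].argmin()])
    return labels, reps
```

The dense (N, k) distance matrix keeps the sketch short; for a large video pool, a nearest-neighbor index search would be the more memory-friendly choice. Clusters that receive no videos yield no representative, so fewer than k videos may be returned in that case.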
Figure: Distribution of cluster sizes after k-means clustering.
Technical Specifications
The methods were run on a system with the following specifications:
- CPU: AMD EPYC 7532 (32 cores)
- RAM: 500 GB
- GPU: NVIDIA A100 80 GB