Metrics correlations of the Video Colorization Benchmark
Metrics
PSNR
PSNR (peak signal-to-noise ratio) is a commonly used measure of reconstruction quality for images and video. In our benchmark, we calculate PSNR on the A and B components of the LAB color space.
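As a minimal sketch of PSNR restricted to the chroma planes, assume the frames have already been converted to LAB and that the A and B channels are stored with an 8-bit range (peak value 255). The benchmark's actual scaling is not stated here, so `peak` and the function name `psnr_ab` are illustrative assumptions.

```python
import numpy as np

def psnr_ab(pred_ab, ref_ab, peak=255.0):
    """PSNR over the stacked A and B channels (arrays of shape H x W x 2).

    `peak` is an assumption: 255 fits 8-bit encoded chroma planes.
    """
    pred = np.asarray(pred_ab, dtype=np.float64)
    ref = np.asarray(ref_ab, dtype=np.float64)
    mse = np.mean((pred - ref) ** 2)   # pooled over both chroma channels
    if mse == 0:
        return float("inf")            # identical chroma planes
    return float(10.0 * np.log10(peak ** 2 / mse))
```

Because the L channel is excluded, the score reflects only color differences and is not inflated by the luminance that the colorization input already provides.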
SSIM
SSIM (structural similarity index measure) is a metric based on structural similarity. In our benchmark, we calculate SSIM on the A and B components of the LAB color space.
LPIPS
LPIPS (Learned Perceptual Image Patch Similarity) estimates the perceptual distance between image patches using deep network features. Higher values mean the patches are more different; lower values mean they are more similar.
Color[1]
Color is a no-reference metric proposed to evaluate the colorfulness of an image. It uses statistics calculated on the A and B components of the LAB color space.
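The exact formula used in the benchmark is not given here. The sketch below shows one plausible form based on the CIELab variant described in [1]: the spread of the chroma values plus a weighted distance of their mean from the neutral axis. The function name `colorfulness_ab` and the default weight of 0.37 are assumptions and should be verified against the paper.

```python
import numpy as np

def colorfulness_ab(a, b, mean_weight=0.37):
    """Colorfulness from A/B statistics: sigma_ab + mean_weight * mu_ab.

    mean_weight=0.37 is an assumed default (CIELab variant in [1]).
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    sigma_ab = np.sqrt(a.var() + b.var())   # spread of chroma values
    mu_ab = np.hypot(a.mean(), b.mean())    # distance of mean chroma from gray
    return float(sigma_ab + mean_weight * mu_ab)
```

A fully gray image (A = B = 0 everywhere) scores 0 under this definition, and the score grows as colors become more saturated or more varied.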
Warp Error[2]
Warp Error is a metric that evaluates the temporal stability of a video. One frame is warped onto another using the optical flow between them, and the difference between the warped frame and the target frame is measured. In our benchmark, we calculate Warp Error on the A and B components of the LAB color space.
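The warping step can be sketched as follows, assuming the optical flow is already available as a dense H x W x 2 field of (dx, dy) displacements from the current frame to the next one. This is a simplified illustration: it uses nearest-neighbour sampling and ignores occlusion masks, and the name `warp_error_ab` is hypothetical.

```python
import numpy as np

def warp_error_ab(frame_ab, next_ab, flow):
    """Mean squared chroma difference after warping next_ab back onto frame_ab.

    frame_ab, next_ab: H x W x 2 arrays holding the A and B channels.
    flow: H x W x 2 array, flow[y, x] = (dx, dy) from frame to next (assumed layout).
    """
    frame = np.asarray(frame_ab, dtype=np.float64)
    nxt = np.asarray(next_ab, dtype=np.float64)
    flow = np.asarray(flow, dtype=np.float64)
    h, w = frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # Nearest-neighbour position of each pixel in the next frame.
    tx = np.rint(xs + flow[..., 0]).astype(int)
    ty = np.rint(ys + flow[..., 1]).astype(int)
    # Pixels whose target falls outside the image are skipped.
    valid = (tx >= 0) & (tx < w) & (ty >= 0) & (ty < h)
    if not valid.any():
        return float("nan")
    warped = nxt[ty[valid], tx[valid]]       # shape (N, 2)
    return float(np.mean((warped - frame[valid]) ** 2))
```

If the colors follow the motion perfectly, the warped frame matches the current one and the error is zero; flicker or color drift between frames raises it.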
CDC[2]
CDC (the Color Distribution Consistency index) measures the Jensen–Shannon (JS) divergence of the color distribution between consecutive frames. Unlike the commonly used warping error, CDC is specifically designed for the video colorization task and can better reflect the consistency of color.
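A minimal sketch of the idea, assuming each frame is an H x W x C array with 8-bit channel values: build a per-channel histogram for each frame, compute the JS divergence between the histograms of consecutive frames, and average. The helper names, the bin count, and the averaging scheme are illustrative assumptions; details of the full index in [2] beyond what is described above are not reproduced.

```python
import numpy as np

def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two histograms."""
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(x, y):
        mask = x > 0                      # 0 * log(0) is treated as 0
        return np.sum(x[mask] * np.log(x[mask] / y[mask]))

    return float(0.5 * kl(p, m) + 0.5 * kl(q, m))

def color_consistency(frames, bins=256, value_range=(0, 256)):
    """Average JS divergence between color histograms of consecutive frames."""
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    scores = []
    for prev, cur in zip(frames[:-1], frames[1:]):
        per_channel = []
        for c in range(prev.shape[-1]):
            hp, _ = np.histogram(prev[..., c], bins=bins, range=value_range)
            hc, _ = np.histogram(cur[..., c], bins=bins, range=value_range)
            per_channel.append(js_divergence(hp, hc))
        scores.append(np.mean(per_channel))
    return float(np.mean(scores))
```

With the natural logarithm the divergence lies between 0 (identical color distributions) and ln 2 (distributions with no overlap), so lower values indicate more consistent color over time.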
ID
Fréchet inception distance (FID) is a metric for quantifying the realism and diversity of images predicted by generative adversarial networks (GANs). FID is frequently used in papers on colorization, but we want to point out that it is meant for unpaired datasets, where only the distributions of predicted and real images can be compared. For paired datasets, where every predicted frame has a ground-truth counterpart, it is enough to calculate the Inception Distance (ID) between the paired images.
VCQI
Video Colorization Quality Index (VCQI) is a novel metric developed by us: a linear ridge-regression model trained on our subjective-rating training set (download training set). To better approximate human judgments, we use objective metrics as predictor variables. We exhaustively evaluated all possible metric subsets and selected the combination that maximized correlation with subjective ratings: PSNR, SSIM, Color, and ID.
Because ID exhibited the strongest correlation with human judgments, we further evaluated block-wise variants of ID across the four InceptionV3 blocks. For each block we extract averaged feature vectors for the predicted and ground-truth images, compute the cosine distance between paired vectors, and average the per-frame distances across frames to obtain a clip-level score. Features from Block 2 achieved the highest agreement with human evaluations and outperformed the original ID, indicating that Block 2's intermediate, more abstract representations better capture perceptual differences relevant to video colorization.
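The pooling and distance steps can be sketched as below. Feature extraction from InceptionV3 is left out: the sketch assumes the block activations are already available as arrays, that "averaged" means a spatial mean over a C x H x W feature map, and that the function names are hypothetical.

```python
import numpy as np

def spatial_average(feature_map):
    """Collapse a C x H x W activation map to a C-dimensional vector (assumed layout)."""
    return np.asarray(feature_map, dtype=np.float64).mean(axis=(1, 2))

def clip_feature_distance(pred_feats, gt_feats):
    """Clip-level score: mean cosine distance between paired per-frame vectors.

    pred_feats, gt_feats: arrays of shape (num_frames, dim).
    """
    p = np.asarray(pred_feats, dtype=np.float64)
    g = np.asarray(gt_feats, dtype=np.float64)
    dot = np.sum(p * g, axis=1)
    norms = np.linalg.norm(p, axis=1) * np.linalg.norm(g, axis=1)
    cos_dist = 1.0 - dot / norms          # per-frame cosine distance
    return float(cos_dist.mean())         # average across frames
```

Identical feature vectors give a distance of 0 and orthogonal ones give 1, so lower clip-level scores mean the prediction is closer to the ground truth in that block's feature space.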
To capture non‑linear interactions among predictors, each metric is expanded into a third‑order polynomial feature space, standardized, and then used to fit a ridge regression. The regularization strength is selected by sweeping a logarithmic grid to maximize Pearson correlation on the training set. The final VCQI is therefore a linear model in an enriched feature space whose learned coefficients weight individual metrics and their higher‑order interactions, yielding closer agreement with human judgments than any single objective measure alone.
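The fitting procedure described above can be sketched with NumPy alone (scikit-learn's `PolynomialFeatures`, `StandardScaler`, and `Ridge` would be the usual tools). This is an illustration under assumptions, not the released VCQI implementation: the grid bounds, the function names, and the closed-form solver are all choices made for the example.

```python
import itertools
import numpy as np

def poly_features(X, degree=3):
    """All monomials of the columns of X with total degree 1..degree."""
    X = np.asarray(X, dtype=np.float64)
    cols = []
    for d in range(1, degree + 1):
        for idx in itertools.combinations_with_replacement(range(X.shape[1]), d):
            cols.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(cols)

def fit_ridge_poly(X, y, alphas=np.logspace(-3, 3, 13)):
    """Fit ridge on standardized cubic features; pick alpha by training Pearson r.

    The alpha grid is an assumed example of a logarithmic sweep.
    """
    y = np.asarray(y, dtype=np.float64)
    P = poly_features(X)
    mu, sigma = P.mean(axis=0), P.std(axis=0)
    sigma[sigma == 0] = 1.0               # guard against constant columns
    Z = (P - mu) / sigma
    y_mean = y.mean()
    best_r, best_alpha, best_w = -np.inf, None, None
    for alpha in alphas:
        # Closed-form ridge solution on centred targets.
        w = np.linalg.solve(Z.T @ Z + alpha * np.eye(Z.shape[1]), Z.T @ (y - y_mean))
        r = np.corrcoef(Z @ w + y_mean, y)[0, 1]
        if np.isfinite(r) and r > best_r:
            best_r, best_alpha, best_w = r, alpha, w
    if best_w is None:
        raise ValueError("no alpha produced a finite correlation")

    def predict(X_new):
        Zn = (poly_features(X_new) - mu) / sigma
        return Zn @ best_w + y_mean

    return predict, best_alpha, best_r
```

With four input metrics the cubic expansion yields 34 features (4 linear, 10 quadratic, 20 cubic terms), which is why regularization matters even though the final model is linear in those features.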
Metrics runtime
We measured the runtime of each metric. CPU-compatible metrics (PSNR, SSIM, Color, CDC, Warp Error) were run on an AMD EPYC 7532 32-Core Processor @ 1.50 GHz, and GPU-based metrics (LPIPS, ID) were run on an NVIDIA RTX A6000. For each run we calculated the average time per frame of the video; we performed three runs and took the minimum.
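The timing protocol can be sketched as follows, assuming a metric is exposed as a callable that processes one frame at a time. The names `min_time_per_frame` and `metric_fn` are hypothetical.

```python
import time

def min_time_per_frame(metric_fn, frames, runs=3):
    """Average seconds per frame for each run; return the minimum over runs."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        for frame in frames:
            metric_fn(frame)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed / len(frames))
    return best
```

Taking the minimum rather than the mean reduces the influence of one-off delays such as cache warm-up. For GPU-based metrics, the device would also need to be synchronized before the timer is stopped so that asynchronous kernels are included in the measurement.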
References
[1] Hasler, D., & Suesstrunk, S. E. (2003, June). Measuring colorfulness in natural images. In Human vision and electronic imaging VIII (Vol. 5007, pp. 87-95).
[2] Liu, Y., Zhao, H., Chan, K. C., Wang, X., Loy, C. C., Qiao, Y., & Dong, C. (2024). Temporally consistent video colorization with deep feature propagation and self-regularization learning. Computational Visual Media, 10(2), 375-395.