CrowdSAL: Crowdsourced Video Saliency Prediction
Dataset and Benchmark

Introduction
Video saliency prediction aims to estimate where people look while watching visual content. Reliable saliency maps are valuable for a wide range of multimedia applications, including compression and transmission, quality assessment, retargeting, rendering optimization, and enhancement.
We introduce the CrowdSAL Dataset, the largest publicly available dataset for video saliency prediction, annotated through a carefully designed crowdsourced mouse-tracking pipeline. We validated the collection methodology against established eye-tracking datasets, showing that large-scale crowdsourcing can serve as a practical and effective alternative for saliency annotation.
We provide an official CrowdSAL Benchmark split and report benchmark results for 10 state-of-the-art video saliency prediction models. Everyone is welcome to participate! Run your method on our dataset and send us the results to see it in the “Leaderboard”. Check the “Submitting” section to learn the details.
- 5000 Videos (2.7M+ frames): 886 vertical and 4114 horizontal videos from YouTube, Shorts, and Vimeo
- Reliable Data Collection: 19,585 observers, with 75+ observers per video
- Dataset Quality Standards: Full HD resolution, audio streams, and a CC-BY license
- Open Visual Comparison: model predictions paired with ground-truth (GT) saliency maps
- 10 Open-Source SotA Models: tested on the CrowdSAL video saliency prediction benchmark
- Speed/Quality Comparison: scatter plots and tables with objective metrics for a comprehensive comparison
What’s New
- April 1st, 2026: CrowdSAL release
We use various objective metrics to evaluate video saliency prediction methods; check the “Methodology” section to learn the details. We also report the average FPS (frames per second) and the number of parameters to compare the performance of the algorithms.
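The “Methodology” section names the exact metrics used here; purely as an illustration, two metrics commonly used in saliency benchmarks, Pearson’s correlation coefficient (CC) and similarity (SIM), can be computed as in this sketch (the function names are ours, not part of the benchmark):

```python
import numpy as np

def cc(pred, gt):
    """Pearson's linear correlation coefficient between two saliency maps."""
    p = (pred - pred.mean()) / (pred.std() + 1e-8)
    g = (gt - gt.mean()) / (gt.std() + 1e-8)
    return float((p * g).mean())

def sim(pred, gt):
    """Similarity: sum of per-pixel minima after normalizing each map to sum to 1."""
    p = pred / (pred.sum() + 1e-8)
    g = gt / (gt.sum() + 1e-8)
    return float(np.minimum(p, g).sum())
```

Both metrics compare a predicted map against the GT map per frame; identical maps score 1, and scores are typically averaged over all frames of a video.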
Scroll below for comparison charts, tables, and interactive visual comparisons of saliency model results.
Visualizations
Leaderboards
Charts
Submitting
To add your video saliency prediction method to the benchmark, follow these steps:
1. Download the CrowdSAL dataset (Hugging Face or Google Drive)
2. Apply your video saliency prediction method to all videos from the CrowdSAL test subset (2000 videos)
3. Send an email to and.v.moskalenko@gmail.com with the following information:
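For step 2, the per-video loop might look like this minimal sketch. It assumes a frame-level `predict` callable and a `read_frames` video reader, both hypothetical hooks you would supply; match the benchmark’s actual required output format before submitting.

```python
import os
import numpy as np

def run_on_test_set(predict, read_frames, test_dir, out_dir):
    """Apply a saliency predictor to every video in test_dir, saving one
    .npy saliency map per frame (hypothetical layout, for illustration).
    `read_frames(path)` yields frames; `predict(frame)` returns an HxW
    float map, clipped here to [0, 1]."""
    os.makedirs(out_dir, exist_ok=True)
    for name in sorted(os.listdir(test_dir)):
        # One output directory per video, named after the video file
        video_out = os.path.join(out_dir, os.path.splitext(name)[0])
        os.makedirs(video_out, exist_ok=True)
        for idx, frame in enumerate(read_frames(os.path.join(test_dir, name))):
            sal = np.clip(predict(frame), 0.0, 1.0)
            np.save(os.path.join(video_out, f"{idx:06d}.npy"), sal)
```

Saving one map per frame keeps predictions easy to pair with GT maps for the visual comparison; swap in whatever serialization the submission instructions require.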
If you have any suggestions or questions, please contact us: and.v.moskalenko@gmail.com
Get Notifications About Updates
Do you want to be the first to discover the best new video saliency prediction algorithm? We can notify you about this benchmark’s updates: simply submit your preferred email address using the form below. We promise not to send you unrelated information.
Cite Us
A BibTeX citation will appear here upon conference acceptance.
Supplementary Dataset Information
Check the “Methodology” section to learn how we prepare our dataset.
MSU Benchmark Collection
- Super-Resolution Quality Metrics Benchmark
- Video Colorization Benchmark
- Video Saliency Prediction Benchmark
- LEHA-CVQAD Video Quality Metrics Benchmark
- Learning-Based Image Compression Benchmark
- Super-Resolution for Video Compression Benchmark
- Defenses for Image Quality Metrics Benchmark
- Deinterlacer Benchmark
- Metrics Robustness Benchmark
- Video Upscalers Benchmark
- Video Deblurring Benchmark
- Video Frame Interpolation Benchmark
- HDR Video Reconstruction Benchmark
- No-Reference Video Quality Metrics Benchmark
- Full-Reference Video Quality Metrics Benchmark
- Video Alignment and Retrieval Benchmark
- Mobile Video Codecs Benchmark
- Video Super-Resolution Benchmark
- Shot Boundary Detection Benchmark
- The VideoMatting Project
- Video Completion
- Codecs Comparisons & Optimization
- VQMT
- MSU Datasets Collection
- Metrics Research
- Video Quality Measurement Tool 3D
- Video Filters
- Other Projects