CrowdSAL: Crowdsourced Video Saliency Prediction
Dataset and Benchmark

Introduction
Video saliency prediction aims to estimate where people look while watching visual content. Reliable saliency maps are valuable for a wide range of multimedia applications, including compression and transmission, quality assessment, retargeting, rendering optimization, and enhancement.
We introduce the CrowdSAL Dataset, the largest publicly available dataset for video saliency prediction, annotated through a carefully designed crowdsourced mouse-tracking pipeline. We validated the collection methodology against established eye-tracking datasets, showing that large-scale crowdsourcing can serve as a practical and effective alternative for saliency annotation.
We provide an official CrowdSAL Benchmark split and report benchmark results for 10 state-of-the-art video saliency prediction models. Everyone is welcome to participate! Run your method on our dataset and send us the results to see it in the “Leaderboard”. Check the “Submitting” section to learn the details.
- 5000 Videos (2.7M+ frames): 886 vertical and 4114 horizontal videos from YouTube, Shorts, and Vimeo
- Reliable Data Collection: 19,585 observers, with 75+ observers per video
- Dataset Quality Standards: Full HD resolution, audio streams, and a CC-BY license
- Open Visual Comparison: model predictions paired with ground-truth (GT) saliency maps
- 10 Open-Source SotA Models: tested on the CrowdSAL video saliency prediction benchmark
- Speed/Quality Comparison: scatter plots and tables with objective metrics for a comprehensive comparison
What’s New
- April 1st, 2026: CrowdSAL release
We use various objective metrics to evaluate video saliency prediction methods; check the “Methodology” section to learn the details. We also report the average FPS (frames per second) and the number of parameters to compare the performance of the algorithms.
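The “Methodology” section names the exact metrics used here; purely as an illustration, two metrics commonly used in saliency benchmarks, Pearson’s correlation coefficient (CC) and similarity (SIM), can be computed as in this sketch (the function names are ours, not part of the benchmark):

```python
import numpy as np

def cc(pred, gt):
    """Pearson's linear correlation coefficient between two saliency maps."""
    p = (pred - pred.mean()) / (pred.std() + 1e-8)
    g = (gt - gt.mean()) / (gt.std() + 1e-8)
    return float((p * g).mean())

def sim(pred, gt):
    """Similarity: sum of per-pixel minima after normalizing each map to sum to 1."""
    p = pred / (pred.sum() + 1e-8)
    g = gt / (gt.sum() + 1e-8)
    return float(np.minimum(p, g).sum())
```

Both metrics compare a predicted map against the GT map per frame; identical maps score 1, and scores are typically averaged over all frames of a video.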
Scroll below for comparison charts, tables, and interactive visual comparisons of saliency model results.
Visualizations
Leaderboards
Charts
Submitting
To add your video saliency prediction method to the benchmark, follow these steps:
1. Download the CrowdSAL dataset (Hugging Face or Google Drive)
2. Apply your video saliency prediction method to all videos from the CrowdSAL test subset (2000 videos)
3. Send an email to and.v.moskalenko@gmail.com with the following information:
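For step 2, the per-video loop might look like this minimal sketch. It assumes a frame-level `predict` callable and a `read_frames` video reader, both hypothetical hooks you would supply; match the benchmark’s actual required output format before submitting.

```python
import os
import numpy as np

def run_on_test_set(predict, read_frames, test_dir, out_dir):
    """Apply a saliency predictor to every video in test_dir, saving one
    .npy saliency map per frame (hypothetical layout, for illustration).
    `read_frames(path)` yields frames; `predict(frame)` returns an HxW
    float map, clipped here to [0, 1]."""
    os.makedirs(out_dir, exist_ok=True)
    for name in sorted(os.listdir(test_dir)):
        # One output directory per video, named after the video file
        video_out = os.path.join(out_dir, os.path.splitext(name)[0])
        os.makedirs(video_out, exist_ok=True)
        for idx, frame in enumerate(read_frames(os.path.join(test_dir, name))):
            sal = np.clip(predict(frame), 0.0, 1.0)
            np.save(os.path.join(video_out, f"{idx:06d}.npy"), sal)
```

Saving one map per frame keeps predictions easy to pair with GT maps for the visual comparison; swap in whatever serialization the submission instructions require.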
If you have any suggestions or questions, please contact us: and.v.moskalenko@gmail.com
Get Notifications About Updates
Do you want to be the first to discover the best new video saliency prediction algorithm? We can notify you about this benchmark’s updates: simply submit your preferred email address using the form below. We promise not to send you unrelated information.
Cite Us
A BibTeX citation will appear here upon conference acceptance.
Supplementary Dataset Information
Check the “Methodology” section to learn how we prepare our dataset.
MSU Benchmark Collection
- Super-Resolution Quality Metrics Benchmark
- Video Colorization Benchmark
- Video Saliency Prediction Benchmark
- LEHA-CVQAD Video Quality Metrics Benchmark
- Learning-Based Image Compression Benchmark
- Super-Resolution for Video Compression Benchmark
- Defenses for Image Quality Metrics Benchmark
- Deinterlacer Benchmark
- Metrics Robustness Benchmark
- Video Upscalers Benchmark
- Video Deblurring Benchmark
- Video Frame Interpolation Benchmark
- HDR Video Reconstruction Benchmark
- No-Reference Video Quality Metrics Benchmark
- Full-Reference Video Quality Metrics Benchmark
- Video Alignment and Retrieval Benchmark
- Mobile Video Codecs Benchmark
- Video Super-Resolution Benchmark
- Shot Boundary Detection Benchmark
- The VideoMatting Project
- Video Completion
- Codecs Comparisons & Optimization
- VQMT
- MSU Datasets Collection
- Metrics Research
- Video Quality Measurement Tool 3D
- Video Filters
- Other Projects