Temporal shift estimation for stereoscopic videos


Video synchronization is a fundamental computer-vision task necessary for a wide range of applications. A 3D video involves two streams, which show the scene from different angles simultaneously. It has been shown that desynchronization between the streams causes severe discomfort for viewers of stereo video.

We propose a method for estimating the temporal shift (time difference) between the two streams. The method assumes that the temporal shift and the geometric distortion between the streams are constant throughout each scene. The algorithm outputs a shift value measured in fractions of a frame step (the reciprocal of the frame rate).

Example of a detected shot with temporal shift (Drive Angry)

We treat the task as a regression problem: we construct an equation that describes the spatio-temporal dependency between the motion vectors and the stereo parallax vectors.

The proposed algorithm consists of the following two main stages:

  1. Calculate the stereo parallax and motion vectors using block-based matching for each stereo frame pair;
  2. Estimate model parameters from motion vectors with high confidence using the RANSAC algorithm.
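
The first stage can be illustrated with a minimal block-matching sketch. The source only states that block-based matching is used, so the cost function (sum of absolute differences), the block size, the search radius, and the names `sad` and `match_block` are all assumptions made for illustration. Matching the left frame against the right frame at the same time index would give a parallax vector; matching consecutive frames of one view would give a motion vector.

```python
def sad(a, b, ay, ax, by, bx, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(a[ay + i][ax + j] - b[by + i][bx + j])
        for i in range(size)
        for j in range(size)
    )


def match_block(ref, target, y, x, size=4, radius=3):
    """Exhaustively search offsets within +-radius for the block at (y, x).

    Returns the best (dy, dx) offset and its SAD cost. Frames are plain
    2D lists of grayscale values (an assumption of this sketch).
    """
    h, w = len(target), len(target[0])
    best, best_cost = (0, 0), float("inf")
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ty, tx = y + dy, x + dx
            # Skip candidate blocks that fall outside the target frame.
            if ty < 0 or tx < 0 or ty + size > h or tx + size > w:
                continue
            cost = sad(ref, target, y, x, ty, tx, size)
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best, best_cost
```

A low cost relative to the other candidates could serve as a simple confidence measure for selecting reliable vectors, although the confidence criterion actually used is not specified here.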

Stereoscopic video can employ horizontal disparity by design in order to achieve the stereo effect, but vertical disparity is always the result of spatio-temporal misalignment. The algorithm uses this assumption to restore a temporal shift value from vectors’ vertical components. The detailed algorithm description is published in [1].
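
As a rough illustration of the second stage, the sketch below fits a line to pairs of vertical motion and vertical disparity with a basic RANSAC loop. It assumes a simplified linear model, vertical disparity = k * vertical motion + b, where the slope k plays the role of the temporal shift in frames and b absorbs a constant vertical offset. This model, the function name `ransac_line`, and the parameters `n_iter` and `tol` are assumptions for illustration; the full equation is given in [1].

```python
import random


def ransac_line(xs, ys, n_iter=200, tol=0.05, seed=0):
    """Robustly fit y = k * x + b; returns (k, b, inlier_indices).

    xs: vertical motion components, ys: vertical disparity components
    (hypothetical inputs for this sketch).
    """
    rng = random.Random(seed)
    n = len(xs)
    best_inliers = []
    for _ in range(n_iter):
        i, j = rng.sample(range(n), 2)
        if xs[i] == xs[j]:
            continue  # slope is undefined for this pair
        k = (ys[j] - ys[i]) / (xs[j] - xs[i])
        b = ys[i] - k * xs[i]
        inliers = [t for t in range(n) if abs(ys[t] - (k * xs[t] + b)) <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    if len(best_inliers) < 2:
        raise ValueError("no valid model found")
    # Refine the estimate with ordinary least squares on the inliers.
    m = len(best_inliers)
    mx = sum(xs[t] for t in best_inliers) / m
    my = sum(ys[t] for t in best_inliers) / m
    sxx = sum((xs[t] - mx) ** 2 for t in best_inliers)
    sxy = sum((xs[t] - mx) * (ys[t] - my) for t in best_inliers)
    k = sxy / sxx
    b = my - k * mx
    return k, b, best_inliers
```

Under this assumed model, the returned slope `k` would be read as the estimated shift in fractions of a frame.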

A histogram of the found values. The tangent of the slope angle is the shift value in fractions of a frame


The algorithm was tested on a synthetic dataset that we created. The video set contained 396 stereoscopic scenes at 30 FPS, taken only from converted (2D-to-3D) stereoscopic movies, since these contain no temporal shifts. The frames were subsampled to simulate a temporal shift (e.g., taking only even frames for the left view and only odd frames for the right view results in a shift of 0.5 frames). The final dataset consisted of subsampled views with relative temporal shifts of ±{0.25, 0.5, 1.0, 2.0} frames.
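
The subsampling idea can be sketched as follows. The function name `simulate_shift` and its parameters `step` and `offset` are hypothetical; the sketch only reproduces the arithmetic described above, where the shift in subsampled frame steps equals the offset in source frames divided by the subsampling step.

```python
def simulate_shift(frames, step, offset):
    """Subsample one frame sequence into two views with a known shift.

    Each view keeps every `step`-th source frame. A positive `offset`
    delays the start of the right view by that many source frames, a
    negative one delays the left view. Returns (left, right, shift),
    where shift is measured in subsampled frame steps.
    """
    left_start = max(-offset, 0)
    right_start = max(offset, 0)
    left = frames[left_start::step]
    right = frames[right_start::step]
    n = min(len(left), len(right))  # keep both views the same length
    return left[:n], right[:n], offset / step


# Even frames for the left view, odd frames for the right view.
left, right, shift = simulate_shift(list(range(20)), step=2, offset=1)
```

With `step=4` and `offset=1` the same function would give a shift of 0.25 frames, and `offset` equal to `step` or `2 * step` would give whole-frame shifts of 1.0 and 2.0.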

The comparison of the current algorithm with the previous work shows a significant gain. The error was calculated as the absolute difference between the target and estimated shift values, in frame steps. Evaluation was treated as a classification problem: whether the error falls below a threshold value. In our experiments, the smallest noticeable time shift was estimated to be 0.10 frames, so this value was used as the error threshold for comparing algorithms.
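
The evaluation metric described above can be written down in a few lines. This is a minimal sketch: the function name `accuracy_at_threshold` and the sample values are made up for illustration and are not results from the experiments.

```python
def accuracy_at_threshold(targets, estimates, threshold=0.10):
    """Share of scenes whose absolute shift error is below the threshold.

    targets and estimates are shift values in frame steps.
    """
    errors = [abs(t - e) for t, e in zip(targets, estimates)]
    hits = sum(1 for err in errors if err < threshold)
    return hits / len(errors)


# Illustrative values only: two of the four errors are below 0.10 frames.
targets = [0.5, 0.25, -1.0, 2.0]
estimates = [0.52, 0.40, -0.95, 2.3]
score = accuracy_at_threshold(targets, estimates)
```

Sweeping `threshold` over a range of values would produce a curve of accuracy against error threshold, similar in spirit to the left panel of the comparison figure.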

Comparison of the proposed algorithm with our previous work [1]. Left: the relation between the temporal shift accuracy and the estimation error threshold. Right: the exact scores for an error threshold of 0.10 frames.

Additionally, we processed 60 full-length stereoscopic movies and found 198 scenes with a temporal shift of at least 0.10 frames. Further examples can be found in our VQMT3D reports 8 and 9.

Histogram of revealed scenes with temporal shift



1. Ploshkin, A., and Vatolin, D., “Accurate method of temporal-shift estimation for 3D video,” 2018-3DTV-Conference: 3D at any scale and any perspective (3DTV-CON), 2018. doi:10.1109/3DTV.2018.8478431

05 May 2020