Automatic detection and analysis of techniques for 2D to 3D video conversion

Author: Polina Pereverzeva
Supervisor: Dr. Dmitriy Vatolin

Introduction

One of the most common ways to create a 3D movie is conversion from 2D: a process in which two separate views, one for each eye, are created from a single source image.

The most widely used method for 2D to 3D conversion is Depth Image-Based Rendering (DIBR), which warps the source video according to a depth map. Pixels of the original image are shifted horizontally by an amount that depends on the corresponding depth value. The shift exposes unfilled areas in occlusions, the parts of the scene that are not visible in the original frame. Filling such areas is a difficult task that has not been completely solved yet.

Incorrect filling in occlusions
Valerian and the City of a Thousand Planets

Opening areas
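
The horizontal shift described above, and the holes it leaves behind, can be illustrated with a minimal sketch. It is not the code used by any conversion studio or by the authors. The names `warp_row`, `warp_image`, `hole_mask` and `max_shift` are hypothetical, depth is assumed to be normalized to [0, 1] with 1 meaning nearest to the camera, and the shift is assumed to be linear in depth.

```python
from typing import List, Optional, Tuple

Row = List[Optional[int]]


def warp_row(pixels: List[int], depth: List[float],
             max_shift: int) -> Tuple[Row, List[float]]:
    """Forward-warp one image row (assumption: depth in [0, 1], 1 = nearest)."""
    width = len(pixels)
    out: Row = [None] * width       # None marks an unfilled (hole) pixel
    out_depth = [-1.0] * width      # z-buffer: the nearest source pixel wins
    for x in range(width):
        # Assumed linear model: nearer pixels are shifted further.
        target = x + round(depth[x] * max_shift)
        if 0 <= target < width and depth[x] > out_depth[target]:
            out[target] = pixels[x]
            out_depth[target] = depth[x]
    return out, out_depth


def warp_image(image: List[List[int]], depth_map: List[List[float]],
               max_shift: int) -> List[Row]:
    """Apply the row-wise warp to every row of a grayscale image."""
    return [warp_row(p, d, max_shift)[0] for p, d in zip(image, depth_map)]


def hole_mask(warped: List[Row]) -> List[List[bool]]:
    """True where no source pixel landed, i.e. where filling is needed."""
    return [[value is None for value in row] for row in warped]


# A foreground object (depth 1) in the middle of a flat background (depth 0).
row, _ = warp_row([10, 20, 30, 40, 50, 60], [0, 0, 1, 1, 0, 0], max_shift=2)
print(row)  # [10, 20, None, None, 30, 40]
```

In this toy row the foreground pixels move two positions to the right and cover the background there, while the two positions they vacated stay empty. Those empty positions are the areas that the conversion has to fill.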

Besides filling the occlusions, the following conversion techniques are used:

Enlarging the foreground object

Enlarged object example
Spider-Man: Homecoming

Stretching the background beyond the borders of the foreground objects

Warped background example
Ant-Man

Removing plot-insignificant objects

Deleted object example
The Legend of Tarzan

The proposed method detects which conversion technique was used and estimates how much the final frame differs from the source one.

Proposed method

Algorithm scheme

Experiments

To verify the correctness of the classifiers, a test dataset of 4 classes (1000 examples per class) was compiled from 35 full-length converted stereoscopic movies:

Frames containing only removed objects;
Frames containing only enlarged objects;
Frames containing only a deformed background;
Frames where none of the considered conversion techniques are present.

Analysis of the deformed areas boundaries
Alice Through the Looking Glass

Panels: left view, final map, marked borders.

Blue indicates a border with a positive depth change, green and red indicate a border without any depth change, and purple indicates a border with a negative depth change.

The results of evaluating the proposed algorithms on the test dataset are presented in the following graph. The classification accuracy was at least 90%.

Graph
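
For readers who want to run a similar evaluation, here is a minimal sketch of how a per-class accuracy figure can be computed from ground-truth and predicted labels. The page does not say how its accuracy was defined, so this sketch assumes it is the fraction of frames of each class that were labeled correctly. The function name `per_class_accuracy`, the label strings and the toy data are all hypothetical and are not results from the experiment.

```python
from collections import Counter
from typing import Dict, Sequence


def per_class_accuracy(true_labels: Sequence[str],
                       predicted_labels: Sequence[str]) -> Dict[str, float]:
    """Fraction of examples of each true class that were predicted correctly."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    total = Counter(true_labels)
    correct = Counter(
        t for t, p in zip(true_labels, predicted_labels) if t == p
    )
    return {label: correct[label] / total[label] for label in total}


# Toy, made-up labels for the four frame classes listed above.
truth = ["removed", "removed", "enlarged", "deformed", "none"]
guess = ["removed", "none", "enlarged", "deformed", "none"]
print(per_class_accuracy(truth, guess))
```

Reporting the score per class rather than as one overall number shows whether any single conversion technique is recognized noticeably worse than the others.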

Results

The average runtime of the proposed method on video sequences with a resolution of 960 × 540 is approximately 1 second on a computer with a 3.20 GHz Intel Core i5 processor and 8 GB of RAM.

05 May 2020

Video processing, compression and quality research group, based in MSU Graphics & Media Laboratory
