Automatic detection and analysis of techniques for 2D to 3D video conversion
- Author: Polina Pereverzeva
- Supervisor: dr. Dmitry Vatolin
One of the most common methods of 3D movies creation is conversion from 2D. It is a process, where two separate views for each eye are created from one source image.
The most widely used method for 2D to 3D conversion is warping of a source video according to a depth map — Depth Image-Based Rendering (DIBR). Original image pixels are shifted horizontally depending on the corresponding depth value. But at the same time, unfilled areas appear in occlusions — parts of the image invisible in the original frame. Filling such areas is a difficult task that has not been completely resolved yet.
Incorrect filling in occlusions
Valerian and the City of a Thousand Planets
In addition to filling the occlusions, the following conversion methods exist:
- Enlarging the foreground object
Enlarged object example
- Stretching the background beyond the borders of the foreground objects
Warped background example
- Removing plot-insignificant objects
Deleted object example
The Legend of Tarzan
This method allows to detect the conversion method and to what extent the final frame differs from the source one.
To verify the correctness of the classifiers, a test dataset of 35 full-length converted stereoscopic movies containing 4 classes was compiled (1000 examples per class):
- Frames containing only removed objects;
- Frames containing only enlarged objects;
- Frames containing only a deformed background;
- Frames where none of the considered conversion techniques are present.
Analysis of the deformed areas boundaries
Alice Through the Looking Glass
Blue indicates the border with a positive depth change, green and red — the border without any depth changes, purple — the border with a negative depth change.
The evaluation of proposed algorithms on the test dataset are presented on the following graph. The classification accuracy was at least 90%.
The average runtime of the proposed method for processing video sequences with a resolution of 960 × 540 is approximately 1 second on a computer with the following characteristics: 3.20 GHz Intel Core i5, 8 GB RAM.
- Codecs Comparison & Optimization
- Video Filters
- Video Quality Measurement Tool 3D
Semiautomatic Visual-Attention Modeling
MSU Benchmark Collection