Second annual MPEG4-AVC/H.264 codecs comparison
Frequently Asked Questions
Q: How did you choose metric set for your comparison?
A: We used metrics, which are implemented in MSU Quality Measurement Tool. In that tool we implemented objective comparison metrics that are most commonly used.
The main metric in our comparison is PSNR, because it is used in most objective comparisons, so our results will be understandable for everybody. We will increase the number of metrics in next comparison in spite of increasing measurements time.
Q: Did you try to find mistakes in your comparison?
A: Of course we did. Before the comparison start we found reviewers for our comparison. They got draft of our report the month before public release. In exchange they send to us list of comments and mistakes in our comparison. Such exchange significantly decreased number of mistakes in our report.
Q: How did you verify objective measurements?
A: We used different ways:
- First, we have published our measurement tool. Lots of people download it, use it, and some times send us bug reports (as a rule, bugs are in work with file formats, B-frames in AVI, etc). So, reliability of our tool now is much better than in previous comparisons.
- Second, after completion of all measurements, we provided original sequences and full results to codec developers (only results of the developer’s codec and one of the freeware reference codecs for each developer). Developers are really interested in good results for their codec and they can verify some strange results.
These are our methods to increase our results reliability.
Q: You mainly test the encoders, why don’t you call your comparison “encoder comparison”?
A: We use developer’s decoder if it is provided to us with encoder. It means that developers can increase their results using decoder optimization, postfiltering, etc.
In next comparison we are going to make additional decoder compatibility tests.
Q: What computers have you used for measurements?
A: You can find information about our computers’ configurations on this page, or on the 7-th page of PDF document.
Q: Why did you use deinterlaced sequences?
A: It is our common policy. We chose our sources similar to sequences, which ordinary users use. It is very difficult for ordinary user to get progressive sequence nowadays. As a rule, users get sequences to compress from DVDs, satellite receivers, DV-cameras, etc. They capture that sequences in real time using popular simple embedded deinterlacing methods. Such methods along with compressing artifacts decrease encoding performance on those sequences. We think that popular codecs should take into consideration such features of sequences with the help of prefiltering and advanced motion compensation.
Q: Is it possible for developers to use information about your testing set to make better their results in your comparison?
A: Theoretically it is possible. That is why we each time replace two or three sequences with absolutely new ones, publishing them only after finish of all measurements. So, user can draw more attention on that sequences (we also track differences in results with big interest).
Q: You have used ATI graphics accelerator on your computers and ATI codec is fastest in your comparison. Don’t you think it is strange?
A: We used the same computers as in previous comparison last year. No one knew about ATI codec at that moment.
Anyway, we also were very interested, if ATI codec had used any hardware acceleration. However, such acceleration does not conflict with testing rules (we compare codecs for ordinary PC machines).
We made additional tests at another computer with following configuration:
- Processor: Pentium 4, 3.0 GHz with Hyper Threading
- Operation system: Windows XP Pro, SP2
- Memory: 1Gb
- Video accelerator: nVidia GeForce 6600 GT
- Hard disk: SATA 200Gb
Measurement results (“Foreman” sequence):
Q: I think, you did not attempted visual comparison; there are only few frame pictures in your comparison!
A: We apply much attention to correct visual comparison. :)
- First, by choosing frames anyone can show that any codec is better than any other! It is because the quality of different frames in decoded sequence changes significantly. You can read about such situation reasons in Introduction to Video Codecs Comparison. But people are still asking such questions; that is why we introduced per-frame metrics in our comparison to show differences in frames quality.
- Second, we developed special software MSU Perceptual Video Quality, which is created to conduct subjective blind (when experts don’t know what encoder was used for current sequence) assessments using different testing methods from Butterfly-test to ITU-T BT.500-11 standard recommendations. This program is the only freeware tool for such assessments.
- Third, we are going to make number of visual subjective comparisons. First such comparison will be released soon.
Q: Why didn’t you add codec X in your comparison?
A: For each codec we used presets, provided us by developers. So, only codecs, for which we were able to communicate with developers, were added to our comparison.
Q: Why did you measure High Profile separately?
A: This year, similar to previous comparison, we tested only Main Profile. More over, when we announced “Call for codecs”, only one codec, according to our knowledge, could support High Profile good enough. However, when developers provided to us new version of their codecs, we discovered that at least three companies implemented High Profile.
Next year we are going to remove profile restrictions, but we will publish codec presets including used profile.
Q: Why did you use such strange rules for your informal comparison?
A: People already have sent to us a lot of suggestion to combine all results into one table. Actually, in the beginning we didn’t want to make such table at all, because when one codec is best at low bitrates, another one could be better at high bitrates, third one at film material and fourth one at video conferencing. So, we can average out all results, but…
Nevertheless, we are going to improve informal comparison rues, so, you your suggestions are welcome.
Q: Why do codecs versions in your comparison are not the latest ones?
A: The thing is that comparison measurements take rather long time. But we didn’t renew our codecs versions during the measurements (except critical bugs in codecs) to ensure fair conditions for all developers. So, some developers could release new version of their codec before comparison release.
In the future we are going to speed up our measurements using more productive work with developers and improving our measurement methods.
Q: Why didn’t you use this type of diagrams?
A: We permanently increase both number of diagram types and number of graphs. There are seven diagram types in our last comparison. In future we are going to add some new diagram types and replace some of existing ones with improved versions. We have already started work in that direction.
Q: Why do your measurements take so much time?
A: There are number of factors, which increase measurement time:
- People often ask us to increase number of sequences to analyze codec results in new application areas. Only pure measurements time is now more than 11 days. We are going to use additional computers, but increase of sequence number will increase time of human work in any case. Currently, all our comparisons are free, so we are trying to limit human work time.
- Another tendency is to increase number of metrics. We know about drawbacks of PSNR and will add new metrics, trying to limit their number in reasonable boundaries.
- Number of codecs is also increasing. We are interested in number of codecs increase, but at the same time it increases time to report preparation.
- If we find obvious mistake in codec, we, as a rule, send a bug report to developer. Some times developers can fix that bugs and send us new version. That is correct approach in developer’s point of view, but it also increases number of measurements in our comparison.
- Report verification takes lots of time, because we are trying to increase report quality as much as possible.
So, if we decrease number of measured metrics and sequences in set, prohibit developers to fix bugs and remove report verification, we will speed up our comparison in few times.
Q: When will you make new comparison?
A: In September 2006, if we will not make two comparisons in year. :)
MSU Benchmark Collection
- MSU Video Upscalers Benchmark 2022
- MSU HDR Video Reconstruction Benchmark 2022
- MSU Super-Resolution for Video Compression Benchmark 2022
- MSU Video Quality Metrics Benchmark 2022
- MSU Video Alignment and Retrieval Benchmark
- MSU Mobile Video Codecs Benchmark 2021
- MSU Video Super-Resolution Benchmark
- MSU Shot Boundary Detection Benchmark 2020
- MSU Deinterlacer Benchmark
- The VideoMatting Project
- Video Completion
- Codecs Comparisons & Optimization
- Video Quality Measurement Tool 3D
MSU Datasets Collection
- Video Filters