Improvements of Mobile Real-time EVC Decoder and Player
By Olga Krovyakova - May 26, 2021
Abstract
This article describes the improvements over Stage 1 of the Real-time Mobile Video Player Project from October 2020, which implemented an early version of the player capable of real-time playback of EVC bitstreams. The Player is based on the ETM reference software version 6.1 and is optimized for the ARM architecture. It was developed and tested on a Huawei P40 Pro smartphone and demonstrates real-time playback at 24 FPS on 1080p test files with a subset of EVC Main profile tools.
1. Results
To summarize, the first stage presented a solution that allows real-time playback of the encoded sequences with the following restrictions:
- Quantization parameter value: 32;
- Coding tools set: Affine, DMVR, HTDF, ADMVP, ADCC, AMVR, ATS, IQT, CM_INIT, ADDB, DBF, HMVP, MMVD, POCS, RPL (Stage1 set).
The presented EVC Player demonstrates 25.07 FPS for the “ParkScene” sequence and 26.02 FPS for the “Kimono1” sequence.
The results of the optimizations made during Stage 2 show that the BTT tool can be safely added to the tool set and the QP can be reduced to 27, which increases the image quality. The resulting conditions are:
- Quantization parameter value: 27;
- Coding tools set: Affine, DMVR, HTDF, ADMVP, ADCC, AMVR, ATS, IQT, CM_INIT, ADDB, DBF, HMVP, MMVD, POCS, RPL, BTT (further referred to as the Stage2 set).
Table 1. Average maximal playback speed achieved with the selected toolset on the Test Mobile Device for QP=27
Test sequence | Decoding speed of the Stage 1 decoder (fps) | Decoding speed of the Stage 2 decoder (fps) | Performance gain over the Stage 1 decoder (%)
ParkScene | 21.36 | 32.99 | 53.2
Kimono1 | 22.85 | 35.02 | 54.4
As a result of the optimizations, the decoder shows a 53-54% performance gain in the EVC player application designed for the Huawei P40 Pro smartphone.
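The gain column in Table 1 follows from the two speed columns. A minimal sketch of that arithmetic in C is given below. The function name `gain_percent` is hypothetical, and because the table lists rounded fps values, a recomputed percentage may differ slightly from the published one.

```c
#include <assert.h>

/* Relative speed-up of the Stage 2 decoder over the Stage 1 decoder,
 * in percent. Both arguments are decoding speeds in frames per second. */
double gain_percent(double fps_stage1, double fps_stage2)
{
    assert(fps_stage1 > 0.0);
    return (fps_stage2 / fps_stage1 - 1.0) * 100.0;
}
```

Fed with the fps pairs from Table 1, the function returns values in the 53-55% range for both sequences.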
Table 2. Selected toolset for the real-time player improvement
Tool short name | Tool full name | ETM6.1 CTC RA default configuration | Selected toolset, stage 1 | Selected toolset, stage 2
ADMVP | Advanced Motion Vector Prediction | 1 | 1 | 1
AFFINE | Affine Prediction | 1 | 1 | 1
HTDF | Hadamard Transform Domain Filter | 1 | 1 | 1
DMVR | Decoder-side Motion Vector Refinement | 1 | 1 | 1
ADCC | Advanced Coefficient Coding | 1 | 1 | 1
ADDB | Advanced Deblocking | 1 | 1 | 1
ALF | Adaptive Loop Filter | 1 | 0 | 0
AMVR | Adaptive Motion Vector Resolution | 1 | 1 | 1
ATS | Adaptive Transform Selection | 1 | 1 | 1
BTT | Binary and Ternary Trees | 1 | 0 | 1
CM_INIT | Context Modeling Initialization | 1 | 1 | 1
DBF | Deblocking Filter | 1 | 1 | 1
EIPD | Enhanced Intra Prediction Directions | 1 | 0 | 0
HMVP | History-based Motion Vector Prediction | 1 | 1 | 1
IQT | Advanced Quantization and Transforms | 1 | 1 | 1
MMVD | Merge with Motion Vector Difference | 1 | 1 | 1
POCS | Advanced Picture Order Count | 1 | 1 | 1
RPL | Reference Picture List | 1 | 1 | 1
SUCO | Split Unit Coding Order | 1 | 0 | 0
IBC | Intra Block Copy | 0 | 0 | 0
2. Supported EVC tools
The BTT tool is considered to have the greatest potential, so we performed tests with the BTT tool enabled in addition to the Stage1 set (Table 2).
Figures 1-3 show the EVC encoding performance, decoding performance and PSNR values for the ParkScene and Kimono1 files encoded with QP values 27-32, with and without the BTT64 tool. For convenience, each figure combines the values measured for streams without BTT64 and with BTT64.
Figure 1. ETM 6.1 encoding performance on a PC with an i7-9700 CPU running Windows 10 x64, with and without the BTT64 tool enabled
Figure 2. Stage 1 decoder performance on HUAWEI P40 Pro running EMUI 10.1, with and without the BTT64 tool enabled
Figure 3. PSNR values of files encoded by the ETM 6.1 encoder, with and without the BTT64 tool enabled
Figures 2 and 3 show that enabling BTT64 does not significantly reduce decoder performance and also improves the quality of the output file, so it can safely be enabled as an extra EVC tool.
To stay with the decoding concept proposed in Stage 1, BTT is enabled with a 64x64 CU size (referred to as BTT64 for convenience).
To achieve real-time playback speed on the target device, it is important to have profiling information for the EVC tools during decoding. This information was collected with the Android Profiler while decoding the test bitstreams with the Stage 1 decoder. Figure 4 and Figure 5 show the profiling data obtained on the target device for the ParkScene and Kimono1 bitstreams encoded with the Stage 2 toolset and a QP value of 27.
Figure 4. Decoder profiling for the ParkScene bitstream encoded with the Stage 2 toolset (QP=27)
Figure 5. Decoder profiling for the Kimono1 bitstream encoded with the Stage 2 toolset (QP=27)
The most time-consuming functions are related to DMVR, Motion Compensation (MC), Deblocking and Reconstruction. These tools consume about 77% of the decoding time and are the best candidates for SIMD optimization.
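The 77% share gives a rough upper bound on what optimizing these functions alone can deliver. A minimal sketch using Amdahl's law is shown below. The function name `overall_speedup` and the speed-up factors passed to it are hypothetical, and the estimate ignores multithreading.

```c
#include <assert.h>

/* Amdahl's law: overall speed-up when a fraction p of the run time
 * is accelerated by a factor s and the remaining (1 - p) is unchanged. */
double overall_speedup(double p, double s)
{
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}
```

With p = 0.77, a hypothetical 2x speed-up of the hot functions yields about 1.6x overall and a 4x speed-up about 2.4x. Even an unlimited speed-up of those functions caps the overall gain at 1 / 0.23, about 4.3x, which is one reason to combine SIMD with parallel processing.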
3. Implementation details and playback
To achieve real-time playback with the selected toolset, the following main modifications were made on top of the ETM 6.1 reference software:
- ARM SIMD implementation of the most critical functions in the Deblocking and Reconstruction parts
- Wavefront-like parallel processing (WPP) for the deblocking and decoding processes, with a thread pool and line-based startup
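To illustrate the kind of ARM SIMD work listed above, the sketch below reconstructs one row of samples (prediction plus residual, clipped to the valid range) with NEON intrinsics. It is not the Player's actual code: the function name `recon_row`, the 16-bit sample layout and the scalar fallback for non-ARM builds are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* rec[i] = clip(pred[i] + resi[i], 0, max_val) for n samples.
 * max_val is the largest sample value, e.g. 1023 for 10-bit video. */
void recon_row(const int16_t *pred, const int16_t *resi, int16_t *rec,
               size_t n, int16_t max_val)
{
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    const int16x8_t vzero = vdupq_n_s16(0);
    const int16x8_t vmax = vdupq_n_s16(max_val);
    /* Process 8 samples per iteration. */
    for (; i + 8 <= n; i += 8) {
        int16x8_t p = vld1q_s16(pred + i);
        int16x8_t r = vld1q_s16(resi + i);
        int16x8_t s = vqaddq_s16(p, r);   /* saturating 16-bit add */
        s = vmaxq_s16(s, vzero);          /* clip below at 0 */
        s = vminq_s16(s, vmax);           /* clip above at max_val */
        vst1q_s16(rec + i, s);
    }
#endif
    /* Scalar tail on ARM, and the whole row on other targets. */
    for (; i < n; i++) {
        int32_t s = (int32_t)pred[i] + (int32_t)resi[i];
        if (s < 0) s = 0;
        if (s > max_val) s = max_val;
        rec[i] = (int16_t)s;
    }
}
```

The NEON path handles eight samples per instruction, so the add and both clips cost three vector operations per eight pixels instead of several scalar operations per pixel.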
In preparation for the multithreading optimizations, some additional refactoring and optimization was done in the functions related to deblocking. After the multithreaded implementation was finalized, the decoder's FPS gain over the Stage 1 decoder is 40% for the Kimono1 and 42% for the ParkScene sequences, and the Player is capable of 24 fps playback on the device.
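The row dependencies behind WPP-style multithreading can be sketched with POSIX threads as follows. This is a simplified illustration, not the Player's implementation: it starts one thread per CTU row instead of using a thread pool, the picture size is a toy value, and the two-CTU lead of the upper row (`LAG`) is an assumption. All names are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

#define ROWS 4   /* CTU rows in the toy picture (hypothetical size) */
#define COLS 6   /* CTUs per row */
#define LAG  2   /* assumed lead of the row above, in CTUs */

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int done[ROWS];       /* CTUs finished so far in each row */
    int next_stamp;       /* global completion counter */
    int (*stamp)[COLS];   /* completion order of every CTU */
} wpp_ctx;

typedef struct { wpp_ctx *ctx; int row; } row_job;

static void *process_row(void *arg)
{
    row_job *job = (row_job *)arg;
    wpp_ctx *c = job->ctx;
    int r = job->row;

    for (int x = 0; x < COLS; x++) {
        if (r > 0) {
            /* Wait until the row above is LAG CTUs ahead (or finished). */
            int need = (x + LAG < COLS) ? x + LAG : COLS;
            pthread_mutex_lock(&c->lock);
            while (c->done[r - 1] < need)
                pthread_cond_wait(&c->cond, &c->lock);
            pthread_mutex_unlock(&c->lock);
        }

        /* ... decode or deblock CTU (r, x) here ... */

        pthread_mutex_lock(&c->lock);
        c->stamp[r][x] = c->next_stamp++;
        c->done[r] = x + 1;
        pthread_cond_broadcast(&c->cond);   /* wake rows waiting on us */
        pthread_mutex_unlock(&c->lock);
    }
    return NULL;
}

/* Runs one wavefront pass and records the completion order of each CTU.
 * Returns 0 on success, -1 if a thread could not be started. */
int wpp_run(int stamp[ROWS][COLS])
{
    wpp_ctx c = { .next_stamp = 0, .stamp = stamp };
    pthread_t th[ROWS];
    row_job jobs[ROWS];
    int started = 0;

    pthread_mutex_init(&c.lock, NULL);
    pthread_cond_init(&c.cond, NULL);
    for (int r = 0; r < ROWS; r++) {
        jobs[r].ctx = &c;
        jobs[r].row = r;
        if (pthread_create(&th[r], NULL, process_row, &jobs[r]) != 0)
            break;
        started++;
    }
    for (int r = 0; r < started; r++)
        pthread_join(th[r], NULL);
    pthread_cond_destroy(&c.cond);
    pthread_mutex_destroy(&c.lock);
    return started == ROWS ? 0 : -1;
}
```

Each row starts as soon as the row above is far enough ahead, so the active CTUs form a diagonal front across the picture. A production decoder would typically hand rows to a fixed pool of worker threads sized to the number of fast cores rather than create one thread per row.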
To check the playback speed objectively and subjectively, the Player was deployed and tested on a Huawei P40 Pro smartphone.
Figure 6 shows the Player during Kimono bitstream playback on the device.
Figure 6. Picture of the Player working on Huawei P40 Pro
4. Profiling of the optimized decoder
Figure 7. Optimized decoder profiling for the ParkScene bitstream encoded with the Stage 2 toolset (QP=27)
Figure 8. Optimized decoder profiling for the Kimono1 bitstream encoded with the Stage 2 toolset (QP=27)
5. CPU Utilization
The decoder is optimized for the Huawei P40 Pro mobile phone in terms of CPU core usage: the software uses only the 4 most powerful Kirin 990 CPU cores (hi-end and mid-end). Figure 9 and Figure 10 display the CPU utilization percentage for each of these 4 cores during the decoding process.
Figure 9. Hi-end and mid-end CPU cores utilization. Kimono playback, (a) Stage 1, (b) Stage 2
Figure 10. Hi-end and mid-end CPU cores utilization. ParkScene playback, (a) Stage 1, (b) Stage 2
Figures 9-10 show that hi- and mid-end cores are used more effectively by the Stage 2 optimized decoder.
6. Power Consumption
To estimate power consumption, an infinite playback loop of the test bitstreams was launched in the Player on a fully charged Test Mobile Device and ran until the device switched off because the battery was depleted. As a result, the Player ran for 4h 25m with the Kimono1 file and 4h 07m with the ParkScene file at 24 fps. Figures 11 and 12 summarize the results.
Figure 11. Power consumption and playback speed during infinite playback of Kimono, (a) Stage 1, (b) Stage 2
Figure 12. Power consumption and playback speed during infinite playback of ParkScene, (a) Stage 1, (b) Stage 2
Figures 11-12 show that battery consumption in Stage 2 has slightly increased compared with Stage 1.
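The measured run times can be turned into a rough average current draw for the whole device. The sketch below takes the battery capacity as a parameter supplied by the caller. The 4200 mAh used in the note after the code is an assumed nominal capacity for the test device, not a value from the measurements, and `avg_current_ma` is a hypothetical helper.

```c
#include <assert.h>

/* Average current draw in mA when a battery of capacity_mah is drained
 * completely in the given run time (hours and minutes). */
double avg_current_ma(double capacity_mah, int hours, int minutes)
{
    double t = hours + minutes / 60.0;   /* run time in hours */
    assert(t > 0.0);
    return capacity_mah / t;
}
```

Under the assumed 4200 mAh, `avg_current_ma(4200.0, 4, 25)` gives roughly 951 mA for Kimono1 and `avg_current_ma(4200.0, 4, 7)` roughly 1020 mA for ParkScene. These figures cover the display and everything else running on the phone, not the decoder alone.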
7. Conclusion
Solveig Multimedia continued the optimizations related to the WPP algorithm introduced in Stage 1, as well as the SIMD optimizations of the functions.
As a result of the optimizations, the decoder shows a 53-54% performance gain in the EVC player application designed for the Huawei P40 Pro smartphone.
The optimized decoder shows 33-35 FPS for the proposed toolset, in which the BTT tool is additionally included, so all objectives of Stage 2 are exceeded. This means that the QP value can be decreased further, to 23. For QP=23 the decoder shows 31.52 FPS for the Kimono1 file and 28.37 FPS for the ParkScene file, which is enough to provide real-time 24 FPS playback for both files.
- https://www.solveigmm.com/en/howto/early-implementation-of-mobile-real-time-evc-player-mpeg-submission-october-2020/
Olga Krovyakova has been the Technical Support Manager at Solveig Multimedia since 2010.
She is the author of many text and video guides for the company's products: Video Splitter, HyperCam, WMP Trimmer Plugin, AVI Trimmer+ and TriMP4.
She works with the programs every day and therefore knows very well how they work. Contact Olga via support@solveigmm.com if you have any questions. She will gladly assist you!