Toward Generalized Psychovisual Preprocessing For Video Encoding

Aaron Chadha, Mohammad Ashraful Anam, Matthias Treder, Ilya Fadeev, Yiannis Andreopoulos

Deep perceptual preprocessing has recently emerged as a new way to enable further bitrate savings across several generations of video encoders without breaking standards or requiring any changes in client devices. In this article, we lay the foundation for a generalized psychovisual preprocessing framework for video encoding and describe one of its promising instantiations that is practically deployable for video-on-demand, live, gaming, and user-generated content (UGC). Results using state-of-the-art advanced video coding (AVC), high efficiency video coding (HEVC), and versatile video coding (VVC) encoders show that average bitrate [Bjontegaard delta-rate (BD-rate)] gains of 11%–17% are obtained over three state-of-the-art reference-based quality metrics [Netflix video multi-method assessment fusion (VMAF), structural similarity index (SSIM), and Apple advanced video quality tool (AVQT)], as well as the recently proposed nonreference International Telecommunication Union-Telecommunication (ITU-T) P.1204 metric. On CPU, the proposed framework is shown to be twice as fast as x264 medium-preset encoding. On GPU hardware, our approach achieves 714 frames/sec for 1080p video (below 2 ms/frame), thereby enabling its use in very-low-latency live video or game streaming applications.

Print ISSN: 1545-0279
Electronic ISSN: 2160-2492
Published: 2022-05
Content type: Original Research
Keywords: Deep neural networks, perceptual optimization, video delivery
DOI: 10.5594/JMI.2022.3160801