Toward Generalized Psychovisual Preprocessing for Video Encoding

Aaron Chadha, Mohammad Ashraful Anam, Matthias Treder, Ilya Fadeev, Yiannis Andreopoulos

Deep perceptual preprocessing has recently emerged as a new way to enable further bitrate savings across several generations of video encoders without breaking standards or requiring any changes in client devices. In this paper, we lay the foundations toward a generalized psychovisual preprocessing framework for video encoding and describe one of its promising instantiations that is practically deployable for video-on-demand, live, gaming and user-generated content. Results using state-of-the-art AVC, HEVC and VVC encoders show that average bitrate (BD-rate) gains of 11% to 17% are obtained over three state-of-the-art reference-based quality metrics (Netflix VMAF, SSIM and Apple AVQT), as well as the recently proposed no-reference ITU-T P.1204 metric. On CPU, the proposed framework is shown to be twice as fast as x264 medium-preset encoding. On GPU hardware, our approach achieves 714 fps for 1080p video (under 2 ms/frame), thereby enabling its use in very-low-latency live video or game streaming applications.

Published
2021-11
Content type
Original Research
Keywords
Perceptual optimization, deep neural networks, video delivery
DOI
10.5594/M001933