Integration of Depth and Volumetric Capture Workflows for 2.5D Virtual Production

Chris Nash, Nathan Butt, Andrea Loriedo, Robin Spooner, Taegyun Ha, Philip Coulam-Jones, James Bentley, Aljosa Smolic

The ability to capture a volumetric representation of performers in a production environment would enable a multitude of novel on-set and post-production workflows. Current methods are impractical due to their reliance on large numbers of cameras and consistent lighting conditions. We propose a framework for creating 2.5D assets from a single monocular video, allowing digitized performers to be viewed from a range of angles with the appearance of depth. An application processes the video offline, using AI depth-estimation and segmentation models to create a packaged 2.5D asset. The asset is then loaded into Disguise Designer for pre-visualisation of the performers within the virtual stage. Analysis of state-of-the-art depth inference models, using videos captured to represent the challenges of production environments, shows that it is possible to obtain coherent video depth maps in these conditions. However, metric models do not always recover absolute depth values accurately, and it is necessary to use models tailored specifically for video to ensure temporal consistency in the results. This work aims to serve as a foundation for more comprehensive 3D volumetric capture of performers in real-world production environments.
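To illustrate the offline processing step described in the abstract, the sketch below runs per-frame monocular depth inference on a video and packs colour and depth side by side into a single output file. This is a minimal illustration, not the authors' pipeline: the model checkpoint, file names, and side-by-side packing layout are assumptions, and per-frame inference deliberately ignores the temporal consistency the paper identifies as essential, for which a video-tailored model would be substituted.

    # Minimal sketch of an offline 2.5D packing pass.
    # Assumptions: Hugging Face "depth-estimation" pipeline with an
    # illustrative checkpoint; side-by-side RGB|depth packing; file
    # names "performer.mp4" / "performer_2p5d.mp4" are hypothetical.
    import cv2
    import numpy as np
    from PIL import Image
    from transformers import pipeline

    # Per-frame inference is a simplification; the paper notes that
    # video-specific models are needed for temporal consistency.
    depth_estimator = pipeline("depth-estimation",
                               model="LiheYoung/depth-anything-small-hf")

    cap = cv2.VideoCapture("performer.mp4")
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    # Packed frame is twice as wide: colour on the left, depth on the right.
    out = cv2.VideoWriter("performer_2p5d.mp4",
                          cv2.VideoWriter_fourcc(*"mp4v"), fps, (2 * w, h))

    while True:
        ok, frame_bgr = cap.read()
        if not ok:
            break
        rgb = Image.fromarray(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
        depth = np.array(depth_estimator(rgb)["depth"], dtype=np.float32)
        # Normalise relative depth to 8 bits for packing; a metric model
        # would allow absolute values to be preserved instead.
        depth_u8 = cv2.normalize(depth, None, 0, 255,
                                 cv2.NORM_MINMAX).astype(np.uint8)
        depth_bgr = cv2.cvtColor(depth_u8, cv2.COLOR_GRAY2BGR)
        out.write(np.hstack([frame_bgr, depth_bgr]))

    cap.release()
    out.release()

The packed file stands in for the paper's "packaged 2.5D asset"; a production version would add the segmentation pass and whatever container format the downstream viewer, such as Disguise Designer, expects.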

Electronic ISSN
2160-2492
Published
2025-10
Content type
Original Research
Keywords
volumetric capture, depth estimation, computer vision, virtual production
DOI
10.5594/JMI.2025/FOOS6445