Exploring Automated Voice Casting for Content Localization Using Deep Learning

Aansh Malik, Ha Nguyen

Casting voice actors to dub source-language content into a target language, a task known as voice casting, is largely a manual workflow that could benefit greatly from increased automation. Recent advances in deep learning architectures for sequential data are driving the development of a range of AI-enabled audio-processing workflows. In particular, applications such as speaker verification and speech synthesis have gained significant traction with the advent and maturing of recurrent neural networks. We explore the viability of applying advances in deep learning for text-independent speaker verification (TI-SV) to computer-aided voice casting. To this end, we propose and develop an automated voice-casting tool that ranks voiceover artists across different languages in the voice-casting process, using similarity scores computed from neural network embeddings produced by a robust autoencoder model trained for TI-SV. To evaluate the efficacy of the proposed approach, we conduct a subjective study that emulates a simplified voice-casting process on actual voice-testing kits (dubbing auditions) from our content. We also use casting decisions from casting experts to further evaluate the tool and to examine the subjectivity involved in voice casting. The automated tool achieves promising results, suggesting that it is a viable approach to automating the voice-casting process that warrants further exploration.
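The ranking step described above can be illustrated with a minimal sketch. The abstract does not state which similarity metric is used or how per-utterance embeddings are combined, so this example assumes cosine similarity over speaker-level embeddings obtained by averaging per-utterance embeddings (for example, from a pretrained TI-SV model). The function names and toy vectors are hypothetical and are not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero-norm embedding")
    return dot / (norm_a * norm_b)


def mean_embedding(utterances: Sequence[Sequence[float]]) -> List[float]:
    """Average per-utterance embeddings into one speaker-level embedding.

    Mean pooling is an assumption made for this sketch.
    """
    n = len(utterances)
    dim = len(utterances[0])
    return [sum(u[i] for u in utterances) / n for i in range(dim)]


def rank_candidates(
    reference: Sequence[Sequence[float]],
    candidates: Dict[str, Sequence[Sequence[float]]],
) -> List[Tuple[str, float]]:
    """Rank candidate voice actors by similarity to the reference (original) voice."""
    ref = mean_embedding(reference)
    scores = [
        (name, cosine_similarity(ref, mean_embedding(utts)))
        for name, utts in candidates.items()
    ]
    # Highest similarity first.
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical 3-dimensional embeddings; real TI-SV embeddings are much larger.
    original_voice = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
    auditions = {
        "actor_a": [[0.8, 0.2, 0.0]],
        "actor_b": [[0.0, 1.0, 0.0]],
        "actor_c": [[0.5, 0.5, 0.5]],
    }
    ranking = rank_candidates(original_voice, auditions)
    print([name for name, _ in ranking])  # -> ['actor_a', 'actor_c', 'actor_b']
```

In this sketch, averaging embeddings before comparing them keeps one vector per speaker, so each audition is scored once against the original voice; scoring every utterance pair and averaging the scores would be an equally plausible alternative.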

Electronic ISSN: 2160-2492
Published: 2021-04
Content type: Original Research
Keywords: Artificial intelligence, audio processing, automation, content localization, deep learning, dubbing, machine learning, neural networks, voice casting, voice similarity
DOI: 10.5594/JMI.2021.3057695