A Fully AI Approach to Descriptive Video Accessibility

Bill McLaughlin, James Hu, Fahad Ahmad Arsal, Rhys Fuller

Descriptive video accessibility tracks have traditionally been created through a highly manual and time-consuming process, but it is now possible to apply automated software to produce high-quality, nuanced descriptive services. We break down this process into its steps (transcription, generative AI scripting using large language models (LLMs), synthetic voice generation, and mixing) and show examples for several different program types. We also demonstrate a heuristic model for scoring descriptive video quality and apply it to the AI-generated descriptions created using our methods. Finally, we assess the dimensions of descriptive video quality where AI continues to struggle and where further technology improvements may be required to match the utility of traditional workflows.

Published: 2024-10-21
Content type: Original Research
Keywords: descriptive video, audio description, accessibility, generative AI, large language models
DOI: 10.5594/MOO/3044
ISBN: 978-1-61482-965-2