Exploring Image Corruption in the Workflow, and how to Stop this from Happening

Keith R. Hogan

LTO tape systems, SAN, NAS, object store, RAM, and WAN optimizers all have configurations available to protect the fidelity of image contents in the workflow. Each system uses different methods to ensure that frame contents are not corrupted, and each method protects against a different set of failure conditions. And yet, unrecoverable frame corruption still occurs at an unacceptable level. The problem is even more serious in an archive scenario, where content may sit untouched for a long time and corruption can stay undetected for extended periods. Media and Entertainment workflows rely solely on the protections provided by the underlying compute, storage, and transmission technologies to ensure the fidelity of the data in the workflow. These protections are very difficult to fully characterize and track, because the technologies are generally deployed deep in the hardware or at the lowest software levels of the IT infrastructure. Depending on the path a frame takes through the workflow, it is subject to a varying set of protection technologies, such as RAID, erasure coding, ECC memory, and parity checking. To overcome the uncertainty associated with how these methods ensure fidelity, the industry employs failure detection at each stage of the workflow (generally MD5 checksums). When a corrupt image is detected, the entire file must be restored from a backup copy. This approach is very expensive and introduces significant production delays when errors occur. This paper will explore the IT technology used in the workflow and discuss the protection mechanisms provided or employed by each workflow element. In this context, the paper will discuss how frame corruption can occur even when all of the protection technologies are working as designed.
The paper will also discuss a method for protecting images at the frame level using Forward Error Correction, so that protection is applied uniformly to images throughout the workflow and media errors can be recovered in most cases without accessing a backup copy.
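
The difference between detecting corruption and correcting it can be illustrated with a minimal sketch. The abstract does not say which Forward Error Correction code the paper uses, so the example below is an assumption made purely for illustration: a single XOR parity block, a deliberately simple erasure-style code, stands in for a production scheme. The names `protect_frame`, `repair_frame`, and `BLOCK_SIZE` are hypothetical and do not come from the paper. Per-block MD5 digests locate the damaged block, and the parity block rebuilds it without touching a backup copy.

```python
import hashlib
from functools import reduce

BLOCK_SIZE = 4  # hypothetical block size, kept tiny for illustration


def md5(data: bytes) -> str:
    """Hex MD5 digest, used here only to detect which block changed."""
    return hashlib.md5(data).hexdigest()


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def protect_frame(frame: bytes, block_size: int = BLOCK_SIZE):
    """Split a frame into blocks and compute per-block digests plus one parity block.

    Assumption: a single XOR parity block stands in for a real FEC code.
    """
    padded = frame + b"\x00" * (-len(frame) % block_size)
    blocks = [padded[i:i + block_size] for i in range(0, len(padded), block_size)]
    digests = [md5(block) for block in blocks]
    parity = xor_blocks(blocks)
    return blocks, digests, parity


def repair_frame(blocks, digests, parity):
    """Return the blocks with at most one damaged block rebuilt from parity."""
    bad = [i for i, (block, digest) in enumerate(zip(blocks, digests))
           if md5(block) != digest]
    if not bad:
        return list(blocks)
    if len(bad) > 1:
        # One parity block can rebuild only one missing block.
        raise ValueError("more than one damaged block; restore from backup")
    index = bad[0]
    survivors = [block for i, block in enumerate(blocks) if i != index]
    rebuilt = xor_blocks(survivors + [parity])
    if md5(rebuilt) != digests[index]:
        # The parity block itself may have been corrupted.
        raise ValueError("repair failed verification; restore from backup")
    repaired = list(blocks)
    repaired[index] = rebuilt
    return repaired


if __name__ == "__main__":
    frame = bytes(range(16))              # stand-in for raster data
    blocks, digests, parity = protect_frame(frame)
    damaged = list(blocks)
    damaged[2] = b"\xff\xff\xff\xff"      # simulate silent corruption
    restored = b"".join(repair_frame(damaged, digests, parity))
    print(restored == frame)
```

With MD5 alone, the damaged frame in this example could only be flagged and then restored from backup. The added parity block allows an in-place repair, at the cost of one extra block of storage per frame. A single parity block tolerates only one damaged block; codes with more redundancy tolerate more.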

Published: 2017-10
Content type: Original Research
Keywords: Data Corruption, Forward Error Correction, FEC, Storage, Archive, SSDs, Object Store, SAN, NAS, ECC, RAID, Erasure Coding, Raster Data, Metadata, MD5, Hash
DOI: 10.5594/M001797
ISBN: 978-1-61482-959-1