Multi-Label Indexing Technology for News with AI-Based Text Processing

Yuki Yasuda, Simon Clippingdale, Taro Miyazaki, Jun Goto, Takahiro Mochizuki

Broadcasting organizations produce large volumes of news articles daily, and these articles require accurate metadata to enable efficient reuse across television and online platforms. Manual annotation, however, is both time-consuming and labor-intensive. To address this, we propose an AI-based system for automatic multi-label classification of news text. A central challenge in this task is the imbalanced label distribution, where high-frequency labels dominate and rare labels are underrepresented. To mitigate this, we introduce Weighted Asymmetric Loss (WASL) with Label Smoothing, which integrates class-balanced weighting, suppression of dominant negative samples, and smoothing based on label co-occurrence to improve performance on infrequent labels. Evaluation on the Reuters-21578 and Japan Broadcasting Corp. (NHK) News Web datasets demonstrates that our approach significantly outperforms baseline methods on both macro-F1 and micro-F1 scores. We further developed a prototype system and deployed it in local NHK broadcasting stations, where it reduced metadata creation costs and facilitated content reuse, while highlighting practical considerations for workflow adaptation.
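
The abstract names three ingredients of the loss (class-balanced weighting, suppression of dominant negatives, and co-occurrence-based smoothing) but does not give formulas. The following is a minimal sketch of how such ingredients could be combined, not the authors' WASL formulation. It assumes an effective-number form for the class weights and a focusing-plus-margin form for the negative term, both borrowed from published class-balanced and asymmetric losses. All function names, the `cooccurrence` matrix, and the default values of `beta`, `eps`, `gamma_neg`, and `margin` are hypothetical.

```python
import math

def class_balanced_weights(label_counts, beta=0.999):
    """Assumed weighting: w_c proportional to (1 - beta) / (1 - beta ** n_c).
    Labels with few training examples receive larger weights."""
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in label_counts]
    scale = len(raw) / sum(raw)  # normalise so the weights average to 1
    return [w * scale for w in raw]

def smooth_targets(targets, cooccurrence, eps=0.1):
    """Assumed smoothing: positives become 1 - eps; each negative label j
    receives eps times its strongest co-occurrence with any positive label.
    cooccurrence[i][j] is a hypothetical estimate of P(label j | label i)."""
    positives = [i for i, t in enumerate(targets) if t == 1]
    smoothed = []
    for j, t in enumerate(targets):
        if t == 1:
            smoothed.append(1.0 - eps)
        elif positives:
            smoothed.append(eps * max(cooccurrence[i][j] for i in positives))
        else:
            smoothed.append(0.0)
    return smoothed

def weighted_asymmetric_loss(probs, targets, weights,
                             gamma_pos=0.0, gamma_neg=4.0,
                             margin=0.05, tiny=1e-8):
    """Per-example loss summed over labels.
    The negative term is down-weighted by p_shift ** gamma_neg, and negatives
    predicted below `margin` contribute nothing, which suppresses the many
    easy negatives that dominate an imbalanced multi-label problem."""
    total = 0.0
    for p, y, w in zip(probs, targets, weights):
        p_shift = max(p - margin, 0.0)  # margin-shifted probability for negatives
        pos_term = y * (1.0 - p) ** gamma_pos * math.log(max(p, tiny))
        neg_term = (1.0 - y) * p_shift ** gamma_neg * math.log(max(1.0 - p_shift, tiny))
        total -= w * (pos_term + neg_term)
    return total

# Usage with made-up numbers: three labels, the third one rare.
weights = class_balanced_weights([1000, 200, 10])
cooc = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 1.0]]
soft = smooth_targets([1, 0, 0], cooc)
loss = weighted_asymmetric_loss([0.8, 0.3, 0.02], soft, weights)
```

In this sketch the rare label receives the largest weight, and the co-occurrence matrix would be estimated from training-set label statistics; how the paper actually defines and tunes these quantities is not stated in the abstract.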

Print ISSN: 1545-0279
Electronic ISSN: 2160-2492
Published: 2025-10
Content type: Original Research
Keywords: natural language processing, multi-label text classification, loss function, few-shot learning
DOI: 10.5594/JMI.2025/OJUX5368