Multi-Label Indexing Technology for News with AI-Based Text Processing
Broadcasting organizations produce large volumes of news articles daily, requiring accurate metadata to enable efficient reuse across television and online platforms. Manual annotation, however, is both time-consuming and labor-intensive. To address this, we propose an AI-based system for automatic multi-label classification of news text. A central challenge in this task is the imbalanced label distribution, where high-frequency labels dominate and rare labels are underrepresented. To mitigate this, we introduce Weighted Asymmetric Loss (WASL) with Label Smoothing, which integrates class-balanced weighting, suppression of dominant negative samples, and smoothing based on label co-occurrence to improve performance on infrequent labels. Evaluation on Reuters-21578 and Japan Broadcasting Corp. (NHK) News Web datasets demonstrates that our approach significantly outperforms baseline methods on both macro-F1 and micro-F1 scores. We further developed a prototype system and deployed it in local NHK broadcasting stations, where it reduced metadata creation costs and facilitated content reuse, while highlighting practical considerations for workflow adaptation.
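The two reported scores respond differently to label imbalance, which is why both are given. The following minimal sketch illustrates this on a toy example; the data, the helper names `f1` and `micro_macro_f1`, and the two-label setup are illustrative assumptions and are not taken from the paper's evaluation. Micro-F1 pools counts over all labels, so frequent labels dominate it, while macro-F1 averages per-label F1, so a missed rare label lowers it sharply.

```python
def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred):
    """Return (micro_f1, macro_f1) for binary multi-label indicator rows."""
    n_labels = len(y_true[0])
    counts = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        counts.append((tp, fp, fn))
    # Macro: average the per-label F1 scores, each label weighted equally.
    macro = sum(f1(*c) for c in counts) / n_labels
    # Micro: pool the counts over all labels, then compute a single F1.
    micro = f1(
        sum(c[0] for c in counts),
        sum(c[1] for c in counts),
        sum(c[2] for c in counts),
    )
    return micro, macro


# Toy data (assumed): label 0 is frequent (8 articles), label 1 is rare (2 articles).
y_true = [[1, 0]] * 8 + [[0, 1]] * 2
# A classifier that gets the frequent label right but never predicts the rare one.
y_pred = [[1, 0]] * 8 + [[0, 0]] * 2

micro, macro = micro_macro_f1(y_true, y_pred)
print(round(micro, 3), round(macro, 3))  # prints: 0.889 0.5
```

In this toy case the classifier ignores the rare label entirely, yet micro-F1 stays near 0.89 while macro-F1 falls to 0.5. A gain on infrequent labels therefore shows up mainly in macro-F1.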
- Print ISSN: 1545-0279
- Electronic ISSN: 2160-2492
- Published: 2025-10
- Content type: Original Research
- Keywords: natural language processing, multi-label text classification, loss function, few-shot learning
- DOI: 10.5594/JMI.2025/OJUX5368