Leveraging Large Language Models and Knowledge Graphs to Enhance Metadata Extraction for News Media Production

Caio Vinicius Dadauto, Daniel Pinheiro Franco, Roger Sacilotto, Rob Gonsalves, Shailendra Mathur

This paper introduces a novel procedure for leveraging Large Language Models (LLMs) to create Knowledge Graphs (KGs) from unstructured, multi-language news texts. The approach automates the extraction and disambiguation of entities, relations, and taxonomic hierarchies using iterative few-shot LLM prompting and unsupervised clustering techniques. These KGs are further enriched with additional nodes and edges from external sources like WikiData, enhancing the contextual understanding of the data. To make the KGs accessible to non-technical users, a conversational interface is developed, allowing journalists to interact with the KG via natural language queries, which are translated into database queries for information retrieval. This method shows significant potential for improving digital asset indexing, discovery, and content management in the media industry. Future work will focus on expanding the system's capabilities by incorporating metadata from other media types, integrating with external data sources such as DBpedia and social media, and refining the KG generation process to reduce inconsistencies. The ultimate goal is to create a unified, semantically rich environment for content discovery, enabling more sophisticated, context-aware information retrieval across various media platforms.
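As a rough illustration of the pipeline the abstract describes — prompting an LLM for entity/relation triples and assembling them into a graph — the following sketch stubs the LLM call with a canned JSON response. The function name `llm_extract` and the triple schema are illustrative assumptions, not the authors' implementation; a real system would issue a few-shot prompt to an LLM API and then disambiguate and enrich the resulting nodes.

```python
import json
from collections import defaultdict

def llm_extract(text: str) -> str:
    # Hypothetical stand-in for a few-shot LLM call that returns
    # (subject, relation, object) triples as JSON. A real system would
    # send `text` plus few-shot examples to an LLM and parse its reply.
    return json.dumps([
        {"subject": "Acme Corp", "relation": "acquired", "object": "Widget Inc"},
        {"subject": "Widget Inc", "relation": "based_in", "object": "Berlin"},
    ])

def build_kg(text: str) -> dict:
    """Parse LLM-extracted triples into an adjacency-style knowledge graph."""
    kg = defaultdict(list)
    for triple in json.loads(llm_extract(text)):
        kg[triple["subject"]].append((triple["relation"], triple["object"]))
    return dict(kg)

kg = build_kg("Acme Corp announced it has acquired Berlin-based Widget Inc.")
print(kg["Acme Corp"])  # [('acquired', 'Widget Inc')]
```

In the full system, each node would then be linked to an external identifier (e.g., a WikiData QID) so that additional edges can be merged in from those sources.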

Published: 2024-10-21
Content type: Original Research
Keywords: large language models, knowledge graphs, unstructured text, language processing, news media, entity extraction, relation extraction, taxonomy inference
DOI: 10.5594/MOO/3042
ISBN: 978-1-61482-965-2