Entity Embellishment Mitigation in LLMs Output with Noisy Synthetic Dataset for Alignment

Reference

Galeshchuk S. (2024). Entity Embellishment Mitigation in LLMs Output with Noisy Synthetic Dataset for Alignment. Proceedings of the Third Ukrainian Natural Language Processing Workshop (UNLP) @ LREC-COLING 2024, pp. 129-134.

Agency Expert(s) related to the Article

Dr. Svitlana Galeshchuk


Abstract

The present work focuses on entity embellishments, i.e. cases where named entities are accompanied by additional information that is not supported by the context or the source material. Our paper contributes to mitigating this problem in texts generated by large language models, summaries in particular, by proposing an approach that injects synthetic noise into generated samples, which are then used to align a fine-tuned LLM. We also address the scarcity of solutions for low-resource languages and test our approach on Ukrainian corpora.
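The abstract describes the approach only at a high level. The minimal Python sketch below illustrates the general idea of turning noisy synthetic samples into alignment data, under stated assumptions. A faithful summary is kept as the preferred output, and a copy with an unsupported detail attached to a named entity serves as the dispreferred one. The function names (inject_embellishment, build_pairs), the appositive-insertion heuristic, the hand-written pool of details and the prompt/chosen/rejected layout (the format commonly used for DPO-style preference optimization) are all hypothetical illustrations, not the author's method. Entity mentions are assumed to be given rather than detected by an NER model.

```python
import random
from dataclasses import dataclass

# Hypothetical sketch, not the paper's implementation. Entities are assumed to be
# known in advance; a real pipeline would detect them (e.g. with an NER model).

PUNCTUATION = ".,;:!?"


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # faithful summary, no unsupported details
    rejected: str  # same summary with a synthetic entity embellishment


def inject_embellishment(summary: str, entity: str, detail: str) -> str:
    """Attach an unsupported appositive to the first mention of `entity`."""
    start = summary.find(entity)
    if start == -1:
        raise ValueError(f"entity {entity!r} not found in summary")
    end = start + len(entity)
    rest = summary[end:]
    # Close the appositive with a comma unless punctuation already follows.
    closing = "" if not rest or rest[0] in PUNCTUATION else ","
    return f"{summary[:end]}, {detail}{closing}{rest}"


def build_pairs(samples, details, seed=0):
    """Turn (prompt, summary, entity) triples into preference pairs.

    `details` is an assumed hand-written pool of plausible but unsupported facts.
    """
    rng = random.Random(seed)  # seeded so the noise is reproducible
    pairs = []
    for prompt, summary, entity in samples:
        noisy = inject_embellishment(summary, entity, rng.choice(details))
        pairs.append(PreferencePair(prompt=prompt, chosen=summary, rejected=noisy))
    return pairs


if __name__ == "__main__":
    samples = [(
        "Summarize the article about the Lviv budget.",
        "Olena Petrenko presented the budget in Lviv.",
        "Olena Petrenko",
    )]
    for pair in build_pairs(samples, ["a former banker"]):
        print(pair.rejected)
    # → Olena Petrenko, a former banker, presented the budget in Lviv.
```

Attaching the detail as an appositive leaves the rest of the summary unchanged, so in this sketch the only difference between the chosen and rejected texts is the embellishment itself, which isolates the behaviour an alignment step would be meant to penalize.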