On hallucination in Large Language Models

Hallucination describes when a Natural Language Generation algorithm generates a response to a prompt that is inaccurate and has no obvious basis in the prompt or in the training data. The term can be misleading when applied to text generated by a Large Language Model: it suggests the model has invented facts and is mistaken about what is true. These algorithms are actually designed to find the most likely continuation of a stream of text. To do this they have somehow encoded information about facts from the prompt as well as from the immense set of training data, and they predict accurate responses based on those facts as more likely than inaccurate ones. If the model cannot find relevant facts to identify an accurate answer, it will still use the structure of the prompt to generate an answer that looks correct. Judged against the original training goal this is not an error but the best continuation of the text given the information available. Where the term hallucination is more appropriate is that once an inaccurate answer has been generated, continued text generation treats it as part of the prompt, and so accepts the inaccurate statement as true from that point onward in the generated text.
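
As a rough illustration, the sketch below shows greedy autoregressive decoding. The next_token_distribution function is a hypothetical stand-in for a trained model, not any real API: it returns a probability for each candidate next token given the context so far. The point is that every generated token, accurate or not, is appended to the context and conditions all later predictions.

    # Toy sketch of greedy autoregressive decoding. `next_token_distribution`
    # is a hypothetical stand-in for a trained language model: given the
    # context so far, it returns a {token: probability} mapping.
    from typing import Callable, Dict, List

    def generate(prompt: List[str],
                 next_token_distribution: Callable[[List[str]], Dict[str, float]],
                 max_tokens: int = 50,
                 stop_token: str = "<eos>") -> List[str]:
        context = list(prompt)
        for _ in range(max_tokens):
            dist = next_token_distribution(context)
            # The model simply picks the most likely continuation...
            token = max(dist, key=dist.get)
            if token == stop_token:
                break
            # ...and that token, whether factually right or wrong, becomes
            # part of the context that shapes every subsequent prediction.
            context.append(token)
        return context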

Our understanding of how facts are encoded in an LLM is still very limited, both during initial training and while processing a specific prompt for text generation. As research in this area develops, it should enable better ways to use LLMs for answering questions accurately. For now the best we can do is use prompts and training data to make the model more likely to respond with “I don’t know”, and to use LLMs themselves to assess the accuracy of a generated response.
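
A minimal sketch of those two mitigations, assuming a hypothetical ask_llm(prompt) helper that sends text to some LLM and returns its reply (it is not a real library call): the prompt explicitly permits “I don’t know”, and a second call asks the model to judge its own answer before it is returned.

    # Sketch of prompt-level hedging plus LLM-based self-checking.
    # `ask_llm` is a hypothetical helper that sends a prompt to an LLM
    # and returns the generated text.
    def answer_with_self_check(question: str, ask_llm) -> str:
        answer = ask_llm(
            "Answer the question. If you are not sure, say 'I don't know'.\n"
            f"Question: {question}"
        )
        verdict = ask_llm(
            "Is the following answer factually supported? Reply YES or NO.\n"
            f"Question: {question}\nAnswer: {answer}"
        )
        # Only return the answer if the checking pass accepts it.
        return answer if verdict.strip().upper().startswith("YES") else "I don't know"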

The paper Survey of Hallucination in Natural Language Generation covers this topic.