Throughout history, the word ‘hallucination’ has referred to altered perception and abnormal functioning of the human brain. In the age of rapid technological advancement, however, the term has taken on an entirely new dimension: it now also describes the generation of misleading or erroneous information by Artificial Intelligence systems such as Large Language Models (LLMs).
AI models such as LLMs and Recurrent Neural Networks (RNNs) have ushered in a transformative era in our everyday interactions with technology, reshaping our behaviour and enhancing our digital experiences. These innovations have integrated seamlessly into our lives, simplifying tasks such as text input and correction through predictive suggestions and precise corrections, making our communication smoother and more efficient. In search engines such as Google, the underlying AI technologies, often driven by LLMs, help us navigate the vast landscape of information more effortlessly, offering results that align with our needs and intentions. These models have also enabled human-like conversations through chatbots and virtual assistants, allowing us to engage in meaningful interactions with technology; they are proficient not only at answering questions but also at providing personalised recommendations, be it for content, products, or services. Additionally, advanced AI models including Generative Adversarial Networks (GANs) and RNNs have unlocked the creative potential of image generation, digital art, and photo enhancement.
LLMs are neural network models that can generate new text or other forms of sequential, human-like data by analysing the patterns in existing data. Examples of LLMs include OpenAI’s GPT-4 and Google’s PaLM. These models are capable of producing impressive outputs, but they may also generate output that is inaccurate, irrelevant, biased, or harmful. This is termed hallucination.
Hallucinated content is often plausible-sounding, making it difficult for laypeople to detect. Hallucinations can have negative consequences for both the users and the developers of these systems, as they can lead to confusion, misinformation, deception, or manipulation.
Hallucinations undermine the accountability of LLMs, and their use in critical scenarios such as legal and medical consulting should therefore be avoided.
A promising approach to rectifying hallucination is self-correction, where the LLM itself is guided to fix problems in its own output. Techniques that leverage automated feedback, either produced by the LLM itself or by some external system, are of particular interest, as they offer a promising way to make LLM-based solutions more practical and deployable with minimal human feedback.
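To make this idea concrete, here is a minimal sketch of the generate–feedback–refine loop that underlies most of the techniques discussed below. The names `generate`, `get_feedback`, and `refine` are hypothetical placeholders for calls to an LLM or an external feedback system, not a specific library's API.

```python
# Minimal sketch of the generate -> feedback -> refine loop.
# `generate`, `get_feedback`, and `refine` are hypothetical placeholders
# for calls to an LLM or an external feedback system.

def self_correct(prompt, generate, get_feedback, refine, max_rounds=3):
    output = generate(prompt)                      # initial draft from the LLM
    for _ in range(max_rounds):
        feedback = get_feedback(prompt, output)    # automated critique
        if feedback.get("acceptable", False):      # stop once quality is good enough
            return output
        output = refine(prompt, output, feedback)  # revise using the feedback
    return output
```

The rest of this article looks at where in the pipeline this feedback is applied: during training, during generation, or after generation.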
Training-time correction refers to techniques that use automated feedback to improve the LLM during the training process. The feedback can be derived from various sources, such as external knowledge bases, human annotations, synthetic data, or self-generated data. It can be used to optimise the LLM’s parameters or to augment the training data or constraints. Some examples of training-time correction methods are:
Knowledge-enhanced training- This method uses external knowledge sources, such as Wikipedia or the Internet, to provide factual information for the LLM. The feedback can be used to verify the correctness of generated content or to enrich the input with relevant facts.
Human-guided training- This method uses human annotations to provide quality feedback for the LLM. The feedback can be used as a reward signal or supervision signal to optimise the LLM’s performance.
Synthetic-data training- This method uses synthetic data to provide feedback to the LLM. The synthetic data can be generated by perturbing or modifying existing data or by using other models and tools. The synthetic data can be used to create negative examples or counterfactual examples to train the LLM to avoid errors or biases.
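As an illustration of the synthetic-data idea, the sketch below builds negative examples by perturbing correct question–answer pairs and labels them so that a model can later be trained to prefer the faithful answer. The perturbation rule (swapping in another question's answer) and the data format are deliberately simple assumptions, not a prescribed recipe.

```python
import random

# Sketch: turn factual (question, answer) pairs into a small preference
# dataset with synthetic negatives. Swapping in another question's answer
# is a simple stand-in for more realistic corruption strategies.

def build_synthetic_pairs(examples):
    """examples: list of {'question': str, 'answer': str} dicts."""
    answers = [ex["answer"] for ex in examples]
    dataset = []
    for ex in examples:
        wrong = random.choice([a for a in answers if a != ex["answer"]])
        dataset.append({"question": ex["question"],
                        "chosen": ex["answer"],   # faithful answer
                        "rejected": wrong})       # synthetic hallucination
    return dataset

pairs = build_synthetic_pairs([
    {"question": "Who wrote Hamlet?", "answer": "William Shakespeare"},
    {"question": "What is the capital of France?", "answer": "Paris"},
])
# `pairs` could then feed a preference-based fine-tuning step, teaching
# the model to avoid the corrupted answers.
```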
Generation-time correction refers to techniques that use automated feedback to guide the LLM during the generation process. The feedback can be derived from various sources, such as external metrics, external knowledge sources, self-evaluation, or other models or tools. It can be used to steer the decoding process towards generating better outputs, or to rerank or filter the candidate outputs based on some criteria. Some examples include:
Feedback-guided decoding- This method uses step-level feedback to offer fine-grained guidance over the generation process. The generation of the output is broken down into multiple reasoning steps or thoughts, and at each step the feedback indicates the quality of the candidate step. A search algorithm can then be deployed for systematic exploration of the output space based on the feedback (a sketch appears after the list below).
Generate-then-rank- This method uses output-level feedback to rerank or filter the candidate outputs generated by the LLM. The feedback can be based on external metrics or criteria that measure the quality of the outputs (also sketched below).
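For feedback-guided decoding, a greedy step-level search might look like the sketch below. The hooks `propose_steps` and `step_score` are hypothetical: the former stands in for sampling candidate reasoning steps from the LLM, the latter for the automated feedback that scores each step. More systematic strategies (beam search, tree search) follow the same pattern.

```python
# Sketch of feedback-guided decoding as a greedy, step-level search.
# `propose_steps` samples candidate next reasoning steps and `step_score`
# is the automated feedback; both are hypothetical hooks.

def guided_decode(prompt, propose_steps, step_score, max_steps=8, beam=3):
    chain = []                                     # reasoning steps so far
    for _ in range(max_steps):
        candidates = propose_steps(prompt, chain, n=beam)
        if not candidates:
            break
        # Keep the step the feedback model rates highest.
        best = max(candidates, key=lambda step: step_score(prompt, chain, step))
        chain.append(best)
        if best.strip().lower().startswith("final answer"):
            break
    return chain
```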
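Generate-then-rank is even simpler in outline: sample several candidates, score each with an external metric, and keep the best. In the sketch below, `sample_candidates` and `score` are hypothetical hooks for the LLM's sampling routine and for whatever feedback signal (a factuality classifier, an entailment model, etc.) is being used.

```python
# Sketch of generate-then-rank: draw several candidate outputs,
# score each one with an external metric, and keep the best.
# `sample_candidates` and `score` are hypothetical hooks, not a real API.

def generate_then_rank(prompt, sample_candidates, score, n=5):
    candidates = sample_candidates(prompt, n=n)        # n diverse generations
    scored = [(score(prompt, c), c) for c in candidates]
    best_score, best_output = max(scored, key=lambda pair: pair[0])
    return best_output, best_score
```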
Post-hoc correction refers to techniques that use automated feedback to revise the output after it has been generated by the LLM. The feedback can be derived from various sources, such as self-evaluation, external tools, external knowledge sources, or other models. It can be as detailed as a diagnostic report pinpointing exact error locations, or as general as suggestions for overall writing improvement. Some examples of post-hoc correction methods are:
Self-correction- This method uses the LLM itself to generate feedback and refine its own output. The LLM first produces an initial output, then critiques it and revises it based on its own feedback, essentially the generate–feedback–refine loop sketched earlier. This process can be iterative, continuing until an output of acceptable quality is obtained.
Correction with external feedback- This method uses external tools or models to provide feedback and refine the output. The external tool or model can perform various tasks, such as grammar checking or sentiment analysis. The feedback can be used to identify and correct errors or flaws in the output.
Multi-agent debate- This method uses multiple models to generate feedback and refine the output through a debate process. These models can have different perspectives or objectives and can challenge each other's outputs or reasoning. The debate process can help reveal errors or inconsistencies in the output and lead to better solutions.
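The sketch below caricatures a two-round debate between two model "agents". The wrapper `ask(agent, prompt)` is a hypothetical stand-in for an LLM call, and the prompts are only meant to show the structure of exchanging, critiquing, and consolidating answers; in practice the agents could be different models or the same model with different system prompts.

```python
# Sketch of a two-agent debate. `ask(agent, prompt)` is a hypothetical
# wrapper around an LLM call.

def debate(question, ask, agents=("agent_a", "agent_b"), rounds=2):
    # Each agent answers independently first.
    answers = {a: ask(a, f"Answer the question: {question}") for a in agents}
    for _ in range(rounds):
        for a in agents:
            others = "\n".join(ans for b, ans in answers.items() if b != a)
            answers[a] = ask(a, (
                f"Question: {question}\n"
                f"Other agents answered:\n{others}\n"
                f"Your previous answer: {answers[a]}\n"
                "Point out any errors and give your revised answer."
            ))
    # A final judge call (here, the first agent) consolidates the debate.
    transcript = "\n".join(answers.values())
    return ask(agents[0], f"Given these final answers:\n{transcript}\n"
                          f"Give the single best answer to: {question}")
```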
Hallucination is a serious problem that affects the reliability and trustworthiness of LLMs. In this article, we have reviewed various techniques that use automated feedback to detect and prevent hallucinations in LLMs. We have categorised these techniques into three types: training-time correction, generation-time correction, and post-hoc correction. We have also discussed the major applications and challenges of this strategy. We hope this article provides a comprehensive overview of this emerging research area and inspires future work on improving LLMs with automated feedback.