Can ChatGPT and other large language models streamline records management?

By Jason Franks, RecordPoint

In the few months since OpenAI’s ChatGPT roused public attention, there has been enormous interest and speculation about what large language models can do and how valuable they might become. In the records domain, where practitioners have long been overwhelmed by the exponentially growing volumes of born-digital data they are expected to manage, these large language models (LLMs) could be real game changers.

LLMs are astonishingly good at generating text. They can hold a conversation, write an essay, answer questions, or summarize text in coherent, grammatically correct English, or in many other languages. The output is convincing enough that one of Google’s engineers came to believe that LaMDA, one of the company’s own language models, had become sentient.

The hype cycle has raised some unrealistic ideas about what LLMs can do, and vendors have done a poor job of communicating what they cannot. As we have quickly seen, LLMs are prone to bias, hallucination, and outright fabrication. The popular technology review website CNET was forced to retract articles written by AI because they were factually inaccurate. Meta’s Galactica--an LLM similar to ChatGPT--had to be taken offline after only three days because it was spouting racist content.

There is a growing body of skeptics who maintain that LLMs are only useful for generating throwaway SEO content or for cheating on homework. But there is also rising concern that LLMs can be used to automate criminal activity on an industrial scale: generating disinformation, drowning out legitimate discourse, or even tricking people into enabling cyber-crimes.

The truth lies somewhere in the middle. LLMs have real capabilities, and they could be transformative for the management of records and archives, but only if they are employed in ways that are credible and accurate. This requires a basic understanding of how LLMs operate.

How Large Language Models work

Language models are built on neural networks--a type of machine learning algorithm that crudely approximates the workings of the human brain, in which input data is passed through layers of heavily interconnected ‘neurons’ until it reaches an output layer. The ‘knowledge’ of a neural network lies entirely in the connections between neurons, which are weighted to transform the input data into the desired output as it flows through them.

The training process for a neural network follows the same broad pattern as other machine learning algorithms: example data is fed through the network, and the weights of the connections are adjusted using a mathematical process that minimizes error.
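
To make that idea concrete, here is a toy sketch in Python (a single weight, nowhere near the scale of a real language model) showing how a weight is nudged, step by step, until the error shrinks. The numbers are arbitrary and purely illustrative.

```python
# A toy sketch of training, nowhere near the scale of an LLM: one weight,
# one training example, and a loop that nudges the weight to reduce the error.
import numpy as np

rng = np.random.default_rng(0)
weight = rng.normal()              # the connection's initial, random weight
x, target = 2.0, 6.0               # one example: input and desired output
learning_rate = 0.05

for step in range(100):
    prediction = weight * x                       # forward pass
    gradient = 2 * (prediction - target) * x      # how error changes with weight
    weight -= learning_rate * gradient            # adjust weight to reduce error

print(round(weight, 3))            # converges toward 3.0, since 3.0 * 2.0 == 6.0
```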

Machine learning training follows one of two modes: supervised and unsupervised learning. In supervised learning, the training data includes the outputs we want the network to produce, so we can train it to perform very specific tasks. In unsupervised learning, on the other hand, there is no target output. The training process allows the neural network to discover statistical patterns and relationships in the input data that would be too difficult to find using declarative logic.

LLMs employ what is called semi-supervised learning. They are pre-trained in an unsupervised mode by feeding them massive quantities of text data, which allows them to encode statistical information about language: which tokens (words or parts of words) are likely to be followed or preceded by which other tokens. The pre-trained models can then be ‘fine-tuned’ in a supervised mode to perform tasks where we do have a target output, such as classification or answering questions. This process is called transfer learning: the knowledge gained by unsupervised learning can then be transferred to specific tasks.  
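
As a rough illustration of transfer learning, the sketch below uses the open-source Hugging Face transformers library to load a pre-trained language model and attach a new classification head to it. The checkpoint name and the number of labels are illustrative assumptions, not recommendations.

```python
# A rough sketch of transfer learning with the Hugging Face transformers
# library. The checkpoint name and the number of labels are illustrative
# assumptions, not recommendations.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased"          # a pre-trained language model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint,
    num_labels=8,                               # e.g. eight disposal classes
)
# The pre-trained body of the model already encodes statistical knowledge of
# language; only the small, newly added classification head starts from
# scratch, and supervised fine-tuning on labelled examples trains it.
```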

When an LLM is asked to generate text, it takes the input data and figures out how to transform it, using an incredibly complex statistical process, into the output it thinks you want. This could be a summary of an article, the answer to a question, an essay, a movie script, an entity recognition task, or a sonnet. Or, as author Cory Doctorow puts it, LLMs are a very powerful form of the autocomplete functionality we see in smartphones and search engines.
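
Here is a minimal, hedged example of that ‘powerful autocomplete’ at work, using the transformers library with GPT-2 as a small, freely available stand-in (ChatGPT itself is only accessible through OpenAI’s hosted service).

```python
# A minimal "powerful autocomplete" demonstration using the transformers
# pipeline API with GPT-2 (an illustrative stand-in; ChatGPT itself is only
# available through OpenAI's hosted service).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
prompt = "The records were scheduled for destruction because"
print(generator(prompt, max_new_tokens=20)[0]["generated_text"])
# The model simply continues the prompt with whatever tokens it judges
# most likely to come next.
```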

What language models cannot do

LLMs don’t actually know anything. They write coherently and convincingly, but they are not databases or search engines. LLMs are very good at manipulating language, but they cannot be trusted to deliver factual data, much less to make decisions. At best they will regurgitate unverified facts from the data used to train them. At worst, they will synthesize convincing-sounding lies. The output is too coherent to be easily discounted as nonsense.

Not only are LLMs untrustworthy with facts, but they cannot actually reason. They can’t process logic; they can only give the semblance of it by parroting the forms they have seen humans employ to argue. LLMs cannot add; they only recognize that when humans write “1+1”, the answer is usually “2”.  

LLMs will never be able to do your book-keeping, plan your finances, or manage the data flowing through your network. All they can do is sit there waiting for you to give them a string of words--and give you the answer they think you want to hear.

How can language models be useful and trustworthy for records managers?  

LLMs are not a good source of information, reasoning, or calculation, but they are powerful tools for manipulating text, and this is where we find applications where they can be invaluable. The common theme for these scenarios is that we must provide trusted data as input to the LLM to process, rather than relying on whatever questionable information has been used to teach it how language behaves.

Text Classification

LLMs can be fine-tuned using your own classified records to categorize records against a disposal schedule, or against any other taxonomy you might require. LLMs are markedly better at these text classification tasks than older statistical models and neural networks that lack pre-training.

Because the LLMs will be fine-tuned using trusted data, we find they are an accurate and reliable way to classify records that cannot be classified from their metadata alone.
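
As an illustrative sketch only, the example below shows roughly what this might look like with the Hugging Face transformers and datasets libraries: a pre-trained model is fine-tuned on a handful of records that staff have already sentenced, then applied to a new one. The checkpoint, class names, example records, and training settings are hypothetical placeholders; a real deployment would use thousands of trusted examples and proper evaluation.

```python
# An illustrative sketch only: fine-tuning a pre-trained model on records
# that staff have already sentenced, then classifying a new record.
# The checkpoint, class names, example records, and settings are all
# hypothetical placeholders.
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments, pipeline)

classes = ["destroy_after_7_years", "transfer_to_archive", "retain_permanently"]
examples = {
    "text": [
        "Invoice for office supplies, FY2016",
        "Minutes of the board meeting, March 2019",
        "Signed deed of land transfer",
    ],
    "label": [0, 1, 2],   # labels taken from records already sentenced by staff
}

checkpoint = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint, num_labels=len(classes))

def tokenize(batch):
    # Convert record text into the token IDs the model expects.
    return tokenizer(batch["text"], truncation=True, padding=True)

train_data = Dataset.from_dict(examples).map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="record-classifier", num_train_epochs=3),
    train_dataset=train_data,
)
trainer.train()   # in practice: thousands of trusted examples, plus evaluation

# Apply the fine-tuned model to a new, unclassified record.
classify = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classify("Quarterly payroll summary, Q2 2021"))  # e.g. LABEL_0 = classes[0]
```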

Text Summarization

LLMs provide a powerful technology for rendering digests of record text, which can help records managers quickly understand what vast quantities of records are about without needing to open every one of them. This will be invaluable for identifying records relevant to FOI requests or other investigative tasks where records managers and archivists need to determine which records are of interest. It may also help expedite the disposal process, giving records managers a quick way to spot-check records before they are approved for destruction.

LLMs are trustworthy on these tasks because we have provided them with text to process as input.
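
As a minimal sketch, a general-purpose summarization model can be applied to a record’s text through the transformers pipeline API. The checkpoint and file name below are illustrative assumptions, and very long records would need to be split into chunks that fit the model’s input limit.

```python
# A minimal sketch: summarizing a record's text with an off-the-shelf model
# through the transformers pipeline API. The checkpoint and file name are
# illustrative assumptions.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
record_text = open("long_record.txt").read()      # hypothetical record export
snippet = record_text[:3000]                      # long records need chunking
digest = summarizer(snippet, max_length=80, min_length=25, do_sample=False)
print(digest[0]["summary_text"])
# The digest is condensed from text we supplied, not from whatever the
# model absorbed during its original training.
```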

Question Answering

LLMs can synthesize answers to sophisticated questions about the text in a record or corpus of records, yielding robust results. The power of this application is that it is free-form. Where text summarization tries to extract the most important information in a general sense, question answering allows users to make targeted, specific queries against large amounts of text content. LLMs can interpret those questions contextually when seeking answers, handling synonyms and phrasing across paragraphs in a way that goes well beyond the capabilities of conventional text search.

Because the questions are directed at the text of the records, LLMs are generally truthful on this task. ChatGPT is likely to tell you there is no relevant information if it can’t find any. But of course, it’s always worth fact checking the answers.  
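
The sketch below shows, under the same caveats, extractive question answering over a single record’s text via the transformers pipeline API; the checkpoint, file name, and question are illustrative only.

```python
# A hedged sketch of extractive question answering over a record's text via
# the transformers pipeline API; checkpoint, file name, and question are
# illustrative only.
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/roberta-base-squad2")
record_text = open("contract_record.txt").read()  # hypothetical record export
result = qa(question="When does the lease expire?", context=record_text)
print(result["answer"], result["score"])
# A low confidence score is a useful hint that the record may not actually
# contain an answer, which is one more reason to spot-check the output.
```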

Entity Recognition

LLMs are very good at entity recognition tasks, identifying people, places, organizations, and other named objects referenced within a body of text. While there are other technologies that do these tasks well, in many cases we do not wish to index these entities because of the risk of exposing personal information. An LLM like ChatGPT can give us an easy way to discover this information on the fly, once we have identified relevant records.
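
Here is a minimal sketch of on-the-fly entity recognition, using the default English model behind the transformers ‘ner’ pipeline; the sentence is an invented example.

```python
# A minimal sketch of on-the-fly named entity recognition using the default
# English model behind the transformers "ner" pipeline; the sentence is an
# invented example.
from transformers import pipeline

ner = pipeline("ner", aggregation_strategy="simple")
text = "The agreement was signed by Jane Doe of Acme Pty Ltd in Melbourne."
for entity in ner(text):
    print(entity["entity_group"], entity["word"], round(entity["score"], 2))
# Expected groups along the lines of: PER Jane Doe, ORG Acme Pty Ltd, LOC Melbourne
```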

Sentiment Analysis  

ChatGPT and similar LLMs are also skilled at sentiment analysis--detecting the sentiment, or tone, of a document. This type of application is commonly used in the analysis of informal communications data: social media, email, and chat messages. With the growing volume of these types of records under management, this is likely to be of increasing interest to the records and archives community. One use case might be tracing social media posts generated by a disinformation campaign targeted at a political candidate or a public health issue. Identifying which messages promote positive or negative sentiment can help to aggregate these messages and attribute them to a bot farm or a bad actor.
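
As a simple sketch, the default model behind the transformers sentiment pipeline can score short, informal messages as positive or negative; the messages below are invented examples.

```python
# A simple sketch using the default sentiment model behind the transformers
# "sentiment-analysis" pipeline; the messages are invented examples.
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")
messages = [
    "Candidate X has done wonders for this community!",
    "Candidate X should be ashamed of this disgraceful record.",
]
for message, result in zip(messages, sentiment(messages)):
    print(result["label"], round(result["score"], 2), "-", message)
# Sustained bursts of strongly polarized messages are one signal worth
# aggregating when looking for coordinated campaigns.
```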

In Summary

While LLMs are not useful or trustworthy replacements for search engines or suitable for broad analysis tasks, they have excellent applications beyond generating throwaway content.  

LLMs can help us refine records into categories for disposal sentencing. They can be leveraged to enrich record metadata by mining a record’s content for sentiment, named entities, and other properties. They can summarize records, allowing records managers to grasp their content more easily, and they can answer targeted questions about a record’s text. LLMs can help us enrich and refine records so they are easier to search, discover, aggregate, or interrogate, either for routine compliance purposes or to assist in forensic activities.

Language models like ChatGPT offer powerful new ways of understanding and interacting with digital records. Once the hype has died down, this technology is likely to become an intrinsic part of everyday computing, unnoticed and unremarked upon, in much the same way that autocomplete is now something we take for granted. Here at RecordPoint, we'll be taking a close look at how the technology may enhance our products and deliver better outcomes for customers.

Jason Franks is Engineering Team Lead and Data Scientist at RecordPoint. Originally published HERE