Understanding natural language processing for business

By Daniela Miltner

Classification is an essential step in almost any kind of content management process. Even before the period of digital transformation, ‘information overload’ was a reality. Professor Bertram Gross coined the term over 50 years ago and defined it as being directly proportional to a reduction in decision quality. Now we are busier and faced with more information than before – in both our personal and professional lives. Text is all around us – in the form of PDFs, office documents, e-mails and much more.

Categories help the human brain to organise the world. Therefore, it makes sense that decision-makers in many organisations recognise that to manage this swell of information, they need to classify and order documents. However, few actually have the right processes and technology to do so.

Research from Gartner predicts that the role of natural language processing will increase in line with the growing trend of AI and machine learning. In order to leverage unstructured content and transform dark data into actionable information, a new approach to classification is required: one that harnesses machine learning, linguistic and semantic technologies, allowing us to master the growing amount of unstructured data.

Natural language processing

To action information, or use it effectively, it is helpful to understand the context and logical class of a document. To let information drive business decisions, the right person needs access to all relevant information when needed – starting from the arrival at the organisation up to retrieval of long-term archives.

Classification not only helps businesses manage the tidal wave of data but also generates business value, which should come as a welcome bonus to those weighing up investment in technology against the bottom line. This holds for any industry – from consumer-focused enterprises in retail or banking, to organisations that rely on search and discovery processes, such as the legal sector.

How does classification work?

Manual classification is time-consuming, inaccurate and inconsistent, with quality often deteriorating as volumes increase and time pressures heighten. Rule-based, automated classification already supports enterprises by sorting and routing semi-structured documents. Structured documents, like a loan application form or an invoice, can be recognised by intelligent input management solutions and routed to the enterprise workflow. However, rules can quickly reach their limits when it comes to unstructured content and natural language texts.
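To make the limits of rules concrete, here is a minimal sketch of keyword-based routing. The queue names and keywords are hypothetical, chosen only for illustration; they do not describe any particular product.

```python
import re

# Hypothetical routing rules: queue name -> keywords that must ALL appear.
RULES = {
    "accounts_payable": ["invoice", "amount due"],
    "lending": ["loan application"],
}

def route(text):
    """Return the first queue whose keywords all occur in the text."""
    # Lower-case and collapse whitespace so layout differences do not matter.
    normalised = re.sub(r"\s+", " ", text.lower())
    for queue, keywords in RULES.items():
        if all(keyword in normalised for keyword in keywords):
            return queue
    # Nothing matched: a person has to look at the document.
    return "manual_review"

# A form-like document matches the rule ...
print(route("INVOICE no. 1042. Amount due: 350 EUR"))
# ... but a free-text message about the same topic does not.
print(route("Please tell me what I still owe you for last month's order"))
```

The second message concerns a payment too, yet it falls through to manual review because the writer used different words. Each new phrasing would need another rule, which is where rule sets tend to become unmanageable.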

Content in unstructured documents is unexpected and follows non-standard patterns. Oftentimes, different people use different terms, expressions and syntax to talk about the same thing, which adds to the level of complexity when it comes to managing and converting these documents. As there is limited metadata accessible at best, the technology is unable to draw meaning or context from the document. It becomes ‘dark data’, which is unsearchable and doesn’t provide any value to an organisation, whether it is business-critical or not.

Accurate classification of unstructured content has long remained the preserve of technical experts who repeatedly re-adjust working parameters. The new approach to classifying unstructured content uses statistics, linguistics and semantic technologies, and combines them with tools that make classification models easy to set up for employees and processing experts.

By deploying machine learning, the most appropriate classification features can be selected automatically. Unlike with traditional rule-based systems, it is not necessary to specify rule sets or manually ‘train’ and tune models with huge quantities of document types.
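As an illustration of learning from examples instead of writing rules, the sketch below trains a small naive Bayes text classifier in plain Python. Naive Bayes is used here only because it is a simple, well-known technique; it is not a description of how any specific product works, and the training sentences and labels are invented for the example.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Split text into lower-case alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def train(examples):
    """Count words per label from a list of (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocabulary = set()
    for text, label in examples:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocabulary.update(tokens)
    return word_counts, label_counts, vocabulary

def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    tokens = tokenize(text)
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Start from the prior: how common the label is in the training data.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokens:
            # Add-one (Laplace) smoothing keeps unseen words from zeroing the score.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented training examples; a real model would need far more.
EXAMPLES = [
    ("My card was charged twice, please refund me", "complaint"),
    ("The parcel arrived damaged and I want my money back", "complaint"),
    ("Please send me a quote for 200 licences", "sales"),
    ("What is the price of the enterprise plan", "sales"),
]

model = train(EXAMPLES)
print(classify("I was billed two times and would like a refund", model))
```

Nobody wrote a rule saying that ‘refund’ signals a complaint; the association comes from the labelled examples. Adding a new category means supplying examples for it, not authoring new rules.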

Classification therefore plays a part in many areas of the business, including the following:

• Content management: High-performance classification of unstructured content allows organisations to manage large repositories quickly, and enables knowledge workers to efficiently search and locate information critical to their work.

• Client support: Support is a crucial element of any customer-oriented business, where satisfaction and retention are key success drivers. Large companies with worldwide operations, a wide range of products and services, and millions of customers need daily feedback about what works, what doesn’t and where they could do better. Customer support services are the primary way of receiving that feedback. Fast and accurate classification of incoming complaints and requests is a critical first step towards delivering timely solutions to customers’ issues and driving higher levels of customer satisfaction. This is how customer support helps to deliver an outstanding experience and increase customer loyalty.

• Information governance: Granular text- and semantic-based classification enables organisations to keep up with security, compliance and records management requirements. This is especially important in Europe given the impending EU General Data Protection Regulation (GDPR), which will affect any organisation that processes personal data of individuals living within the EU. By setting up category-based document access rights, routing, archiving and search, organisations will support the aim of GDPR to protect all EU citizens from privacy and data breaches, adapting to the increasingly data-driven world that we live in today.

• Data migration: Mergers, reorganisations or even just bringing new IT systems online require fast data migration. This comes with the added challenge of keeping the data protected and controlled, and of avoiding the pitfalls of dark data: it can be useful for compliance, but storing and securing it typically incurs more expense (and sometimes greater risk) than value. These hurdles can be overcome by setting up flexible content-aware rules to filter content repositories during data migration projects.

• E-mail management: Organising e-mails manually is painful, but missing business-critical messages from customers or suppliers is even more so. Metadata (such as ‘to’, ‘from’) is rarely good enough. Using both metadata and content, new semantic-based classification automatically distinguishes the wheat from the chaff.
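The e-mail management point above, using both metadata and content, can be sketched as follows. This is a deliberately simple heuristic, not a semantic classifier: the sender domains, the term list and the thresholds are all hypothetical and would in practice be learned or configured per organisation.

```python
import re

# Hypothetical signals, invented for this example.
KNOWN_SENDER_DOMAINS = {"customer.example", "supplier.example"}
CRITICAL_TERMS = {"contract", "invoice", "complaint", "deadline", "order"}

def email_features(sender, subject, body):
    """Derive one metadata feature and one content feature from an e-mail."""
    domain = sender.rsplit("@", 1)[-1].lower()
    words = set(re.findall(r"[a-z]+", f"{subject} {body}".lower()))
    return {
        "known_sender": domain in KNOWN_SENDER_DOMAINS,  # from metadata
        "critical_terms": len(words & CRITICAL_TERMS),   # from content
    }

def is_business_critical(sender, subject, body):
    """Combine both features: a known sender lowers the bar, but content must agree."""
    features = email_features(sender, subject, body)
    threshold = 1 if features["known_sender"] else 2
    return features["critical_terms"] >= threshold

# Same sender domain, different outcome: the content decides.
print(is_business_critical("anna@customer.example", "Question",
                           "The deadline for delivery has moved"))
print(is_business_critical("news@customer.example", "Newsletter",
                           "Our summer offers are here"))
```

Both messages come from a known domain, so a filter based on the ‘from’ field alone would treat them identically. Only the body text separates the delivery query from the newsletter.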

The classification of unstructured information assets is critical in supporting business objectives and driving value to the enterprise. It is this ‘intelligent classification’ of information that should absolutely be a key consideration for decision-makers. Simply owning big data is secondary to having access to the critical data that will accelerate an individual’s time from discovery to decision within their documents.

Daniela Miltner is product marketing manager at ABBYY. She manages the product lifecycle and go-to-market strategy for ABBYY’s Compreno and intelligent data capture technologies and solutions.