Lexalytics Announces OCR Error Correction Tool

Lexalytics has launched a patent-pending error correction tool for text data from optical character recognition (OCR) systems.

Built in partnership with one of the world’s leading RPA vendors and leveraging Lexalytics’ natural language processing (NLP) platform and proprietary machine learning tools, the company’s OCR error correction system can automatically detect and rectify common mistakes made by OCR, driving word error rates to less than one percent.

This improves the reliability and utility of analysis performed on OCR data down the line and lowers non-compliance risk for the firms that use these tools.
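
For readers unfamiliar with the metric, word error rate is conventionally computed as the word-level edit distance between a trusted reference transcript and the OCR output, divided by the number of words in the reference. The sketch below illustrates that standard definition only; the function name and sample strings are hypothetical, and nothing here describes how Lexalytics measured its own results.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from OCR output
            insertion = curr[j - 1] + 1  # extra word in OCR output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one misread word out of five.
print(word_error_rate("pay the invoice by friday",
                      "pay the inv0ice by friday"))  # 0.2
```

Under this definition, a word error rate below one percent means fewer than one word-level edit per hundred reference words.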

A great deal of business-critical information is contained in images of physical text, such as scanned paper documents or smartphone snapshots of invoices, contracts, newsprint, applications, bills and loans, among other materials.

OCR software converts these images of text into electronic text, making it available for computers to “read” for all of the processing tasks that modern enterprises conduct. However, OCR software often misrecognizes characters and words, which can lead to costly downstream application problems requiring time-intensive, manual correction. 

Lexalytics’ patent-pending OCR error correction solution combines pixel position analysis for character errors with specialized dictionaries built into Lexalytics’ Salience text analytics engine to choose the most likely correction.
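
The general idea of dictionary-backed correction can be sketched in a few lines. This is not Lexalytics’ implementation and does not use the Salience API: a small, hypothetical character confusion table stands in for the pixel position analysis, and a toy frequency dictionary stands in for the specialized dictionaries. All names and values below are assumptions made for illustration.

```python
from itertools import product

# Hypothetical confusion table: a character as emitted by OCR, mapped to
# characters it may have been misread from. A stand-in for pixel analysis.
CONFUSIONS = {
    "0": ["o"],
    "1": ["l", "i"],
    "5": ["s"],
}

# Hypothetical domain dictionary with relative word frequencies.
DICTIONARY = {"invoice": 120, "bill": 80, "loan": 60, "sales": 40}


def candidates(token: str) -> set[str]:
    """Every spelling reachable by swapping confusable characters."""
    options = [[ch] + CONFUSIONS.get(ch, []) for ch in token]
    return {"".join(chars) for chars in product(*options)}


def correct(token: str) -> str:
    """Return the most frequent dictionary word among the candidates."""
    lowered = token.lower()
    if lowered in DICTIONARY:
        return token  # already a known word, leave it alone
    known = [c for c in candidates(lowered) if c in DICTIONARY]
    if not known:
        return token  # no plausible correction, keep the OCR output
    return max(known, key=DICTIONARY.get)


print(correct("inv0ice"))  # invoice
print(correct("b1ll"))     # bill
```

Candidate generation here grows exponentially with the number of confusable characters in a token, so a practical system would need to prune or rank candidates rather than enumerate them all.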

The next stage of development will add contextual language models and machine learning techniques to further improve accuracy.

“While OCR is a rapidly growing market, driven by demand in the banking, insurance and financial services sectors, word-level accuracy errors create major problems for end users and represent a major challenge,” said Jeff Catlin, CEO of Lexalytics.

“We’re excited to bring a fresh approach to the problem and proud to achieve such great accuracy numbers in the tests we’ve run.”

The OCR correction module is available as an add-on component to Lexalytics’ core Salience NLP engine.

http://success.lexalytics.com/ocr