What is Optical Character Recognition and How Does it Work?

When organizations move away from paper processes and introduce digital records management and workflow solutions, a new processing challenge can arise: the information on the paper (or now digital) documents needs to be entered into additional systems to create retrievable records or to drive further actions and workflow steps.

For example, when scanning paper employee files or patient records to introduce a digital archive, key pieces of information from those paper documents need to be attached to the digital versions so that the files are searchable. These searchable fields may include things such as first name / last name, date of birth, employee / patient number, and so on.

From a financial systems perspective, data such as invoice number, total amount, vendor name, PO number, and so on, need to be collected from each invoice in order to properly trigger approvals and payments.

The challenge is how to extract this information from the records.

Relying on manual data entry, where individuals re-type the details from a record into the appropriate systems, is inefficient and results in errors and inaccuracies. Instead, many companies turn to Optical Character Recognition (“OCR”) technologies to capture the data.

So, what is OCR and how does it work?

In a very basic sense, OCR software takes a digital document and separates the light background from the dark areas that are identified as the text. Algorithms are leveraged to analyze the dark areas in order to recognize patterns or features in the pixels to identify text characters.
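The separation step described above can be illustrated with a minimal sketch. It assumes a tiny hand-made grayscale image and a fixed threshold of 128; both are illustrative choices, not how any particular OCR product works. The function names `binarize` and `dark_column_runs` are hypothetical. Real engines use adaptive thresholds and far more sophisticated pattern recognition.

```python
# Toy 8-bit grayscale "scan": 0 is black ink, 255 is white paper.
# The pixel values and the threshold of 128 are illustrative assumptions.
IMAGE = [
    [250, 250, 250, 250, 250, 250, 250],
    [250,  20, 250, 250,  30,  25, 250],
    [250,  15, 250, 250,  40, 250, 250],
    [250,  10, 250, 250,  35,  20, 250],
    [250, 250, 250, 250, 250, 250, 250],
]


def binarize(image, threshold=128):
    """Mark each pixel True if it is dark enough to count as ink."""
    return [[pixel < threshold for pixel in row] for row in image]


def dark_column_runs(mask):
    """Return (start, end) column ranges that contain at least one ink pixel.

    Each run is a crude candidate region for a single character.
    """
    width = len(mask[0])
    has_ink = [any(row[x] for row in mask) for x in range(width)]
    runs, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                    # a dark region begins
        elif not ink and start is not None:
            runs.append((start, x - 1))  # the dark region just ended
            start = None
    if start is not None:
        runs.append((start, width - 1))
    return runs


mask = binarize(IMAGE)
runs = dark_column_runs(mask)
print(runs)
```

Each column range returned is a candidate character region that a recognition step would then try to match against known letter shapes.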

Not only can OCR pick up the critical pieces of information from each document, as described in the examples above, but it can also identify all of the text on a record to make the entire document searchable (allowing users to simply press Ctrl+F to find a specific date or name within a PDF file, for example). This can make finding information in a digital format significantly more efficient than flipping through paper records.

By using OCR technologies to pick up text from a document, organizations can reduce the manual data entry required. Unfortunately, OCR alone is not enough to obtain a truly touch-less and reliable process.

OCR technology has been in use for decades, and its sophistication has advanced to the point where it can pick up handwriting (albeit with less accuracy than machine-printed text) and differentiate between small stray marks on a page and intentional writing. However, the quality of the document affects the accuracy of the text picked up via OCR. For example, if a document originated as a paper record with handwritten text and was subsequently scanned multiple times, so that the contrast is reduced and the text is difficult to read, OCR may have difficulty identifying the text accurately. By contrast, if the original document was created digitally and machine printed, OCR accuracy could be 95% or even greater.

Even if the documents being processed can reach 99% accuracy with OCR software, the 1% of inaccuracies could be critical. For example, if 10 data fields are extracted from every invoice, a 1% error rate per field would mean that roughly one invoice in ten contains an incorrect field, which could be the vendor ID or total invoice amount. Consider an organization relying on this data to trigger payments: a 1% error rate in these circumstances would be unacceptable.
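To make the arithmetic concrete, here is a small sketch. It assumes the accuracy figure applies to each extracted field and that errors on different fields are independent; both are simplifying assumptions, and the function name is hypothetical.

```python
def invoice_error_probability(field_accuracy, fields_per_invoice):
    """Chance that at least one extracted field on an invoice is wrong.

    Assumes every field is read correctly with the same probability and
    that errors are independent of one another.
    """
    return 1 - field_accuracy ** fields_per_invoice


# 99% per-field accuracy, 10 fields per invoice.
p_99 = invoice_error_probability(0.99, 10)

# 99.99% per-field accuracy, same 10 fields.
p_9999 = invoice_error_probability(0.9999, 10)

print(f"99% accuracy:    {p_99:.1%} of invoices have an error")
print(f"99.99% accuracy: {p_9999:.2%} of invoices have an error")
```

Under these assumptions, 99% field accuracy leaves about 9.6% of invoices with at least one wrong field, while 99.99% field accuracy brings that down to about 0.1%.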

While OCR technology is a critical piece of many digital processes, additional measures are required to achieve 99.99% data accuracy. Combining OCR with AI and machine learning, automated master data validations and, in some circumstances, “human-in-the-loop” quality assurance allows a service provider, such as Octacom, to provide clients with reliable and accurate data that needs no manual intervention or correction.
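As an illustration of what a master data validation might look like, the sketch below checks OCR-extracted invoice fields against a list of known vendors and routes anything suspicious to a person. This is a hypothetical example, not a description of Octacom's implementation: the vendor list, field names, and routing labels are all assumptions made for illustration.

```python
from decimal import Decimal, InvalidOperation

# Hypothetical vendor master data; a real system would query an ERP or database.
VENDOR_MASTER = {"V-1001": "Acme Supplies", "V-1002": "Northwind Paper"}


def validate_invoice(fields):
    """Return a list of problems found in OCR-extracted invoice fields."""
    problems = []
    if fields.get("vendor_id") not in VENDOR_MASTER:
        problems.append("unknown vendor_id")
    try:
        total = Decimal(str(fields.get("total", "")))
        if total <= 0:
            problems.append("non-positive total")
    except InvalidOperation:
        # Raised when the text is not a valid number, e.g. "12S0.00".
        problems.append("unreadable total")
    return problems


def route(fields):
    """Send clean invoices onward; send doubtful ones to a human reviewer."""
    return "human_review" if validate_invoice(fields) else "auto_process"


clean = {"vendor_id": "V-1001", "total": "1250.00"}
misread = {"vendor_id": "V-1O01", "total": "1250.00"}  # letter O read for zero

print(route(clean))
print(route(misread))
```

The design idea is that a misread character rarely produces another valid vendor ID by chance, so checking against known values catches many OCR errors before they can trigger a payment.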


Octacom is a SOC 2 Type II audited enterprise software and services company focused on document and data automation solutions, including automated data capture. Founded in 1976, Octacom specializes in accounts payable automation and automated invoice processing, among other digital and automated business process outsourcing services.

If your organization is looking to learn more about our solutions and services, please contact us and we would be glad to help.

 
