Organizations offering services online face many challenges related to the volume and quality of data being generated each day – and document-related workflows, where a customer needs to upload images or files to complete a transaction, are among the hardest to get right. Any time a document is used to determine identity or eligibility, the quality and accuracy of the uploaded document have to be unimpeachable; this is no small feat when you depend on end users to deliver a perfect upload every time. Many organizations try to solve this problem by creating labor-intensive, manual review processes that add processing time and degrade the customer experience. But with AI, there’s a better way.

At Nuvalence, we’ve recently helped a large government agency transform its business by moving many traditionally in-person transactions online. The agency receives over 25,000 online customer applications each month, resulting in over 150,000 uploaded documents such as application forms, passports, or bank statements. To manage this with the same level of service and accuracy as an in-person transaction, the agency has had to dedicate a large team to visually inspecting and assessing each document after submission. What this agency could use is a pair of eyes on document quality during the online intake process – a virtual referee that makes immediate, consistent, and unbiased calls to help the applicant get their upload right the first time.

Nuvalence’s Data-Driven Approach

Quality review has helped the agency realize that the convenience of self-service online transactions can come at a cost. Approximately 11% of customer documents are identified as incorrect by agency staff, and 29% of all applications submitted have at least one incorrect document. What follows is a complex, high-touch process that averages two business days per correction request, with 10% of applications requiring multiple requests and, in the worst-case scenario, a 5% abandonment rate. Overall, these correction requests are operationally inefficient and contribute to a negative customer experience.

With over 16,000 documents being flagged as incorrect after manual review each month, we dug into the data to learn more. Our analysis showed that three use cases were responsible for the majority of corrections:

  1. Quality (28%): Uploaded documents were of poor quality and simply could not be inspected by staff.
  2. Classification (33%): The wrong document type was submitted, such as an unexpected version of a form.
  3. Validation (24%): The document didn’t meet all of the agency’s specific requirements, such as an expired passport or license.

To solve these problems, we knew we needed a solution to help automate the agency’s document review process. This would require a tool that could understand and evaluate these unstructured documents. For this, we turned to a leading solution in this space, Google Cloud’s Document AI (DocAI).

We challenged a couple of Nuvalence’s experienced backend engineers to see if DocAI could be used to proactively identify these document issues within the digital customer journey before human review. Here’s what we found.

Use Case 1: Quality (28%)

Objective

Leverage DocAI to proactively identify document quality issues that cause 28% of correction requests today.

Key Results

  • Able to judge document quality with 99.3% accuracy.
  • Estimated reduction of 3,000 document correction requests per month.
  • Estimated accuracy improvement, with 4,500 fewer poor-quality documents accepted each month.

When it came to quality, our goal was to determine if the submitted document was legible, so that it could be effectively verified. Today, 28% of documents rejected by the agency’s manual review are due to quality issues. Because many customers submit pictures of their documents via mobile devices, these poor-quality documents were often out of focus, had glare that made important information illegible, or were even completely blank images.

Now, we wanted to see if AI could detect these same problems in real-time. To test this out, we took a sample set of approximately 10,000 documents that had been submitted online. Of those, approximately 3% had been rejected due to quality issues during visual inspection by agency staff.

For quality processing, we were able to leverage a general DocAI processor that provided us with document quality scores (0-100%). These scores represented DocAI’s ability to read the text of the document with optical character recognition (OCR), with no training required. Within a matter of hours, we were able to start processing our documents and recording quality scores. After some tuning of our thresholds, we found that any document scoring above 1% was fully legible to human reviewers.

Example: Poor-quality document with a quality score of 0.6%

Using this value, we were able to programmatically determine if documents were legible or not, with an impressive 99.3% accuracy. Furthermore, we were able to make these determinations in just a few seconds, meaning we could provide real-time feedback to applicants as they upload documents during their digital journey.
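For readers curious what this looks like in practice, here’s a minimal sketch of the kind of check we’re describing, using the Python client for Document AI. The project, processor ID, and legibility threshold are illustrative placeholders rather than the agency’s actual configuration.

```python
# Minimal sketch: score an uploaded image with a Document Quality processor
# and flag it if it falls below the legibility threshold.
# IDs and the threshold here are illustrative only.
from google.cloud import documentai

PROJECT_ID = "my-project"            # assumption: your GCP project
LOCATION = "us"                      # assumption: processor region
PROCESSOR_ID = "quality-processor"   # assumption: a Document Quality processor
LEGIBILITY_THRESHOLD = 0.01          # scores above ~1% proved legible in our tests

client = documentai.DocumentProcessorServiceClient()

def quality_score(image_bytes: bytes, mime_type: str = "image/jpeg") -> float:
    """Return the overall quality score (0.0-1.0) reported by the processor."""
    name = client.processor_path(PROJECT_ID, LOCATION, PROCESSOR_ID)
    response = client.process_document(
        request=documentai.ProcessRequest(
            name=name,
            raw_document=documentai.RawDocument(content=image_bytes, mime_type=mime_type),
        )
    )
    # Assumption: the quality processor surfaces its overall score as a
    # "quality_score" entity, with the score carried in the confidence field.
    for entity in response.document.entities:
        if entity.type_ == "quality_score":
            return entity.confidence
    return 0.0

def is_legible(image_bytes: bytes) -> bool:
    """True if the upload is likely legible; otherwise prompt for a re-upload."""
    return quality_score(image_bytes) > LEGIBILITY_THRESHOLD
```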

With the ability to analyze the quality of documents in real-time, we can prompt customers to make corrections immediately, preventing many of these poor-quality documents from ever reaching agency staff. Based on the current volume of poor-quality documents, our DocAI accuracy results, and the estimated number of customers who will correct their uploads before submitting, we predict this process alone could reduce approximately 3,000 document correction requests per month. This means more efficient staff, a better customer experience, and faster overall processing times.

While the accuracy was beyond our initial expectations, we did expect to be successful in judging document quality, and knew this could result in enormous efficiencies for the agency staff. In this case, what was somewhat surprising to us was that the AI could improve upon the accuracy of human reviews. While agency staff may get fatigued, make poor judgment calls, and mistakenly approve poor-quality documents, AI can consistently make accurate, unbiased decisions and do it at scale.

During our analysis, we manually inspected the documents where DocAI and the agency’s review disagreed on the quality of a document. What we found was that 81% of the time, it was DocAI that was correct. Thousands of documents were being approved each month that, upon closer inspection, could not be accurately verified due to poor quality. This meant that in addition to efficiency gains, the accuracy of the review process could be significantly improved. By preventing many of these documents from being submitted, and by bringing extra attention to them during the review process, we estimated up to 4,500 fewer poor-quality documents would be accepted each month with our DocAI analysis in place.

Use Case 2: Classification (33%)

Objective

Reduce the 33% of document correction requests that are caused by incorrect documents being submitted, with a custom DocAI model to classify documents and provide real-time feedback.

Key Results

  • Uptraining boosted the accuracy of our initial classification model from 95% to 98%.
  • Estimated reduction of 3,500 document correction requests per month.
  • Estimated accuracy improvement, with 8,000 fewer incorrect documents accepted each month.

For our classification use case, we needed to understand whether DocAI could recognize scenarios where a customer uploaded the wrong document. For example, if the application required the customer to upload their U.S. driver license, but they uploaded a municipal ID card instead, processing would be placed on hold until the customer uploaded the correct document. This is a common problem, with the agency currently initiating over 5,000 of these correction requests each month. 

Since DocAI does not offer a general, pretrained processor for classifying documents, we quickly realized that we would need to build and train a custom machine learning (ML) model. To do this, we turned to the Custom Document Classifier (CDC) processor, a new model type offered with Google Cloud’s Document AI Workbench.

Here’s how we approached it.

  1. Identify our sample set: We used 4,000 documents across 10 of the most common document types for the agency.
  2. Identify training documents: In order to train the model to identify our documents, we identified 50 high-quality documents of each type. We also included 50 other documents that would be used to train as our “unknown” type.
  3. Create our model and label our documents: After creating a new model, we added our training documents, labeling each with the correct, expected type.
  4. Train our model: After completing the labeling, this was as simple as kicking off the training and waiting for it to complete.
  5. Test our model: With our trained model, we processed the rest of our sample set, recording the document type our model assigned.

Our initial model yielded promising results, with approximately 95% of documents in our training set correctly classified, and an average evaluation time of less than 5 seconds. As expected, fixed layout documents (e.g. passports, ID cards) did better than flexible layout documents (e.g. bank statements, utility bills).
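To make the testing step concrete, here’s a rough sketch of how a trained CDC processor can be queried for a new upload and its prediction compared against the document type the application expects. The processor ID, label names, and confidence floor below are illustrative assumptions, not the agency’s configuration.

```python
# Minimal sketch: classify an upload with a trained Custom Document Classifier
# (CDC) processor and compare the result to the type the application expects.
# Processor ID, label names, and the confidence floor are illustrative.
from google.cloud import documentai

PROJECT_ID = "my-project"
LOCATION = "us"
CDC_PROCESSOR_ID = "cdc-processor"   # assumption: a trained CDC processor
MIN_CONFIDENCE = 0.7                 # below this, treat the prediction as "unknown"

client = documentai.DocumentProcessorServiceClient()

def classify(document_bytes: bytes, mime_type: str) -> tuple[str, float]:
    """Return (predicted_label, confidence) for an uploaded document."""
    name = client.processor_path(PROJECT_ID, LOCATION, CDC_PROCESSOR_ID)
    response = client.process_document(
        request=documentai.ProcessRequest(
            name=name,
            raw_document=documentai.RawDocument(content=document_bytes, mime_type=mime_type),
        )
    )
    # A CDC processor returns one entity per candidate label; keep the best one.
    best = max(response.document.entities, key=lambda e: e.confidence, default=None)
    if best is None or best.confidence < MIN_CONFIDENCE:
        return "unknown", 0.0
    return best.type_, best.confidence

def matches_expected(document_bytes: bytes, mime_type: str, expected_type: str) -> bool:
    """True if the upload looks like the document the application step asked for."""
    predicted, _ = classify(document_bytes, mime_type)
    return predicted == expected_type  # e.g. expected_type = "us_drivers_license"
```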

We also found patterns with some of the misclassified documents. For example, one of the forms that we classified was offered in both English and Spanish. Because the English-language version appeared more frequently in the training set, it was classified correctly more often, while the less-frequently appearing Spanish-language version was often classified as “unknown.” This insight opened up an opportunity to take advantage of the adaptability of AI and machine learning, by “uptraining” our model to learn about these different variations.

To uptrain, we simply identified an additional training set of documents that we wanted to use to correct our model. In this case, we found 10-15 high-quality examples of the Spanish-language form, labeled them correctly, and added those to our existing training set. We then retrained the model and reprocessed our sample set. The result? We quickly saw our accuracy for this specific form improve from 95% to over 99%. We repeated this process three additional times for different document types and other common misses. Each retraining process took us only a few hours to complete, and in the end, our overall document classification accuracy improved from 95% to 98%. 

It’s important to note that for this initial proof of concept, we had a bias for speed and used only 50 training documents of each type. DocAI recommends up to 400 training documents for optimal results, which we believe could achieve even higher accuracy.

Relationship between training set size and F1 score (model accuracy)

Based on our accuracy results, the projected ROI for the use case was also significant. With our custom model providing real-time feedback to customers, we estimated that we could reduce correction requests due to incorrect documents by over 3,500 per month. 

We also saw additional opportunities to help agency staff improve their document classification decisions, by integrating the DocAI classification model into their back-office review workflow as well. We believe surfacing these DocAI insights can help prevent an additional 8,000 incorrect documents from being approved each month. By ensuring that documents are correctly classified for each application, the agency will be able to improve overall acceptance rates.

Use Case 3: Validation (24%)

Objective

Use DocAI data extraction to validate submitted documents against agency business rules, which cause 24% of document correction requests today.

Key Results

  • Passport data was extracted at 97% accuracy, with 6% of documents identified as having a potential business rule violation.
  • Bank Statement data was extracted at 90% accuracy, with 14% of documents identified as having a potential business rule violation.
  • Data from a complex agency form was extracted at 83% accuracy, with 15% of documents identified as being incomplete.

We saved our most complicated use case for last. Validation is a complex process requiring the 1) extraction of data from the document itself and 2) comparison of data against agency business rules. There are over 50 document types, each with its own configured business rules. We chose a sample set of 3 common documents of varying complexity:

  1. U.S. passports
  2. Bank statements
  3. A custom agency form

First, we looked at U.S. passports, which were relatively easy to process. All passports have a fixed layout, and DocAI offers a pretrained parser for passports. We were able to quickly process approximately 300 passport documents. We compared the extracted data for select fields (like First Name, Last Name, Date of Birth, and Expiration Date) to the actual data in the document. As expected, accuracy of the extracted text was the highest for this document. Over 97% of passport data was an exact match.
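As a reference point, here’s a minimal sketch of how extracting these fields with a pretrained parser might look in the Python client. The processor ID and entity type names (e.g. given_names, expiration_date) are assumptions based on typical identity-parser output; the exact schema should be confirmed against the processor you use.

```python
# Minimal sketch: extract select fields from a passport using a pretrained
# DocAI identity/passport parser. Processor ID and entity type names are
# assumptions for illustration; verify them against your processor's schema.
from google.cloud import documentai

PROJECT_ID = "my-project"
LOCATION = "us"
PASSPORT_PROCESSOR_ID = "passport-parser"  # assumption: a pretrained passport parser

FIELDS_OF_INTEREST = {"given_names", "family_name", "date_of_birth", "expiration_date"}

def extract_passport_fields(pdf_bytes: bytes) -> dict[str, str]:
    """Return a field-name -> extracted-text mapping for the fields we validate."""
    client = documentai.DocumentProcessorServiceClient()
    name = client.processor_path(PROJECT_ID, LOCATION, PASSPORT_PROCESSOR_ID)
    response = client.process_document(
        request=documentai.ProcessRequest(
            name=name,
            raw_document=documentai.RawDocument(content=pdf_bytes, mime_type="application/pdf"),
        )
    )
    return {
        entity.type_: entity.mention_text
        for entity in response.document.entities
        if entity.type_ in FIELDS_OF_INTEREST
    }
```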

Next, we looked at a slightly more complicated scenario: bank statements. Unlike passports, these documents came in a variety of formats with a flexible layout. Luckily, DocAI also had a pretrained parser for bank statements. Out of 300 sample documents, we saw slightly lower accuracy, with about 90% of bank statement data matching the actual values. 

The final sample document was the most challenging to implement: a custom, agency-produced form that does not have a pretrained parser in DocAI. Not only did we need to build and train our own Custom Document Extractor (CDE) model for this form, but we needed to accommodate both typed and handwritten text in the fixed layout of the document. The form also had a compressed, dense layout that we suspected would affect the accuracy of the OCR.

To train a custom model, we loaded our training documents and labeled each field that contained data we wanted to extract. Since this was a dense form with lots of collected information, we identified 19 fields to label. Multiply that by about 100 documents in our training set, and we became intimately familiar with these forms and the DocAI Workbench tool we used to do the training.

The complexities of this document meant that training would be less straightforward. However, we were able to take advantage of the adaptability of AI and machine learning by iterating with our model. In areas where our initial models struggled, we could go back and “uptrain” with additional labeled documents to improve our accuracy.

Our results for the form were slightly mixed, but still successful, with an overall accuracy of 83% of data extracted perfectly. Some basic fields, like Address, saw 90-95% accuracy. As expected, other fields were more difficult to extract with high confidence. For example, a compressed form layout and variations in how users entered text made extracting the Date of Birth more challenging.

Labels, digit separators, and drop-down arrows in the Date of Birth field made it difficult to accurately extract just the user-entered text.

Once we successfully processed each document, the next step was to validate it against the agency’s business rules. Since each document type has different requirements, we had to model each of these rules and then evaluate them against the data we extracted from the documents. We found that ~6% of passports had a violation, either because they were expired or because the name did not match the application. About 5% of bank statements were invalid for not meeting the date or address criteria, and another 9% did not match the name of the customer on the application. For the agency form, we checked that all required fields were provided, a check that a surprising 15% of forms failed. For these 3 document types alone, proactively identifying these violations could prevent over 5,000 invalid documents from being submitted per month.
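To illustrate how the rules layer can sit on top of the extracted data, here’s a simplified sketch, assuming extraction yields a flat field dictionary like the one in the passport example above. The rules and field names are stand-ins for the agency’s actual, more extensive configuration.

```python
# Minimal sketch: model a couple of per-document business rules and evaluate
# them against extracted fields. Rules and field names are simplified
# stand-ins, not the agency's actual configuration.
from datetime import date, datetime

def parse_date(value: str) -> date | None:
    """Tolerate a few common date formats seen in extracted text."""
    for fmt in ("%d %b %Y", "%m/%d/%Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    return None

def passport_violations(fields: dict[str, str], applicant_name: str) -> list[str]:
    """Return human-readable violations for a passport, or an empty list if valid."""
    violations = []

    expiry = parse_date(fields.get("expiration_date", ""))
    if expiry is None or expiry < date.today():
        violations.append("Passport is expired or the expiration date is unreadable.")

    extracted_name = f'{fields.get("given_names", "")} {fields.get("family_name", "")}'.strip()
    if extracted_name.lower() != applicant_name.lower():
        violations.append("Name on the passport does not match the application.")

    return violations

# Usage: surface violations to the applicant before the document is submitted, e.g.
# issues = passport_violations(extract_passport_fields(upload), applicant_name="Jane Doe")
```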

Conclusion

With the rapid advancement of AI/ML technology over the past few years, tools like Google Cloud’s DocAI can now offer massive business value to organizations struggling with manual document processing. For our client, we estimate these tools could reduce their document correction requests by up to 50% while at the same time improving the overall accuracy of their reviews. That means:

  • More efficient staff, with fewer applications and documents to review each day
  • Faster overall processing time by reducing additional cycles to correct documents
  • Fewer abandoned transactions as a result of a correction request
  • Better customer experience, with improved real-time feedback for users and more applications being approved on the first submission

While we do not see AI completely eliminating the need for human review, the potential efficiency and accuracy gains were enough to have our client asking if we could turn it on for them tomorrow. Based on our experience with these AI tools, we expect to be able to have the first use cases live in a matter of weeks, processing thousands of documents per day. From there, we expect the other two use cases to be fast-follows, with additional ideas and iterations already being discussed.

An Adaptable Approach

While our 3 use cases were the highest value for our client, there are virtually limitless applications for a tool like DocAI when it comes to processing unstructured documents. We’ve shown you how AI can help government agencies with digital transformation. But this concept has applications for any business with a document-related workflow, like:

  • Reducing delays in medical treatment by pre-analyzing and routing paperwork between providers, insurance companies, and employers.
  • Improving fraud detection in benefit claims by analyzing key data elements extracted from documents and identifying patterns, such as reused account numbers. 
  • Expediting employee reimbursements by parsing data on uploaded receipts to create auto-generated expense reports.
  • Improving overall application acceptance rates by reducing errors during manual review.

If you’d like to learn more about Google Cloud’s Document AI (DocAI), check out the Document AI website.

In our Practical Guide, we share insights and techniques that you can use to plan a Minimum Viable Product (MVP) of your AI Document Processing System (DPS).


Editor’s Note: This post was originally published in March 2023 and has been updated to reflect our usage of Google’s new Custom Document Classifier (CDC) processor.