CCADC uses two primary forms of document classification technology: Text-Based and Image-Based Classification. Each method of our automatic document classification system has distinct advantages depending on the requirements of the application, which gives our customers a high degree of configuration flexibility.
Text-Based Classification detects one or more keywords in a form to determine its document type. Each keyword can be configured either to match the expected text exactly or to match it with a specified "degree of confidence."
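To make the difference between an exact match and a "degree of confidence" match concrete, here is a minimal sketch in Python. It is an illustration only, not CCADC's implementation or API: the names KeywordRule, keyword_confidence, classify_text and min_confidence are hypothetical, and the similarity score comes from the standard library's difflib, which we assume here as a stand-in for whatever scoring the engine actually uses.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class KeywordRule:
    """Hypothetical rule: one keyword that points to one document type."""
    doc_type: str
    keyword: str
    min_confidence: float = 1.0  # 1.0 means the keyword must match exactly


def keyword_confidence(keyword: str, text: str) -> float:
    """Best similarity (0.0 to 1.0) between the keyword and any window of
    the same number of words in the text. Case is ignored; words are split
    on whitespace only, so attached punctuation lowers the score slightly."""
    kw_words = keyword.lower().split()
    words = text.lower().split()
    n = len(kw_words)
    target = " ".join(kw_words)
    best = 0.0
    for i in range(max(len(words) - n + 1, 0)):
        window = " ".join(words[i:i + n])
        best = max(best, SequenceMatcher(None, target, window).ratio())
    return best


def classify_text(text: str, rules: list[KeywordRule]) -> Optional[str]:
    """Return the document type of the best-scoring rule that meets its own
    threshold, or None when no rule qualifies."""
    best_type, best_score = None, 0.0
    for rule in rules:
        score = keyword_confidence(rule.keyword, text)
        if score >= rule.min_confidence and score > best_score:
            best_type, best_score = rule.doc_type, score
    return best_type


rules = [
    KeywordRule("invoice", "invoice number", min_confidence=0.85),  # tolerant
    KeywordRule("purchase_order", "purchase order"),                # exact
]

# An OCR misread ("Invo1ce") still clears the 0.85 threshold.
print(classify_text("ACME Corp Invo1ce Number 12345", rules))
# The exact rule rejects a similar misread ("0rder").
print(classify_text("Purchase 0rder 77", rules))
```

In this sketch, lowering min_confidence makes a rule more tolerant of OCR errors at the cost of more false matches, which is the trade-off the "degree of confidence" setting describes.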
Image-Based Classification groups a batch of sample images that may vary slightly and assigns the group to a document type or document class. What is the benefit of adding multiple samples to a single document type or class in our data capture system? Adding multiple sample images to the same class can increase the overall confidence of the recognition process. For example, this approach lets you "merge" two very similar templates into one class, such as an invoice that includes "VAT" and another that does not.
These aggregated samples are used to compare each new image introduced to the system against the library of known samples, allowing the CCADC module to automatically assign the proper document type to the incoming image(s). Samples from the automated workflow may be added to the Classification engine automatically, so the CCADC engine continues to learn and refine its confidence levels as new form variations are introduced into the CCADC library.