TopSoft SmartCapture is a multi-purpose system designed for processing paper document flows of any size, from small to large. It handles documents of different types and any complexity from various sources. SmartCapture is ready to process and classify forms, invoices, agreements, contracts, letters, and similar documents.
SmartCapture is a highly configurable, intelligent capture solution with a service-oriented architecture (SOA). This architecture lets the user leverage the existing capture services to build a variety of workflows within the corporate enterprise, increasing productivity and throughput.

The extensible SmartCapture architecture allows different kinds of forms, received from different inputs, to be processed in the same project using the same template, rule set, and export. Moreover, several companies use SmartCapture for joint processing of OCR forms and electronic forms with pre-printed 2D barcodes in the same streams.
A rules export system for SmartCapture is being designed. It will export the required rules for use in web servers and client-side form-filling applications (for example, browsers and Adobe Acrobat Reader). This is easy to achieve because SmartCapture defines server rules in the standard JScript language, which is supported by most browsers and other form-filling applications.
SmartCapture can also be configured to work with a unified form and rules design system that automatically generates form and rule definitions and graphical representations for both paper (OCR) and electronic forms, and exports rules for the other proposed systems. This system will be able to modify individual aspects (form image, field positions, etc.) for specific form designs (electronic, paper, web) while preserving the business logic, the generic form rules, and the ability to process all kinds of forms in one stream.
SmartCapture can be configured to connect directly to some networked scanner devices; however, most network connected devices utilize a “hot folder” system where each user has a dedicated folder for their scanned images. The recognition software (SmartCapture) polls these folders and collects any images for processing. The processed images are then exported to a predetermined location.

SmartCapture contains a number of full-function modules, which can be divided into four main system components:

  1. Input;
  2. Processing;
  3. Administration;
  4. Output.

INPUT
SmartCapture can simultaneously process documents of different types received from various sources (simultaneous processing of different projects). The Input subsystem imports paper documents into the system through scanning devices (scanners or multifunction devices) that support the TWAIN scanning protocol. Document images can be added manually from a folder or imported automatically using Hot Folder settings. The supported input image formats include PDF, BMP, PNG, JPEG, and TIFF.
Import operations can be simplified and automated by a set of import profiles with pre-defined settings.
The SmartCapture Input component includes the following modules:
1) SmartScan (application for working with TWAIN scanners). The SmartScan station scans documents, adds already scanned images, preprocesses images, and eliminates scan defects. It also lets the user set, in online mode, the required attributes for each document or batch of documents.
2) Hot folders (application for working with multifunction devices).
3) FTP, email, web (TCP/IP).
4) SmartPage, PDF, and other document format input.

PROCESSING
The Processing subsystem includes:

  • Recognition (full-text OCR, flexi-form, fixed-form ICR, barcodes, patchcodes and other separators).

Processing modules.

1) Machine processing (document forming, indexing, key fields search):

  • The Fine-/Form-OCR station extracts data from all documents entering the system, including data from patch codes and barcodes.

2) Operator processing (character verification, error correction, quality control):

  • The Verification station is generally used for projects that require confirming or editing large sets of symbols of the same type (forms, questionnaires, documents containing handprinted symbols, etc.).
    • Group Verification unites uncertainly recognized symbols that have the same value, extracted from different documents, into groups that the user can confirm easily and accurately.
    • Context Verification lets the user read and confirm uncertainly recognized symbols by viewing them in the context of the document. This is commonly required when documents are of low quality (e.g., exposed areas of text or low print quality).
  • Error Correction station. Rules checking is performed on all documents after the recognition process. If rules checking finds a discrepancy, the document is presented to the operator on the Error Correction station. The operator corrects any errors in the document and confirms or edits the uncertainly recognized symbols. If the Verification station is used, uncertainly recognized symbols are required to be presented to the Error Correction station.
  • The Quality Control station allows online monitoring and work management of all stations, modules, and system users. It also gives access to all information about documents currently being processed.
  • Additional Control station. Its main purpose is to allow additional scrutiny, by a responsible operator, of documents considered important or sensitive (e.g., large payments, urgent documents).
  • External Modules Host is an additional module that gets documents from various sources, checks the logical control rules, and exports documents. It is efficient when data centers and remote sub-units use different database access parameters, and it reduces the load on the Administrator station.
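
The Group Verification step described in the list above can be illustrated with a short sketch. It is not SmartCapture code: the `UncertainSymbol` record, the confidence score, and the 0.9 threshold are assumptions made for the example. It only shows the idea of collecting uncertainly recognized symbols that share a value so an operator can confirm them as one group.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UncertainSymbol:
    """Hypothetical record for one recognized symbol (not a SmartCapture type)."""
    document_id: str
    field: str
    value: str         # best-guess recognized value
    confidence: float  # assumed recognizer score in the range 0.0 to 1.0


def group_for_verification(symbols, threshold=0.9):
    """Group symbols recognized below `threshold` by their recognized value."""
    groups = defaultdict(list)
    for symbol in symbols:
        if symbol.confidence < threshold:
            groups[symbol.value].append(symbol)
    return dict(groups)


symbols = [
    UncertainSymbol("doc-1", "amount", "7", 0.62),
    UncertainSymbol("doc-2", "zip", "7", 0.71),
    UncertainSymbol("doc-2", "zip", "1", 0.55),
    UncertainSymbol("doc-3", "amount", "7", 0.98),  # confident, not grouped
]
groups = group_for_verification(symbols)
```

Here the two uncertain "7" symbols from different documents land in one group, so a single operator decision can confirm both.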

ADMINISTRATION
  • The Administrator station is the server and the main part of SmartCapture. It handles:
    • Projects
    • User management
    • Document forming
    • Rules checking
    • Export of form results
    • Statistics reporting
  • The Supervisor module allows online management of:
    • Users
    • Groups of users
    • Access rights for users or groups of users to project(s)
    • It also provides the ability to implement a security policy.
  • Logger generates analytical reports and statistical data for given criteria, including:
    • Number of processed documents
    • Number of edited fields and errors in each field and document
    • Number of documents being processed by each user
    • Statistics on the work of each system module.

Logger can display a document's lifecycle within the system, including the corrected fields, the operator who made the corrections, the names of the modified fields, and the original and modified field values.


OUTPUT
  • External Modules Host is an additional module that gets documents from various sources, checks the logical control rules, and exports documents.
  • Built-in Export Module of the Administrator station.

These modules allow:

  • export to almost any image format;
  • export to many document formats (XML, PDF, Word, Excel);
  • export to databases and document management systems;
  • export to WebSphere MQ, SAP, Lotus Notes, FTP, and e-mail services.

By combining different modules, the system administrator can quickly and easily create new services (projects), for example:

1. Searchable PDF conversion service:

  • Images and PDF input module
  • Full text recognition module
  • PDF export module.

2. Correspondence processing service:

  • TWAIN scanning module (station)
  • Flexi-templates recognition module
  • Rules checking and error correction module
  • XML export module
  • TIF export module. 

3. Forms processing service:

  • TWAIN scanning module
  • Forms recognition module
  • Verification module
  • Rules checking and Error correction module
  • WebSphere MQ export module.

4. SmartPage service:

  • SmartPage client application
  • SmartPage interfacing module (with PDF import ability)
  • Full-text OCR module
  • PDF/Word/Excel export module.

5. Recognition Server service:

  • XML tickets input module
  • Full text recognition module
  • Verification module (optional)
  • Word/Excel/etc export module.

 

These same modules and processes allow quick and easy implementation of any additional business workflows and processes. To create a new Recognition Service, it is usually enough to select some of the available SmartCapture modules, combine and configure them in one project, and install it into the production system. This can be done without stopping or interrupting other SmartCapture projects and services.
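
The idea of building a service by combining modules can be sketched as a simple pipeline. This is only an illustration, not SmartCapture's project format or its Open API: the module functions, the dictionary-based document, and `build_service` are all hypothetical names, and the "recognition" step is a stand-in that does no real OCR. The example mirrors the searchable PDF conversion service listed above (input, full-text recognition, PDF export).

```python
from typing import Callable, Dict, List

Document = Dict[str, str]
Module = Callable[[Document], Document]


def image_input(doc: Document) -> Document:
    # Stand-in for an image/PDF input module.
    return {**doc, "image": f"image-of-{doc['source']}"}


def full_text_recognition(doc: Document) -> Document:
    # Stand-in for a full-text OCR module (no real recognition happens here).
    return {**doc, "text": f"text-from-{doc['image']}"}


def pdf_export(doc: Document) -> Document:
    # Stand-in for a PDF export module.
    return {**doc, "output": f"{doc['source']}.searchable.pdf"}


def build_service(modules: List[Module]) -> Module:
    """Combine modules into one service that runs them in order."""
    def run(doc: Document) -> Document:
        for module in modules:
            doc = module(doc)
        return doc
    return run


searchable_pdf_service = build_service(
    [image_input, full_text_recognition, pdf_export]
)
result = searchable_pdf_service({"source": "scan-001"})
```

A different service would reuse the same building blocks in another combination, for example by swapping the export step.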

In addition, every subsystem supports custom modules written by TopSoft staff or by customers themselves using the Open API, which further extends SmartCapture's capabilities.

Simultaneous processing of different document types

SmartCapture can simultaneously process different types of documents through a single entry point, so there is no need to presort documents: each type of form is automatically processed according to its own rules. Single-page and multipage, structured and unstructured documents or forms can be processed in one stream.
 
In successfully implemented installations, the System currently processes more than 260 types of forms simultaneously in one stream, and the number of forms keeps growing. To add a new form, you can use the built-in template editor, which lets the system meet the needs of a particular enterprise. Creating a new form in the editor is fast and easy: it is enough to define the location of the fields to be recognized and the control rules to apply to them, and the new form is ready for processing. Adding a new form does not require a system reset and does not interrupt the processing of current documents.
 
The System's built-in capabilities thus provide a centralized facility for processing all paper-based documents.

 

Scalability

Additional stations can be added to the system at any time without interrupting existing document processing. For example, scanning can start with documents of one type in one organizational department, and the system can then be extended quickly and easily by adding new document types and attaching new departments. System response and document processing speed are determined only by the number of recognition stations in use. Launching several processes simultaneously makes efficient use of multicore and multiprocessor systems.
SmartCapture automatically applies data control rules during document processing. The most commonly used controls include checks on values, dates, sums, etc. If necessary, the system can verify the correctness of received data using external rules developed to the customer's requirements. Information from attached dictionaries and reference books, databases, libraries, and other external applications can also be used in the checking procedure. Together, these checks are designed to ensure that the key information is 100% correct.
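
The kinds of data control mentioned above (value, date, and sum checks) can be illustrated with a minimal sketch. SmartCapture's own rules are defined in its built-in script language; the Python functions, field names, and sample invoice below are assumptions made purely for the example.

```python
from datetime import date
from decimal import Decimal


def check_required(doc, field):
    """Value check: the field must be present and non-empty."""
    return [] if str(doc.get(field, "")).strip() else [f"{field}: value is missing"]


def check_date_not_future(doc, field, today):
    """Date check: the field must be a valid ISO date that is not in the future."""
    try:
        value = date.fromisoformat(doc[field])
    except (KeyError, ValueError):
        return [f"{field}: not a valid ISO date"]
    return [] if value <= today else [f"{field}: date is in the future"]


def check_sum(doc, items_field, total_field):
    """Sum check: the line amounts must add up to the stated total."""
    items = sum((Decimal(x) for x in doc[items_field]), Decimal("0"))
    total = Decimal(doc[total_field])
    return [] if items == total else [f"{total_field}: expected {items}, found {total}"]


def run_checks(doc, today):
    errors = []
    errors += check_required(doc, "invoice_number")
    errors += check_date_not_future(doc, "invoice_date", today)
    errors += check_sum(doc, "line_amounts", "total")
    return errors


# Hypothetical recognized invoice with a misread total.
invoice = {
    "invoice_number": "A-17",
    "invoice_date": "2024-03-01",
    "line_amounts": ["10.50", "4.50"],
    "total": "16.00",
}
errors = run_checks(invoice, today=date(2024, 3, 15))
# errors == ["total: expected 15.00, found 16.00"]
```

A document with a non-empty error list is the kind of document that would be routed to an operator for correction.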

 

Ease of scanning

Paper-based documents can enter the System for processing from any TWAIN-compatible scanning device. The device's productivity is not a limit: the System easily handles image scanning and preprocessing at scan rates of more than 100 pages/min. Scanning is as simple as possible: press the “Scan” button, and the System automatically scans the documents and sends them for processing.
SmartCapture supports server scan settings: the system administrator configures a scanner once, and all SmartScan stations receive the scan settings. Besides simplifying the configuration and management of all the company's scanners, server scan profiles help avoid problems with incorrect scan settings specified by operators, which can degrade document processing.

 

Project creation

SmartCapture includes all the capabilities required to create document processing projects, from defining document sources to export and interaction with external information systems. A Project Creation Wizard makes creating your own projects faster and easier.
The built-in template editor allows fast and easy creation of logical control rules for documents of any complexity. The editor can connect reference sources and lets the System interact with any external software. It also defines a document's structure, separation rules, recognition rules for labels and 2D barcodes, etc.
Note that document processing does not have to be stopped, and the processing system does not have to be restarted, at any stage of creating, debugging, or launching a project.

 

Receiving documents from various sources

SmartCapture can receive documents from several sources. Thanks to the System's flexibility, these sources can be simple (files received from hard disk folders) or complex (for example, a page received from a website is analyzed, and only those pages that contain documents are downloaded and processed).
The most commonly used source of document images is SmartScan. Besides scanning documents, it has built-in functions for image processing, image quality improvement, and scan defect elimination, and it lets the user attach additional data to images and batches for further processing.

Hard disk or network drive folders can be used as image sources. The ability to receive images from folders allows SmartCapture to work with MFDs (multifunction devices). SmartCapture can not only check a folder for documents at specified time intervals but also pick up documents as soon as new files are added to the folder. This kind of interaction with the file system increases overall system performance. A filter can also be specified for incoming files, for example so that only files with a certain extension or a specific name are processed.
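
A hot folder check with a file filter can be sketched as follows. This is a generic illustration rather than SmartCapture's implementation: the `scan_hot_folder` function, the wildcard patterns, and the file names are assumptions. A scheduler would call the function at the configured interval; the `seen` set keeps already collected files from being picked up twice.

```python
import fnmatch
import tempfile
from pathlib import Path


def scan_hot_folder(folder, patterns, seen):
    """Return names of new files in `folder` that match any wildcard pattern."""
    new_files = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file() or path.name in seen:
            continue
        # Compare in lower case so "*.pdf" also matches "FILE.PDF".
        if any(fnmatch.fnmatch(path.name.lower(), p) for p in patterns):
            seen.add(path.name)
            new_files.append(path.name)
    return new_files


# Demonstration with a temporary folder standing in for a user's hot folder.
with tempfile.TemporaryDirectory() as hot_folder:
    for name in ("scan_001.tif", "notes.txt", "scan_002.PDF"):
        (Path(hot_folder) / name).write_bytes(b"")
    seen = set()
    first_pass = scan_hot_folder(hot_folder, ["*.tif", "*.pdf"], seen)
    second_pass = scan_hot_folder(hot_folder, ["*.tif", "*.pdf"], seen)
```

The first pass collects the two image files and skips `notes.txt`; the second pass finds nothing new.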

Documents can also be imported from many other sources, such as FTP, HTTP sites, and POP3 and Exchange servers. Other import sources include external information systems, databases, and non-standard sources accessed via external libraries (DLLs). Documents can also be sourced from SmartPage and SmartPDF.

 

Wide export capabilities

SmartCapture supports export of any complexity, according to the company's requirements. The System provides various export capabilities:
• the most commonly used image formats: TIFF, JPEG, PDF, PDF/A, BMP, PNG;
• a wide range of data formats: TXT, DBF, CSV, XML;
• export to databases and document management systems;
• exporters of any complexity, created using the built-in script language;
• export to WebSphere MQ, SAP, Lotus Notes, FTP, and e-mail services.

Each project can use several exporters at once. An export module can be created with built-in tools that allow fast and easy configuration of how processing results are transferred to any external information system.

For example, consider a project that processes incoming correspondence via WebSphere MQ: WebSphere MQ sends SmartCapture a command to process a document received by e-mail. SmartCapture analyzes the command and performs the processing according to the given instructions. The results are returned through the message queue in XML format.

2D barcodes – fast and accurate data input
Using 2D barcode technology for document processing greatly reduces the input time for paper documents and minimizes errors.
When documents with 2D barcodes are processed, both barcode recognition and text recognition are performed on the OCR stations. SmartCapture's approach to 2D barcode recognition differs significantly from other commercial solutions: during document processing, SmartCapture recognizes and compares both the barcode data and the text data printed in the document. This method provides the maximum level of recognition and virtually 100% accuracy. It also makes it possible to detect data that a client has encoded into the barcode incorrectly.
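
The comparison of barcode data with printed text can be pictured with a small sketch. It assumes both sources have already been reduced to field dictionaries; the `cross_check` and `normalize` functions and the sample fields are hypothetical and do not represent SmartCapture's internal logic.

```python
def normalize(value):
    """Collapse whitespace and ignore case so formatting differences do not count."""
    return None if value is None else " ".join(str(value).split()).upper()


def cross_check(barcode_fields, ocr_fields):
    """Return fields whose barcode value and OCR value disagree."""
    mismatches = {}
    for name in sorted(set(barcode_fields) | set(ocr_fields)):
        from_barcode = barcode_fields.get(name)
        from_ocr = ocr_fields.get(name)
        if normalize(from_barcode) != normalize(from_ocr):
            mismatches[name] = (from_barcode, from_ocr)
    return mismatches


# Hypothetical values decoded from the barcode and read from the printed text.
barcode = {"name": "John Smith", "amount": "1500.00"}
ocr = {"name": "JOHN  SMITH", "amount": "1580.00"}
mismatches = cross_check(barcode, ocr)
# mismatches == {"amount": ("1500.00", "1580.00")}
```

Fields that agree can be accepted automatically, while a mismatch such as the amount above points either to a recognition error or to data that was encoded into the barcode incorrectly.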
Documents with 2D barcodes can be produced either with a virtual printer, which is supplied free of charge in the kit, or by downloading a form in PDF format from a website and using Adobe Reader to create the 2D barcode.

 

Organization of data-centers for data processing
SmartCapture allows Administrator stations to be combined into one cluster. This arrangement automatically balances workloads among the servers in the system, which increases total productivity by reducing idle time. The cluster scheme also greatly simplifies administration: instead of managing several Administrator stations, only one station needs to be managed. It also increases the fault tolerance of the system: should one Administrator station fail, its documents are automatically distributed among the remaining working stations of the cluster.
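
The balancing and failover behaviour described above can be sketched with a toy model. The source does not say which balancing strategy SmartCapture uses, so the least-loaded rule, the station names, and the `submit` and `fail_station` functions here are assumptions chosen only to make the idea concrete.

```python
def least_loaded(queues):
    """Pick the station with the shortest queue (ties go to the first name)."""
    return min(sorted(queues), key=lambda name: len(queues[name]))


def submit(queues, document):
    """Send a document to the least loaded station and return its name."""
    station = least_loaded(queues)
    queues[station].append(document)
    return station


def fail_station(queues, failed):
    """Remove a failed station and spread its documents over the rest."""
    orphaned = queues.pop(failed)
    if not queues:
        raise RuntimeError("no working stations left in the cluster")
    for document in orphaned:
        submit(queues, document)


# Hypothetical cluster of three Administrator stations.
queues = {"admin-1": [], "admin-2": [], "admin-3": []}
for i in range(6):
    submit(queues, f"doc-{i}")

fail_station(queues, "admin-2")
```

After the simulated failure, the two documents queued on `admin-2` are shared between the two remaining stations, and no document is lost.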

 

Logger – control of all documents being processed in the system
 
Authorized operators can use the audit module to create analytical reports and statistics according to specific criteria, for example a report on processed documents, on the number of edited fields or errors in each field and document, or on the number of documents processed by each user. Logger can generate statistics for each system module and provides information about a document's lifecycle in the system, with mandatory tracking of corrected fields (the name of the user who changed the fields, the names of the modified fields, and the original and edited field values).

 

Security Kit – a set of tools for information security
 
The Security Kit tools reduce risks during document processing and increase the effectiveness and security of the processed information. The Security Kit includes:

  • Support for electronic digital signatures and standards-compliant public key infrastructure, as well as proven operation with RSA algorithms.
  • Secure receipt, processing, and export of information through the use of encryption.
  • A system of privilege differentiation, template-role policy, and interaction with LDAP.

For additional security there is the Supervisor station. It allows an authorized employee to manage, in online mode, users' work in the system:

  • to create/remove users and groups of users;
  • to limit the access of users to the stations of the system;
  • to manage the access privileges to the SmartCapture projects, etc.

 

Server-mediator
 
The main function of the server-mediator is to manage the document processing stream. For example, the server-mediator can raise the priority of individual documents or of a forms stream, so that these documents or forms are processed before other, less important forms. The priorities of scanning and of document processing can also be set separately. This makes system management more reliable and flexible. The server-mediator can also provide reliable centralized storage of electronic documents in accordance with archive requirements for document safety and protection.
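
Raising the priority of selected documents, as described above, behaves like a priority queue. The sketch below is a generic illustration, not the server-mediator's actual mechanism: the `ProcessingQueue` class, the numeric priority scale (lower numbers first), and the document names are assumptions.

```python
import heapq
import itertools


class ProcessingQueue:
    """Queue that releases higher-priority documents first.

    Lower numbers mean higher priority; documents with equal priority
    leave the queue in the order they arrived.
    """

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()

    def put(self, document, priority=5):
        heapq.heappush(self._heap, (priority, next(self._arrival), document))

    def get(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


queue = ProcessingQueue()
queue.put("questionnaire-17")
queue.put("payment-order-3", priority=1)  # urgent document, raised priority
queue.put("questionnaire-18")

order = [queue.get() for _ in range(len(queue))]
# order == ["payment-order-3", "questionnaire-17", "questionnaire-18"]
```

The urgent payment order is processed first even though it arrived second, while the two questionnaires keep their arrival order.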
The server-mediator provides easy access via a web interface. By using the server-mediator with Logger, authorized users can get images and detailed information about a processed document via the web interface. A document can be found easily through the document hierarchy or by searching. Search results are displayed as a list with links to the found documents and can be saved for reuse.

 

 ExternalModulesHost
ExternalModulesHost (EMH) is a module that links to the existing SmartCapture application and is used to get documents from various sources, check logical rules, and export documents. Using this module greatly decreases the load on the system's server and the company's local network. EMH works in fully automatic mode and does not require operator intervention. The module is indispensable when departments or offices of the company use different access parameters.
For example, when processing the same documents, one department may use information from a database while another uses information from an external information system.

 

 E-documents processing
 
SmartCapture can not only process paper documents received from different sources but also receive and process various e-documents. Importantly, both paper and e-documents are processed within one project, which greatly decreases costs for organizations that otherwise use different data input systems. SmartCapture lets a customer organize data input from any source without bearing additional costs when the ratio of paper to e-documents entering the system changes.
For instance, one of the sources can be documents received by e-mail. SmartCapture automatically analyzes messages with attached image, text, or PDF files and selects and processes only those you require. Each received file is processed, and data is extracted according to defined rules. All extracted data is checked against the relevant business logic. The electronic digital signature of the message is also checked.

An alternative to paper form input is the use of e-forms. SmartCapture greatly speeds up e-form input by automatically extracting the data from forms and assigning the values to metadata. For creating and publishing web forms there is the SmartDesigner application, which lets you create your own forms and place them on a site. SmartCapture also supports form input using Adobe LiveCycle Forms (formerly Adobe Form Server). For instance, an organization can place forms on its website; clients can fill in the published forms interactively, with the entered data transmitted to SmartCapture, or save the forms locally and then send them by e-mail. Entering data into these forms is straightforward and does not require any special knowledge from your clients. To send documents, a client logs in to the website and authenticates with a certificate signed by your organization's trust center, after which the client gets access to the web forms that you have specified. The client works with the website over a protected SSL connection. Organizing data input this way guarantees the identity of clients and ensures that transmitted data cannot be changed by third parties.

 

  • works right out of the box;
  • superior recognition accuracy;
  • capability to process both single- and multi-page documents;
  • capability of finding data wherever it appears, using any information available: relation to other objects on the page, contents of the field, its size, lines drawn around it, etc.;
  • extended export settings (PDF, XML, database, TXT, BMP, JPG, TIFF, etc.);
  • archiving/document management with minimal operator intervention;
  • enhanced scan options.

TopSoft SmartCapture 9.0  System requirements

Hardware requirements:

CPU: P4 2.0 GHz
RAM: 256 MB
Network adapter: 3Com 100Mbit
HDD: 40 GB 7200 rpm
SCSI2 Card or USB 2.0
TWAIN-compatible scanner

Software requirements:

Windows XP Service Pack 2 or Windows 2000 Service Pack 4
.NET Framework 3.5