Software installed with Vividata's older installer was placed by default in /usr/vividata on Linux and /opt/Vividata on Solaris. The new installer used for the API installs in /opt/Vividata by default on both platforms.
You may control where the API installer places the API by setting the environment variable VV_HOME to the desired directory prior to running the installer.
If you use some of Vividata's older software, the VV_HOME environment variable may already be set in your environment. In that case, be careful when you install the API and later when you start the ocrxtrdaemon: if VV_HOME does not match the API installation directory, you will receive an error. To fix it, make sure VV_HOME is set to the API installation directory when you start the ocrxtrdaemon.
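If you start the daemon from your own wrapper program, you may want to catch a stale VV_HOME before the daemon reports its error. The following is a minimal sketch under stated assumptions: the functions are hypothetical helpers (not part of the API), the install directory is passed in by the caller, an unset VV_HOME is treated as a mismatch, and symbolic links are not resolved.

```cpp
#include <cassert>
#include <cstdlib>
#include <iostream>
#include <string>

// Removes trailing slashes so "/opt/Vividata/" and "/opt/Vividata" compare equal.
std::string stripTrailingSlashes(std::string path) {
    while (path.size() > 1 && path.back() == '/') {
        path.pop_back();
    }
    return path;
}

// True if the VV_HOME value names installDir. A null value (VV_HOME unset)
// is treated as a mismatch; this is a conservative assumption of the sketch.
bool vvHomeMatches(const char* vvHome, const std::string& installDir) {
    if (vvHome == nullptr) {
        return false;
    }
    return stripTrailingSlashes(vvHome) == stripTrailingSlashes(installDir);
}

// Hypothetical pre-launch check for a program that starts ocrxtrdaemon.
// Prints a hint to stderr and returns false if VV_HOME is inconsistent.
bool checkVvHomeBeforeLaunch(const std::string& installDir) {
    const char* vvHome = std::getenv("VV_HOME");
    if (!vvHomeMatches(vvHome, installDir)) {
        std::cerr << "VV_HOME is " << (vvHome ? vvHome : "unset")
                  << " but the API is installed in " << installDir
                  << "; set VV_HOME to " << installDir
                  << " before starting ocrxtrdaemon\n";
        return false;
    }
    return true;
}
```

For example, checkVvHomeBeforeLaunch("/opt/Vividata") would flag a VV_HOME left over from an older /usr/vividata installation.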
Both the client and the daemon can generate log output, at various levels of detail, on stdout and stderr.
To generate log output from your client program, call the function vvLogSetLevel (see an example in vvxtrSample.cc):
void vvLogSetLevel( int logLevel );
The log level should be set anywhere from 1 to 1000, where the higher the number, the more output will be generated:
By default, only error messages are printed.
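A client program may want to choose its log level at run time rather than hard-coding it. The sketch below assumes a hypothetical environment variable, MYAPP_LOG_LEVEL, read by your own client code; parseLogLevel is likewise a hypothetical helper that turns the text into a level clamped to the documented 1 to 1000 range before it is passed to vvLogSetLevel.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>

// Range documented for vvLogSetLevel: 1 (least output) to 1000 (most).
constexpr int kMinLogLevel = 1;
constexpr int kMaxLogLevel = 1000;

// Parses a log level string and clamps it into [1, 1000]. Returns fallback
// when the text is null, empty, not a whole number, or out of range for long.
int parseLogLevel(const char* text, int fallback) {
    if (text == nullptr || *text == '\0') {
        return fallback;
    }
    char* end = nullptr;
    errno = 0;
    long value = std::strtol(text, &end, 10);
    if (*end != '\0' || errno == ERANGE) {
        return fallback;
    }
    if (value < kMinLogLevel) {
        return kMinLogLevel;
    }
    if (value > kMaxLogLevel) {
        return kMaxLogLevel;
    }
    return static_cast<int>(value);
}
```

In the client, you could then call vvLogSetLevel(parseLogLevel(value, 1)) only when std::getenv("MYAPP_LOG_LEVEL") returns a non-null value, leaving the default (errors only) in place otherwise.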
Similarly, the OCR Shop XTR/API daemon (ocrxtrdaemon) can generate log output to stdout and stderr. Control the verbosity of the daemon log output by setting the environment variable VV_DEBUG to the desired log level before starting the ocrxtrdaemon process. For example, set VV_DEBUG to 1000 for the maximum debugging output.
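For both the client and the daemon, you can save the log output to a file by redirecting stdout and stderr from the command line. The commands below use the csh/tcsh >& syntax, which bash also accepts; in a Bourne-style sh, use > file 2>&1 instead: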
clientProgram >& client.log
ocrxtrdaemon >& daemon.log
To make sure all temp files are placed in the directory of your choice, set these environment variables:
Set each of them to the directory where you wish to store the temp files, and do this in the shell where you run the daemon program ocrxtrdaemon, before starting it.
The OCR Shop XTR/API itself operates as a client/server system, where your application links statically with the provided communicator library so that it may communicate dynamically with the daemon process. The main daemon process handles communication and can create multiple instances of the OCR engine, serving one or more client programs. As a result of this configuration, you have three basic options for using the OCR Shop XTR/API in your own client/server environment:
Note that scenarios 1 and 3 could require significantly more network traffic in order to transfer the image data back and forth between client and server machines.
In order to load image data from memory, you must pass a vvxtrImage structure containing the image data to vvEngAPI::vvReadImageData(const struct vvxtrImage * img).
A sample program is available that demonstrates this functionality: vvxtrSample2.cc
To compile this sample program, save the source code for vvxtrSample2.cc in a file named "vvxtrSample.cc" (if you save it in /opt/Vividata/src, back up the original vvxtrSample.cc first). You can then compile it with the GNUmakefile and supporting files distributed with the OCR Shop XTR/API, which are installed in /opt/Vividata/src.
Typeset, high-quality printed pages return the best recognition accuracy. The following factors most affect text-recognition accuracy:
No single combination of preprocessing settings and recognition parameters always results in the quickest, most accurate recognition job. However, if you use the settings most appropriate to each document, OCR Shop XTR™/API's speed and accuracy will be maximized.
OCR Shop XTR™/API may recognize some line-art graphics or areas of photographic regions as text if the artwork is poor and the lines resemble letter strokes. Adjusting the dm_black_threshold parameter may change how the OCR Engine differentiates between photographic regions and text regions. Individual regions can also be manually specified as graphical or textual content.
OCR Shop XTR™/API recognizes characters in almost any font in sizes from 5 to 72 points. Because the engine interprets font size based on the image's dpi, set the input image dpi correctly so that the image's fonts fall within the recognized size range.
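The dependence on dpi follows from the definition of a point as 1/72 inch: a size of p points spans p × dpi / 72 pixels. The sketch below shows that arithmetic with hypothetical helper functions; it treats the measured pixel size as the font's em height, which is a simplification, since the exact measure the engine uses is not described here.

```cpp
#include <cassert>

// A typographic point is 1/72 inch.
constexpr double kPointsPerInch = 72.0;

// Converts a glyph size measured in pixels to points at the given dpi.
double pixelsToPoints(double pixels, double dpi) {
    assert(dpi > 0.0);
    return pixels * kPointsPerInch / dpi;
}

// Converts a point size to pixels at the given dpi.
double pointsToPixels(double points, double dpi) {
    assert(dpi > 0.0);
    return points * dpi / kPointsPerInch;
}

// True if a size lies within the 5 to 72 point range quoted above.
bool withinRecognizedRange(double points) {
    return points >= 5.0 && points <= 72.0;
}
```

For example, 12-point text scanned at 600 dpi is 100 pixels high. If the image dpi were wrongly given as 72, the same 100 pixels would be interpreted as 100 points, outside the recognized range.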
Following certain guidelines may improve recognition accuracy:
If you have control over the scanning process, you can improve recognition by eliminating skew and background noise. Some paper is so thin that the scanner picks up text printed on the back of the page; to prevent this, place a sheet of black paper between the page and the scanner lid. Eliminating skew also speeds up recognition, because the OCR Engine no longer needs to deskew the image.
Here are several items to consider which affect how fast the OCR Shop XTR/API processes your images:
vvERROR(xtrEngine->vvSetHint(vvHintLocalFilesystem));
See vvEngAPI::vvSetHint and vvHintLocalFilesystem.
See dm_pp_auto_orient. If you know the quality of your images is high, turn off degraded image processing (dm_pp_autosetdegrade). If you are certain you do not want to use the fax filter, make sure it is off and not set to automatic (dm_pp_fax_filter). See dm_pp_remove_halftone and the other preprocessing options.
See also dm_pp_deskew.
When processing multiple input files, the sequence of operations is, for example:
Note on vvxtrCreateRemoteEngine:
vvEngAPI::vvxtrCreateRemoteEngine is different from vvEngAPI::vvInitInstance, vvEngAPI::vvRecognize, etc. because it is actually starting a new engine. It tells the ocrxtrdaemon to fork a new process that becomes the new engine. The "action" functions such as vvEngAPI::vvInitInstance and vvEngAPI::vvRecognize work within that engine and cause the engine to change state.
You can call vvEngAPI::vvxtrCreateRemoteEngine multiple times to create multiple engines that all run concurrently. Each engine has its own state and is used individually. A call to vvEngAPI::vvStartDoc, for example, is made to one specific engine.
When to set options:
Options for preprocessing and recognition do not have to be set immediately before the vvEngAPI::vvPreprocess and vvEngAPI::vvRecognize calls. Once set, the values are retained in the engine until the OCR session ends or vvEngAPI::vvInitValues is called.
Reading image data from memory:
If you are reading your image data from memory, then you do not need to call vvEngAPI::vvOpenImageFile or vvEngAPI::vvCloseImageFile. You just need to call vvEngAPI::vvReadImageData and vvEngAPI::vvUnloadImage.
Ordering of output actions versus input actions:
The sequence of actions used to start, write, and close the output document does not have to follow the exact order above; the output document is flexible with respect to the input document. Recognition must take place before vvEngAPI::vvSpoolDoc may be called, but otherwise vvEngAPI::vvStartDoc can be called any time between vvEngAPI::vvStartOCRSes and vvEngAPI::vvSpoolDoc, and vvEngAPI::vvEndDoc may be called any time between vvEngAPI::vvSpoolDoc and vvEngAPI::vvEndOCRSes. vvEngAPI::vvSpoolDoc may be called multiple times to write multiple output pages.
For more information:
See this page for a list of the basic sequence of actions, further description of handling data, input and output, and actions.
Further down on the same page, the State table for actions describes in detail how the engine state works. Many of the actions can be considered stack-like: after you start an output document with vvEngAPI::vvStartDoc, you must close it with vvEngAPI::vvEndDoc before you can exit the OCR session; after you load image data into the engine with vvEngAPI::vvReadImageData, you must unload it with vvEngAPI::vvUnloadImage before you can load any new image data into the engine. Actions such as vvEngAPI::vvPreprocess and vvEngAPI::vvRecognize are a little different: they may be called multiple times but must obey a certain ordering. vvEngAPI::vvPreprocess must be called before vvEngAPI::vvRecognize, and vvEngAPI::vvRecognize must be called before vvEngAPI::vvSpoolDoc.
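To check a planned call sequence against the ordering rules described in this section, you could model them in a few lines. The class below is a hypothetical toy model, not part of the API and not the engine's actual State table; its method names merely mirror the vvEngAPI actions. As a simplifying assumption, it requires the currently loaded image to have been recognized before spooling, and it forgets the recognition result when the image is unloaded.

```cpp
#include <cassert>

// Toy model of the ordering rules in this section. It is NOT the engine's
// State table; method names only mirror the vvEngAPI actions they stand for.
class OrderingModel {
public:
    bool startOCRSes() {
        if (inSession_) return false;
        inSession_ = true;
        return true;
    }
    // vvStartDoc: any time after vvStartOCRSes while no document is open.
    bool startDoc() {
        if (!inSession_ || docOpen_) return false;
        docOpen_ = true;
        return true;
    }
    // vvReadImageData: the previous image must be unloaded first.
    bool readImageData() {
        if (!inSession_ || imageLoaded_) return false;
        imageLoaded_ = true;
        preprocessed_ = recognized_ = false;
        return true;
    }
    // vvPreprocess: needs loaded image data; may be repeated.
    bool preprocess() {
        if (!imageLoaded_) return false;
        preprocessed_ = true;
        return true;
    }
    // vvRecognize: must follow vvPreprocess; may be repeated.
    bool recognize() {
        if (!preprocessed_) return false;
        recognized_ = true;
        return true;
    }
    // vvSpoolDoc: needs an open document and a recognized image; may be
    // called several times to write several output pages.
    bool spoolDoc() {
        if (!docOpen_ || !imageLoaded_ || !recognized_) return false;
        spooled_ = true;
        return true;
    }
    // vvUnloadImage: pops the loaded image (and, in this model, its results).
    bool unloadImage() {
        if (!imageLoaded_) return false;
        imageLoaded_ = preprocessed_ = recognized_ = false;
        return true;
    }
    // vvEndDoc: any time after vvSpoolDoc.
    bool endDoc() {
        if (!docOpen_ || !spooled_) return false;
        docOpen_ = spooled_ = false;
        return true;
    }
    // vvEndOCRSes: the output document must be closed first.
    bool endOCRSes() {
        if (!inSession_ || docOpen_) return false;
        inSession_ = false;
        return true;
    }

private:
    bool inSession_ = false;
    bool docOpen_ = false;
    bool spooled_ = false;
    bool imageLoaded_ = false;
    bool preprocessed_ = false;
    bool recognized_ = false;
};
```

Each method returns false when the corresponding action would be out of order in this model, so a sequence can be checked by asserting every call in turn.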
Before calling vvEngAPI::vvPreprocess, make sure you set the value dm_pp_auto_segment to vvNo.
When the engine runs preprocessing on an image during the vvEngAPI::vvPreprocess call, the engine will auto segment the input image if the preprocessing value dm_pp_auto_segment is set to vvYes, or it will not divide the image into regions if this value is set to vvNo. In either case, you can create user defined regions.
To get a list of the current regions, get the value of dm_region_ids from the engine by using the vvEngAPI::vvGetValue function.
To create a new region:
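Set dm_current_region to an unused region id number; to see which ids are already in use, get dm_region_ids with the vvEngAPI::vvGetValue function. Then set the region's values with vvEngAPI::vvSetValue. The minimal set of values you should specify is:
See the other values starting with "dm_region" for other region properties you can specify. Then commit the region to the engine with vvEngAPI::vvSetRegionProperties. If you now get the value of dm_region_ids, you should see your new region listed.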
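Choosing an unused region id is a small search over the ids already in use. The helper below is hypothetical: the format in which vvEngAPI::vvGetValue returns dm_region_ids is not described here, so the sketch assumes the ids have already been parsed into integers, and the lowest id to try is a parameter because the first valid region id is not stated.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical helper: returns the smallest id >= firstId that does not
// appear in usedIds (for example, ids parsed from dm_region_ids).
int firstUnusedRegionId(const std::vector<int>& usedIds, int firstId = 0) {
    std::set<int> used(usedIds.begin(), usedIds.end());
    int candidate = firstId;
    while (used.count(candidate) != 0) {
        ++candidate;
    }
    return candidate;
}
```

The returned id would be the value you set for dm_current_region before setting the other region values.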
A sample program to demonstrate creation of a new region is available upon request.
Note that regions may also be deleted; see the function vvEngAPI::vvRemoveRegion.
"UOR" stands for "union of rectangles" and is used to describe the area covered by a region. The value dm_region_uor_string defines the UOR for a region.
The UOR for a region may include one or more rectangles. Using multiple rectangles permits irregularly shaped regions. This matters for documents where text and images appear so close together that no single rectangle can enclose an entire text region without also including part of what should be an image region.
To set the UOR for a region:
Using the function vvEngAPI::vvSetValue, you must set the current region (dm_current_region), then set the UOR definition (dm_region_uor_string) and the number of rectangles (dm_region_uor_count). Finally, the region information is committed in the OCR engine with a function call to vvEngAPI::vvSetRegionProperties.
Formatting the UOR string:
The UOR string (dm_region_uor_string) must be formatted correctly for your application to work correctly, and the region count (dm_region_uor_count) must be set accurately. In dm_region_uor_string, coordinates are separated by commas; rectangles are separated by semicolons; the string should contain no whitespace.
For example, if you want a region to consist of one rectangle with the coordinates (400,800) by (600,1400), then you would set dm_region_uor_string to 400,800,600,1400 and dm_region_uor_count to 1.
In general, the format of the dm_region_uor_string should be:
x1,y1,x2,y2;x3,y3,x4,y4;x5,y5,x6,y6
to specify three rectangles defined conceptually:
rectangle one: (x1,y1) (x2,y2)
rectangle two: (x3,y3) (x4,y4)
rectangle three: (x5,y5) (x6,y6)
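Because the string format and the count must agree, it can help to generate both from the same list of rectangles. The following is a minimal sketch; the Rect struct and both functions are hypothetical helpers, and passing the results to vvEngAPI::vvSetValue is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// One rectangle of a UOR, given by two opposite corners (x1,y1) and (x2,y2).
struct Rect {
    int x1, y1, x2, y2;
};

// Builds a dm_region_uor_string value: coordinates separated by commas,
// rectangles separated by semicolons, and no whitespace anywhere.
std::string formatUorString(const std::vector<Rect>& rects) {
    std::ostringstream out;
    for (std::size_t i = 0; i < rects.size(); ++i) {
        if (i > 0) {
            out << ';';
        }
        const Rect& r = rects[i];
        out << r.x1 << ',' << r.y1 << ',' << r.x2 << ',' << r.y2;
    }
    return out.str();
}

// dm_region_uor_count must equal the number of rectangles in the string.
int uorCount(const std::vector<Rect>& rects) {
    return static_cast<int>(rects.size());
}
```

For the single-rectangle example above, formatUorString returns 400,800,600,1400 and uorCount returns 1.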
A PDF document consists of any combination of text and bitmap images embedded in a PDF file. It may also contain structural information used for formatting and interactive features such as hyperlinks.
Because of the flexibility of the PDF file format, a PDF file may be used as an "image" file or as a "text" file. When used as an "image", PDF files are commonly used as optical character recognition (OCR) input, and when used as a "text" document, PDF files are often used as OCR output.
OCR is the process of converting image bitmap data into text data, so it should be clear which type of PDF files are appropriate as OCR input formats and OCR output formats.
In general, one may identify three types of PDF documents:
Image-only PDF documents contain only a bitmap of a document and are produced by encapsulating a bitmap image in a "pdf wrapper." The result is an exact representation of the bitmap image.
Image-only PDF file size is large because it consists solely of bitmap image data.
Image-only PDF documents contain no searchable text; they cannot be indexed, and their text cannot be copied.
This format is used for OCR input, because it contains bitmap data and no text data.
Normal PDFs contain text and embedded graphical elements. The text is scalable and can be searched, copied, and indexed.
Normal PDF file size is small, because most of the data is text and embedded graphics are usually small. Because the text is stored as scalable text rather than as a bitmap, it remains clear at any size and can be searched.
Normal PDFs must be generated by some sort of editor or an OCR application; they cannot be generated directly from a scanner.
Normal PDFs are a common OCR output format, because they may contain recognized text and approximate the original bitmap image's formatting and embedded graphics. See the output format vvPDFFormatNormal.
Normal PDFs are not normally used for OCR input, because they already contain text data and therefore do not need to be recognized.
Image+text PDFs are a hybrid between Normal PDFs and Image-only PDFs and are used because they combine the best features of both. Like Image-only PDFs, Image+text PDFs display the entire original bitmap; everything visible in an Image+text PDF is bitmap data. However, Image+text PDFs also contain an invisible layer of text beneath the visible bitmap.
Image+text PDF file size is large, because it contains a full-page bitmap.
Image+text PDF text may be searched, copied, and indexed.
Image+text PDFs are a common OCR output format, because they contain recognized text, allowing them to be searched and indexed, while at the same time they retain the exact appearance of the original scanned bitmap image. See the output format vvPDFFormatText.
Image+text PDFs are not normally used for OCR input, because they already contain text data and therefore do not need to be recognized.
Note that Vividata's OCR applications will accept many Normal PDFs and Image+text PDFs as input. However, this can result in information loss, because the OCR application renders the input PDF file from text into bitmap data, then performs OCR on the bitmap data in order to convert it back to text. As a result, we do not recommend using Vividata's OCR applications to extract text from Normal PDFs and Image+text PDFs, and would instead suggest using a utility such as "pdftotext" to directly pull text data from a text PDF.
The XDOC format is a ScanSoft text output format which provides detailed information about the text, images, and formatting in a recognized document.
To use XDOC output, set the output document format to one of the following types of XDOC output:
The following values are associated with XDOC output and can be set to affect the information written to XDOC output:
These files, included with the API in /opt/Vividata/doc, provide specific information about the XDOC format, and enough detail for a user to parse the output.
In XDOC output, the font family is represented by one of the following abbreviations:
Rather than detecting specific fonts, the OCR engine detects the features of a font, such as whether it has serifs, in order to group it in a font family.