# OCR Shop XTR/API Frequently Asked Questions

### Installation and set-up

1. Why is the API not installed with my other Vividata software?

Software using Vividata's older installer was installed by default in /usr/vividata on Linux and /opt/Vividata on Solaris. The new installer used for the API always installs in /opt/Vividata by default.

You may control where the API installer places the API by setting the environment variable VV_HOME to the desired directory prior to running the installer.

If you use some of Vividata's older software, you may already have the VV_HOME environment variable set in your environment. In this case, you should be careful when you install the API and later when you start the ocrxtrdaemon. If VV_HOME is inconsistent, then you will receive an error. To fix it, make sure VV_HOME matches the API installation directory when you start the ocrxtrdaemon.
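As a sketch (the directory below is the default install location; substitute your own, and the installer invocation is illustrative), set VV_HOME in the shell before running the installer, and use the same setting in the shell from which you later start the daemon:

```shell
# Illustrative sketch: keep VV_HOME consistent for installation and daemon startup.
export VV_HOME=/opt/Vividata
echo "VV_HOME is set to $VV_HOME"
# ./install                      # run the API installer with VV_HOME in effect (name illustrative)
# $VV_HOME/bin/ocrxtrdaemon      # later, start the daemon with the same VV_HOME (path illustrative)
```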

2. How do I generate log output?

Various levels of log output may be generated on both the client and daemon sides as output to stdout and stderr.

To generate log output from your client program, call the function vvLogSetLevel (see an example in vvxtrSample.cc):

 void vvLogSetLevel( int logLevel );

The log level should be set anywhere from 1 to 1000, where the higher the number, the more output will be generated:

• 1 - Error messages only
• 250 - Warning messages
• 500 - Informational messages
• 515 - Debug information
• 520 - All information

By default, only error messages are printed.
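For instance, a client that wants informational messages and below could make this call early in its setup (a sketch; the surrounding engine setup code is omitted):

```cpp
// Sketch: raise client-side logging to the informational level (500).
// vvLogSetLevel is declared by the OCR Shop XTR/API headers; call it
// before other API activity so that setup messages are captured too.
vvLogSetLevel( 500 );
```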

Similarly, the OCR Shop XTR/API daemon (ocrxtrdaemon) can generate log output to stdout and stderr. Control the verbosity of the daemon log output by setting the environment variable VV_DEBUG to the desired log level before starting the ocrxtrdaemon process. For example, set VV_DEBUG to 1000 for the maximum debugging output.
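A minimal sketch (the daemon launch line is commented out, since its path depends on your installation):

```shell
# Sketch: request maximum daemon logging before starting ocrxtrdaemon.
export VV_DEBUG=1000
echo "VV_DEBUG=$VV_DEBUG"
# ocrxtrdaemon >& daemon.log    # start the daemon in this same shell so it sees VV_DEBUG
```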

For both the client and the daemon, you can save the log output to a file by redirecting it on the command line:

 clientProgram >& client.log
 ocrxtrdaemon >& daemon.log

3. How do I customize where temp files are placed?

To make sure all temp files are placed in the directory of your choice, set these environment variables:

• TMP
• TMP_DIR
• VV_TMPDIR

to the directory where you wish to store the temp files. Make sure that you do this in the shell where you run the daemon program ocrxtrdaemon.
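For example (the directory path is illustrative, and the daemon launch line is commented out):

```shell
# Sketch: route all temp files to one directory before starting the daemon.
WORKDIR=/var/tmp/ocrtemp        # illustrative path; choose your own
mkdir -p "$WORKDIR"
export TMP="$WORKDIR"
export TMP_DIR="$WORKDIR"
export VV_TMPDIR="$WORKDIR"
# ocrxtrdaemon >& daemon.log    # start the daemon from this shell so it inherits the settings
```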

4. How can I set up the API in my client/server environment?

The OCR Shop XTR/API itself operates as a client/server system, where your application links statically with the provided communicator library so that it may communicate dynamically with the daemon process. The main daemon process handles communication and can create multiple instances of the OCR engine, serving one or more client programs. As a result of this configuration, you have three basic options for using the OCR Shop XTR/API in your own client/server environment:

1. All OCR and related processing takes place on the server side. You could install the OCR daemon and your client program based on our API on one server. All of your client machines would send images to and receive output from this server, through your own software. The server would have multiple XTR/API licenses installed on it, depending on your anticipated OCR needs. The server should be a multiprocessor machine or cluster of machines if you anticipate running many OCR jobs at once.

2. All OCR and related processing takes place on the client side. You could install the OCR daemon on each client machine, along with your client program based on our API. Each client machine would handle its own OCR requirements. Licenses would be installed on each client machine, or could be installed on one license server (floating licenses). If you install more than one license on each client machine or if you use floating licenses, you could run concurrent OCR processing on each client machine.

3. The OCR daemon runs on the server side and the client program runs on the client side. You could install the OCR daemon on the server, along with all of our licenses. Again, in this case, the server should be a multiprocessor machine or cluster of machines if you anticipate running many OCR jobs at once. The client program would be installed on each client machine, and would communicate with the OCR daemon across your network.

Note that scenarios 1 and 3 could require significantly more network traffic in order to transfer the image data back and forth between client and server machines.

### Image input

1. How do I load image data directly from memory into the OCR engine?

In order to load image data from memory, you must pass a vvxtrImage structure containing the image data to vvEngAPI::vvReadImageData(const struct vvxtrImage * img).

A sample program is available that demonstrates this functionality: vvxtrSample2.cc

To compile this sample program, save the source code for vvxtrSample2.cc as a file called "vvxtrSample.cc". Then you can compile it with the same GNUmakefile and supporting files as distributed with the OCR Shop XTR/API, and found in /opt/Vividata/src after your installation.
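The overall call pattern looks roughly like this (a sketch only: the vvxtrImage fields must be populated as shown in vvxtrSample2.cc, method arguments beyond those documented here are omitted, and error handling is elided):

```cpp
// Sketch: feed in-memory image data to the engine instead of an image file.
struct vvxtrImage img;
// ... populate img with the image geometry and pixel buffer,
//     following the vvxtrSample2.cc example ...

xtrEngine->vvReadImageData( &img );   // load the in-memory image into the engine
// ... vvPreprocess, vvRecognize, and output actions go here ...
xtrEngine->vvUnloadImage();           // unload before loading any new image data
```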

### Processing and Recognition

1. How can I improve recognition accuracy?

Typeset, high-quality printed pages return the best recognition accuracy. The following factors most affect text-recognition accuracy:

• Preprocessing Settings
• Recognition Parameters
• Line Art and Photographic Regions
• Document Quality
• Scanning Process

No single combination of preprocessing settings and recognition parameters always results in the quickest, most accurate recognition job. However, if you use the settings most appropriate to each document, OCR Shop XTR™/API's speed and accuracy will be maximized.

OCR Shop XTR™/API may recognize some line-art graphics or areas of photographic regions as text if the artwork is poor and the lines resemble letter strokes. Adjusting the dm_black_threshold parameter may change how the OCR Engine differentiates between photographic regions and text regions. Individual regions can also be manually specified as graphical or textual content.

OCR Shop XTR™/API recognizes characters in almost any font in sizes from 5 to 72 points. The engine interprets font size based on the image's dpi, so set the input image dpi carefully in order to guarantee the image's fonts fall within the recognized sizes.

Following certain guidelines may improve recognition accuracy:

• The print should be as clean and crisp as possible.
• Characters should be distinct, separated from each other, and not smudged together or overlapping.
• The document should be free of handwritten notes, lines and doodles.
• Anything that is not a printed character slows recognition, and any character distorted by a mark will be unrecognizable.
• Try to avoid highly stylized fonts. For example, OCR Shop XTR™ may not recognize text in the Zapf Chancery® font accurately.
• Try to avoid underlined text. Underlining changes the shape of descenders on the letters q, g, y, p, and j.

If you have control over the scanning process, you can improve recognition by eliminating skew and background noise. Some paper is so thin that the scanner reads text printed on the back side of the scanned page; to prevent this, put a black piece of paper between the sheet and the lid of the scanner. Eliminating any need for the OCR Engine to deskew an image also improves recognition processing speed.

2. How can I improve performance?

Here are several items to consider which affect how fast the OCR Shop XTR/API processes your images:

• One of the primary benefits of using the API is that the OCR engine does not need to be shut down and restarted between each page. Make sure that you do not needlessly shut down and restart the OCR engine and OCR session. Unless you want to switch languages, you should be able to recognize an unlimited number of pages within one OCR session.

• The OCR Shop XTR/API uses a daemon to handle the OCR processing, which allows the daemon and the client program to run on different systems. However, if the OCR daemon and the client program do reside on the same filesystem, you should tell the daemon this in your client program so that it can optimize communications. Before starting the OCR session, make this function call to the OCR engine:

vvERROR(xtrEngine->vvSetHint(vvHintLocalFilesystem));

See vvEngAPI::vvSetHint and vvHintLocalFilesystem.

• Preprocessing takes up a large portion of the total OCR processing time, and some of the preprocessing functions are among the most processor-intensive functions. If you are concerned about time, choose carefully which preprocessing options you enable. Also pay attention to the default settings: the defaults provide the best OCR results under most conditions, not necessarily the best balance of processing time and results for your particular circumstances. For example, if you know all of your images will be properly rotated, turn off dm_pp_auto_orient. If you know the quality of your images is high, turn off degraded-image processing (dm_pp_autosetdegrade). If you are certain you do not want to use the fax filter, make sure it is off and not set to automatic (dm_pp_fax_filter). See dm_pp_remove_halftone and the other preprocessing options.

• One particular preprocessing option that can take up significant time is deskewing. If your images are straight to start with, processing will be faster and you can turn off deskewing. If some or all of your images will be skewed, understand that it will take longer to process them because of the skew. See dm_pp_deskew.
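As an illustration only (the exact vvSetValue argument types are not shown in this FAQ, so the name/value form below is an assumption; check the vvEngAPI reference for the real signature):

```cpp
// Sketch: disable preprocessing steps you know you do not need.
// The name/value calling convention here is assumed, not verbatim API.
xtrEngine->vvSetValue( "dm_pp_auto_orient",    vvNo ); // images are already upright
xtrEngine->vvSetValue( "dm_pp_autosetdegrade", vvNo ); // image quality is known to be high
xtrEngine->vvSetValue( "dm_pp_fax_filter",     vvNo ); // not fax input; do not leave on automatic
```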

3. What is the correct sequence of actions for processing multiple files?

When processing multiple input files, the proper sequence of operations is described in the notes below.

Note on vvxtrCreateRemoteEngine:

vvEngAPI::vvxtrCreateRemoteEngine is different from vvEngAPI::vvInitInstance, vvEngAPI::vvRecognize, etc. because it is actually starting a new engine. It tells the ocrxtrdaemon to fork a new process that becomes the new engine. The "action" functions such as vvEngAPI::vvInitInstance and vvEngAPI::vvRecognize work within that engine and cause the engine to change state.

You can call vvEngAPI::vvxtrCreateRemoteEngine multiple times to create multiple engines that all run concurrently. Each engine has its own state and is used individually. A call to vvEngAPI::vvStartDoc, for example, is made to one specific engine.

When to set options:

Options for preprocessing and recognition do not have to be set just before the vvEngAPI::vvPreprocess and vvEngAPI::vvRecognize calls. After being set, the values are retained in the engine until the OCR session is ended or vvEngAPI::vvInitValues is called.

If you are reading your image data from memory, then you do not need to call vvEngAPI::vvOpenImageFile or vvEngAPI::vvCloseImageFile. You just need to call vvEngAPI::vvReadImageData and vvEngAPI::vvUnloadImage.

Ordering of output actions versus input actions:

The actions used to start, write, and close the output document need not be interleaved with the input actions in one fixed order -- the output document is flexible with respect to the input document. Recognition must take place before vvEngAPI::vvSpoolDoc may be called, but otherwise vvEngAPI::vvStartDoc can be called any time between vvEngAPI::vvStartOCRSes and vvEngAPI::vvSpoolDoc, and vvEngAPI::vvEndDoc may be called any time between vvEngAPI::vvSpoolDoc and vvEngAPI::vvEndOCRSes. vvEngAPI::vvSpoolDoc may be called multiple times to write multiple output pages.

See the main API documentation for a list of the basic sequence of actions and further description of handling data, input and output, and actions.

The state table for actions in that documentation describes in detail how the engine state works. Many of the actions can be considered stack-like: after you start an output document with vvEngAPI::vvStartDoc, you must close it with vvEngAPI::vvEndDoc before you can exit the OCR session; after you load image data into the engine with vvEngAPI::vvReadImageData, you must unload it with vvEngAPI::vvUnloadImage before you can load any new image data into the engine. Actions such as vvEngAPI::vvPreprocess and vvEngAPI::vvRecognize are a little different; they may be called multiple times but must obey a certain ordering -- vvEngAPI::vvPreprocess must be called before vvEngAPI::vvRecognize, and vvEngAPI::vvRecognize must be called before vvEngAPI::vvSpoolDoc.

### Custom regions

1. How do I tell the engine to not divide the image into regions?

Before calling vvEngAPI::vvPreprocess, make sure you set the value dm_pp_auto_segment to vvNo.

2. How do I create my own regions?

When the engine runs preprocessing on an image during the vvEngAPI::vvPreprocess call, it will automatically segment the input image if the preprocessing value dm_pp_auto_segment is set to vvYes, or leave the image undivided if this value is set to vvNo. In either case, you can create user-defined regions.

To get a list of the current regions, get the value of dm_region_ids from the engine by using the vvEngAPI::vvGetValue function.

To create a new region:

• Set dm_current_region to an unused region id number.
• Set up the properties of the new region using the vvEngAPI::vvSetValue function. The minimal set of values you should specify is dm_region_uor_string (see "What is a UOR?"), dm_region_uor_count, and dm_region_type. See the other values starting with "dm_region_" for additional region properties you can specify.
• Finish setting up the new region in the OCR engine by calling the function vvEngAPI::vvSetRegionProperties.
• If you now query the engine again for the list of dm_region_ids, you should see your new region listed.

A sample program to demonstrate creation of a new region is available upon request.

Note that regions may also be deleted; see the function vvEngAPI::vvRemoveRegion.
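The steps above can be sketched with the single-rectangle example from the UOR question below, with the caveat that the name/value form of vvSetValue shown here is an assumption (check the vvEngAPI reference), and kTextRegion is a placeholder for an actual dm_region_type value:

```cpp
// Sketch: define one user region covering a single rectangle.
// The vvSetValue calling convention is assumed, not verbatim API,
// and kTextRegion is a placeholder for a real dm_region_type value.
xtrEngine->vvSetValue( "dm_current_region",    7 );                  // an unused region id
xtrEngine->vvSetValue( "dm_region_uor_string", "400,800,600,1400" ); // one rectangle
xtrEngine->vvSetValue( "dm_region_uor_count",  1 );                  // number of rectangles
xtrEngine->vvSetValue( "dm_region_type",       kTextRegion );        // region content type
xtrEngine->vvSetRegionProperties();   // commit the new region to the engine
// Querying dm_region_ids with vvGetValue should now list the new region.
```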

3. What is a UOR?

"\ref UOR" stands for "union of rectangles" and is used to describe the bounding box of a region. The value dm_region_uor_string defines the UOR for a region.

The UOR for a region may include one or more rectangles. Use of multiple rectangles permits oddly shaped regions, important for documents where text and images appear closely together, in such a way that one rectangle cannot encompass an entire text region without including part of what should be an image region.

To set the UOR for a region:
Using the function vvEngAPI::vvSetValue, you must set the current region (dm_current_region), then set the UOR definition (dm_region_uor_string) and the number of rectangles (dm_region_uor_count). Finally, the region information is committed in the OCR engine with a function call to vvEngAPI::vvSetRegionProperties.

Formatting the UOR string:
The UOR string (dm_region_uor_string) must be formatted correctly for your application to work correctly, and the region count (dm_region_uor_count) must be set accurately. In dm_region_uor_string, coordinates are separated by commas; rectangles are separated by semicolons; the string should contain no whitespace.

For example, if you want a region to consist of one rectangle with corners at (400,800) and (600,1400), then you would set dm_region_uor_string to 400,800,600,1400 and dm_region_uor_count to 1.

In general, the format of the dm_region_uor_string should be:

x1,y1,x2,y2;x3,y3,x4,y4;x5,y5,x6,y6

to specify three rectangles defined conceptually:

rectangle one: (x1,y1) (x2,y2)
rectangle two: (x3,y3) (x4,y4)
rectangle three: (x5,y5) (x6,y6)

### File Formats

1. What are image PDFs versus text PDFs?

A PDF document consists of any combination of text and bitmap images embedded in a PDF file. It may also contain structural information used for formatting and interactive features such as hyperlinks.

Because of the flexibility of the PDF file format, a PDF file may be used as an "image" file or as a "text" file. When used as an "image", PDF files are commonly used as optical character recognition (OCR) input, and when used as a "text" document, PDF files are often used as OCR output.

OCR is the process of converting bitmap image data into text data, which determines which types of PDF files are appropriate as OCR input formats and which as OCR output formats.

In general, one may identify three types of PDF documents:

• Image-only PDF

Image-only PDF documents contain only a bitmap of a document and are produced by encapsulating a bitmap image in a "pdf wrapper." The result is an exact representation of the bitmap image.

Image-only PDF file size is large because it consists solely of bitmap image data.

Image-only PDF documents contain no searchable text; they may not be indexed and the text may not be copied.

This format is used for OCR input, because it contains bitmap data and no text data.

• Normal PDF

Normal PDFs contain text and embedded graphical elements. The text is scalable and can be searched, copied, and indexed.

Normal PDF file size is small, because most data is textual and embedded graphics are usually small. Because text data is stored in normal PDF documents, the clarity is good due to scalable text, and the text is searchable.

Normal PDFs must be generated by some sort of editor or an OCR application; they cannot be generated directly from a scanner.

Normal PDFs are a common OCR output format, because they may contain recognized text and approximate the original bitmap image's formatting and embedded graphics. See the output format vvPDFFormatNormal.

Normal PDFs are not normally used for OCR input, because they already contain text data and therefore do not need to be recognized.

• Image+text PDF

Image+text PDFs are a hybrid of Normal PDFs and Image-only PDFs, combining the best features of both. Like Image-only PDFs, Image+text PDFs display the entire original bitmap; everything visible in an Image+text PDF is bitmap data. However, Image+text PDFs also contain an invisible layer of text beneath the visible bitmap.

Image+text PDF file size is large, because it contains a full-page bitmap.

Image+text PDF text may be searched, copied, and indexed.

Image+text PDFs are a common OCR output format, because they contain recognized text, allowing them to be searched and indexed, while at the same time they retain the exact appearance of the original scanned bitmap image. See the output format vvPDFFormatText.

Image+text PDFs are not used for OCR input, because they already contain text data and therefore do not need to be recognized.

Note that Vividata's OCR applications will accept many Normal PDFs and Image+text PDFs as input. However, this can result in information loss, because the OCR application renders the input PDF file from text into bitmap data, then performs OCR on the bitmap data in order to convert it back to text. As a result, we do not recommend using Vividata's OCR applications to extract text from Normal PDFs and Image+text PDFs, and would instead suggest using a utility such as "pdftotext" to directly pull text data from a text PDF.

2. What is the XDOC format and how do I use it?

The XDOC format is a ScanSoft text output format which provides detailed information about the text, images, and formatting in a recognized document.

To use XDOC output, set the output document format to one of the XDOC output formats.

Several values associated with XDOC output can be set to affect the information written to the XDOC output.

Files included with the API in /opt/Vividata/doc provide specific information about the XDOC format, with enough detail for a user to parse the output.

3. What do the font family abbreviations stand for in the XDOC output?

In XDOC output, the font family is represented by one of the following abbreviations:

• "H" - sans serif, variable pitch, compare to "Helvetica".
• "HC" - sans serif, variable pitch, condensed, compare to "Helvetica condensed", or "Arial Narrow".
• "T" - serif, variable pitch, compare to "Times".
• "TC" - serif, variable pitch, condensed, compare to "Times condensed".
• "C" - serif, fixed pitch, compare to "Courier".

Rather than detecting specific fonts, the OCR engine detects the features of a font, such as whether it has serifs, in order to group it in a font family.

Generated on Thu Dec 11 09:32:25 2003 for OCR Shop XTR/API User Documentation by Doxygen 1.3.2