
OCR Shop XTR/API Frequently Asked Questions

Installation and set-up

  1. Why is the API not installed with my other Vividata software?
  2. How do I generate log output?
  3. How do I customize where temp files are placed?
  4. How can I set up the API in my client/server environment?


Image input

  1. How do I load image data directly from memory into the OCR engine?


Processing and Recognition

  1. How can I improve recognition accuracy?
  2. How can I improve performance?
  3. What is the correct sequence of actions for processing multiple files?


Custom regions

  1. How do I tell the engine to not divide the image into regions?
  2. How do I create my own regions?
  3. What is a UOR?

File Formats

  1. What are image PDFs versus text PDFs?
  2. What is the XDOC format and how do I use it?
  3. What do the font family abbreviations stand for in the XDOC output?




Installation and set-up

  1. Why is the API not installed with my other Vividata software?

    Software using Vividata's older installer was installed by default in /usr/vividata on Linux and /opt/Vividata on Solaris. The new installer used for the API always installs in /opt/Vividata by default.

    You may control where the API installer places the API by setting the environment variable VV_HOME to the desired directory prior to running the installer.

    If you use some of Vividata's older software, you may already have the VV_HOME environment variable set in your environment. In this case, you should be careful when you install the API and later when you start the ocrxtrdaemon. If VV_HOME is inconsistent, then you will receive an error. To fix it, make sure VV_HOME matches the API installation directory when you start the ocrxtrdaemon.
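
    For example, in a csh-style shell (the style used by the redirection examples below), you would make VV_HOME consistent with an installation in /opt/Vividata before starting the daemon:

    setenv VV_HOME /opt/Vividata
    ocrxtrdaemon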

  2. How do I generate log output?

    Various levels of log output may be generated on both the client and daemon sides as output to stdout and stderr.

    To generate log output from your client program, call the function vvLogSetLevel (see an example in vvxtrSample.cc):

    void vvLogSetLevel( int logLevel );

    The log level may be set anywhere from 1 to 1000; the higher the number, the more output is generated. By default, only error messages are printed.
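
    For example, to request the most detailed client-side log output, make this call early in your program:

    vvLogSetLevel( 1000 );  /* maximum verbosity; lower values print less */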

    Similarly, the OCR Shop XTR/API daemon (ocrxtrdaemon) can generate log output to stdout and stderr. Control the verbosity of the daemon log output by setting the environment variable VV_DEBUG to the desired log level before starting the ocrxtrdaemon process. For example, set VV_DEBUG to 1000 for the maximum debugging output.
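
    For example, in a csh-style shell, before starting the daemon:

    setenv VV_DEBUG 1000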

    For both the client and the daemon, you can save the log output to a file by redirecting it from the command line:

    clientProgram >& client.log
    ocrxtrdaemon >& daemon.log

  3. How do I customize where temp files are placed?

    To make sure all temp files are placed in the directory of your choice, set the temp-file environment variables to the directory where you wish to store the temp files. Make sure that you do this in the shell where you run the daemon program ocrxtrdaemon.

  4. How can I set up the API in my client/server environment?

    The OCR Shop XTR/API itself operates as a client/server system, where your application links statically with the provided communicator library so that it may communicate dynamically with the daemon process. The main daemon process handles communication and can create multiple instances of the OCR engine, serving one or more client programs. As a result of this configuration, you have three basic options for using the OCR Shop XTR/API in your own client/server environment:

    1. All OCR and related processing takes place on the server side. You could install the OCR daemon and your client program based on our API on one server. All of your client machines would send images to and receive output from this server, through your own software. The server would have multiple XTR/API licenses installed on it, depending on your anticipated OCR needs. The server should be a multiprocessor machine or cluster of machines if you anticipate running many OCR jobs at once.

    2. All OCR and related processing takes place on the client side. You could install the OCR daemon on each client machine, along with your client program based on our API. Each client machine would handle its own OCR requirements. Licenses would be installed on each client machine, or could be installed on one license server (floating licenses). If you install more than one license on each client machine or if you use floating licenses, you could run concurrent OCR processing on each client machine.

    3. The OCR daemon runs on the server side and the client program runs on the client side. You could install the OCR daemon on the server, along with all of our licenses. Again, in this case, the server should be a multiprocessor machine or cluster of machines if you anticipate running many OCR jobs at once. The client program would be installed on each client machine, and would communicate with the OCR daemon across your network.

    Note that scenarios 1 and 3 could require significantly more network traffic in order to transfer the image data back and forth between client and server machines.


Image input

  1. How do I load image data directly from memory into the OCR engine?

    In order to load image data from memory, you must pass a vvxtrImage structure containing the image data to vvEngAPI::vvReadImageData(const struct vvxtrImage * img).

    A sample program is available that demonstrates this functionality: vvxtrSample2.cc

    To compile this sample program, save the source code for vvxtrSample2.cc as a file called "vvxtrSample.cc"; it will then build with the same GNUmakefile and supporting files distributed with the OCR Shop XTR/API, found in /opt/Vividata/src after installation.
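
    As a rough sketch of the calling pattern (the header name below is an assumption, and error handling, the vvxtrImage field contents, and the intervening processing calls are omitted; the actual structure definition and signatures ship with the API headers):

    #include "vvEngAPI.h"  /* assumed header name; see the distributed samples */

    void recognizeFromMemory( vvEngAPI &engine, const struct vvxtrImage *img )
    {
        /* img must already describe the bitmap held in memory. */
        engine.vvReadImageData( img );

        /* ... preprocess, recognize, and spool output here ... */

        engine.vvUnloadImage();  /* unload before loading any new image data */
    }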


Processing and Recognition

  1. How can I improve recognition accuracy?

    Typeset, high-quality printed pages return the best recognition accuracy. The factors that most affect text-recognition accuracy, discussed below, include print quality, graphics that resemble text, font size relative to the image's dpi, and skew or background noise.

    No single combination of preprocessing settings and recognition parameters always results in the quickest, most accurate recognition job. However, if you use the settings most appropriate to each document, OCR Shop XTR™/API's speed and accuracy will be maximized.

    OCR Shop XTR™/API may recognize some line-art graphics or areas of photographic regions as text if the artwork is poor and the lines resemble letter strokes. Adjusting the dm_black_threshold parameter may change how the OCR Engine differentiates between photographic regions and text regions. Individual regions can also be manually specified as graphical or textual content.

    OCR Shop XTR™/API recognizes characters in almost any font in sizes from 5 to 72 points. The engine interprets font size based on the image's dpi, so set the input image dpi carefully to ensure that the fonts fall within the recognized size range.

    Following certain scanning guidelines, described below, may also improve recognition accuracy.

    If you have control over the scanning process, you can improve recognition by eliminating skew and background noise. Some paper is so thin that the scanner reads text printed on the back side of the scanned page; to prevent this, put a black piece of paper between the sheet and the lid of the scanner. Eliminating any need for the OCR Engine to deskew an image also improves recognition processing speed.

  2. How can I improve performance?

    Several factors affect how fast the OCR Shop XTR/API processes your images. Larger, higher-resolution images take longer to process; images that need no deskewing are preprocessed faster (see the scanning guidelines above); and, with the appropriate licenses, multiple engines can run concurrently to increase total throughput.

  3. What is the correct sequence of actions for processing multiple files?

    When processing multiple input files, the sequence of operations is, for example:

    1. vvEngAPI::vvxtrCreateRemoteEngine
    2. vvEngAPI::vvInitInstance
    3. vvEngAPI::vvStartOCRSes
    4. vvEngAPI::vvStartDoc
    5. For each input file: vvEngAPI::vvOpenImageFile, vvEngAPI::vvReadImageData, vvEngAPI::vvPreprocess, vvEngAPI::vvRecognize, vvEngAPI::vvSpoolDoc, vvEngAPI::vvUnloadImage, vvEngAPI::vvCloseImageFile
    6. vvEngAPI::vvEndDoc
    7. vvEngAPI::vvEndOCRSes

    This ordering is one valid arrangement; the notes below describe the flexibility allowed.
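
    A condensed sketch of this flow (a sketch only: argument lists are shown as comments because the actual parameters and return values are documented with each action, and vvxtrCreateRemoteEngine is shown schematically as returning an engine handle):

    vvEngAPI *engine = vvxtrCreateRemoteEngine( /* daemon connection details */ );
    const int nFiles = 3;  /* however many input files you have */

    engine->vvInitInstance( /* ... */ );
    engine->vvStartOCRSes( /* ... */ );
    engine->vvStartDoc( /* output document settings */ );

    for ( int i = 0; i < nFiles; i++ )
    {
        engine->vvOpenImageFile( /* fileNames[i] */ );
        engine->vvReadImageData( /* image data from the open file */ );
        engine->vvPreprocess( /* ... */ );
        engine->vvRecognize( /* ... */ );
        engine->vvSpoolDoc( /* ... */ );    /* one call per output page */
        engine->vvUnloadImage();
        engine->vvCloseImageFile();
    }

    engine->vvEndDoc( /* ... */ );
    engine->vvEndOCRSes( /* ... */ );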


    Note on vvxtrCreateRemoteEngine:

    vvEngAPI::vvxtrCreateRemoteEngine is different from vvEngAPI::vvInitInstance, vvEngAPI::vvRecognize, and the other "action" functions because it actually starts a new engine: it tells the ocrxtrdaemon to fork a new process that becomes the new engine. The action functions such as vvEngAPI::vvInitInstance and vvEngAPI::vvRecognize work within that engine and cause it to change state.

    You can call vvEngAPI::vvxtrCreateRemoteEngine multiple times to create multiple engines that all run concurrently. Each engine has its own state and is used individually. A call to vvEngAPI::vvStartDoc, for example, is made to one specific engine.

    When to set options:

    Options for preprocessing and recognition do not have to be set immediately before the vvEngAPI::vvPreprocess and vvEngAPI::vvRecognize calls. Once set, the values are retained in the engine until the OCR session is ended or vvEngAPI::vvInitValues is called.

    Reading image data from memory:

    If you are reading your image data from memory, then you do not need to call vvEngAPI::vvOpenImageFile or vvEngAPI::vvCloseImageFile. You just need to call vvEngAPI::vvReadImageData and vvEngAPI::vvUnloadImage.

    Ordering of output actions versus input actions:

    The actions used to start, write, and close the output document do not have to take place in the exact order shown above; the output document is flexible with respect to the input document. Recognition must take place before vvEngAPI::vvSpoolDoc may be called, but otherwise vvEngAPI::vvStartDoc can be called any time between vvEngAPI::vvStartOCRSes and vvEngAPI::vvSpoolDoc, and vvEngAPI::vvEndDoc may be called any time between vvEngAPI::vvSpoolDoc and vvEngAPI::vvEndOCRSes. vvEngAPI::vvSpoolDoc may be called multiple times to write multiple output pages.

    For more information:

    See the actions reference in this documentation for the basic sequence of actions and further description of handling data, input, and output.

    Further down on the same page, the State table for actions describes in detail how the engine state works. Many of the actions can be considered stack-like: after you start an output document with vvEngAPI::vvStartDoc, you must close it with vvEngAPI::vvEndDoc before you can exit the OCR session; after you load image data into the engine with vvEngAPI::vvReadImageData, you must unload it with vvEngAPI::vvUnloadImage before you can load any new image data. Actions such as vvEngAPI::vvPreprocess and vvEngAPI::vvRecognize are a little different; they may be called multiple times but must obey a certain ordering: vvEngAPI::vvPreprocess must be called before vvEngAPI::vvRecognize, and vvEngAPI::vvRecognize before vvEngAPI::vvSpoolDoc.


Custom regions

  1. How do I tell the engine to not divide the image into regions?

    Before calling vvEngAPI::vvPreprocess, make sure you set the value dm_pp_auto_segment to vvNo.
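
    For example (a sketch; the exact vvSetValue calling convention is as defined in the API headers):

    engine.vvSetValue( "dm_pp_auto_segment", vvNo );  /* disable auto-segmentation */
    engine.vvPreprocess( /* ... */ );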

  2. How do I create my own regions?

    When the engine preprocesses an image during the vvEngAPI::vvPreprocess call, it will automatically segment the input image if the preprocessing value dm_pp_auto_segment is set to vvYes, or leave the image undivided if it is set to vvNo. In either case, you can create user-defined regions.

    To get a list of the current regions, get the value of dm_region_ids from the engine by using the vvEngAPI::vvGetValue function.
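
    For example (a sketch; the exact vvGetValue calling convention is as defined in the API headers):

    engine.vvGetValue( "dm_region_ids", &regionIds );  /* current region ids */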

    To create a new region, set its properties, including its UOR (see the next answer), and commit them with vvEngAPI::vvSetRegionProperties.

    A sample program to demonstrate creation of a new region is available upon request.

    Note that regions may also be deleted; see the function vvEngAPI::vvRemoveRegion.


  3. What is a UOR?

    "\ref UOR" stands for "union of rectangles" and is used to describe the bounding box of a region. The value dm_region_uor_string defines the UOR for a region.

    The UOR for a region may include one or more rectangles. Multiple rectangles permit oddly shaped regions, which is important for documents where text and images appear close together, such that no single rectangle can encompass an entire text region without including part of what should be an image region.

    To set the UOR for a region:
    Using the function vvEngAPI::vvSetValue, first set the current region (dm_current_region), then set the UOR definition (dm_region_uor_string) and the number of rectangles (dm_region_uor_count). Finally, commit the region information in the OCR engine with a call to vvEngAPI::vvSetRegionProperties; a sketch follows the format description below.

    Formatting the UOR string:
    The UOR string (dm_region_uor_string) must be formatted correctly, and the region count (dm_region_uor_count) must be set accurately, for your application to work. In dm_region_uor_string, coordinates are separated by commas, rectangles are separated by semicolons, and the string must contain no whitespace.

    For example, if you want a region to consist of one rectangle with corners (400,800) and (600,1400), then you would set dm_region_uor_string to 400,800,600,1400 and dm_region_uor_count to 1.

    In general, the format of the dm_region_uor_string should be:

    x1,y1,x2,y2;x3,y3,x4,y4;x5,y5,x6,y6

    to specify three rectangles defined conceptually:

    rectangle one: (x1,y1) (x2,y2)
    rectangle two: (x3,y3) (x4,y4)
    rectangle three: (x5,y5) (x6,y6)
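
    Putting these steps together (a sketch, with the same caveat that the exact vvSetValue calling convention is as defined in the API headers):

    /* Define a region as a union of two rectangles, using the
       x1,y1,x2,y2;x3,y3,x4,y4 format described above. */
    engine.vvSetValue( "dm_current_region", regionId );
    engine.vvSetValue( "dm_region_uor_string", "400,800,600,1400;600,800,700,1000" );
    engine.vvSetValue( "dm_region_uor_count", 2 );
    engine.vvSetRegionProperties();  /* commit the region in the OCR engine */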


File Formats

  1. What are image PDFs versus text PDFs?

    A PDF document consists of any combination of text and bitmap images embedded in a PDF file. It may also contain structural information used for formatting and interactive features such as hyperlinks.

    Because of the flexibility of the PDF file format, a PDF file may be used as an "image" file or as a "text" file. When used as an "image", PDF files are commonly used as optical character recognition (OCR) input, and when used as a "text" document, PDF files are often used as OCR output.

    OCR is the process of converting bitmap image data into text data, so it should be clear which types of PDF files are appropriate as OCR input and which as OCR output.

    In general, one may identify three types of PDF documents:

    1. Image PDFs contain only bitmap page images, such as scanned pages; these are appropriate OCR input.

    2. Normal PDFs contain real text, with fonts and formatting; these are a common OCR output format.

    3. Image+text PDFs contain the page image together with a corresponding text layer; these are also produced as OCR output.

    Note that Vividata's OCR applications will accept many Normal PDFs and Image+text PDFs as input. However, this can result in information loss, because the OCR application renders the input PDF from text into bitmap data, then performs OCR on the bitmap data to convert it back to text. As a result, we do not recommend using Vividata's OCR applications to extract text from Normal PDFs and Image+text PDFs; instead, use a utility such as "pdftotext" to pull text data directly from a text PDF.
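
    For example, from the command line:

    pdftotext document.pdf document.txt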

  2. What is the XDOC format and how do I use it?

    The XDOC format is a ScanSoft text output format which provides detailed information about the text, images, and formatting in a recognized document.

    To use XDOC output, set the output document format to one of the supported XDOC output types.


    Several engine values associated with XDOC output can be set to affect the information written to the XDOC output.


    Reference files included with the API in /opt/Vividata/doc provide specific information about the XDOC format, in enough detail for a user to parse the output.

  3. What do the font family abbreviations stand for in the XDOC output?

    In XDOC output, the font family is represented by a short abbreviation.

    Rather than detecting specific fonts, the OCR engine detects the features of a font, such as whether it has serifs, in order to assign it to a font family.

