OCR Shop XTR Lite Quickstart                              January 7, 2013 v6.0
Download, Installation, Usage, and Troubleshooting Guide

*** OCR Shop XTR Lite now supports 56 different languages! Read below to ***
*** learn how to recognize documents in your language.                   ***

===============================================================================
===============================================================================

OCR Shop XTR Lite Quickstart

   1. Documentation and contact information
   2. Downloading the software and license
   3. Installation
   4. Licensing
   5. Running xtrclilite
   6. Troubleshooting
   7. Appendix

===============================================================================
1. Documentation and contact information

Additional documentation is included in the software distribution and
available from www.vividata.com under "Documentation":

   xtrclilite_quickstart.txt
      Installation, usage, and troubleshooting
   xtrclilite_readme.txt
      An overview of the distributed documentation and files
   xtrclilite_manual.pdf
      Comprehensive user documentation
   xtrclilite_release_notes.pdf
      OCR Shop XTR Lite 5.6 release notes

For questions or purchase information:

   * Call Vividata at (U.S.A.) 1-510-658-6587
   * Click "Contact Sales" at www.vividata.com

For Vividata technical support:

   * To contact technical support, click "Contact Support" at
     www.vividata.com
   * To download documentation, click "Documentation" at www.vividata.com

===============================================================================
2. OCR Shop XTR Lite Quickstart: Downloading the software and license

If you already have your OCR Shop XTR Lite software distribution and license
key, go to Section 3.

To download the OCR Shop XTR Lite software distribution and license:

Step 1: Account and evaluation set-up

   New users:

   * Go to www.vividata.com to Register & Evaluate OCR Shop XTR Lite.
   * The website will prompt you for the hostname and machine id of your
     system during the registration process to create your evaluation
     license key.
     See "How to gather system information needed to create a license key"
     in the Appendix.
   * After registration, your password will be emailed to you.

   Current customers:

   * Log in to your Vividata account at www.vividata.com, using the
     password emailed to you after registration.
   * Click "New Evaluation" to register for an evaluation of OCR Shop XTR
     Lite.

   To extend an evaluation license or purchase OCR Shop XTR Lite:

     Call Vividata at (U.S.A.) 1-510-658-6587, or click "Contact Sales" at
     www.vividata.com. We will create a new license key for you after you
     make your request or purchase.

   Users who have purchased OCR Shop XTR Lite:

     You should already have a Vividata account, as well as access to the
     software and license key. Contact sales if you have questions about
     your account.

Step 2: Download the software distribution

   * Log in at www.vividata.com.
   * Click "Downloads".
   * Click the link corresponding to OCR Shop XTR Lite and your operating
     system. Save the distribution to your computer.
   * Verify that root has read and execute permission for the distribution
     file. If not, run the command "chmod 777 <filename>", where <filename>
     is the name of the distribution file.

Step 3: Download your license

   Your license keys are always available for download from Vividata's
   website.

   * Log in at www.vividata.com.
   * Click "License Keys".
   * Find your OCR Shop XTR Lite key and click "Download this key". Save
     the file to your computer.

Next, go to Section 3 to install OCR Shop XTR Lite.

===============================================================================
3. OCR Shop XTR Lite Quickstart: Installation

OCR Shop XTR Lite installs in /opt/Vividata by default. To install in an
alternate directory, set the environment variable VV_HOME to the directory
path before installing the software. Refer to Chapter 2 of the OCR Shop XTR
Lite manual for complete installation instructions.
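
As a minimal sketch of the VV_HOME convention described above, the following
Bourne-shell helper prints the directory the software is expected to use: the
value of VV_HOME when it is set, otherwise the default /opt/Vividata. The
function name vv_home_dir is hypothetical and is not part of the distribution,
and the sketch assumes that an empty VV_HOME should be treated the same as an
unset one.

```shell
#!/bin/sh
# vv_home_dir is a hypothetical helper for illustration; it is NOT a
# Vividata tool. It prints VV_HOME if set and non-empty, otherwise the
# documented default installation directory.
vv_home_dir() {
    printf '%s\n' "${VV_HOME:-/opt/Vividata}"
}

echo "Vividata directory: $(vv_home_dir)"
echo "Binary directory:   $(vv_home_dir)/bin"
```

Running the sketch before and after exporting VV_HOME is a quick way to
confirm which directory a later install or license step would target.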

If you have other Vividata software installed, the OCR Shop XTR Lite software
and license key will install alongside that other software, and you will
continue to be able to use all Vividata software.

To install OCR Shop XTR Lite:

   1. Log in as root.

   2. Run the distribution. For example:

         ./xtrclilite-linux-5.6r1

      The name of your distribution file may differ based on OS and version.

   3. If the installation completes with no errors, then OCR Shop XTR Lite
      is now installed. See "Installed files" in the Appendix below for a
      complete list of installed files.

      If the installer reported errors, refer to "Troubleshooting" below,
      or contact Vividata (see above).

Next, go to Section 4 to install your license key.

===============================================================================
4. OCR Shop XTR Lite Quickstart: Licensing

After installing OCR Shop XTR Lite, you must install a license key before you
may run the software.

To install your license key:

   When you download your license, the license key is wrapped in a shell
   script that will install it.

   1. Log in as root and run the script to place the license key in
      /opt/Vividata/config/vvlicense.dat:

         su root
         sh key.sh

   2. Your license key is now installed. Go to Section 5 to test OCR Shop
      XTR Lite.

*****
If OCR Shop XTR Lite is installed in a non-default directory:

   You must set the environment variable VV_HOME to the Vividata directory
   before running the license key install script.
*****

How licensing works:

   After running the license key install script, your license key will be
   located in /opt/Vividata/config/vvlicense.dat, unless you are using a
   different Vividata directory.

   The license manager starts automatically when you first run OCR Shop XTR
   Lite. The license manager will continue to run in the background after
   OCR Shop XTR Lite exits. The process will be named "vvlicense" or
   "xtrclilite", depending on your system.

   Several license manager utilities are distributed with OCR Shop XTR Lite
   and placed in /opt/Vividata/bin on installation:

   vvlmstop        Resets the license manager.
   vvlmhostid      Reports the machine id of your system.
   vvlmreread      Has the currently running license manager reload the
                   license file.
   vvlmstatus      Reports the status of the currently running license
                   manager.
   xtrapiKeyRead   Decodes a license key string, listing the options the
                   key enables, whether the key is permanent or expiring,
                   and the host information.

===============================================================================
5. OCR Shop XTR Lite Quickstart: Running xtrclilite

Log in as a normal user (not root) to test OCR Shop XTR Lite.

Sample input files are provided in /opt/Vividata/docs/sample_images/.

The OCR Shop XTR Lite binary is installed by default in
/opt/Vividata/bin/xtrclilite. Either place /opt/Vividata/bin in your PATH
environment variable, or use the fully qualified path when invoking
xtrclilite.

   A. Print out usage instructions and command-line options:

         xtrclilite --help

   B. Recognize an input image, using the default language (English) and
      default output format (ASCII):

         xtrclilite letter.tif out.txt

   C. Recognize an input image in French:

         xtrclilite french_german.tif out_french.txt -l french

   D. The sample image french_german.tif also contains German text. The
      results will be better if we use both French and German to recognize
      the document:

         xtrclilite french_german.tif out_french_german.txt -l german,french

   E. Recognize an input image in Russian and English, which requires
      Unicode output:

         xtrclilite cyrillic_with_english.tif out_cyrillic_eng.txt
                    -l russian,asciieng -o unicode

      (Enter the command above on a single line.)

      * Some languages are not supported by ASCII text and require the use
        of Unicode output. Refer to the OCR Shop XTR Lite manual for a
        list.
      * When you view a Unicode output file, make sure your viewer supports
        Unicode.
      * Go to www.unicode.org for more information about Unicode.

   F.
      Recognize a PDF input file, using the defaults for language and
      output filetype:

         xtrclilite letter.pdf out.txt

Supported input image file formats include:

   PDF, PS, GIF, JPEG, TIFF, multipage TIFF, PBM, and RGB.

   PDF and PS input documents are licensed separately when you make your
   purchase.

56 languages are supported:

   You may select one or more on the command-line. Run the command-line
   help or refer to the manual for a list of supported languages.

   Languages are licensed separately when you make your purchase.

xtrclilite exits with a return code of 0 on success, or a negative number on
failure.

For best results:

   * High contrast images result in better recognition. (A high contrast
     image shows, for example, black text on a white background, compared
     to a low contrast image with gray text on an off-white background.)
   * If you are scanning your own images, a resolution of 300x300 dpi is
     ideal; however, resolutions from 72 dpi to 900 dpi are supported.

To improve your OCR results, see "Improving recognition results" under
"Troubleshooting" below.

*****
If OCR Shop XTR Lite is installed in a non-default directory:

   * Set the environment variable VV_HOME to the directory where OCR Shop
     XTR Lite resides before running xtrclilite. For example, if OCR Shop
     XTR Lite is in /opt/software/Vividata, set:

        In tcsh:   setenv VV_HOME /opt/software/Vividata
        In bash:   VV_HOME=/opt/software/Vividata; export VV_HOME
*****

===============================================================================
6. OCR Shop XTR Lite Quickstart: Troubleshooting

   * Improving recognition results
   * Input errors
   * Installation errors
   * Licensing errors
   * Software crashes and problems

-------------------------------------------------------------------------------
Improving recognition results

The tips below will help you recognize problem images with OCR Shop XTR Lite.
If these suggestions do not help, the full OCR Shop XTR will provide a better
solution by giving you more control and flexibility.
An evaluation of OCR Shop XTR is available from Vividata's website, or please
contact Vividata directly (see above).

* Non-English text:

   OCR Shop XTR Lite recognizes the English language by default. If your
   documents are in a language other than English, be sure to set the
   language command-line option or the VV_LANGUAGE environment variable.

* Low contrast grayscale or color images:

   In low contrast images, the text is not significantly darker than the
   background color. Sometimes this can cause recognition problems.

   OCR Shop XTR Lite converts 8-bit and 24-bit images to binary image data
   prior to recognition. As a result, information might be lost when the
   input image is very low in contrast.

   What to do:

   We recommend using OCR Shop XTR, which has built-in controls for
   increasing contrast and adjusting the threshold at which input images
   are binarized. An evaluation of OCR Shop XTR is available from
   Vividata's website, or please contact Vividata directly (see above).

   Otherwise, if you have control over the scanning process, increase the
   contrast when you scan your images. Alternatively, preprocess your image
   files to increase the contrast after scanning and before passing them to
   OCR Shop XTR Lite.

* Input image resolution and font size

   Make sure the resolution of the input image, as well as the font size
   with respect to that resolution, are within normal limits.

   OCR Shop XTR Lite accepts:

      - Image resolutions from 72 dpi to 900 dpi
      - Fonts from 5 to 72 points

   The resolution of the input image determines what one "point" means in
   the font point size. The resolution of the input image is specified in
   the input image file, or, when not specified, is assumed to be 300x300
   dpi by default.

      - There are 72 points per inch.
      - The point size of a font is measured from the top of the highest
        ascender to the bottom of the lowest descender.
      - The dpi specifies the number of pixels per inch.

   If the type in your image is particularly large or small, it might fall
   outside the accepted font point sizes, depending on the image
   resolution.

   For example, if your font size is 15 pixels high and the image
   resolution is 300 dpi, then the font point size is approximately 3.6
   points, too small for the engine to recognize well. Similarly, if your
   font size is 80 pixels high and the image resolution is 72 dpi, then the
   font point size is approximately 80 points, too large for the engine to
   recognize well.

   You may approximate the point size of your font with the equation:

      [height of font in pixels] * 72 points/inch / [image dpi]
         = [point size of font]

   Remember the height of the font is measured from the top of the highest
   ascender to the bottom of the lowest descender. If you count the pixels,
   make sure you view that portion of your image at full resolution on your
   screen, sometimes referred to as "actual pixels".

   What to do:

   If you suspect the resolution of your input image is causing the fonts
   to fall outside the accepted range, or if your input image resolution is
   larger or smaller than the supported resolutions, adjust the settings on
   your scanner and scan your images at a supported resolution. A good
   starting point is 300x300 dpi.

   If you need more flexibility, Vividata's full OCR Shop XTR software
   includes a command-line option to override the input image resolution.
   Overriding the input image resolution lets you recognize a wider variety
   of input images with unusual resolutions or resolutions outside of the
   supported range.

   Without the full OCR Shop XTR, if you do not have control over the
   scanning process, you will need to preprocess your image yourself. Try
   using an image processing application to change the resolution of your
   input image before recognizing it with OCR Shop XTR Lite.
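
The point-size equation above can be checked from the command line. The
following is a minimal sketch using standard awk; the function names
point_size and font_in_range are hypothetical helpers invented for this
illustration and are not part of OCR Shop XTR Lite. The 5 to 72 point limits
are the accepted range stated earlier in this section.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not Vividata tools.

# Usage: point_size HEIGHT_IN_PIXELS IMAGE_DPI
# Prints the approximate font point size: pixels * 72 / dpi.
point_size() {
    awk -v px="$1" -v dpi="$2" 'BEGIN { printf "%.1f\n", px * 72 / dpi }'
}

# Usage: font_in_range HEIGHT_IN_PIXELS IMAGE_DPI
# Exit status 0 when the computed size is within 5..72 points, else 1.
font_in_range() {
    awk -v px="$1" -v dpi="$2" 'BEGIN {
        pt = px * 72 / dpi
        exit !(pt >= 5 && pt <= 72)
    }'
}

point_size 15 300    # prints 3.6  (below the 5-point minimum)
point_size 80 72     # prints 80.0 (above the 72-point maximum)

if font_in_range 40 300; then
    echo "40 pixels at 300 dpi is within the accepted range"
fi
```

Measure the pixel height from the top of the highest ascender to the bottom
of the lowest descender, as described above, before passing it to either
helper.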

* TIFF fill-order bit

   If your input image is a TIFF file and your results are much worse than
   you expect, given the quality and properties of the input image, it is
   possible that the "fill-order" bit is set in the input file and needs to
   be obeyed.

   Most TIFF images do not use the fill-order bit, and most that do set it,
   set it incorrectly. However, a very small number of TIFF images do have
   the fill-order bit set correctly, and in these cases it should be
   obeyed. OCR Shop XTR Lite ignores the TIFF fill-order bit by default,
   but allows the user to choose to obey it.

   What to do:

   If you suspect the fill-order bit is set in your input TIFF image, set
   the environment variable VV_IGNORE_FILLORDER to "n", and OCR Shop XTR
   Lite will obey the TIFF fill-order bit. For example:

      In tcsh:   setenv VV_IGNORE_FILLORDER n
      In bash:   VV_IGNORE_FILLORDER=n; export VV_IGNORE_FILLORDER

   Then run OCR Shop XTR Lite again using the same input file.

-------------------------------------------------------------------------------
Input errors

   * Check that the input file exists and has read permission for the user
     calling xtrclilite.

   * Check that the input file is valid and not corrupt.

   * Check that the input file is in one of the supported input formats.

   * If the input file is a PDF or PostScript (PS) file, make sure your
     license permits PDF/PS input. All evaluation licenses permit PDF/PS
     input, but purchased licenses only include PDF/PS input when you
     purchase it specifically.

-------------------------------------------------------------------------------
Installation errors

   * Check permissions:

     If you are installing to the default directory, make sure you are
     logged in as root and that root has permission to install to
     /opt/Vividata.

     If you are installing to a non-default directory, make sure the
     environment variable VV_HOME points to the correct directory, and that
     the user who runs the distribution has write permission for that
     directory.

   * If the installation error, "Failed to extract files," is reported, the
     self-extracting executable may be truncated or corrupt. Try
     downloading the software from Vividata's website again, and verify
     that the download completes successfully.

   * If you are still unable to install, contact Vividata support (see
     above). Please send any error messages printed to the console, a copy
     of the log file /tmp/vvInstall.log, as well as the results of these
     commands:

        md5sum xtrclilite-linux-5.1r1
        ls -l xtrclilite-linux-5.1r1
        uname -a

     Note: The name of your distribution file may differ based on OS and
     version.

-------------------------------------------------------------------------------
Licensing errors

If OCR Shop XTR Lite reports a licensing error, first reset the license
manager and then run xtrclilite again:

   /opt/Vividata/bin/vvlmstop
   /opt/Vividata/bin/xtrclilite image.tif out.txt

If the licensing error persists, please verify that your license key matches
your machine and has not expired:

   1. Obtain your machine id by running:

         /opt/Vividata/bin/vvlmhostid

   2. Decode your license key by running:

         /opt/Vividata/bin/xtrapiKeyRead -k <key>

      where <key> is the encoded string contained in your license file,
      /opt/Vividata/config/vvlicense.dat.

   3. Look at the xtrapiKeyRead output and check that the machine id
      matches the one printed when you ran vvlmhostid. Verify that your
      license key has not expired.

If your license key looks correct, please check the owner and permissions of
the license manager log:

   * The license manager log is created in /tmp/vvLicense.log when the
     license manager is started on the first run of xtrclilite. If the
     environment variable TMP or TMP_DIR was set to a different directory
     when the license manager was started, vvLicense.log will be placed in
     that directory.

   * Check that you either own or have write permission to vvLicense.log.
     If you do not, xtrclilite may have trouble running and may report a
     license error.

   * If you suspect this is the case, remove vvLicense.log or turn on
     universal write permissions for the file. After doing this, shut down
     the license manager and restart it by running xtrclilite again. For
     example:

        rm /tmp/vvLicense.log
        /opt/Vividata/bin/vvlmstop
        /opt/Vividata/bin/xtrclilite image.tif out.txt

If your license key seems incorrect or you are still unable to run the
software after trying the above steps, contact Vividata support (see above).

-------------------------------------------------------------------------------
Software crashes and problems

If OCR Shop XTR Lite reports an error or crashes, please:

   1. Set an environment variable called VV_DEBUG to 1000 to turn on
      verbose debug output; for example:

         In tcsh:   setenv VV_DEBUG 1000
         In bash:   VV_DEBUG=1000; export VV_DEBUG

      Then run the same command-line again.

   2. Contact Vividata support (see above) and send:

      - The command-line that caused the bug
      - The debug output and all error messages
      - The input image that triggered the problem, if possible
      - OCR Shop XTR Lite version number (run "xtrclilite --version")
      - System description (run "uname -a")
      - Any further description or information that would help us identify
        the problem

===============================================================================
7. Appendix

   * How to gather system information needed to create a license key
   * Installed files

-------------------------------------------------------------------------------
How to gather system information needed to create a license key

First, you need the hostname of the system where you plan to install OCR Shop
XTR Lite. Run this command to get the hostname of your machine:

   hostname

Next, you need the machine id of your machine.

On Solaris, run this command:

   hostid

On Linux, the MAC address is used for the machine id.
To find the MAC address, run this command, and look for the number after
"Ethernet HWaddr":

   ifconfig eth0

When you create your license key, the website will prompt you for the
hostname and machine id.

-------------------------------------------------------------------------------
Installed files

When you install OCR Shop XTR Lite, the following files will be placed on your
system:

/opt/Vividata/bin:
   xtrclilite      gs              xtrapiKeyRead
   vvlmhostid      vvlmreread      vvlmstatus      vvlmstop

/opt/Vividata/bin/linux or sun_5x:
   xtrclilite      gs              xtrapiKeyRead
   vvlmhostid      vvlmreread      vvlmstatus      vvlmstop

/opt/Vividata/docs:
   xtrclilite_README.txt
   xtrclilite_quickstart.txt
   xtrclilite_release_notes.txt
   xtrclilite_manual.pdf

/opt/Vividata/docs/sample_images:
   letter.tif
   cyrillic_with_english.tif
   letter.pdf
   french_german.tif

/opt/Vividata/config:
   vvlicense.dat (this file may not be installed until you install your
                  license key)

/opt/Vividata/ghostscript:
   Fontmap
   Fontmap.GS

/opt/Vividata/lib/langs:
   BALTIC.shp        LATIN2.shp     dutch.lng    greek.lng    port.lng
   CYRILLIC.shp      TURKISH.shp    english.lng  hungar.lng   russian.lng
   CharSetTable.chr  asciieng.lng   finnish.lng  italian.lng  spanish.lng
   GREEK.shp         czech.lng      french.lng   norsk.lng    swedish.lng
   LATIN1.shp        danish.lng     german.lng   polish.lng   turkish.lng

===============================================================================
===============================================================================