OCR Shop XTR README File
January 7, 2013

OCR Shop XTR is a powerful optical character recognition application,
using recognition technology from Scansoft/Nuance Communications
Corporation. OCR Shop XTR quickly and accurately converts printed pages
into readable text in one of many different output formats.

Current version of OCR Shop XTR: 6.0

Documentation files included with this release in /opt/Vividata/docs:

    ocrxtr_README.txt          This file
    ocrxtr_manual.pdf          Comprehensive user documentation
    ocrxtr_release_notes.txt   Release notes and known issues
    ocrxtr_tips.txt            Tips for improving OCR results and
                               performance

  To create output containing detailed content and structural information,
  along with confidence values, font metrics, and formatting:

    ocrxtr_xdoc.txt            Instructions for using the XDOC format
    core12xdc.pdf              Scansoft XDOC documentation
    kdoctext.h                 Supplemental XDOC information from Scansoft

  For forms processing and custom document segmentation:

    ocrxtr_rdiff.txt           Introduction to region description files
                               and forms processing
    rdiff_sample/              Files to accompany the rdiff example

Sample image files for testing OCR Shop XTR are included in
/opt/Vividata/bin/. Sample command-lines for use with these files are
found below, in the section "Running ocrxtr".

Most documentation files are also available on Vividata's website at
http://www.vividata.com/support_docs.html.

For questions or purchase information:

  * Click Contact Sales at http://www.vividata.com/sales_contact.html
  * Call Vividata at (USA) 1-510-658-6587

To contact Vividata technical support:

  * Click Contact Support at http://www.vividata.com/support_contact.html
    to send us an email.
  * If you registered with Vividata, you may reply to the email containing
    your Vividata password to send us an email directly.
  * For OCR Shop XTR documentation and help, go to
    http://www.vividata.com/support_docs.html

===============================================================================
===============================================================================

OCR Shop XTR Quickstart

  I    Installation
  II   Licensing
  III  Running ocrxtr
  IV   Troubleshooting

===============================================================================

I  OCR Shop XTR Quickstart: Installation

Refer to Chapter 2 of the OCR Shop XTR manual for complete installation
instructions.

To install OCR Shop XTR:

  1. Log in as root.

  2. Run the distribution, which is available for download when you log
     into your Vividata website account. For example:

       ./ocrxtr-linux-6.0.0

     Note: The name of your distribution file may differ based on OS and
     version.

  3. If the installation completes with no errors, then OCR Shop XTR is
     now installed in /opt/Vividata/. If the installer reported errors,
     see "Troubleshooting" below or contact Vividata (see above).

Now go to Section II to install your license key, or see below if you are
installing under special circumstances.

To install OCR Shop XTR in a custom directory:

  Before running the installer, set the environment variable VV_HOME to
  the directory where you wish to install OCR Shop XTR.

To install OCR Shop XTR as a non-root user:

  If you are installing as a non-root user, you probably cannot install to
  the default directory, /opt/Vividata/, because you will not have
  permission. As a result, before running the installer, you must set the
  environment variable VV_HOME to the directory where you would like to
  install OCR Shop XTR. Then you may run the installer as described above,
  and it will place OCR Shop XTR in the directory specified by VV_HOME.
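The custom-directory and non-root installs described above can be sketched
as a small shell function. This is an illustration only: the function name
install_ocrxtr and the example paths are hypothetical, and creating the
target directory before running the installer is an assumption, not a
documented requirement.

```shell
#!/bin/sh
# Sketch: install OCR Shop XTR into a custom directory.
# install_ocrxtr is a hypothetical helper, not part of the distribution.

install_ocrxtr() {
    # $1: target install directory, $2: path to the distribution file
    target_dir=$1
    installer=$2

    # Assumption: create the target directory up front and check that the
    # current user can write to it.
    mkdir -p "$target_dir" || return 1
    if [ ! -w "$target_dir" ]; then
        echo "No write permission for $target_dir" >&2
        return 1
    fi

    # VV_HOME tells the installer where to place OCR Shop XTR. It stays
    # exported in this shell so the license key install can reuse it.
    VV_HOME=$target_dir
    export VV_HOME

    "$installer"
}

# Example usage (hypothetical paths):
# install_ocrxtr "$HOME/Vividata" ./ocrxtr-linux-6.0.0
```

Keeping VV_HOME exported in the same shell session is convenient because
the license key install script in Section II relies on the same variable.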
===============================================================================

II  OCR Shop XTR Quickstart: Licensing

After installing OCR Shop XTR, you must install a license key before you
may run the software.

To obtain your license:

  You may download your license key from Vividata's website when you log
  into your account. Or you may have received it directly from Vividata by
  email. If you already have a license key, continue with "To install your
  license key."

  1. Creating your license key

     * New users: If you do not have a Vividata account, go to
       www.vividata.com to register and evaluate OCR Shop XTR. The website
       will prompt you for the hostname and hardware id of your system
       during the registration process to create your evaluation license
       key. See "How to gather system information needed to create a
       license key" below.

     * Existing users: If you have a Vividata account but have not
       evaluated OCR Shop XTR before, log in and select "New Evaluation"
       to create an evaluation license key for OCR Shop XTR.

     * To extend an evaluation license or purchase OCR Shop XTR: Call
       Vividata at (U.S.A.) 1-510-658-6587, or click Contact Sales at
       www.vividata.com. We will create a new license key for you after
       you make your request or purchase.

     How to gather system information needed to create a license key:

       First, you need the hostname of the system where you plan to
       install OCR Shop XTR. Run this command to get the hostname of your
       machine:

         hostname

       Next, you need the hardware id of your machine. On Solaris, run
       this command:

         hostid

       On Linux, the MAC address is used for the hardware id. To find the
       MAC address, run this command, and look for the number after
       "Ethernet HWaddr":

         ifconfig eth0

       When you create your license key, the website will prompt you for
       the hostname and machine id.

  2. Downloading your license key:

     Your license keys are always available for download from Vividata's
     website. Log in at www.vividata.com and click "License Keys."
     Click "Download this key" to download the license you wish to
     install.

To install your license key:

  When you receive your license key, it is wrapped in a shell script that
  will install the key.

  1. Log in as root and run the script to place the license key in
     /opt/Vividata/config/vvlicense.dat:

       su root
       sh key.sh

  2. Your license key is now installed. Go to Section III to test OCR Shop
     XTR. See below if you installed under special circumstances or would
     like information about how licensing works.

If you installed under special circumstances:

  * If you installed OCR Shop XTR in a directory other than the default
    /opt/Vividata: You must have the environment variable VV_HOME set to
    the Vividata install directory before running the license key install
    script.

  * If you installed OCR Shop XTR as a non-root user: You may also run the
    license key install script as a non-root user. Make sure that you set
    VV_HOME to the correct directory before installing the license key.

How licensing works:

  After running the license key install script, your license key will be
  located in /opt/Vividata/config/vvlicense.dat, unless you set VV_HOME to
  a different installation directory.

  The license manager starts automatically when you first run OCR Shop
  XTR. The license manager will continue to run in the background after
  OCR Shop XTR exits. The process will be named "vvlicense" or "ocrxtr",
  depending on your system.

  Several license manager utilities are distributed with OCR Shop XTR and
  placed in /opt/Vividata/bin on installation:

    vvlmstop        Resets the license manager.
    vvlmhostid      Reports the machine id of your system.
    vvlmreread      Has the currently running license manager reload the
                    license file.
    vvlmstatus      Reports the status of the currently running license
                    manager.
    xtrcliKeyRead   Decodes a license key string, listing the options the
                    key enables, whether the key is permanent or expiring,
                    and the host information.
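The license key install for a custom or non-root installation, described
earlier in this section, might look like the sketch below. The function
name install_license_key is hypothetical. The check assumes that with
VV_HOME set, the license file lands in $VV_HOME/config/vvlicense.dat,
which mirrors the default path but is not stated explicitly in this file.

```shell
#!/bin/sh
# Sketch: run the license key install script, optionally for a custom
# install directory, then confirm the license file exists.
# install_license_key is a hypothetical helper.

install_license_key() {
    # $1: key install script received from Vividata (for example key.sh)
    # $2: optional install directory; omit it for the default install
    key_script=$1
    if [ -n "${2:-}" ]; then
        VV_HOME=$2
        export VV_HOME
    fi

    sh "$key_script" || return 1

    # Assumption: the license file lives under config/ in the install
    # directory, as it does for the default /opt/Vividata location.
    license_file="${VV_HOME:-/opt/Vividata}/config/vvlicense.dat"
    if [ -f "$license_file" ]; then
        echo "License installed: $license_file"
    else
        echo "License file not found: $license_file" >&2
        return 1
    fi
}

# Example usage (hypothetical paths):
# install_license_key key.sh "$HOME/Vividata"
```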
===============================================================================

III  OCR Shop XTR Quickstart: Running ocrxtr

To test OCR Shop XTR, log in as a normal user and run the command-line:

    /opt/Vividata/bin/ocrxtr /opt/Vividata/bin/letter.tif

It should generate an output text file in the current directory called
out.letter.001. Make sure you have write permission for this directory!

To list all available command-line options, run:

    /opt/Vividata/bin/ocrxtr -help

Commonly used sample command-lines:

The sample image files used below are installed in /opt/Vividata/bin/.

  * Specify a custom output filename; in this example, the output file
    will be called "out.letter.txt":

      ocrxtr -out_text_name="out.%s.txt" letter.tif

    *** "%s" is replaced in the output filename by the input filename,
        without its extension. ***

  * Overwrite output files:

      ocrxtr -overwrite=y letter.tif

  * Improve recognition of a low contrast image by adjusting the threshold
    for converting it to binary data:

      ocrxtr -black_threshold=30 low_contrast.tif

  * Adjust the input image resolution to improve recognition of an image
    where the default dpi results in characters that fall outside the
    expected point range:

      ocrxtr -in_res=150 low_resolution.tif

  * Create an output PDF document containing a full page image of the
    original input file, with invisible, searchable recognized text
    underneath:

      ocrxtr -out_text_format=pdf letter.tif

    *** This PDF format is good for archiving recognized documents,
        because no information is lost when the original image data is
        retained in full. ***

    *** This PDF format is the default, but may be explicitly specified
        with the option "-pdf_format=img_text". ***
  * Create an output PDF document that intermixes text regions (OCR
    results) and images (corresponding to image regions) in a
    reconstruction of the original image's formatting:

      ocrxtr -out_text_format=pdf -pdf_format=normal letter.tif

    *** This PDF format is useful for understanding how the OCR engine
        handles your input images, because it shows you both the quality
        of the recognized text, and how well the OCR engine understands
        the formatting of the input image. ***

  * Preserve color image data from an input PDF file when an output PDF
    file is created:

      ocrxtr -out_depth=24 -out_text_format=pdf letter_24bit.pdf

    *** Setting out_depth to 8 or 24 increases the memory usage and
        processing time for PDF or PS input files. ***

    *** If "out_depth" were not set in this example, the output bit depth
        would be 1, the default for input PDF or PS documents. For other
        input formats, the bit depth of the input image would be retained
        in the output document by default. ***

  * Adjust the output bit-depth to reduce output PDF filesize when the
    input file has a bit-depth of 8 or 24:

      ocrxtr -out_depth=1 -out_text_format=pdf letter_24bit.tif

    *** In this example, setting out_depth to 1 reduces the output PDF
        filesize from 506K to 63K. ***

  * Create detailed output that includes word confidence values and
    formatting information:

      ocrxtr -out_text_format=xdocplus -xdoc_word_confidence=y letter.tif

  * Recognize text in languages other than English:

      ocrxtr -language=french,german french_german.tif

  * Recognize a non-Latin 1 language that is intermixed with some Latin 1
    characters:

      ocrxtr -language=russian -english_chars=y -out_text_format=unicode cyrillic_with_english.tif

    *** Make sure you view the output Unicode file in a text viewer that
        supports Unicode. ***

Please refer to the OCR Shop XTR manual (ocrxtr_manual.pdf) for tutorials
and more information about running OCR Shop XTR, and to ocrxtr_tips.txt
for suggestions on improving recognition quality and processing time.
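The sample command-lines above process one file at a time. A minimal sketch
of a batch wrapper follows. The function name ocr_batch and the OCRXTR
variable are hypothetical, and the sketch assumes that the -overwrite and
-out_text_name options can be combined on one command-line and that ocrxtr
exits with a non-zero status on failure. Neither is confirmed in this file.

```shell
#!/bin/sh
# Sketch: run ocrxtr over several input images in turn.
# OCRXTR and ocr_batch are hypothetical names used for illustration.

# Path to the ocrxtr binary; defaults to the standard install location.
OCRXTR=${OCRXTR:-/opt/Vividata/bin/ocrxtr}

ocr_batch() {
    # Arguments: one or more input image files
    failed=0
    for image in "$@"; do
        if [ ! -r "$image" ]; then
            echo "Skipping unreadable file: $image" >&2
            failed=$((failed + 1))
            continue
        fi
        # "%s" is replaced by the input filename without its extension.
        # Assumption: a non-zero exit status signals a failed run.
        "$OCRXTR" -overwrite=y -out_text_name="out.%s.txt" "$image" \
            || failed=$((failed + 1))
    done
    # Succeed only if every file was processed without error
    [ "$failed" -eq 0 ]
}

# Example usage:
# ocr_batch /opt/Vividata/bin/letter.tif /opt/Vividata/bin/low_contrast.tif
```

Output files are written to the current directory, so run the function
from a directory where you have write permission.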
===============================================================================

IV  OCR Shop XTR Quickstart: Troubleshooting

Installation errors

  If you are logged in as root, check that you have permission to install
  to /opt/Vividata or, if VV_HOME is set, to the custom directory
  specified by VV_HOME.

  If you are logged in as a non-root user, check that VV_HOME is set to a
  directory for which you have write permission.

  If the installation error, "Failed to extract files," is reported, the
  self-extracting executable may be truncated or corrupt. Try downloading
  the software from Vividata's website again, and verify that the download
  completes successfully. It may be helpful to use a different web browser
  if you continue to have trouble downloading the complete file.

  If you are still unable to install, please contact Vividata support (see
  above). Please send any error messages printed to the console, the log
  file /tmp/vvInstall.log, plus the results of these commands:

    md5sum ocrxtr-linux-6.0.0
    ls -l ocrxtr-linux-6.0.0
    uname -a

  Note: The name of your distribution file may differ based on OS and
  version.

Licensing errors

  If OCR Shop XTR reports a licensing error, first reset the license
  manager and then run ocrxtr again:

    /opt/Vividata/bin/vvlmstop
    /opt/Vividata/bin/ocrxtr letter.tif

  If the licensing error persists, please verify that your license key
  matches your machine and has not expired:

  1. Obtain your machine id by running:

       /opt/Vividata/bin/vvlmhostid

  2. Decode your license key by running:

       /opt/Vividata/bin/xtrcliKeyRead -k <key>

     where <key> is the encoded string contained in your license file,
     /opt/Vividata/config/vvlicense.dat.

  3. Look at the xtrcliKeyRead output and check that the machine id
     matches that printed out when you ran vvlmhostid. Verify that your
     license key has not expired.
  If your license key looks correct, you might check the owner and
  permissions of the license manager log:

  * The license manager log is created in /tmp/vvLicense.log when the
    license manager is started on the first run of ocrxtr. If the
    environment variable TMP or TMP_DIR was set to a different directory
    when the license manager was started, vvLicense.log will be placed in
    that directory.

  * If you do not own vvLicense.log and if you do not have write
    permissions for it, ocrxtr may have trouble running and may report a
    license error.

  * If you suspect this is the case, try removing vvLicense.log or
    changing the permissions of the file. After doing this, shut down the
    license manager and restart it by running ocrxtr again. For example:

      /opt/Vividata/bin/vvlmstop
      rm /tmp/vvLicense.log
      /opt/Vividata/bin/ocrxtr /opt/Vividata/bin/letter.tif

  If your license key seems incorrect or you are still unable to run the
  software after trying the above steps, please contact Vividata support
  (see above).

OCR Shop XTR crashes and errors

  If OCR Shop XTR reports an error or crashes, please:

  1. Run the same command-line again, with the added option "-debug=1" to
     turn on verbose debug output.

  2. Contact Vividata support (see above) and send:

     - The command-line that caused the bug
     - The debug output and all error messages
     - The input image that triggered the problem, if possible
     - OCR Shop XTR version number (run "ocrxtr -version")
     - System description (run "uname -a")
     - Any further description or information that would help us identify
       the problem

Poor OCR results

  If the results of OCR Shop XTR are poorer than you expect based on the
  appearance of the input image, please refer to ocrxtr_tips.txt. Please
  submit further questions to Vividata support (see above).

===============================================================================
===============================================================================