OCR Shop XTR 5.5.0 Release Notes
July 7, 2004


For additional information, please refer to ocrxtr_README.txt and the OCR Shop
XTR manual.  Both are installed along with OCR Shop XTR in /opt/Vividata/docs
and the manual is available at www.vividata.com under Customer Service.

For technical questions or purchase information, contact Vividata:

    * Click Contact Support at www.vividata.com
    * Call 1-510-658-6587


===============================================================================
Contents:

    * New Documentation
    * Major Updates and Bug Fixes
    * Hidden Options
    * Known Issues and Usage Notes
    * Major Updates from Previous Releases

===============================================================================
===============================================================================
New Documentation


OCR Shop XTR now comes with a Quickstart guide to help you install the
software and start using it.  The Quickstart guide is in
/opt/Vividata/docs/ocrxtr_README.txt.


New documentation files included in /opt/Vividata/docs:

    ocrxtr_README.txt               Installation and usage information
    ocrxtr_release_notes.txt        Release notes and known issues
    ocrxtr_tips.txt                 Tips for improving OCR results and performance

    To create output containing detailed content and structural information,
    along with confidence values, font metrics, and formatting:

        ocrxtr_xdoc.txt             Instructions for using the XDOC format
        core12xdc.pdf               Scansoft XDOC documentation
        kdoctext.h                  Supplemental XDOC information from Scansoft

    For forms processing and custom document segmentation:

        ocrxtr_rdiff.txt            Introduction to forms processing with OCR Shop XTR
        rdiff_sample/               Files to accompany the rdiff example


New sample input images included in /opt/Vividata/bin:

    cyrillic_with_english.tif
    letter_24bit.pdf
    letter_24bit.tif
    letter.tif
    low_resolution.tif
    french_german.tif
    letter_24bit.psd
    letter.pdf
    low_contrast.tif


===============================================================================
Major Updates and Bug Fixes


New features and behavior changes
---------------------------------

* On installation, the user may now set VV_HOME to any directory
  in order to install OCR Shop XTR there.  The default installation directory
  is /opt/Vividata.  In addition, a non-root user may install OCR Shop XTR, if
  VV_HOME is set to a directory for which the user has read/write permission.

* OCR Shop XTR no longer requires the user to set VV_HOME when running
  the software.  A user is only required to set VV_HOME if OCR Shop XTR is
  installed in a directory other than the default /opt/Vividata.

* The installer, license manager, and OCR Shop XTR application log and error
  messages have been expanded and improved.

* OCR Shop XTR now ignores the TIFF fillorder bit by default when a TIFF image
  is used as input.  If the user needs OCR Shop XTR to obey the TIFF fillorder
  bit, the environment variable "VV_IGNORE_FILLORDER" should be set to "n",
  "no", or "0", or the command-line parameter "ignore_tiff_fillorder" should
  be set to "n", "no", or "0".  Please see the OCR Shop XTR manual for more
  information.

  In previous versions of OCR Shop XTR, the TIFF fillorder bit was obeyed by
  default.  OCR Shop XTR now ignores the TIFF fillorder bit by default,
  because many TIFF writers set the fillorder bit incorrectly.  If an
  incorrectly set TIFF fillorder bit is obeyed when OCR Shop XTR reads a TIFF
  input file, the image data will not be recognizable as text.

* A signal handler now catches unexpected crashes in OCR Shop XTR, which
  enables OCR Shop XTR to return the license token and delete temporary files
  after a crash.  Previously, if OCR Shop XTR crashed, the user had to reset
  the license manager by hand; this is no longer necessary.

* Added a new command-line option "out_depth" to specify the bit depth of any
  output image data.  This affects PDF, HTML, and graphics output, and gives
  the user more control over the file size of these output formats.

  For PDF and PS input, the default "out_depth" is 1 bit per pixel.  For all
  other input image formats, the default "out_depth" is the bit depth of the
  input image.

* Improved the default processing time and memory usage for input PDF and PS
  files, and provided the user with control over the processing time and
  memory usage for PDF and PS input.

  The memory usage and processing times associated with a PDF or PS input file
  are most affected by the bit depth at which the input file is rendered.  By
  default, PDF and PS input files are now rendered at 1 bit per pixel.
  Previously, they were rendered at 24 bits per pixel, resulting in a much
  larger amount of image data to store and process, and potentially causing
  memory or swap file problems for large, multipage input PDF and PS files.

  Now, the bit depth at which PDF and PS input files are rendered is based on
  the setting of the "out_depth" option, so that OCR Shop XTR retains only as
  much image depth information as is needed in output PDF, HTML, or graphics.

  When creating PDF, HTML, or graphics output, the user should set "out_depth"
  to 8 or 24 to preserve 8-bit or 24-bit image data from an input PDF or PS
  file, keeping in mind the effect on memory usage and processing times.  If
  "out_depth" is not set, the default output bit depth will be 1, even if the
  input PDF or PS file contained 8-bit or 24-bit image data.

  The "out_depth" setting affects only how PDF and PS input files are
  rendered and the appearance of output image data.  It does not affect the
  recognition quality, because the OCR engine only processes 1-bit image data.

  Note:  The default "out_depth" when using other types of input images (JPEG,
  TIFF, etc.) is the bit depth of the input image.

* The default for the output PDF format has been changed from "normal" to
  "img_text".  The "img_text" format, in which the full-page original image
  appears with the invisible recognized text underneath, is the most commonly
  used.

* Improved error messages if the license log file cannot be opened.

* The resolution at which the input image is interpreted (controlled manually
  by the "in_res" option) now affects the appearance of PDF output.

* A new command-line option, "timeout", lets the user set a timer, in
  seconds, for recognition.
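
The default "out_depth" rules described in this section can be summarized as
a small shell function.  This is an illustrative sketch only:  the function
name default_out_depth is hypothetical, and recognizing PDF and PS input by
file extension is an assumption made for the example, not a description of
how OCR Shop XTR identifies its input.

```shell
# default_out_depth INPUT_FILE INPUT_BIT_DEPTH
# Print the output bit depth that applies when "out_depth" is not set:
# 1 for PDF and PS input, otherwise the bit depth of the input image.
default_out_depth() {
    case "$1" in
        *.pdf|*.PDF|*.ps|*.PS) echo 1 ;;
        *)                     echo "$2" ;;
    esac
}

default_out_depth letter_24bit.pdf 24    # prints 1
default_out_depth letter_24bit.tif 24    # prints 24
```

To keep 24-bit image data from a PDF or PS input file, "out_depth" must be
set explicitly.  Assuming it follows the same "-option=value" form as the
other parameters shown in these notes, that would be "-out_depth=24".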

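The environment settings described in this section can be tried from a shell
session.  The following is a minimal sketch in Bourne shell syntax, not an
official procedure.  The variable names VV_HOME and VV_IGNORE_FILLORDER come
from these notes; the directory /home/ocr/Vividata is a hypothetical example
of a non-default installation location.

```shell
# Hypothetical non-default installation directory.  Leave VV_HOME unset when
# OCR Shop XTR is installed in the default /opt/Vividata.
VV_HOME=/home/ocr/Vividata
export VV_HOME

# Obey the TIFF fillorder bit instead of ignoring it (ignoring is the 5.5.0
# default).  The values "n", "no", and "0" are equivalent.
VV_IGNORE_FILLORDER=n
export VV_IGNORE_FILLORDER

# Show the directory in effect: VV_HOME if it is set and non-empty,
# otherwise the default location.
echo "Installation directory: ${VV_HOME:-/opt/Vividata}"
```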

Bug fixes
---------

* The image data of input images with non-square resolutions, such as faxes,
  was rotated too much when deskewed for PDF, HTML, or graphics output,
  affecting the appearance of these output formats, but not the OCR results.

  Images with non-square resolutions are now deskewed by the correct amount
  in graphical output.

* Fixed an intermittent crash that occurred when creating full-page graphics
  output with the "output_whole_image" flag, if the image was deskewed.

* When the input image was deskewed, any PDF, HTML, or graphics output was
  converted to a bit-depth of 24 bits per pixel.  Now, the output bit-depth
  will match the input image's bit-depth.

  The format of the output image data will also affect the output bit-depth
  used, because, for instance, the JPEG format does not support a bit-depth of
  1.  In addition, the output bit-depth is also affected by the "out_depth"
  command-line option.

  The user should be aware that PDF and PS input files are considered to have
  a bit-depth of 1.

* For input images with a bit-depth of 8 or 24 bits per pixel, OCR Shop XTR
  could not create valid output graphics files in the tiff-g31d, tiff-g32d, or
  tiff-g42d formats.

  OCR Shop XTR now creates valid output graphics files in these formats in
  all cases.

* Fixed a "hidden" option, whole_page_image, to allow the user to generate a
  full-page image graphic.

* A segmentation fault occurred when a text file was passed to OCR Shop XTR
  instead of an image file.  This segmentation fault no longer occurs.

* Input PDF files with filenames that contain spaces are now processed
  successfully.

* OCR Shop XTR would finish without creating an output file or would crash
  when the user did not have permission to write the output file, temp files,
  or log files.

  OCR Shop XTR now quits gracefully with a coherent error message when the
  user does not have write permission.

* OCR Shop XTR did not always check whether the user's license permitted PDF
  and PS input.  Now, when the user sends a PDF or PS file as input, OCR
  Shop XTR always checks to ensure that PDF/PS input is licensed.  If PDF/PS
  input is not licensed, it will print an error message.

* Automatic orientation correction worked only on the first page of a
  multipage output PDF document.  Now, OCR Shop XTR correctly rotates every
  page of an output PDF document.

* In some cases, images were misplaced in PDF output or OCR Shop XTR crashed
  while creating the PDF output.  Images are now placed correctly in PDF
  output and OCR Shop XTR should not crash.

* A bus error (signal 10) occurred on some systems on exit when VV_HOME was
  set.  This bus error no longer occurs.

* Processing of certain image files in particular circumstances caused
  OCR Shop XTR to crash.  Now OCR Shop XTR processes these files successfully.

* Images were sometimes not placed correctly in PDF "normal" output.


===============================================================================
Hidden Options

These options are available for debugging purposes, but are not officially
supported and do not appear in the OCR Shop XTR documentation.


* "-debug=1" instructs ocrxtr to write verbose debug output to the console.
  Turn on this option if you encounter a problem, and wish to report it to
  Vividata.

* "-output_whole_image=y" allows the user to create graphics output of the
  entire input image.

* "-out_debug_files=y" makes ocrxtr create two output files for debug
  purposes only:  unconv_input_file and converted_input_file.
  "unconv_input_file" is created after the input file is read, placed in an
  internal data structure, and written back out as a TIFF image file.
  "converted_input_file" is created after ocrxtr converts the input image data
  to 1-bit image data prior to recognition; this output TIFF shows what data
  is sent to the OCR engine for processing.  Both of these files are multipage
  TIFF files; ocrxtr appends to them on subsequent runs, rather than
  overwriting them.
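
Because the two debug files are appended to on every run, it can help to
remove stale copies before reproducing a problem.  The sketch below assumes
the debug files are written to the current working directory, which these
notes do not state, and uses hypothetical function names.  Arguments given
to debug_run are passed to ocrxtr unchanged.

```shell
# Delete debug TIFFs left by earlier runs (assumed to be in the current
# directory) so the new files contain only pages from the next run.
clean_debug_files() {
    rm -f unconv_input_file converted_input_file
}

# Run ocrxtr with verbose console output and debug files enabled.
# All arguments are forwarded to ocrxtr as given.
debug_run() {
    clean_debug_files
    ocrxtr -debug=1 -out_debug_files=y "$@"
}
```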


===============================================================================
Known Issues and Usage Notes

* The "remove_halftone" option is not functional at this time.

* For GIF input files, the image dpi stored in the header is ignored in
  favor of the default resolution of 300x300.  If the in_res option is
  set, it will take precedence, as expected.

* With XDocPlus output, a temp file called "facp.@fa" may be created in the
  working directory and not cleaned up when OCR Shop XTR finishes. 

* Passing a binary file where a text file is expected will probably cause a
  bus error and crash OCR Shop XTR.  Setting any of these command-line
  options to the name of a binary file will trigger this bug: image_list,
  image_rdiff_list, user_lexicon, read_params.
  
* Deskewing is very slow for large image output, either through embedded
  document output or direct image output.  Deskewing occurs when the input
  image is crooked and must be rotated slightly for the engine to properly
  recognize the text; it may be turned off with the parameter "-deskew=N".
  Image output with deskewing is slow only when the image being output is
  about the same size as the original input document.  Outputting small image
  regions is reasonably fast.

* The user lexicon, defined by the file named in the "user_lexicon" option,
  and the character set, defined by the "char_set" option, may contain only
  ASCII characters, interpreted with respect to the loaded code page.  The
  user lexicon and character set do not support non-ASCII characters.
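
Two of the issues above, binary files passed where text files are expected
and non-ASCII characters in the user lexicon, can be caught before OCR Shop
XTR is started.  The following sketch uses only standard POSIX utilities;
the function name is_ascii_text is hypothetical, and treating tab, newline,
and carriage return as acceptable is an assumption made for the example.

```shell
# is_ascii_text FILE
# Succeed only if FILE is a readable regular file that holds nothing but
# printable ASCII, tab, newline, and carriage return.
is_ascii_text() {
    [ -f "$1" ] && [ -r "$1" ] || return 1
    # Delete every acceptable byte, then count what is left over.
    leftover=$(LC_ALL=C tr -d '\11\12\15\40-\176' < "$1" | wc -c | tr -d ' ')
    [ "$leftover" -eq 0 ]
}
```

A file can be checked this way before it is named in "user_lexicon",
"image_list", "image_rdiff_list", or "read_params", for example:

    is_ascii_text my_lexicon.txt || echo "my_lexicon.txt is not ASCII text"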


===============================================================================
Major Updates from Previous Releases

Release 5.1

* OCR Shop XTR now uses a new command-line based installer.

* OCR Shop XTR now also uses a new proprietary license manager.  Old
  FlexLM license keys will not work with the new license manager; please
  contact Vividata at http://www.vividata.com if you have questions about
  upgrading.

* PDF output has been upgraded, with improved compression for smaller output
  file sizes and improved operation.

* The conversion of 8-bit and 24-bit images to 1-bit has been updated to
  provide better OCR results.

===============================================================================
===============================================================================