Quickstart Guide to Region Description Files


This quickstart guide provides an overview and a worked example of region
description files (rdiff files), which are used in applications such as
forms processing.  Refer to the OCR Shop XTR manual for further details on
the rdiff format.

    Contents:

        What is an rdiff file?
        What is contained in an rdiff file?
        Uses for rdiff files
        Sample
        More on creating rdiff files and rdiff syntax


What is an rdiff file?

    A region description file, or "rdiff" file, describes the coordinates
    and properties of each region of an input image.  

    A region is an area that contains only text or only pictures, and has
    certain associated properties such as coordinates.

    When passed as input, an rdiff file allows the user to define custom
    regions for OCR Shop XTR to use during processing.

    The user may also generate an output rdiff file in order to gain
    information about how OCR Shop XTR segmented the input image.
    

What is contained in an rdiff file?

    An rdiff file describes all the regions of an input document.  It may be
    created by OCR Shop XTR as output, or created by a user and passed to OCR
    Shop XTR as input.
    
    Most regions are specified as a text or image region, indicating how the
    OCR engine treats and processes them:
        
        * Text regions are recognized and used to generate text output.
        * The OCR engine does not recognize image regions, but it will use
          them in PDF, HTML, or graphics output.
    
    A region's primary feature is the area of the input image that it covers.
    This area may overlap other regions, and it may contain any number of
    rectangles that, taken as a set, define the area of the region.  This set
    of rectangles is called a "union of rectangles" or "UOR".
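
    As a minimal sketch of the idea (it is not part of OCR Shop XTR), a UOR
    can be modeled as a list of rectangles, where a point belongs to the
    region if it falls inside any one of them.  The names Rect and
    region_contains are hypothetical, and the coordinate order used here
    (top, bottom, left, right) is an assumption; consult the OCR Shop XTR
    manual for the authoritative order.

```python
from typing import List, NamedTuple


class Rect(NamedTuple):
    # Field order is an assumption; check the manual for the real order.
    top: int
    bottom: int
    left: int
    right: int

    def contains(self, x: int, y: int) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def region_contains(uor: List[Rect], x: int, y: int) -> bool:
    """A point is in the region if any rectangle of the UOR contains it."""
    return any(rect.contains(x, y) for rect in uor)


# A hypothetical L-shaped region built from two rectangles.
l_shape = [Rect(top=100, bottom=200, left=100, right=400),
           Rect(top=200, bottom=400, left=100, right=200)]
print(region_contains(l_shape, 300, 150))  # point inside the first rectangle
print(region_contains(l_shape, 300, 300))  # point outside both rectangles
```

    Using several rectangles lets a single region cover a non-rectangular
    area, such as text that wraps around a picture.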

    Text regions in particular may contain additional descriptive details
    about the line height and stroke width of the characters, for example.

    The rdiff file also describes the order in which each region is written to
    an output file, and indicates which region is visible if regions overlap.
    
    Typically regions are defined automatically in the auto-segmentation step.
    However, by passing an rdiff file as input and turning off
    auto-segmentation, a user may define custom regions.


Uses for rdiff files

    Rdiff files are most useful for:

    * Forms processing: Permitting the user to specify custom regions to
      process a large number of similar documents.

    * Providing information about image segmentation and region properties to
      the user.


Sample

    The files used and generated for this sample are installed in:
        /opt/Vividata/bin/letter.tif
        /opt/Vividata/docs/rdiff_sample/*

    To create an rdiff file for use in forms processing, for example, first
    have OCR Shop XTR generate an rdiff file to use as a starting point.
    Pass OCR Shop XTR a sample image representative of the form you plan to
    process:

        ocrxtr -out_rdiff=initial.rdiff letter.tif

    OCR Shop XTR will create an rdiff file called "initial.rdiff".  The file
    in this example may be found in /opt/Vividata/docs/initial.rdiff, and the
    first few lines look like:

        BEGIN_IMAGE
        FILENAME  /w/kath/letter.tif
        IMAGE_WIDTH 2592
        IMAGE_HEIGHT 3412
        IMAGE_XRES 300
        IMAGE_YRES 300
        END_IMAGE
    
    To help you understand the rdiff file and regions, we suggest running OCR
    Shop XTR on the same image and generating text output by region:

        ocrxtr -output_text_by_region=y letter.tif

    OCR Shop XTR will create one output text file per text region, where each
    filename contains the corresponding region number.  If you view the text
    output files, the rdiff file initial.rdiff, and letter.tif, you can match
    up the text regions described in the initial.rdiff with the areas in the
    original image and the output in the text files.

    Make a copy of initial.rdiff to create the rdiff file that you will
    customize:

        cp initial.rdiff custom.rdiff

    Open custom.rdiff in a text editor.  In this example, use the rdiff
    file to tell OCR Shop XTR to recognize only the footer.  In custom.rdiff,
    eliminate all the regions except the one region we are interested in.
    Update the summary information in the rdiff file to reflect these changes.
    The edited custom.rdiff file looks like this:

        BEGIN_IMAGE 
        IMAGE_WIDTH 2592 
        IMAGE_HEIGHT 3412 
        IMAGE_XRES 300 
        IMAGE_YRES 300 
        END_IMAGE
 
        BEGIN_SUMMARY 
        TOTAL_REGIONS 1 
        TEXT_REGIONS 1 
        IMAGE_REGIONS 0 
        ORDER_FLAGS ANY 
        END_SUMMARY
 
        BEGIN_REGIONS  
 
        R_TYPE TEXT 
        R_SUBTYPE FOOTER 
        R_NUMBER 11 
        R_OUT_ORDER 2 
        LINE_HEIGHT 32 
        N_LINES 1 
        STROKE_WIDTH 4 
        ITALICNESS 2 
        ITALICNESS_CONFIDENCE 99 
        R_UOR_LIST 1 
        2808 2855 304 1527 

    Notice that we removed the FILENAME line, which was specific to the
    sample input file, from the BEGIN_IMAGE/END_IMAGE section.  The
    IMAGE_WIDTH, IMAGE_HEIGHT, IMAGE_XRES, and IMAGE_YRES values are also
    specific to the sample image; remove them as well if you anticipate that
    these values will vary with subsequent input files.

    Refer to Chapter 6 of the OCR Shop XTR manual for an explanation of each
    line in the rdiff file.
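
    When comparing an rdiff file against the image, it can help to load the
    file into a script.  The following is a rough sketch of a reader, not a
    tool shipped with OCR Shop XTR.  The function name parse_rdiff is
    hypothetical, and the parser assumes only the layout visible in the
    sample: one "KEY VALUE" pair per line, each region starting with R_TYPE,
    and R_UOR_LIST giving a count followed by that many lines of four
    integers.  Real rdiff files may contain constructs it does not handle.

```python
def parse_rdiff(text):
    """Parse rdiff text into (header, regions).

    Assumes the layout shown in the sample file; values are kept as strings
    except for the UOR rectangles, which become tuples of four integers.
    """
    header, regions, current = {}, [], None
    lines = iter(line.strip() for line in text.splitlines())
    for line in lines:
        if not line or line.startswith(("BEGIN_", "END_")):
            continue  # blank lines and section markers carry no values
        key, _, value = line.partition(" ")
        value = value.strip()
        if key == "R_TYPE":
            # R_TYPE is assumed to open a new region description.
            current = {"R_TYPE": value}
            regions.append(current)
        elif key == "R_UOR_LIST" and current is not None:
            # The value is the rectangle count; the rectangles follow.
            count = int(value)
            current["UOR"] = [tuple(int(n) for n in next(lines).split())
                              for _ in range(count)]
        elif current is not None:
            current[key] = value
        else:
            header[key] = value  # image and summary fields
    return header, regions


sample = """\
BEGIN_SUMMARY
TOTAL_REGIONS 1
TEXT_REGIONS 1
IMAGE_REGIONS 0
END_SUMMARY

BEGIN_REGIONS

R_TYPE TEXT
R_SUBTYPE FOOTER
R_NUMBER 11
R_OUT_ORDER 2
R_UOR_LIST 1
2808 2855 304 1527
"""
header, regions = parse_rdiff(sample)
print(header["TOTAL_REGIONS"], regions[0]["R_SUBTYPE"], regions[0]["UOR"])
```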

    Here are some other changes that could potentially be made to an rdiff
    file:

        * Eliminate further detailed information, such as a region's
          expected stroke width or line height, based on the expected
          consistency of the input files.

        * Change the area covered by the region we are recognizing, by
          modifying the listed coordinates or adding more rectangles to the
          union of rectangles (UOR).

        * Add additional image or text regions.  For example, open the
          sample input file in a graphical viewer to determine the coordinates
          of the rectangles covered by the new region, then add a section for
          the new region and update the summary information.

        * If multiple regions are defined in the rdiff file, control the order
          they appear in the output by adjusting R_OUT_ORDER.

    Now use custom.rdiff as input to recognize just the footer of any number
    of similarly formatted input files.  First test with the sample
    letter.tif.
    
    In order to use an rdiff file as input to OCR Shop XTR, create a text
    file, rdifflist.txt, that specifies the rdiff file to be used with each
    input file:

        letter.tif custom.rdiff
    
    This list permits you to create, for example, several rdiff files based on
    several different forms, then to match each input file to the rdiff file
    corresponding to the correct form.
    
    Now run OCR Shop XTR using rdifflist.txt:

        ocrxtr -image_rdiff_list=rdifflist.txt -auto_segment=n

    Turn auto-segmentation off so that OCR Shop XTR strictly obeys the rdiff
    file passed as input.

    After OCR Shop XTR completes, view the created output file,
    out.letter.001, which contains only the footer, as specified in the rdiff
    file:

        Reprinted from PC Magazine, © Ziff Davis Publishing Co., January 1998

    The next step would be to edit rdifflist.txt again and list all the images
    you wish to process in the same manner, then run OCR Shop XTR again with
    rdifflist.txt.
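
    For a large batch, the list file can be generated rather than typed by
    hand.  The sketch below assumes the list file holds one "image rdiff"
    pair per line, separated by a space, as in the single-line example
    above.  The function name build_rdiff_list and the file names
    invoice.tif and invoice.rdiff are hypothetical.

```python
def build_rdiff_list(pairs):
    """Format (image, rdiff) pairs as list-file text, one pair per line."""
    lines = []
    for image, rdiff in pairs:
        # Assumption: the two fields are separated by whitespace, so a name
        # containing a space could not be read back unambiguously.
        if " " in image or " " in rdiff:
            raise ValueError(f"name contains a space: {image!r} {rdiff!r}")
        lines.append(f"{image} {rdiff}\n")
    return "".join(lines)


# invoice.tif and invoice.rdiff stand in for a second, different form.
pairs = [("letter.tif", "custom.rdiff"), ("invoice.tif", "invoice.rdiff")]
print(build_rdiff_list(pairs), end="")
```

    Save the printed text as rdifflist.txt, then run ocrxtr with
    -image_rdiff_list=rdifflist.txt as shown earlier.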


More on creating rdiff files and rdiff syntax

    Rdiff files serve two purposes:  They are passed as input to ocrxtr and
    are generated as output from ocrxtr.  

    Forms processing applications usually involve the first case, where an
    input rdiff file is needed to process a large number of files with similar
    layouts.  Generating an rdiff file as output from ocrxtr is often the
    easiest way for someone unfamiliar with the rdiff format to get started
    creating their own rdiff file to use as input.

    However, when you generate an output rdiff file to use as a template for
    your input rdiff file, it will contain a large number of fields that are
    not useful in input rdiff files.

    The minimum header fields needed in an input rdiff file are:

    BEGIN_IMAGE 
    END_IMAGE
 
    BEGIN_SUMMARY 
    TOTAL_REGIONS 
    TEXT_REGIONS
    IMAGE_REGIONS
    END_SUMMARY

    The minimum fields needed to describe each region are:

    R_TYPE
    R_NUMBER
    R_OUT_ORDER
    R_UOR_LIST (includes the UOR count and the UOR list)

    In most cases these fields provide all the information needed for forms
    processing.  When creating an input rdiff file, it is best to start with
    just these fields.  Then add more detail only if you specifically need it.
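
    As an illustration, the minimum fields listed above can be written out
    by a short script.  This is a sketch under several assumptions, and it
    has not been checked against ocrxtr.  The function name minimal_rdiff is
    hypothetical, regions are numbered and ordered from 1 in the order
    given, any region type other than TEXT is counted as an image region,
    and the layout simply mirrors the sample file in this guide.

```python
def minimal_rdiff(regions):
    """Build minimal input rdiff text.

    regions: list of (r_type, rectangles) pairs, where rectangles is a list
    of 4-integer tuples already in the coordinate order the rdiff format
    expects.
    """
    text_count = sum(1 for r_type, _ in regions if r_type == "TEXT")
    out = ["BEGIN_IMAGE", "END_IMAGE", "",
           "BEGIN_SUMMARY",
           f"TOTAL_REGIONS {len(regions)}",
           f"TEXT_REGIONS {text_count}",
           # Assumption: every non-TEXT region counts as an image region.
           f"IMAGE_REGIONS {len(regions) - text_count}",
           "END_SUMMARY", "",
           "BEGIN_REGIONS"]
    for number, (r_type, rects) in enumerate(regions, start=1):
        # Assumption: numbering and output order both start at 1.
        out += ["", f"R_TYPE {r_type}", f"R_NUMBER {number}",
                f"R_OUT_ORDER {number}", f"R_UOR_LIST {len(rects)}"]
        out += [" ".join(str(n) for n in rect) for rect in rects]
    return "\n".join(out) + "\n"


# The footer rectangle from the sample, as a single text region.
text = minimal_rdiff([("TEXT", [(2808, 2855, 304, 1527)])])
print(text, end="")
```

    Deriving the summary counts from the region list keeps the
    BEGIN_SUMMARY section consistent whenever regions are added or removed.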

    The region subtype (R_SUBTYPE) provides further detail about a region
    type.  It might be useful if you need to provide the OCR engine with more
    detailed information about a particular image region, or formatting
    information for a text region, for instance.  Valid values for R_SUBTYPE
    are:

        REGION_UNFLAVORED
        REGION_TABLE
        REGION_TABLE_INSET
        REGION_HEADLINE
        REGION_TIMESTAMP
        REGION_LINEART
        REGION_HALFTONE
        REGION_INSET
        REGION_CAPTION
        REGION_PAGE_FOOTER
        REGION_PAGE_HEADER
        REGION_VRULING
        REGION_HRULING
        REGION_NOISE

    It is unlikely that other rdiff options that appear in an output rdiff
    file would be useful in an input rdiff file.  For example, STROKE_WIDTH is
    an estimate of the median stroke width of the characters in a region.  It
    is intended for use when generating output rdiff files, so that you as the
    user see what the OCR engine found to be the median stroke width.  It is
    not intended for use in an input rdiff file.