Quickstart Guide to Region Description Files

This quickstart guide provides an overview and an example of working with region description files (rdiff files), such as those used in forms processing applications. Refer to the OCR Shop XTR manual for further details on the rdiff format.

Contents:
* What is an rdiff file?
* What is contained in an rdiff file?
* Uses for rdiff files
* Sample
* More on creating rdiff files and rdiff syntax

What is an rdiff file?

A region description file, or "rdiff" file, describes the coordinates and properties of each region of an input image. A region is an area that contains only text or only pictures and has associated properties, such as its coordinates. When passed as input, an rdiff file lets the user define custom regions for OCR Shop XTR to use during processing. The user may also have OCR Shop XTR generate an output rdiff file to see how it segmented the input image.

What is contained in an rdiff file?

An rdiff file describes all the regions of an input document. It may be created by OCR Shop XTR as output, or created by a user and passed to OCR Shop XTR as input.

Most regions are specified as either a text region or an image region, which determines how the OCR engine treats and processes them:
* Text regions are recognized and used to generate text output.
* Image regions are not recognized, but the OCR engine uses them in PDF, HTML, or graphics output.

A region's primary feature is the area of the input image that it covers. This area may overlap other regions, and it may consist of any number of rectangles that, taken as a set, define the area of the region. This set of rectangles is called a "union of rectangles," or "UOR." Text regions may also carry additional descriptive details, such as the line height and stroke width of their characters.

The rdiff file also describes the order in which regions are written to an output file, and indicates which region is visible where regions overlap.
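To make this structure concrete, here is a minimal Python sketch that reads an rdiff file into its header sections and a list of regions. It is a sketch under assumptions drawn from the sample later in this guide: each line holds one KEYWORD value pair, sections are bracketed by BEGIN_x and END_x markers, each region starts with an R_TYPE line, and R_UOR_LIST gives a rectangle count followed by four integers per rectangle, either on the same line or on the lines after it. The names parse_rdiff and Region are hypothetical helpers, not part of OCR Shop XTR.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    fields: dict = field(default_factory=dict)  # keyword -> raw value, e.g. "R_TYPE" -> "TEXT"
    uor: list = field(default_factory=list)     # rectangles as 4-integer tuples, in file order


def parse_rdiff(text):
    """Split rdiff text into header sections and a list of Region objects.

    Hypothetical helper. Assumes one 'KEYWORD value' pair per line,
    BEGIN_x/END_x section markers, and that each region starts with R_TYPE.
    """
    sections = {}   # section name -> {keyword: value}, e.g. "SUMMARY" -> {"TOTAL_REGIONS": "1"}
    regions = []
    section = None  # name of the BEGIN_x/END_x section currently open
    pending = []    # UOR integers collected so far for the current region
    needed = 0      # total UOR integers expected (4 per rectangle), 0 when not collecting
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        key = tokens[0]
        if needed and key.isdigit():
            # Rectangle coordinates continued on a line after R_UOR_LIST.
            pending.extend(int(t) for t in tokens)
        elif key.startswith("BEGIN_"):
            section = key[len("BEGIN_"):]
            sections.setdefault(section, {})
            continue
        elif key.startswith("END_"):
            section = None
            continue
        elif section == "REGIONS":
            if key == "R_TYPE":
                regions.append(Region())  # assumption: R_TYPE opens a new region
            if key == "R_UOR_LIST":
                # First value is the rectangle count; coordinates may follow.
                needed = 4 * int(tokens[1])
                pending = [int(t) for t in tokens[2:]]
            else:
                regions[-1].fields[key] = " ".join(tokens[1:])
        elif section is not None:
            sections[section][key] = " ".join(tokens[1:])
        if needed and len(pending) >= needed:
            # All rectangles read: group the integers four at a time.
            regions[-1].uor = [tuple(pending[i:i + 4]) for i in range(0, needed, 4)]
            needed, pending = 0, []
    return sections, regions
```

For example, reading custom.rdiff with parse_rdiff(open("custom.rdiff").read()) and printing each region's R_NUMBER, R_TYPE, and uor gives a quick overview of which areas of the page each region covers.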
Uses for rdiff files

Typically, regions are defined automatically in the auto-segmentation step. However, by passing an rdiff file as input and turning off auto-segmentation, a user can define custom regions. Rdiff files are most useful for:
* Forms processing: letting the user specify custom regions once and then use them to process a large number of similar documents.
* Providing the user with information about image segmentation and region properties.

Sample

The files used and generated for this sample are installed in:

    /opt/Vividata/bin/letter.tif
    /opt/Vividata/docs/rdiff_sample/*

To create an rdiff file for forms processing, start by having OCR Shop XTR generate an rdiff file to use as a starting point. Pass OCR Shop XTR a sample image that is representative of the form you plan to process:

    ocrxtr -out_rdiff=initial.rdiff letter.tif

OCR Shop XTR creates an rdiff file called "initial.rdiff". The file in this example may be found in /opt/Vividata/docs/initial.rdiff, and its first few lines look like this:

    BEGIN_IMAGE
    FILENAME /w/kath/letter.tif
    IMAGE_WIDTH 2592
    IMAGE_HEIGHT 3412
    IMAGE_XRES 300
    IMAGE_YRES 300
    END_IMAGE

To help you understand the rdiff file and its regions, we suggest running OCR Shop XTR on the same image and generating text output by region:

    ocrxtr -output_text_by_region=y letter.tif

OCR Shop XTR creates one output text file per text region, and each filename contains the corresponding region number. By viewing the text output files, initial.rdiff, and letter.tif side by side, you can match each text region described in initial.rdiff with its area in the original image and its output in the text files.

Make a copy of initial.rdiff to create the rdiff file that you will customize:

    cp initial.rdiff custom.rdiff

Open custom.rdiff in a text editor. In this example, the goal is to have OCR Shop XTR recognize only the footer, so delete every region in custom.rdiff except the footer region.
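Deleting regions by hand works for a single form, but the same step can be scripted. The following Python sketch keeps only the regions whose R_NUMBER is listed. It assumes each region begins with an R_TYPE line after BEGIN_REGIONS and ends at the next R_TYPE line, at an END_REGIONS line if the file has one, or at the end of the file. keep_regions and region_number are hypothetical helper names, and the sketch leaves the summary counts untouched, so they still need to be updated.

```python
def region_number(block):
    """Return the R_NUMBER of a region's lines as an int, or None if absent."""
    for line in block:
        tokens = line.split()
        if len(tokens) >= 2 and tokens[0] == "R_NUMBER":
            return int(tokens[1])
    return None


def keep_regions(text, keep):
    """Return rdiff text with only the regions whose R_NUMBER is in `keep`.

    Hypothetical helper. Lines outside the region list are copied unchanged,
    so the summary counts are NOT updated here.
    """
    out = []
    block = None        # lines of the region currently being read
    in_regions = False
    for line in text.splitlines():
        tokens = line.split()
        key = tokens[0] if tokens else ""
        if in_regions and key in ("R_TYPE", "END_REGIONS"):
            # A new R_TYPE (or END_REGIONS) closes the previous region.
            if block is not None and region_number(block) in keep:
                out.extend(block)
            block = [line] if key == "R_TYPE" else None
            if key == "END_REGIONS":
                in_regions = False
                out.append(line)
        elif block is not None:
            block.append(line)
        else:
            if key == "BEGIN_REGIONS":
                in_regions = True
            out.append(line)
    # The last region when the file has no END_REGIONS line.
    if block is not None and region_number(block) in keep:
        out.extend(block)
    return "\n".join(out) + "\n"
```

In this example, the footer is region number 11, so keep_regions(text, {11}) applied to the contents of initial.rdiff would keep only that region.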
Then update the summary information in the rdiff file to reflect these changes. The edited custom.rdiff file looks like this:

    BEGIN_IMAGE
    IMAGE_WIDTH 2592
    IMAGE_HEIGHT 3412
    IMAGE_XRES 300
    IMAGE_YRES 300
    END_IMAGE
    BEGIN_SUMMARY
    TOTAL_REGIONS 1
    TEXT_REGIONS 1
    IMAGE_REGIONS 0
    ORDER_FLAGS ANY
    END_SUMMARY
    BEGIN_REGIONS
    R_TYPE TEXT
    R_SUBTYPE FOOTER
    R_NUMBER 11
    R_OUT_ORDER 2
    LINE_HEIGHT 32
    N_LINES 1
    STROKE_WIDTH 4
    ITALICNESS 2
    ITALICNESS_CONFIDENCE 99
    R_UOR_LIST 1 2808 2855 304 1527

Notice that the FILENAME line, which is specific to the sample input file, has been removed from the BEGIN_IMAGE/END_IMAGE section. The image dimensions and resolution (IMAGE_WIDTH, IMAGE_HEIGHT, IMAGE_XRES, and IMAGE_YRES) are also specific to the sample image; if you expect these values to vary in subsequent input files, remove those lines as well. Refer to Chapter 6 of the OCR Shop XTR manual for an explanation of each line in the rdiff file.

Here are some other changes you could make to an rdiff file:
* Remove more of the detailed information, such as a region's expected stroke width or line height, depending on how consistent you expect the input files to be.
* Change the area covered by the region being recognized, by modifying the listed coordinates or by adding more rectangles to the union of rectangles (UOR).
* Add more image or text regions. For example, open the sample input file in a graphical viewer to determine the coordinates of the rectangles the new region should cover, then add a section for the new region and update the summary information.
* If the rdiff file defines multiple regions, control the order in which they appear in the output by adjusting R_OUT_ORDER.

Now you can use custom.rdiff as input to recognize just the footer of any number of similarly formatted input files. Test it first with the sample letter.tif.
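Whenever regions are removed or added, the TOTAL_REGIONS, TEXT_REGIONS, and IMAGE_REGIONS lines must agree with the regions that remain. The Python sketch below recounts them from the R_TYPE lines and rewrites those three lines, leaving every other line (such as ORDER_FLAGS) as it was. It assumes text regions use R_TYPE TEXT, as in the sample, and that image regions use R_TYPE IMAGE; the IMAGE keyword is an assumption, so check it against an rdiff file generated by your copy of OCR Shop XTR. update_summary is a hypothetical helper name.

```python
def update_summary(text):
    """Recount regions by R_TYPE and rewrite the three summary count lines.

    Hypothetical helper. Assumes text regions use R_TYPE TEXT and image
    regions use R_TYPE IMAGE (the IMAGE keyword is an assumption).
    """
    lines = text.splitlines()
    types = []
    for line in lines:
        tokens = line.split()
        if len(tokens) >= 2 and tokens[0] == "R_TYPE":
            types.append(tokens[1])
    counts = {
        "TOTAL_REGIONS": len(types),           # every region, whatever its type
        "TEXT_REGIONS": types.count("TEXT"),
        "IMAGE_REGIONS": types.count("IMAGE"),
    }
    out = []
    for line in lines:
        tokens = line.split()
        if tokens and tokens[0] in counts:
            out.append(f"{tokens[0]} {counts[tokens[0]]}")  # replace the stale count
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```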
To use an rdiff file as input to OCR Shop XTR, create a text file, rdifflist.txt, that specifies the rdiff file to use with each input file:

    letter.tif custom.rdiff

This list lets you create, for example, several rdiff files based on several different forms, and then match each input file to the rdiff file for the correct form.

Now run OCR Shop XTR using rdifflist.txt:

    ocrxtr -image_rdiff_list=rdifflist.txt -auto_segment=n

Turning auto-segmentation off makes OCR Shop XTR strictly obey the rdiff file passed as input. After OCR Shop XTR completes, view the output file it created, out.letter.001, which contains only the footer, as specified in the rdiff file:

    Reprinted from PC Magazine, © Ziff Davis Publishing Co., January 1998

The next step is to edit rdifflist.txt again to list all the images you wish to process in the same manner, and then run OCR Shop XTR again with rdifflist.txt.

More on creating rdiff files and rdiff syntax

Rdiff files serve two purposes: they are passed as input to ocrxtr, and they are generated as output from ocrxtr. Forms processing applications usually involve the first case, where an input rdiff file is needed to process a large number of files with similar layouts.

Generating an rdiff file as output from ocrxtr is often the easiest way for someone unfamiliar with the rdiff format to start creating an input rdiff file. However, an output rdiff file used as a template contains many fields that are not useful in input rdiff files.

The minimum header fields needed in an input rdiff file are:

    BEGIN_IMAGE
    END_IMAGE
    BEGIN_SUMMARY
    TOTAL_REGIONS
    TEXT_REGIONS
    IMAGE_REGIONS
    END_SUMMARY

The minimum fields needed to describe each region are:

    R_TYPE
    R_NUMBER
    R_OUT_ORDER
    R_UOR_LIST (includes the UOR count and the UOR list)

In most cases, these fields provide all the information needed for forms processing.
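If you prefer to write an input rdiff file from scratch, a small script can emit only the minimum fields listed above. This Python sketch is hypothetical: it follows the layout of the custom.rdiff sample, including its BEGIN_REGIONS line, numbers regions and their output order starting from 1 in list order, and writes each rectangle as four integers in whatever coordinate order your ocrxtr-generated rdiff files use. Using R_TYPE IMAGE for image regions is also an assumption; minimal_rdiff is not an OCR Shop XTR function.

```python
def minimal_rdiff(regions):
    """Build input rdiff text containing only the minimum header and region fields.

    Hypothetical helper. `regions` is a list of (r_type, rectangles) pairs:
    r_type is "TEXT" or "IMAGE" (IMAGE is an assumption), and rectangles is a
    list of 4-integer tuples in the coordinate order of a generated rdiff file.
    """
    types = [r_type for r_type, _ in regions]
    lines = [
        "BEGIN_IMAGE",
        "END_IMAGE",
        "BEGIN_SUMMARY",
        f"TOTAL_REGIONS {len(regions)}",
        f"TEXT_REGIONS {types.count('TEXT')}",
        f"IMAGE_REGIONS {types.count('IMAGE')}",
        "END_SUMMARY",
        "BEGIN_REGIONS",  # mirrors the custom.rdiff sample
    ]
    for number, (r_type, rects) in enumerate(regions, start=1):
        lines.append(f"R_TYPE {r_type}")
        lines.append(f"R_NUMBER {number}")     # assumption: numbering from 1
        lines.append(f"R_OUT_ORDER {number}")  # output order follows list order
        coords = " ".join(str(v) for rect in rects for v in rect)
        lines.append(f"R_UOR_LIST {len(rects)} {coords}")  # count, then coordinates
    return "\n".join(lines) + "\n"
```

For instance, minimal_rdiff([("TEXT", [(2808, 2855, 304, 1527)])]) reproduces the footer region of the sample with only the minimum fields.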
When creating an input rdiff file, it is best to start with just these fields and add more detail only if you specifically need it.

The region subtype (R_SUBTYPE) provides further detail about a region type. It might be useful if you need to give the OCR engine more detailed information about a particular image region, or formatting information for a text region, for instance. Valid values for R_SUBTYPE are:

    REGION_UNFLAVORED
    REGION_TABLE
    REGION_TABLE_INSET
    REGION_HEADLINE
    REGION_TIMESTAMP
    REGION_LINEART
    REGION_HALFTONE
    REGION_INSET
    REGION_CAPTION
    REGION_PAGE_FOOTER
    REGION_PAGE_HEADER
    REGION_VRULING
    REGION_HRULING
    REGION_NOISE

Other fields that appear in an output rdiff file are unlikely to be useful in an input rdiff file. For example, STROKE_WIDTH is an estimate of the median stroke width of the characters in a region. It is intended for output rdiff files, so that you can see what the OCR engine found to be the median stroke width; it is not intended for use in an input rdiff file.
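Trimming a generated rdiff file down to these fields can also be scripted. The Python sketch below keeps only the minimum header and region fields, plus any extra keywords you name (for example R_SUBTYPE), and drops everything else, such as FILENAME, LINE_HEIGHT, or STROKE_WIDTH. It assumes one KEYWORD value pair per line, with R_UOR_LIST rectangles either on the same line or on the numeric lines that follow it; BEGIN_REGIONS and END_REGIONS are kept if present. strip_to_minimum is a hypothetical helper name.

```python
# Minimum fields from this guide; BEGIN_/END_REGIONS kept so regions stay delimited.
HEADER_KEEP = {"BEGIN_IMAGE", "END_IMAGE", "BEGIN_SUMMARY", "TOTAL_REGIONS",
               "TEXT_REGIONS", "IMAGE_REGIONS", "END_SUMMARY",
               "BEGIN_REGIONS", "END_REGIONS"}
REGION_KEEP = {"R_TYPE", "R_NUMBER", "R_OUT_ORDER", "R_UOR_LIST"}


def strip_to_minimum(text, extra=()):
    """Keep only the minimum input fields, plus any keywords listed in `extra`."""
    keep = HEADER_KEEP | REGION_KEEP | set(extra)
    out = []
    in_uor = False  # True while copying rectangle lines that follow R_UOR_LIST
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] in keep:
            out.append(line)
            in_uor = tokens[0] == "R_UOR_LIST"
        elif in_uor and tokens[0].isdigit():
            out.append(line)  # continuation line of a UOR list
        else:
            in_uor = False    # any other field is dropped
    return "\n".join(out) + "\n"
```

Calling strip_to_minimum(text, extra=["R_SUBTYPE"]) on the contents of initial.rdiff would keep the region subtypes while removing estimates such as STROKE_WIDTH.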