Overview of the XDOC output format in OCR Shop XTR

The XDOC format is a ScanSoft text output format that provides detailed
information about the text, images, and formatting in a recognized document.

OCR Shop XTR offers three types of XDOC output, as specified in the
"out_text_format" parameter:

    xdoc        Enhanced XDOC format
    xdoclite    XDOC format with no format analysis
    xdocplus    XDOC format with style sheet data

Several additional OCR Shop XTR parameters control confidence and bounding
box information:

    xdoc_word_confidence    Output word confidences in XDOC
    xdoc_char_confidence    Output character confidences in XDOC
    xdoc_word_coords        Output word bounding boxes in XDOC
    xdoc_char_coords        Output character bounding boxes in XDOC

In addition, OCR Shop XTR offers these options, which are not officially
supported:

    xdoc_word_pixels    Use pixel coordinates for word bounding boxes in XDOC
    no_header_footer    Do not label headers and footers
    accept_thresh       Acceptability threshold (number corresponds to the
                        confidence values seen in XDOC output)
    quest_thresh        Questionability threshold

Please see the documents core12xdc.pdf and kdoctext.h for detailed
information about the XDOC format. These documents provide enough
information to parse the XDOC format.
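
The officially supported parameter names listed above can be collected and checked before a run. The following is a minimal sketch in Python, not part of OCR Shop XTR. The helper name build_xdoc_params and the use of True as a parameter value are assumptions made for illustration. How the settings are actually passed to OCR Shop XTR is not covered in this overview.

```python
# Parameter names and descriptions come from the overview text.
# Everything else (helper name, boolean values) is a hypothetical illustration.
XDOC_FORMATS = {
    "xdoc": "Enhanced XDOC format",
    "xdoclite": "XDOC format with no format analysis",
    "xdocplus": "XDOC format with style sheet data",
}

# Officially supported confidence and bounding box parameters.
XDOC_FLAGS = (
    "xdoc_word_confidence",
    "xdoc_char_confidence",
    "xdoc_word_coords",
    "xdoc_char_coords",
)


def build_xdoc_params(out_text_format, **flags):
    """Return a dict of parameter settings, rejecting unknown names."""
    if out_text_format not in XDOC_FORMATS:
        raise ValueError(f"unknown out_text_format: {out_text_format!r}")
    unknown = sorted(set(flags) - set(XDOC_FLAGS))
    if unknown:
        raise ValueError(f"unknown XDOC parameter(s): {', '.join(unknown)}")
    params = {"out_text_format": out_text_format}
    # Keep only the flags that were switched on, in a stable order.
    for name in XDOC_FLAGS:
        if flags.get(name):
            params[name] = True
    return params


if __name__ == "__main__":
    print(build_xdoc_params("xdoc", xdoc_word_confidence=True, xdoc_word_coords=True))
```

Rejecting unknown names early catches misspelled parameters before any recognition job is started. The unsupported options (xdoc_word_pixels, no_header_footer, accept_thresh, quest_thresh) are deliberately left out of this sketch.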