The OCR Shop XTR™/API is a software development kit for integrating optical character recognition (OCR) directly into your own software. It supports Linux and UNIX operating systems and combines Vividata's image processing technology with ScanSoft's optical character recognition technology to provide detailed document processing.
Operating System Support:
- Sun Solaris™ SPARC® (Solaris™ 2.7+)
- Linux™ x86 (Kernel 2.0 and higher)
For questions about support of other Linux™ and UNIX® operating systems, including Mac OS® X, please contact Sales.
Licensing of features
All features are included with the development seat. However, for end user run-time licenses, some features and input/output formats are supported as add-on options and are not licensed in a basic run-time license key. Please Contact Sales (510-658-6587 Ext. 107) for pricing information and details.
Supported input formats
- GIF
- JPEG
- PBM
- PNG
- PPM
- PostScript®
- Rasterfile
- SGI®-RGB
- TIFF
- XWD
- X11
Supported languages and character sets
Text recognition uses a character set and one or more languages. English is the default language and all available languages are included with the development seat. For run-time licenses, additional languages are available as add-on language packs. Multiple languages may be specified for a single document, as long as they use the same character set.
Character sets:
The character set defines the shape of the letters for a particular language.
- Only one character set may be used at a time, based on the chosen language.
- English characters are an exception, and may be recognized in conjunction with any character set.
Languages:
- A document using a language with a dictionary is recognized based on the character shapes and with reference to the dictionary file specific to the chosen language.
- A document using a language without a dictionary is recognized based on the character shapes alone, without reference to a dictionary file.
Languages with dictionaries:
- czech (Central Europe - 1250)
- danish (Latin 1 - 1252)
- dutch (Latin 1 - 1252)
- english (Latin 1 - 1252)
- finnish (Latin 1 - 1252)
- french (Latin 1 - 1252)
- german (Latin 1 - 1252)
- greek (Greek - 1253)
- hungar (Hungarian; Central Europe - 1250)
- italian (Latin 1 - 1252)
- norsk (Norwegian; Latin 1 - 1252)
- polish (Central Europe - 1250)
- port (Portuguese; Latin 1 - 1252)
- russian (Cyrillic - 1251)
- spanish (Latin 1 - 1252)
- swedish (Latin 1 - 1252)
- turkish (Turkish - 1254)
Languages without dictionaries:
- romanian (Central Europe - 1250)
- estonian (Baltic - 1257)
- afrikaans (Latin 1 - 1252)
- albanian (Central Europe - 1250)
- aymara (Latin 1 - 1252)
- basque (Latin 1 - 1252)
- breton (Latin 1 - 1252)
- bulgarian (Cyrillic - 1251)
- byelorussian (Cyrillic - 1251)
- croatian (Central Europe - 1250)
- faroese (Latin 1 - 1252)
- flemish (Latin 1 - 1252)
- friulian (Latin 1 - 1252)
- gaelic (Latin 1 - 1252)
- galician (Latin 1 - 1252)
- greenlandic (Latin 1 - 1252)
- hawaiian (Baltic - 1257)
- icelandic (Latin 1 - 1252)
- indonesian (Latin 1 - 1252)
- kurdishlat (Kurdish Latin; Turkish - 1254)
- latin (Latin 1 - 1252)
- latvian (Baltic - 1257)
- lithuanian (Baltic - 1257)
- sorbianl (Sorbian - Lower; Central Europe - 1250)
- macedonianc (Cyrillic - 1251)
- malaysian (Latin 1 - 1252)
- piginenglish (Latin 1 - 1252)
- serbian (Cyrillic - 1251)
- ukranian (Cyrillic - 1251)
- catalan (Latin 1 - 1252)
- sbcroatian (Serbo-Croatian; Central Europe - 1250)
- slovak (Central Europe - 1250)
- slovenian (Central Europe - 1250)
- swahili (Latin 1 - 1252)
- tahitian (Latin 1 - 1252)
- sorbianu (Sorbian - Upper; Central Europe - 1250)
- welsh (Latin 1 - 1252)
- frisianw (Frisian - West; Latin 1 - 1252)
- zulu (Latin 1 - 1252)
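The rule that several languages may be combined only when they share a character set can be checked before a job is submitted. The following is a minimal sketch, not part of the OCR Shop XTR/API: the function name `shared_code_page` and the `CODE_PAGE` mapping are hypothetical, the mapping holds only a subset of the languages listed above, and treating "english" as compatible with every character set is an interpretation of the English exception stated earlier.

```python
# Code pages copied from the language lists above (subset for brevity).
CODE_PAGE = {
    "english": 1252,
    "french": 1252,
    "german": 1252,
    "czech": 1250,
    "polish": 1250,
    "greek": 1253,
    "russian": 1251,
    "turkish": 1254,
    "latvian": 1257,
}


def shared_code_page(languages):
    """Return the one code page the languages share, or raise ValueError.

    Assumption: "english" may accompany any character set, following the
    exception stated for English characters.
    """
    unknown = [lang for lang in languages if lang not in CODE_PAGE]
    if unknown:
        raise ValueError(f"unknown language(s): {unknown}")

    # Collect the code pages of every language except English.
    pages = {CODE_PAGE[lang] for lang in languages if lang != "english"}
    if not pages:
        # Only English (or nothing) was requested: use English's own set.
        return CODE_PAGE["english"]
    if len(pages) > 1:
        raise ValueError(
            f"languages span several character sets: {sorted(pages)}"
        )
    return pages.pop()
```

Under these assumptions, `shared_code_page(["french", "german"])` and `shared_code_page(["english", "russian"])` both succeed, while `shared_code_page(["russian", "greek"])` raises `ValueError` because Cyrillic and Greek are different character sets.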
Supported output text document formats:
- iso text
- 8bit text
- Unicode text
- XDOC (see Overview of the XDOC Output Format below)
- html (to be supported in a future release)
Supported output subimage (graphical) formats:
- tiff
- ras
- epsf
- x11
- tiff-pack
- tiff-g31d
- tiff-g32d
- tiff-g42d
- pal-tiff
- jpeg
- png
- xwd
- rgb
- rgb-rle
Licensing:
A development seat consists of:
- One copy of the API distribution, including libraries, headers, and sample code.
- Five instances of the software per development seat license.
- One point of contact, meaning one developer working with the API who can contact Vividata™ for support in using the API.
Development seat licenses are regulated by the terms of the contract.
Run-time licenses are regulated via license keys installed on the target system. The keys specify which options are available on a system, along with the number of concurrent instances permitted. Keys are added and updated on a system using a provided utility.
Licensing controls:
- Restriction of the software to the machine ID of the computer (on x86 systems, the MAC address of one of the Ethernet cards is used; on SPARC® systems, the hostid is used)
- The number of concurrent instances that will run on that computer
- The languages licensed for that computer
- The features licensed for that computer (including PDF/PostScript® in, PDF output, and HTML output)
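To make the four licensing controls concrete, the sketch below models them as a small data structure and a check. It is purely illustrative: the real key format and the provided key utility are not described here, and `RuntimeKey`, `license_problems`, and the feature name `"pdf-out"` are hypothetical. Including English in every key by default is also an assumption, based on English being the default language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeKey:
    """Hypothetical in-memory view of what a run-time license key permits."""
    machine_id: str      # MAC address (x86) or hostid (SPARC)
    max_instances: int   # concurrent instances allowed on this machine
    languages: frozenset = frozenset({"english"})  # assumed default
    features: frozenset = frozenset()              # feature names are made up


def license_problems(key, machine_id, running, languages, features):
    """Return the reasons a request is not covered (empty list if allowed)."""
    problems = []
    if machine_id != key.machine_id:
        problems.append("key is bound to a different machine")
    if running >= key.max_instances:
        problems.append("concurrent instance limit reached")
    missing_langs = set(languages) - key.languages
    if missing_langs:
        problems.append(f"unlicensed languages: {sorted(missing_langs)}")
    missing_feats = set(features) - key.features
    if missing_feats:
        problems.append(f"unlicensed features: {sorted(missing_feats)}")
    return problems
```

Returning a list of reasons rather than a single boolean lets a caller report every unmet control at once, for example both a missing language pack and an exhausted instance count.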
Overview of the XDOC Output Format
The XDOC output format is a ScanSoft text format which provides detailed information about the text, fonts, images, and formatting in a recognized document, as well as coordinate and confidence values for both characters and words.
OCR Shop XTR offers three types of XDOC output, as specified by the "dm_out_text_format" option:
| vvTextFormatXdoc | Enhanced XDOC format |
| vvTextFormatXDoclite | XDOC format with no format analysis |
| vvTextFormatXdocplus | XDOC format with style sheet data |
Use the following options to include confidence and bounding box information in the XDOC output:
| dm_xdc_wconf | Output word confidences in XDOC |
| dm_xdc_cconf | Output character confidences in XDOC |
| dm_xdc_wbox | Output word bounding boxes in XDOC |
| dm_xdc_cbox | Output character bounding boxes in XDOC |
Additional options related to XDOC output include:
| dm_xdc_wbox_pixels | Use pixel coordinates for word bounding boxes in XDOC |
| dm_no_hdr_ftr | Do not label headers and footers |
| dm_accept_thresh | Acceptability threshold (number corresponds to the confidence values seen in XDOC output) |
| dm_quest_thresh | Questionability threshold |
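As a way of keeping the XDOC option names straight, the sketch below collects a chosen XDOC format and a set of on/off options into one settings mapping and rejects names that do not appear in the tables above. It is an assumption-laden illustration: the helper `xdoc_settings` is hypothetical, representing the flags as booleans is a guess, and how the settings are actually passed to the engine is not covered here. The two threshold options are left out because their value ranges are not stated.

```python
# Option and format names copied from the tables above.
XDOC_FORMATS = {
    "vvTextFormatXdoc",
    "vvTextFormatXDoclite",
    "vvTextFormatXdocplus",
}
XDOC_FLAGS = {
    "dm_xdc_wconf",
    "dm_xdc_cconf",
    "dm_xdc_wbox",
    "dm_xdc_cbox",
    "dm_xdc_wbox_pixels",
    "dm_no_hdr_ftr",
}


def xdoc_settings(text_format="vvTextFormatXdoc", flags=()):
    """Build a name -> value mapping for an XDOC run (hypothetical helper)."""
    if text_format not in XDOC_FORMATS:
        raise ValueError(f"not an XDOC text format: {text_format}")
    unknown = set(flags) - XDOC_FLAGS
    if unknown:
        raise ValueError(f"unknown XDOC option(s): {sorted(unknown)}")

    settings = {"dm_out_text_format": text_format}
    # Assumption: each flag is a simple on/off switch.
    settings.update({name: True for name in sorted(flags)})
    return settings
```

For example, `xdoc_settings("vvTextFormatXdocplus", ["dm_xdc_wconf", "dm_xdc_wbox"])` yields a mapping that requests style sheet data together with word confidences and word bounding boxes.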
Please contact Vividata Support for detailed information about the XDOC format and documentation needed for XDOC parsing.

