hOCR
hOCR is an open standard for representing formatted text obtained from optical character recognition (OCR). It encodes text, style, layout information, recognition confidence metrics, and other information using Extensible Markup Language (XML) in the form of Hypertext Markup Language (HTML) or XHTML.[1]
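As a rough illustration of how such markup can be consumed, the following Python sketch parses a small hand-written hOCR fragment and pulls out each word with its bounding box and confidence. The sample fragment, the helper names `parse_title` and `extract_words`, and the choice of the standard-library `xml.etree.ElementTree` parser are assumptions made for this example. The sketch relies on the common hOCR convention that elements carry a class such as `ocr_line` or `ocrx_word` and a `title` attribute holding semicolon-separated properties such as `bbox` and `x_wconf`. Real OCR output may use other properties or be HTML that is not well-formed XML.

```python
import xml.etree.ElementTree as ET

# Hand-written sample for illustration only; not output from a real OCR engine.
SAMPLE_HOCR = """<div class="ocr_page" title="bbox 0 0 600 800">
  <span class="ocr_line" title="bbox 36 92 240 116">
    <span class="ocrx_word" title="bbox 36 92 96 116; x_wconf 95">Hello</span>
    <span class="ocrx_word" title="bbox 110 92 240 116; x_wconf 87">world</span>
  </span>
</div>"""


def parse_title(title):
    """Split a title attribute like 'bbox 1 2 3 4; x_wconf 95' into a dict."""
    props = {}
    for part in title.split(";"):
        tokens = part.split()
        if tokens:
            # First token is the property name, the rest are its values.
            props[tokens[0]] = tokens[1:]
    return props


def extract_words(hocr):
    """Return (text, bbox, confidence) for every element with class ocrx_word."""
    root = ET.fromstring(hocr)
    words = []
    for el in root.iter("span"):
        if "ocrx_word" not in el.get("class", "").split():
            continue
        props = parse_title(el.get("title", ""))
        bbox = tuple(int(v) for v in props.get("bbox", []))
        # Confidence is optional, so fall back to None when it is absent.
        conf = float(props["x_wconf"][0]) if "x_wconf" in props else None
        words.append((el.text, bbox, conf))
    return words


words = extract_words(SAMPLE_HOCR)
for text, bbox, conf in words:
    print(text, bbox, conf)
```

Because the layout and confidence data live in ordinary attributes, the same fragment still renders as readable text in a web browser, which is one practical consequence of embedding OCR results in HTML.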
Applications
Software that utilizes this format includes:
See also
- ALTO (XML) -- another OCR data representation format
References
- Breuel, T. (2007-09-01). "The hOCR Microformat for OCR Workflow and Results". Ninth International Conference on Document Analysis and Recognition (ICDAR 2007). 2: 1063–1067. doi:10.1109/ICDAR.2007.4377078.
External links
- hOCR specification, current version 1.2
- hocr-tools on GitHub
- moz-hocr-edit hOCR document editor
This article is issued from Wikipedia, version of 9/30/2016. The text is available under the Creative Commons Attribution/Share Alike license, but additional terms may apply for the media files.