DjVu

This article is about a computer file format. For a computer-assisted translation software tool, see Déjà Vu (software).

Filename extension	`.djvu`, `.djv`
Internet media type	`image/vnd.djvu`, `image/x-djvu`
Type code	DJVU
Developed by	AT&T Labs – Research
Initial release	1998
Latest release	Version 26^[1] (April 2005)
Type of format	Image file formats
Open format?	GNU GPLv2 for DjVu Reference Library and DjVuLibre-3.5; License grants under the GNU GPL for several patents that cover aspects of the library^[2]
Website	www.djvu.org

DjVu (/ˌdeɪʒɑːˈvuː/ DAY-zhah-VOO,^[3] like French: déjà vu [deʒavy]) is a computer file format designed primarily to store scanned documents, especially those containing a combination of text, line drawings, indexed color images, and photographs. It uses technologies such as image layer separation of text and background/images, progressive loading, arithmetic coding, and lossy compression for bitonal (monochrome) images. This allows high-quality, readable images to be stored in a minimum of space, so that they can be made available on the web.

DjVu has been promoted as an alternative to PDF, promising smaller files than PDF for most scanned documents.^[4] The DjVu developers report that color magazine pages compress to 40–70 kB, black-and-white technical papers compress to 15–40 kB, and ancient manuscripts compress to around 100 kB, whereas a satisfactory JPEG image typically requires 500 kB.^[5] Like PDF, DjVu can contain an OCR text layer, which makes it easy to copy and paste text and to search it.

Free browser plug-ins and desktop viewers from different developers are available from the djvu.org website. DjVu is supported by a number of multi-format document viewers and e-book reader software on Linux (Okular, Evince) and Windows (SumatraPDF).

History

The DjVu technology was originally developed^[5] by Yann LeCun, Léon Bottou, Patrick Haffner, and Paul G. Howard at AT&T Labs from 1996 to 2001.

Some independent technologists (such as Brewster Kahle^[6]) have historically considered DjVu superior to PDF, citing its claimed higher compression ratio (and thus smaller file size), the ease of converting large volumes of text into DjVu format, and its status as an open file format.

The DjVu library distributed as part of the open-source package DjVuLibre has become the reference implementation for the DjVu format. DjVuLibre has been maintained and updated by the original developers of DjVu since 2002.^[7]

The DjVu file format specification has gone through a number of revisions:

Revision history
Support status	Version	Release date	Notes
Unsupported	Versions 1–19^[1]	1996–1999	Developmental versions by AT&T Labs preceding the sale of the format to LizardTech.
Unsupported	Version 20^[1]	April 1999	DjVu version 3. DjVu changed from a single-page format to a multipage format.
Older, still supported	Version 21^[1]	September 1999	Indirect storage format replaced. The searchable text layer was added.
Older, still supported	Version 22^[1]	April 2001	Page orientation, color JB2
Unsupported	Version 23^[1]	July 2002	CID chunk
Unsupported	Version 24^[1]	February 2003	LTAnno chunk
Older, still supported	Version 25^[1]	May 2003	NAVM chunk. Support for DjVu bookmarks (outlines) was added. Changes made by Versions 23 and 24 were made obsolete.
Current	Version 26^[1]	April 2005	Text/line annotations

Technical overview

File structure

The DjVu file format is based on the Interchange File Format (IFF) and is composed of hierarchically organized chunks. The IFF structure is preceded by a 4-byte magic number, `AT&T`. It is followed by a single FORM chunk whose secondary identifier is DJVU for a single-page document or DJVM for a multi-page document.
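
As an illustration of the layout just described, the following minimal sketch checks the start of a byte string for the magic number, the root FORM chunk, and its secondary identifier. The function name `read_djvu_header` is hypothetical, and the 32-bit big-endian length field is assumed from the general IFF convention rather than stated in this article.

```python
import struct

MAGIC = b"AT&T"  # 4-byte magic number that precedes the IFF structure


def read_djvu_header(data: bytes):
    """Return (secondary_id, form_length) for the start of a DjVu file.

    Hypothetical helper: raises ValueError if the header does not match
    the layout described in the text.
    """
    if len(data) < 16:
        raise ValueError("too short to hold a DjVu header")
    if data[:4] != MAGIC:
        raise ValueError("missing AT&T magic number")
    # IFF chunk header: 4-byte identifier, then 4-byte big-endian length
    # (assumption taken from the IFF convention).
    chunk_id, length = struct.unpack(">4sI", data[4:12])
    if chunk_id != b"FORM":
        raise ValueError("root chunk is not FORM")
    secondary = data[12:16]
    if secondary not in (b"DJVU", b"DJVM"):
        raise ValueError("root FORM is neither DJVU nor DJVM")
    return secondary.decode("ascii"), length
```

Calling the function on the first 16 bytes of a file is enough to tell a single-page document (`DJVU`) from a multi-page one (`DJVM`).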

Chunk types

Chunk identifier	Contained by	Description
FORM:DJVU	FORM:DJVM	Describes a single page. It is either the root chunk of a single-page document or a component referred to from a `DIRM` chunk.
FORM:DJVM	N/A	Describes a multi-page document. Is the document's root chunk.
FORM:DJVI	FORM:DJVM	Contains data shared by multiple pages.
FORM:THUM	FORM:DJVM	Contains thumbnails.
INFO	FORM:DJVU	Must be the first chunk. Describes the page width, height, format version, DPI, gamma, and rotation.
DIRM	FORM:DJVM	Must be the first chunk. References other `FORM` chunks. These chunks can either follow this chunk inside the `FORM:DJVM` chunk or be contained in external files. These types of documents are referred to as bundled or indirect, respectively.
NAVM	FORM:DJVM	If present, must immediately follow the `DIRM` chunk. Contains a BZZ-compressed outline of the document.
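
The nesting in the table above can be explored with a small chunk walker. This is a sketch under assumptions, not a reference parser: it assumes the usual IFF conventions (4-byte identifier, 4-byte big-endian length, chunks starting on even offsets), and the names `iter_chunks` and `make_chunk` are hypothetical. `make_chunk` only builds synthetic data for experimenting, and the zero-filled payloads it is typically fed are not valid chunk contents.

```python
import struct


def iter_chunks(buf: bytes, start: int, end: int, depth: int = 0):
    """Yield (depth, chunk_id, secondary_id_or_None, payload_length)."""
    pos = start
    while pos + 8 <= end:
        chunk_id, length = struct.unpack(">4sI", buf[pos:pos + 8])
        body = pos + 8
        name = chunk_id.decode("ascii")
        if name == "FORM":
            # A FORM payload starts with a secondary identifier and then
            # holds nested chunks.
            secondary = buf[body:body + 4].decode("ascii")
            yield depth, name, secondary, length
            yield from iter_chunks(buf, body + 4, body + length, depth + 1)
        else:
            yield depth, name, None, length
        pos = body + length
        pos += pos & 1  # skip the pad byte so the next chunk starts even


def make_chunk(chunk_id: bytes, payload: bytes) -> bytes:
    """Build one synthetic chunk, padded to an even length (test data only)."""
    data = chunk_id + struct.pack(">I", len(payload)) + payload
    return data + (b"\x00" if len(payload) % 2 else b"")
```

For a file read into `data`, `list(iter_chunks(data, 4, len(data)))` skips the 4-byte magic number and lists the root FORM chunk followed by its children, with `depth` showing the nesting level.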

Compression

DjVu separates a page image into several layers and compresses each one separately. To create a DjVu file, the initial image is first split into three images: a background image, a foreground image, and a mask image. The background and foreground images are typically lower-resolution color images (e.g., 100 dpi); the mask image is a high-resolution bilevel image (e.g., 300 dpi) and is typically where the text is stored. The background and foreground images are then compressed using a wavelet-based compression algorithm named IW44.^[5]

The mask image is compressed using a method called JB2 (similar to JBIG2). The JB2 encoding method identifies nearly identical shapes on the page, such as multiple occurrences of a particular character in a given font, style, and size. It compresses the bitmap of each unique shape separately, and then encodes the locations where each shape appears on the page. Thus, instead of compressing a letter "e" in a given font multiple times, it compresses the letter "e" once (as a compressed bit image) and then records every place on the page it occurs.
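
The three-layer split can be illustrated from the decoding side: each mask pixel selects whether the output pixel comes from the foreground or the background layer. The sketch below is a toy model, not the DjVu decoder. The function name `compose` and the nearest-neighbour upsampling controlled by `scale` are assumptions made for illustration, and the IW44 and JB2 decoding steps are not shown.

```python
def compose(mask, foreground, background, scale):
    """Rebuild a page from a mask and two lower-resolution color layers.

    mask: 2-D list of 0/1 values at full resolution.
    foreground, background: 2-D lists of pixel values whose resolution is
    1/scale of the mask (e.g. scale=3 for a 300 dpi mask over 100 dpi layers).
    """
    page = []
    for y, row in enumerate(mask):
        out = []
        for x, bit in enumerate(row):
            # Mask bit set: take the (text) foreground color,
            # otherwise take the background.
            layer = foreground if bit else background
            # Nearest-neighbour lookup into the coarser layer (assumption).
            out.append(layer[y // scale][x // scale])
        page.append(out)
    return page
```

Because only the mask is kept at full resolution, sharp text edges survive even though the color layers are stored at a fraction of the pixel count.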

Optionally, these shapes may be mapped to UTF-8 codes (either by hand or potentially by a text recognition system), and stored in the DjVu file. If this mapping exists, it is possible to select and copy text.

Since JBIG2 was based on JB2, both compression methods have the same problems when performing lossy compression. Digits may be substituted with similar-looking digits (such as replacing 6 with 8) if the text was scanned at a low DPI prior to lossy compression.
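
The shape-matching idea and its lossy failure mode can be shown with a toy symbol dictionary. This is a sketch, not the JB2 algorithm: the Hamming-distance threshold `max_diff`, the function names, and the 3×5 bitmaps are all assumptions made for illustration, and the arithmetic coding of the dictionary is left out.

```python
def hamming(a, b):
    """Count differing pixels between two bitmaps of equal size."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def encode_shapes(glyphs, max_diff=0):
    """Toy symbol-dictionary encoder.

    glyphs: list of (bitmap, x, y), where a bitmap is a tuple of row strings.
    Returns (dictionary, placements); each placement is (shape_index, x, y).
    max_diff=0 keeps only exact matches; larger values merge similar shapes.
    """
    dictionary, placements = [], []
    for bitmap, x, y in glyphs:
        for index, known in enumerate(dictionary):
            same_size = (len(known) == len(bitmap)
                         and len(known[0]) == len(bitmap[0]))
            if same_size and hamming(known, bitmap) <= max_diff:
                break  # reuse an existing shape
        else:
            dictionary.append(bitmap)  # first occurrence of this shape
            index = len(dictionary) - 1
        placements.append((index, x, y))
    return dictionary, placements


# Low-resolution digits that differ in a single pixel.
EIGHT = ("###", "#.#", "###", "#.#", "###")
SIX = ("###", "#..", "###", "#.#", "###")
GLYPHS = [(EIGHT, 0, 0), (SIX, 4, 0), (EIGHT, 8, 0)]
```

With `max_diff=0` the encoder stores two shapes and the page is reproduced exactly. With `max_diff=1` the 6 is matched to the stored 8, so the decoded page shows 8 in its place, which is the kind of substitution described above.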

Format licensing

DjVu is an open file format with patents.^[4] The file format specification is published, as well as source code for the reference library.^[4] The original authors distribute an open-source implementation named "DjVuLibre" under the GNU General Public License. The rights to the commercial development of the encoding software have been transferred to different companies over the years, including AT&T Corporation, LizardTech, Celartem and Cuminas.

Support

Viewers such as SumatraPDF (Windows) can open DjVu files.

In 2002, the DjVu file format was chosen by the Internet Archive as a format in which its Million Book Project provides scanned public domain books online (along with TIFF and PDF).^[8]

Wikimedia Commons, a media repository used by Wikipedia among others, conditionally permits PDF and DjVu media files.^[9]

References

1. Jim Rile (2007-02-23). "DjVu File Format Version". PlanetDjVu.
2. "DjVu Licensing". DjVu Sourceforge page. Sourceforge.net. 2011-08-17. Retrieved 2011-09-21.
3. "DjVu Technology". Cuminas. Retrieved 2014-02-12.
4. "What is DjVu – DjVu.org". DjVu.org. Retrieved 2009-03-05.
5. Léon Bottou; Patrick Haffner; Paul G. Howard; Patrice Simard; Yoshua Bengio; Yann Le Cun (1998). "High Quality Document Image Compression with DjVu" (PDF). Journal of Electronic Imaging. 7 (3): 410–425.
6. Brewster Kahle (December 16, 2004). "Universal Access to All Knowledge" (Audio; Speech at 1h:31m:20s). Conversations Network.
7. http://djvu.sourceforge.net/
8. "Image file formats – OLPC". Wiki.laptop.org. Retrieved 2008-09-09.
9. PDF and DjVu

External links

"The premier menu for DjVu resources" (status of the site, which is maintained by an anonymous webmaster, is unclear)
DjVuLibre site
Jakub Wilk's pdf2djvu and other DjVu tools
Poliqarp for DjVu search engine and other DjVu tools
Why won't Google index DjVu files after all this time? – topic on PlanetDjVu
Any2Djvu Server - online document converter
Cuminas Software Downloads
Table of Djvu Programmes (Russian)

This article is issued from Wikipedia - version of the 11/26/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.