Comparison of archive formats

Many computer data archive formats exist for creating and maintaining archive files. The tables below compare a number of the popular ones.

Features

The subsections below describe the features that the comparison tables list column by column.

Purpose

Archive formats are used for backups, for moving data between systems, and for long-term archiving. Many archive formats compress the data so that it occupies less storage space and transfers more quickly, because the same data is represented by fewer bytes. Another benefit is that many files are combined into one archive file, which carries less overhead to manage or transfer.

Numerous algorithms are available to compress archived data losslessly, and some work better (producing a smaller archive or compressing faster) with particular data types.
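As a rough illustration of how the choice of algorithm affects archive size, the sketch below compresses the same input with three codecs from the Python standard library. The function name `compressed_sizes` and the sample text are made up for this example, and the results depend entirely on the input data.

```python
import bz2
import lzma
import zlib


def compressed_sizes(data: bytes) -> dict:
    """Return the compressed size in bytes for three stdlib codecs."""
    return {
        "zlib (DEFLATE)": len(zlib.compress(data, 9)),
        "bz2": len(bz2.compress(data, 9)),
        "lzma": len(lzma.compress(data)),
    }


# Hypothetical sample input: highly repetitive text, which compresses well.
repetitive = b"archive formats compress data. " * 2000
sizes = compressed_sizes(repetitive)

for name, size in sizes.items():
    print(f"{name}: {len(repetitive)} -> {size} bytes")
```

Running the same function on other kinds of data (images, executables, already compressed files) is a simple way to see that no single algorithm is best for every data type.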

Most operating systems also use archive formats to package software, which makes distribution and installation easier than shipping bare binary executables.

Filename extension

DOS and early versions of Windows required filenames to carry an extension of up to three characters to identify the file type and use. To be reliable, an extension should be unique to one type of file, although the comparison table shows that this is not always so: .arc is shared by ARC, FreeArc, and Sparc. Many operating systems can identify a file's type from its contents, without needing an extension in its name. Even so, three-character extensions remain a useful and efficient shorthand for identifying file types, both for computer software and for humans.
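Identifying a file from its contents usually means checking for a fixed byte sequence (a "magic number") at the start of the file. The sketch below checks a few well-known signatures at offset 0; the `SIGNATURES` list and the `sniff_format` helper are illustrative names, not part of any library, and the list is far from complete (tar, for instance, keeps its marker deeper inside the header and is not covered here).

```python
import gzip
import io
import zipfile

# Leading "magic" bytes of a few formats (offset 0 only, incomplete list).
SIGNATURES = [
    (b"PK\x03\x04", "ZIP"),
    (b"7z\xbc\xaf\x27\x1c", "7z"),
    (b"Rar!\x1a\x07", "RAR"),
    (b"!<arch>\n", "ar"),
    (b"\x1f\x8b", "gzip"),
]


def sniff_format(data: bytes) -> str:
    """Guess the format from the first bytes, ignoring any filename."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


# Build small in-memory samples so the example needs no files on disk.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", "hello")
zip_bytes = buf.getvalue()
gz_bytes = gzip.compress(b"hello")

print(sniff_format(zip_bytes))
print(sniff_format(gz_bytes))
print(sniff_format(b"plain text"))
```

Because the check looks only at content, it gives the same answer even if the file has been renamed with a misleading extension.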

Integrity check

Archive files are often stored on magnetic media, which are subject to data storage errors. Early tape media had a higher error rate than is expected of magnetic media today. Many archive formats embed extra data, such as a checksum, in the archive so that the software reading it can detect storage or transmission errors.
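The idea can be sketched with CRC32, the check that the comparison table lists for formats such as 7z and RAR. The record layout below (payload followed by a 4-byte little-endian checksum) is an assumption made for this example and does not match the on-disk layout of any particular format.

```python
import zlib


def add_checksum(payload: bytes) -> bytes:
    """Append a CRC32 of the payload (hypothetical 4-byte trailer)."""
    crc = zlib.crc32(payload)
    return payload + crc.to_bytes(4, "little")


def verify(record: bytes) -> bool:
    """Recompute the CRC32 and compare it with the stored trailer."""
    payload, stored = record[:-4], record[-4:]
    return zlib.crc32(payload) == int.from_bytes(stored, "little")


record = add_checksum(b"archived file contents")

# Simulate a storage error by flipping a single bit in the payload.
corrupted = bytearray(record)
corrupted[3] ^= 0x01

print(verify(record))            # intact record
print(verify(bytes(corrupted)))  # damaged record
```

A check of this kind only reports that the data changed; it cannot say which bytes are wrong or restore them.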

Recovery record

Many archive formats embed redundant data in the archive so that the software reading it can not only detect storage or transmission errors but also correct them.
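The simplest form of such redundancy is a single XOR parity block, sketched below. This is a deliberate simplification for illustration: it can rebuild only one missing block whose position is known, whereas real recovery records use stronger codes (the comparison table lists Reed-Solomon for RAR). The helper names `xor_blocks` and `recover_block` are invented for this example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def recover_block(blocks, parity):
    """Rebuild the single missing block (marked None) from the parity block."""
    missing = [i for i, block in enumerate(blocks) if block is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can repair exactly one missing block")
    present = [block for block in blocks if block is not None]
    repaired = list(blocks)
    # XOR of all surviving blocks and the parity yields the lost block.
    repaired[missing[0]] = xor_blocks(present + [parity])
    return repaired


data_blocks = [b"ABCD", b"EFGH", b"IJKL"]
parity = xor_blocks(data_blocks)      # redundant data stored with the archive

damaged = [b"ABCD", None, b"IJKL"]    # second block lost
restored = recover_block(damaged, parity)
print(restored)
```

The cost is extra storage: here one parity block protects three data blocks, and protecting against more damage requires more redundancy.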

Encryption

To prevent stored or transferred data from being read if it is intercepted, many archive formats can encrypt their contents. Several encryption algorithms are in use; the comparison table lists the ones each format supports.

Comparison

Containers and Compression

Format | Filename extension | Created by | Introduced in | Based on | Compression | Integrity check | Recovery record | Encryption supported | Unicode filenames | Modification date resolution | Pre-processing
Archive (ar) | .a | CSRG | ? | Original | No | No | No | No | No | 1 s | ?
cpio | .cpio | Bell Labs | 1983 Unix System V | ? | No | Partial, select formats only | No | No | No | 1 s | ?
Shell Archive (shar and makeself) | .shar, .run | ? | 1994 4.4BSD | Original | No | Yes, commonly MD5 | Partial | Partial | Partial | arbitrary (typically 1 s) | ?
Tape Archive (tar) | .tar | Bell Labs | 1975 Version 6 Unix | ? | No | Partial, metadata only. Full integrity providable by filters such as gzip. | No | No | Optional^1 | 1 s | No
Extended TAR format (pax) | .tar | OpenGroup | 2001 | Sun proposal + TAR | No | metadata | No | No | Yes | arbitrary (typically 1 ns) | ?
BagIt | - | The Library of Congress | 2007 | file system | No | Yes | No | No | Yes | No | ?
7z | .7z | Igor Pavlov | 2000 | LZMA | Yes | Yes, CRC32 | No | Yes, AES-256 | Yes | 1 ms (maybe better?) | Yes
ACE | .ace | Marcel Lemke | ? | ? | Yes | Yes | Yes | Yes, Blowfish | Yes | ? | ?
AFA | .afa | Vicente Sánchez-Alarcos | 2009 | Original | Yes | Yes | Yes | Yes, AES and CAST | Yes | ? | ?
ARC | .arc | Thom Henderson (SEA) | 1985 | ? | Yes | CRC16 | No | weak XOR only | No | 2 s | ?
ARJ | .arj | Robert Jung | 1991 | AR001 and AR002 | Yes | Yes | Yes | weak XOR with initial constant | No | ? | ?
B1 | .b1 | Catalina Group Ltd | 2011 | LZMA | Yes | Yes | No | Yes, AES | Yes | ? | ?
Cabinet | .cab | Microsoft | 1992 Windows 3.1 | DEFLATE | Yes | Optional PKCS7 Authenticode signature | No | Optional (with SDK) | Yes | 2 s | ?
Compact File Set | .cfs | Joe Lowe (Pismo Technic Inc.) | 2008 | ZIP/LZMA | Yes | Yes | ? | Yes | Yes | ? | ?
Compact Pro | .cpt | Bill Goodman | 1990 (as "Compactor") | Original | Yes | Yes | No | Yes | ? | ? | ?
Disk Archive (DAR) | .dar | Denis Corbin | 2002 | Original | Yes | Yes | Yes^2 | Yes | Yes | 1 µs | Yes
DGCA | .dgc | Shin-ichi Tsuruta | 2001 | GCA | Yes | Yes | Yes | Yes | Yes | ? | ?
FreeArc | .arc | Bulat Ziganshin | 2006 | LZMA, PPMD, TTA | Yes | Yes | Yes | Yes, AES, Blowfish, Twofish and Serpent | Yes | ? | ?
LHA (also LZH) | .lzh, .lha | Haruyasu Yoshizaki | 1988 | Frozen | Yes | Only on recent LHA releases | No | No | No | 1–2 s | ?
LZX | .lzx | Jonathan Forbes and Tomi Poutanen | 1995 | LZ77 | Yes | Only on recent LZX releases | ? | ? | ? | ? | ?
Sparc | .arc | David Pilling | 1989 | ? | Yes | ? | ? | ? | ? | ? | ?
WinMount format | .mou | ? | 2007 | ? | Yes | Yes | Yes | Yes | Yes | ? | ?
Macintosh Disk Image | .dmg | Apple Computer | 2001 Mac OS X | Original | Yes | Yes | ? | Yes | ? | ? | ?
Partition Image (PartImage) | .partimg | François Dupoux and Franck Ladurelle | 2000 | ? | Yes | ? | ? | ? | ? | ? | ?
PAQ Family (Several formats)^4 | .paq#*, .lpaq#* | Matt Mahoney | 2002–2006 | Original | Yes | ? | ? | ? | ? | ? | ?
PEA | .pea | Giorgio Tani | 2006 | Original, Deflate based compression | Yes | Yes: Adler32, CRC32, CRC64, MD5, SHA1, RIPEMD-160, SHA256, SHA512, Whirlpool | No | Yes: Authenticated Encryption, AES128 and AES256 in EAX mode | Yes, system dependent | Yes, arbitrary | ?
PIM | .pim | Ilia Muraviev | 2004–2008 | Original | Yes | Yes | No | No | Yes | No | ?
Quadruple D | .qda | Taku Hayase (aka sandman) | 1997 | ? | Yes | ? | ? | ? | ? | ? | ?
RAR | .rar | Eugene Roshal | 1993 | Original | Yes | Yes, CRC32, BLAKE2 | Yes, Reed-Solomon | Yes, AES-256 | Yes, UTF-8 | 2 s, 1 s, 6.5536 ms, 25.6 µs or 100 ns^3 | Dropped
RK | .rk | M Software, Ltd. | 2004 | Original | Yes | Yes | No | Yes, AES, Square, Twofish | Yes | 1 s | ?
NuFX | .shk | Andy Nicholas | 1989 | Original | Yes | CRC16 | No | No | No | 1 s | ?
StuffIt (also SIT) | .sit | Raymond Lau | 1987 | ? | Yes | ? | ? | Yes | ? | ? | ?
StuffIt X (also SITx) | .sitx | Aladdin/Allume Systems | 2002 | ? | Yes | ? | Optional | Yes, RC4, Blowfish, AES, DES | Yes | ? | ?
UltraCompressor II | .uc .uc0 .uc2 .ucn .ur2 .ue2 | Nico de Vries | 1992–1996 | LZ77 and Huffman coding | Yes | Yes | Yes | Yes, triple DES | ? | ? | ?
Windows Image | .wim | Microsoft | 2006 | Original | Yes | Yes | No | Partial^5 | Yes | 100 ns | ?
ZIP (also PKZIP) | .zip | Phil Katz | 1989 | DEFLATE | Yes | Yes | No | Yes, AES | Yes | 1-2 s, depending on version | ?
ZPAQ | .zpaq | Matt Mahoney | 2009 | PAQ | Yes | Yes, SHA-1 | No | Yes, AES-256 | Yes | ? | ?

Notes

^1 While the original tar format uses the ASCII character encoding, current implementations use the UTF-8 (Unicode) encoding, which is backwards compatible with ASCII.
^2 Supports the external Parchive program (par2).
^3 Since release 3.20, RAR can store modification, creation, and last-access times with a precision of up to 0.0000001 second (= 0.1 µs).
^4 The PAQ family (with its lighter-weight derivative LPAQ) went through many revisions, and each revision suggested its own extension, for example ".paq9a".
^5 WIM can store the ciphertext of encrypted files on an NTFS volume, but such files can only be decrypted if an administrator extracts the file to an NTFS volume and the decryption key is available (typically from the file's original owner on the same Windows installation). Microsoft has also distributed some download versions of the Windows operating system as encrypted WIM files, but via an external encryption process and not as a feature of WIM.
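The modification date resolutions in the table follow from how each format encodes timestamps. As one illustration, the classic ZIP header keeps the time in an MS-DOS style field that counts seconds in units of two, so an odd second cannot be stored. The sketch below shows this with Python's standard `zipfile` module, which writes that field and no higher-precision extra field; the member name `note.txt` and the timestamp are arbitrary values chosen for this example.

```python
import io
import zipfile

# A timestamp with an odd number of seconds (year, month, day, h, m, s).
odd_second = (2016, 11, 27, 12, 30, 37)

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    info = zipfile.ZipInfo("note.txt", date_time=odd_second)
    zf.writestr(info, b"timestamp test")

# Read the archive back and inspect the stored timestamp.
with zipfile.ZipFile(buf) as zf:
    stored = zf.getinfo("note.txt").date_time

print("written:", odd_second)
print("stored: ", stored)  # the seconds value comes back rounded down to 36
```

Formats listed with finer resolution avoid this loss by storing timestamps in a wider field or in different units.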

Software Packaging and Distribution

Format | Filename extension | Created by | Introduced in | Based on | Integrity check | Recovery record | Encryption supported | Unicode filenames | Modification date resolution
Debian package (deb) | .deb | Debian | 1994 Debian 0.91 | ar, tar, and gzip | Yes | No | No | Yes | 1 s
Macintosh Installer | .pkg, .mpkg (metapackage) | NeXT | 1989 NeXTSTEP 1.0 | pax and gzip | Yes | ? | ? | Yes | ?
RPM Package Manager (RPM) | .rpm | Red Hat | 1995 Red Hat Linux 1.0 | cpio and gzip | Yes | ? | ? | ? | 1 s
Slackware Package | .tgz | Patrick Volkerding | 1993 Slackware 1.0 | tar and gzip | Yes | No | No | ? | ?
Windows Installer (also MSI) | .msi | Microsoft | 2000 Windows 2000 | OLE Structured Storage, Cabinet and SQL | Optional PKCS7 Authenticode Signature | No | No | No | 2 s
Java Archive (JAR^1) | .jar | Sun Microsystems | 1997 JDK 1.1 | PKZIP | Yes | ? | ? | Yes | ?
Google Chrome extension package | .crx | Google | 2009 (Chrome 4.0) | Zip | ? | ? | Yes[1] | ? | ?
Pacman | .pkg.tar.xz, .pkg.tar (no compression) | Judd Vinet | 2001 (before ArchLinux 0.1) | tar and xz (formerly gzip) | Yes | No | No | Yes | 1 s

Notes

^1 Not to be confused with the archiver JAR written by Robert K. Jung, which produces ".j" files.

Features

Archive format | Built-in compression | Self-extracting | Directory structure | POSIX attributes | ACLs | Alternate data streams
cpio | No^1 | No | Yes | Yes | ? | ?
tar | No^1 | No | Yes | Yes | Yes | (in Solaris implementation)
dar | Yes^3 | No | Yes | Yes | Yes | Yes
ar | No | No | No | Yes | No | ?
pax | No | No | Yes | Yes | Yes | ?
dump | No^1 | No | Yes | Yes | Yes | ?
shar | No | Yes | Yes | Yes | ? | ?
makeself | Yes | Yes | Yes | Yes | Yes | ?
zip | Yes | Yes^2 | Yes | No | ? | ?
rar | Yes | Yes^2 | Yes | No | ? | ?
ace | Yes | ? | Yes | No | ? | ?
arj | Yes | Yes^2 | Yes | No | No | ?
zoo | Yes | ? | Yes | No | ? | ?
ISO 9660 (CD-ROM) | No^1 | No | Yes | (with Rock Ridge extension) | No | ?
cab | Yes | Yes^2 | ? | No | ? | ?
rpm | Yes | No | Yes | Yes | ? | ?
deb | Yes | No | Yes | Yes | ? | ?
7z | Yes | No | Yes | Yes | ? | ?

Notes

^1 Compression is not a built-in feature of these formats; however, the resulting archive can be compressed with any algorithm of choice. Several implementations include functionality to do this automatically.
^2 That is, most implementations can optionally produce a self-extracting executable.
^3 Per-file compression with gzip, bzip2, lzo, xz, or lzma (as opposed to compressing the whole archive). The user can also choose not to compress already-compressed files, identified by their filename suffix.
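Note 1 can be illustrated with Python's standard `tarfile` module, one implementation that applies the external compression step automatically. The sketch below writes the same member once as a plain tar and once as a tar passed through gzip; the helper `make_tar`, the member name `data.txt`, and the payload are invented for this example.

```python
import io
import tarfile


def make_tar(mode: str, payload: bytes) -> bytes:
    """Write one member into an in-memory tar using the given tarfile mode."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode=mode) as tar:
        info = tarfile.TarInfo(name="data.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


payload = b"the same line repeated\n" * 5000

plain = make_tar("w", payload)       # tar container only, no compression
gzipped = make_tar("w:gz", payload)  # same container passed through gzip

print(len(payload), len(plain), len(gzipped))

# The compressed archive still unpacks to the original content.
with tarfile.open(fileobj=io.BytesIO(gzipped), mode="r:gz") as tar:
    roundtrip = tar.extractfile("data.txt").read()
```

The plain tar is slightly larger than its payload because of headers and padding, while the gzip-filtered archive is much smaller for repetitive input like this.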

This article is issued from Wikipedia - version of the 11/27/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.