PDF/A

Filename extension: .pdf
Internet media type: application/pdf
Type code: 'PDF ' (including a single space)
Uniform Type Identifier (UTI): com.adobe.pdf
Magic number: %PDF
Developed by: ISO
Initial release: October 1, 2005
Extended from: PDF
Standard: ISO 19005

PDF/A is an ISO-standardized version of the Portable Document Format (PDF) specialized for use in the archiving and long-term preservation of electronic documents. PDF/A differs from PDF by prohibiting features ill-suited to long-term archiving, such as font linking (as opposed to font embedding) and encryption.[1] The ISO requirements for PDF/A file viewers include color management guidelines, support for embedded fonts, and a user interface for reading embedded annotations.

Standards

ISO 19005 – Document management – Electronic document file format for long-term preservation (PDF/A)
Abbr. | Subtitle | Published | Standard | Based on | Ref.
PDF/A-1 | Part 1: Use of PDF 1.4 | 2005-10-01 | ISO 19005-1 | PDF 1.4 (Adobe Systems, PDF Reference, third edition) | [2]
PDF/A-2 | Part 2: Use of ISO 32000-1 | 2011-07-01 | ISO 19005-2 | PDF 1.7 (ISO 32000-1:2008) | [3]
PDF/A-3 | Part 3: Use of ISO 32000-1 with support for embedded files | 2012-10-15 | ISO 19005-3 | PDF 1.7 (ISO 32000-1:2008) | [4]

Background

PDF is a standard for encoding documents in an "as printed" form that is portable between systems. However, the suitability of a PDF file for archival preservation depends on options chosen when the PDF is created: most notably, whether to embed the necessary fonts for rendering the document; whether to use encryption; and whether to preserve additional information from the original document beyond what is needed to print it.

PDF/A began as a joint activity between the Association for Suppliers of Printing, Publishing and Converting Technologies (NPES) and the Association for Information and Image Management (AIIM) to develop an international standard defining the use of the Portable Document Format (PDF) for archiving documents.[5] The goal was to address the growing need to archive documents electronically in a way that preserves their contents over an extended period of time and ensures that they can be retrieved and rendered with a consistent and predictable result in the future.[6] This need exists in a wide variety of government and industry areas worldwide, including legal systems, libraries, newspapers, and regulated industries.[7]

Description

The PDF/A standard does not define an archiving strategy or the goals of an archiving system. It identifies a "profile" for electronic documents that ensures the documents can be reproduced in exactly the same way by different software in years to come. A key element of this reproducibility is the requirement that PDF/A documents be 100% self-contained: all of the information necessary for displaying the document in the same manner is embedded in the file. This includes, but is not limited to, all content (text, raster images and vector graphics), fonts, and color information. A PDF/A document may not rely on information from external sources (e.g. font programs and data streams), but may include annotations (e.g. hypertext links) that link to external documents.[8]

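Other key elements of PDF/A conformance include the mandatory embedding of all fonts, the prohibition of encryption, and, in PDF/A-1, the prohibition of transparency.[9][10][11]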

Conformance levels and versions

PDF/A-1

Part 1 of the standard was first published on October 1, 2005,[2] and specifies two levels of conformance for PDF files:[12]

Level B conformance requires only that the standards necessary for the reliable reproduction of a document's visual appearance be followed, while Level A conformance includes all Level B requirements plus features intended to improve a document's accessibility.

Level A conformance was intended to increase the accessibility of conforming files for physically impaired users by allowing assistive software, such as screen readers, to extract and interpret a file's contents more precisely.[12] A later standard, PDF/UA, was developed to address what came to be seen as some of PDF/A's shortcomings, replacing many of its general guidelines with more detailed technical specifications.[13]

PDF/A-2

Part 2 of the standard, published on July 1, 2011,[3] addresses some of the new features added with versions 1.5, 1.6 and 1.7 of the PDF Reference. PDF/A-1 files will not necessarily conform to PDF/A-2, and PDF/A-2 compliant files will not necessarily conform to PDF/A-1.

Part 2 of the PDF/A standard is based on PDF 1.7 (ISO 32000-1) rather than PDF 1.4, and offers a number of new features, including support for transparency.

Part 2 defines three conformance levels. PDF/A-2a and PDF/A-2b correspond to conformance levels a and b in PDF/A-1. A new conformance level, PDF/A-2u, represents Level B conformance (PDF/A-2b) with the additional requirement that all text in the document have Unicode mapping.[12][14]

PDF/A-3

Part 3 of the standard, published on October 15, 2012,[4] differs from PDF/A-2 in only one regard – it allows embedding of arbitrary file formats (such as XML, CSV, CAD, word-processing documents, spreadsheet documents, and others) into PDF/A conforming documents.[15]
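The parts and conformance levels described above can be summarized in a small lookup table. The following is a minimal sketch, not part of the standard: the names CONFORMANCE_LEVELS, is_known_flavour and flavour_label are hypothetical, and Part 3 is assumed to offer the same levels as Part 2 because the text says it differs from Part 2 only in allowing arbitrary embedded files.

```python
# Conformance levels per part of ISO 19005, as described in the text.
# Assumption: Part 3 shares Part 2's levels (a, b, u), since it is said to
# differ from Part 2 only in permitting arbitrary embedded files.
CONFORMANCE_LEVELS = {
    1: {"A", "B"},
    2: {"A", "B", "U"},
    3: {"A", "B", "U"},
}


def is_known_flavour(part: int, level: str) -> bool:
    """Return True if (part, level) names a combination listed above."""
    return level.upper() in CONFORMANCE_LEVELS.get(part, set())


def flavour_label(part: int, level: str) -> str:
    """Format a label such as 'PDF/A-2u', rejecting unknown combinations."""
    if not is_known_flavour(part, level):
        raise ValueError(f"unknown PDF/A flavour: part={part!r}, level={level!r}")
    return f"PDF/A-{part}{level.lower()}"
```

For instance, flavour_label(2, "U") yields the label PDF/A-2u, while is_known_flavour(1, "u") is false because Level U was only introduced with Part 2.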

Identification

A PDF/A document can be identified as such through PDF/A-specific metadata located in the "http://www.aiim.org/pdfa/ns/id/" namespace. This metadata represents a claim of conformance; in itself it does not ensure conformance.
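As an illustration, the claim can be read from a document's XMP metadata packet with a few lines of Python. This is a minimal sketch under stated assumptions: the property names pdfaid:part and pdfaid:conformance come from the PDF/A identification schema rather than from the text above, the function name read_pdfa_claim and the sample packet are hypothetical, and extracting the XMP packet from the PDF file itself is left out. The result is only the claimed part and level, not a validation result.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def read_pdfa_claim(xmp_packet: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (part, conformance) as claimed in the XMP packet, or None."""
    root = ET.fromstring(xmp_packet)
    part_key = f"{{{PDFAID_NS}}}part"
    level_key = f"{{{PDFAID_NS}}}conformance"
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        # XMP allows simple properties as child elements or as attributes.
        part = desc.findtext(part_key) or desc.get(part_key)
        level = desc.findtext(level_key) or desc.get(level_key)
        if part:
            return part.strip(), (level.strip() if level else None)
    return None


# Hypothetical sample packet claiming PDF/A-1b.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
    <pdfaid:part>1</pdfaid:part>
    <pdfaid:conformance>B</pdfaid:conformance>
  </rdf:Description>
</rdf:RDF>"""

print(read_pdfa_claim(SAMPLE))  # ('1', 'B')
```

A file carrying this metadata still has to be checked by a validator before it can be treated as conforming.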

Validation

Isartor Test Suite

Industry collaboration in the original PDF/A Competence Center led to the development of the Isartor Test Suite in 2007 and 2008. The test suite consists of 204 PDF files intentionally constructed to systematically fail each of the requirements for PDF/A-1b conformance, allowing developers to test the ability of their software to validate against the standard's most basic level of conformance.[17][18] By mid-2009 the test suite had already made an appreciable difference in the general quality of PDF/A validation software.[19]
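Because every file in such a suite is built to violate a requirement, a validator under test should reject all of them. The idea can be sketched as a small harness; this is an illustrative outline only, in which the function name find_false_passes, the validate callable (standing in for whatever validator is being tested) and the file names are all hypothetical.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def find_false_passes(
    failing_files: Iterable[Path],
    validate: Callable[[Path], bool],
) -> List[Path]:
    """Return the intentionally non-conforming files that `validate` accepts.

    `validate` is a hypothetical callable that returns True when the
    validator under test reports a file as conforming. An empty result
    means every deliberate violation was detected.
    """
    return [path for path in failing_files if validate(path)]


# Stand-in validator for demonstration: it misses one deliberate violation.
def toy_validator(path: Path) -> bool:
    return path.name == "missed-violation.pdf"


suite = [Path("violation-a.pdf"), Path("missed-violation.pdf")]
print(find_false_passes(suite, toy_validator))
```

Each path returned points to a requirement the validator failed to enforce.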

veraPDF

In November 2014, responding to the EU Commission's PREFORMA project,[21] the PDF Association, working with the other members of the veraPDF consortium, including the Open Preservation Foundation,[20] launched the PDF Validation Technical Working Group to articulate a plan for developing an industry-supported PDF/A validator.[22]

Based on its test corpora (which incorporate the Isartor Test Suite) and its software development plan, the veraPDF consortium subsequently won phase 2 of the PREFORMA contract in April 2015.[23] As of August 2016 the veraPDF software was well advanced.[24] Phase 2 was scheduled for completion by December 2016, to be followed by Phase 3, a six-month testing and acceptance period.

PDF/A viewers

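The PDF/A specification also states some requirements for a conforming PDF/A viewer, which must follow color management guidelines, support embedded fonts, and provide a user interface for reading embedded annotations.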

When encountering a file that claims conformance with PDF/A, some PDF viewers will default to a special "PDF/A viewing mode" to fulfill conforming reader requirements. To take one example, Adobe Acrobat and Adobe Reader 9 include an alert to advise the user that PDF/A viewing mode has been activated. Some PDF viewers allow users to disable the PDF/A viewing mode or to remove the PDF/A information from a file.[25][26]

Drawbacks

A PDF/A document must embed all fonts in use; accordingly, a PDF/A file will often be larger than an equivalent PDF file that does not include embedded fonts.

The use of transparency is forbidden in PDF/A-1. The majority of PDF generation tools that allow for PDF/A document compliance, such as the PDF export in OpenOffice.org or PDF export tool in Microsoft Office 2007 suites, will also make any transparent images in a given document non-transparent. That restriction was removed in PDF/A-2.[9]

Some archivists have voiced concerns that PDF/A-3, which allows arbitrary files to be embedded in PDF/A documents, could result in circumvention of memory institution procedures and restrictions on archived formats.[27]

The PDF Association has addressed various misconceptions[28] regarding PDF/A in its publication "PDF/A in a Nutshell 2.0".

Converting a PDF (up to version 1.4) into a PDF/A-2 usually works as expected, except for problems with glyphs. According to the PDF Association, "Problems can occur before and/or during the generation of PDFs. A PDF/A file can be formally correct yet still have incorrect glyphs. Only a careful visual check can uncover this problem. Because generation problems also affect Unicode mapping, the problem attracts the attention when a visual check is carried out on the extracted text. In PDF/A, text/font usage is specified uniquely enough to ensure that it cannot be incorrect. If viewers or printers do not offer complete support for encoding systems, this can result in problems with regard to PDF/A."[29] In other words, a document can be fully compliant with the standard and internally correct while the system used for viewing or printing it still produces undesired results.

A document produced with OCR conversion into PDF/A-2 or PDF/A-3 does not support the .notdef glyph. Therefore, this type of conversion can result in unrendered content.

References

  1. Oettler, Alexandra (2013-02-07). "PDF/A facts – an introduction to the standard" (PDF). PDF Association. Retrieved 2014-07-11.
  2. "ISO 19005-1:2005". ISO. Retrieved 2016-07-27.
  3. "ISO 19005-2:2011". ISO. Retrieved 2016-07-27.
  4. "ISO 19005-3:2012". ISO. Retrieved 2016-07-27.
  5. "A short history of PDF/A" (PDF). PDF Association. 2013-02-07. Retrieved 2014-07-11.
  6. Oettler, Alexandra (2013-02-07). "The most important reasons to use PDF/A" (PDF). PDF Association. Retrieved 2014-07-11.
  7. Oettler, Alexandra (2013-02-07). "Typical uses for PDF/A" (PDF). PDF Association. Retrieved 2014-07-11.
  8. Oettler, Alexandra (2013-02-07). "The technical side of the PDF/A standard" (PDF). PDF Association. Retrieved 2014-07-11.
  9. "PDF/A – A Look at the Technical Side" (PDF). Retrieved 2011-07-06.
  10. "PDF/A-2 Standard Published by ISO! The New Standard Includes Great Technical Enhancements." (PDF). 2011-07-01. Retrieved 2011-07-06.
  11. Frequently Asked Questions (FAQs) – ISO 19005-1:2005 – PDF/A-1, Date: July 10, 2006 (PDF), 2006-07-10, retrieved 2011-07-06
  12. "Improved PDF/A-1b" (PDF). PDF Association. 2011-08-05. Retrieved 2012-09-26.
  13. Oettler, Alexandra (2013-02-07). "PDF/A and the other PDF standards" (PDF). PDF Association. Retrieved 2014-07-12.
  14. PDF/A-2, PDF for Long-term Preservation, Use of ISO 32000-1 (PDF 1.7), Library of Congress, retrieved 2012-09-26
  15. "PDF Association Arranges Its First Seminar on PDF/A to Include Standards 1 to 3" (PDF). PDF Association. 2012-03-29.
  16. Oettler, Alexandra (2013-02-07). "Validation: is it really PDF/A?" (PDF). PDF Association. Retrieved 2014-07-11.
  17. Isartor Test Suite (PDF). PDF/A Competence Center. 2008-08-12. Retrieved 2016-09-23.
  18. "Isartor Test Suite" (PDF). PDF Association. 2011-08-03. Retrieved 2016-09-23.
  19. "Bavaria Report" (PDF). PDFlib. 2009. Archived from the original on 2015-04-21. Retrieved 2015-04-30.
  20. "Open Preservation Foundation veraPDF project". Open Preservation Foundation. Retrieved 2015-04-30.
  21. PREFORMA, an EU Commission funded project
  22. "A consortium including the PDF Association wins phase 1 of an EU Commission tender to create an open-source PDF/A validator" (PDF). PDF Association. 2014-11-13. Retrieved 2015-04-30.
  23. PREFORMA starts prototyping phase, retrieved 2015-04-30
  24. "veraPDF 0.22 released" (PDF). Retrieved 23 September 2016.
  25. "How to Remove PDF/A Information from a file". Retrieved 2014-04-10.
  26. "Change the PDF/A viewing mode". Retrieved 2014-04-10.
  27. Archivists: No flowers for PDF/A-3, retrieved 2014-07-12
  28. The myths and legends surrounding PDF/A (PDF), retrieved 2014-07-12
  29. PDF/A – A Look at the Technical Side (PDF), retrieved 2015-08-14

This article is issued from Wikipedia - version of the 11/5/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.