StAX
Streaming API for XML (StAX) is an application programming interface (API) to read and write XML documents, originating from the Java programming language community.
Traditionally, XML APIs are either:
- DOM based - the entire document is read into memory as a tree structure for random access by the calling application
- event based - the application registers to receive events as entities are encountered within the source document; the Simple API for XML (SAX) is the standard example.
Both have advantages: DOM allows random access to the document, while an event based API such as SAX has a small memory footprint and is typically much faster.
These two access metaphors can be thought of as polar opposites. A tree based API allows unlimited, random access and manipulation, while an event based API is a 'one shot' pass through the source document.
StAX was designed as a middle ground between these two opposites. In the StAX metaphor, the programmatic entry point is a cursor that represents a point within the document. The application moves the cursor forward, 'pulling' information from the parser as it needs it. This differs from an event based API such as SAX, which 'pushes' data to the application and requires the application to maintain state between events in order to keep track of its location within the document.
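The pull model described above can be illustrated with a minimal sketch. It assumes the standard `javax.xml.stream` classes that ship with the JDK; the class name `PullExample`, the method `firstTitle` and the sample document are hypothetical and chosen only for illustration. The application owns the loop, so it can stop asking for events as soon as it has what it needs, and its position in the document is simply its position in ordinary Java control flow.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class; not part of the StAX API.
class PullExample {

    // Returns the text of the first <title> element, or null if there is none.
    static String firstTitle(String xml) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            // The application drives the parser by advancing the cursor itself.
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    // Read the element's text and stop pulling further events.
                    return reader.getElementText();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<book><title>StAX in brief</title><chapter>One</chapter></book>";
        System.out.println(firstTitle(xml));
    }
}
```

With a push API the equivalent logic would typically live in a handler object that records, in fields, whether it is currently inside a `title` element; here that state is implicit in the loop.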
Origins
StAX has its roots in a number of incompatible pull APIs for XML, most notably XMLPULL, the authors of which (Stefan Haustein and Aleksander Slominski) collaborated with, amongst others, BEA Systems, Oracle, Sun and James Clark.
Examples
From JSR-173 Specification, Final, V1.0 (used under fair use).
Quote:
- The following Java API shows the main methods for reading XML in the cursor approach.
public interface XMLStreamReader {
    public int next() throws XMLStreamException;
    public boolean hasNext() throws XMLStreamException;
    public String getText();
    public String getLocalName();
    public String getNamespaceURI();
    // ...other methods not shown
}
- The writing side of the API has methods that correspond to the reading side for “StartElement” and “EndElement” event types.
public interface XMLStreamWriter {
    public void writeStartElement(String localName) throws XMLStreamException;
    public void writeEndElement() throws XMLStreamException;
    public void writeCharacters(String text) throws XMLStreamException;
    // ...other methods not shown
}
- 5.3.1 XMLStreamReader
- This example illustrates how to instantiate an input factory, create a reader and iterate over the elements of an XML document.
XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
XMLStreamReader xmlStreamReader = xmlInputFactory.createXMLStreamReader(...);
while (xmlStreamReader.hasNext()) {
    xmlStreamReader.next();
}
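The reader and writer methods quoted above can be exercised together. The following is a small sketch rather than code from the specification: it assumes the JDK's bundled `javax.xml.stream` implementation, and the class name `RoundTripExample`, its methods and the `greeting` element are hypothetical. It writes a one-element document with `XMLStreamWriter`, then reads it back with `XMLStreamReader` and records each event it sees.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical example class; not part of the StAX API.
class RoundTripExample {

    // Writes a single element containing text, using the methods quoted above.
    static String write() throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartElement("greeting");
        writer.writeCharacters("hello");
        writer.writeEndElement();
        writer.flush();
        writer.close();
        return out.toString();
    }

    // Walks the document with the cursor and records element and text events.
    static List<String> describe(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask the parser to deliver adjacent character data as one event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<String> seen = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        seen.add("start:" + reader.getLocalName());
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        if (!reader.isWhiteSpace()) {
                            seen.add("text:" + reader.getText());
                        }
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        seen.add("end:" + reader.getLocalName());
                        break;
                    default:
                        break; // other event types are ignored in this sketch
                }
            }
        } finally {
            reader.close();
        }
        return seen;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(describe(write()));
    }
}
```

The integer returned by `next()` identifies the event type, which is why the loop switches on the `XMLStreamConstants` values before calling accessors such as `getLocalName()` or `getText()`; those accessors are only valid for certain event types.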
See also
Competing and complementary ways to process XML in Java (the order is loosely based on initial date of introduction):
- Document Object Model (DOM), the first standardized, language/platform-independent tree-based XML processing model; alternate Java tree models include JDOM, Dom4j, and XOM
- Simple API for XML (SAX), the standard XML push API
- Java Architecture for XML Binding (JAXB), which works on top of another parser (usually a streaming parser) and binds the contained data to and from Java objects.
- Streaming XML
- XQuery API for Java
External links
- Java Implementations
- Sun Java StAX XML Processor Open source. Ships as part of Sun Java Standard Edition 6 runtime.
- Reference Implementation (for JSR-173, API specification, under the Apache Software License)
- Woodstox Open source StAX implementation (LGPL or Apache license)
- Aalto is an ultra-high-performance parser (Apache license)
- Utilities and add-ons
- StAX-Utils Provides a set of utility classes that make it easy for developers to integrate StAX into their existing XML processing applications.
- StAX-Utils includes classes to provide XML file indenting and formatting.
- StaxMate is a lightweight framework that builds on top of the StAX API and provides more convenient nested/filtered cursors for reading XML, nested outputters for writing XML (with optional indentation) and other tools (build DOM from StAX sources, write to StAX destinations) for interoperability.
- Parsers built on top of StAX
- Apache Axiom is a lightweight XML object model built on top of StAX that also provides lazy object building.
- Apache Pivot uses StAX for the serialization of user interface markup written in BXML.
- JavaFX 2.0 uses StAX for the serialization of user interface markup written in FXML.
- Non-standard Java StAX-like parsers
- XPP Parser based on the very similar but older XMLPull API.
- kXML A Java Micro Edition parser that uses the XMLPull API.
- Javolution provides a real-time StAX-like implementation which does not force object creation (e.g. String) and has a smaller effect on memory footprint and garbage collection. (Note: to reduce object creation, most StAX implementations maintain lookup tables to retrieve and reuse frequently used String objects.)
- Non-Java XML pull parsers
- Qt has an XML pull parser (QXmlStreamReader) and writer (QXmlStreamWriter)
- irrXML is a simple and fast open source XML parser for C++
- LlamaXML is a C++ XML pull parser and writer
- libxml2 is an XML parser and toolkit written in C (MIT License)
- The XmlReader class in Microsoft's .NET Framework is a pull-style XML parser.
- Articles and resources
- Introduction to StAX XML.com, Harold, Elliotte Rusty
- Java Streaming API for XML (Stax) - Tutorial
- JSR (#173)
- JSR specification document (PDF download)
- XMLPull Patterns Article on XML Pull (and StAX) design patterns by Aleksander Slominski.
- XMLPull.org
- StAX and SAX comparison.
- Using StAX with JAXB for efficiency
- StAX and Java, an example from DevX.com