This is the public web page for the Efficient Extensible Interchange (EXI) Working Group. Feb 11: The Efficient XML Interchange Working Group has published a W3C Recommendation of Efficient XML Interchange (EXI) Format (Second Edition).
|Published (Last):||27 February 2004|
W3C liability, trademark and document use rules apply. It is oriented towards quickly understanding how the EXI format can be used in practice and how options can be set to achieve specific needs. Furthermore, additional details about data type representation, compression, and their interaction with other format features are presented.
Efficient XML Interchange by Example provides a detailed, bit-level description of both a schema-less and a schema-informed example. This section describes the status of this document at the time of its publication. Other documents may supersede this document.
A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http: At the same time the encoding and decoding examples have been improved and harmonized to ease understanding. This is a draft document and may be updated, replaced or obsoleted by other documents at any time.
It is inappropriate to cite this document as other than work in progress. The group does not expect this document to become a W3C Recommendation. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
Efficient XML Interchange (EXI) Format
Please send comments about this document to the public-exi w3. Hereinafter, the presentation assumes that the reader is familiar with the basic concepts of XML and the way XML Schema can be used to describe and enforce constraints on XML documents.
The document is comprised of two major parts. The first part describes the structure and content of an EXI document with and without compression. More specifically, it describes the concept of an EXI stream and how it is generated using EXI grammars, as well as the implications on structure and content ordering in an EXI stream when compression is enabled.
As a practical application of the concepts from the first part, the second part presents a complete bit-level description of an EXI document. The development of the Efficient XML Interchange (EXI) format was guided by five design principles, namely, the format had to be general, minimal, efficient, flexible, and interoperable. The format satisfies these prerequisites, achieving generality, efficiency, and flexibility while at the same time keeping complexity in check.
Many of the concepts employed by the EXI format are applicable to the encoding of arbitrary languages that can be described by a grammar. Even though EXI utilizes schema information to improve compactness and processing efficiency, it does not depend on accurate, complete, or current schemas to work.
The EXI header conveys format version information and may also include the set of options that were used during encoding. If these options are omitted, it is assumed that the decoder has access to them out of band.
The EXI body comprises an event sequence describing the document or document fragment that is encoded. The header communicates encoding properties that are needed to decode the EXI body.
The minimal header can be represented in a single byte. This keeps the overhead and complexity to a minimum and does not sacrifice compactness, especially for small documents where a header can introduce a large constant factor.
The value of a single presence bit is used to indicate the presence or absence of the EXI Options that appear later in the header. In the version field, a leading 0 (zero) bit indicates that the document is encoded according to the final version of the recommendation, while a leading 1 (one) bit indicates that it is a preview version. The differentiation is introduced to facilitate early releases of preview versions with less strict interoperability requirements.
Only final versions are required to be processed by compliant processors. The leading bit is followed by one or more 4-bit sequences which are collectively interpreted as a format version number starting at 1. The EXI Options specify how the body of an EXI stream is encoded and, as stated earlier, their presence is controlled by the presence bit earlier in the header. When the EXI Options document does not specify a value for a particular option, the default value is assumed.
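As a rough illustration of the header layout described above, the following Python sketch assembles the header bits. It relies on details taken from the EXI specification rather than from this text: the two distinguishing bits 10 that precede the presence bit, and the rule that a 4-bit value of 15 means another 4-bit sequence follows, with the version number being one plus the sum of all 4-bit values. The function names are hypothetical, and the optional cookie and the EXI Options document itself are left out.

```python
def encode_version(version):
    """Encode a format version number (starting at 1) as 4-bit sequences.

    Assumption from the EXI specification: each sequence holds 0..15,
    the value 15 signals that another sequence follows, and the version
    is 1 plus the sum of all sequence values.
    """
    if version < 1:
        raise ValueError("version numbers start at 1")
    remaining = version - 1
    chunks = []
    while remaining >= 15:
        chunks.append(15)          # 1111: more 4-bit sequences follow
        remaining -= 15
    chunks.append(remaining)       # final sequence, value 0..14
    return "".join(format(chunk, "04b") for chunk in chunks)


def encode_header_bits(options_present=False, preview=False, version=1):
    """Return the header as a string of '0'/'1' characters (hypothetical helper)."""
    distinguishing_bits = "10"                      # assumed from the specification
    presence_bit = "1" if options_present else "0"  # are EXI Options in the header?
    preview_bit = "1" if preview else "0"           # 0 = final version, 1 = preview
    return distinguishing_bits + presence_bit + preview_bit + encode_version(version)


minimal_header = encode_header_bits()
print(minimal_header, hex(int(minimal_header, 2)))
```

With the defaults (no options, final version 1) the sketch yields eight bits, which matches the statement that the minimal header fits in a single byte.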
Most of the options are straightforward and act as boolean values to enable or disable a feature. The preserve options shown in the table above are really a family of options that control which XML items are preserved and which XML items are ignored. These are collectively known as fidelity options. These options can be used to eliminate the associated overhead of communicating unused XML items.
Fidelity options are used to manage filters for certain XML items as shown in the following table. Naturally, XML items that are discarded at encoding time due to a particular setting of the fidelity options cannot be reconstructed exactly at decoding time. The next section deals with the EXI Body and discusses in more detail the effects of enabling and disabling fidelity options.
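The filtering effect of fidelity options can be pictured with a small Python sketch. The event abbreviations (CM for comments, PI for processing instructions, DT and ER for DTD-related items, NS for namespace declarations) and the option names are assumptions modelled on the EXI specification; the function and the event tuples are hypothetical and only illustrate that items dropped at encoding time never reach the stream.

```python
# Which (assumed) fidelity option must be enabled to keep each event type.
# Event types not listed here are always kept.
FILTERED_BY = {
    "CM": "comments",
    "PI": "pis",
    "DT": "dtd",
    "ER": "dtd",
    "NS": "prefixes",
}


def apply_fidelity(events, preserve=frozenset()):
    """Drop events whose fidelity option is not in `preserve` (illustrative only)."""
    kept = []
    for event_type, payload in events:
        option = FILTERED_BY.get(event_type)
        if option is None or option in preserve:
            kept.append((event_type, payload))
    return kept


events = [("SE", "note"), ("CM", "draft"), ("CH", "hi"), ("PI", "x"), ("EE", None)]
print(apply_fidelity(events))                 # comment and PI are filtered out
print(apply_fidelity(events, {"comments"}))   # the comment is preserved
```

A decoder working from the first result has no way to restore the comment or the processing instruction, which is the loss of exactness the paragraph above refers to.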
EXI events may have additional content associated with them. For example, the attribute event AT “foo” may have an attribute value foo1 associated with it. The following table shows all the possible event types together with their associated information items distinguished by structure and content.
In EXI terminology, content denotes attribute and character values while all other information items are considered as belonging to the structure category.
For named XML items, such as elements and attributes, there are three types of events: SE(qname), SE(uri:*), and SE(*) for elements, and AT(qname), AT(uri:*), and AT(*) for attributes. These events differ in their associated structure: SE(qname) and AT(qname) do not encode the qname as part of the event, SE(uri:*) and AT(uri:*) encode only the local name, and SE(*) and AT(*) encode the full qname. The decision to use one type of event over the other will be explained later after introducing the notion of EXI grammars.
Additionally, Fidelity Options may allow the preservation of namespace prefixes. The fidelity options introduced in Section 2. Grammar pruning simplifies the encoding and decoding process and also improves compactness by filtering out unused event types.
The order in which attributes are encoded may be different in schema-less and schema-informed EXI streams, as is the exact content associated with each event. The actual number of bits used to represent each type of event, excluding its content, differs depending on the context. The more event types can occur in a certain context, the larger the number of bits required to represent an event in that context.
What constitutes a context in this case is more formally defined by an EXI grammar production in the next section.
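The relationship between the number of alternatives in a context and the number of bits can be made concrete with a short Python sketch. It assumes the rule from the EXI specification that a choice among m alternatives is written as an unsigned integer of ceil(log2(m)) bits; the function name is hypothetical.

```python
def bits_for_alternatives(count):
    """Bits needed to distinguish `count` alternatives: ceil(log2(count)).

    (count - 1).bit_length() computes this exactly with integer arithmetic,
    avoiding floating-point rounding issues.
    """
    if count < 1:
        raise ValueError("a context must offer at least one alternative")
    return (count - 1).bit_length()


for count in (1, 2, 5, 16, 17):
    print(count, "alternatives ->", bits_for_alternatives(count), "bits")
```

Note that a context with a single possible event needs zero bits, since there is nothing to distinguish.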
EXI is a knowledge-based encoding that uses a set of grammars to determine which events are most likely to occur at any given point in an EXI stream and encodes the most likely alternatives in fewer bits. It does this by mapping the stream of events to a lower entropy set of representative values and encoding those values using a set of simple variable length codes or an EXI compression algorithm. EXI grammars are regular grammars in which productions are associated with event codes. Since EXI grammars are regular grammars, the sequence of event codes written by an encoder corresponds to a path in the finite automaton that accepts the grammar.
In reality, given that XML is not a regular language, a single grammar cannot be used to represent an entire XML event stream. An event code is represented by a sequence of one to three parts, where each part is a non-negative integer. Event codes in an EXI grammar are assigned to productions in such a way that shorter event codes are used to represent productions that are more likely to occur.
Conversely, longer event codes are used to represent productions that are less likely to occur. EXI grammars are designed in a way that the average number of bits needed to represent each production is less than that for a grammar in which more likely and less likely productions are not distinguished. The following tables illustrate this principle via an example. In the first table, where productions are not separated according to their probability, a 4-bit code is needed to represent each entry.
In the second table, on the other hand, code lengths vary from 2 bits to 6 bits since productions are grouped based on their likelihood to occur. Assuming the content model for the element being encoded corresponds to the sequence AT category AT date i.
In particular, EXI grammars can take advantage of the fact that, on any given grammar, certain XML items are more popular than others. For example, by simple inspection of real-world documents, it is easy to verify that attributes occur more frequently than processing instructions and should therefore receive shorter event codes.
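To see how multi-part event codes favor likely productions, consider the following Python sketch. The grammar in it is hypothetical and is not a reproduction of the tables referred to above; it is merely chosen so that code lengths range from 2 to 6 bits, while a flat assignment over the same nine productions needs 4 bits each. The sketch assumes that each part of an event code is written with just enough bits to distinguish the values that part can take among codes sharing the same prefix.

```python
# Hypothetical grammar: production label -> event code (one to three parts).
GRAMMAR = {
    "AT(category)": (0,),
    "AT(date)": (1,),
    "EE": (2,),
    "AT(*)": (3, 0),
    "SE(*)": (3, 1),
    "CH": (3, 2),
    "CM": (3, 3, 0),
    "PI": (3, 3, 1),
    "ER": (3, 3, 2),
}


def part_width(grammar, prefix):
    """Bits for the part that follows `prefix`: ceil(log2(distinct values))."""
    values = {
        code[len(prefix)]
        for code in grammar.values()
        if len(code) > len(prefix) and code[:len(prefix)] == prefix
    }
    return (len(values) - 1).bit_length()


def code_bits(grammar, label):
    """Total bits for the event code of one production."""
    code = grammar[label]
    return sum(part_width(grammar, code[:i]) for i in range(len(code)))


flat_bits = (len(GRAMMAR) - 1).bit_length()   # same width for every production
likely = ["AT(category)", "AT(date)"]
print("flat:", flat_bits * len(likely), "bits")
print("grouped:", sum(code_bits(GRAMMAR, label) for label in likely), "bits")
```

In this made-up grammar the two likely attribute events cost 4 bits in total instead of 8, while the rare CM, PI, and ER events pay for that saving with 6-bit codes.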
Further improvements in grammar design are possible if schema information is available. In this case, we can not only take advantage of generic XML knowledge but also of knowledge that is specific to the type of documents being encoded. The following two sections describe the differences between the built-in grammars and the schema-informed grammars. Note that an EXI encoder may only have partial schema information in which case it will use a combination of built-in and schema-informed grammars during encoding.
There are built-in grammars to encode documents, fragments, and elements. Document grammars and fragment grammars describe the top-level structure, while element grammars describe the structure of every element. Fragment grammars are more lenient than document grammars; for example, they allow multiple top-level elements to be encoded as siblings. The EXI format describes a mechanism by which built-in grammars are dynamically extended using information from the actual instance being encoded.
Stated differently, the EXI format describes a learning mechanism to further improve efficiency when no schema information is available statically. Newly learned productions are assigned short event codes, improving compactness for every subsequent use of those productions. In addition, by adding new productions to the grammar, certain data associated with an event only needs to be encoded once. As pointed out in the previous section, EXI grammars are always regular and can, therefore, be accepted by finite automata (FA).
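The learning mechanism can be sketched in a few lines of Python. This is a deliberately simplified model, not the algorithm of the specification: it tracks only start-element events, uses single-part integer codes, and assumes that a newly learned production takes code 0 while previously learned ones shift up by one. The class and method names are hypothetical.

```python
class LearningElementGrammar:
    """Toy model of a built-in grammar that learns element names."""

    def __init__(self):
        self.learned = []  # learned qnames, most recently learned first

    def encode_start_element(self, qname):
        """Return (event type, event code, qname payload or None)."""
        if qname in self.learned:
            # Known production: only the short code is written,
            # the name itself is not repeated.
            return ("SE(qname)", self.learned.index(qname), None)
        # Unknown element: use the generic production, which carries the
        # name, then learn a dedicated production for later occurrences.
        generic_code = len(self.learned)  # simplified: first code after learned ones
        self.learned.insert(0, qname)
        return ("SE(*)", generic_code, qname)


grammar = LearningElementGrammar()
for name in ["subject", "subject", "body", "subject"]:
    print(grammar.encode_start_element(name))
```

The first occurrence of each name goes through the generic production and includes the name as payload; every later occurrence is reduced to an event code, which is the sense in which the associated data only needs to be encoded once.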
To provide a more operational view of an EXI processor, we will opt for the use of FA to explain how grammars work. The following figure shows a stack of grammars in which the top-level grammar accepts “note” elements. State transitions in black correspond to the built-in element grammar; state transitions in red have been learned as a result of encoding the element before. The built-in element automaton has two distinguished states: StartTagContent and ElementContent. The former accepts attribute and namespace events that must occur before any element content; the latter accepts only element content which excludes attribute and namespace events.
This separation enables the use of short codes which improves compactness and processing time. As stated earlier, transitions in red are extensions to the built-in element grammar based on knowledge acquired about the element note. In particular, this suggests that SE subject is expected to occur before SE body, and that both of these SE events are expected to occur after any AT event. EXI grammars can be further improved if schema information is known statically.
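The two-state view of the built-in element automaton can be approximated with the Python sketch below. It uses the state names StartTagContent and ElementContent from the EXI specification, but everything else is a simplifying assumption: events are bare type labels, child elements are represented by a single SE token instead of pushing a new grammar onto the stack, and learned transitions are ignored.

```python
START_TAG_CONTENT = "StartTagContent"
ELEMENT_CONTENT = "ElementContent"

START_TAG_ONLY = {"AT", "NS"}       # must precede any element content
CONTENT_EVENTS = {"SE", "CH", "EE"}  # element content (simplified set)


def accepts(event_types):
    """Check one element's event sequence against the two-state automaton."""
    state = START_TAG_CONTENT
    for event_type in event_types:
        if event_type in START_TAG_ONLY:
            if state != START_TAG_CONTENT:
                return False            # attribute or namespace after content
        elif event_type in CONTENT_EVENTS:
            state = ELEMENT_CONTENT     # from here on, only content is allowed
        else:
            return False                # event type unknown to this sketch
    return True


print(accepts(["AT", "AT", "SE", "SE", "EE"]))
print(accepts(["SE", "AT", "EE"]))
```

Because attribute and namespace events are impossible once the automaton has reached ElementContent, that state offers fewer alternatives, which is what allows the shorter codes mentioned above.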
Schema information can be interpreted in two different ways or encoding modes: strict and non-strict. In strict mode, the instances being encoded must be largely valid with respect to the schema; most deviations from the schema will result in an encoding error.
In non-strict mode, any deviations are accepted and encoded using more generic events. Examples of deviations are attributes whose actual values do not match the type defined in the schema or elements whose structure does not correspond to that in the schema.
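The difference between the two modes can be illustrated with a hypothetical attribute that the schema declares as an integer. The Python sketch below is only a model of the behavior described above: the function, the exception class, and the "typed"/"untyped" labels are assumptions, and Python's int() stands in for real schema datatype validation.

```python
class EncodingError(Exception):
    """Raised when strict mode meets a deviation from the schema."""


def encode_int_attribute(name, lexical_value, strict):
    """Encode an attribute whose schema type is assumed to be integer."""
    try:
        value = int(lexical_value)
    except ValueError:
        if strict:
            # Strict mode: the deviation is an encoding error.
            raise EncodingError(f"{name}={lexical_value!r} is not an integer")
        # Non-strict mode: fall back to a more generic, untyped event
        # that carries the value as a plain string.
        return ("AT", name, "untyped", lexical_value)
    return ("AT", name, "typed", value)


print(encode_int_attribute("count", "42", strict=True))
print(encode_int_attribute("count", "abc", strict=False))
```

The fallback keeps the document encodable at the cost of a less compact representation for the deviating value.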
Unlike the built-in grammars, which are dynamically extensible, schema-informed grammars are created statically based on the information in the available schema.