<?xml version="1.0" encoding="iso-8859-1" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Canonical XML</title>
<style type="text/css">
code { font-family: monospace }
</style>

<link href="http://www.w3.org/StyleSheets/TR/W3C-REC" type=
"text/css" rel="stylesheet" />
<meta http-equiv="Content-Type" content=
"text/html; charset=iso-8859-1" />
</head>
<body>
<p><a href="http://www.w3.org/"><img src=
"http://www.w3.org/Icons/w3c_home" alt="W3C" border="0"
height="48" width="72" /></a></p>

<div class="head">
<h1 class="notoc">Canonical XML<br />
Version 1.0</h1>

<h2 class="notoc">W3C Recommendation 15 March 2001</h2>

<dl>
<dt>This version:</dt>

<dd><a href="http://www.w3.org/TR/2001/REC-xml-c14n-20010315">
http://www.w3.org/TR/2001/REC-xml-c14n-20010315</a></dd>

<dt>Latest version:</dt>

<dd><a href="http://www.w3.org/TR/xml-c14n">
http://www.w3.org/TR/xml-c14n</a></dd>

<dt>Previous version:</dt>

<dd><a href="http://www.w3.org/TR/2001/PR-xml-c14n-20010119">
http://www.w3.org/TR/2001/PR-xml-c14n-20010119</a></dd>

<dt>Author/Editor:</dt>

<dd>John Boyer, PureEdge Solutions Inc., <a href=
"mailto:jboyer@PureEdge.com">jboyer@PureEdge.com</a></dd>
</dl>

<p class="copyright">
<a href="http://www.w3.org/Consortium/Legal/ipr-notice-20000612#Copyright">
Copyright</a> © 2001 <a href="http://www.w3.org/">
<abbr title="World Wide Web Consortium">W3C</abbr></a><sup>®</sup>
(<a href="http://www.lcs.mit.edu/"><abbr title="Massachusetts Institute of
Technology">MIT</abbr></a>, <a href="http://www.inria.fr/"><abbr
lang="fr" title="Institut National de Recherche en Informatique et
Automatique">INRIA</abbr></a>,
<a href="http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. W3C
<a href="http://www.w3.org/Consortium/Legal/ipr-notice-20000612#Legal_Disclaimer">
liability</a>,
<a href="http://www.w3.org/Consortium/Legal/ipr-notice-20000612#W3C_Trademarks">trademark</a>,
<a href="http://www.w3.org/Consortium/Legal/copyright-documents-19990405">document use</a> and <a href="http://www.w3.org/Consortium/Legal/copyright-software-19980720">software licensing</a> rules apply.</p>

<hr title="Separator from Header" />
</div>

<h2 class="notoc">Abstract</h2>

<p>Any XML document is part of a set of XML documents that are
logically equivalent within an application context, but which vary
in physical representation based on syntactic changes permitted by
XML 1.0 <a href="#XML">[XML]</a> and Namespaces in XML <a href=
"#namespaces">[Names]</a>. This specification describes a method
for generating a physical representation, the canonical form, of an
XML document that accounts for the permissible changes. Except for
limitations regarding a few unusual cases, if two documents have
the same canonical form, then the two documents are logically
equivalent within the given application context. Note that two
documents may have differing canonical forms yet still be
equivalent in a given context based on application-specific
equivalence rules for which no generalized XML specification could
account.</p>

<h2><a name="status">Status of this document</a> </h2>

<p><i>This section describes the status of this document at the time of its publication.
Other documents may supersede this document. The latest status of this document series is
maintained at the W3C. </i></p>

<p>This document has been reviewed by W3C Members and other interested parties and has
been endorsed by the Director as a <a
href="http://www.w3.org/Consortium/Process-20010208/tr.html#RecsW3C">W3C Recommendation</a>.
It is a stable document and may be used as reference material or cited as a normative
reference from another document. </p>

<p>This document has been produced by the <a
href="http://www.w3.org/Signature/Overview.html">IETF/W3C XML Signature Working Group</a>
(see also <a href="http://www.w3.org/Signature/Activity.html">W3C XML Signature Activity
Statement</a>). This version includes a few minor editorial improvements from the previous
version. The only substantive change is the addition of a reference to the corrigendum [<a
href="#NFC-Corrigendum">NFC-Corrigendum</a>] of <em>TR15, Unicode
Normalization Forms</em> [<a href="#ref-NFC">NFC</a>]. This corrigendum
corrects an error by which the character U+FB1D HEBREW LETTER YOD WITH HIRIQ was
omitted from the <a
href="http://www.unicode.org/Public/3.0-Update1/CompositionExclusions-2.txt">Composition
Exclusions</a> of <em>Unicode 3.0</em>. Canonical XML implementations must now (correctly)
exclude this character from character composition during [<a href="#ref-NFC">NFC</a>]
processing.</p>

<p>The Canonical XML specification was reviewed extensively during its development, as
provided by the W3C Process. The Working Group successfully resolved all issues raised
during <a href="http://www.w3.org/Signature/2000/09/06-c14n-last-call-issues.html">last
call and call for implementation</a> and documented the existence of interoperable
implementations in its <a
href="http://www.w3.org/Signature/2000/10/10-c14n-interop.html">interoperability report</a>.</p>

<p>Please report errors in this document to the editor and cc: the public email list <a
href="mailto:w3c-ietf-xmldsig@w3.org">w3c-ietf-xmldsig@w3.org</a>. Any such errors will be
documented in an errata available at <a href="http://www.w3.org/2001/03/C14N-errata">http://www.w3.org/2001/03/C14N-errata</a>.</p>

<p>A list of all current W3C Technical Reports can be found at <a
href="http://www.w3.org/TR/">http://www.w3.org/TR</a>. </p>

<h2><a id="contents" name="contents">Table of Contents</a></h2>
<ol>
<li><a href="#Intro">Introduction</a>
<ol>
<li><a href="#Terminology">Terminology</a></li>
<li><a href="#Applications">Applications</a></li>
<li><a href="#Limitations">Limitations</a></li>
</ol>
</li>
<li><a href="#XMLCanonicalization">XML Canonicalization</a>
<ol>
<li><a href="#DataModel">Data Model</a></li>
<li><a href="#DocumentOrder">Document Order</a></li>
<li><a href="#ProcessingModel">Processing Model</a></li>
<li><a href="#DocSubsets">Document Subsets</a></li>
</ol>
</li>
<li><a href="#Examples">Examples of XML Canonicalization</a>
<ol>
<li><a href="#Example-OutsideDoc">PIs, Comments, and Outside of Document
Element</a></li>
<li><a href="#Example-WhitespaceInContent">Whitespace in Document
Content</a></li>
<li><a href="#Example-SETags">Start and End Tags</a></li>
<li><a href="#Example-Chars">Character Modifications and Character
References</a></li>
<li><a href="#Example-Entities">Entity References</a></li>
<li><a href="#Example-UTF8">UTF-8 Encoding</a></li>
<li><a href="#Example-DocSubsets">Document Subsets</a></li>
</ol>
</li>
<li><a href="#Resolutions">Resolutions</a>
<ol>
<li><a href="#NoXMLDecl">No XML Declaration</a></li>
<li><a href="#NoCharModelNorm">No Character Model Normalization</a></li>
<li><a href="#WhitespaceRoot">Handling of Whitespace Outside Document Element</a></li>
<li><a href="#NoNSPrefixRewriting">No Namespace Prefix Rewriting</a></li>
<li><a href="#NSAttrOrder">Order of Namespace Declarations and Attributes</a></li>
<li><a href="#SuperfluousNSDecl">Superfluous Namespace Declarations</a></li>
<li><a href="#PropagateDefaultNSDecl">Propagation of Default Namespace Declaration in Document Subsets</a></li>
<li><a href="#SortByNSURI">Sorting Attributes by Namespace URI</a></li>
</ol>
</li>
<li><a href="#bibliography">References</a></li>
<li><a href="#acks">Acknowledgements</a></li>
</ol>
<hr />
<!-- =============================================================================== -->

<h2><a id="Intro" name="Intro"></a>1 Introduction</h2>

<p>The XML 1.0 Recommendation <a href="#XML">[XML]</a> specifies the syntax of
a class of resources called XML documents. The Namespaces in XML Recommendation
<a href="#namespaces">[Names]</a> specifies additional syntax and semantics
for XML documents. It is possible for XML documents which are equivalent for
the purposes of many applications to differ in physical representation. For
example, they may differ in their entity structure, attribute ordering, and
character encoding. It is the goal of this specification to establish a method
for determining whether two documents are identical, or whether an application
has not changed a document, except for transformations permitted by XML 1.0
and Namespaces in XML.</p>

<h3><a id="Terminology" name="Terminology">1.1 Terminology</a></h3>

<p>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document
are to be interpreted as described in RFC 2119 <a
href="#Keywords">[Keywords]</a>.</p>

<p>See <a href="#namespaces">[Names]</a> for the definition of <a
href="http://www.w3.org/TR/REC-xml-names/#NT-QName">QName</a>.</p>

<p>A <i>document subset</i> is a portion of an XML document indicated by a
node-set that may not include all of the nodes in the document.</p>

<p>The <i>canonical form</i> of an XML document is the physical representation of
the document produced by the method described in this specification. The
changes are summarized in the following list:</p>
<ul>
<li>The document is encoded in <a href="#UTF-8">UTF-8</a></li>
<li>Line breaks are normalized to #xA on input, before parsing</li>
<li>Attribute values are normalized, as if by a validating processor</li>
<li>Character and parsed entity references are replaced</li>
<li>CDATA sections are replaced with their character content</li>
<li>The XML declaration and document type declaration (DTD) are removed</li>
<li>Empty elements are converted to start-end tag pairs</li>
<li>Whitespace outside of the document element and within start and end tags
is normalized</li>
<li>All whitespace in character content is retained (excluding characters
removed during line feed normalization)</li>
<li>Attribute value delimiters are set to quotation marks (double quotes)</li>
<li>Special characters in attribute values and character content are
replaced by character references</li>
<li>Superfluous namespace declarations are removed from each element</li>
<li>Default attributes are added to each element</li>
<li>Lexicographic order is imposed on the namespace declarations and
attributes of each element</li>
</ul>

<p>The term <i>canonical XML</i> refers to XML that is in canonical form. The
<i>XML canonicalization method</i> is the algorithm defined by this
specification that generates the canonical form of a given XML document or
document subset. The term <i>XML canonicalization</i> refers to the process of
applying the XML canonicalization method to an XML document or document
subset.</p>

<p>The XPath 1.0 Recommendation <a href="#XPath">[XPath]</a> defines the term
<i>node-set</i> and specifies a data model for representing an input XML
document as a set of nodes of various types (element, attribute, namespace,
text, comment, processing instruction, and root). The nodes are included in or
excluded from a node-set based on the evaluation of an expression. Within this
specification, a node-set is used to directly indicate whether or not each
node should be rendered in the canonical form (in this sense, it is used as a
formal mathematical set). A node that is excluded from the set is not rendered
in the canonical form being generated, even if its parent node is included in
the node-set. However, an omitted node may still impact the rendering of its
descendants (e.g. by augmenting the namespace context of the descendants).</p>

<h3><a id="Applications" name="Applications">1.2 Applications</a></h3>

<p>Since the XML 1.0 Recommendation <a href="#XML">[XML]</a> and the
Namespaces in XML Recommendation <a href="#namespaces"> [Names]</a> define
multiple syntactic methods for expressing the same information, XML
applications tend to take liberties with changes that have no impact on the
information content of the document. XML canonicalization is designed to be
useful to applications that require the ability to test whether the
information content of a document or document subset has been changed. This is
done by comparing the canonical form of the original document before
application processing with the canonical form of the document result of the
application processing.</p>

<p>For example, a digital signature over the canonical form of an XML document
or document subset would allow the signature digest calculations to be
oblivious to changes in the original document's physical representation,
provided that the changes are defined to be logically equivalent by XML
1.0 or Namespaces in XML. During signature generation, the digest is computed
over the canonical form of the document. The document is then transferred to
the relying party, which validates the signature by reading the document and
computing a digest of the canonical form of the received document. The
equivalence of the digests computed by the signing and relying parties (and
hence the equivalence of the canonical forms over which they were computed)
ensures that the information content of the document has not been altered since
it was signed.</p>
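<p>The digest comparison described above can be sketched as follows. This is
an illustration only, under two assumptions that are not part of this
specification: it uses Python's standard
<code>xml.etree.ElementTree.canonicalize</code>, which implements Canonical
XML 2.0 rather than the version 1.0 method defined here (for the simple input
shown, the differences are not expected to matter), and it uses SHA-256 as an
arbitrary digest. The helper name <code>canonical_digest</code> is
hypothetical.</p>

```python
import hashlib
import xml.etree.ElementTree as ET


def canonical_digest(xml_text):
    """Digest the canonical form of a document (hypothetical helper)."""
    # Assumption: ET.canonicalize implements C14N 2.0, not C14N 1.0.
    canonical = ET.canonicalize(xml_text)
    # The canonical form is an octet stream encoded in UTF-8.
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same information content, different physical representation:
# attribute order, quote style, and empty-element syntax differ.
signed = "<doc b='2' a=\"1\"/>"
received = '<doc a="1" b="2"></doc>'

# Different information content: an attribute value was changed.
altered = '<doc a="1" b="3"></doc>'

digests_match = canonical_digest(signed) == canonical_digest(received)
tampering_detected = canonical_digest(signed) != canonical_digest(altered)
```

<p>In this sketch the signing party would publish
<code>canonical_digest(signed)</code> and the relying party would recompute it
over the received document.</p>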

<h3><a id="Limitations" name="Limitations">1.3 Limitations</a></h3>

<p>Two XML documents may have differing information content that is
nonetheless logically equivalent within a given application context. Although
two XML documents are equivalent (aside from limitations given in this section)
if their canonical forms are identical, it is not a goal of this work to establish
a method such that two XML documents are equivalent if <i>and only if</i> their
canonical forms are identical. Such a method is unachievable, in part due to
application-specific rules such as those governing unimportant whitespace and
equivalent data (e.g. <code>&lt;color&gt;black&lt;/color&gt;</code> versus
<code>&lt;color&gt;rgb(0,0,0)&lt;/color&gt;</code>). There are also equivalencies
established by other W3C Recommendations and Working Drafts. Accounting for
these additional equivalence rules is beyond the scope of this work. They can
be applied by the application or become the subject of future
specifications.</p>

<p>The canonical form of an XML document may not be completely operational
within the application context, though the circumstances under which this
occurs are unusual. This problem may be of concern in certain applications
since the canonical form of a document and the canonical form of the
canonical form of the document are equivalent. For example, in a digital
signature application, it cannot be established whether the operational
original document or the non-operational canonical form was signed
because the canonical form can be substituted for the original document
without changing the digest calculation. However, the security risk only
occurs in the unusual circumstances described below, which can all be
resolved or at least detected prior to digital signature generation.</p>

<p>The difficulties arise due to the loss of the following information not
available in the <a href="#DataModel">data model</a>:</p>
<ol>
<li>base URI, especially in content derived from the replacement text of
external general parsed entity references</li>
<li>notations and external unparsed entity references</li>
<li>attribute types in the document type declaration</li>
</ol>

<p>In the first case, note that a document containing a relative URI <a
href="#URI">[URI]</a> is only operational when accessed from a specific URI
that provides the proper base URI. In addition, if the document contains
external general parsed entity references to content containing relative URIs,
then the relative URIs will not be operational in the canonical form, which
replaces the entity reference with internal content (thereby implicitly
changing the default base URI of that content). Both of these problems can
typically be solved by adding support for the <code>xml:base</code> attribute
<a href="#XBase">[XBase]</a> to the application, then adding appropriate
<code>xml:base</code> attributes to the document element and all top-level
elements in external entities. In addition, applications often have an
opportunity to resolve relative URIs prior to the need for a canonical form.
For example, in a digital signature application, a document is often retrieved
and processed prior to signature generation. The processing SHOULD create a
new document in which relative URIs have been converted to absolute URIs,
thereby mitigating any security risk for the new document.</p>
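<p>One possible shape for such a preprocessing step is sketched below. Which
attributes carry URIs is application-specific, so the attribute names
<code>href</code> and <code>src</code>, the helper name
<code>absolutize</code>, and the example base URI are all assumptions made for
illustration.</p>

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin


def absolutize(xml_text, base_uri, uri_attributes=("href", "src")):
    """Return a new document whose URI-valued attributes are absolute.

    Hypothetical helper: the set of URI-valued attributes is an
    application-level assumption, not something this specification defines.
    """
    root = ET.fromstring(xml_text)
    for element in root.iter():
        for name in uri_attributes:
            value = element.get(name)
            if value is not None:
                # urljoin leaves already-absolute URIs unchanged.
                element.set(name, urljoin(base_uri, value))
    return ET.tostring(root, encoding="unicode")


original = '<doc><link href="../a.xml"/><link href="http://example.com/b"/></doc>'
resolved = absolutize(original, "http://example.org/docs/dir/index.xml")
hrefs = [link.get("href") for link in ET.fromstring(resolved).iter("link")]
```

<p>The resolved document, rather than the original, would then be
canonicalized and signed.</p>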

<p>In the second case, the loss of external unparsed entity references and the
notations that bind them to applications means that canonical forms cannot
properly distinguish among XML documents that incorporate unparsed data via
this mechanism. This is an unusual case precisely because most XML processors
currently discard the document type declaration, which discards the notation,
the entity's binding to a URI, and the attribute type that binds the attribute
value to an entity name. For documents that must be subjected to more than one
XML processor, the XML design typically indicates a reference to unparsed data
using a URI in the attribute value.</p>

<p>In the third case, the loss of attribute types can affect the canonical
form in different ways depending on the type. Attributes of type ID cease to
be ID attributes. Hence, any XPath expressions that refer to the canonical
form using the <code>id()</code> function cease to operate. The attribute
types ENTITY and ENTITIES are not part of this case; they are covered in the
second case above. Attributes of enumerated type and of type ID, IDREF,
IDREFS, NMTOKEN, NMTOKENS, and NOTATION fail to be appropriately constrained
during future attempts to change the attribute value if the canonical form
replaces the original document during application processing. Applications can
avoid the difficulties of this case by ensuring that an appropriate document
type declaration is prepended prior to using the canonical form in further XML
processing. This is likely to be an easy task since attribute lists are
usually acquired from a standard external DTD subset, and any entity and
notation declarations not also in the external DTD subset are typically
constructed from application configuration information and added to the
internal DTD subset.</p>

<p>While these limitations are not severe, it would be possible to resolve them
in a future version of XML canonicalization if, for example, a new version of
XPath were created based on the XML Information Set <a href="#Infoset">[Infoset]</a>
currently under development at the W3C.</p>
<!-- =============================================================================== -->

<h2><a id="XMLCanonicalization" name="XMLCanonicalization">2 XML
Canonicalization</a></h2>

<h3><a id="DataModel" name="DataModel"></a>2.1 Data Model</h3>

<p>The data model defined in the XPath 1.0 Recommendation <a
href="#XPath">[XPath]</a> is used to represent the input XML document or
document subset. Implementations SHOULD, but need not, be based on an XPath
implementation. XML canonicalization is defined in terms of the XPath
definition of a node-set, and implementations MUST produce equivalent
results.</p>

<p>The first parameter of input to the XML canonicalization method is either
an XPath node-set or an octet stream containing a well-formed XML document.
Implementations MUST support the octet stream input and SHOULD also support
the document subset feature via node-set input. For the purpose of describing
canonicalization in terms of an XPath node-set, this section describes how an
octet stream is converted to an XPath node-set.</p>

<p><a id="WithComments" name="WithComments">The second parameter of input to
the XML canonicalization method is a boolean flag indicating whether or not
comments should be included in the canonical form output by the XML
canonicalization method.</a> If a canonical form contains comments
corresponding to the comment nodes in the input node-set, the result is called
<i>canonical XML with comments</i>. Note that the XPath data model does not
create comment nodes for comments appearing within the document type declaration
(DTD). Implementations are REQUIRED to be capable of producing canonical XML
excluding all comments that may have appeared in the input document or document
subset. Support for canonical XML with comments is RECOMMENDED.</p>

<p>If an XML document must be converted to a node-set, XPath REQUIRES that an
XML processor be used to create the nodes of its data model to fully represent
the document. The XML processor performs the following tasks in order:</p>
<ol>
<li>normalize line feeds</li>
<li>normalize attribute values</li>
<li>replace CDATA sections with their character content</li>
<li>resolve character and parsed entity references</li>
</ol>

<p>The input octet stream MUST contain a well-formed XML document, but the
input need not be validated. However, the attribute value normalization and
entity reference resolution MUST be performed in accordance with the behaviors
of a validating XML processor. As well, nodes for default attributes (declared
in the ATTLIST with an <a
href="http://www.w3.org/TR/REC-xml#NT-AttValue">AttValue</a> but not
specified) are created in each element. Thus, the declarations in the document
type declaration are used to help create the canonical form, even though the
document type declaration is not retained in the canonical form.</p>
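<p>The effect of these parser tasks can be observed with an ordinary XML
processor. The sketch below uses Python's standard
<code>xml.etree.ElementTree</code>, which is a non-validating processor and is
used here only as an illustration; it is an assumption that its behavior on
this small input matches what a validating processor would produce. The
element and attribute names are invented for the example.</p>

```python
import xml.etree.ElementTree as ET

source = (
    # Internal DTD subset declaring a default attribute value.
    '<!DOCTYPE doc [<!ATTLIST doc kind CDATA "plain">]>'
    '<doc attr="a\tb">'   # literal tab inside an attribute value
    "&#x41;"              # character reference for "A"
    "<![CDATA[<&>]]>"     # CDATA section
    "\r\nend"             # CR LF line break
    "</doc>"
)

root = ET.fromstring(source)

# Character content after line feed normalization, CDATA replacement,
# and character reference resolution.
content = root.text

# Attribute value after normalization (the tab becomes a space).
normalized_attr = root.get("attr")

# Default attribute supplied from the ATTLIST declaration.
default_attr = root.get("kind")
```

<p>The values seen by the application already reflect the four tasks listed
above, which is why the canonical form can be generated from the data model
without reference to the original octets.</p>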

<p>The XPath data model represents data using UCS characters. Implementations
MUST use XML processors that support <a href="#UTF-8">UTF-8</a> and <a
href="#UTF-16">UTF-16</a> and translate to the UCS character domain. For
UTF-16, the leading byte order mark is treated as an artifact of encoding and
stripped from the UCS character data (subsequent zero width non-breaking
spaces appearing within the UTF-16 data are not removed) <a
href="#UTF-16">[UTF-16, Section 3.2]</a>. Support for <a
href="#ISO-8859-1">ISO-8859-1</a> encoding is RECOMMENDED, and all other
character encodings are OPTIONAL.</p>
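<p>The treatment of the byte order mark can be illustrated with Python's
standard codecs, shown here on a bare character string rather than a full XML
document. The variable names are illustrative only.</p>

```python
import codecs

# UTF-16 input: a leading byte order mark followed by character data that
# itself contains U+FEFF (zero width non-breaking space) in the middle.
payload = "a\ufeffb"
octets = codecs.BOM_UTF16_LE + payload.encode("utf-16-le")

# The "utf-16" codec consumes the leading byte order mark as an encoding
# artifact and keeps the later U+FEFF as character data.
characters = octets.decode("utf-16")

# Re-encoding for output: the "utf-8" codec does not add a byte order mark.
output_octets = characters.encode("utf-8")
```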

<p>All whitespace within the root document element MUST be preserved (except
for any #xD characters deleted by line delimiter normalization). This includes
all whitespace in external entities. Whitespace outside of the root document
element MUST be discarded.</p>

<p>In the XPath data model, there exist the following node types: root,
element, comment, processing instruction, text, attribute and namespace. There
exists a single root node whose children are processing instruction nodes and
comment nodes to represent information outside of the document element (and
outside of the document type declaration). The root node also has a single
element node representing the top-level document element. Each element node
can have child nodes of type element, text, processing instruction, and
comment. The attributes and namespaces associated with an element are not
considered to be child nodes of the element, but they are associated with the
element by inclusion in the element's attribute and namespace axes. Note that
attribute and namespace axes may not directly correspond to the text appearing
in the element's start tag in the original document.</p>

<p><b>Note:</b> An element has attribute nodes to represent the non-namespace
attribute declarations appearing in its start tag <i> as well as</i> nodes to
represent the default attributes.</p>

<p>By virtue of the XPath data model, XML canonicalization is namespace-aware
<a href="#namespaces">[Names]</a>. However, it cannot and therefore does not
account for namespace equivalencies using namespace prefix rewriting (see <a
href="#NoNSPrefixRewriting">explanation in Section 4</a>). In the XPath data
model, each element and attribute has a name returned by the function
<code>name()</code> which can, at the discretion of the application, be the
QName appearing in the original document. XML canonicalization REQUIRES that
the XML processor retain sufficient information such that the QName of the
element as it appeared in the original document can be provided.</p>

<p><b>Note:</b> An element <b><i>E</i></b> has namespace nodes that represent
its namespace declarations <i>as well as</i> any namespace declarations made
by its ancestors that have not been overridden in <b><i>E</i></b>'s
declarations, the default namespace if it is non-empty, and the declaration of
the prefix <code>xml</code>.</p>

<p><b>Note:</b> This specification supports the recent
<a href="#PlenaryDecision">XML plenary decision</a> to deprecate relative
namespace URIs as follows: implementations of XML canonicalization MUST
report an operation failure on documents containing relative namespace URIs.
XML canonicalization MUST NOT be implemented with an XML parser that converts
relative URIs to absolute URIs.</p>
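<p>A check for relative namespace URIs might look like the following sketch.
It assumes that a namespace URI is relative when it has no scheme component,
and it treats an empty URI (as in <code>xmlns=""</code>) as an undeclaration
rather than a relative URI. The function name and the use of
<code>ValueError</code> to report the operation failure are choices made for
this example.</p>

```python
import io
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit


def reject_relative_namespace_uris(xml_bytes):
    """Raise ValueError if any namespace declaration uses a relative URI.

    Hypothetical helper; "relative" is taken to mean "has no scheme".
    """
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start-ns",))
    for _event, (prefix, uri) in events:
        # An empty URI undeclares the default namespace and is not checked.
        if uri and not urlsplit(uri).scheme:
            raise ValueError(
                "relative namespace URI %r declared for prefix %r" % (uri, prefix)
            )


# Absolute namespace URIs are accepted.
reject_relative_namespace_uris(b'<doc xmlns:a="http://example.org/ns"/>')

# A relative namespace URI causes an operation failure.
try:
    reject_relative_namespace_uris(b'<doc xmlns:a="../ns"/>')
    failure_reported = False
except ValueError:
    failure_reported = True
```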

<p>Character content is represented in the XPath data model with text nodes.
All consecutive characters are placed into a single text node. Furthermore,
the text node's characters are represented in the UCS character domain. The
XML canonicalization method does not perform character model normalization
(see <a href="#NoCharModelNorm">explanation in Section 4</a>). However, the XML
processor used to prepare the XPath data model input is REQUIRED to use
Unicode Normalization Form C [<a href="#ref-NFC">NFC</a>,
<a href="#NFC-Corrigendum">NFC-Corrigendum</a>] when converting an XML document
to the UCS character domain from any encoding that is not UCS-based (currently,
UCS-based encodings include UTF-8, UTF-16, UTF-16BE, UTF-16LE, UCS-2, and
UCS-4).</p>
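<p>The distinction between UCS-based and other encodings can be sketched as
follows, using Python's standard <code>unicodedata</code> module. The helper
name <code>to_ucs</code> and the set <code>UCS_BASED</code> are illustrative,
and the Unicode version applied is whatever the Python runtime provides, which
is an assumption beyond the text above.</p>

```python
import unicodedata

# Encoding names treated as UCS-based, following the list in the text.
UCS_BASED = {"utf-8", "utf-16", "utf-16be", "utf-16le", "ucs-2", "ucs-4"}


def to_ucs(octets, encoding):
    """Decode octets to UCS characters (hypothetical helper).

    Normalization Form C is applied only when the source encoding is not
    UCS-based; UCS-based input is passed through unchanged.
    """
    text = octets.decode(encoding)
    if encoding.lower() not in UCS_BASED:
        text = unicodedata.normalize("NFC", text)
    return text


# ISO-8859-1 input is converted and normalized.
from_latin1 = to_ucs(b"caf\xe9", "iso-8859-1")

# UTF-8 input containing a decomposed sequence is left as it is.
decomposed = "e\u0301"
from_utf8 = to_ucs(decomposed.encode("utf-8"), "utf-8")

# What NFC itself does to that decomposed sequence.
composed = unicodedata.normalize("NFC", decomposed)

# U+05D9 U+05B4 must not be composed into U+FB1D, which is a
# composition exclusion.
hebrew = unicodedata.normalize("NFC", "\u05d9\u05b4")
```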

<p>Since XML canonicalization converts an XPath node-set into a canonical
form, the first parameter MUST either be an XPath node-set or it must be
converted from an octet stream to a node-set by performing the XML processing
necessary to create the XPath nodes described above, then setting an initial
XPath evaluation context of:</p>
<ul>
<li>A <b>context node</b>, initialized to the root node of the input XML
document.</li>
<li>A <b>context position</b>, initialized to 1.</li>
<li>A <b>context size</b>, initialized to 1.</li>
<li>Any <b>library of functions</b> conforming to the XPath
Recommendation.</li>
<li>An empty set of <b>variable bindings</b>.</li>
<li>An empty set of <b>namespace declarations</b>.</li>
</ul>

<p>and evaluating the following default expression:</p>

<table cellpadding="5" border="1" bgcolor="#80ffff" width="100%">
<tbody>
<tr align="left">
<td><strong>Comment Parameter Value</strong></td>
<td><strong><a name="DefaultExpression" id="DefaultExpression">Default
XPath Expression</a></strong></td>
</tr>
<tr>
<td>Without (false)</td>
<td><code>(//. | //@* |
//namespace::*)[not(self::comment())]</code></td>
</tr>
<tr>
<td>With (true)</td>
<td><code>(//. | //@* | //namespace::*)</code></td>
</tr>
</tbody>
</table>

<p>The expressions in this table generate a node-set containing every node of
the XML document (except the comments if the comment parameter value is
false).</p>
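<p>The effect of the two default expressions can be approximated without an
XPath engine by walking a parsed tree. The sketch below uses Python's standard
<code>xml.dom.minidom</code>, which is a DOM and not an XPath data model:
namespace nodes are not represented (namespace declarations are simply
skipped), so this is only a rough illustration of the comment parameter. The
function name <code>default_node_set</code> is hypothetical.</p>

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def default_node_set(xml_text, with_comments=False):
    """List (nodeType, nodeName) pairs for the nodes that would be selected.

    Approximation only: XPath namespace nodes are not modelled.
    """
    selected = []

    def visit(node):
        # The document type declaration has no nodes in the XPath data model.
        if node.nodeType == Node.DOCUMENT_TYPE_NODE:
            return
        # Mirrors the [not(self::comment())] predicate.
        if node.nodeType == Node.COMMENT_NODE and not with_comments:
            return
        selected.append((node.nodeType, node.nodeName))
        if node.nodeType == Node.ELEMENT_NODE:
            attributes = node.attributes
            for index in range(attributes.length):
                attribute = attributes.item(index)
                # xmlns declarations are namespace nodes in XPath, not attributes.
                if attribute.name == "xmlns" or attribute.name.startswith("xmlns:"):
                    continue
                selected.append((attribute.nodeType, attribute.name))
        for child in node.childNodes:
            visit(child)

    visit(parseString(xml_text))
    return selected


source = '<!-- before --><doc a="1"><!-- inside -->text</doc>'
without_comments = default_node_set(source, with_comments=False)
with_comments = default_node_set(source, with_comments=True)
```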

<p>If the input is an XPath node-set, then the node-set must explicitly
contain every node to be rendered to the canonical form. For example, the
result of the XPath expression <code> id("E")</code> is a node-set containing
only the node corresponding to the element with an ID attribute value of "E".
Since none of its descendant nodes, attribute nodes and namespace nodes are in
the set, the canonical form would consist solely of the element's start and
end tags, less the attribute and namespace declarations, with no internal
content. <a href="#Example-DocSubsets">Section 3.7</a> exemplifies how to
serialize an identified element along with its internal content, attributes
and namespace declarations.</p>
<!-- =============================================================================== -->

<h3><a id="DocumentOrder" name="DocumentOrder"></a>2.2 Document Order</h3>

<p>Although an XPath node-set is defined to be unordered, the XPath 1.0
Recommendation <a href="#XPath">[XPath]</a> defines the term <i>document
order</i> to be the order in which the first character of the XML
representation of each node occurs in the XML representation of the document
after expansion of general entities, except for namespace and attribute nodes
whose document order is application-dependent.</p>

<p>The XML canonicalization method processes a node-set by imposing the
following additional document order rules on the namespace and attribute nodes
of each element:</p>
<ul>
<li>An element's namespace and attribute nodes have a document order
position greater than the element but less than any child node of the
element.</li>
<li>Namespace nodes have a lesser document order position than attribute
nodes.</li>
<li>An element's namespace nodes are sorted lexicographically by local name
(the default namespace node, if one exists, has no local name and is
therefore lexicographically least).</li>
<li>An element's attribute nodes are sorted lexicographically with namespace
URI as the primary key and local name as the secondary key (an empty
namespace URI is lexicographically least).</li>
</ul>

<p>Lexicographic comparison, which orders strings from least to greatest
alphabetically, is based on the UCS codepoint values, which is
equivalent to lexicographic ordering based on UTF-8.</p>
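<p>The ordering rules above can be sketched with plain tuples. Python compares
strings by code point, which matches the comparison described here. The tuple
layouts, function names, and sample URIs are assumptions made for the
example.</p>

```python
def sort_namespace_nodes(namespaces):
    """Sort (prefix, uri) pairs by prefix; the default namespace uses ''."""
    # The empty string sorts before every non-empty prefix.
    return sorted(namespaces, key=lambda ns: ns[0])


def sort_attribute_nodes(attributes):
    """Sort (namespace_uri, local_name, value) triples.

    Namespace URI is the primary key and local name the secondary key;
    '' stands for "no namespace" and therefore sorts first.
    """
    return sorted(attributes, key=lambda attr: (attr[0], attr[1]))


namespaces = sort_namespace_nodes(
    [("b", "http://b.example"), ("", "http://default.example"), ("a", "http://z.example")]
)

attributes = sort_attribute_nodes(
    [
        ("http://www.w3.org", "attr", "x"),
        ("", "attr2", "y"),
        ("http://a.example", "zed", "z"),
        ("", "attr", "w"),
    ]
)

# Code point order and UTF-8 octet order agree, including for characters
# outside the Basic Multilingual Plane.
names = ["\u00e9", "z", "\U00010000", "\uffff"]
by_code_point = sorted(names)
by_utf8 = sorted(names, key=lambda name: name.encode("utf-8"))
by_utf16 = sorted(names, key=lambda name: name.encode("utf-16-be"))
```

<p>Note that the same equivalence does not hold for UTF-16 code units, which
is why an implementation that stores strings as UTF-16 cannot rely on a raw
code unit comparison.</p>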
|
|
<!-- =============================================================================== -->
|
|
|
|
<h3><a id="ProcessingModel" name="ProcessingModel"></a>2.3 Processing
|
|
Model</h3>
|
|
|
|
<p>The XPath node-set is converted into an octet stream, the canonical form,
by generating the representative UCS characters for each node in the node-set
in ascending <a href="#DocumentOrder">document order</a>, then encoding the
result in UTF-8 (without a leading byte order mark). No node is processed more
than once. Note that processing an element node <b><i>E</i></b> includes the
processing of all members of the node-set for which <b><i>E</i></b> is an
ancestor. Therefore, directly after the representative text for
<b><i>E</i></b> is generated, <b><i>E</i></b> and all nodes for which
<b><i>E</i></b> is an ancestor are removed from the node-set (or some
logically equivalent operation occurs such that the node-set's next node in
document order has not been processed). Note, however, that an element node is
not removed from the node-set until after its children are processed.</p>
<p>The result of processing a node depends on its type and on whether or not
|
|
it is in the node-set. If a node is not in the node-set, then no text is
|
|
generated for the node except for the result of processing its namespace and
|
|
attribute axes (elements only) and its children (elements and the root node).
|
|
If the node is in the node-set, then text is generated to represent the node
|
|
in the canonical form in addition to the text generated by processing the
|
|
node's namespace and attribute axes and child nodes.</p>
|
|
|
|
<p><b>NOTE:</b> The node-set is treated as a set of nodes, not a list of
|
|
subtrees. To canonicalize an element including its namespaces, attributes, and
|
|
content, the node-set must actually contain all of the nodes corresponding to
|
|
these parts of the document, not just the element node.</p>
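<p>The distinction between a set of nodes and a list of subtrees can be
illustrated with a deliberately reduced model. The sketch below is an
assumption-laden simplification: it handles only element and text nodes, omits
namespace and attribute axes and character escaping, and the classes
<code>Element</code> and <code>Text</code> and the function
<code>render</code> are hypothetical names introduced for this example.</p>

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity hashing, so nodes can be set members
class Text:
    value: str


@dataclass(eq=False)
class Element:
    name: str
    children: list = field(default_factory=list)


def render(node, node_set):
    """Generate text for a node according to its membership in node_set."""
    if isinstance(node, Text):
        # Escaping of special characters is omitted in this sketch.
        return node.value if node in node_set else ""
    # Children are always visited, whether or not the element itself is included.
    inner = "".join(render(child, node_set) for child in node.children)
    if node in node_set:
        return f"<{node.name}>{inner}</{node.name}>"
    return inner


text_a, text_b = Text("A"), Text("B")
e2 = Element("e2", [text_b])
e1 = Element("e1", [text_a, e2])
doc = Element("doc", [e1])

# A node-set holding only e1 yields an empty element, not the e1 subtree.
only_e1 = render(doc, {e1})
# The subtree appears only when every node in it is a member of the node-set.
whole_e1 = render(doc, {e1, text_a, e2, text_b})
```

<p>In this sketch <code>only_e1</code> is <code>&lt;e1&gt;&lt;/e1&gt;</code>,
whereas <code>whole_e1</code> contains the text and the nested element.</p>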
|
|
|
|
<p>The text generated for a node is dependent on the node type and given in
|
|
the following list:</p>
|
|
<ul>
|
|
<li><b>Root Node-</b> The root node is the parent of the top-level
document element. The result is obtained by processing each of its child
nodes that is in the node-set, in document order. The root node does not
generate a byte order mark, an XML declaration, or anything from within the
document type declaration.</li>
|
|
<li><b>Element Nodes-</b> If the element is not in the node-set, then the
|
|
result is obtained by processing the namespace axis, then the attribute
|
|
axis, then processing the child nodes of the element that are in the
|
|
node-set (in document order). If the element is in the node-set, then the
|
|
result is an open angle bracket (<), the element QName, the result of
|
|
processing the namespace axis, the result of processing the attribute
|
|
axis, a close angle bracket (>), the result of processing the child
|
|
nodes of the element that are in the node-set (in document order), an open
|
|
angle bracket, a forward slash (/), the element QName, and a close angle
|
|
bracket.</li>
|
|
<li style="list-style: none">
|
|
<ul>
|
|
<li><i>Namespace Axis-</i> Consider a list <b><i>L</i></b> containing
|
|
only namespace nodes in the axis and in the node-set in lexicographic
|
|
order (ascending). To begin processing <b><i>L</i></b>,
|
|
if the first node is not the default namespace node (a node with no
|
|
namespace URI and no local name), then generate a space followed by
|
|
<code>xmlns=""</code> <i>if and only if</i> the following conditions
|
|
are met: <br />
|
|
<br />
|
|
|
|
<ul>
|
|
<li>The element <b><i>E</i></b> that owns the axis is in the
node-set</li>
|
|
<li>The nearest ancestor element of <b><i>E</i></b> in the node-set
|
|
has a default namespace node in the node-set (default namespace
|
|
nodes always have non-empty values in XPath)</li>
|
|
</ul>
|
|
<p>The latter condition eliminates unnecessary occurrences of
<code>xmlns=""</code> in the canonical form since an element only
receives an <code>xmlns=""</code> if its default namespace is empty
and if it has an immediate parent in the canonical form that has a
non-empty default namespace. To finish processing <b><i>L</i></b>,
simply process every namespace node in <b><i>L</i></b>, except omit
the namespace node with local name <code>xml</code>, which defines
the <code>xml</code> prefix, if its string value is
<code>http://www.w3.org/XML/1998/namespace</code>.</p>
|
|
</li>
|
|
<li><i>Attribute Axis-</i> In lexicographic order (ascending), process
|
|
each node that is in the element's attribute axis and in the node-set.</li>
|
|
</ul>
|
|
</li>
|
|
<li><b>Namespace Nodes-</b> A namespace node <b><i>N</i></b> is ignored if
|
|
the nearest ancestor element of the node's parent element that is in the
|
|
node-set has a namespace node in the node-set with the same local name and
|
|
value as <b><i>N</i></b>. Otherwise, process the namespace node
|
|
<b><i>N</i></b> in the same way as an attribute node, except assign the
|
|
local name <code>xmlns</code> to the default namespace node if it exists
|
|
(in XPath, the default namespace node has an empty URI and local
|
|
name).
|
|
</li>
|
|
<li><b>Attribute Nodes-</b> a space, the node's QName, an equals sign, an
|
|
open quotation mark (double quote), the modified string value, and a close
|
|
quotation mark (double quote).
|
|
The string value of the node is modified by replacing all ampersands
|
|
(&) with <code>&amp;</code>, all open angle brackets (<) with
|
|
<code>&lt;</code>, all quotation mark characters with
|
|
<code>&quot;</code>, and the whitespace characters #x9, #xA, and #xD,
|
|
with character references. The character references are written in
|
|
uppercase hexadecimal with no leading zeroes (for example, #xD is
|
|
represented by the character reference <code>&#xD;</code>).</li>
|
|
<li><b>Text Nodes-</b> the string value, except all ampersands are replaced
|
|
by <code>&amp;</code>, all open angle brackets (<) are replaced by
|
|
<code>&lt;</code>, all closing angle brackets (>) are replaced by
|
|
<code>&gt;</code>, and all #xD characters are replaced by
|
|
<code>&#xD;</code>.</li>
|
|
<li><b>Processing Instruction (PI) Nodes-</b> The opening PI symbol
|
|
(<code><?</code>), the PI target name of the node, a leading space and
|
|
the string value if it is not empty, and the closing PI symbol
|
|
(<code>?></code>). If the string value is empty, then the leading space
|
|
is not added. Also, a trailing #xA is rendered after the closing PI symbol
|
|
for PI children of the root node with a lesser document order than the
|
|
document element, and a leading #xA is rendered before the opening PI
|
|
symbol of PI children of the root node with a greater document order than
|
|
the document element.</li>
|
|
<li><b>Comment Nodes-</b> Nothing if generating canonical XML without
|
|
comments. For canonical XML with comments, generate the opening comment
|
|
symbol (<code><!--</code>), the string value of the node, and the
|
|
closing comment symbol (<code>--></code>). Also, a trailing #xA is
|
|
rendered after the closing comment symbol for comment children of the root
|
|
node with a lesser document order than the document element, and a leading
|
|
#xA is rendered before the opening comment symbol of comment children of
|
|
the root node with a greater document order than the document element.
|
|
(Comment children of the root node represent comments outside of the
|
|
top-level document element and outside of the document type declaration).</li>
|
|
</ul>
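<p>The character replacements listed above for attribute and text nodes can be
sketched as follows. The function names are hypothetical and chosen only for
this illustration; the replacement tables follow the rules given in the list.
The ampersand is replaced first so that the ampersands introduced by later
replacements are not escaped a second time.</p>

```python
# Replacement tables; "&" must come first in both.
ATTRIBUTE_REPLACEMENTS = [
    ("&", "&amp;"),
    ("<", "&lt;"),
    ('"', "&quot;"),
    ("\t", "&#x9;"),
    ("\n", "&#xA;"),
    ("\r", "&#xD;"),
]
TEXT_REPLACEMENTS = [
    ("&", "&amp;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ("\r", "&#xD;"),
]


def _apply(value, replacements):
    for raw, escaped in replacements:
        value = value.replace(raw, escaped)
    return value


def escape_attribute_value(value):
    """Modified string value of an attribute node."""
    return _apply(value, ATTRIBUTE_REPLACEMENTS)


def escape_text(value):
    """Text generated for a text node."""
    return _apply(value, TEXT_REPLACEMENTS)


def render_attribute(qname, value):
    """A space, the QName, an equals sign, and the quoted modified value."""
    return f' {qname}="{escape_attribute_value(value)}"'


expression = 'value>"0" && value<"10" ?"valid":"error"'
as_text = escape_text(expression)
as_attribute = render_attribute("expr", expression)
```

<p>Note that a closing angle bracket is escaped in text nodes but not in
attribute values, and that #x9 and #xA are escaped only in attribute
values.</p>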
|
|
|
|
<p>The <a href="http://www.w3.org/TR/REC-xml-names/#NT-QName">QName</a> of a
node is the local name alone if the namespace prefix string is empty;
otherwise it is the namespace prefix, a colon, and then the local name of the
node. The namespace prefix used in the QName MUST be the same one which
appeared in the input document.</p>
|
|
<!-- =============================================================================== -->
|
|
|
|
<h3><a id="DocSubsets" name="DocSubsets"></a>2.4 Document Subsets</h3>
|
|
|
|
<p>Some applications require the ability to create a physical representation
|
|
for an XML document subset (other than the one generated by default, which can
|
|
be a proper subset of the document if the comments are omitted).
|
|
Implementations of XML canonicalization that are based on XPath can provide
|
|
this functionality with little additional overhead by accepting a node-set as
|
|
input rather than an octet stream.</p>
|
|
|
|
<p>The processing of an element node <b><i>E</i></b> MUST be modified slightly
when an XPath node-set is given as input and the element's parent is omitted
from the node-set. The method for processing the attribute axis of an element
<b><i>E</i></b> in the node-set is enhanced. All element nodes along
<b><i>E</i></b>'s <code>ancestor</code> axis are examined for nearest
occurrences of attributes in the <code>xml</code> namespace, such as
<code>xml:lang</code> and <code>xml:space</code> (whether or not they are in the
node-set). From this list of attributes, remove any that are in
<b><i>E</i></b>'s attribute axis (whether or not they are in the node-set).
Then, lexicographically merge this attribute list with the nodes of
<b><i>E</i></b>'s attribute axis that are in the node-set. The result of
visiting the attribute axis is computed by processing the attribute nodes in
this merged attribute list.</p>
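<p>The enhanced attribute axis processing can be sketched as below. This is a
simplified illustration under stated assumptions: the <code>Element</code>
class, the keying of attributes by (namespace URI, local name), and the
function <code>attribute_axis_for_subset</code> are all hypothetical, and the
caller is assumed to invoke the function only for an element that is in the
node-set and whose parent is omitted from it.</p>

```python
from dataclasses import dataclass, field
from typing import Optional

XML_NS = "http://www.w3.org/XML/1998/namespace"


@dataclass(eq=False)
class Element:
    name: str
    # Attributes keyed by (namespace URI, local name); "" means no namespace.
    attributes: dict = field(default_factory=dict)
    parent: Optional["Element"] = None


def attribute_axis_for_subset(element, attrs_in_node_set):
    """Return the merged, sorted attribute list for an element whose parent
    is omitted. attrs_in_node_set holds the keys of the element's own
    attributes that are members of the node-set."""
    inherited = {}
    ancestor = element.parent
    while ancestor is not None:
        for key, value in ancestor.attributes.items():
            # Only the nearest occurrence of each xml: attribute is kept.
            if key[0] == XML_NS and key not in inherited:
                inherited[key] = value
        ancestor = ancestor.parent
    # Drop anything the element declares itself, in the node-set or not.
    for key in element.attributes:
        inherited.pop(key, None)
    merged = dict(inherited)
    for key in attrs_in_node_set:
        merged[key] = element.attributes[key]
    # Sort by namespace URI, then local name.
    return [(key, merged[key]) for key in sorted(merged)]


# Structure modelled on the document subsets example in section 3.7.
doc = Element("doc")
e1 = Element("e1", parent=doc)
e2 = Element("e2", {(XML_NS, "space"): "preserve"}, parent=e1)
e3 = Element("e3", {("", "id"): "E3"}, parent=e2)

merged_e3 = attribute_axis_for_subset(e3, [("", "id")])
```

<p>With <code>e2</code> omitted from the node-set, <code>e3</code> receives
<code>id</code> followed by the inherited <code>xml:space</code>, which is the
order shown in the canonical form of section 3.7.</p>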
|
|
|
|
<p><b>NOTE:</b> XML entities can derive application-specific meaning from
|
|
anywhere in the XML markup as well as by rules not expressed in XML 1.0 and
|
|
the Namespaces in XML Recommendations. Clearly, these rules cannot be specified
|
|
in this document, so the creator of the input node-set must be responsible for
|
|
preserving the information necessary to capture the full semantics of the
|
|
members of the resulting node-set.</p>
|
|
|
|
<p>The canonical XML generated for an entire XML document is well-formed. The
|
|
canonical form of an XML document subset may not be well-formed XML. However,
|
|
since the canonical form may be subjected to further XML processing,
|
|
most XPath node-sets provided for canonicalization will be designed to produce
|
|
a canonical form that is a well-formed XML document or external general parsed
|
|
entity. Whether from a full document or a document subset, if the canonical
|
|
form is well-formed XML, then subsequent applications of the same XML
|
|
canonicalization method to the canonical form make no changes.</p>
|
|
<!-- =============================================================================== -->
|
|
|
|
<h2><a id="Examples" name="Examples"></a>3 Examples of XML
|
|
Canonicalization</h2>
|
|
|
|
<p>The examples in this section assume a non-validating processor, primarily so
that a document type declaration can be used to declare entities as well as
default attributes and attributes of various types (such as ID and enumerated) without
having to declare all attributes for all elements in the document. In addition, one
example contains an element that deliberately violates a validity constraint (this is
permissible because the element is still well-formed).</p>
|
|
|
|
<h3><a id="Example-OutsideDoc" name="Example-OutsideDoc"></a>3.1 PIs,
|
|
Comments, and Outside of Document Element</h3>
|
|
|
|
<table cellpadding="5" border="1" bgcolor="#80ffff" width="100%">
|
|
<tbody>
|
|
<tr>
|
|
<td width="30%"><strong>Input Document</strong></td>
|
|
<td>
|
|
<code>
|
|
<?xml version="1.0"?><br/>
|
|
<br/>
|
|
<?xml-stylesheet href="doc.xsl"<br/>
|
|
type="text/xsl" ?><br/>
|
|
<br/>
|
|
<!DOCTYPE doc SYSTEM "doc.dtd"><br/>
|
|
<br/>
|
|
<doc>Hello, world!<!-- Comment 1 --></doc><br/>
|
|
<br/>
|
|
<?pi-without-data ?><br/>
|
|
<br/>
|
|
<!-- Comment 2 --><br/>
|
|
<br/>
|
|
<!-- Comment 3 --><br/>
|
|
</code>
|
|
<!--
|
|
<?xml version="1.0"?>
|
|
|
|
<?xml-stylesheet href="doc.xsl"
|
|
type="text/xsl" ?>
|
|
|
|
<!DOCTYPE doc SYSTEM "doc.dtd">
|
|
|
|
<doc>Hello, world!<!== Comment 1 ==></doc>
|
|
|
|
<?pi-without-data ?>
|
|
|
|
<!== Comment 2 ==>
|
|
|
|
<!== Comment 3 ==>
|
|
-->
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td width="30%"><strong>Canonical Form (uncommented)</strong></td>
|
|
<td>
|
|
<code>
|
|
<?xml-stylesheet href="doc.xsl"<br/>
|
|
type="text/xsl" ?><br/>
|
|
<doc>Hello, world!</doc><br/>
|
|
<?pi-without-data?>
|
|
</code>
|
|
<!--
|
|
<?xml-stylesheet href="doc.xsl"
|
|
type="text/xsl" ?>
|
|
<doc>Hello, world!</doc>
|
|
<?pi-without-data?>-->
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td width="30%"><strong>Canonical Form (commented)</strong></td>
|
|
<td>
|
|
<code>
|
|
<?xml-stylesheet href="doc.xsl"<br/>
|
|
type="text/xsl" ?><br/>
|
|
<doc>Hello, world!<!-- Comment 1 --></doc><br/>
|
|
<?pi-without-data?><br/>
|
|
<!-- Comment 2 --><br/>
|
|
<!-- Comment 3 -->
|
|
</code>
|
|
<!--
|
|
<?xml-stylesheet href="doc.xsl"
|
|
type="text/xsl" ?>
|
|
<doc>Hello, world!<!== Comment 1 ==></doc>
|
|
<?pi-without-data?>
|
|
<!== Comment 2 ==>
|
|
<!== Comment 3 ==>-->
|
|
</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
|
|
<p>Demonstrates:</p>
|
|
<ul>
|
|
<li>Loss of XML declaration</li>
|
|
<li>Loss of DTD</li>
|
|
<li>Normalization of whitespace outside of document element (first character
|
|
of both canonical forms is '<'; single line breaks separate PIs and
|
|
comments outside of document element)</li>
|
|
<li>Loss of whitespace between PITarget and its data</li>
|
|
<li>Retention of whitespace inside PI data</li>
|
|
<li>Comment removal from uncommented canonical form, including delimiter for
|
|
comments outside document element (the last character in both canonical
|
|
forms is '>')</li>
|
|
</ul>
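<p>The treatment of PIs and comments outside the document element, as
demonstrated above, can be sketched as follows. The function names and the
<code>position</code> parameter (<code>"before"</code>, <code>"after"</code>,
or <code>"inside"</code> relative to the document element) are assumptions
made for this illustration, and the PI data string is a shortened stand-in for
the one in the example.</p>

```python
def _place(text, position):
    """Add the #xA that separates root-level PIs and comments from other markup."""
    if position == "before":   # lesser document order than the document element
        return text + "\n"
    if position == "after":    # greater document order than the document element
        return "\n" + text
    return text                # inside the document element: no added line feed


def render_pi(target, data, position="inside"):
    """Target, then a space and the string value only if it is not empty."""
    text = f"<?{target} {data}?>" if data else f"<?{target}?>"
    return _place(text, position)


def render_comment(value, position="inside", with_comments=True):
    """Nothing at all when generating canonical XML without comments."""
    if not with_comments:
        return ""
    return _place(f"<!--{value}-->", position)


def canonical(with_comments):
    return (
        render_pi("xml-stylesheet", 'href="doc.xsl" type="text/xsl" ', "before")
        + "<doc>Hello, world!"
        + render_comment(" Comment 1 ", "inside", with_comments)
        + "</doc>"
        + render_pi("pi-without-data", "", "after")
        + render_comment(" Comment 2 ", "after", with_comments)
        + render_comment(" Comment 3 ", "after", with_comments)
    )


uncommented = canonical(with_comments=False)
commented = canonical(with_comments=True)
```

<p>Because the line feed is attached to the PI or comment rather than to the
document element, neither form begins or ends with a line feed, and removing
the comments also removes their delimiters.</p>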
|
|
|
|
<h3><a id="Example-WhitespaceInContent"
|
|
name="Example-WhitespaceInContent"></a>3.2 Whitespace in Document Content</h3>
|
|
|
|
<table cellpadding="5" border="1" bgcolor="#80ffff" width="100%">
|
|
<tbody><tr>
|
|
<td width="30%"><strong>Input Document</strong></td>
|
|
<td>
|
|
<code>
|
|
<doc><br/>
|
|
<clean> </clean><br/>
|
|
<dirty> A B </dirty><br/>
|
|
<mixed><br/>
|
|
A<br/>
|
|
<clean> </clean><br/>
|
|
B<br/>
|
|
<dirty> A B </dirty><br/>
|
|
C<br/>
|
|
</mixed><br/>
|
|
</doc>
|
|
</code>
|
|
<!--
|
|
<doc>
|
|
<clean> </clean>
|
|
<dirty> A B </dirty>
|
|
<mixed>
|
|
A
|
|
<clean> </clean>
|
|
B
|
|
<dirty> A B </dirty>
|
|
C
|
|
</mixed>
|
|
</doc>
|
|
-->
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td width="30%"><strong>Canonical Form</strong></td>
|
|
<td>
|
|
<code>
|
|
<doc><br/>
|
|
<clean> </clean><br/>
|
|
<dirty> A B </dirty><br/>
|
|
<mixed><br/>
|
|
A<br/>
|
|
<clean> </clean><br/>
|
|
B<br/>
|
|
<dirty> A B </dirty><br/>
|
|
C<br/>
|
|
</mixed><br/>
|
|
</doc>
|
|
</code>
|
|
<!--
|
|
<doc>
|
|
<clean> </clean>
|
|
<dirty> A B </dirty>
|
|
<mixed>
|
|
A
|
|
<clean> </clean>
|
|
B
|
|
<dirty> A B </dirty>
|
|
C
|
|
</mixed>
|
|
</doc>
|
|
-->
|
|
</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
|
|
<p>Demonstrates:</p>
|
|
<ul>
|
|
<li>Retain all whitespace between consecutive start tags, clean or dirty</li>
|
|
<li>Retain all whitespace between consecutive end tags, clean or dirty</li>
|
|
<li>Retain all whitespace between end tag/start tag pair, clean or dirty</li>
|
|
<li>Retain all whitespace in character content, clean or dirty</li>
|
|
</ul>
|
|
|
|
<p><b>Note:</b> In this example, the input document and canonical form are
identical. Both end with a '>' character.</p>
|
|
|
|
<h3><a id="Example-SETags" name="Example-SETags"></a>3.3 Start and End
|
|
Tags</h3>
|
|
|
|
<table cellpadding="5" border="1" bgcolor="#80ffff" width="100%">
|
|
<tbody><tr>
|
|
<td width="30%"><strong>Input Document</strong></td>
|
|
<td>
|
|
<code>
|
|
<!DOCTYPE doc [<!ATTLIST e9 attr CDATA "default">]><br/>
|
|
<doc><br/>
|
|
<e1 /><br/>
|
|
<e2 ></e2><br/>
|
|
<e3 name = "elem3" id="elem3" /><br/>
|
|
<e4 name="elem4" id="elem4" ></e4><br/>
|
|
<e5 a:attr="out" b:attr="sorted" attr2="all" attr="I'm"<br/>
|
|
xmlns:b="http://www.ietf.org"<br/>
|
|
xmlns:a="http://www.w3.org"<br/>
|
|
xmlns="http://example.org"/><br/>
|
|
<e6 xmlns="" xmlns:a="http://www.w3.org"><br/>
|
|
<e7 xmlns="http://www.ietf.org"><br/>
|
|
<e8 xmlns="" xmlns:a="http://www.w3.org"><br/>
|
|
<e9 xmlns="" xmlns:a="http://www.ietf.org"/><br/>
|
|
</e8><br/>
|
|
</e7><br/>
|
|
</e6><br/>
|
|
</doc>
|
|
</code>
|
|
<!--
|
|
<!DOCTYPE doc [<!ATTLIST e9 attr CDATA "default">]>
|
|
<doc>
|
|
<e1 />
|
|
<e2 ></e2>
|
|
<e3 name = "elem3" id="elem3" />
|
|
<e4 name="elem4" id="elem4" ></e4>
|
|
<e5 a:attr="out" b:attr="sorted" attr2="all" attr="I'm"
|
|
xmlns:b="http://www.ietf.org"
|
|
xmlns:a="http://www.w3.org"
|
|
xmlns="http://example.org"/>
|
|
<e6 xmlns="" xmlns:a="http://www.w3.org">
|
|
<e7 xmlns="http://www.ietf.org">
|
|
<e8 xmlns="" xmlns:a="http://www.w3.org">
|
|
<e9 xmlns="" xmlns:a="http://www.ietf.org"/>
|
|
</e8>
|
|
</e7>
|
|
</e6>
|
|
</doc>
|
|
-->
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td width="30%"><strong>Canonical Form</strong></td>
|
|
<td>
|
|
<code>
|
|
<doc> <br/>
|
|
<e1></e1> <br/>
|
|
<e2></e2> <br/>
|
|
<e3 id="elem3" name="elem3"></e3> <br/>
|
|
<e4 id="elem4" name="elem4"></e4> <br/>
|
|
<e5 xmlns="http://example.org" xmlns:a="http://www.w3.org" xmlns:b="http://www.ietf.org" attr="I'm" attr2="all" b:attr="sorted" a:attr="out"></e5> <br/>
|
|
<e6 xmlns:a="http://www.w3.org"> <br/>
|
|
<e7 xmlns="http://www.ietf.org"> <br/>
|
|
<e8 xmlns=""> <br/>
|
|
<e9 xmlns:a="http://www.ietf.org" attr="default"></e9> <br/>
|
|
</e8> <br/>
|
|
</e7> <br/>
|
|
</e6> <br/>
|
|
</doc>
|
|
</code>
|
|
<!--
|
|
<doc>
|
|
<e1></e1>
|
|
<e2></e2>
|
|
<e3 id="elem3" name="elem3"></e3>
|
|
<e4 id="elem4" name="elem4"></e4>
|
|
<e5 xmlns="http://example.org" xmlns:a="http://www.w3.org" xmlns:b="http://www.ietf.org" attr="I'm" attr2="all" b:attr="sorted" a:attr="out"></e5>
|
|
<e6 xmlns:a="http://www.w3.org">
|
|
<e7 xmlns="http://www.ietf.org">
|
|
<e8 xmlns="">
|
|
<e9 xmlns:a="http://www.ietf.org" attr="default"></e9>
|
|
</e8>
|
|
</e7>
|
|
</e6>
|
|
</doc>
|
|
-->
|
|
</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
|
|
<p>Demonstrates:</p>
|
|
<ul>
|
|
<li>Empty element conversion to start-end tag pair</li>
|
|
<li>Normalization of whitespace in start and end tags</li>
|
|
<li>Relative order of namespace and attribute axes</li>
|
|
<li>Lexicographic ordering of namespace and attribute axes</li>
|
|
<li>Retention of namespace prefixes from original document</li>
|
|
<li>Elimination of superfluous namespace declarations</li>
|
|
<li>Addition of default attribute</li>
|
|
</ul>
|
|
|
|
<p><b>Note:</b> Some start tags in the canonical form are very long, but each
|
|
start tag in this example is entirely on a single line.</p>
|
|
|
|
<p><b>Note:</b> In <code>e5</code>, <code>b:attr</code> precedes
<code>a:attr</code> because the primary key is the namespace URI, not the
namespace prefix, and <code>attr2</code> precedes <code>b:attr</code> because
the default namespace is not applied to unqualified attributes (so the
namespace URI for <code>attr2</code> is empty).</p>
|
|
|
|
<h3><a id="Example-Chars" name="Example-Chars"></a>3.4 Character Modifications
|
|
and Character References</h3>
|
|
|
|
<table cellpadding="5" border="1" bgcolor="#80ffff" width="100%">
|
|
<tbody><tr>
|
|
<td width="30%"><strong>Input Document</strong></td>
|
|
<td>
|
|
<code>
|
|
<!DOCTYPE doc [<br/>
|
|
<!ATTLIST normId id ID #IMPLIED><br/>
|
|
<!ATTLIST normNames attr NMTOKENS #IMPLIED><br/>
|
|
]><br/>
|
|
<doc><br/>
|
|
<text>First line&#x0d;&#10;Second line</text><br/>
|
|
<value>&#x32;</value><br/>
|
|
<compute><![CDATA[value>"0" && value<"10" ?"valid":"error"]]></compute><br/>
|
|
<compute expr='value>"0" &amp;&amp; value&lt;"10" ?"valid":"error"'>valid</compute><br/>
|
|
<norm attr=' &apos; &#x20;&#13;&#xa;&#9; &apos; '/><br/>
|
|
<normNames attr=' A &#x20;&#13;&#xa;&#9; B '/><br/>
|
|
<normId id=' &apos; &#x20;&#13;&#xa;&#9; &apos; '/><br/>
|
|
</doc><br/>
|
|
</code>
|
|
<!--
|
|
<!DOCTYPE doc [
|
|
<!ATTLIST normId id ID #IMPLIED>
|
|
<!ATTLIST normNames attr NMTOKENS #IMPLIED>
|
|
]>
|
|
<doc>
|
|
<text>First line
 Second line</text>
|
|
<value>2</value>
|
|
<compute><![CDATA[value>"0" && value<"10" ?"valid":"error"]]></compute>
|
|
<compute expr='value>"0" && value<"10" ?"valid":"error"'>valid</compute>
|
|
<norm attr=' '   
	 ' '/>
|
|
<normNames attr=' A   
	 B '/>
|
|
<normId id=' '   
	 ' '/>
|
|
</doc>
|
|
-->
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td width="30%"><strong>Canonical Form</strong></td>
|
|
<td>
|
|
<code>
|
|
<doc><br/>
|
|
<text>First line&#xD;<br/>
|
|
Second line</text><br/>
|
|
<value>2</value><br/>
|
|
<compute>value&gt;"0" &amp;&amp; value&lt;"10" ?"valid":"error"</compute><br/>
|
|
<compute expr="value>&quot;0&quot; &amp;&amp; value&lt;&quot;10&quot; ?&quot;valid&quot;:&quot;error&quot;">valid</compute><br/>
|
|
<norm attr=" ' &#xD;&#xA;&#x9; ' "></norm><br/>
|
|
<normNames attr="A &#xD;&#xA;&#x9; B"></normNames><br/>
|
|
<normId id="' &#xD;&#xA;&#x9; '"></normId><br/>
|
|
</doc>
|
|
</code>
|
|
<!--
|
|
<doc>
|
|
<text>First line
|
|
Second line</text>
|
|
<value>2</value>
|
|
<compute>value>"0" && value<"10" ?"valid":"error"</compute>
|
|
<compute expr="value>"0" && value<"10" ?"valid":"error"">valid</compute>
|
|
<norm attr=" ' 
	 ' "></norm>
|
|
<normNames attr="A 
	 B"></normNames>
|
|
<normId id="' 
	 '"></normId>
|
|
</doc>
|
|
-->
|
|
</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
|
|
<p>Demonstrates:</p>
|
|
<ul>
|
|
<li>Character reference replacement</li>
|
|
<li>Attribute value delimiters set to quotation marks (double quotes)</li>
|
|
<li>Attribute value normalization</li>
|
|
<li>CDATA section replacement</li>
|
|
<li>Encoding of special characters as character references in attribute
|
|
values (&amp;, &lt;, &quot;, &#xD;, &#xA;, &#x9;)</li>
|
|
<li>Encoding of special characters as character references in text
|
|
(&amp;, &lt;, &gt;, &#xD;)</li>
|
|
</ul>
|
|
|
|
<p><b>Note:</b> The last element, <code>normId</code>, is well-formed but
|
|
violates a validity constraint for attributes of type ID. For testing
|
|
canonical XML implementations based on validating processors, remove the
|
|
line containing this element from the input and canonical form. In general,
|
|
XML consumers should be discouraged from using this feature of XML.</p>
|
|
|
|
<p><b>Note:</b> Whitespace character references other than &#x20; are not
|
|
affected by attribute value normalization <a href="#XML">[XML]</a>.</p>
|
|
|
|
<p><b>Note:</b> In the canonical form, the value of the attribute named
|
|
<code>attr</code> in the element <code>norm</code> begins with a space, an
|
|
apostrophe (single quote), then <i>four</i> spaces before the first character
|
|
reference.</p>
|
|
|
|
<p><b>Note:</b> The <code>expr</code> attribute of the second
|
|
<code>compute</code> element contains no line breaks.</p>
|
|
|
|
<h3><a id="Example-Entities" name="Example-Entities"></a>3.5 Entity
|
|
References</h3>
|
|
|
|
<table cellpadding="5" border="1" bgcolor="#80ffff" width="100%">
|
|
<tbody>
|
|
<tr>
|
|
<td width="30%"><strong>Input Document</strong></td>
|
|
<td>
|
|
<code>
|
|
<!DOCTYPE doc [<br/>
|
|
<!ATTLIST doc attrExtEnt ENTITY #IMPLIED><br/>
|
|
<!ENTITY ent1 "Hello"><br/>
|
|
<!ENTITY ent2 SYSTEM "world.txt"><br/>
|
|
<!ENTITY entExt SYSTEM "earth.gif" NDATA gif><br/>
|
|
<!NOTATION gif SYSTEM "viewgif.exe"><br/>
|
|
]><br/>
|
|
<doc attrExtEnt="entExt"><br/>
|
|
&ent1;, &ent2;!<br/>
|
|
</doc><br/>
|
|
<br/>
|
|
<!-- Let world.txt contain "world" (excluding the quotes) -->
|
|
</code>
|
|
<!--
|
|
<!DOCTYPE doc [
|
|
<!ATTLIST doc attrExtEnt ENTITY #IMPLIED>
|
|
<!ENTITY ent1 "Hello">
|
|
<!ENTITY ent2 SYSTEM "world.txt">
|
|
<!ENTITY entExt SYSTEM "earth.gif" NDATA gif>
|
|
<!NOTATION gif SYSTEM "viewgif.exe">
|
|
]>
|
|
<doc attrExtEnt="entExt">
|
|
&ent1;, &ent2;!
|
|
</doc>
|
|
|
|
<!== Let world.txt contain "world" (excluding the quotes) ==>
|
|
-->
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td width="30%"><strong>Canonical Form (uncommented)</strong></td>
|
|
<td>
|
|
<code>
|
|
<doc attrExtEnt="entExt"><br/>
|
|
Hello, world!<br/>
|
|
</doc>
|
|
</code>
|
|
<!--
|
|
<doc attrExtEnt="entExt">
|
|
Hello, world!
|
|
</doc>
|
|
-->
|
|
</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
|
|
<p>Demonstrates:</p>
|
|
<ul>
|
|
<li>Internal parsed entity reference replacement</li>
|
|
<li>External parsed entity reference replacement (including whitespace
|
|
outside elements and PIs)</li>
|
|
<li>External unparsed entity reference</li>
|
|
</ul>
|
|
|
|
<h3><a id="Example-UTF8" name="Example-UTF8"></a>3.6 UTF-8 Encoding</h3>
|
|
|
|
<table cellpadding="5" border="1" bgcolor="#80ffff" width="100%">
|
|
<tbody>
|
|
<tr>
|
|
<td width="30%"><strong>Input Document</strong></td>
|
|
<td>
|
|
<code>
|
|
<?xml version="1.0" encoding="ISO-8859-1"?><br/>
|
|
<doc>&#169;</doc>
|
|
</code>
|
|
<!--
|
|
<?xml version="1.0" encoding="ISO-8859-1"?>
|
|
<doc>©</doc>
|
|
-->
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td width="30%"><strong>Canonical Form</strong></td>
|
|
<td>
|
|
<code>
|
|
<doc>#xC2#xA9</doc>
|
|
</code>
|
|
<!--
|
|
<doc>#xC2#xA9</doc>
|
|
-->
|
|
</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
|
|
<p>Demonstrates:</p>
|
|
<ul>
|
|
<li>Effect of transcoding from a sample encoding to UTF-8</li>
|
|
</ul>
|
|
|
|
<p><b>Note:</b> The content of the doc element is NOT the string #xC2#xA9 but
|
|
rather the two octets whose hexadecimal values are C2 and A9, which is the
|
|
UTF-8 encoding of the UCS codepoint for the copyright sign (©).</p>
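<p>The transcoding in this example can be checked with a short Python sketch.
The variable names are illustrative only. The character reference
<code>&amp;#169;</code> denotes UCS codepoint 169 (#xA9), which occupies a
single octet in ISO-8859-1 and two octets in UTF-8.</p>

```python
# The codepoint named by the character reference in the input document.
copyright_sign = chr(169)

# In ISO-8859-1 the same character is the single octet A9.
from_latin1 = b"\xa9".decode("iso-8859-1")

# The canonical form is the UTF-8 encoding of the generated characters.
canonical_octets = f"<doc>{copyright_sign}</doc>".encode("utf-8")
content_octets = copyright_sign.encode("utf-8")
```

<p>The content of the <code>doc</code> element in the canonical form is
therefore the two octets C2 A9, and the whole canonical form is 13 octets
long.</p>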
|
|
|
|
<h3><a id="Example-DocSubsets" name="Example-DocSubsets"></a>3.7 Document
|
|
Subsets</h3>
|
|
|
|
<table cellpadding="5" border="1" bgcolor="#80ffff" width="100%">
|
|
<tbody>
|
|
|
|
<tr>
|
|
<td width="30%"><strong>Input Document</strong></td>
|
|
<td>
|
|
<code>
|
|
<!DOCTYPE doc [ <br/>
|
|
<!ATTLIST e2 xml:space (default|preserve) 'preserve'> <br/>
|
|
<!ATTLIST e3 id ID #IMPLIED> <br/>
|
|
]> <br/>
|
|
<doc xmlns="http://www.ietf.org" xmlns:w3c="http://www.w3.org"> <br/>
|
|
<e1> <br/>
|
|
<e2 xmlns=""> <br/>
|
|
<e3 id="E3"/> <br/>
|
|
</e2> <br/>
|
|
</e1> <br/>
|
|
</doc>
|
|
</code>
|
|
<!--
|
|
<!DOCTYPE doc [
|
|
<!ATTLIST e2 xml:space (default|preserve) 'preserve'>
|
|
<!ATTLIST e3 id ID #IMPLIED>
|
|
]>
|
|
<doc xmlns="http://www.ietf.org" xmlns:w3c="http://www.w3.org">
|
|
<e1>
|
|
<e2 xmlns="">
|
|
<e3 id="E3"/>
|
|
</e2>
|
|
</e1>
|
|
</doc>
|
|
-->
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td width="30%"><strong>Document Subset Expression</strong></td>
|
|
<td>
|
|
<code>
|
|
<!-- Evaluate with declaration xmlns:ietf="http://www.ietf.org" --> <br/>
|
|
<br/>
|
|
(//. | //@* | //namespace::*) <br/>
|
|
[ <br/>
|
|
self::ietf:e1 or (parent::ietf:e1 and not(self::text() or self::e2)) <br/>
|
|
or <br/>
|
|
count(id("E3")|ancestor-or-self::node()) = count(ancestor-or-self::node()) <br/>
|
|
]</code>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td width="30%"><strong>Canonical Form</strong></td>
|
|
<td>
|
|
<code>
|
|
<e1 xmlns="http://www.ietf.org" xmlns:w3c="http://www.w3.org"><e3 xmlns="" id="E3" xml:space="preserve"></e3></e1>
|
|
</code>
|
|
<!--
|
|
<e1 xmlns="http://www.ietf.org" xmlns:w3c="http://www.w3.org"><e3 xmlns="" id="E3" xml:space="preserve"></e3></e1>
|
|
-->
|
|
</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
|
|
<p>Demonstrates:</p>
|
|
<ul>
|
|
<li>Empty default namespace propagation from omitted parent element</li>
|
|
<li>Propagation of attributes in the <code>xml</code> namespace in document subsets</li>
|
|
<li>Persistence of omitted namespace declarations in descendants</li>
|
|
</ul>
|
|
|
|
<p><b>Note:</b> In the document subset expression, the subexpression
|
|
<code>(//. | //@* | //namespace::*)</code> selects all nodes in the input
|
|
document, subjecting each to the predicate expression in square brackets. The
|
|
expression is true for <code>e1</code> and its implicit namespace nodes, and
|
|
it is true if the element identified by E3 is in the <code>ancestor-or-self</code>
|
|
path of the context node (such that ancestor-or-self stays the same size under
|
|
union with the element identified by E3).</p>
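<p>The counting idiom used in the predicate can be restated with ordinary
sets. The sketch below is only an analogy to the XPath expression: nodes are
represented by hypothetical string labels, and the function
<code>contains_by_count</code> is a name introduced for this illustration. A
union with a node leaves the size of a set unchanged exactly when the node is
already a member.</p>

```python
def contains_by_count(target, ancestor_or_self):
    """Mirror count(id("E3") | ancestor-or-self::node()) = count(ancestor-or-self::node())."""
    nodes = set(ancestor_or_self)
    return len({target} | nodes) == len(nodes)


# Hypothetical labels for the nodes on two ancestor-or-self paths.
path_of_e3_id_attribute = ["/", "doc", "e1", "e2", "e3", "e3/@id"]
path_of_e2 = ["/", "doc", "e1", "e2"]

selects_e3_attribute = contains_by_count("e3", path_of_e3_id_attribute)
selects_e2 = contains_by_count("e3", path_of_e2)
```

<p>The first test succeeds because <code>e3</code> lies on the path of its own
attribute node; the second fails, so <code>e2</code> is not selected by this
part of the predicate.</p>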
|
|
|
|
<p><b>Note:</b> The canonical form contains no line delimiters.</p>
|
|
<!-- =============================================================================== -->
|
|
|
|
<h2><a id="Resolutions" name="Resolutions"></a>4 Resolutions</h2>
|
|
|
|
<p>This section discusses a number of key decision points as well as a
|
|
rationale for each decision. Although this specification now defines XML
|
|
canonicalization in terms of the <a href="#XPath"> XPath</a> data model rather
|
|
than <a href="#Infoset">XML Infoset</a>, the canonical form described in this
|
|
document is quite similar in most respects to the canonical form described in
|
|
the January 2000 Canonical XML draft <a href="#C14N-20000119">[C14N-20000119]</a>.
|
|
However, some differences exist, and a number of the subsections discuss the
|
|
changes.</p>
|
|
|
|
<h3><a id="NoXMLDecl" name="NoXMLDecl"></a>4.1 No XML Declaration</h3>
|
|
|
|
<p>The XML declaration, including version number and character encoding, is
omitted from the canonical form. The encoding is not needed since the
canonical form is encoded in UTF-8. The version is not needed since the
absence of a version number unambiguously indicates XML 1.0.</p>
|
|
|
|
<p>Future versions of XML will be required to include an XML declaration to
indicate the version number. However, the canonicalization method described in
this specification may not be applicable to future versions of XML without
some modifications. When canonicalization of a new version of XML is required,
this specification could be updated to include the XML declaration, as
presumably the absence of the XML declaration from the XPath data model can be
remedied by that time (e.g. by reissuing a new XPath based on the <a
href="#Infoset">Infoset</a> data model).</p>
|
|
|
|
<h3><a id="NoCharModelNorm" name="NoCharModelNorm"></a>4.2 No Character Model
|
|
Normalization</h3>
|
|
|
|
<p>The Unicode standard <a href="#Unicode">[Unicode]</a> allows multiple
|
|
different representations of certain "precomposed characters" (a simple
|
|
example is "ç"). Thus two XML documents with content that is equivalent for
|
|
the purposes of most applications may contain differing character sequences.
|
|
The W3C is preparing a normalized representation <a href="#CharModel">
|
|
[CharModel]</a>. The <a href="#C14N-20000119">C14N-20000119</a> Canonical XML
|
|
draft used this normalized form. However, many XML 1.0 processors do not
|
|
perform this normalization. Furthermore, applications that must solve this
|
|
problem typically enforce character model normalization at all times starting
|
|
when character content is created in order to avoid processing failures that
|
|
could otherwise result (e.g. see example from <a href="#CowanExample">Cowan</a>).
|
|
Therefore, character model normalization has been moved out of scope for
|
|
XML canonicalization. However, the XML processor used to prepare the XPath data
|
|
model input is required (by the <a href="#DataModel">Data Model</a>) to use
|
|
Normalization Form C [<a href="#ref-NFC">NFC</a>,
|
|
<a href="#NFC-Corrigendum">NFC-Corrigendum</a>] when converting an XML document
|
|
to the UCS character domain from any encoding that is not UCS-based (currently,
|
|
UCS-based encodings include UTF-8, UTF-16, UTF-16BE, UTF-16LE, UCS-2, and
UCS-4).</p>
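<p>The issue of multiple representations can be illustrated with Python's
standard <code>unicodedata</code> module. This sketch only shows the effect of
Normalization Form C on the "ç" example mentioned above; it does not imply
that the canonicalization method itself performs this step.</p>

```python
import unicodedata

precomposed = "\u00e7"    # LATIN SMALL LETTER C WITH CEDILLA
decomposed = "c\u0327"    # "c" followed by COMBINING CEDILLA

# The two strings render alike but differ as character sequences and octets.
differ_as_strings = precomposed != decomposed
precomposed_octets = precomposed.encode("utf-8")
decomposed_octets = decomposed.encode("utf-8")

# Normalization Form C maps the decomposed sequence to the precomposed one.
normalized = unicodedata.normalize("NFC", decomposed)
```

<p>Since canonicalization leaves both sequences as they are, two documents
that differ only in this respect have different canonical forms unless the
content was normalized before it reached the canonicalizer.</p>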
|
|
|
|
<h3><a id="WhitespaceRoot" name="WhitespaceRoot"></a>4.3 Handling of
|
|
Whitespace Outside Document Element</h3>
|
|
|
|
<p>The <a href="#C14N-20000119">C14N-20000119</a> Canonical XML draft
|
|
placed a #xA after each PI outside of the document element as well as a #xA
|
|
after the end tag of the document element. The method in this specification
|
|
performs the same function except for omitting the final #xA after the last PI
|
|
(or comment or end tag of the document element). This technique ensures that
|
|
PI (and comment) children of the root are separated from markup by a line feed
|
|
even if the root node or the document element is omitted from the output
node-set.</p>

<h3><a id="NoNSPrefixRewriting" name="NoNSPrefixRewriting"></a>4.4 No
Namespace Prefix Rewriting</h3>

<p>The <a href="#C14N-20000119">C14N-20000119</a> Canonical XML draft
described a method for rewriting namespace prefixes such that two documents
having logically equivalent namespace declarations would also have identical
namespace prefixes. The goal was to eliminate dependence on the particular
namespace prefixes in a document when testing for logical equivalence.
However, there now exist a number of contexts in which namespace prefixes can
impart information value in an XML document. For example, an XPath expression
in an attribute value or element content can reference a namespace prefix. Thus,
rewriting the namespace prefixes would damage such a document by changing its
meaning (and it cannot be logically equivalent if its meaning has changed).</p>

<p>More formally, let D1 be a document containing an XPath expression in an attribute
value or element content that refers to namespace prefixes used in D1. Further
assume that the namespace prefixes in D1 will all be rewritten by the
canonicalization method. Let D2 be a copy of D1; then modify the namespace prefixes in D2
and modify the XPath expression's references to namespace prefixes such that
D2 and D1 remain logically equivalent. Since namespace rewriting does not
include occurrences of namespace references in attribute values and element
content, the canonical form of D1 does not equal the canonical form of D2
because the XPath expression will be different. Thus, although namespace rewriting
normalizes the namespace declarations, the goal of eliminating dependence on the
particular namespace prefixes in the document is not achieved.</p>

<p>Moreover, it is possible to prove that namespace rewriting is harmful,
rather than simply ineffective. Let D1 be a document containing an XPath expression in an
attribute value or element content that refers to namespace prefixes used in
D1. Further assume that the namespace prefixes in D1 will all be rewritten by
the canonicalization method. Now let D2 be the canonical form of D1. Clearly,
the canonical forms of D1 and D2 are equivalent (since canonicalizing D2 yields the canonical
form of the canonical form of D1), yet D1 and D2 are not logically equivalent
because the aforementioned XPath expression works in D1 and does not work in D2.</p>
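<p>The argument can be illustrated with a small sketch. The rewriting scheme
shown here (renaming prefixes to <code>n0</code>, <code>n1</code>, and so on) is
purely hypothetical and is not the scheme of the earlier draft; the point is
only that a prefix used inside an attribute value is not rewritten along with
the declarations.</p>

```python
from typing import Dict, Tuple


def rewrite_prefixes(bindings: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical rewriting: rename prefixes to n0, n1, ... in URI order."""
    return {f"n{i}": uri for i, uri in enumerate(sorted(bindings.values()))}


def resolve(qname: str, bindings: Dict[str, str]) -> Tuple[str, str]:
    """Expand a prefixed name from an XPath expression to (URI, local name)."""
    prefix, local = qname.split(":", 1)
    return bindings[prefix], local   # KeyError if the prefix is not declared


d1_bindings = {"a": "urn:example:ns"}   # namespace declarations in D1
xpath_in_attribute = "a:item"           # attribute value, untouched by rewriting

# Declarations in D2, the (hypothetically rewritten) canonical form of D1.
d2_bindings = rewrite_prefixes(d1_bindings)
```

<p>Resolving <code>a:item</code> against <code>d1_bindings</code> succeeds,
whereas resolving it against <code>d2_bindings</code> raises
<code>KeyError</code>, because the prefix <code>a</code> no longer exists in
D2.</p>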

<p>Note that an argument similar to this can be leveled against the XML
canonicalization method based on any of the cases in the <a
href="#Limitations">Limitations</a> section. However, the problems cannot easily be fixed in
those cases, whereas here we have an opportunity to avoid purposefully
introducing such a limitation.</p>

<p>Applications that must test for logical equivalence must perform more
sophisticated tests than mere octet stream comparison. However, this is quite
likely to be necessary in any case in order to test for logical equivalencies
based on application rules as well as rules from other XML-related
recommendations, working drafts, and future works.</p>

<h3><a id="NSAttrOrder" name="NSAttrOrder"></a>4.5 Order of Namespace
Declarations and Attributes</h3>

<p>The <a href="#C14N-20000119">C14N-20000119</a> Canonical XML draft
alternated between namespace declarations and attribute declarations. This was
part of the namespace prefix rewriting scheme, which this specification
eliminates. This specification follows the XPath data model of putting all
namespace nodes before all attribute nodes.</p>
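<p>A minimal sketch of this ordering is shown below. The function
<code>render_start_tag</code> is hypothetical; it assumes that namespace
declarations are ordered by prefix with the default namespace first, that the
attributes have already been sorted, and that no character escaping is
needed.</p>

```python
from typing import Dict, List, Tuple


def render_start_tag(name: str, namespaces: Dict[str, str],
                     attributes: List[Tuple[str, str]]) -> str:
    """Render a start tag with every namespace declaration ahead of every attribute.

    namespaces maps prefix -> URI, with "" as the default namespace prefix.
    attributes is a list of (qualified name, value) pairs, assumed to be
    already sorted and to need no character escaping in this sketch.
    """
    parts = ["<" + name]
    for prefix in sorted(namespaces):          # "" (default) sorts first
        attr = "xmlns:" + prefix if prefix else "xmlns"
        parts.append(f' {attr}="{namespaces[prefix]}"')
    for qname, value in attributes:
        parts.append(f' {qname}="{value}"')
    parts.append(">")
    return "".join(parts)
```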

<h3><a id="SuperfluousNSDecl" name="SuperfluousNSDecl"></a>4.6 Superfluous
Namespace Declarations</h3>

<p>Unnecessary namespace declarations are not made in the canonical form.
Whether for an empty default namespace, a non-empty default namespace, or a
namespace prefix binding, the XML canonicalization method omits a declaration
if it determines that the immediate parent element <i>in the canonical
form</i> has an equivalent declaration in scope. The root document element is
handled specially since it has no parent element. All namespace declarations
in it are retained, except that the declaration of an empty default namespace is
automatically omitted.</p>

<p>Relative to the method of simply rendering the entire namespace context of
each element, implementations are not hindered by more than a constant factor
in processing time and memory use. The advantages include:</p>
<ul>
<li>Eliminates overrun of <code>xmlns=""</code> from canonical forms of
applications that may not even use namespaces, or support them only
minimally.</li>
<li>Eliminates namespace declarations from elements where they may not
belong according to the application's content model, thereby simplifying
the task of reattaching a document type declaration to a canonical
form.</li>
</ul>

<p>Note that in document subsets, an element with omissions from its ancestral
element chain will be rendered to the canonical form with namespace
declarations that may have been made in its omitted ancestors, thus preserving
the meaning of the element.</p>
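<p>The omission rule can be sketched as a comparison between an element's
in-scope namespaces and those of its parent in the canonical form. This is a
simplified illustration rather than the normative algorithm; the function
<code>declarations_to_emit</code> and its dictionary representation are
hypothetical.</p>

```python
from typing import Dict, Optional


def declarations_to_emit(in_scope: Dict[str, str],
                         parent_rendered: Optional[Dict[str, str]]) -> Dict[str, str]:
    """Return the namespace declarations to write on an element.

    in_scope: prefix -> URI for the element ("" is the default namespace;
        a missing "" key means an empty default namespace).
    parent_rendered: the same mapping for the parent element in the
        canonical form, or None for the document element.
    """
    parent = dict(parent_rendered or {})
    parent.setdefault("", "")          # no default namespace == empty one
    element = dict(in_scope)
    element.setdefault("", "")
    # Keep only the bindings that differ from what the parent already provides.
    return {prefix: uri for prefix, uri in element.items()
            if parent.get(prefix) != uri}
```

<p>With <code>None</code> as the parent, every declaration except an empty
default namespace is kept, which matches the special handling of the document
element described above.</p>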

<h3><a id="PropagateDefaultNSDecl" name="PropagateDefaultNSDecl"></a>4.7
Propagation of Default Namespace Declaration in Document Subsets</h3>

<p>The XPath data model represents an empty default namespace with the absence of
a node, not with the presence of a default namespace node having an empty value.
Thus, with respect to the fact that element <code>e3</code> in the following
examples is not namespace qualified, we cannot tell the difference between
<code>&lt;e1 xmlns="a:b"&gt;&lt;e2 xmlns=""&gt;&lt;e3/&gt;&lt;/e2&gt;&lt;/e1&gt;</code>
and
<code>&lt;e1 xmlns="a:b"&gt;&lt;e2&gt;&lt;e3 xmlns=""/&gt;&lt;/e2&gt;&lt;/e1&gt;</code>.
All we know is that <code>e3</code> was not namespace qualified on input, so, if
<code>e2</code> is omitted, we preserve this information on output so that <code>e3</code>
does not take on the default namespace qualification of <code>e1</code>.</p>
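<p>The decision for <code>e3</code> can be sketched as follows. The
representation of the ancestor chain and the function
<code>needs_empty_default</code> are hypothetical and only illustrate the case
discussed above.</p>

```python
from typing import List, Tuple

# Each ancestor-or-self entry: (element name, default namespace URI in scope,
# whether the element is in the output node-set). "" means no default namespace.
Chain = List[Tuple[str, str, bool]]


def needs_empty_default(chain: Chain) -> bool:
    """Decide whether the last element in chain must be written with xmlns=""."""
    _, default_ns, in_output = chain[-1]
    if not in_output or default_ns != "":
        return False
    # Find the nearest ancestor that is actually rendered.
    for _, ancestor_default, ancestor_in_output in reversed(chain[:-1]):
        if ancestor_in_output:
            return ancestor_default != ""
    return False   # no rendered ancestor: xmlns="" would be superfluous
```

<p>When <code>e2</code> is omitted, the nearest rendered ancestor of
<code>e3</code> is <code>e1</code>, whose default namespace is
<code>a:b</code>, so <code>e3</code> is written with
<code>xmlns=""</code>.</p>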

<h3><a id="SortByNSURI" name="SortByNSURI"></a>4.8 Sorting Attributes by Namespace URI</h3>

<p>Given the requirement to preserve the namespace prefixes declared in a document,
sorting attributes with the prefix, rather than the namespace URI, as the
primary key is viable and easier to implement. However, the namespace URI was
selected as the primary key because this is closer to the intent of the
<a href="#namespaces">Namespaces in XML</a> specification, which is to identify
names by namespace URI and local name, not by prefix and local name. The effect of
the sort is to group together all attributes that are in the same namespace.</p>
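<p>A minimal sketch of such a sort is given below, with the namespace URI as
the primary key and the local name as the secondary key. The
<code>Attr</code> type and the sample URIs are illustrative assumptions; an
attribute in no namespace has an empty namespace URI and therefore sorts
first.</p>

```python
from typing import List, NamedTuple


class Attr(NamedTuple):
    prefix: str         # "" for an unprefixed attribute
    namespace_uri: str  # "" for an attribute in no namespace
    local_name: str
    value: str


def sort_attributes(attrs: List[Attr]) -> List[Attr]:
    """Order by namespace URI (primary key), then local name (secondary key)."""
    return sorted(attrs, key=lambda a: (a.namespace_uri, a.local_name))
```

<p>Note that the prefix plays no part in the comparison: an attribute with
prefix <code>b</code> can sort ahead of one with prefix <code>a</code> when
its namespace URI is lexicographically smaller.</p>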

<!-- =============================================================================== -->

<h2><a id="bibliography" name="bibliography"></a>5 References</h2>
<dl>
<dt><a id="C14N-20000119" name="C14N-20000119">C14N-20000119</a></dt>
<dd><i>Canonical XML Version 1.0</i>, W3C Working Draft. T. Bray, J.
Clark, J. Tauber, and J. Cowan. January 19, 2000. <a
href="http://www.w3.org/TR/2000/WD-xml-c14n-20000119.html">
http://www.w3.org/TR/2000/WD-xml-c14n-20000119.html</a>.</dd>
<dt><a id="CharModel" name="CharModel">CharModel</a></dt>
<dd><i>Character Model for the World Wide Web</i>, W3C Working Draft. eds.
Martin J. Dürst, François Yergeau, Misha Wolf, Asmus Freytag and Tex Texin. <a
href="http://www.w3.org/TR/charmod/">
http://www.w3.org/TR/charmod/</a>.</dd>
<dt><a id="CowanExample" name="CowanExample">Cowan</a></dt>
<dd><i>Example of Harmful Effect of Character Model Normalization</i>,
Letter in XML Signature Working Group Mail Archive. John Cowan, July 7,
2000. <a
href="http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2000JulSep/0038.html">
http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2000JulSep/0038.html</a>.</dd>
<dt><a id="Infoset" name="Infoset">Infoset</a></dt>
<dd><i>XML Information Set</i>, W3C Working Draft. eds. John Cowan and Richard Tobin. <a
href="http://www.w3.org/TR/xml-infoset/">
http://www.w3.org/TR/xml-infoset/</a>.</dd>
<dt><a id="ISO-8859-1" name="ISO-8859-1">ISO-8859-1</a></dt>
<dd><i>ISO-8859-1 Latin 1 Character Set</i>. <a
href="http://www.utoronto.ca/webdocs/HTMLdocs/NewHTML/iso_table.html">
http://www.utoronto.ca/webdocs/HTMLdocs/NewHTML/iso_table.html</a> or <a
href="http://www.iso.ch/cate/cat.html">
http://www.iso.ch/cate/cat.html</a>.</dd>
<dt><a id="Keywords" name="Keywords">Keywords</a></dt>
<dd><i>Key words for use in RFCs to Indicate Requirement Levels</i>, IETF
RFC 2119. S. Bradner. March 1997. <a
href="http://www.ietf.org/rfc/rfc2119.txt">
http://www.ietf.org/rfc/rfc2119.txt</a>.</dd>
<dt><a id="namespaces" name="namespaces">Namespaces</a></dt>
<dd><i>Namespaces in XML</i>, W3C Recommendation. eds. Tim Bray, Dave
Hollander, and Andrew Layman. <a
href="http://www.w3.org/TR/REC-xml-names/">
http://www.w3.org/TR/REC-xml-names/</a>.</dd>
<dt><a id="ref-NFC" name="ref-NFC">NFC</a></dt>
<dd><i>TR15, Unicode Normalization Forms.</i> M. Davis, M. Dürst. Revision
18: November 1999. <a
href="http://www.unicode.org/unicode/reports/tr15/tr15-18.html">
http://www.unicode.org/unicode/reports/tr15/tr15-18.html</a>.</dd>
<dt><a id="NFC-Corrigendum" name="NFC-Corrigendum">NFC-Corrigendum</a></dt>
<dd><i>Normalization Corrigendum</i>. The Unicode Consortium.
<a href="http://www.unicode.org/unicode/uni2errata/Normalization_Corrigendum.html">
http://www.unicode.org/unicode/uni2errata/Normalization_Corrigendum.html</a>.</dd>
<dt><a id="Unicode" name="Unicode">Unicode</a></dt>
<dd><i>The Unicode Standard, version 3.0.</i> The Unicode Consortium. ISBN
0-201-61633-5. <a
href="http://www.unicode.org/unicode/standard/versions/Unicode3.0.html">
http://www.unicode.org/unicode/standard/versions/Unicode3.0.html</a>.</dd>
<dt><a id="UTF-16" name="UTF-16">UTF-16</a></dt>
<dd><i>UTF-16, an encoding of ISO 10646</i>, IETF RFC 2781. P. Hoffman,
F. Yergeau. February 2000. <a
href="http://www.ietf.org/rfc/rfc2781.txt">
http://www.ietf.org/rfc/rfc2781.txt</a>.</dd>
<dt><a id="UTF-8" name="UTF-8">UTF-8</a></dt>
<dd><i>UTF-8, a transformation format of ISO 10646</i>, IETF RFC 2279. F.
Yergeau. January 1998. <a href="http://www.ietf.org/rfc/rfc2279.txt">
http://www.ietf.org/rfc/rfc2279.txt</a>.</dd>
<dt><a id="URI" name="URI">URI</a></dt>
<dd><i>Uniform Resource Identifiers (URI): Generic Syntax</i>, IETF RFC
2396. T. Berners-Lee, R. Fielding, L. Masinter. August 1998. <a
href="http://www.ietf.org/rfc/rfc2396.txt">
http://www.ietf.org/rfc/rfc2396.txt</a>.</dd>
<dt><a id="XBase" name="XBase">XBase</a></dt>
<dd><i>XML Base</i>, ed. Jonathan Marsh. 07 June 2000. <a
href="http://www.w3.org/TR/xmlbase/">
http://www.w3.org/TR/xmlbase/</a>.</dd>
<dt><a id="XML" name="XML">XML</a></dt>
<dd><i>Extensible Markup Language (XML) 1.0 (Second Edition)</i>,
W3C Recommendation. eds. Tim Bray, Jean Paoli, C. M. Sperberg-McQueen
and Eve Maler. 6 October 2000. <a href="http://www.w3.org/TR/REC-xml">
http://www.w3.org/TR/REC-xml</a>.</dd>
<dt><a id="XML-DSig" name="XML-DSig">XML DSig</a></dt>
<dd><i>XML-Signature Syntax and Processing</i>, IETF Draft/W3C
Candidate Recommendation. D. Eastlake, J. Reagle, D. Solo, M. Bartel,
J. Boyer, B. Fox, and E. Simon. 31 October 2000.
<a href="http://www.w3.org/TR/xmldsig-core/">http://www.w3.org/TR/xmldsig-core/</a>.</dd>
<dt><a id="PlenaryDecision" name="PlenaryDecision">XML Plenary Decision</a></dt>
<dd><i>W3C XML Plenary Decision on relative URI References In namespace declarations</i>,
W3C Document. 11 September 2000. <a
href="http://lists.w3.org/Archives/Public/xml-uri/2000Sep/0083.html">
http://lists.w3.org/Archives/Public/xml-uri/2000Sep/0083.html</a>.</dd>
<dt><a id="XPath" name="XPath">XPath</a></dt>
<dd><i>XML Path Language (XPath) Version 1.0</i>, W3C Recommendation.
eds. James Clark and Steven DeRose. 16 November 1999. <a
href="http://www.w3.org/TR/1999/REC-xpath-19991116">
http://www.w3.org/TR/1999/REC-xpath-19991116</a>.</dd>
</dl>

<!-- =============================================================================== -->

<h2><a id="acks" name="acks"></a>6 Acknowledgements (Informative)</h2>

<p>The following people provided valuable feedback that improved the quality
of this specification:</p>
<ul>
<li>Doug Bunting, Ariba</li>
<li>John Cowan, Reuters</li>
<li>Martin J. Dürst, W3C</li>
<li>Donald Eastlake 3rd, Motorola</li>
<li>Merlin Hughes, Baltimore</li>
<li>Gregor Karlinger, IAIK TU Graz</li>
<li>Susan Lesch, W3C</li>
<li>Jonathan Marsh, Microsoft</li>
<li>Joseph Reagle, W3C</li>
<li>Petteri Stenius, Done360</li>
<li>Kent TAMURA, IBM</li>
</ul>
</body>
</html>