<!doctype html public '-//W3C//DTD HTML 4.0 Transitional//EN' 'http://www.w3.org/TR/REC-html40-971218/loose.dtd'><HTML><HEAD><meta name='GENERATOR' content='XML/XH/Lark'><link rel='STYLESHEET' type='text/css' href='/StyleSheets/TR/rec.css'><TITLE>Extensible Markup Language (XML) 1.0</TITLE></HEAD><BODY BACKGROUND='/Icons/Backgrounds/recbg.jpg'>
|
|
|
|
<H3 align='right'><A HREF='http://www.w3.org/'><IMG border='0' align='left' alt='W3C' src='http://www.w3.org/Icons/WWW/w3c_home'></A>REC-xml-19980210</H3><br><H1 align='center'>Extensible Markup Language (XML) 1.0</H1>
|
|
|
|
<h3 align='center'>W3C Recommendation
|
|
10-February-1998 </h3>
|
|
|
|
<DL><DT>This version:</DT>
|
|
<dd><A HREF='http://www.w3.org/TR/1998/REC-xml-19980210'>
|
|
http://www.w3.org/TR/1998/REC-xml-19980210</A>
|
|
<dd><A HREF='http://www.w3.org/TR/1998/REC-xml-19980210.xml'>
|
|
http://www.w3.org/TR/1998/REC-xml-19980210.xml</A>
|
|
<dd><A HREF='http://www.w3.org/TR/1998/REC-xml-19980210.html'>
|
|
http://www.w3.org/TR/1998/REC-xml-19980210.html</A>
|
|
<dd><A HREF='http://www.w3.org/TR/1998/REC-xml-19980210.pdf'>
|
|
http://www.w3.org/TR/1998/REC-xml-19980210.pdf</A>
|
|
<dd><A HREF='http://www.w3.org/TR/1998/REC-xml-19980210.ps'>
|
|
http://www.w3.org/TR/1998/REC-xml-19980210.ps</A>
|
|
|
|
<DT>Latest version:</DT>
|
|
<dd><A HREF='http://www.w3.org/TR/REC-xml'>
|
|
http://www.w3.org/TR/REC-xml</A>
|
|
|
|
<DT>Previous version:</DT>
|
|
<dd><A HREF='http://www.w3.org/TR/PR-xml-971208'>
|
|
http://www.w3.org/TR/PR-xml-971208</A>
|
|
|
|
|
|
<dt>Editors:</dt>
|
|
<DD>Tim Bray
|
|
(Textuality and Netscape)
|
|
<A HREF='mailto:tbray@textuality.com'>&lt;tbray@textuality.com&gt;</A></DD>
|
|
<DD>Jean Paoli
|
|
(Microsoft)
|
|
<A HREF='mailto:jeanpa@microsoft.com'>&lt;jeanpa@microsoft.com&gt;</A></DD>
|
|
<DD>C. M. Sperberg-McQueen
|
|
(University of Illinois at Chicago)
|
|
<A HREF='mailto:cmsmcq@uic.edu'>&lt;cmsmcq@uic.edu&gt;</A></DD>
|
|
</dl>
|
|
<H2>Abstract</H2>
|
|
<P>The Extensible Markup Language (XML) is a subset of
|
|
SGML that is completely described in this document. Its goal is to
|
|
enable generic SGML to be served, received, and processed on the Web
|
|
in the way that is now possible with HTML. XML has been designed for
|
|
ease of implementation and for interoperability with both SGML and
|
|
HTML.</P>
|
|
|
|
<h2>Status of this document</h2>
|
|
<P>This document has been reviewed by W3C Members and
|
|
other interested parties and has been endorsed by the
|
|
Director as a W3C Recommendation. It is a stable
|
|
document and may be used as reference material or cited
|
|
as a normative reference from another document. W3C's
|
|
role in making the Recommendation is to draw attention
|
|
to the specification and to promote its widespread
|
|
deployment. This enhances the functionality and
|
|
interoperability of the Web.</P>
|
|
<P>
|
|
This document specifies a syntax created by subsetting an existing,
|
|
widely used international text processing standard (Standard
|
|
Generalized Markup Language, ISO 8879:1986(E) as amended and
|
|
corrected) for use on the World Wide Web. It is a product of the W3C
|
|
XML Activity, details of which can be found at <A HREF='http://www.w3.org/XML'>http://www.w3.org/XML</A>. A list of
|
|
current W3C Recommendations and other technical documents can be found
|
|
at <A HREF='http://www.w3.org/TR'>http://www.w3.org/TR</A>.
|
|
</P>
|
|
<P>This specification uses the term URI, which is defined by <A href='#Berners-Lee'>[Berners-Lee et al.]</A>, a work in progress expected to update <A href='#RFC1738'>[IETF RFC1738]</A> and <A href='#RFC1808'>[IETF RFC1808]</A>.
|
|
</P>
|
|
<P>The list of known errors in this specification is
|
|
available at
|
|
<A HREF='http://www.w3.org/XML/xml-19980210-errata'>http://www.w3.org/XML/xml-19980210-errata</A>.</P>
|
|
<P>Please report errors in this document to
|
|
<A HREF='mailto:xml-editor@w3.org'>xml-editor@w3.org</A>.
|
|
</P>
|
|
<h2>Table of Contents</h2>1. <A HREF='#sec-intro'>Introduction</A><BR>
|
|
1.1 <A HREF='#sec-origin-goals'>Origin and Goals</A><BR>
|
|
1.2 <A HREF='#sec-terminology'>Terminology</A><BR>
|
|
2. <A HREF='#sec-documents'>Documents</A><BR>
|
|
2.1 <A HREF='#sec-well-formed'>Well-Formed XML Documents</A><BR>
|
|
2.2 <A HREF='#charsets'>Characters</A><BR>
|
|
2.3 <A HREF='#sec-common-syn'>Common Syntactic Constructs</A><BR>
|
|
2.4 <A HREF='#syntax'>Character Data and Markup</A><BR>
|
|
2.5 <A HREF='#sec-comments'>Comments</A><BR>
|
|
2.6 <A HREF='#sec-pi'>Processing Instructions</A><BR>
|
|
2.7 <A HREF='#sec-cdata-sect'>CDATA Sections</A><BR>
|
|
2.8 <A HREF='#sec-prolog-dtd'>Prolog and Document Type Declaration</A><BR>
|
|
2.9 <A HREF='#sec-rmd'>Standalone Document Declaration</A><BR>
|
|
2.10 <A HREF='#sec-white-space'>White Space Handling</A><BR>
|
|
2.11 <A HREF='#sec-line-ends'>End-of-Line Handling</A><BR>
|
|
2.12 <A HREF='#sec-lang-tag'>Language Identification</A><BR>
|
|
3. <A HREF='#sec-logical-struct'>Logical Structures</A><BR>
|
|
3.1 <A HREF='#sec-starttags'>Start-Tags, End-Tags, and Empty-Element Tags</A><BR>
|
|
3.2 <A HREF='#elemdecls'>Element Type Declarations</A><BR>
|
|
3.2.1 <A HREF='#sec-element-content'>Element Content</A><BR>
|
|
3.2.2 <A HREF='#sec-mixed-content'>Mixed Content</A><BR>
|
|
3.3 <A HREF='#attdecls'>Attribute-List Declarations</A><BR>
|
|
3.3.1 <A HREF='#sec-attribute-types'>Attribute Types</A><BR>
|
|
3.3.2 <A HREF='#sec-attr-defaults'>Attribute Defaults</A><BR>
|
|
3.3.3 <A HREF='#AVNormalize'>Attribute-Value Normalization</A><BR>
|
|
3.4 <A HREF='#sec-condition-sect'>Conditional Sections</A><BR>
|
|
4. <A HREF='#sec-physical-struct'>Physical Structures</A><BR>
|
|
4.1 <A HREF='#sec-references'>Character and Entity References</A><BR>
|
|
4.2 <A HREF='#sec-entity-decl'>Entity Declarations</A><BR>
|
|
4.2.1 <A HREF='#sec-internal-ent'>Internal Entities</A><BR>
|
|
4.2.2 <A HREF='#sec-external-ent'>External Entities</A><BR>
|
|
4.3 <A HREF='#TextEntities'>Parsed Entities</A><BR>
|
|
4.3.1 <A HREF='#sec-TextDecl'>The Text Declaration</A><BR>
|
|
4.3.2 <A HREF='#wf-entities'>Well-Formed Parsed Entities</A><BR>
|
|
4.3.3 <A HREF='#charencoding'>Character Encoding in Entities</A><BR>
|
|
4.4 <A HREF='#entproc'>XML Processor Treatment of Entities and References</A><BR>
|
|
4.4.1 <A HREF='#not-recognized'>Not Recognized</A><BR>
|
|
4.4.2 <A HREF='#included'>Included</A><BR>
|
|
4.4.3 <A HREF='#include-if-valid'>Included If Validating</A><BR>
|
|
4.4.4 <A HREF='#forbidden'>Forbidden</A><BR>
|
|
4.4.5 <A HREF='#inliteral'>Included in Literal</A><BR>
|
|
4.4.6 <A HREF='#notify'>Notify</A><BR>
|
|
4.4.7 <A HREF='#bypass'>Bypassed</A><BR>
|
|
4.4.8 <A HREF='#as-PE'>Included as PE</A><BR>
|
|
4.5 <A HREF='#intern-replacement'>Construction of Internal Entity Replacement Text</A><BR>
|
|
4.6 <A HREF='#sec-predefined-ent'>Predefined Entities</A><BR>
|
|
4.7 <A HREF='#Notations'>Notation Declarations</A><BR>
|
|
4.8 <A HREF='#sec-doc-entity'>Document Entity</A><BR>
|
|
5. <A HREF='#sec-conformance'>Conformance</A><BR>
|
|
5.1 <A HREF='#proc-types'>Validating and Non-Validating Processors</A><BR>
|
|
5.2 <A HREF='#safe-behavior'>Using XML Processors</A><BR>
|
|
6. <A HREF='#sec-notation'>Notation</A><BR>
|
|
<h3>Appendices</h3>A. <A HREF='#sec-bibliography'>References</A><BR>
|
|
A.1 <A HREF='#sec-existing-stds'>Normative References</A><BR>
|
|
A.2 <A HREF='#null'>Other References</A><BR>
|
|
B. <A HREF='#CharClasses'>Character Classes</A><BR>
|
|
C. <A HREF='#sec-xml-and-sgml'>XML and SGML (Non-Normative)</A><BR>
|
|
D. <A HREF='#sec-entexpand'>Expansion of Entity and Character References (Non-Normative)</A><BR>
|
|
E. <A HREF='#determinism'>Deterministic Content Models (Non-Normative)</A><BR>
|
|
F. <A HREF='#sec-guessing'>Autodetection of Character Encodings (Non-Normative)</A><BR>
|
|
G. <A HREF='#sec-xml-wg'>W3C XML Working Group (Non-Normative)</A><BR>
|
|
|
|
<HR>
|
|
|
|
|
|
|
|
<H2><A NAME='sec-intro'>1. Introduction</a></h2>
|
|
<P>Extensible Markup Language, abbreviated XML, describes a class of
|
|
data objects called <A href='#dt-xml-doc'>XML documents</A> and
|
|
partially describes the behavior of
|
|
computer programs which process them. XML is an application profile or
|
|
restricted form of SGML, the Standard Generalized Markup
|
|
Language <A href='#ISO8879'>[ISO 8879]</A>.
|
|
By construction, XML documents
|
|
are conforming SGML documents.
|
|
</P>
|
|
<P>XML documents are made up of storage units called <A href='#dt-entity'>entities</A>, which contain either parsed
|
|
or unparsed data.
|
|
Parsed data is made up of <A href='#dt-character'>characters</A>,
|
|
some
|
|
of which form <A href='#dt-chardata'>character data</A>,
|
|
and some of which form <A href='#dt-markup'>markup</A>.
|
|
Markup encodes a description of the document's storage layout and
|
|
logical structure. XML provides a mechanism to impose constraints on
|
|
the storage layout and logical structure.</P>
|
|
<P><a name='dt-xml-proc'></a>A software module
|
|
called an <b>XML processor</b> is used to read XML documents
|
|
and provide access to their content and structure. <a name='dt-app'></a>It is assumed that an XML processor is
|
|
doing its work on behalf of another module, called the
|
|
<b>application</b>. This specification describes the
|
|
required behavior of an XML processor in terms of how it must read XML
|
|
data and the information it must provide to the application.</P>
|
|
|
|
|
|
|
|
<H3><A NAME='sec-origin-goals'>1.1 Origin and Goals</a></h3>
|
|
<P>XML was developed by an XML Working Group (originally known as the
|
|
SGML Editorial Review Board) formed under the auspices of the World
|
|
Wide Web Consortium (W3C) in 1996.
|
|
It was chaired by Jon Bosak of Sun
|
|
Microsystems with the active participation of an XML Special
|
|
Interest Group (previously known as the SGML Working Group) also
|
|
organized by the W3C. The membership of the XML Working Group is given
|
|
in an appendix. Dan Connolly served as the WG's contact with the W3C.
|
|
</P>
|
|
<P>The design goals for XML are:<OL>
|
|
<LI>XML shall be straightforwardly usable over the
|
|
Internet.</LI>
|
|
<LI>XML shall support a wide variety of applications.</LI>
|
|
<LI>XML shall be compatible with SGML.</LI>
|
|
<LI>It shall be easy to write programs which process XML
|
|
documents.</LI>
|
|
<LI>The number of optional features in XML is to be kept to the
|
|
absolute minimum, ideally zero.</LI>
|
|
<LI>XML documents should be human-legible and reasonably
|
|
clear.</LI>
|
|
<LI>The XML design should be prepared quickly.</LI>
|
|
<LI>The design of XML shall be formal and concise.</LI>
|
|
<LI>XML documents shall be easy to create.</LI>
|
|
<LI>Terseness in XML markup is of minimal importance.</LI></OL>
|
|
|
|
|
|
<P>This specification,
|
|
together with associated standards
|
|
(Unicode and ISO/IEC 10646 for characters,
|
|
Internet RFC 1766 for language identification tags,
|
|
ISO 639 for language name codes, and
|
|
ISO 3166 for country name codes),
|
|
provides all the information necessary to understand
|
|
XML Version 1.0
|
|
and construct computer programs to process it.</P>
|
|
<P>This version of the XML specification
|
|
|
|
may be distributed freely, as long as
|
|
all text and legal notices remain intact.</P>
|
|
<H3><A NAME='sec-terminology'>1.2 Terminology</a></h3>
|
|
|
|
<P>The terminology used to describe XML documents is defined in the body of
|
|
this specification.
|
|
The terms defined in the following list are used in building those
|
|
definitions and in describing the actions of an XML processor:
|
|
<DL>
|
|
|
|
<DT><B>may</B></DT>
|
|
<DD><a name='dt-may'></a>Conforming documents and XML
|
|
processors are permitted to but need not behave as
|
|
described.</DD>
|
|
|
|
|
|
|
|
<DT><B>must</B></DT>
|
|
<DD>Conforming documents and XML processors
|
|
are required to behave as described; otherwise they are in error.
|
|
|
|
</DD>
|
|
|
|
|
|
|
|
<DT><B>error</B></DT>
|
|
<DD><a name='dt-error'></a>A violation of the rules of this
|
|
specification; results are
|
|
undefined. Conforming software may detect and report an error and may
|
|
recover from it.</DD>
|
|
|
|
|
|
|
|
<DT><B>fatal error</B></DT>
|
|
<DD><a name='dt-fatal'></a>An error
|
|
which a conforming <A href='#dt-xml-proc'>XML processor</A>
|
|
must detect and report to the application.
|
|
After encountering a fatal error, the
|
|
processor may continue
|
|
processing the data to search for further errors and may report such
|
|
errors to the application. In order to support correction of errors,
|
|
the processor may make unprocessed data from the document (with
|
|
intermingled character data and markup) available to the application.
|
|
Once a fatal error is detected, however, the processor must not
|
|
continue normal processing (i.e., it must not
|
|
continue to pass character data and information about the document's
|
|
logical structure to the application in the normal way).
|
|
</DD>
|
|
|
|
|
|
|
|
<DT><B>at user option</B></DT>
|
|
<DD>Conforming software may or must (depending on the modal verb in the
|
|
sentence) behave as described; if it does, it must
|
|
provide users a means to enable or disable the behavior
|
|
described.</DD>
|
|
|
|
|
|
|
|
<DT><B>validity constraint</B></DT>
|
|
<DD>A rule which applies to all
|
|
<A href='#dt-valid'>valid</A> XML documents.
|
|
Violations of validity constraints are errors; they must, at user option,
|
|
be reported by
|
|
<A href='#dt-validating'>validating XML processors</A>.</DD>
|
|
|
|
|
|
|
|
<DT><B>well-formedness constraint</B></DT>
|
|
<DD>A rule which applies to all <A href='#dt-wellformed'>well-formed</A> XML documents.
|
|
Violations of well-formedness constraints are
|
|
<A href='#dt-fatal'>fatal errors</A>.</DD>
|
|
|
|
|
|
|
|
|
|
<DT><B>match</B></DT>
|
|
<DD><a name='dt-match'></a>(Of strings or names:)
|
|
Two strings or names being compared must be identical.
|
|
Characters with multiple possible representations in ISO/IEC 10646 (e.g.
|
|
characters with
|
|
both precomposed and base+diacritic forms) match only if they have the
|
|
same representation in both strings.
|
|
At user option, processors may normalize such characters to
|
|
some canonical form.
|
|
No case folding is performed.
|
|
(Of strings and rules in the grammar:)
|
|
A string matches a grammatical production if it belongs to the
|
|
language generated by that production.
|
|
(Of content and content models:)
|
|
An element matches its declaration when it conforms
|
|
in the fashion described in the constraint
|
|
"<A href='#elementvalid'>Element Valid</A>".
|
|
|
|
</DD>
|
|
|
|
|
|
|
|
<DT><B>for compatibility</B></DT>
|
|
<DD><a name='dt-compat'></a>A feature of
|
|
XML included solely to ensure that XML remains compatible with SGML.
|
|
</DD>
|
|
|
|
|
|
|
|
<DT><B>for interoperability</B></DT>
|
|
<DD><a name='dt-interop'></a>A
|
|
non-binding recommendation included to increase the chances that XML
|
|
documents can be processed by the existing installed base of SGML
|
|
processors which predate the
|
|
WebSGML Adaptations Annex to ISO 8879.</DD>
|
|
|
|
|
|
</DL>
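<P>The string-matching rule given under <b>match</b> can be illustrated with a short Python sketch. The function name <code>names_match</code> and its <code>normalize</code> flag are hypothetical, and the choice of Unicode NFC as the "canonical form" is an assumption: this specification does not name a particular normalization form.</P>

```python
import unicodedata


def names_match(a: str, b: str, normalize: bool = False) -> bool:
    """Compare two strings or names as described under "match".

    By default the comparison is an exact code-point comparison with no
    case folding. With normalize=True, both strings are first brought to
    NFC (an assumed canonical form; the specification leaves it open).
    """
    if normalize:
        a = unicodedata.normalize("NFC", a)
        b = unicodedata.normalize("NFC", b)
    return a == b


precomposed = "caf\u00e9"   # e-acute as a single precomposed character
decomposed = "cafe\u0301"   # "e" followed by a combining acute accent

print(names_match(precomposed, decomposed))                   # different representations
print(names_match(precomposed, decomposed, normalize=True))   # same after NFC
print(names_match("Title", "title"))                          # no case folding
```

<P>Normalization here is opt-in, mirroring the "at user option" wording; case differences are never folded in either mode.</P>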
|
<H2><A NAME='sec-documents'>2. Documents</a></h2>
|
|
|
|
<P><a name='dt-xml-doc'></a>
|
|
A data object is an
|
|
<b>XML document</b> if it is
|
|
<A href='#dt-wellformed'>well-formed</A>, as
|
|
defined in this specification.
|
|
A well-formed XML document may in addition be
|
|
<A href='#dt-valid'>valid</A> if it meets certain further
|
|
constraints.</P>
|
|
|
|
<P>Each XML document has both a logical and a physical structure.
|
|
Physically, the document is composed of units called <A href='#dt-entity'>entities</A>. An entity may <A href='#dt-entref'>refer</A> to other entities to cause their
|
|
inclusion in the document. A document begins in a "root" or <A href='#dt-docent'>document entity</A>.
|
|
Logically, the document is composed of declarations, elements,
|
|
comments,
|
|
character references, and
|
|
processing
|
|
instructions, all of which are indicated in the document by explicit
|
|
markup.
|
|
The logical and physical structures must nest properly, as described
|
|
in "<A href='#wf-entities'>4.3.2 Well-Formed Parsed Entities</A>".
|
|
</P>
|
|
|
|
|
|
|
|
<H3><A NAME='sec-well-formed'>2.1 Well-Formed XML Documents</a></h3>
|
|
|
|
<P><a name='dt-wellformed'></a>
|
|
A textual object is
|
|
a well-formed XML document if:
|
|
<OL>
|
|
<LI>Taken as a whole, it
|
|
matches the production labeled <code><a href='#NT-document'>document</A></code>.</LI>
|
|
<LI>It
|
|
meets all the well-formedness constraints given in this specification.
|
|
</LI>
|
|
<LI>Each of the <A href='#dt-parsedent'>parsed entities</A>
|
|
which is referenced directly or indirectly within the document is
|
|
<A href='#wf-entities'>well-formed</A>.</LI>
|
|
</OL>
|
|
|
|
<P>
|
|
</p>
|
|
<table cellpadding='5' border='1' bgcolor='#f5dcb3' width='100%'><tr align='left'><td><strong>
|
|
Document</strong></td></tr><tr><td>
|
|
<table border='0' bgcolor='#f5dcb3'>
|
|
<tr valign='top'><td align='right'><a name='NT-document'></a>[1] </td><td align='right'><font><code>document</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code><a href='#NT-prolog'>prolog</A>
|
|
<a href='#NT-element'>element</A>
|
|
<a href='#NT-Misc'>Misc</A>*</code></font></td></tr>
|
|
</table>
|
|
</td></tr></table>
|
|
<p>
|
|
</P>
|
|
<P>Matching the <code><a href='#NT-document'>document</A></code> production
|
|
implies that:
|
|
<OL>
|
|
<LI>It contains one or more
|
|
<A href='#dt-element'>elements</A>.
|
|
</LI>
|
|
|
|
<LI><a name='dt-root'></a>There is exactly
|
|
one element, called the <b>root</b>, or document element, no
|
|
part of which appears in the <A href='#dt-content'>content</A> of any other element.
|
|
For all other elements, if the start-tag is in the content of another
|
|
element, the end-tag is in the content of the same element. More
|
|
simply stated, the elements, delimited by start- and end-tags, nest
|
|
properly within each other.
|
|
</LI>
|
|
</OL>
|
|
|
|
|
|
<P><a name='dt-parentchild'></a>As a consequence
|
|
of this,
|
|
for each non-root element
|
|
<CODE>C</CODE> in the document, there is one other element <CODE>P</CODE>
|
|
in the document such that
|
|
<CODE>C</CODE> is in the content of <CODE>P</CODE>, but is not in
|
|
the content of any other element that is in the content of
|
|
<CODE>P</CODE>.
|
|
<CODE>P</CODE> is referred to as the
|
|
<b>parent</b> of <CODE>C</CODE>, and <CODE>C</CODE> as a
|
|
<b>child</b> of <CODE>P</CODE>.</P>
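<P>As an informal illustration, the nesting rules above can be exercised with an existing non-validating parser. The sketch below uses Python's <code>xml.etree.ElementTree</code>, which reports a document that is not well-formed by raising <code>ParseError</code>. The helper names <code>is_well_formed</code> and <code>parent_of_each_child</code> are hypothetical, and the second helper assumes that element type names are unique within the document so they can serve as dictionary keys.</P>

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the text as a well-formed document."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


def parent_of_each_child(document: str) -> dict:
    """Map each non-root element name to the name of its parent.

    Assumes element names are unique in the document (illustration only).
    """
    root = ET.fromstring(document)
    return {child.tag: parent.tag for parent in root.iter() for child in parent}


print(is_well_formed("<a><b/></a>"))        # properly nested
print(is_well_formed("<a><b></a></b>"))     # start- and end-tags overlap
print(is_well_formed("<a/><b/>"))           # more than one root element
print(parent_of_each_child("<p><c><g/></c></p>"))
```

<P>In the last call, <code>c</code> is a child of <code>p</code> and <code>g</code> is a child of <code>c</code>; <code>g</code> is in the content of <code>p</code> but is not its child, because it is also in the content of <code>c</code>.</P>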
|
|
|
|
|
|
|
|
<H3><A NAME='charsets'>2.2 Characters</a></h3>
|
|
|
|
<P><a name='dt-text'></a>A parsed entity contains
|
|
<b>text</b>, a sequence of
|
|
<A href='#dt-character'>characters</A>,
|
|
which may represent markup or character data.
|
|
<a name='dt-character'></a>A <b>character</b>
|
|
is an atomic unit of text as specified by
|
|
ISO/IEC 10646 <A href='#ISO10646'>[ISO/IEC 10646]</A>.
|
|
Legal characters are tab, carriage return, line feed, and the legal
|
|
graphic characters of Unicode and ISO/IEC 10646.
|
|
The use of "compatibility characters", as defined in section 6.8
|
|
of <A href='#Unicode'>[Unicode]</A>, is discouraged.
|
|
|
|
</p>
|
|
<table cellpadding='5' border='1' bgcolor='#f5dcb3' width='100%'><tr align='left'><td><strong>
|
|
Character Range</strong></td></tr><tr><td>
|
|
<table border='0' bgcolor='#f5dcb3'>
|
|
|
|
<tr valign='top'><td align='right'><a name='NT-Char'></a>[2] </td><td align='right'><font><code>Char</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
|
|
| [#x10000-#x10FFFF]</code></font></td>
|
|
<td align='center'><font><code> /* </code></font></td><td align='left'><font><code>any Unicode character, excluding the
|
|
surrogate blocks, FFFE, and FFFF. */</code></font></td> </tr>
|
|
|
|
</table>
|
|
</td></tr></table>
|
|
<p>
|
|
</P>
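<P>As a rough illustration of production [2], the range test can be sketched in Python. The function names <code>is_xml_char</code> and <code>first_illegal_char</code> are hypothetical and not part of this specification; the sketch assumes the text has already been decoded into Unicode code points.</P>

```python
def is_xml_char(code_point: int) -> bool:
    """Return True if code_point is allowed by the Char production [2]."""
    return (
        code_point in (0x9, 0xA, 0xD)            # tab, line feed, carriage return
        or 0x20 <= code_point <= 0xD7FF          # up to the surrogate blocks
        or 0xE000 <= code_point <= 0xFFFD        # excludes FFFE and FFFF
        or 0x10000 <= code_point <= 0x10FFFF     # supplementary planes
    )


def first_illegal_char(text: str) -> int:
    """Return the index of the first character outside Char, or -1 if none."""
    for index, ch in enumerate(text):
        if not is_xml_char(ord(ch)):
            return index
    return -1


print(is_xml_char(0x0))                  # NUL is not a legal character
print(first_illegal_char("ab\x00c"))     # position of the NUL
```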
|
|
|
|
<P>The mechanism for encoding character code points into bit patterns may
|
|
vary from entity to entity. All XML processors must accept the UTF-8
|
|
and UTF-16 encodings of 10646; the mechanisms for signaling which of
|
|
the two is in use, or for bringing other encodings into play, are
|
|
discussed later, in "<A href='#charencoding'>4.3.3 Character Encoding in Entities</A>".
|
|
</P>
|
|
<H3><A NAME='sec-common-syn'>2.3 Common Syntactic Constructs</a></h3>
|
|
|
|
<P>This section defines some symbols used widely in the grammar.</P>
|
|
<P><code><a href='#NT-S'>S</A></code> (white space) consists of one or more space (#x20)
|
|
characters, carriage returns, line feeds, or tabs.
|
|
|
|
</p>
|
|
<table cellpadding='5' border='1' bgcolor='#f5dcb3' width='100%'><tr align='left'><td><strong>
|
|
White Space</strong></td></tr><tr><td>
|
|
<table border='0' bgcolor='#f5dcb3'>
|
|
|
|
<tr valign='top'><td align='right'><a name='NT-S'></a>[3] </td><td align='right'><font><code>S</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>(#x20 | #x9 | #xD | #xA)+</code></font></td>
|
|
</tr>
|
|
|
|
</table>
|
|
</td></tr></table>
|
|
<p></P>
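<P>Production [3] is narrower than the notion of white space found in many programming languages. A minimal Python sketch, using the hypothetical names <code>XML_WHITE_SPACE</code> and <code>is_xml_white_space</code>, makes the difference visible:</P>

```python
import re

# Production [3]: S ::= (#x20 | #x9 | #xD | #xA)+
XML_WHITE_SPACE = re.compile(r"[\x20\x09\x0D\x0A]+")


def is_xml_white_space(text: str) -> bool:
    """Return True if text consists of one or more S characters only."""
    return XML_WHITE_SPACE.fullmatch(text) is not None


print(is_xml_white_space(" \t\r\n"))   # all four S characters
print(is_xml_white_space("\u00a0"))    # no-break space is not S
print("\u00a0".isspace())              # but Python's str.isspace accepts it
```

<P>General-purpose tests such as <code>str.isspace</code> therefore should not be substituted for <code>S</code> when checking XML syntax.</P>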
|
|
<P>Characters are classified for convenience as letters, digits, or other
|
|
characters. Letters consist of an alphabetic or syllabic
|
|
base character possibly
|
|
followed by one or more combining characters, or of an ideographic
|
|
character.
|
|
Full definitions of the specific characters in each class
|
|
are given in "<A href='#CharClasses'>B. Character Classes</A>".</P>
|
|
<P><a name='dt-name'></a>A <b>Name</b> is a token
|
|
beginning with a letter or one of a few punctuation characters, and continuing
|
|
with letters, digits, hyphens, underscores, colons, or full stops, together
|
|
known as name characters.
|
|
Names beginning with the string "<CODE>xml</CODE>", or any string
|
|
which would match <CODE>(('X'|'x') ('M'|'m') ('L'|'l'))</CODE>, are
|
|
reserved for standardization in this or future versions of this
|
|
specification.
|
|
</P>
|
|
|
|
<P><b>Note:</b> The colon character within XML names is reserved for experimentation with
|
|
name spaces.
|
|
Its meaning is expected to be
|
|
standardized at some future point, at which point those documents
|
|
using the colon for experimental purposes may need to be updated.
|
|
(There is no guarantee that any name-space mechanism
|
|
adopted for XML will in fact use the colon as a name-space delimiter.)
|
|
In practice, this means that authors should not use the colon in XML
|
|
names except as part of name-space experiments, but that XML processors
|
|
should accept the colon as a name character.</P>
|
|
|
|
<P>An
|
|
<code><a href='#NT-Nmtoken'>Nmtoken</A></code> (name token) is any mixture of
|
|
name characters.
|
|
</p>
|
|
<table cellpadding='5' border='1' bgcolor='#f5dcb3' width='100%'><tr align='left'><td><strong>
|
|
Names and Tokens</strong></td></tr><tr><td>
|
|
<table border='0' bgcolor='#f5dcb3'>
|
|
<tr valign='top'><td align='right'><a name='NT-NameChar'></a>[4] </td><td align='right'><font><code>NameChar</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code><a href='#NT-Letter'>Letter</A>
|
|
| <a href='#NT-Digit'>Digit</A>
|
|
| '.' | '-' | '_' | ':'
|
|
| <a href='#NT-CombiningChar'>CombiningChar</A>
|
|
| <a href='#NT-Extender'>Extender</A></code></font></td>
|
|
</tr>
|
|
<tr valign='top'><td align='right'><a name='NT-Name'></a>[5] </td><td align='right'><font><code>Name</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>(<a href='#NT-Letter'>Letter</A> | '_' | ':')
|
|
(<a href='#NT-NameChar'>NameChar</A>)*</code></font></td></tr>
|
|
<tr valign='top'><td align='right'><a name='NT-Names'></a>[6] </td><td align='right'><font><code>Names</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code><a href='#NT-Name'>Name</A>
|
|
(<a href='#NT-S'>S</A> <a href='#NT-Name'>Name</A>)*</code></font></td></tr>
|
|
<tr valign='top'><td align='right'><a name='NT-Nmtoken'></a>[7] </td><td align='right'><font><code>Nmtoken</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>(<a href='#NT-NameChar'>NameChar</A>)+</code></font></td></tr>
|
|
<tr valign='top'><td align='right'><a name='NT-Nmtokens'></a>[8] </td><td align='right'><font><code>Nmtokens</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code><a href='#NT-Nmtoken'>Nmtoken</A> (<a href='#NT-S'>S</A> <a href='#NT-Nmtoken'>Nmtoken</A>)*</code></font></td></tr>
|
|
</table>
|
|
</td></tr></table>
|
|
<p>
|
|
</P>
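<P>The difference between <code>Name</code> and <code>Nmtoken</code> lies only in the first character. The Python sketch below shows this for an ASCII-only approximation: it substitutes <code>[A-Za-z]</code> and <code>[0-9]</code> for the <code>Letter</code> and <code>Digit</code> classes and omits <code>CombiningChar</code> and <code>Extender</code> entirely, which is an assumption made for brevity. All identifiers in the sketch are hypothetical.</P>

```python
import re

# ASCII-only stand-ins for the character classes of Appendix B (assumption:
# the real Letter, Digit, CombiningChar and Extender classes are much larger).
_LETTER = "A-Za-z"
_NAME_CHAR = _LETTER + r"0-9.\-_:"

NAME = re.compile(rf"[{_LETTER}_:][{_NAME_CHAR}]*")   # production [5]
NMTOKEN = re.compile(rf"[{_NAME_CHAR}]+")             # production [7]


def is_name(token: str) -> bool:
    return NAME.fullmatch(token) is not None


def is_nmtoken(token: str) -> bool:
    return NMTOKEN.fullmatch(token) is not None


def is_reserved_name(token: str) -> bool:
    """True for names starting with "xml" in any mixture of case."""
    return token[:3].lower() == "xml"


print(is_name("1st"), is_nmtoken("1st"))   # a digit may not start a Name
print(is_name("_a.b-c"))                   # underscore is a legal first character
print(is_reserved_name("XmL-stylesheet"))  # reserved prefix, any case
```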
|
|
<P>Literal data is any quoted string not containing
|
|
the quotation mark used as a delimiter for that string.
|
|
Literals are used
|
|
for specifying the content of internal entities
|
|
(<code><a href='#NT-EntityValue'>EntityValue</A></code>),
|
|
the values of attributes (<code><a href='#NT-AttValue'>AttValue</A></code>),
|
|
and external identifiers
|
|
(<code><a href='#NT-SystemLiteral'>SystemLiteral</A></code>).
|
|
Note that a <code><a href='#NT-SystemLiteral'>SystemLiteral</A></code>
|
|
can be parsed without scanning for markup.
|
|
</p>
|
|
<table cellpadding='5' border='1' bgcolor='#f5dcb3' width='100%'><tr align='left'><td><strong>
|
|
Literals</strong></td></tr><tr><td>
|
|
<table border='0' bgcolor='#f5dcb3'>
|
|
<tr valign='top'><td align='right'><a name='NT-EntityValue'></a>[9] </td><td align='right'><font><code>EntityValue</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>'"'
|
|
([^%&amp;"]
|
|
| <a href='#NT-PEReference'>PEReference</A>
|
|
| <a href='#NT-Reference'>Reference</A>)*
|
|
'"'
|
|
</code></font></td>
|
|
</tr><tr valign='top'><td align='right'></td><td></td><td></td><td align='left'><font><code>|
|
|
"'"
|
|
([^%&amp;']
|
|
| <a href='#NT-PEReference'>PEReference</A>
|
|
| <a href='#NT-Reference'>Reference</A>)*
|
|
"'"</code></font></td>
|
|
</tr>
|
|
<tr valign='top'><td align='right'><a name='NT-AttValue'></a>[10] </td><td align='right'><font><code>AttValue</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>'"'
|
|
([^&lt;&amp;"]
|
|
| <a href='#NT-Reference'>Reference</A>)*
|
|
'"'
|
|
</code></font></td>
|
|
</tr><tr valign='top'><td align='right'></td><td></td><td></td><td align='left'><font><code>|
|
|
"'"
|
|
([^&lt;&amp;']
|
|
| <a href='#NT-Reference'>Reference</A>)*
|
|
"'"</code></font></td>
|
|
</tr>
|
|
<tr valign='top'><td align='right'><a name='NT-SystemLiteral'></a>[11] </td><td align='right'><font><code>SystemLiteral</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>('"' [^"]* '"') | ("'" [^']* "'")
|
|
</code></font></td>
|
|
</tr>
|
|
<tr valign='top'><td align='right'><a name='NT-PubidLiteral'></a>[12] </td><td align='right'><font><code>PubidLiteral</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>'"' <a href='#NT-PubidChar'>PubidChar</A>*
|
|
'"'
|
|
| "'" (<a href='#NT-PubidChar'>PubidChar</A> - "'")* "'"</code></font></td>
|
|
</tr>
|
|
<tr valign='top'><td align='right'><a name='NT-PubidChar'></a>[13] </td><td align='right'><font><code>PubidChar</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>#x20 | #xD | #xA
|
|
| [a-zA-Z0-9]
|
|
| [-'()+,./:=?;!*#@$_%]</code></font></td>
|
|
</tr>
|
|
</table>
|
|
</td></tr></table>
|
|
<p>
|
|
</P>
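<P>Because <code>SystemLiteral</code> and <code>PubidLiteral</code> contain no references, productions [11] to [13] translate almost directly into regular expressions. The following Python sketch is illustrative only; the names <code>SYSTEM_LITERAL</code> and <code>PUBID_LITERAL</code> are hypothetical, and the patterns do not additionally restrict characters to the <code>Char</code> production.</P>

```python
import re

# Production [11]: a quoted string that does not contain its own delimiter.
SYSTEM_LITERAL = re.compile(r"""("[^"]*")|('[^']*')""")

# Production [13]: the characters permitted in a public identifier.
_PUBID_CHARS = r"\x20\x0D\x0Aa-zA-Z0-9\-'()+,./:=?;!*#@$_%"
# Production [12]: the single-quoted form excludes the apostrophe itself.
_PUBID_CHARS_NO_APOS = _PUBID_CHARS.replace("'", "")
PUBID_LITERAL = re.compile(
    f'"[{_PUBID_CHARS}]*"' + "|" + f"'[{_PUBID_CHARS_NO_APOS}]*'"
)

# Markup characters need no special treatment inside a SystemLiteral.
print(bool(SYSTEM_LITERAL.fullmatch('"a<b&c.dtd"')))
# A public identifier is limited to the PubidChar repertoire.
print(bool(PUBID_LITERAL.fullmatch('"-//W3C//DTD HTML 4.0 Transitional//EN"')))
print(bool(PUBID_LITERAL.fullmatch('"a<b"')))
```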
|
|
<H3><A NAME='syntax'>2.4 Character Data and Markup</a></h3>
|
|
|
|
<P><A href='#dt-text'>Text</A> consists of intermingled
|
|
<A href='#dt-chardata'>character
|
|
data</A> and markup.
|
|
<a name='dt-markup'></a><b>Markup</b> takes the form of
|
|
<A href='#dt-stag'>start-tags</A>,
|
|
<A href='#dt-etag'>end-tags</A>,
|
|
<A href='#dt-empty'>empty-element tags</A>,
|
|
<A href='#dt-entref'>entity references</A>,
|
|
<A href='#dt-charref'>character references</A>,
|
|
<A href='#dt-comment'>comments</A>,
|
|
<A href='#dt-cdsection'>CDATA section</A> delimiters,
|
|
<A href='#dt-doctype'>document type declarations</A>, and
|
|
<A href='#dt-pi'>processing instructions</A>.
|
|
|
|
</P>
|
|
<P><a name='dt-chardata'></a>All text that is not markup
|
|
constitutes the <b>character data</b> of
|
|
the document.</P>
|
|
<P>The ampersand character (&amp;) and the left angle bracket (&lt;)
|
|
may appear in their literal form <EM>only</EM> when used as markup
|
|
delimiters, or within a <A href='#dt-comment'>comment</A>, a
|
|
<A href='#dt-pi'>processing instruction</A>,
|
|
or a <A href='#dt-cdsection'>CDATA section</A>.
|
|
|
|
They are also legal within the <A href='#dt-litentval'>literal entity
|
|
value</A> of an internal entity declaration; see
|
|
"<A href='#wf-entities'>4.3.2 Well-Formed Parsed Entities</A>".
|
|
|
|
If they are needed elsewhere,
|
|
they must be <A href='#dt-escape'>escaped</A>
|
|
using either <A href='#dt-charref'>numeric character references</A>
|
|
or the strings
"<CODE>&amp;amp;</CODE>" and "<CODE>&amp;lt;</CODE>" respectively.
The right angle
bracket (&gt;) may be represented using the string
"<CODE>&amp;gt;</CODE>", and must, <A href='#dt-compat'>for
compatibility</A>,
be escaped using
"<CODE>&amp;gt;</CODE>" or a character reference
when it appears in the string
"<CODE>]]&gt;</CODE>"
|
|
in content,
|
|
when that string is not marking the end of
|
|
a <A href='#dt-cdsection'>CDATA section</A>.
|
|
</P>
|
|
<P>
|
|
In the content of elements, character data
|
|
is any string of characters which does
|
|
not contain the start-delimiter of any markup.
|
|
In a CDATA section, character data
|
|
is any string of characters not including the CDATA-section-close
|
|
delimiter, "<CODE>]]&gt;</CODE>".</P>
|
|
<P>
|
|
To allow attribute values to contain both single and double quotes, the
|
|
apostrophe or single-quote character (') may be represented as
"<CODE>&amp;apos;</CODE>", and the double-quote character (") as
"<CODE>&amp;quot;</CODE>".
|
|
</p>
|
|
<table cellpadding='5' border='1' bgcolor='#f5dcb3' width='100%'><tr align='left'><td><strong>
|
|
Character Data</strong></td></tr><tr><td>
|
|
<table border='0' bgcolor='#f5dcb3'>
|
|
<tr valign='top'>
|
|
<td align='right'><a name='NT-CharData'></a>[14] </td><td align='right'><font><code>CharData</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>[^&lt;&amp;]* - ([^&lt;&amp;]* ']]&gt;' [^&lt;&amp;]*)</code></font></td>
|
|
</tr>
|
|
</table>
|
|
</td></tr></table>
|
|
<p>
|
|
</P>
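<P>The escaping rules of this section can be summarized in a small Python sketch. The function names <code>escape_char_data</code> and <code>escape_att_value</code> are hypothetical; the sketch assumes its input is plain text that contains no markup meant to be preserved.</P>

```python
def escape_char_data(text: str) -> str:
    """Escape plain text so that it can appear as element content."""
    return (
        text.replace("&", "&amp;")       # first, so later escapes are not re-escaped
            .replace("<", "&lt;")
            .replace("]]>", "]]&gt;")    # required for compatibility
    )


def escape_att_value(text: str, quote: str = '"') -> str:
    """Escape plain text for an attribute value delimited by quote."""
    escaped = text.replace("&", "&amp;").replace("<", "&lt;")
    if quote == '"':
        return escaped.replace('"', "&quot;")
    return escaped.replace("'", "&apos;")


print(escape_char_data("a < b && c"))
print(escape_char_data("x]]>y"))
print(escape_att_value('say "hi"'))
print(escape_att_value("it's", quote="'"))
```

<P>Replacing the ampersand first matters: performing it last would turn an already produced <code>&amp;lt;</code> into <code>&amp;amp;lt;</code>.</P>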
|
|
<H3><A NAME='sec-comments'>2.5 Comments</a></h3>
|
|
|
|
<P><a name='dt-comment'></a><b>Comments</b> may
|
|
appear anywhere in a document outside other
|
|
<A href='#dt-markup'>markup</A>; in addition,
|
|
they may appear within the document type declaration
|
|
at places allowed by the grammar.
|
|
They are not part of the document's <A href='#dt-chardata'>character
|
|
data</A>; an XML
|
|
processor may, but need not, make it possible for an application to
|
|
retrieve the text of comments.
|
|
<A href='#dt-compat'>For compatibility</A>, the string
|
|
"<CODE>--</CODE>" (double-hyphen) must not occur within
|
|
comments.
|
|
</p>
|
|
<table cellpadding='5' border='1' bgcolor='#f5dcb3' width='100%'><tr align='left'><td><strong>
|
|
Comments</strong></td></tr><tr><td>
|
|
<table border='0' bgcolor='#f5dcb3'>
|
|
<tr valign='top'><td align='right'><a name='NT-Comment'></a>[15] </td><td align='right'><font><code>Comment</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>'&lt;!--'
((<a href='#NT-Char'>Char</A> - '-')
| ('-' (<a href='#NT-Char'>Char</A> - '-')))*
'--&gt;'</code></font></td>
|
|
</tr>
|
|
</table>
|
|
</td></tr></table>
|
|
<p>
|
|
</P>
|
|
<P>An example of a comment:
|
|
</p><table cellpadding='5' border='1' bgcolor='#80ffff' width='100%'><tr><td><code><font>&lt;!-- declarations for &lt;head&gt; &amp; &lt;body&gt; --&gt;</font></code></td></tr></table><p>
|
|
</P>
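<P>Production [15] has two practical consequences for the text between the comment delimiters: it cannot contain a double hyphen, and it cannot end in a hyphen. The following Python sketch checks both; the function name <code>is_valid_comment_body</code> is an assumption made for illustration, and the restriction to legal <code>Char</code> code points is not checked.</P>

```python
def is_valid_comment_body(body: str) -> bool:
    """Check the text between the comment open and close delimiters."""
    # In production [15] every '-' must be followed by a non-hyphen
    # character, so '--' cannot occur and the body cannot end in '-'
    # (a trailing '-' would run into the '--' of the close delimiter).
    return "--" not in body and not body.endswith("-")
```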
<H3><A NAME='sec-pi'>2.6 Processing Instructions</a></h3>
|
|
|
|
<P><a name='dt-pi'></a><b>Processing
|
|
instructions</b> (PIs) allow documents to contain instructions
|
|
for applications.
|
|
|
|
</p>
|
|
<table cellpadding='5' border='1' bgcolor='#f5dcb3' width='100%'><tr align='left'><td><strong>
|
|
Processing Instructions</strong></td></tr><tr><td>
|
|
<table border='0' bgcolor='#f5dcb3'>
|
|
<tr valign='top'><td align='right'><a name='NT-PI'></a>[16] </td><td align='right'><font><code>PI</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>'&lt;?' <a href='#NT-PITarget'>PITarget</A>
|
|
(<a href='#NT-S'>S</A>
|
|
(<a href='#NT-Char'>Char</A>* -
|
|
(<a href='#NT-Char'>Char</A>* '?>' <a href='#NT-Char'>Char</A>*)))?
|
|
'?>'</code></font></td></tr>
|
|
<tr valign='top'><td align='right'><a name='NT-PITarget'></a>[17] </td><td align='right'><font><code>PITarget</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code><a href='#NT-Name'>Name</A> -
|
|
(('X' | 'x') ('M' | 'm') ('L' | 'l'))</code></font></td>
|
|
</tr>
|
|
</table>
|
|
</td></tr></table>
|
|
<p>
|
|
PIs are not part of the document's <A href='#dt-chardata'>character
|
|
data</A>, but must be passed through to the application. The
|
|
PI begins with a target (<code><a href='#NT-PITarget'>PITarget</A></code>) used
|
|
to identify the application to which the instruction is directed.
|
|
The target names "<CODE>XML</CODE>", "<CODE>xml</CODE>", and so on are
|
|
reserved for standardization in this or future versions of this
|
|
specification.
|
|
The
|
|
XML <A href='#dt-notation'>Notation</A> mechanism
|
|
may be used for
|
|
formal declaration of PI targets.
|
|
</P>
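<P>Production [17] removes exactly one family of names from the set of legal targets: the three letters x, m, l in any mixture of case. A minimal Python sketch of that exclusion follows; the function name <code>is_reserved_pi_target</code> is hypothetical, and the argument is assumed to have already been checked against <code>Name</code>.</P>

```python
def is_reserved_pi_target(target: str) -> bool:
    """True if target is excluded from PITarget by production [17]."""
    # Matches ('X' | 'x') ('M' | 'm') ('L' | 'l') and nothing longer.
    return (
        len(target) == 3
        and target[0] in "Xx"
        and target[1] in "Mm"
        and target[2] in "Ll"
    )
```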
<H3><A NAME='sec-cdata-sect'>2.7 CDATA Sections</a></h3>
|
|
|
|
<P><a name='dt-cdsection'></a><b>CDATA sections</b>
|
|
may occur
|
|
anywhere character data may occur; they are
|
|
used to escape blocks of text containing characters which would
|
|
otherwise be recognized as markup. CDATA sections begin with the
|
|
string "<CODE>&lt;![CDATA[</CODE>" and end with the string
|
|
"<CODE>]]></CODE>":
|
|
</p>
|
|
<table cellpadding='5' border='1' bgcolor='#f5dcb3' width='100%'><tr align='left'><td><strong>
|
|
CDATA Sections</strong></td></tr><tr><td>
|
|
<table border='0' bgcolor='#f5dcb3'>
|
|
<tr valign='top'><td align='right'><a name='NT-CDSect'></a>[18] </td><td align='right'><font><code>CDSect</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code><a href='#NT-CDStart'>CDStart</A>
|
|
<a href='#NT-CData'>CData</A>
|
|
<a href='#NT-CDEnd'>CDEnd</A></code></font></td></tr>
|
|
<tr valign='top'><td align='right'><a name='NT-CDStart'></a>[19] </td><td align='right'><font><code>CDStart</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>'&lt;![CDATA['</code></font></td>
|
|
</tr>
|
|
<tr valign='top'><td align='right'><a name='NT-CData'></a>[20] </td><td align='right'><font><code>CData</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>(<a href='#NT-Char'>Char</A>* -
|
|
(<a href='#NT-Char'>Char</A>* ']]>' <a href='#NT-Char'>Char</A>*))
|
|
</code></font></td>
|
|
</tr>
|
|
<tr valign='top'><td align='right'><a name='NT-CDEnd'></a>[21] </td><td align='right'><font><code>CDEnd</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>']]>'</code></font></td>
|
|
</tr>
|
|
</table>
|
|
</td></tr></table>
|
|
<p>
|
|
|
|
Within a CDATA section, only the <code><a href='#NT-CDEnd'>CDEnd</A></code> string is
|
|
recognized as markup, so that left angle brackets and ampersands may occur in
|
|
their literal form; they need not (and cannot) be escaped using
"<CODE>&amp;lt;</CODE>" and "<CODE>&amp;amp;</CODE>". CDATA sections
|
|
cannot nest.
|
|
</P>
|
|
|
|
<P>An example of a CDATA section, in which "<CODE>&lt;greeting&gt;</CODE>" and
"<CODE>&lt;/greeting&gt;</CODE>"
|
|
are recognized as <A href='#dt-chardata'>character data</A>, not
|
|
<A href='#dt-markup'>markup</A>:
|
|
</p><table cellpadding='5' border='1' bgcolor='#80ffff' width='100%'><tr><td><code><font>&lt;![CDATA[&lt;greeting&gt;Hello, world!&lt;/greeting&gt;]]&gt;</font></code></td></tr></table><p>
|
|
</P>
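<P>Because CDATA sections cannot nest and cannot contain the close delimiter, text that itself contains that delimiter has to be divided between two adjacent sections. The Python sketch below shows one common way to do this. The names <code>to_cdata</code> and <code>round_trip</code> are hypothetical, and the round trip relies on the standard-library <code>xml.etree.ElementTree</code> parser rather than on anything defined in this specification.</P>

```python
import xml.etree.ElementTree as ET

CD_START = "<![CDATA["
CD_END = "]]>"


def to_cdata(text: str) -> str:
    """Wrap text in one or more CDATA sections."""
    # Split every ']]>' in the text between two sections: the first
    # section ends right after ']]', the second begins with '>'.
    return CD_START + text.replace(CD_END, "]]" + CD_END + CD_START + ">") + CD_END


def round_trip(text: str) -> str:
    """Parse the wrapped text back to check that nothing was lost."""
    return ET.fromstring("<doc>" + to_cdata(text) + "</doc>").text or ""
```

<P>Text without the close delimiter yields a single section; each occurrence of the delimiter adds one more section.</P>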
<H3><A NAME='sec-prolog-dtd'>2.8 Prolog and Document Type Declaration</a></h3>
|
|
|
|
<P><a name='dt-xmldecl'></a>XML documents
|
|
may, and should,
|
|
begin with an <b>XML declaration</b> which specifies
|
|
the version of
|
|
XML being used.
|
|
For example, the following is a complete XML document, <A href='#dt-wellformed'>well-formed</A> but not
|
|
<A href='#dt-valid'>valid</A>:
|
|
</p><table cellpadding='5' border='1' bgcolor='#80ffff' width='100%'><tr><td><code><font>&lt;?xml version="1.0"?&gt;<BR>&lt;greeting&gt;Hello, world!&lt;/greeting&gt;<BR></font></code></td></tr></table><p>
|
|
and so is this:
|
|
</p><table cellpadding='5' border='1' bgcolor='#80ffff' width='100%'><tr><td><code><font>&lt;greeting&gt;Hello, world!&lt;/greeting&gt;<BR></font></code></td></tr></table><p>
|
|
</P>
|
|
|
|
<P>The version number "<CODE>1.0</CODE>" should be used to indicate
|
|
conformance to this version of this specification; it is an error
|
|
for a document to use the value "<CODE>1.0</CODE>"
|
|
if it does not conform to this version of this specification.
|
|
It is the intent
|
|
of the XML working group to give later versions of this specification
|
|
numbers other than "<CODE>1.0</CODE>", but this intent does not
|
|
indicate a
|
|
commitment to produce any future versions of XML, nor if any are produced, to
|
|
use any particular numbering scheme.
|
|
Since future versions are not ruled out, this construct is provided
|
|
as a means to allow the possibility of automatic version recognition, should
|
|
it become necessary.
|
|
Processors may signal an error if they receive documents labeled with
|
|
versions they do not support.
|
|
</P>
|
|
<P>The function of the markup in an XML document is to describe its
|
|
storage and logical structure and to associate attribute-value pairs
|
|
with its logical structures. XML provides a mechanism, the <A href='#dt-doctype'>document type declaration</A>, to define
|
|
constraints on the logical structure and to support the use of
|
|
predefined storage units.
|
|
|
|
<a name='dt-valid'></a>An XML document is
|
|
<b>valid</b> if it has an associated document type
|
|
declaration and if the document
|
|
complies with the constraints expressed in it.</P>
|
|
<P>The document type declaration must appear before
|
|
the first <A href='#dt-element'>element</A> in the document.
|
|
</p>
|
|
<table cellpadding='5' border='1' bgcolor='#f5dcb3' width='100%'><tr align='left'><td><strong>
|
|
Prolog</strong></td></tr><tr><td>
|
|
<table border='0' bgcolor='#f5dcb3'>
|
|
|
|
<tr valign='top'><td align='right'><a name='NT-prolog'></a>[22] </td><td align='right'><font><code>prolog</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code><a href='#NT-XMLDecl'>XMLDecl</A>?
|
|
<a href='#NT-Misc'>Misc</A>*
|
|
(<a href='#NT-doctypedecl'>doctypedecl</A>
|
|
<a href='#NT-Misc'>Misc</A>*)?</code></font></td></tr>
|
|
<tr valign='top'><td align='right'><a name='NT-XMLDecl'></a>[23] </td><td align='right'><font><code>XMLDecl</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>'&lt;?xml'
|
|
<a href='#NT-VersionInfo'>VersionInfo</A>
|
|
<a href='#NT-EncodingDecl'>EncodingDecl</A>?
|
|
<a href='#NT-SDDecl'>SDDecl</A>?
|
|
<a href='#NT-S'>S</A>?
|
|
'?>'</code></font></td>
|
|
</tr>
|
|
<tr valign='top'><td align='right'><a name='NT-VersionInfo'></a>[24] </td><td align='right'><font><code>VersionInfo</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code><a href='#NT-S'>S</A> 'version' <a href='#NT-Eq'>Eq</A>
|
|
(' <a href='#NT-VersionNum'>VersionNum</A> '
|
|
| " <a href='#NT-VersionNum'>VersionNum</A> ")</code></font></td>
|
|
</tr>
|
|
<tr valign='top'><td align='right'><a name='NT-Eq'></a>[25] </td><td align='right'><font><code>Eq</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code><a href='#NT-S'>S</A>? '=' <a href='#NT-S'>S</A>?</code></font></td></tr>
|
|
<tr valign='top'>
|
|
<td align='right'><a name='NT-VersionNum'></a>[26] </td><td align='right'><font><code>VersionNum</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>([a-zA-Z0-9_.:] | '-')+</code></font></td>
|
|
</tr>
|
|
<tr valign='top'><td align='right'><a name='NT-Misc'></a>[27] </td><td align='right'><font><code>Misc</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code><a href='#NT-Comment'>Comment</A> | <a href='#NT-PI'>PI</A> |
|
|
<a href='#NT-S'>S</A></code></font></td></tr>
|
|
|
|
</table>
|
|
</td></tr></table>
|
|
<p></P>
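<P>To illustrate how productions [23] through [26] fit together, the Python sketch below extracts the version number from the start of a document. It is an assumption-laden example: the names <code>declared_version</code> and <code>_VERSION_INFO</code> are hypothetical, only the mandatory <code>VersionInfo</code> part of the declaration is matched, and the remainder of the declaration is not validated.</P>

```python
import re

# S (white space) is one or more of space, tab, carriage return, line feed.
S = r"[ \t\r\n]"

# '<?xml' VersionInfo, where VersionInfo is
#   S 'version' Eq (' VersionNum ' | " VersionNum ")
# and VersionNum is ([a-zA-Z0-9_.:] | '-')+
_VERSION_INFO = re.compile(
    r"<\?xml" + S + r"+version" + S + r"*=" + S + r"*"
    r"(?:'([A-Za-z0-9_.:-]+)'|\"([A-Za-z0-9_.:-]+)\")"
)


def declared_version(document: str):
    """Return the declared version string, or None if there is no XML declaration."""
    match = _VERSION_INFO.match(document)
    if match is None:
        return None
    # Exactly one of the two groups is set, depending on the quote style.
    return match.group(1) or match.group(2)
```

<P>A processor that supports only version 1.0 could compare the returned value with the string "1.0" and signal an error otherwise, as permitted above.</P>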
|
|
|
|
<P><a name='dt-doctype'></a>The XML
|
|
<b>document type declaration</b>
|
|
contains or points to
|
|
<A href='#dt-markupdecl'>markup declarations</A>
|
|
that provide a grammar for a
|
|
class of documents.
|
|
This grammar is known as a document type definition,
|
|
or <b>DTD</b>.
|
|
The document type declaration can point to an external subset (a
|
|
special kind of
|
|
<A href='#dt-extent'>external entity</A>) containing markup
|
|
declarations, or can
|
|
contain the markup declarations directly in an internal subset, or can do
|
|
both.
|
|
The DTD for a document consists of both subsets taken
|
|
together.
|
|
</P>
|
|
<P><a name='dt-markupdecl'></a>
|
|
A <b>markup declaration</b> is
|
|
an <A href='#dt-eldecl'>element type declaration</A>,
|
|
an <A href='#dt-attdecl'>attribute-list declaration</A>,
|
|
an <A href='#dt-entdecl'>entity declaration</A>, or
|
|
a <A href='#dt-notdecl'>notation declaration</A>.
|
|
|
|
These declarations may be contained in whole or in part
|
|
within <A href='#dt-PE'>parameter entities</A>,
|
|
as described in the well-formedness and validity constraints below.
|
|
For fuller information, see
|
|
"<A href='#sec-physical-struct'>4. Physical Structures</A>".</P>
|
|
|
|
<table cellpadding='5' border='1' bgcolor='#f5dcb3' width='100%'><tr align='left'><td><strong>
|
|
Document Type Definition</strong></td></tr><tr><td>
|
|
<table border='0' bgcolor='#f5dcb3'>
|
|
|
|
<tr valign='top'><td align='right'><a name='NT-doctypedecl'></a>[28] </td><td align='right'><font><code>doctypedecl</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>'&lt;!DOCTYPE' <a href='#NT-S'>S</A>
|
|
<a href='#NT-Name'>Name</A> (<a href='#NT-S'>S</A>
|
|
<a href='#NT-ExternalID'>ExternalID</A>)?
|
|
<a href='#NT-S'>S</A>? ('['
|
|
(<a href='#NT-markupdecl'>markupdecl</A>
|
|
| <a href='#NT-PEReference'>PEReference</A>
|
|
| <a href='#NT-S'>S</A>)*
|
|
']'
|
|
<a href='#NT-S'>S</A>?)? '>'</code></font></td>
|
|
<td align='center'><font><code> [ </code></font></td><td align='left'><font><code>VC: <a href='#vc-roottype'>Root Element Type</A> ]</code></font></td>
|
|
</tr>
|
|
<tr valign='top'><td align='right'><a name='NT-markupdecl'></a>[29] </td><td align='right'><font><code>markupdecl</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code><a href='#NT-elementdecl'>elementdecl</A>
|
|
| <a href='#NT-AttlistDecl'>AttlistDecl</A>
|
|
| <a href='#NT-EntityDecl'>EntityDecl</A>
|
|
| <a href='#NT-NotationDecl'>NotationDecl</A>
|
|
| <a href='#NT-PI'>PI</A>
|
|
| <a href='#NT-Comment'>Comment</A>
|
|
</code></font></td>
|
|
<td align='center'><font><code> [ </code></font></td><td align='left'><font><code>VC: <a href='#vc-PEinMarkupDecl'>Proper Declaration/PE Nesting</A> ]</code></font></td>
|
|
</tr><tr valign='top'><td></td><td></td><td></td><td></td><td align='center'><font><code> [ </code></font></td><td align='left'><font><code>WFC: <a href='#wfc-PEinInternalSubset'>PEs in Internal Subset</A> ]</code></font></td>
|
|
</tr>
|
|
|
|
|
|
</table>
|
|
</td></tr></table>
|
|
|
|
|
|
<P>The markup declarations may be made up in whole or in part of
|
|
the <A href='#dt-repltext'>replacement text</A> of
|
|
<A href='#dt-PE'>parameter entities</A>.
|
|
The productions later in this specification for
|
|
individual nonterminals (<code><a href='#NT-elementdecl'>elementdecl</A></code>,
|
|
<code><a href='#NT-AttlistDecl'>AttlistDecl</A></code>, and so on) describe
|
|
the declarations <EM>after</EM> all the parameter entities have been
|
|
<A href='#dt-include'>included</A>.</P>
|
|
|
|
<A NAME='vc-roottype'></A><P><b>Validity Constraint:
|
|
Root Element Type</b><br>
|
|
|
|
The <code><a href='#NT-Name'>Name</A></code> in the document type declaration must
|
|
match the element type of the <A href='#dt-root'>root element</A>.
|
|
|
|
</P>
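<P>The Root Element Type constraint can be checked mechanically once a document has been parsed. The following Python sketch uses the standard-library <code>xml.dom.minidom</code> module, which is an assumption of this example and not something this specification requires; the function name <code>root_matches_doctype</code> is hypothetical. It does not read any external subset.</P>

```python
from xml.dom import minidom


def root_matches_doctype(document: str):
    """Return True/False for the Root Element Type constraint,
    or None when the document has no document type declaration."""
    dom = minidom.parseString(document)
    if dom.doctype is None:
        return None  # the constraint does not apply
    # The Name in the declaration must equal the root element's type.
    return dom.doctype.name == dom.documentElement.tagName
```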
|
|
|
|
<A NAME='vc-PEinMarkupDecl'></A><P><b>Validity Constraint:
|
|
Proper Declaration/PE Nesting</b><br>
|
|
Parameter-entity
|
|
<A href='#dt-repltext'>replacement text</A> must be properly nested
|
|
with markup declarations.
|
|
That is to say, if either the first character
|
|
or the last character of a markup
|
|
declaration (<code><a href='#NT-markupdecl'>markupdecl</A></code> above)
|
|
is contained in the replacement text for a
|
|
<A href='#dt-PERef'>parameter-entity reference</A>,
|
|
both must be contained in the same replacement text.
|
|
</P>
|
|
<A NAME='wfc-PEinInternalSubset'></A><P><b>Well-Formedness Constraint:
|
|
PEs in Internal Subset</b><br>
|
|
In the internal DTD subset,
|
|
<A href='#dt-PERef'>parameter-entity references</A>
|
|
can occur only where markup declarations can occur, not
|
|
within markup declarations. (This does not apply to
|
|
references that occur in
|
|
external parameter entities or to the external subset.)
|
|
|
|
</P>
|
|
<P>
|
|
Like the internal subset, the external subset and
|
|
any external parameter entities referred to in the DTD
|
|
must consist of a series of complete markup declarations of the types
|
|
allowed by the non-terminal symbol
|
|
<code><a href='#NT-markupdecl'>markupdecl</A></code>, interspersed with white space
|
|
or <A href='#dt-PERef'>parameter-entity references</A>.
|
|
However, portions of the contents
|
|
of the
|
|
external subset or of external parameter entities may conditionally be ignored
|
|
by using
|
|
the <A href='#dt-cond-section'>conditional section</A>
|
|
construct; this is not allowed in the internal subset.
|
|
|
|
</p>
|
|
<table cellpadding='5' border='1' bgcolor='#f5dcb3' width='100%'><tr align='left'><td><strong>
|
|
External Subset</strong></td></tr><tr><td>
|
|
<table border='0' bgcolor='#f5dcb3'>
|
|
|
|
<tr valign='top'><td align='right'><a name='NT-extSubset'></a>[30] </td><td align='right'><font><code>extSubset</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code><a href='#NT-TextDecl'>TextDecl</A>?
|
|
<a href='#NT-extSubsetDecl'>extSubsetDecl</A></code></font></td></tr>
|
|
<tr valign='top'><td align='right'><a name='NT-extSubsetDecl'></a>[31] </td><td align='right'><font><code>extSubsetDecl</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>(
|
|
<a href='#NT-markupdecl'>markupdecl</A>
|
|
| <a href='#NT-conditionalSect'>conditionalSect</A>
|
|
| <a href='#NT-PEReference'>PEReference</A>
|
|
| <a href='#NT-S'>S</A>
|
|
)*</code></font></td>
|
|
</tr>
|
|
|
|
</table>
|
|
</td></tr></table>
|
|
<p></P>
|
|
<P>The external subset and external parameter entities also differ
|
|
from the internal subset in that in them,
|
|
<A href='#dt-PERef'>parameter-entity references</A>
|
|
are permitted <EM>within</EM> markup declarations,
|
|
not only <EM>between</EM> markup declarations.</P>
|
|
<P>An example of an XML document with a document type declaration:
|
|
</p><table cellpadding='5' border='1' bgcolor='#80ffff' width='100%'><tr><td><code><font>&lt;?xml version="1.0"?&gt;<BR>&lt;!DOCTYPE greeting SYSTEM "hello.dtd"&gt;<BR>&lt;greeting&gt;Hello, world!&lt;/greeting&gt;<BR></font></code></td></tr></table><p>
|
|
The <A href='#dt-sysid'>system identifier</A>
|
|
"<CODE>hello.dtd</CODE>" gives the URI of a DTD for the document.</P>
|
|
<P>The declarations can also be given locally, as in this
|
|
example:
|
|
</p><table cellpadding='5' border='1' bgcolor='#80ffff' width='100%'><tr><td><code><font>&lt;?xml version="1.0" encoding="UTF-8" ?&gt;<BR>&lt;!DOCTYPE greeting [<BR>  &lt;!ELEMENT greeting (#PCDATA)&gt;<BR>]&gt;<BR>&lt;greeting&gt;Hello, world!&lt;/greeting&gt;<BR></font></code></td></tr></table><p>
|
|
If both the external and internal subsets are used, the
|
|
internal subset is considered to occur before the external subset.
|
|
|
|
This has the effect that entity and attribute-list declarations in the
|
|
internal subset take precedence over those in the external subset.
|
|
</P>
<H3><A NAME='sec-rmd'>2.9 Standalone Document Declaration</a></h3>
|
|
<P>Markup declarations can affect the content of the document,
|
|
as passed from an <A href='#dt-xml-proc'>XML processor</A>
|
|
to an application; examples are attribute defaults and entity
|
|
declarations.
|
|
The standalone document declaration,
|
|
which may appear as a component of the XML declaration, signals
|
|
whether or not there are such declarations which appear external to
|
|
the <A href='#dt-docent'>document entity</A>.
|
|
</p>
|
|
<table cellpadding='5' border='1' bgcolor='#f5dcb3' width='100%'><tr align='left'><td><strong>
|
|
Standalone Document Declaration</strong></td></tr><tr><td>
|
|
<table border='0' bgcolor='#f5dcb3'>
|
|
|
|
<tr valign='top'><td align='right'><a name='NT-SDDecl'></a>[32] </td><td align='right'><font><code>SDDecl</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>
|
|
<a href='#NT-S'>S</A>
|
|
'standalone' <a href='#NT-Eq'>Eq</A>
|
|
(("'" ('yes' | 'no') "'") | ('"' ('yes' | 'no') '"'))
|
|
</code></font></td>
|
|
<td align='center'><font><code> [ </code></font></td><td align='left'><font><code>VC: <a href='#vc-check-rmd'>Standalone Document Declaration</A> ]</code></font></td></tr>
|
|
|
|
</table>
|
|
</td></tr></table>
|
|
<p></P>
|
|
<P>
|
|
In a standalone document declaration, the value "<CODE>yes</CODE>" indicates
|
|
that there
|
|
are no markup declarations external to the <A href='#dt-docent'>document
|
|
entity</A> (either in the DTD external subset, or in an
|
|
external parameter entity referenced from the internal subset)
|
|
which affect the information passed from the XML processor to
|
|
the application.
|
|
The value "<CODE>no</CODE>" indicates that there are or may be such
|
|
external markup declarations.
|
|
Note that the standalone document declaration only
|
|
denotes the presence of external <EM>declarations</EM>; the presence, in a
|
|
document, of
|
|
references to external <EM>entities</EM>, when those entities are
|
|
internally declared,
|
|
does not change its standalone status.</P>
|
|
<P>If there are no external markup declarations, the standalone document
|
|
declaration has no meaning.
|
|
If there are external markup declarations but there is no standalone
|
|
document declaration, the value "<CODE>no</CODE>" is assumed.</P>
|
|
<P>Any XML document for which <CODE>standalone="no"</CODE> holds can
|
|
be converted algorithmically to a standalone document,
|
|
which may be desirable for some network delivery applications.</P>
|
|
<A NAME='vc-check-rmd'></A><P><b>Validity Constraint:
|
|
Standalone Document Declaration</b><br>
|
|
The standalone document declaration must have
|
|
the value "<CODE>no</CODE>" if any external markup declarations
|
|
contain declarations of:<UL>
|
|
<LI>attributes with <A href='#dt-default'>default</A> values, if
|
|
elements to which
|
|
these attributes apply appear in the document without
|
|
specifications of values for these attributes, or</LI>
|
|
<LI>entities (other than <CODE>amp</CODE>,
|
|
<CODE>lt</CODE>,
|
|
<CODE>gt</CODE>,
|
|
<CODE>apos</CODE>,
|
|
<CODE>quot</CODE>),
|
|
if <A href='#dt-entref'>references</A> to those
|
|
entities appear in the document, or
|
|
</LI>
|
|
<LI>attributes with values subject to
|
|
<A href='#AVNormalize'>normalization</A>, where the
|
|
attribute appears in the document with a value which will
|
|
change as a result of normalization, or
|
|
</LI>
|
|
<LI>
|
|
element types with <A href='#dt-elemcontent'>element content</A>,
|
|
if white space occurs
|
|
directly within any instance of those types.
|
|
</LI>
|
|
</UL>
|
|
|
|
|
|
<P>An example XML declaration with a standalone document declaration:</p><table cellpadding='5' border='1' bgcolor='#80ffff' width='100%'><tr><td><code><font>&lt;?xml version="1.0" standalone='yes'?&gt;</font></code></td></tr></table><p></P>
<H3><A NAME='sec-white-space'>2.10 White Space Handling</a></h3>
|
|
|
|
<P>In editing XML documents, it is often convenient to use "white space"
|
|
(spaces, tabs, and blank lines, denoted by the nonterminal
|
|
<code><a href='#NT-S'>S</A></code> in this specification) to
|
|
set apart the markup for greater readability. Such white space is typically
|
|
not intended for inclusion in the delivered version of the document.
|
|
On the other hand, "significant" white space that should be preserved in the
|
|
delivered version is common, for example in poetry and
|
|
source code.</P>
|
|
<P>An <A href='#dt-xml-proc'>XML processor</A>
|
|
must always pass all characters in a document that are not
|
|
markup through to the application. A <A href='#dt-validating'>
|
|
validating XML processor</A> must also inform the application
|
|
which of these characters constitute white space appearing
|
|
in <A href='#dt-elemcontent'>element content</A>.
|
|
</P>
|
|
<P>A special <A href='#dt-attr'>attribute</A>
|
|
named <code>xml:space</code> may be attached to an element
|
|
to signal an intention that in that element,
|
|
white space should be preserved by applications.
|
|
In valid documents, this attribute, like any other, must be
|
|
<A href='#dt-attdecl'>declared</A> if it is used.
|
|
When declared, it must be given as an
|
|
<A href='#dt-enumerated'>enumerated type</A> whose only
|
|
possible values are "<CODE>default</CODE>" and "<CODE>preserve</CODE>".
|
|
For example:</p><table cellpadding='5' border='1' bgcolor='#80ffff' width='100%'><tr><td><code><font>    &lt;!ATTLIST poem   xml:space (default|preserve) 'preserve'&gt;</font></code></td></tr></table><p></P>
|
|
<P>The value "<CODE>default</CODE>" signals that applications'
|
|
default white-space processing modes are acceptable for this element; the
|
|
value "<CODE>preserve</CODE>" indicates the intent that applications preserve
|
|
all the white space.
|
|
This declared intent is considered to apply to all elements within the content
|
|
of the element where it is specified, unless overridden with another instance
|
|
of the <code>xml:space</code> attribute.
|
|
</P>
|
|
<P>The <A href='#dt-root'>root element</A> of any document
|
|
is considered to have signaled no intentions as regards application space
|
|
handling, unless it provides a value for
|
|
this attribute or the attribute is declared with a default value.
|
|
</P>
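<P>The inheritance rule for <code>xml:space</code> can be sketched as a walk over the element tree in which each element takes its own value if it specifies one and its parent's value otherwise. The Python example below is illustrative only: it uses the standard-library <code>xml.etree.ElementTree</code> module, the names <code>space_modes</code> and <code>POEM</code> are hypothetical, and <code>None</code> stands for "no intention signaled".</P>

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:space under the reserved XML namespace name.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def space_modes(element, inherited=None):
    """Return (element type, effective xml:space value) pairs in document order."""
    # An element's own attribute overrides the value inherited from its parent.
    mode = element.get(XML_SPACE, inherited)
    pairs = [(element.tag, mode)]
    for child in element:
        pairs.extend(space_modes(child, mode))
    return pairs


POEM = (
    '<poem xml:space="preserve"><stanza><line>  indented  </line></stanza>'
    '<note xml:space="default">a note</note></poem>'
)
```

<P>Calling <code>space_modes(ET.fromstring(POEM))</code> reports "preserve" for <code>poem</code>, <code>stanza</code>, and <code>line</code>, and "default" for <code>note</code>.</P>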
<H3><A NAME='sec-line-ends'>2.11 End-of-Line Handling</a></h3>
|
|
<P>XML <A href='#dt-parsedent'>parsed entities</A> are often stored in
|
|
computer files which, for editing convenience, are organized into lines.
|
|
These lines are typically separated by some combination of the characters
|
|
carriage-return (#xD) and line-feed (#xA).</P>
|
|
<P>To simplify the tasks of <A href='#dt-app'>applications</A>,
|
|
wherever an external parsed entity or the literal entity value
|
|
of an internal parsed entity contains either the literal
|
|
two-character sequence "#xD#xA" or a standalone literal
|
|
#xD, an <A href='#dt-xml-proc'>XML processor</A> must
|
|
pass to the application the single character #xA.
|
|
(This behavior can
|
|
conveniently be produced by normalizing all
|
|
line breaks to #xA on input, before parsing.)
|
|
</P>
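<P>The normalization described in the parenthetical remark can be sketched in a few lines of Python. The function name <code>normalize_line_ends</code> is hypothetical; the order of the two replacements matters, since the two-character sequence must be handled before any remaining standalone carriage returns.</P>

```python
def normalize_line_ends(text: str) -> str:
    """Translate #xD#xA and standalone #xD to the single character #xA."""
    # Replace the two-character sequence first so it yields one #xA, not two.
    return text.replace("\r\n", "\n").replace("\r", "\n")
```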
<H3><A NAME='sec-lang-tag'>2.12 Language Identification</a></h3>
|
|
<P>In document processing, it is often useful to
|
|
identify the natural or formal language
|
|
in which the content is
|
|
written.
|
|
A special <A href='#dt-attr'>attribute</A> named
|
|
<code>xml:lang</code> may be inserted in
|
|
documents to specify the
|
|
language used in the contents and attribute values
|
|
of any element in an XML document.
|
|
In valid documents, this attribute, like any other, must be
|
|
<A href='#dt-attdecl'>declared</A> if it is used.
|
|
The values of the attribute are language identifiers as defined
|
|
by <A href='#RFC1766'>[IETF RFC 1766]</A>, "Tags for the Identification of Languages":
|
|
</p>
|
|
<table cellpadding='5' border='1' bgcolor='#f5dcb3' width='100%'><tr align='left'><td><strong>
|
|
Language Identification</strong></td></tr><tr><td>
|
|
<table border='0' bgcolor='#f5dcb3'>
|
|
<tr valign='top'><td align='right'><a name='NT-LanguageID'></a>[33] </td><td align='right'><font><code>LanguageID</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code><a href='#NT-Langcode'>Langcode</A>
|
|
('-' <a href='#NT-Subcode'>Subcode</A>)*</code></font></td></tr>
|
|
<tr valign='top'><td align='right'><a name='NT-Langcode'></a>[34] </td><td align='right'><font><code>Langcode</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code><a href='#NT-ISO639Code'>ISO639Code</A> |
|
|
<a href='#NT-IanaCode'>IanaCode</A> |
|
|
<a href='#NT-UserCode'>UserCode</A></code></font></td>
|
|
</tr>
|
|
<tr valign='top'><td align='right'><a name='NT-ISO639Code'></a>[35] </td><td align='right'><font><code>ISO639Code</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>([a-z] | [A-Z]) ([a-z] | [A-Z])</code></font></td></tr>
|
|
<tr valign='top'><td align='right'><a name='NT-IanaCode'></a>[36] </td><td align='right'><font><code>IanaCode</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>('i' | 'I') '-' ([a-z] | [A-Z])+</code></font></td></tr>
|
|
<tr valign='top'><td align='right'><a name='NT-UserCode'></a>[37] </td><td align='right'><font><code>UserCode</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>('x' | 'X') '-' ([a-z] | [A-Z])+</code></font></td></tr>
|
|
<tr valign='top'><td align='right'><a name='NT-Subcode'></a>[38] </td><td align='right'><font><code>Subcode</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>([a-z] | [A-Z])+</code></font></td></tr>
|
|
</table>
|
|
</td></tr></table>
|
|
<p>
|
|
The <code><a href='#NT-Langcode'>Langcode</A></code> may be any of the following:
|
|
<UL>
|
|
<LI>a two-letter language code as defined by
|
|
<A href='#ISO639'>[ISO 639]</A>, "Codes
|
|
for the representation of names of languages"</LI>
|
|
<LI>a language identifier registered with the Internet
|
|
Assigned Numbers Authority <A href='#IANA'>[IANA]</A>; these begin with the
|
|
prefix "<CODE>i-</CODE>" (or "<CODE>I-</CODE>")</LI>
|
|
<LI>a language identifier assigned by the user, or agreed on
|
|
between parties in private use; these must begin with the
|
|
prefix "<CODE>x-</CODE>" or "<CODE>X-</CODE>" in order to ensure that they do not conflict
|
|
with names later standardized or registered with IANA</LI>
|
|
</UL>
|
|
|
|
<P>There may be any number of <code><a href='#NT-Subcode'>Subcode</A></code> segments; if
|
|
the first
|
|
subcode segment exists and the Subcode consists of two
|
|
letters, then it must be a country code from
|
|
<A href='#ISO3166'>[ISO 3166]</A>, "Codes
|
|
for the representation of names of countries."
|
|
If the first
|
|
subcode consists of more than two letters, it must be
|
|
a subcode for the language in question registered with IANA,
|
|
unless the <code><a href='#NT-Langcode'>Langcode</A></code> begins with the prefix
|
|
"<CODE>x-</CODE>" or
|
|
"<CODE>X-</CODE>". </P>
|
|
<P>It is customary to give the language code in lower case, and
|
|
the country code (if any) in upper case.
|
|
Note that these values, unlike other names in XML documents,
|
|
are case insensitive.</P>
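<P>Productions [33] through [38] translate directly into a regular expression. The Python sketch below checks only the syntax of a language identifier; it does not consult the ISO 639, ISO 3166, or IANA registries, so a syntactically acceptable value may still name no registered language. The names <code>LANGUAGE_ID</code> and <code>is_language_id</code> are hypothetical.</P>

```python
import re

# Langcode ::= ISO639Code | IanaCode | UserCode, followed by ('-' Subcode)*
LANGUAGE_ID = re.compile(
    r"(?:[A-Za-z]{2}"      # ISO639Code: exactly two letters
    r"|[iI]-[A-Za-z]+"     # IanaCode: 'i-' or 'I-' prefix
    r"|[xX]-[A-Za-z]+)"    # UserCode: 'x-' or 'X-' prefix
    r"(?:-[A-Za-z]+)*"     # any number of Subcode segments
)


def is_language_id(value: str) -> bool:
    """True if value matches the LanguageID production as a whole."""
    return LANGUAGE_ID.fullmatch(value) is not None
```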
|
|
<P>For example:
|
|
</p><table cellpadding='5' border='1' bgcolor='#80ffff' width='100%'><tr><td><code><font>&lt;p xml:lang="en"&gt;The quick brown fox jumps over the lazy dog.&lt;/p&gt;<BR>&lt;p xml:lang="en-GB"&gt;What colour is it?&lt;/p&gt;<BR>&lt;p xml:lang="en-US"&gt;What color is it?&lt;/p&gt;<BR>&lt;sp who="Faust" desc='leise' xml:lang="de"&gt;<BR>  &lt;l&gt;Habe nun, ach! Philosophie,&lt;/l&gt;<BR>  &lt;l&gt;Juristerei, und Medizin&lt;/l&gt;<BR>  &lt;l&gt;und leider auch Theologie&lt;/l&gt;<BR>  &lt;l&gt;durchaus studiert mit heißem Bemüh'n.&lt;/l&gt;<BR>  &lt;/sp&gt;</font></code></td></tr></table><p></P>
|
|
|
|
<P>The intent declared with <code>xml:lang</code> is considered to apply to
|
|
all attributes and content of the element where it is specified,
|
|
unless overridden with an instance of <code>xml:lang</code>
|
|
on another element within that content.</P>
|
|
|
|
<P>A simple declaration for <code>xml:lang</code> might take
|
|
the form
|
|
</p><table cellpadding='5' border='1' bgcolor='#80ffff' width='100%'><tr><td><code><font>xml:lang NMTOKEN #IMPLIED</font></code></td></tr></table><p>
|
|
but specific default values may also be given, if appropriate. In a
|
|
collection of French poems for English students, with glosses and
|
|
notes in English, the xml:lang attribute might be declared this way:
|
|
</p><table cellpadding='5' border='1' bgcolor='#80ffff' width='100%'><tr><td><code><font>    &lt;!ATTLIST poem   xml:lang NMTOKEN 'fr'&gt;<BR>    &lt;!ATTLIST gloss  xml:lang NMTOKEN 'en'&gt;<BR>    &lt;!ATTLIST note   xml:lang NMTOKEN 'en'&gt;</font></code></td></tr></table><p>
|
|
</P>
<H2><A NAME='sec-logical-struct'>3. Logical Structures</a></h2>
|
|
|
|
<P><a name='dt-element'></a>Each <A href='#dt-xml-doc'>XML document</A> contains one or more
|
|
<b>elements</b>, the boundaries of which are
|
|
either delimited by <A href='#dt-stag'>start-tags</A>
|
|
and <A href='#dt-etag'>end-tags</A>, or, for <A href='#dt-empty'>empty</A> elements, by an <A href='#dt-eetag'>empty-element tag</A>. Each element has a type,
|
|
identified by name, sometimes called its "generic
|
|
identifier" (GI), and may have a set of
|
|
attribute specifications. Each attribute specification
|
|
has a <A href='#dt-attrname'>name</A> and a <A href='#dt-attrval'>value</A>.
|
|
</P>
|
|
|
|
<table cellpadding='5' border='1' bgcolor='#f5dcb3' width='100%'><tr align='left'><td><strong>Element</strong></td></tr><tr><td>
|
|
<table border='0' bgcolor='#f5dcb3'>
|
|
<tr valign='top'><td align='right'><a name='NT-element'></a>[39] </td><td align='right'><font><code>element</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code><a href='#NT-EmptyElemTag'>EmptyElemTag</A></code></font></td>
|
|
</tr><tr valign='top'><td align='right'></td><td></td><td></td><td align='left'><font><code>| <a href='#NT-STag'>STag</A> <a href='#NT-content'>content</A>
|
|
<a href='#NT-ETag'>ETag</A></code></font></td>
|
|
<td align='center'><font><code> [ </code></font></td><td align='left'><font><code>WFC: <a href='#GIMatch'>Element Type Match</A> ]</code></font></td>
|
|
</tr><tr valign='top'><td></td><td></td><td></td><td></td><td align='center'><font><code> [ </code></font></td><td align='left'><font><code>VC: <a href='#elementvalid'>Element Valid</A> ]</code></font></td>
|
|
</tr>
|
|
</table>
|
|
</td></tr></table>
|
|
|
|
<P>This specification does not constrain the semantics, use, or (beyond
|
|
syntax) names of the element types and attributes, except that names
|
|
beginning with a match to <CODE>(('X'|'x')('M'|'m')('L'|'l'))</CODE>
|
|
are reserved for standardization in this or future versions of this
|
|
specification.
|
|
</P>
|
|
<A NAME='GIMatch'></A><P><b>Well-Formedness Constraint:
|
|
Element Type Match</b><br>
|
|
|
|
The <code><a href='#NT-Name'>Name</A></code> in an element's end-tag must match
|
|
the element type in
|
|
the start-tag.
|
|
|
|
</P>
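<P>As a brief illustration of this constraint (the element type names <code>para</code> and <code>note</code> are hypothetical and chosen only for this sketch), the first element below is well-formed and the second is not:</P>

```xml
<!-- Hypothetical element names, for illustration only. -->

<!-- Well-formed: the name in the end-tag matches the start-tag. -->
<para>Some text</para>

<!-- Not well-formed: the end-tag names a different element type. -->
<para>Some text</note>
```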
|
|
<A NAME='elementvalid'></A><P><b>Validity Constraint:
|
|
Element Valid</b><br>
|
|
An element is
|
|
valid if
|
|
there is a declaration matching
|
|
<code><a href='#NT-elementdecl'>elementdecl</A></code> where the
|
|
<code><a href='#NT-Name'>Name</A></code> matches the element type, and
|
|
one of the following holds:
|
|
<OL>
|
|
<LI>The declaration matches <code>EMPTY</code> and the element has no
|
|
<A href='#dt-content'>content</A>.</LI>
|
|
<LI>The declaration matches <code><a href='#NT-children'>children</A></code> and
|
|
the sequence of
|
|
<A href='#dt-parentchild'>child elements</A>
|
|
belongs to the language generated by the regular expression in
|
|
the content model, with optional white space (characters
|
|
matching the nonterminal <code><a href='#NT-S'>S</A></code>) between each pair
|
|
of child elements.</LI>
|
|
<LI>The declaration matches <code><a href='#NT-Mixed'>Mixed</A></code> and
|
|
the content consists of <A href='#dt-chardata'>character
|
|
data</A> and <A href='#dt-parentchild'>child elements</A>
|
|
whose types match names in the content model.</LI>
|
|
<LI>The declaration matches <code>ANY</code>, and the types
|
|
of any <A href='#dt-parentchild'>child elements</A> have
|
|
been declared.</LI>
|
|
</OL>
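<P>The four cases above might be sketched as follows. All element type names here are hypothetical; the sketch is one possible valid document under these assumed declarations, not a normative example.</P>

```xml
<!-- Hypothetical declarations and instance, for illustration only. -->
<!DOCTYPE doc [
  <!-- children: a title, one or more items, then an optional extra -->
  <!ELEMENT doc   (title, item+, extra?)>
  <!-- Mixed: character data only -->
  <!ELEMENT title (#PCDATA)>
  <!-- Mixed: character data interspersed with ref elements -->
  <!ELEMENT item  (#PCDATA | ref)*>
  <!-- EMPTY: no content at all -->
  <!ELEMENT ref   EMPTY>
  <!-- ANY: any content, provided child element types are declared -->
  <!ELEMENT extra ANY>
]>
<doc>
  <title>Example</title>
  <item>See <ref/> for details.</item>
  <extra>Free text and <ref/> in any order.</extra>
</doc>
```

<P>Under the same assumptions, a <code>doc</code> element whose <code>item</code> preceded its <code>title</code> would fail case 2, and a <code>ref</code> element containing text would fail case 1.</P>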
|
|
|
|
|
|
|
|
<H3><A NAME='sec-starttags'>3.1 Start-Tags, End-Tags, and Empty-Element Tags</a></h3>
|
|
|
|
<P><a name='dt-stag'></a>The beginning of every
|
|
non-empty XML element is marked by a <b>start-tag</b>.
|
|
</p>
|
|
<table cellpadding='5' border='1' bgcolor='#f5dcb3' width='100%'><tr align='left'><td><strong>
|
|
Start-tag</strong></td></tr><tr><td>
|
|
<table border='0' bgcolor='#f5dcb3'>
|
|
|
|
<tr valign='top'><td align='right'><a name='NT-STag'></a>[40] </td><td align='right'><font><code>STag</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>'<' <a href='#NT-Name'>Name</A>
|
|
(<a href='#NT-S'>S</A> <a href='#NT-Attribute'>Attribute</A>)*
|
|
<a href='#NT-S'>S</A>? '>'</code></font></td>
|
|
<td align='center'><font><code> [ </code></font></td><td align='left'><font><code>WFC: <a href='#uniqattspec'>Unique Att Spec</A> ]</code></font></td>
|
|
</tr>
|
|
<tr valign='top'><td align='right'><a name='NT-Attribute'></a>[41] </td><td align='right'><font><code>Attribute</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code><a href='#NT-Name'>Name</A> <a href='#NT-Eq'>Eq</A>
|
|
<a href='#NT-AttValue'>AttValue</A></code></font></td>
|
|
<td align='center'><font><code> [ </code></font></td><td align='left'><font><code>VC: <a href='#ValueType'>Attribute Value Type</A> ]</code></font></td>
|
|
</tr><tr valign='top'><td></td><td></td><td></td><td></td><td align='center'><font><code> [ </code></font></td><td align='left'><font><code>WFC: <a href='#NoExternalRefs'>No External Entity References</A> ]</code></font></td>
|
|
</tr><tr valign='top'><td></td><td></td><td></td><td></td><td align='center'><font><code> [ </code></font></td><td align='left'><font><code>WFC: <a href='#CleanAttrVals'>No < in Attribute Values</A> ]</code></font></td></tr>
|
|
|
|
</table>
|
|
</td></tr></table>
|
|
<p>
|
|
The <code><a href='#NT-Name'>Name</A></code> in
|
|
the start- and end-tags gives the
|
|
element's <b>type</b>.
|
|
<a name='dt-attr'></a>
|
|
The <code><a href='#NT-Name'>Name</A></code>-<code><a href='#NT-AttValue'>AttValue</A></code> pairs are
|
|
referred to as
|
|
the <b>attribute specifications</b> of the element,
|
|
<a name='dt-attrname'></a>with the
|
|
<code><a href='#NT-Name'>Name</A></code> in each pair
|
|
referred to as the <b>attribute name</b> and
|
|
<a name='dt-attrval'></a>the content of the
|
|
<code><a href='#NT-AttValue'>AttValue</A></code> (the text between the
|
|
<CODE>'</CODE> or <CODE>"</CODE> delimiters)
|
|
as the <b>attribute value</b>.
|
|
</P>
|
|
<A NAME='uniqattspec'></A><P><b>Well-Formedness Constraint:
|
|
Unique Att Spec</b><br>
|
|
|
|
No attribute name may appear more than once in the same start-tag
|
|
or empty-element tag.
|
|
|
|
</P>
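<P>A short sketch of this constraint, reusing the <code>termdef</code> element type from the start-tag example in this section (the attribute values are illustrative only):</P>

```xml
<!-- Well-formed: each attribute name appears once. -->
<termdef id="dt-dog" term="dog"/>

<!-- Not well-formed: the attribute name "id" appears twice in one tag. -->
<termdef id="dt-dog" id="dt-cat"/>
```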
|
|
<A NAME='ValueType'></A><P><b>Validity Constraint:
|
|
Attribute Value Type</b><br>
|
|
|
|
The attribute must have been declared; the value must be of the type
|
|
declared for it.
|
|
(For attribute types, see "<A href='#attdecls'>3.3 Attribute-List Declarations</A>".)
|
|
|
|
</P>
|
|
<A NAME='NoExternalRefs'></A><P><b>Well-Formedness Constraint:
|
|
No External Entity References</b><br>
|
|
|
|
Attribute values cannot contain direct or indirect entity references
|
|
to external entities.
|
|
|
|
</P>
|
|
<A NAME='CleanAttrVals'></A><P><b>Well-Formedness Constraint:
|
|
No <CODE><</CODE> in Attribute Values</b><br>
|
|
The <A href='#dt-repltext'>replacement text</A> of any entity
|
|
referred to directly or indirectly in an attribute
|
|
value (other than "<CODE>&lt;</CODE>") must not contain
|
|
a <CODE><</CODE>.
|
|
</P>
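<P>The following sketch illustrates the constraint. The names <code>doc</code>, <code>note</code>, and <code>bad</code> are hypothetical; the two instances at the end are alternatives, since a document has only one root element.</P>

```xml
<!-- Hypothetical names (doc, note, bad), for illustration only. -->
<!DOCTYPE doc [
  <!ELEMENT doc EMPTY>
  <!ATTLIST doc note CDATA #IMPLIED>
  <!-- The replacement text of "bad" is:  a < b  -->
  <!ENTITY bad "a &#60; b">
]>

<!-- Acceptable: the only entity referred to is the predefined lt. -->
<doc note="a &lt; b"/>

<!-- Not well-formed if used instead: the replacement text of "bad"
     contains a literal less-than sign. -->
<doc note="&bad;"/>
```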
|
|
<P>An example of a start-tag:
|
|
</p><table cellpadding='5' border='1' bgcolor='#80ffff' width='100%'><tr><td><code><font><termdef id="dt-dog" term="dog"></font></code></td></tr></table><p></P>
|
|
<P><a name='dt-etag'></a>The end of every element
|
|
that begins with a start-tag must
|
|
be marked by an <b>end-tag</b>
|
|
containing a name that echoes the element's type as given in the
|
|
start-tag:
|
|
</p>
|
|
<table cellpadding='5' border='1' bgcolor='#f5dcb3' width='100%'><tr align='left'><td><strong>
|
|
End-tag</strong></td></tr><tr><td>
|
|
<table border='0' bgcolor='#f5dcb3'>
|
|
|
|
<tr valign='top'><td align='right'><a name='NT-ETag'></a>[42] </td><td align='right'><font><code>ETag</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>'</' <a href='#NT-Name'>Name</A>
|
|
<a href='#NT-S'>S</A>? '>'</code></font></td></tr>
|
|
|
|
</table>
|
|
</td></tr></table>
|
|
|
|
<P>An example of an end-tag:</p><table cellpadding='5' border='1' bgcolor='#80ffff' width='100%'><tr><td><code><font></termdef></font></code></td></tr></table><p></P>
|
|
<P><a name='dt-content'></a>The
|
|
<A href='#dt-text'>text</A> between the start-tag and
|
|
end-tag is called the element's
|
|
<b>content</b>:
|
|
</p>
|
|
<table cellpadding='5' border='1' bgcolor='#f5dcb3' width='100%'><tr align='left'><td><strong>
|
|
Content of Elements</strong></td></tr><tr><td>
|
|
<table border='0' bgcolor='#f5dcb3'>
|
|
|
|
<tr valign='top'><td align='right'><a name='NT-content'></a>[43] </td><td align='right'><font><code>content</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>(<a href='#NT-element'>element</A> | <a href='#NT-CharData'>CharData</A>
|
|
| <a href='#NT-Reference'>Reference</A> | <a href='#NT-CDSect'>CDSect</A>
|
|
| <a href='#NT-PI'>PI</A> | <a href='#NT-Comment'>Comment</A>)*</code></font></td>
|
|
</tr>
|
|
|
|
</table>
|
|
</td></tr></table>
|
|
|
|
<P><a name='dt-empty'></a>If an element is <b>empty</b>,
|
|
it must be represented either by a start-tag immediately followed
|
|
by an end-tag or by an empty-element tag.
|
|
<a name='dt-eetag'></a>An
|
|
<b>empty-element tag</b> takes a special form:
|
|
</p>
|
|
<table cellpadding='5' border='1' bgcolor='#f5dcb3' width='100%'><tr align='left'><td><strong>
|
|
Tags for Empty Elements</strong></td></tr><tr><td>
|
|
<table border='0' bgcolor='#f5dcb3'>
|
|
|
|
<tr valign='top'><td align='right'><a name='NT-EmptyElemTag'></a>[44] </td><td align='right'><font><code>EmptyElemTag</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>'<' <a href='#NT-Name'>Name</A> (<a href='#NT-S'>S</A>
|
|
<a href='#NT-Attribute'>Attribute</A>)* <a href='#NT-S'>S</A>?
|
|
'/>'</code></font></td>
|
|
<td align='center'><font><code> [ </code></font></td><td align='left'><font><code>WFC: <a href='#uniqattspec'>Unique Att Spec</A> ]</code></font></td>
|
|
</tr>
|
|
|
|
</table>
|
|
</td></tr></table>
|
|
|
|
<P>Empty-element tags may be used for any element which has no
|
|
content, whether or not it is declared using the keyword
|
|
<code>EMPTY</code>.
|
|
<A href='#dt-interop'>For interoperability</A>, the empty-element
|
|
tag must be used, and can only be used, for elements which are
|
|
<A href='#dt-eldecl'>declared</A> <code>EMPTY</code>.</P>
|
|
<P>Examples of empty elements:
|
|
</p><table cellpadding='5' border='1' bgcolor='#80ffff' width='100%'><tr><td><code><font><IMG align="left"<BR> src="http://www.w3.org/Icons/WWW/w3c_home" /><BR><br></br><BR><br/></font></code></td></tr></table><p></P>
<H3><A NAME='elemdecls'>3.2 Element Type Declarations</a></h3>
|
|
|
|
<P>The <A href='#dt-element'>element</A> structure of an
|
|
<A href='#dt-xml-doc'>XML document</A> may, for
|
|
<A href='#dt-valid'>validation</A> purposes,
|
|
be constrained
|
|
using element type and attribute-list declarations.
|
|
An element type declaration constrains the element's
|
|
<A href='#dt-content'>content</A>.
|
|
</P>
|
|
|
|
<P>Element type declarations often constrain which element types can
|
|
appear as <A href='#dt-parentchild'>children</A> of the element.
|
|
At user option, an XML processor may issue a warning
|
|
when a declaration mentions an element type for which no declaration
|
|
is provided, but this is not an error.</P>
|
|
<P><a name='dt-eldecl'></a>An <b>element
|
|
type declaration</b> takes the form:
|
|
</p>
|
|
<table cellpadding='5' border='1' bgcolor='#f5dcb3' width='100%'><tr align='left'><td><strong>
|
|
Element Type Declaration</strong></td></tr><tr><td>
|
|
<table border='0' bgcolor='#f5dcb3'>
|
|
|
|
<tr valign='top'><td align='right'><a name='NT-elementdecl'></a>[45] </td><td align='right'><font><code>elementdecl</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>'<!ELEMENT' <a href='#NT-S'>S</A>
|
|
<a href='#NT-Name'>Name</A>
|
|
<a href='#NT-S'>S</A>
|
|
<a href='#NT-contentspec'>contentspec</A>
|
|
<a href='#NT-S'>S</A>? '>'</code></font></td>
|
|
<td align='center'><font><code> [ </code></font></td><td align='left'><font><code>VC: <a href='#EDUnique'>Unique Element Type Declaration</A> ]</code></font></td></tr>
|
|
<tr valign='top'><td align='right'><a name='NT-contentspec'></a>[46] </td><td align='right'><font><code>contentspec</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>'EMPTY'
|
|
| 'ANY'
|
|
| <a href='#NT-Mixed'>Mixed</A>
|
|
| <a href='#NT-children'>children</A>
|
|
</code></font></td>
|
|
</tr>
|
|
|
|
</table>
|
|
</td></tr></table>
|
|
<p>
|
|
where the <code><a href='#NT-Name'>Name</A></code> gives the element type
|
|
being declared.
|
|
</P>
|
|
|
|
<A NAME='EDUnique'></A><P><b>Validity Constraint:
|
|
Unique Element Type Declaration</b><br>
|
|
|
|
No element type may be declared more than once.
|
|
|
|
</P>
|
|
|
|
<P>Examples of element type declarations:
|
|
</p><table cellpadding='5' border='1' bgcolor='#80ffff' width='100%'><tr><td><code><font><!ELEMENT br EMPTY><BR><!ELEMENT p (#PCDATA|emph)* ><BR><!ELEMENT %name.para; %content.para; ><BR><!ELEMENT container ANY></font></code></td></tr></table><p></P>
<H4><A NAME='sec-element-content'>3.2.1 Element Content</a></h4>
|
|
|
|
<P><a name='dt-elemcontent'></a>An element <A href='#dt-stag'>type</A> has
|
|
<b>element content</b> when elements of that
|
|
type must contain only <A href='#dt-parentchild'>child</A>
|
|
elements (no character data), optionally separated by
|
|
white space (characters matching the nonterminal
|
|
<code><a href='#NT-S'>S</A></code>).
|
|
|
|
In this case, the
|
|
constraint includes a content model, a simple grammar governing
|
|
the allowed types of the child
|
|
elements and the order in which they are allowed to appear.
|
|
The grammar is built on
|
|
content particles (<code><a href='#NT-cp'>cp</A></code>s), which consist of names,
|
|
choice lists of content particles, or
|
|
sequence lists of content particles:
|
|
</p>
|
|
<table cellpadding='5' border='1' bgcolor='#f5dcb3' width='100%'><tr align='left'><td><strong>
|
|
Element-content Models</strong></td></tr><tr><td>
|
|
<table border='0' bgcolor='#f5dcb3'>
|
|
|
|
<tr valign='top'><td align='right'><a name='NT-children'></a>[47] </td><td align='right'><font><code>children</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>(<a href='#NT-choice'>choice</A>
|
|
| <a href='#NT-seq'>seq</A>)
|
|
('?' | '*' | '+')?</code></font></td></tr>
|
|
<tr valign='top'><td align='right'><a name='NT-cp'></a>[48] </td><td align='right'><font><code>cp</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>(<a href='#NT-Name'>Name</A>
|
|
| <a href='#NT-choice'>choice</A>
|
|
| <a href='#NT-seq'>seq</A>)
|
|
('?' | '*' | '+')?</code></font></td></tr>
|
|
<tr valign='top'><td align='right'><a name='NT-choice'></a>[49] </td><td align='right'><font><code>choice</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>'(' <a href='#NT-S'>S</A>? <a href='#NT-cp'>cp</A>
|
|
( <a href='#NT-S'>S</A>? '|' <a href='#NT-S'>S</A>? <a href='#NT-cp'>cp</A> )*
|
|
<a href='#NT-S'>S</A>? ')'</code></font></td>
|
|
<td align='center'><font><code> [ </code></font></td><td align='left'><font><code>VC: <a href='#vc-PEinGroup'>Proper Group/PE Nesting</A> ]</code></font></td></tr>
|
|
<tr valign='top'><td align='right'><a name='NT-seq'></a>[50] </td><td align='right'><font><code>seq</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>'(' <a href='#NT-S'>S</A>? <a href='#NT-cp'>cp</A>
|
|
( <a href='#NT-S'>S</A>? ',' <a href='#NT-S'>S</A>? <a href='#NT-cp'>cp</A> )*
|
|
<a href='#NT-S'>S</A>? ')'</code></font></td>
|
|
<td align='center'><font><code> [ </code></font></td><td align='left'><font><code>VC: <a href='#vc-PEinGroup'>Proper Group/PE Nesting</A> ]</code></font></td></tr>
|
|
|
|
|
|
</table>
|
|
</td></tr></table>
|
|
<p>
|
|
where each <code><a href='#NT-Name'>Name</A></code> is the type of an element which may
|
|
appear as a <A href='#dt-parentchild'>child</A>.
|
|
Any content
|
|
particle in a choice list may appear in the <A href='#dt-elemcontent'>element content</A> at the location where
|
|
the choice list appears in the grammar;
|
|
content particles occurring in a sequence list must each
|
|
appear in the <A href='#dt-elemcontent'>element content</A> in the
|
|
order given in the list.
|
|
The optional character following a name or list governs
|
|
whether the element or the content particles in the list may occur one
|
|
or more (<CODE>+</CODE>), zero or more (<CODE>*</CODE>), or zero or
|
|
one times (<CODE>?</CODE>).
|
|
The absence of such an operator means that the element or content particle
|
|
must appear exactly once.
|
|
This syntax
|
|
and meaning are identical to those used in the productions in this
|
|
specification.</P>
|
|
<P>
|
|
The content of an element matches a content model if and only if it is
|
|
possible to trace out a path through the content model, obeying the
|
|
sequence, choice, and repetition operators and matching each element in
|
|
the content against an element type in the content model. <A href='#dt-compat'>For compatibility</A>, it is an error
|
|
if an element in the document can
|
|
match more than one occurrence of an element type in the content model.
|
|
For more information, see "<A href='#determinism'>E. Deterministic Content Models</A>".
|
|
|
|
|
|
</P>
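<P>The compatibility rule can be sketched with the content model discussed in the appendix on deterministic content models. The element type names <code>a</code>, <code>b</code>, <code>c</code>, and <code>d</code> are placeholders, and the two declarations are alternatives, not meant to appear in the same DTD:</P>

```xml
<!-- Placeholder element type names; the declarations are alternatives. -->

<!-- An error for compatibility: on seeing an initial b, a processor cannot
     tell which occurrence of b in the model is matched without looking ahead. -->
<!ELEMENT a ((b, c) | (b, d))>

<!-- Accepts the same content, but b occurs only once in the model,
     so every element matches exactly one position. -->
<!ELEMENT a (b, (c | d))>
```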
|
|
<A NAME='vc-PEinGroup'></A><P><b>Validity Constraint:
|
|
Proper Group/PE Nesting</b><br>
|
|
Parameter-entity
|
|
<A href='#dt-repltext'>replacement text</A> must be properly nested
|
|
with parenthesized groups.
|
|
That is to say, if either of the opening or closing parentheses
|
|
in a <code><a href='#NT-choice'>choice</A></code>, <code><a href='#NT-seq'>seq</A></code>, or
|
|
<code><a href='#NT-Mixed'>Mixed</A></code> construct
|
|
is contained in the replacement text for a
|
|
<A href='#dt-PERef'>parameter entity</A>,
|
|
both must be contained in the same replacement text.
|
|
<A href='#dt-interop'>For interoperability</A>,
|
|
if a parameter-entity reference appears in a
|
|
<code><a href='#NT-choice'>choice</A></code>, <code><a href='#NT-seq'>seq</A></code>, or
|
|
<code><a href='#NT-Mixed'>Mixed</A></code> construct, its replacement text
|
|
should not be empty, and
|
|
neither the first nor last non-blank
|
|
character of the replacement text should be a connector
|
|
(<CODE>|</CODE> or <CODE>,</CODE>).
|
|
|
|
</P>
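<P>A sketch of this constraint follows. All entity and element type names are hypothetical, and the declarations are assumed to appear in an external DTD subset, since parameter-entity references inside markup declarations are not permitted in the internal subset.</P>

```xml
<!-- Hypothetical names; assumed to be part of an external DTD subset. -->

<!-- Properly nested: the replacement text contains no parenthesis,
     so the group opens and closes in the declaration itself. -->
<!ENTITY % inline "emph | code">
<!ELEMENT para (#PCDATA | %inline;)*>

<!-- Violates the constraint: the opening parenthesis comes from the
     replacement text, but the closing parenthesis does not. -->
<!ENTITY % open "(head, body">
<!ELEMENT section %open;)>

<!-- Permitted, but discouraged for interoperability: the first
     non-blank character of the replacement text is a connector. -->
<!ENTITY % more "| code">
<!ELEMENT note (#PCDATA | emph %more;)*>
```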
|
|
<P>Examples of element-content models:
|
|
</p><table cellpadding='5' border='1' bgcolor='#80ffff' width='100%'><tr><td><code><font><!ELEMENT spec (front, body, back?)><BR><!ELEMENT div1 (head, (p | list | note)*, div2*)><BR><!ELEMENT dictionary-body (%div.mix; | %dict.mix;)*></font></code></td></tr></table><p></P>
<H4><A NAME='sec-mixed-content'>3.2.2 Mixed Content</a></h4>
|
|
|
|
<P><a name='dt-mixed'></a>An element
|
|
<A href='#dt-stag'>type</A> has
|
|
<b>mixed content</b> when elements of that type may contain
|
|
character data, optionally interspersed with
|
|
<A href='#dt-parentchild'>child</A> elements.
|
|
In this case, the types of the child elements
|
|
may be constrained, but not their order or their number of occurrences:
|
|
</p>
|
|
<table cellpadding='5' border='1' bgcolor='#f5dcb3' width='100%'><tr align='left'><td><strong>
|
|
Mixed-content Declaration</strong></td></tr><tr><td>
|
|
<table border='0' bgcolor='#f5dcb3'>
|
|
|
|
<tr valign='top'><td align='right'><a name='NT-Mixed'></a>[51] </td><td align='right'><font><code>Mixed</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>'(' <a href='#NT-S'>S</A>?
|
|
'#PCDATA'
|
|
(<a href='#NT-S'>S</A>?
|
|
'|'
|
|
<a href='#NT-S'>S</A>?
|
|
<a href='#NT-Name'>Name</A>)*
|
|
<a href='#NT-S'>S</A>?
|
|
')*' </code></font></td>
|
|
</tr><tr valign='top'><td align='right'></td><td></td><td></td><td align='left'><font><code>| '(' <a href='#NT-S'>S</A>? '#PCDATA' <a href='#NT-S'>S</A>? ')'
|
|
</code></font></td><td align='center'><font><code> [ </code></font></td><td align='left'><font><code>VC: <a href='#vc-PEinGroup'>Proper Group/PE Nesting</A> ]</code></font></td>
|
|
</tr><tr valign='top'><td></td><td></td><td></td><td></td><td align='center'><font><code> [ </code></font></td><td align='left'><font><code>VC: <a href='#vc-MixedChildrenUnique'>No Duplicate Types</A> ]</code></font></td>
|
|
</tr>
|
|
|
|
|
|
</table>
|
|
</td></tr></table>
|
|
<p>
|
|
where the <code><a href='#NT-Name'>Name</A></code>s give the types of elements
|
|
that may appear as children.
|
|
</P>
|
|
<A NAME='vc-MixedChildrenUnique'></A><P><b>Validity Constraint:
|
|
No Duplicate Types</b><br>
|
|
The same name must not appear more than once in a single mixed-content
|
|
declaration.
|
|
</P>
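<P>For instance, assuming hypothetical element types <code>a</code>, <code>b</code>, and <code>i</code>, the first declaration below satisfies this constraint and the second does not (the two are alternatives, not meant to appear together):</P>

```xml
<!-- Satisfies the constraint: each name appears once. -->
<!ELEMENT p (#PCDATA | a | b | i)*>

<!-- Violates the constraint: "b" appears twice in the same declaration. -->
<!ELEMENT p (#PCDATA | a | b | b)*>
```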
|
|
<P>Examples of mixed content declarations:
|
|
</p><table cellpadding='5' border='1' bgcolor='#80ffff' width='100%'><tr><td><code><font><!ELEMENT p (#PCDATA|a|ul|b|i|em)*><BR><!ELEMENT p (#PCDATA | %font; | %phrase; | %special; | %form;)* ><BR><!ELEMENT b (#PCDATA)></font></code></td></tr></table><p></P>
<H3><A NAME='attdecls'>3.3 Attribute-List Declarations</a></h3>
|
|
|
|
<P><A href='#dt-attr'>Attributes</A> are used to associate
|
|
name-value pairs with <A href='#dt-element'>elements</A>.
|
|
Attribute specifications may appear only within <A href='#dt-stag'>start-tags</A>
|
|
and <A href='#dt-eetag'>empty-element tags</A>;
|
|
thus, the productions used to
|
|
recognize them appear in "<A href='#sec-starttags'>3.1 Start-Tags, End-Tags, and Empty-Element Tags</A>".
|
|
Attribute-list
|
|
declarations may be used:
|
|
<UL>
|
|
<LI>To define the set of attributes pertaining to a given
|
|
element type.</LI>
|
|
<LI>To establish type constraints for these
|
|
attributes.</LI>
|
|
<LI>To provide <A href='#dt-default'>default values</A>
|
|
for attributes.</LI>
|
|
</UL>
|
|
|
|
|
|
<P><a name='dt-attdecl'></a>
|
|
<b>Attribute-list declarations</b> specify the name, data type, and default
|
|
value (if any) of each attribute associated with a given element type:
|
|
</p>
|
|
<table cellpadding='5' border='1' bgcolor='#f5dcb3' width='100%'><tr align='left'><td><strong>
|
|
Attribute-list Declaration</strong></td></tr><tr><td>
|
|
<table border='0' bgcolor='#f5dcb3'>
|
|
<tr valign='top'><td align='right'><a name='NT-AttlistDecl'></a>[52] </td><td align='right'><font><code>AttlistDecl</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>'<!ATTLIST' <a href='#NT-S'>S</A>
|
|
<a href='#NT-Name'>Name</A>
|
|
<a href='#NT-AttDef'>AttDef</A>*
|
|
<a href='#NT-S'>S</A>? '>'</code></font></td>
|
|
</tr>
|
|
<tr valign='top'><td align='right'><a name='NT-AttDef'></a>[53] </td><td align='right'><font><code>AttDef</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code><a href='#NT-S'>S</A> <a href='#NT-Name'>Name</A>
|
|
<a href='#NT-S'>S</A> <a href='#NT-AttType'>AttType</A>
|
|
<a href='#NT-S'>S</A> <a href='#NT-DefaultDecl'>DefaultDecl</A></code></font></td>
|
|
</tr>
|
|
</table>
|
|
</td></tr></table>
|
|
<p>
|
|
The <code><a href='#NT-Name'>Name</A></code> in the
|
|
<code><a href='#NT-AttlistDecl'>AttlistDecl</A></code> rule is the type of an element. At
|
|
user option, an XML processor may issue a warning if attributes are
|
|
declared for an element type not itself declared, but this is not an
|
|
error. The <code><a href='#NT-Name'>Name</A></code> in the
|
|
<code><a href='#NT-AttDef'>AttDef</A></code> rule is
|
|
the name of the attribute.</P>
|
|
<P>
|
|
When more than one <code><a href='#NT-AttlistDecl'>AttlistDecl</A></code> is provided for a
|
|
given element type, the contents of all those provided are merged. When
|
|
more than one definition is provided for the same attribute of a
|
|
given element type, the first declaration is binding and later
|
|
declarations are ignored.
|
|
<A href='#dt-interop'>For interoperability,</A> writers of DTDs
|
|
may choose to provide at most one attribute-list declaration
|
|
for a given element type, at most one attribute definition
|
|
for a given attribute name, and at least one attribute definition
|
|
in each attribute-list declaration.
|
|
For interoperability, an XML processor may at user option
|
|
issue a warning when more than one attribute-list declaration is
|
|
provided for a given element type, or more than one attribute definition
|
|
is provided
|
|
for a given attribute, but this is not an error.
|
|
</P>
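<P>The merging rule might be sketched as follows, using a hypothetical <code>img</code> element type with attributes <code>src</code> and <code>align</code>:</P>

```xml
<!-- Hypothetical element type and attributes, for illustration only. -->
<!ELEMENT img EMPTY>

<!-- First declaration: src is a required CDATA attribute. -->
<!ATTLIST img src CDATA #REQUIRED>

<!-- Second declaration for the same element type: align is merged in,
     but the repeated definition of src is ignored. -->
<!ATTLIST img align (left | right) "left"
              src   CDATA          #IMPLIED>
```

<P>Under these assumptions the effective attribute list for <code>img</code> is a required <code>src</code> and an <code>align</code> defaulting to <code>left</code>. A processor may, at user option, warn about both the second declaration and the repeated definition.</P>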
|
|
|
|
|
|
|
|
<H4><A NAME='sec-attribute-types'>3.3.1 Attribute Types</a></h4>
|
|
|
|
<P>XML attribute types are of three kinds: a string type, a
|
|
set of tokenized types, and enumerated types. The string type may take
|
|
any literal string as a value; the tokenized types have varying lexical
|
|
and semantic constraints, as noted:
|
|
</p>
|
|
<table cellpadding='5' border='1' bgcolor='#f5dcb3' width='100%'><tr align='left'><td><strong>
|
|
Attribute Types</strong></td></tr><tr><td>
|
|
<table border='0' bgcolor='#f5dcb3'>
|
|
|
|
<tr valign='top'><td align='right'><a name='NT-AttType'></a>[54] </td><td align='right'><font><code>AttType</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code><a href='#NT-StringType'>StringType</A>
|
|
| <a href='#NT-TokenizedType'>TokenizedType</A>
|
|
| <a href='#NT-EnumeratedType'>EnumeratedType</A>
|
|
</code></font></td>
|
|
</tr>
|
|
<tr valign='top'><td align='right'><a name='NT-StringType'></a>[55] </td><td align='right'><font><code>StringType</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>'CDATA'</code></font></td>
|
|
</tr>
|
|
<tr valign='top'><td align='right'><a name='NT-TokenizedType'></a>[56] </td><td align='right'><font><code>TokenizedType</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>'ID'</code></font></td>
|
|
<td align='center'><font><code> [ </code></font></td><td align='left'><font><code>VC: <a href='#id'>ID</A> ]</code></font></td>
|
|
</tr><tr valign='top'><td></td><td></td><td></td><td></td><td align='center'><font><code> [ </code></font></td><td align='left'><font><code>VC: <a href='#one-id-per-el'>One ID per Element Type</A> ]</code></font></td>
|
|
</tr><tr valign='top'><td></td><td></td><td></td><td></td><td align='center'><font><code> [ </code></font></td><td align='left'><font><code>VC: <a href='#id-default'>ID Attribute Default</A> ]</code></font></td>
|
|
</tr><tr valign='top'><td align='right'></td><td></td><td></td><td align='left'><font><code>| 'IDREF'</code></font></td>
|
|
<td align='center'><font><code> [ </code></font></td><td align='left'><font><code>VC: <a href='#idref'>IDREF</A> ]</code></font></td>
|
|
</tr><tr valign='top'><td align='right'></td><td></td><td></td><td align='left'><font><code>| 'IDREFS'</code></font></td>
|
|
<td align='center'><font><code> [ </code></font></td><td align='left'><font><code>VC: <a href='#idref'>IDREF</A> ]</code></font></td>
|
|
</tr><tr valign='top'><td align='right'></td><td></td><td></td><td align='left'><font><code>| 'ENTITY'</code></font></td>
|
|
<td align='center'><font><code> [ </code></font></td><td align='left'><font><code>VC: <a href='#entname'>Entity Name</A> ]</code></font></td>
|
|
</tr><tr valign='top'><td align='right'></td><td></td><td></td><td align='left'><font><code>| 'ENTITIES'</code></font></td>
|
|
<td align='center'><font><code> [ </code></font></td><td align='left'><font><code>VC: <a href='#entname'>Entity Name</A> ]</code></font></td>
|
|
</tr><tr valign='top'><td align='right'></td><td></td><td></td><td align='left'><font><code>| 'NMTOKEN'</code></font></td>
|
|
<td align='center'><font><code> [ </code></font></td><td align='left'><font><code>VC: <a href='#nmtok'>Name Token</A> ]</code></font></td>
|
|
</tr><tr valign='top'><td align='right'></td><td></td><td></td><td align='left'><font><code>| 'NMTOKENS'</code></font></td>
|
|
<td align='center'><font><code> [ </code></font></td><td align='left'><font><code>VC: <a href='#nmtok'>Name Token</A> ]</code></font></td></tr>
|
|
|
|
</table>
|
|
</td></tr></table>
|
|
|
|
<A NAME='id'></A><P><b>Validity Constraint:
|
|
ID</b><br>
|
|
|
|
Values of type <code>ID</code> must match the
|
|
<code><a href='#NT-Name'>Name</A></code> production.
|
|
A name must not appear more than once in
|
|
an XML document as a value of this type; i.e., ID values must uniquely
|
|
identify the elements which bear them.
|
|
|
|
</P>
|
|
<A NAME='one-id-per-el'></A><P><b>Validity Constraint:
|
|
One ID per Element Type</b><br>
|
|
No element type may have more than one ID attribute specified.
|
|
</P>
|
|
<A NAME='id-default'></A><P><b>Validity Constraint:
|
|
ID Attribute Default</b><br>
|
|
An ID attribute must have a declared default of <code>#IMPLIED</code> or
|
|
<code>#REQUIRED</code>.
|
|
</P>
|
|
<A NAME='idref'></A><P><b>Validity Constraint:
|
|
IDREF</b><br>
|
|
|
|
Values of type <code>IDREF</code> must match
|
|
the <code><a href='#NT-Name'>Name</A></code> production, and
|
|
values of type <code>IDREFS</code> must match
|
|
<code><a href='#NT-Names'>Names</A></code>;
|
|
each <code><a href='#NT-Name'>Name</A></code> must match the value of an ID attribute on
|
|
some element in the XML document; i.e., <code>IDREF</code> values must
|
|
match the value of some ID attribute.
|
|
|
|
</P>
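<P>The ID and IDREF constraints can be sketched together. The element types <code>doc</code>, <code>term</code>, and <code>see</code> and their attribute names are hypothetical; the document below is one valid instance under these assumed declarations.</P>

```xml
<!-- Hypothetical declarations and instance, for illustration only. -->
<!DOCTYPE doc [
  <!ELEMENT doc  (term+, see*)>
  <!ELEMENT term (#PCDATA)>
  <!ELEMENT see  EMPTY>
  <!-- One ID attribute per element type, declared #REQUIRED. -->
  <!ATTLIST term id   ID     #REQUIRED>
  <!ATTLIST see  refs IDREFS #REQUIRED
                 main IDREF  #IMPLIED>
]>
<doc>
  <term id="dt-dog">dog</term>
  <term id="dt-cat">cat</term>
  <!-- Every name in refs and main matches an ID value above. -->
  <see refs="dt-dog dt-cat" main="dt-dog"/>
</doc>
```

<P>Under the same assumptions, a second <code>term</code> with <code>id="dt-dog"</code> would violate the ID constraint, and <code>refs="dt-bird"</code> would violate the IDREF constraint because no element bears that ID.</P>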
|
|
<A NAME='entname'></A><P><b>Validity Constraint:
|
|
Entity Name</b><br>
|
|
|
|
Values of type <code>ENTITY</code>
|
|
must match the <code><a href='#NT-Name'>Name</A></code> production, and
|
|
values of type <code>ENTITIES</code> must match
|
|
<code><a href='#NT-Names'>Names</A></code>;
|
|
each <code><a href='#NT-Name'>Name</A></code> must
|
|
match the
|
|
name of an <A href='#dt-unparsed'>unparsed entity</A> declared in the
|
|
<A href='#dt-doctype'>DTD</A>.
|
|
|
|
</P>
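<P>A sketch of an <code>ENTITY</code> attribute follows. The names <code>figure</code>, <code>image</code>, <code>logo</code>, and <code>gif</code>, and both system identifiers, are hypothetical.</P>

```xml
<!-- Hypothetical names and system identifiers, for illustration only. -->
<!DOCTYPE doc [
  <!ELEMENT doc    (figure)>
  <!ELEMENT figure EMPTY>
  <!NOTATION gif SYSTEM "gif-viewer">
  <!-- An unparsed entity: external, with an NDATA notation name. -->
  <!ENTITY logo SYSTEM "logo.gif" NDATA gif>
  <!ATTLIST figure image ENTITY #REQUIRED>
]>
<doc>
  <!-- Valid: "logo" is the name of an unparsed entity declared in the DTD. -->
  <figure image="logo"/>
</doc>
```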
|
|
<A NAME='nmtok'></A><P><b>Validity Constraint:
|
|
Name Token</b><br>
|
|
|
|
Values of type <code>NMTOKEN</code> must match the
|
|
<code><a href='#NT-Nmtoken'>Nmtoken</A></code> production;
|
|
values of type <code>NMTOKENS</code> must
|
|
match <code><a href='#NT-Nmtokens'>Nmtokens</A></code>.
|
|
|
|
</P>
|
|
|
|
<P><a name='dt-enumerated'></a><b>Enumerated attributes</b> can take one
|
|
of a list of values provided in the declaration. There are two
|
|
kinds of enumerated types:
|
|
</p>
|
|
<table cellpadding='5' border='1' bgcolor='#f5dcb3' width='100%'><tr align='left'><td><strong>
|
|
Enumerated Attribute Types</strong></td></tr><tr><td>
|
|
<table border='0' bgcolor='#f5dcb3'>
|
|
<tr valign='top'><td align='right'><a name='NT-EnumeratedType'></a>[57] </td><td align='right'><font><code>EnumeratedType</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code><a href='#NT-NotationType'>NotationType</A>
|
|
| <a href='#NT-Enumeration'>Enumeration</A>
|
|
</code></font></td></tr>
|
|
<tr valign='top'><td align='right'><a name='NT-NotationType'></a>[58] </td><td align='right'><font><code>NotationType</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>'NOTATION'
|
|
<a href='#NT-S'>S</A>
|
|
'('
|
|
<a href='#NT-S'>S</A>?
|
|
<a href='#NT-Name'>Name</A>
|
|
(<a href='#NT-S'>S</A>? '|' <a href='#NT-S'>S</A>?
|
|
<a href='#NT-Name'>Name</A>)*
|
|
<a href='#NT-S'>S</A>? ')'
|
|
</code></font></td>
|
|
<td align='center'><font><code> [ </code></font></td><td align='left'><font><code>VC: <a href='#notatn'>Notation Attributes</A> ]</code></font></td></tr>
|
|
<tr valign='top'><td align='right'><a name='NT-Enumeration'></a>[59] </td><td align='right'><font><code>Enumeration</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>'(' <a href='#NT-S'>S</A>?
|
|
<a href='#NT-Nmtoken'>Nmtoken</A>
|
|
(<a href='#NT-S'>S</A>? '|'
|
|
<a href='#NT-S'>S</A>?
|
|
<a href='#NT-Nmtoken'>Nmtoken</A>)*
|
|
<a href='#NT-S'>S</A>?
|
|
')'</code></font></td>
|
|
<td align='center'><font><code> [ </code></font></td><td align='left'><font><code>VC: <a href='#enum'>Enumeration</A> ]</code></font></td></tr>
|
|
</table>
|
|
</td></tr></table>
|
|
<p>
|
|
A <code>NOTATION</code> attribute identifies a
|
|
<A href='#dt-notation'>notation</A>, declared in the
|
|
DTD with associated system and/or public identifiers, to
|
|
be used in interpreting the element to which the attribute
|
|
is attached.
|
|
</P>
|
|
|
|
<A NAME='notatn'></A><P><b>Validity Constraint:
|
|
Notation Attributes</b><br>
|
|
|
|
Values of this type must match
|
|
one of the <A href='#Notations'>notation</A> names included in
|
|
the declaration; all notation names in the declaration must
|
|
be declared.
|
|
|
|
</P>
|
|
<A NAME='enum'></A><P><b>Validity Constraint:
|
|
Enumeration</b><br>
|
|
|
|
Values of this type
|
|
must match one of the <code><a href='#NT-Nmtoken'>Nmtoken</A></code> tokens in the
|
|
declaration.
|
|
|
|
</P>
|
|
<P><A href='#dt-interop'>For interoperability,</A> the same
|
|
<code><a href='#NT-Nmtoken'>Nmtoken</A></code> should not occur more than once in the
|
|
enumerated attribute types of a single element type.
|
|
</P>
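<P>As an informal illustration of the Enumeration constraint, the following Python sketch parses an enumerated type and checks a value against it. The function names and the ASCII-only approximation of <code>Nmtoken</code> are assumptions made for this example and are not part of this specification.</P>

```python
import re

# ASCII-only approximation of Nmtoken (assumption; the real production
# also admits many non-ASCII name characters).
_NMTOKEN = r"[A-Za-z0-9._:-]+"
_ENUMERATION = re.compile(
    r"\(\s*(%s(?:\s*\|\s*%s)*)\s*\)" % (_NMTOKEN, _NMTOKEN)
)


def parse_enumeration(text):
    """Return the list of tokens in an Enumeration such as '(a|b|c)'."""
    match = _ENUMERATION.fullmatch(text.strip())
    if match is None:
        raise ValueError("not an Enumeration: %r" % text)
    return [token.strip() for token in match.group(1).split("|")]


def is_valid_enumerated_value(value, tokens):
    """VC Enumeration: the value must match one of the declared tokens."""
    return value in tokens


tokens = parse_enumeration("(bullets|ordered|glossary)")
print(is_valid_enumerated_value("ordered", tokens))
print(is_valid_enumerated_value("numbered", tokens))
```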
|
|
|
|
|
|
|
|
|
|
<H4><A NAME='sec-attr-defaults'>3.3.2 Attribute Defaults</a></h4>
|
|
|
|
<P>An <A href='#dt-attdecl'>attribute declaration</A> provides
|
|
information on whether
|
|
the attribute's presence is required, and if not, how an XML processor should
|
|
react if a declared attribute is absent in a document.
|
|
</p>
|
|
<table cellpadding='5' border='1' bgcolor='#f5dcb3' width='100%'><tr align='left'><td><strong>
|
|
Attribute Defaults</strong></td></tr><tr><td>
|
|
<table border='0' bgcolor='#f5dcb3'>
|
|
|
|
<tr valign='top'><td align='right'><a name='NT-DefaultDecl'></a>[60] </td><td align='right'><font><code>DefaultDecl</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>'#REQUIRED'
|
|
| '#IMPLIED' </code></font></td>
|
|
</tr><tr valign='top'><td align='right'></td><td></td><td></td><td align='left'><font><code>| (('#FIXED' S)? <a href='#NT-AttValue'>AttValue</A>)</code></font></td>
|
|
<td align='center'><font><code> [ </code></font></td><td align='left'><font><code>VC: <a href='#RequiredAttr'>Required Attribute</A> ]</code></font></td>
|
|
</tr><tr valign='top'><td></td><td></td><td></td><td></td><td align='center'><font><code> [ </code></font></td><td align='left'><font><code>VC: <a href='#defattrvalid'>Attribute Default Legal</A> ]</code></font></td>
|
|
</tr><tr valign='top'><td></td><td></td><td></td><td></td><td align='center'><font><code> [ </code></font></td><td align='left'><font><code>WFC: <a href='#CleanAttrVals'>No < in Attribute Values</A> ]</code></font></td>
|
|
</tr><tr valign='top'><td></td><td></td><td></td><td></td><td align='center'><font><code> [ </code></font></td><td align='left'><font><code>VC: <a href='#FixedAttr'>Fixed Attribute Default</A> ]</code></font></td>
|
|
</tr>
|
|
|
|
</table>
|
|
</td></tr></table>
|
|
|
|
<P>In an attribute declaration, <code>#REQUIRED</code> means that the
|
|
attribute must always be provided, <code>#IMPLIED</code> that no default
|
|
value is provided.
|
|
|
|
<a name='dt-default'></a>If the
|
|
declaration
|
|
is neither <code>#REQUIRED</code> nor <code>#IMPLIED</code>, then the
|
|
<code><a href='#NT-AttValue'>AttValue</A></code> value contains the declared
|
|
<b>default</b> value; the <code>#FIXED</code> keyword states that
|
|
the attribute must always have the default value.
|
|
If a default value
|
|
is declared, when an XML processor encounters an omitted attribute, it
|
|
is to behave as though the attribute were present with
|
|
the declared default value.</P>
|
|
<A NAME='RequiredAttr'></A><P><b>Validity Constraint:
|
|
Required Attribute</b><br>
|
|
If the default declaration is the keyword <code>#REQUIRED</code>, then
|
|
the attribute must be specified for
|
|
all elements of the type in the attribute-list declaration.
|
|
</P>
|
|
<A NAME='defattrvalid'></A><P><b>Validity Constraint:
|
|
Attribute Default Legal</b><br>
|
|
|
|
The declared
|
|
default value must meet the lexical constraints of the declared attribute type.
|
|
|
|
</P>
|
|
<A NAME='FixedAttr'></A><P><b>Validity Constraint:
|
|
Fixed Attribute Default</b><br>
|
|
If an attribute has a default value declared with the
|
|
<code>#FIXED</code> keyword, instances of that attribute must
|
|
match the default value.
|
|
</P>
|
|
|
|
<P>Examples of attribute-list declarations:
|
|
</p><table cellpadding='5' border='1' bgcolor='#80ffff' width='100%'><tr><td><code><font><!ATTLIST termdef<BR> id ID #REQUIRED<BR> name CDATA #IMPLIED><BR><!ATTLIST list<BR> type (bullets|ordered|glossary) "ordered"><BR><!ATTLIST form<BR> method CDATA #FIXED "POST"></font></code></td></tr></table><p></P>
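<P>The effect of declared defaults can be observed with an existing non-validating processor. The sketch below uses Python's standard <code>xml.etree.ElementTree</code> module, which is built on the Expat parser and, to our understanding, supplies defaults declared in the internal DTD subset. The sample document and variable names are assumptions made for illustration.</P>

```python
import xml.etree.ElementTree as ET

# Two of the declarations from the examples above, placed in an internal
# DTD subset; the <doc> wrapper element is invented for this sketch.
DOC = """<!DOCTYPE doc [
<!ATTLIST list type (bullets|ordered|glossary) "ordered">
<!ATTLIST form method CDATA #FIXED "POST">
]>
<doc><list/><list type="bullets"/><form/></doc>"""

root = ET.fromstring(DOC)

# The first <list> omits "type", so the declared default is reported;
# the second specifies it explicitly.
types = [element.get("type") for element in root.iter("list")]

# The <form> element omits "method", so the #FIXED default is reported.
method = root.find("form").get("method")

print(types, method)
```

<P>Because the processor is non-validating, it does not check the Fixed Attribute Default or Required Attribute constraints; it only behaves as though omitted attributes were present with their declared default values.</P>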
|
|
|
|
|
|
|
|
<H4><A NAME='AVNormalize'>3.3.3 Attribute-Value Normalization</a></h4>
|
|
<P>Before the value of an attribute is passed to the application
|
|
or checked for validity, the
|
|
XML processor must normalize it as follows:
|
|
<UL>
|
|
<LI>a character reference is processed by appending the referenced
|
|
character to the attribute value</LI>
|
|
<LI>an entity reference is processed by recursively processing the
|
|
replacement text of the entity</LI>
|
|
<LI>a whitespace character (#x20, #xD, #xA, #x9) is processed by
|
|
appending #x20 to the normalized value, except that only a single #x20
|
|
is appended for a "#xD#xA" sequence that is part of an external
|
|
parsed entity or the literal entity value of an internal parsed
|
|
entity</LI>
|
|
<LI>other characters are processed by appending them to the normalized
|
|
value
|
|
</LI></UL>
|
|
|
|
|
|
<P>If the declared value is not CDATA, then the XML processor must
|
|
further process the normalized attribute value by discarding any
|
|
leading and trailing space (#x20) characters, and by replacing
|
|
sequences of space (#x20) characters by a single space (#x20)
|
|
character.</P>
|
|
<P>
|
|
All attributes for which no declaration has been read should be treated
|
|
by a non-validating parser as if declared
|
|
<code>CDATA</code>.
|
|
</P>
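<P>The normalization steps above can be sketched in Python as follows. This is a simplified illustration, not a conforming implementation: the function name, the <code>entities</code> mapping (entity name to replacement text) and the <code>is_cdata</code> flag are assumptions, every <code>#xD#xA</code> pair is collapsed to one space regardless of where it occurs, and no well-formedness checking is attempted.</P>

```python
import re

# One token per match: hex char ref, decimal char ref, entity ref,
# white space (a CR LF pair counts once), or any other single character.
_TOKEN = re.compile(
    r"&#x([0-9a-fA-F]+);|&#([0-9]+);|&([^;&\s<]+);|(\r\n|[ \t\r\n])|(.)",
    re.DOTALL,
)


def normalize_attribute_value(raw, entities, is_cdata=True):
    """Normalize a raw attribute value as described in section 3.3.3."""
    out = []

    def walk(text):
        for match in _TOKEN.finditer(text):
            hex_ref, dec_ref, name, space, other = match.groups()
            if hex_ref is not None:
                out.append(chr(int(hex_ref, 16)))   # referenced character
            elif dec_ref is not None:
                out.append(chr(int(dec_ref)))       # referenced character
            elif name is not None:
                walk(entities[name])                # recurse into replacement text
            elif space is not None:
                out.append(" ")                     # white space becomes #x20
            else:
                out.append(other)                   # everything else is copied

    walk(raw)
    value = "".join(out)
    if not is_cdata:
        # Non-CDATA types: trim and collapse runs of #x20.
        value = re.sub(" +", " ", value.strip(" "))
    return value


print(repr(normalize_attribute_value(" x\ty&#65; ", {})))
print(repr(normalize_attribute_value(" x\ty&#65; ", {}, is_cdata=False)))
```

<P>Note that a character reference such as <code>&amp;#xA;</code> contributes the referenced character itself, whereas a literal line feed in the attribute value contributes a space.</P>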
|
|
|
|
|
|
|
|
|
|
<H3><A NAME='sec-condition-sect'>3.4 Conditional Sections</a></h3>
|
|
<P><a name='dt-cond-section'></a>
|
|
<b>Conditional sections</b> are portions of the
|
|
<A href='#dt-doctype'>document type declaration external subset</A>
|
|
which are
|
|
included in, or excluded from, the logical structure of the DTD based on
|
|
the keyword which governs them.
|
|
</p>
|
|
<table cellpadding='5' border='1' bgcolor='#f5dcb3' width='100%'><tr align='left'><td><strong>
|
|
Conditional Section</strong></td></tr><tr><td>
|
|
<table border='0' bgcolor='#f5dcb3'>
|
|
|
|
<tr valign='top'><td align='right'><a name='NT-conditionalSect'></a>[61] </td><td align='right'><font><code>conditionalSect</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code><a href='#NT-includeSect'>includeSect</A>
|
|
| <a href='#NT-ignoreSect'>ignoreSect</A>
|
|
</code></font></td>
|
|
</tr>
|
|
<tr valign='top'><td align='right'><a name='NT-includeSect'></a>[62] </td><td align='right'><font><code>includeSect</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>'<![' S? 'INCLUDE' S? '['
|
|
|
|
<a href='#NT-extSubsetDecl'>extSubsetDecl</A>
|
|
']]>'
|
|
</code></font></td>
|
|
</tr>
|
|
<tr valign='top'><td align='right'><a name='NT-ignoreSect'></a>[63] </td><td align='right'><font><code>ignoreSect</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>'<![' S? 'IGNORE' S? '['
|
|
<a href='#NT-ignoreSectContents'>ignoreSectContents</A>*
|
|
']]>'</code></font></td>
|
|
</tr>
|
|
|
|
<tr valign='top'><td align='right'><a name='NT-ignoreSectContents'></a>[64] </td><td align='right'><font><code>ignoreSectContents</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code><a href='#NT-Ignore'>Ignore</A>
|
|
('<![' <a href='#NT-ignoreSectContents'>ignoreSectContents</A> ']]>'
|
|
<a href='#NT-Ignore'>Ignore</A>)*</code></font></td></tr>
|
|
<tr valign='top'><td align='right'><a name='NT-Ignore'></a>[65] </td><td align='right'><font><code>Ignore</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code><a href='#NT-Char'>Char</A>* -
|
|
(<a href='#NT-Char'>Char</A>* ('<![' | ']]>')
|
|
<a href='#NT-Char'>Char</A>*)
|
|
</code></font></td></tr>
|
|
|
|
|
|
</table>
|
|
</td></tr></table>
|
|
|
|
<P>Like the internal and external DTD subsets, a conditional section
|
|
may contain one or more complete declarations,
|
|
comments, processing instructions,
|
|
or nested conditional sections, intermingled with white space.
|
|
</P>
|
|
<P>If the keyword of the
|
|
conditional section is <code>INCLUDE</code>, then the contents of the conditional
|
|
section are part of the DTD.
|
|
If the keyword of the conditional
|
|
section is <code>IGNORE</code>, then the contents of the conditional section are
|
|
not logically part of the DTD.
|
|
Note that for reliable parsing, the contents of even ignored
|
|
conditional sections must be read in order to
|
|
detect nested conditional sections and ensure that the end of the
|
|
outermost (ignored) conditional section is properly detected.
|
|
If a conditional section with a
|
|
keyword of <code>INCLUDE</code> occurs within a larger conditional
|
|
section with a keyword of <code>IGNORE</code>, both the outer and the
|
|
inner conditional sections are ignored.</P>
|
|
<P>If the keyword of the conditional section is a
|
|
parameter-entity reference, the parameter entity must be replaced by its
|
|
content before the processor decides whether to
|
|
include or ignore the conditional section.</P>
|
|
<P>An example:
|
|
</p><table cellpadding='5' border='1' bgcolor='#80ffff' width='100%'><tr><td><code><font><!ENTITY % draft 'INCLUDE' ><BR><!ENTITY % final 'IGNORE' ><BR> <BR><![%draft;[<BR><!ELEMENT book (comments*, title, body, supplements?)><BR>]]><BR><![%final;[<BR><!ELEMENT book (title, body, supplements?)><BR>]]><BR></font></code></td></tr></table><p>
|
|
</P>
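<P>The handling of nested and parameterized conditional sections can be sketched in Python as follows. The function name and the <code>params</code> mapping (parameter-entity name to replacement text) are assumptions made for this example. The sketch is deliberately naive: it does not recognize comments or literals, so a section delimiter appearing inside them in an included section would be misread.</P>

```python
import re

_DELIMITER = re.compile(r"<!\[|\]\]>")
_KEYWORD = re.compile(r"\s*(%?[^\s\[;]+;?)\s*\[")


def resolve_conditional_sections(dtd, params):
    """Return the DTD text with conditional sections resolved."""
    kept = []
    stack = []          # one flag per open section: True if it is ignored
    pos = 0
    while True:
        match = _DELIMITER.search(dtd, pos)
        end = match.start() if match else len(dtd)
        if not any(stack):
            kept.append(dtd[pos:end])       # text outside ignored sections
        if match is None:
            break
        if match.group() == "]]>":
            if not stack:
                raise ValueError("unmatched ]]>")
            stack.pop()
            pos = match.end()
        elif any(stack):
            # Inside an ignored section only the nesting is tracked;
            # an inner INCLUDE is ignored along with its parent.
            stack.append(True)
            pos = match.end()
        else:
            keyword = _KEYWORD.match(dtd, match.end())
            if keyword is None:
                raise ValueError("malformed conditional section")
            word = keyword.group(1)
            if word.startswith("%") and word.endswith(";"):
                word = params[word[1:-1]].strip()   # replace the PE first
            if word not in ("INCLUDE", "IGNORE"):
                raise ValueError("unknown keyword: %r" % word)
            stack.append(word == "IGNORE")
            pos = keyword.end()
    if stack:
        raise ValueError("unterminated conditional section")
    return "".join(kept)


example = (
    "<![%draft;[\n"
    "<!ELEMENT book (comments*, title, body, supplements?)>\n"
    "]]>\n"
    "<![%final;[\n"
    "<!ELEMENT book (title, body, supplements?)>\n"
    "]]>\n"
)
resolved = resolve_conditional_sections(
    example, {"draft": "INCLUDE", "final": "IGNORE"}
)
print(resolved)
```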
|
|
|
|
<H2><A NAME='sec-physical-struct'>4. Physical Structures</a></h2>
|
|
|
|
<P><a name='dt-entity'></a>An XML document may consist
|
|
of one or many storage units. These are called
|
|
<b>entities</b>; they all have <b>content</b> and are all
|
|
(except for the document entity, see below, and
|
|
the <A href='#dt-doctype'>external DTD subset</A>)
|
|
identified by <b>name</b>.
|
|
|
|
Each XML document has one entity
|
|
called the <A href='#dt-docent'>document entity</A>, which serves
|
|
as the starting point for the <A href='#dt-xml-proc'>XML
|
|
processor</A> and may contain the whole document.</P>
|
|
<P>Entities may be either parsed or unparsed.
|
|
<a name='dt-parsedent'></a>A <b>parsed entity's</b>
|
|
contents are referred to as its
|
|
<A href='#dt-repltext'>replacement text</A>;
|
|
this <A href='#dt-text'>text</A> is considered an
|
|
integral part of the document.</P>
|
|
|
|
<P><a name='dt-unparsed'></a>An
|
|
<b>unparsed entity</b>
|
|
is a resource whose contents may or may not be
|
|
<A href='#dt-text'>text</A>, and if text, may be other than XML.
|
|
Each unparsed entity
|
|
has an associated <A href='#dt-notation'>notation</A>, identified by name.
|
|
Beyond a requirement
|
|
that an XML processor make the identifiers for the entity and
|
|
notation available to the application,
|
|
XML places no constraints on the contents of unparsed entities.
|
|
</P>
|
|
<P>
|
|
Parsed entities are invoked by name using entity references;
|
|
unparsed entities by name, given in the value of <code>ENTITY</code>
|
|
or <code>ENTITIES</code>
|
|
attributes.</P>
|
|
<P><a name='gen-entity'></a><b>General entities</b>
|
|
are entities for use within the document content.
|
|
In this specification, general entities are sometimes referred
|
|
to with the unqualified term <EM>entity</EM> when this leads
|
|
to no ambiguity.
|
|
<a name='dt-PE'></a>Parameter entities
|
|
are parsed entities for use within the DTD.
|
|
These two types of entities use different forms of reference and
|
|
are recognized in different contexts.
|
|
Furthermore, they occupy different namespaces; a parameter entity and
|
|
a general entity with the same name are two distinct entities.
|
|
</P>
|
|
|
|
|
|
|
|
<H3><A NAME='sec-references'>4.1 Character and Entity References</a></h3>
|
|
<P><a name='dt-charref'></a>
|
|
A <b>character reference</b> refers to a specific character in the
|
|
ISO/IEC 10646 character set, for example one not directly accessible from
|
|
available input devices.
|
|
</p>
|
|
<table cellpadding='5' border='1' bgcolor='#f5dcb3' width='100%'><tr align='left'><td><strong>
|
|
Character Reference</strong></td></tr><tr><td>
|
|
<table border='0' bgcolor='#f5dcb3'>
|
|
<tr valign='top'><td align='right'><a name='NT-CharRef'></a>[66] </td><td align='right'><font><code>CharRef</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>'&#' [0-9]+ ';' </code></font></td>
|
|
</tr><tr valign='top'><td align='right'></td><td></td><td></td><td align='left'><font><code>| '&#x' [0-9a-fA-F]+ ';'</code></font></td>
|
|
<td align='center'><font><code> [ </code></font></td><td align='left'><font><code>WFC: <a href='#wf-Legalchar'>Legal Character</A> ]</code></font></td>
|
|
</tr>
|
|
</table>
|
|
</td></tr></table>
|
|
<p>
|
|
<A NAME='wf-Legalchar'></A><P><b>Well-Formedness Constraint:
|
|
Legal Character</b><br>
|
|
Characters referred to using character references must
|
|
match the production for
|
|
<A href='#NT-Char'>Char</A>.
|
|
</P>
|
|
If the character reference begins with "<CODE>&#x</CODE>", the digits and
|
|
letters up to the terminating <CODE>;</CODE> provide a hexadecimal
|
|
representation of the character's code point in ISO/IEC 10646.
|
|
If it begins just with "<CODE>&#</CODE>", the digits up to the terminating
|
|
<CODE>;</CODE> provide a decimal representation of the character's
|
|
code point.
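<P>As an illustration, decoding a character reference and applying the Legal Character constraint might look like the following Python sketch. The function names are assumptions made for this example; the ranges in <code>is_xml_char</code> follow the <code>Char</code> production.</P>

```python
import re

_CHAR_REF = re.compile(r"&#(?:x([0-9a-fA-F]+)|([0-9]+));")


def is_xml_char(code_point):
    """True if the code point matches the Char production."""
    return (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


def decode_char_ref(reference):
    """Return the character named by a decimal or hexadecimal reference."""
    match = _CHAR_REF.fullmatch(reference)
    if match is None:
        raise ValueError("not a character reference: %r" % reference)
    hex_digits, dec_digits = match.groups()
    code_point = int(hex_digits, 16) if hex_digits is not None else int(dec_digits)
    if not is_xml_char(code_point):
        raise ValueError("WFC Legal Character violated: %r" % reference)
    return chr(code_point)


print(decode_char_ref("&#x3C;"), decode_char_ref("&#60;"))
```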
|
|
|
|
|
|
<P><a name='dt-entref'></a>An <b>entity
|
|
reference</b> refers to the content of a named entity.
|
|
<a name='dt-GERef'></a>References to
|
|
parsed general entities
|
|
use ampersand (<CODE>&</CODE>) and semicolon (<CODE>;</CODE>) as
|
|
delimiters.
|
|
<a name='dt-PERef'></a>
|
|
<b>Parameter-entity references</b> use percent-sign (<CODE>%</CODE>) and
|
|
semicolon
|
|
(<CODE>;</CODE>) as delimiters.
|
|
</P>
|
|
|
|
<table cellpadding='5' border='1' bgcolor='#f5dcb3' width='100%'><tr align='left'><td><strong>
|
|
Entity Reference</strong></td></tr><tr><td>
|
|
<table border='0' bgcolor='#f5dcb3'>
|
|
<tr valign='top'><td align='right'><a name='NT-Reference'></a>[67] </td><td align='right'><font><code>Reference</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code><a href='#NT-EntityRef'>EntityRef</A>
|
|
| <a href='#NT-CharRef'>CharRef</A></code></font></td></tr>
|
|
<tr valign='top'><td align='right'><a name='NT-EntityRef'></a>[68] </td><td align='right'><font><code>EntityRef</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>'&' <a href='#NT-Name'>Name</A> ';'</code></font></td>
|
|
<td align='center'><font><code> [ </code></font></td><td align='left'><font><code>WFC: <a href='#wf-entdeclared'>Entity Declared</A> ]</code></font></td>
|
|
</tr><tr valign='top'><td></td><td></td><td></td><td></td><td align='center'><font><code> [ </code></font></td><td align='left'><font><code>VC: <a href='#vc-entdeclared'>Entity Declared</A> ]</code></font></td>
|
|
</tr><tr valign='top'><td></td><td></td><td></td><td></td><td align='center'><font><code> [ </code></font></td><td align='left'><font><code>WFC: <a href='#textent'>Parsed Entity</A> ]</code></font></td>
|
|
</tr><tr valign='top'><td></td><td></td><td></td><td></td><td align='center'><font><code> [ </code></font></td><td align='left'><font><code>WFC: <a href='#norecursion'>No Recursion</A> ]</code></font></td>
|
|
</tr>
|
|
<tr valign='top'><td align='right'><a name='NT-PEReference'></a>[69] </td><td align='right'><font><code>PEReference</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>'%' <a href='#NT-Name'>Name</A> ';'</code></font></td>
|
|
<td align='center'><font><code> [ </code></font></td><td align='left'><font><code>VC: <a href='#vc-entdeclared'>Entity Declared</A> ]</code></font></td>
|
|
</tr><tr valign='top'><td></td><td></td><td></td><td></td><td align='center'><font><code> [ </code></font></td><td align='left'><font><code>WFC: <a href='#norecursion'>No Recursion</A> ]</code></font></td>
|
|
</tr><tr valign='top'><td></td><td></td><td></td><td></td><td align='center'><font><code> [ </code></font></td><td align='left'><font><code>WFC: <a href='#indtd'>In DTD</A> ]</code></font></td>
|
|
</tr>
|
|
</table>
|
|
</td></tr></table>
|
|
|
|
|
|
<A NAME='wf-entdeclared'></A><P><b>Well-Formedness Constraint:
|
|
Entity Declared</b><br>
|
|
In a document without any DTD, a document with only an internal
|
|
DTD subset which contains no parameter entity references, or a document with
|
|
"<CODE>standalone='yes'</CODE>",
|
|
the <code><a href='#NT-Name'>Name</A></code> given in the entity reference must
|
|
<A href='#dt-match'>match</A> that in an
|
|
<A href='#sec-entity-decl'>entity declaration</A>, except that
|
|
well-formed documents need not declare
|
|
any of the following entities: <CODE>amp</CODE>,
|
|
<CODE>lt</CODE>,
|
|
<CODE>gt</CODE>,
|
|
<CODE>apos</CODE>,
|
|
<CODE>quot</CODE>.
|
|
The declaration of a parameter entity must precede any reference to it.
|
|
Similarly, the declaration of a general entity must precede any
|
|
reference to it which appears in a default value in an attribute-list
|
|
declaration.
|
|
Note that if entities are declared in the external subset or in
|
|
external parameter entities, a non-validating processor is
|
|
<A href='#include-if-valid'>not obligated to</A> read
|
|
and process their declarations; for such documents, the rule that
|
|
an entity must be declared is a well-formedness constraint only
|
|
if <A href='#sec-rmd'>standalone='yes'</A>.
|
|
</P>
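<P>The Entity Declared constraint can be observed with a non-validating processor. The sketch below uses Python's standard <code>xml.etree.ElementTree</code> module (built on Expat); the sample documents and the <code>is_well_formed</code> helper are assumptions made for illustration.</P>

```python
import xml.etree.ElementTree as ET


def is_well_formed(document):
    """Report whether the processor accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# The predefined entities need no declaration.
predefined = ET.fromstring("<d>&lt;&amp;</d>").text

# A declared internal entity is expanded in content.
declared = ET.fromstring(
    '<!DOCTYPE d [<!ENTITY docdate "10 February 1998">]><d>&docdate;</d>'
).text

# In a document without any DTD, an undeclared entity is a
# well-formedness error.
undeclared_ok = is_well_formed("<d>&docdate;</d>")

print(predefined, declared, undeclared_ok)
```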
|
|
<A NAME='vc-entdeclared'></A><P><b>Validity Constraint:
|
|
Entity Declared</b><br>
|
|
In a document with an external subset or external parameter
|
|
entities with "<CODE>standalone='no'</CODE>",
|
|
the <code><a href='#NT-Name'>Name</A></code> given in the entity reference must <A href='#dt-match'>match</A> that in an
|
|
<A href='#sec-entity-decl'>entity declaration</A>.
|
|
For interoperability, valid documents should declare the entities
|
|
<CODE>amp</CODE>,
|
|
<CODE>lt</CODE>,
|
|
<CODE>gt</CODE>,
|
|
<CODE>apos</CODE>,
|
|
<CODE>quot</CODE>, in the form
|
|
specified in "<A href='#sec-predefined-ent'>4.6 Predefined Entities</A>".
|
|
The declaration of a parameter entity must precede any reference to it.
|
|
Similarly, the declaration of a general entity must precede any
|
|
reference to it which appears in a default value in an attribute-list
|
|
declaration.
|
|
</P>
|
|
|
|
<A NAME='textent'></A><P><b>Well-Formedness Constraint:
|
|
Parsed Entity</b><br>
|
|
|
|
An entity reference must not contain the name of an <A href='#dt-unparsed'>unparsed entity</A>. Unparsed entities may be referred
|
|
to only in <A href='#dt-attrval'>attribute values</A> declared to
|
|
be of type <code>ENTITY</code> or <code>ENTITIES</code>.
|
|
|
|
</P>
|
|
<A NAME='norecursion'></A><P><b>Well-Formedness Constraint:
|
|
No Recursion</b><br>
|
|
|
|
A parsed entity must not contain a recursive reference to itself,
|
|
either directly or indirectly.
|
|
|
|
</P>
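<P>A processor might detect direct or indirect recursion with a depth-first walk over the declared entities, as in the following Python sketch. The function name, the <code>entities</code> mapping (entity name to replacement text) and the ASCII-only approximation of <code>Name</code> are assumptions made for this example.</P>

```python
import re

# ASCII-only approximation of an entity reference (assumption).
_ENTITY_REF = re.compile(r"&([A-Za-z_:][A-Za-z0-9._:-]*);")


def check_no_recursion(entities):
    """Raise ValueError if any entity refers to itself, directly or not."""
    state = {}      # name -> "visiting" or "done"

    def visit(name, path):
        if state.get(name) == "done":
            return
        if state.get(name) == "visiting":
            chain = " -> ".join(path + [name])
            raise ValueError("recursive entity reference: " + chain)
        state[name] = "visiting"
        for referenced in _ENTITY_REF.findall(entities[name]):
            if referenced in entities:
                visit(referenced, path + [name])
        state[name] = "done"

    for name in entities:
        visit(name, [])


check_no_recursion({"a": "see &b;", "b": "plain text"})    # passes silently
try:
    check_no_recursion({"a": "see &b;", "b": "back to &a;"})
except ValueError as error:
    print(error)
```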
|
|
<A NAME='indtd'></A><P><b>Well-Formedness Constraint:
|
|
In DTD</b><br>
|
|
|
|
Parameter-entity references may appear only in the
|
|
<A href='#dt-doctype'>DTD</A>.
|
|
|
|
</P>
|
|
<P>Examples of character and entity references:
|
|
</p><table cellpadding='5' border='1' bgcolor='#80ffff' width='100%'><tr><td><code><font>Type <key>less-than</key> (&#x3C;) to save options.<BR>This document was prepared on &docdate; and<BR>is classified &security-level;.</font></code></td></tr></table><p></P>
|
|
<P>Example of a parameter-entity reference:
|
|
</p><table cellpadding='5' border='1' bgcolor='#80ffff' width='100%'><tr><td><code><font><!-- declare the parameter entity "ISOLat2"... --><BR><!ENTITY % ISOLat2<BR> SYSTEM "http://www.xml.com/iso/isolat2-xml.entities" ><BR><!-- ... now reference it. --><BR>%ISOLat2;</font></code></td></tr></table><p></P>
|
|
|
|
|
|
|
|
|
|
<H3><A NAME='sec-entity-decl'>4.2 Entity Declarations</a></h3>
|
|
|
|
<P><a name='dt-entdecl'></a>
|
|
Entities are declared thus:
|
|
</p>
|
|
<table cellpadding='5' border='1' bgcolor='#f5dcb3' width='100%'><tr align='left'><td><strong>
|
|
Entity Declaration</strong></td></tr><tr><td>
|
|
<table border='0' bgcolor='#f5dcb3'>
|
|
|
|
<tr valign='top'><td align='right'><a name='NT-EntityDecl'></a>[70] </td><td align='right'><font><code>EntityDecl</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code><a href='#NT-GEDecl'>GEDecl</A> | <a href='#NT-PEDecl'>PEDecl</A></code></font></td>
|
|
|
|
</tr>
|
|
<tr valign='top'><td align='right'><a name='NT-GEDecl'></a>[71] </td><td align='right'><font><code>GEDecl</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>'<!ENTITY' <a href='#NT-S'>S</A> <a href='#NT-Name'>Name</A>
|
|
<a href='#NT-S'>S</A> <a href='#NT-EntityDef'>EntityDef</A>
|
|
<a href='#NT-S'>S</A>? '>'</code></font></td>
|
|
</tr>
|
|
<tr valign='top'><td align='right'><a name='NT-PEDecl'></a>[72] </td><td align='right'><font><code>PEDecl</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>'<!ENTITY' <a href='#NT-S'>S</A> '%' <a href='#NT-S'>S</A>
|
|
<a href='#NT-Name'>Name</A> <a href='#NT-S'>S</A>
|
|
<a href='#NT-PEDef'>PEDef</A> <a href='#NT-S'>S</A>? '>'</code></font></td>
|
|
|
|
</tr>
|
|
<tr valign='top'><td align='right'><a name='NT-EntityDef'></a>[73] </td><td align='right'><font><code>EntityDef</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code><a href='#NT-EntityValue'>EntityValue</A>
|
|
| (<a href='#NT-ExternalID'>ExternalID</A>
|
|
<a href='#NT-NDataDecl'>NDataDecl</A>?)</code></font></td>
|
|
|
|
</tr>
|
|
|
|
<tr valign='top'><td align='right'><a name='NT-PEDef'></a>[74] </td><td align='right'><font><code>PEDef</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code><a href='#NT-EntityValue'>EntityValue</A>
|
|
| <a href='#NT-ExternalID'>ExternalID</A></code></font></td></tr>
|
|
|
|
</table>
|
|
</td></tr></table>
|
|
<p>
|
|
The <code><a href='#NT-Name'>Name</A></code> identifies the entity in an
|
|
<A href='#dt-entref'>entity reference</A> or, in the case of an
|
|
unparsed entity, in the value of an <code>ENTITY</code> or <code>ENTITIES</code>
|
|
attribute.
|
|
If the same entity is declared more than once, the first declaration
|
|
encountered is binding; at user option, an XML processor may issue a
|
|
warning if entities are declared multiple times.
|
|
</P>
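<P>The rule that the first declaration is binding can be observed with Python's standard <code>xml.etree.ElementTree</code> module (built on Expat). The entity name and sample document below are assumptions made for illustration.</P>

```python
import xml.etree.ElementTree as ET

# The same general entity is declared twice in the internal subset.
DOC = (
    '<!DOCTYPE d ['
    '<!ENTITY status "first">'
    '<!ENTITY status "second">'
    ']><d>&status;</d>'
)

# Only the first declaration takes effect; the second is ignored.
value = ET.fromstring(DOC).text
print(value)
```

<P>This behavior is what allows an internal DTD subset, which is read first, to override entity declarations in the external subset.</P>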
|
|
|
|
|
|
|
|
<H4><A NAME='sec-internal-ent'>4.2.1 Internal Entities</a></h4>
|
|
|
|
<P><a name='dt-internent'></a>If
|
|
the entity definition is an
|
|
<code><a href='#NT-EntityValue'>EntityValue</A></code>,
|
|
the defined entity is called an <b>internal entity</b>.
|
|
There is no separate physical
|
|
storage object, and the content of the entity is given in the
|
|
declaration.
|
|
Note that some processing of entity and character references in the
|
|
<A href='#dt-litentval'>literal entity value</A> may be required to
|
|
produce the correct <A href='#dt-repltext'>replacement
|
|
text</A>: see "<A href='#intern-replacement'>4.5 Construction of Internal Entity Replacement Text</A>".
|
|
</P>
|
|
<P>An internal entity is a <A href='#dt-parsedent'>parsed
|
|
entity</A>.</P>
|
|
<P>Example of an internal entity declaration:
|
|
</p><table cellpadding='5' border='1' bgcolor='#80ffff' width='100%'><tr><td><code><font><!ENTITY Pub-Status "This is a pre-release of the<BR> specification."></font></code></td></tr></table><p></P>
|
|
|
|
|
|
|
|
|
|
<H4><A NAME='sec-external-ent'>4.2.2 External Entities</a></h4>
|
|
|
|
<P><a name='dt-extent'></a>If the entity is not
|
|
internal, it is an <b>external
|
|
entity</b>, declared as follows:
|
|
</p>
|
|
<table cellpadding='5' border='1' bgcolor='#f5dcb3' width='100%'><tr align='left'><td><strong>
|
|
External Entity Declaration</strong></td></tr><tr><td>
|
|
<table border='0' bgcolor='#f5dcb3'>
|
|
|
|
<tr valign='top'><td align='right'><a name='NT-ExternalID'></a>[75] </td><td align='right'><font><code>ExternalID</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>'SYSTEM' <a href='#NT-S'>S</A>
|
|
<a href='#NT-SystemLiteral'>SystemLiteral</A></code></font></td>
|
|
</tr><tr valign='top'><td align='right'></td><td></td><td></td><td align='left'><font><code>| 'PUBLIC' <a href='#NT-S'>S</A>
|
|
<a href='#NT-PubidLiteral'>PubidLiteral</A>
|
|
<a href='#NT-S'>S</A>
|
|
<a href='#NT-SystemLiteral'>SystemLiteral</A>
|
|
</code></font></td>
|
|
</tr>
|
|
<tr valign='top'><td align='right'><a name='NT-NDataDecl'></a>[76] </td><td align='right'><font><code>NDataDecl</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code><a href='#NT-S'>S</A> 'NDATA' <a href='#NT-S'>S</A>
|
|
<a href='#NT-Name'>Name</A></code></font></td>
|
|
<td align='center'><font><code> [ </code></font></td><td align='left'><font><code>VC: <a href='#not-declared'>Notation Declared</A> ]</code></font></td></tr>
|
|
</table>
|
|
</td></tr></table>
|
|
<p>
|
|
If the <code><a href='#NT-NDataDecl'>NDataDecl</A></code> is present, this is a
|
|
general <A href='#dt-unparsed'>unparsed
|
|
entity</A>; otherwise it is a parsed entity.</P>
|
|
<A NAME='not-declared'></A><P><b>Validity Constraint:
|
|
Notation Declared</b><br>
|
|
|
|
The <code><a href='#NT-Name'>Name</A></code> must match the declared name of a
|
|
<A href='#dt-notation'>notation</A>.
|
|
|
|
</P>
|
|
<P><a name='dt-sysid'></a>The
|
|
<code><a href='#NT-SystemLiteral'>SystemLiteral</A></code>
|
|
is called the entity's <b>system identifier</b>. It is a URI,
|
|
which may be used to retrieve the entity.
|
|
Note that the hash mark (<CODE>#</CODE>) and fragment identifier
|
|
frequently used with URIs are not, formally, part of the URI itself;
|
|
an XML processor may signal an error if a fragment identifier is
|
|
given as part of a system identifier.
|
|
Unless otherwise provided by information outside the scope of this
|
|
specification (e.g. a special XML element type defined by a particular
|
|
DTD, or a processing instruction defined by a particular application
|
|
specification), relative URIs are relative to the location of the
|
|
resource within which the entity declaration occurs.
|
|
A URI might thus be relative to the
|
|
<A href='#dt-docent'>document entity</A>, to the entity
|
|
containing the <A href='#dt-doctype'>external DTD subset</A>,
|
|
or to some other <A href='#dt-extent'>external parameter entity</A>.
|
|
</P>
|
|
<P>An XML processor should handle a non-ASCII character in a URI by
|
|
representing the character in UTF-8 as one or more bytes, and then
|
|
escaping these bytes with the URI escaping mechanism (i.e., by
|
|
converting each byte to %HH, where HH is the hexadecimal notation of the
|
|
byte value).</P>
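<P>The resolution of a relative system identifier and the escaping of non-ASCII characters can be sketched with Python's standard <code>urllib.parse</code> module. The <code>escape_non_ascii</code> helper and the sample URIs are assumptions made for this example.</P>

```python
from urllib.parse import quote, urljoin


def escape_non_ascii(uri):
    """Escape each non-ASCII character as the %HH form of its UTF-8 bytes."""
    return "".join(
        char if ord(char) < 128 else quote(char, safe="")
        for char in uri
    )


# A relative system identifier is resolved against the location of the
# resource in which the entity declaration occurs.
declaring_resource = "http://www.textuality.com/boilerplate/OpenHatch.xml"
resolved = urljoin(declaring_resource, "../grafix/OpenHatch.gif")

escaped = escape_non_ascii("http://example.org/caf\u00e9.xml")
print(resolved)
print(escaped)
```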
|
|
<P><a name='dt-pubid'></a>
|
|
In addition to a system identifier, an external identifier may
|
|
include a <b>public identifier</b>.
|
|
An XML processor attempting to retrieve the entity's content may use the public
|
|
identifier to try to generate an alternative URI. If the processor
|
|
is unable to do so, it must use the URI specified in the system
|
|
literal. Before a match is attempted, all strings
|
|
of white space in the public identifier must be normalized to single space characters (#x20),
|
|
and leading and trailing white space must be removed.</P>
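<P>The normalization of a public identifier before matching can be sketched in Python as follows. The function name is an assumption made for this example.</P>

```python
import re


def normalize_public_id(public_id):
    """Collapse runs of white space to one #x20 and trim both ends."""
    collapsed = re.sub(r"[ \t\r\n]+", " ", public_id)
    return collapsed.strip(" ")


raw = "  -//Textuality//TEXT Standard\n   open-hatch boilerplate//EN "
print(normalize_public_id(raw))
```

<P>Two public identifiers that differ only in white space therefore compare equal after normalization.</P>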
|
|
<P>Examples of external entity declarations:
|
|
</p><table cellpadding='5' border='1' bgcolor='#80ffff' width='100%'><tr><td><code><font><!ENTITY open-hatch<BR> SYSTEM "http://www.textuality.com/boilerplate/OpenHatch.xml"><BR><!ENTITY open-hatch<BR> PUBLIC "-//Textuality//TEXT Standard open-hatch boilerplate//EN"<BR> "http://www.textuality.com/boilerplate/OpenHatch.xml"><BR><!ENTITY hatch-pic<BR> SYSTEM "../grafix/OpenHatch.gif"<BR> NDATA gif ></font></code></td></tr></table><p></P>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<H3><A NAME='TextEntities'>4.3 Parsed Entities</a></h3>
|
|
|
|
|
|
<H4><A NAME='sec-TextDecl'>4.3.1 The Text Declaration</a></h4>
|
|
<P>External parsed entities may each begin with a <b>text
|
|
declaration</b>.
|
|
</p>
|
|
<table cellpadding='5' border='1' bgcolor='#f5dcb3' width='100%'><tr align='left'><td><strong>
|
|
Text Declaration</strong></td></tr><tr><td>
|
|
<table border='0' bgcolor='#f5dcb3'>
|
|
|
|
<tr valign='top'><td align='right'><a name='NT-TextDecl'></a>[77] </td><td align='right'><font><code>TextDecl</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>'<?xml'
|
|
<a href='#NT-VersionInfo'>VersionInfo</A>?
|
|
<a href='#NT-EncodingDecl'>EncodingDecl</A>
|
|
<a href='#NT-S'>S</A>? '?>'</code></font></td>
|
|
</tr>
|
|
|
|
</table>
|
|
</td></tr></table>
|
|
|
|
<P>The text declaration must be provided literally, not
|
|
by reference to a parsed entity.
|
|
No text declaration may appear at any position other than the beginning of
|
|
an external parsed entity.</P>
|
|
|
|
|
|
|
|
<H4><A NAME='wf-entities'>4.3.2 Well-Formed Parsed Entities</a></h4>
|
|
<P>The document entity is well-formed if it matches the production labeled
|
|
<code><a href='#NT-document'>document</A></code>.
|
|
An external general
|
|
parsed entity is well-formed if it matches the production labeled
|
|
<code><a href='#NT-extParsedEnt'>extParsedEnt</A></code>.
|
|
An external parameter
|
|
entity is well-formed if it matches the production labeled
|
|
<code><a href='#NT-extPE'>extPE</A></code>.
|
|
</p>
|
|
<table cellpadding='5' border='1' bgcolor='#f5dcb3' width='100%'><tr align='left'><td><strong>
|
|
Well-Formed External Parsed Entity</strong></td></tr><tr><td>
|
|
<table border='0' bgcolor='#f5dcb3'>
|
|
<tr valign='top'><td align='right'><a name='NT-extParsedEnt'></a>[78] </td><td align='right'><font><code>extParsedEnt</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code><a href='#NT-TextDecl'>TextDecl</A>?
|
|
<a href='#NT-content'>content</A></code></font></td>
|
|
</tr>
|
|
<tr valign='top'><td align='right'><a name='NT-extPE'></a>[79] </td><td align='right'><font><code>extPE</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code><a href='#NT-TextDecl'>TextDecl</A>?
|
|
<a href='#NT-extSubsetDecl'>extSubsetDecl</A></code></font></td>
|
|
</tr>
|
|
</table>
|
|
</td></tr></table>
|
|
<p>
|
|
An internal general parsed entity is well-formed if its replacement text
|
|
matches the production labeled
|
|
<code><a href='#NT-content'>content</A></code>.
|
|
All internal parameter entities are well-formed by definition.
|
|
</P>
|
|
<P>A consequence of well-formedness in entities is that the logical
|
|
and physical structures in an XML document are properly nested; no
|
|
<A href='#dt-stag'>start-tag</A>,
|
|
<A href='#dt-etag'>end-tag</A>,
|
|
<A href='#dt-empty'>empty-element tag</A>,
|
|
<A href='#dt-element'>element</A>,
|
|
<A href='#dt-comment'>comment</A>,
|
|
<A href='#dt-pi'>processing instruction</A>,
|
|
<A href='#dt-charref'>character
|
|
reference</A>, or
|
|
<A href='#dt-entref'>entity reference</A>
|
|
can begin in one entity and end in another.</P>
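<P>The nesting requirement can be observed with Python's standard <code>xml.etree.ElementTree</code> module (built on Expat). In the sketch below, which uses invented entity names and documents, an entity holding a complete element is accepted, while an entity holding only a start-tag is rejected even though the expanded text would look balanced.</P>

```python
import xml.etree.ElementTree as ET

# The replacement text is a complete element, so it matches "content".
balanced = '<!DOCTYPE d [<!ENTITY bold "<b>hi</b>">]><d>&bold;</d>'
bold_text = ET.fromstring(balanced).find("b").text

# Here the start-tag begins in the entity and the end-tag appears in the
# document entity, which is not well-formed.
unbalanced = '<!DOCTYPE d [<!ENTITY open "<b>">]><d>&open;hi</b></d>'
try:
    ET.fromstring(unbalanced)
    accepted = True
except ET.ParseError:
    accepted = False

print(bold_text, accepted)
```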
|
|
|
|
|
|
|
|
<H4><A NAME='charencoding'>4.3.3 Character Encoding in Entities</a></h4>
|
|
|
|
<P>Each external parsed entity in an XML document may use a different
|
|
encoding for its characters. All XML processors must be able to read
|
|
entities in either UTF-8 or UTF-16.
|
|
|
|
</P>
|
|
<P>Entities encoded in UTF-16 must
|
|
begin with the Byte Order Mark described by ISO/IEC 10646 Annex E and
|
|
Unicode Appendix B (the ZERO WIDTH NO-BREAK SPACE character, #xFEFF).
|
|
This is an encoding signature, not part of either the markup or the
|
|
character data of the XML document.
|
|
XML processors must be able to use this character to
|
|
differentiate between UTF-8 and UTF-16 encoded documents.</P>
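<P>The Byte Order Mark rule above can be sketched in a few lines of Python.
This is only an illustration, not part of the specification: the function
names <CODE>sniff_unicode_encoding</CODE> and <CODE>decode_entity</CODE> are
hypothetical, and the sketch assumes that any entity lacking a UTF-16
Byte Order Mark is UTF-8.</P>

```python
import codecs


def sniff_unicode_encoding(data: bytes) -> str:
    """Guess UTF-8 vs UTF-16 from the first two bytes of an entity."""
    # #xFEFF is serialized as FE FF (big-endian) or FF FE (little-endian).
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    # Assumption of this sketch: no UTF-16 signature means UTF-8.
    return "utf-8"


def decode_entity(data: bytes) -> str:
    """Decode an entity, dropping the signature so it is not seen as content."""
    encoding = sniff_unicode_encoding(data)
    if encoding != "utf-8":
        data = data[2:]  # the two-byte Byte Order Mark is not document data
    return data.decode(encoding)
```

<P>The signature is removed before decoding because it is neither markup nor
character data of the document.</P>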
|
|
<P>Although an XML processor is required to read only entities in
|
|
the UTF-8 and UTF-16 encodings, it is recognized that other encodings are
|
|
used around the world, and it may be desired for XML processors
|
|
to read entities that use them.
|
|
Parsed entities which are stored in an encoding other than
|
|
UTF-8 or UTF-16 must begin with a <A href='#TextDecl'>text
|
|
declaration</A> containing an encoding declaration:
|
|
</p>
|
|
<table cellpadding='5' border='1' bgcolor='#f5dcb3' width='100%'><tr align='left'><td><strong>
|
|
Encoding Declaration</strong></td></tr><tr><td>
|
|
<table border='0' bgcolor='#f5dcb3'>
|
|
<tr valign='top'><td align='right'><a name='NT-EncodingDecl'></a>[80] </td><td align='right'><font><code>EncodingDecl</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code><a href='#NT-S'>S</A>
|
|
'encoding' <a href='#NT-Eq'>Eq</A>
|
|
('"' <a href='#NT-EncName'>EncName</A> '"' |
|
|
"'" <a href='#NT-EncName'>EncName</A> "'" )
|
|
</code></font></td>
|
|
</tr>
|
|
<tr valign='top'><td align='right'><a name='NT-EncName'></a>[81] </td><td align='right'><font><code>EncName</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>[A-Za-z] ([A-Za-z0-9._] | '-')*</code></font></td>
|
|
<td align='center'><font><code> /* </code></font></td><td align='left'><font><code>Encoding name contains only Latin characters */</code></font></td>
|
|
</tr>
|
|
</table>
|
|
</td></tr></table>
|
|
<p>
|
|
In the <A href='#dt-docent'>document entity</A>, the encoding
|
|
declaration is part of the <A href='#dt-xmldecl'>XML declaration</A>.
|
|
The <code><a href='#NT-EncName'>EncName</A></code> is the name of the encoding used.
|
|
</P>
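<P>As an informal illustration, productions [80] and [81] translate fairly
directly into regular expressions. The Python sketch below is one possible
transcription; the helper names <CODE>is_enc_name</CODE> and
<CODE>find_encoding</CODE> are hypothetical, and the search is deliberately
loose (it does not check that the declaration sits inside a text or XML
declaration).</P>

```python
import re

# [81] EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*
ENC_NAME = r"[A-Za-z][A-Za-z0-9._-]*"

# [80] EncodingDecl ::= S 'encoding' Eq ('"' EncName '"' | "'" EncName "'")
WS = r"[ \t\r\n]"
ENCODING_DECL = re.compile(
    rf"{WS}+encoding{WS}*={WS}*(?:\"({ENC_NAME})\"|'({ENC_NAME})')"
)


def is_enc_name(name: str) -> bool:
    """True if the whole string matches the EncName production."""
    return re.fullmatch(ENC_NAME, name) is not None


def find_encoding(declaration: str):
    """Return the EncName from the first encoding declaration, or None."""
    match = ENCODING_DECL.search(declaration)
    if match is None:
        return None
    # Exactly one of the two groups is set, depending on the quote style.
    return match.group(1) or match.group(2)
```

<P>For example, <CODE>find_encoding</CODE> applied to the second sample
declaration in this section yields the string <CODE>EUC-JP</CODE>.</P>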
|
|
|
|
<P>In an encoding declaration, the values
|
|
"<CODE>UTF-8</CODE>",
|
|
"<CODE>UTF-16</CODE>",
|
|
"<CODE>ISO-10646-UCS-2</CODE>", and
|
|
"<CODE>ISO-10646-UCS-4</CODE>" should be
|
|
used for the various encodings and transformations of Unicode /
|
|
ISO/IEC 10646, the values
|
|
"<CODE>ISO-8859-1</CODE>",
|
|
"<CODE>ISO-8859-2</CODE>", ...
|
|
"<CODE>ISO-8859-9</CODE>" should be used for the parts of ISO 8859, and
|
|
the values
|
|
"<CODE>ISO-2022-JP</CODE>",
|
|
"<CODE>Shift_JIS</CODE>", and
|
|
"<CODE>EUC-JP</CODE>"
|
|
should be used for the various encoded forms of JIS X-0208-1997. XML
|
|
processors may recognize other encodings; it is recommended that
|
|
character encodings registered (as <EM>charset</EM>s)
|
|
with the Internet Assigned Numbers
|
|
Authority <A href='#IANA'>[IANA]</A>, other than those just listed, should be
|
|
referred to
|
|
using their registered names.
|
|
Note that these registered names are defined to be
|
|
case-insensitive, so processors wishing to match against them
|
|
should do so in a case-insensitive
|
|
way.</P>
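<P>A processor might implement the case-insensitive comparison with a lookup
table keyed on lower-cased names. The sketch below is an assumption about one
reasonable approach, not a required design; <CODE>KNOWN_ENCODINGS</CODE> and
<CODE>canonical_name</CODE> are hypothetical names, and the table holds only
the values listed in this section.</P>

```python
# Encoding names listed in this section, in their recommended spelling.
_LISTED = (
    ["UTF-8", "UTF-16", "ISO-10646-UCS-2", "ISO-10646-UCS-4"]
    + [f"ISO-8859-{part}" for part in range(1, 10)]
    + ["ISO-2022-JP", "Shift_JIS", "EUC-JP"]
)

# Key on the lower-cased form so lookups ignore case.
KNOWN_ENCODINGS = {name.lower(): name for name in _LISTED}


def canonical_name(declared: str):
    """Map a declared EncName to its listed spelling, or None if not listed."""
    return KNOWN_ENCODINGS.get(declared.lower())
```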
|
|
<P>In the absence of information provided by an external
|
|
transport protocol (e.g. HTTP or MIME),
|
|
it is an <A href='#dt-error'>error</A> for an entity including
|
|
an encoding declaration to be presented to the XML processor
|
|
in an encoding other than that named in the declaration,
|
|
for an encoding declaration to occur other than at the beginning
|
|
of an external entity, or for
|
|
an entity which begins with neither a Byte Order Mark nor an encoding
|
|
declaration to use an encoding other than UTF-8.
|
|
Note that since ASCII
|
|
is a subset of UTF-8, ordinary ASCII entities do not strictly need
|
|
an encoding declaration.</P>
|
|
|
|
<P>It is a <A href='#dt-fatal'>fatal error</A> when an XML processor
|
|
encounters an entity with an encoding that it is unable to process.</P>
|
|
<P>Examples of encoding declarations:
|
|
</p><table cellpadding='5' border='1' bgcolor='#80ffff' width='100%'><tr><td><code><font><?xml encoding='UTF-8'?><BR><?xml encoding='EUC-JP'?></font></code></td></tr></table><p></P>
<H3><A NAME='entproc'>4.4 XML Processor Treatment of Entities and References</a></h3>
|
|
<P>The table below summarizes the contexts in which character references,
|
|
entity references, and invocations of unparsed entities might appear and the
|
|
required behavior of an <A href='#dt-xml-proc'>XML processor</A> in
|
|
each case.
|
|
The labels in the leftmost column describe the recognition context:
|
|
<DL>
|
|
<DT><B>Reference in Content</B></DT>
|
|
<DD>as a reference
|
|
anywhere after the <A href='#dt-stag'>start-tag</A> and
|
|
before the <A href='#dt-etag'>end-tag</A> of an element; corresponds
|
|
to the nonterminal <code><a href='#NT-content'>content</A></code>.</DD>
|
|
|
|
|
|
|
|
<DT><B>Reference in Attribute Value</B></DT>
|
|
<DD>as a reference within either the value of an attribute in a
|
|
<A href='#dt-stag'>start-tag</A>, or a default
|
|
value in an <A href='#dt-attdecl'>attribute declaration</A>;
|
|
corresponds to the nonterminal
|
|
<code><a href='#NT-AttValue'>AttValue</A></code>.</DD>
|
|
|
|
|
|
<DT><B>Occurs as Attribute Value</B></DT>
|
|
<DD>as a <code><a href='#NT-Name'>Name</A></code>, not a reference, appearing either as
|
|
the value of an
|
|
attribute which has been declared as type <code>ENTITY</code>, or as one of
|
|
the space-separated tokens in the value of an attribute which has been
|
|
declared as type <code>ENTITIES</code>.
|
|
</DD>
|
|
|
|
<DT><B>Reference in Entity Value</B></DT>
|
|
<DD>as a reference
|
|
within a parameter or internal entity's
|
|
<A href='#dt-litentval'>literal entity value</A> in
|
|
the entity's declaration; corresponds to the nonterminal
|
|
<code><a href='#NT-EntityValue'>EntityValue</A></code>.</DD>
|
|
|
|
<DT><B>Reference in DTD</B></DT>
|
|
<DD>as a reference within either the internal or external subsets of the
|
|
<A href='#dt-doctype'>DTD</A>, but outside
|
|
of an <code><a href='#NT-EntityValue'>EntityValue</A></code> or
|
|
<code><a href='#NT-AttValue'>AttValue</A></code>.</DD>
|
|
|
|
|
|
</DL>
|
|
|
|
<TABLE border='0' cellpadding='7' align='center'>
|
|
|
|
<tr><td bgcolor='#c0d9c0' rowspan='2' colspan='1'></td>
|
|
<td bgcolor='#c0d9c0' align='center' valign='bottom' colspan='4'>Entity Type</td>
|
|
<td bgcolor='#c0d9c0' rowspan='2' align='center'>Character</td>
|
|
</tr>
|
|
<tr align='center' valign='bottom'>
|
|
<td bgcolor='#c0d9c0'>Parameter</td>
|
|
<td bgcolor='#c0d9c0'>Internal<BR>General</td>
|
|
<td bgcolor='#c0d9c0'>External Parsed<BR>General</td>
|
|
<td bgcolor='#c0d9c0'>Unparsed</td>
|
|
</tr>
|
|
<tr align='center' valign='middle'>
|
|
|
|
<td bgcolor='#c0d9c0' align='right'>Reference<BR>in Content</td>
|
|
<td bgcolor='#c0d9c0'><A href='#not-recognized'>Not recognized</A></td>
|
|
<td bgcolor='#c0d9c0'><A href='#included'>Included</A></td>
|
|
<td bgcolor='#c0d9c0'><A href='#include-if-valid'>Included if validating</A></td>
|
|
<td bgcolor='#c0d9c0'><A href='#forbidden'>Forbidden</A></td>
|
|
<td bgcolor='#c0d9c0'><A href='#included'>Included</A></td>
|
|
</tr>
|
|
<tr align='center' valign='middle'>
|
|
<td bgcolor='#c0d9c0' align='right'>Reference<BR>in Attribute Value</td>
|
|
<td bgcolor='#c0d9c0'><A href='#not-recognized'>Not recognized</A></td>
|
|
<td bgcolor='#c0d9c0'><A href='#inliteral'>Included in literal</A></td>
|
|
<td bgcolor='#c0d9c0'><A href='#forbidden'>Forbidden</A></td>
|
|
<td bgcolor='#c0d9c0'><A href='#forbidden'>Forbidden</A></td>
|
|
<td bgcolor='#c0d9c0'><A href='#included'>Included</A></td>
|
|
</tr>
|
|
<tr align='center' valign='middle'>
|
|
<td bgcolor='#c0d9c0' align='right'>Occurs as<BR>Attribute Value</td>
|
|
<td bgcolor='#c0d9c0'><A href='#not-recognized'>Not recognized</A></td>
|
|
<td bgcolor='#c0d9c0'><A href='#forbidden'>Forbidden</A></td>
|
|
<td bgcolor='#c0d9c0'><A href='#forbidden'>Forbidden</A></td>
|
|
<td bgcolor='#c0d9c0'><A href='#notify'>Notify</A></td>
|
|
<td bgcolor='#c0d9c0'><A href='#not-recognized'>Not recognized</A></td>
|
|
</tr>
|
|
<tr align='center' valign='middle'>
|
|
<td bgcolor='#c0d9c0' align='right'>Reference<BR>in Entity Value</td>
|
|
<td bgcolor='#c0d9c0'><A href='#inliteral'>Included in literal</A></td>
|
|
<td bgcolor='#c0d9c0'><A href='#bypass'>Bypassed</A></td>
|
|
<td bgcolor='#c0d9c0'><A href='#bypass'>Bypassed</A></td>
|
|
<td bgcolor='#c0d9c0'><A href='#forbidden'>Forbidden</A></td>
|
|
<td bgcolor='#c0d9c0'><A href='#included'>Included</A></td>
|
|
</tr>
|
|
<tr align='center' valign='middle'>
|
|
<td bgcolor='#c0d9c0' align='right'>Reference<BR>in DTD</td>
|
|
<td bgcolor='#c0d9c0'><A href='#as-PE'>Included as PE</A></td>
|
|
<td bgcolor='#c0d9c0'><A href='#forbidden'>Forbidden</A></td>
|
|
<td bgcolor='#c0d9c0'><A href='#forbidden'>Forbidden</A></td>
|
|
<td bgcolor='#c0d9c0'><A href='#forbidden'>Forbidden</A></td>
|
|
<td bgcolor='#c0d9c0'><A href='#forbidden'>Forbidden</A></td>
|
|
</tr>
|
|
|
|
</TABLE>
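<P>For implementers, the table can be transcribed as a simple lookup
structure. The Python sketch below is illustrative only: the names
<CODE>COLUMNS</CODE>, <CODE>TREATMENT</CODE> and <CODE>treatment</CODE> are
hypothetical, and the row and column keys are lower-cased versions of the
labels used in the table.</P>

```python
# Column order follows the table: four entity types, then character references.
COLUMNS = (
    "parameter",
    "internal general",
    "external parsed general",
    "unparsed",
    "character",
)

# One row per recognition context; one cell per column above.
TREATMENT = {
    "reference in content": (
        "Not recognized", "Included", "Included if validating",
        "Forbidden", "Included",
    ),
    "reference in attribute value": (
        "Not recognized", "Included in literal", "Forbidden",
        "Forbidden", "Included",
    ),
    "occurs as attribute value": (
        "Not recognized", "Forbidden", "Forbidden",
        "Notify", "Not recognized",
    ),
    "reference in entity value": (
        "Included in literal", "Bypassed", "Bypassed",
        "Forbidden", "Included",
    ),
    "reference in dtd": (
        "Included as PE", "Forbidden", "Forbidden",
        "Forbidden", "Forbidden",
    ),
}


def treatment(context: str, kind: str) -> str:
    """Look up the required behavior for a reference kind in a context."""
    return TREATMENT[context][COLUMNS.index(kind)]
```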
|
|
|
|
|
|
<H4><A NAME='not-recognized'>4.4.1 Not Recognized</a></h4>
|
|
<P>Outside the DTD, the <CODE>%</CODE> character has no
|
|
special significance; thus, what would be parameter entity references in the
|
|
DTD are not recognized as markup in <code><a href='#NT-content'>content</A></code>.
|
|
Similarly, the names of unparsed entities are not recognized except
|
|
when they appear in the value of an appropriately declared attribute.
|
|
</P>
|
|
|
|
|
|
|
|
<H4><A NAME='included'>4.4.2 Included</a></h4>
|
|
<P><a name='dt-include'></a>An entity is
|
|
<b>included</b> when its
|
|
<A href='#dt-repltext'>replacement text</A> is retrieved
|
|
and processed, in place of the reference itself,
|
|
as though it were part of the document at the location the
|
|
reference was recognized.
|
|
The replacement text may contain both
|
|
<A href='#dt-chardata'>character data</A>
|
|
and (except for parameter entities) <A href='#dt-markup'>markup</A>,
|
|
which must be recognized in
|
|
the usual way, except that the replacement text of entities used to escape
|
|
markup delimiters (the entities <CODE>amp</CODE>,
|
|
<CODE>lt</CODE>,
|
|
<CODE>gt</CODE>,
|
|
<CODE>apos</CODE>,
|
|
<CODE>quot</CODE>) is always treated as
|
|
data. (The string "<CODE>AT&amp;T;</CODE>" expands to
|
|
"<CODE>AT&T;</CODE>" and the remaining ampersand is not recognized
|
|
as an entity-reference delimiter.)
|
|
A character reference is <b>included</b> when the indicated
|
|
character is processed in place of the reference itself.
|
|
</P>
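<P>The effect of inclusion can be observed with an existing processor. The
sketch below uses Python's standard <CODE>xml.etree.ElementTree</CODE> module
(an expat-based, non-validating processor) purely as a convenient
illustration; the element and entity names in the sample documents are made
up for this example.</P>

```python
import xml.etree.ElementTree as ET

# "&amp;" is included as data, so the "&" it yields does not start
# a new entity reference and "T;" stays ordinary text.
company = ET.fromstring("<name>AT&amp;T;</name>").text

# Markup inside the replacement text of an ordinary internal general
# entity is recognized in the usual way: "&em;" produces a <b> element.
document = """<!DOCTYPE d [
  <!ENTITY em "<b>bold</b>">
]>
<d>&em;</d>"""
child = ET.fromstring(document).find("b")
```

<P>Here <CODE>company</CODE> holds the five characters of
<CODE>AT&amp;T;</CODE> after expansion, and <CODE>child</CODE> is the element
created from the entity's replacement text.</P>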
|
|
|
|
|
|
|
|
<H4><A NAME='include-if-valid'>4.4.3 Included If Validating</a></h4>
|
|
<P>When an XML processor recognizes a reference to a parsed entity, in order
|
|
to <A href='#dt-valid'>validate</A>
|
|
the document, the processor must
|
|
<A href='#dt-include'>include</A> its
|
|
replacement text.
|
|
If the entity is external, and the processor is not
|
|
attempting to validate the XML document, the
|
|
processor <A href='#dt-may'>may</A>, but need not,
|
|
include the entity's replacement text.
|
|
If a non-validating processor does not include the replacement text,
|
|
it must inform the application that it recognized, but did not
|
|
read, the entity.</P>
|
|
<P>This rule is based on the recognition that the automatic inclusion
|
|
provided by the SGML and XML entity mechanism, primarily designed
|
|
to support modularity in authoring, is not necessarily
|
|
appropriate for other applications, in particular document browsing.
|
|
Browsers, for example, when encountering an external parsed entity reference,
|
|
might choose to provide a visual indication of the entity's
|
|
presence and retrieve it for display only on demand.
|
|
</P>
|
|
|
|
|
|
|
|
<H4><A NAME='forbidden'>4.4.4 Forbidden</a></h4>
|
|
<P>The following are forbidden, and constitute
|
|
<A href='#dt-fatal'>fatal</A> errors:
|
|
<UL>
|
|
<LI>the appearance of a reference to an
|
|
<A href='#dt-unparsed'>unparsed entity</A>.
|
|
</LI>
|
|
<LI>the appearance of any character or general-entity reference in the
|
|
DTD except within an <code><a href='#NT-EntityValue'>EntityValue</A></code> or
|
|
<code><a href='#NT-AttValue'>AttValue</A></code>.</LI>
|
|
<LI>a reference to an external entity in an attribute value.
|
|
</LI>
|
|
</UL>
|
|
<H4><A NAME='inliteral'>4.4.5 Included in Literal</a></h4>
|
|
<P>When an <A href='#dt-entref'>entity reference</A> appears in an
|
|
attribute value, or a parameter entity reference appears in a literal entity
|
|
value, its <A href='#dt-repltext'>replacement text</A> is
|
|
processed in place of the reference itself as though it
|
|
were part of the document at the location the reference was recognized,
|
|
except that a single or double quote character in the replacement text
|
|
is always treated as a normal data character and will not terminate the
|
|
literal.
|
|
For example, this is well-formed:
|
|
</p><table cellpadding='5' border='1' bgcolor='#80ffff' width='100%'><tr><td><code><font><!ENTITY % YN '"Yes"' ><BR><!ENTITY WhatHeSaid "He said %YN;" ></font></code></td></tr></table><p>
|
|
while this is not:
|
|
</p><table cellpadding='5' border='1' bgcolor='#80ffff' width='100%'><tr><td><code><font><!ENTITY EndAttr "27'" ><BR><element attribute='a-&EndAttr;></font></code></td></tr></table><p>
|
|
</P>
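<P>The attribute-value case can be checked with a small experiment. The
sketch below again assumes Python's standard
<CODE>xml.etree.ElementTree</CODE> module as the processor; the helper
<CODE>is_well_formed</CODE> is a hypothetical name, and the first document
adds the closing quote that the ill-formed example above lacks.</P>

```python
import xml.etree.ElementTree as ET

DECLARATIONS = """<!DOCTYPE element [
  <!ENTITY EndAttr "27'">
]>
"""

# The quote inside the replacement text is data, so the literal must
# still be closed by a quote that is actually present in the start-tag.
closed = DECLARATIONS + "<element attribute='a-&EndAttr;'/>"
unclosed = DECLARATIONS + "<element attribute='a-&EndAttr;>"


def is_well_formed(document: str) -> bool:
    """True if the processor accepts the document without a parse error."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


value = ET.fromstring(closed).get("attribute")
```

<P>With the closing quote present, the attribute value ends in the
apostrophe contributed by the entity; without it, the document is
rejected.</P>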
|
|
|
|
|
|
<H4><A NAME='notify'>4.4.6 Notify</a></h4>
|
|
<P>When the name of an <A href='#dt-unparsed'>unparsed
|
|
entity</A> appears as a token in the
|
|
value of an attribute of declared type <code>ENTITY</code> or <code>ENTITIES</code>,
|
|
a validating processor must inform the
|
|
application of the <A href='#dt-sysid'>system</A>
|
|
and <A href='#dt-pubid'>public</A> (if any)
|
|
identifiers for both the entity and its associated
|
|
<A href='#dt-notation'>notation</A>.</P>
|
|
|
|
|
|
|
|
<H4><A NAME='bypass'>4.4.7 Bypassed</a></h4>
|
|
<P>When a general entity reference appears in the
|
|
<code><a href='#NT-EntityValue'>EntityValue</A></code> in an entity declaration,
|
|
it is bypassed and left as is.</P>
|
|
|
|
|
|
|
|
<H4><A NAME='as-PE'>4.4.8 Included as PE</a></h4>
|
|
<P>Just as with external parsed entities, parameter entities
|
|
need only be <A href='#include-if-valid'>included if
|
|
validating</A>.
|
|
When a parameter-entity reference is recognized in the DTD
|
|
and included, its
|
|
<A href='#dt-repltext'>replacement
|
|
text</A> is enlarged by the attachment of one leading and one following
|
|
space (#x20) character; the intent is to constrain the replacement
|
|
text of parameter
|
|
entities to contain an integral number of grammatical tokens in the DTD.
|
|
</P>
<H3><A NAME='intern-replacement'>4.5 Construction of Internal Entity Replacement Text</a></h3>
|
|
<P>In discussing the treatment
|
|
of internal entities, it is
|
|
useful to distinguish two forms of the entity's value.
|
|
<a name='dt-litentval'></a>The <b>literal
|
|
entity value</b> is the quoted string actually
|
|
present in the entity declaration, corresponding to the
|
|
non-terminal <code><a href='#NT-EntityValue'>EntityValue</A></code>.
|
|
<a name='dt-repltext'></a>The <b>replacement
|
|
text</b> is the content of the entity, after
|
|
replacement of character references and parameter-entity
|
|
references.
|
|
</P>
|
|
|
|
<P>The literal entity value
|
|
as given in an internal entity declaration
|
|
(<code><a href='#NT-EntityValue'>EntityValue</A></code>) may contain character,
|
|
parameter-entity, and general-entity references.
|
|
Such references must be contained entirely within the
|
|
literal entity value.
|
|
The actual replacement text that is
|
|
<A href='#dt-include'>included</A> as described above
|
|
must contain the <EM>replacement text</EM> of any
|
|
parameter entities referred to, and must contain the character
|
|
referred to, in place of any character references in the
|
|
literal entity value; however,
|
|
general-entity references must be left as-is, unexpanded.
|
|
For example, given the following declarations:
|
|
|
|
</p><table cellpadding='5' border='1' bgcolor='#80ffff' width='100%'><tr><td><code><font><!ENTITY % pub "&#xc9;ditions Gallimard" ><BR><!ENTITY rights "All rights reserved" ><BR><!ENTITY book "La Peste: Albert Camus, <BR>&#xA9; 1947 %pub;. &rights;" ></font></code></td></tr></table><p>
|
|
then the replacement text for the entity "<CODE>book</CODE>" is:
|
|
</p><table cellpadding='5' border='1' bgcolor='#80ffff' width='100%'><tr><td><code><font>La Peste: Albert Camus, <BR>© 1947 Éditions Gallimard. &rights;</font></code></td></tr></table><p>
|
|
The general-entity reference "<CODE>&rights;</CODE>" would be expanded
|
|
should the reference "<CODE>&book;</CODE>" appear in the document's
|
|
content or an attribute value.</P>
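<P>The construction just described can be sketched as a single substitution
pass. The Python below is an informal illustration: the function name
<CODE>replacement_text</CODE> is hypothetical, the name pattern is an
ASCII-only simplification of the <CODE>Name</CODE> production, and parameter
entities are assumed to be supplied with their replacement text already
computed.</P>

```python
import re

# Character references (hex or decimal) and parameter-entity references.
# General-entity references such as "&rights;" never match and are left as-is.
_REFERENCE = re.compile(
    r"&#x([0-9A-Fa-f]+);|&#([0-9]+);|%([A-Za-z_:][A-Za-z0-9_.:-]*);"
)


def replacement_text(literal: str, parameter_entities: dict) -> str:
    """Build replacement text from a literal entity value (quotes removed)."""

    def expand(match):
        hex_code, dec_code, pe_name = match.groups()
        if hex_code is not None:
            return chr(int(hex_code, 16))
        if dec_code is not None:
            return chr(int(dec_code))
        # Insert the parameter entity's replacement text unchanged.
        return parameter_entities[pe_name]

    return _REFERENCE.sub(expand, literal)


pub = replacement_text("&#xc9;ditions Gallimard", {})
book = replacement_text(
    "La Peste: Albert Camus, \n&#xA9; 1947 %pub;. &rights;", {"pub": pub}
)
```

<P>Applied to the declarations above, <CODE>book</CODE> contains the expanded
copyright sign and publisher name while the reference to
<CODE>rights</CODE> remains unexpanded.</P>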
|
|
<P>These simple rules may have complex interactions; for a detailed
|
|
discussion of a difficult example, see
|
|
"<A href='#sec-entexpand'>D. Expansion of Entity and Character References</A>".
|
|
</P>
|
|
|
|
|
|
|
|
|
|
<H3><A NAME='sec-predefined-ent'>4.6 Predefined Entities</a></h3>
|
|
<P><a name='dt-escape'></a>Entity and character
|
|
references can both be used to <b>escape</b> the left angle bracket,
|
|
ampersand, and other delimiters. A set of general entities
|
|
(<CODE>amp</CODE>,
|
|
<CODE>lt</CODE>,
|
|
<CODE>gt</CODE>,
|
|
<CODE>apos</CODE>,
|
|
<CODE>quot</CODE>) is specified for this purpose.
|
|
Numeric character references may also be used; they are
|
|
expanded immediately when recognized and must be treated as
|
|
character data, so the numeric character references
|
|
"<CODE>&#60;</CODE>" and "<CODE>&#38;</CODE>" may be used to
|
|
escape <CODE><</CODE> and <CODE>&</CODE> when they occur
|
|
in character data.</P>
|
|
<P>All XML processors must recognize these entities whether they
|
|
are declared or not.
|
|
<A href='#dt-interop'>For interoperability</A>,
|
|
valid XML documents should declare these
|
|
entities, like any others, before using them.
|
|
If the entities in question are declared, they must be declared
|
|
as internal entities whose replacement text is the single
|
|
character being escaped or a character reference to
|
|
that character, as shown below.
|
|
</p><table cellpadding='5' border='1' bgcolor='#80ffff' width='100%'><tr><td><code><font><!ENTITY lt "&#38;#60;"> <BR><!ENTITY gt "&#62;"> <BR><!ENTITY amp "&#38;#38;"> <BR><!ENTITY apos "&#39;"> <BR><!ENTITY quot "&#34;"> <BR></font></code></td></tr></table><p>
|
|
Note that the <CODE><</CODE> and <CODE>&</CODE> characters
|
|
in the declarations of "<CODE>lt</CODE>" and "<CODE>amp</CODE>"
|
|
are doubly escaped to meet the requirement that entity replacement
|
|
be well-formed.
|
|
</P>
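<P>In practice, applications usually delegate escaping to a library. The
sketch below uses <CODE>escape</CODE> from Python's standard
<CODE>xml.sax.saxutils</CODE> module and parses the result back with
<CODE>xml.etree.ElementTree</CODE>; the sample string and element name are
arbitrary.</P>

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "a < b & c"

# escape() replaces "&", "<" and ">" with references to the
# predefined entities amp, lt and gt.
escaped = escape(raw)

# Parsing the escaped text recovers the original characters as data.
round_trip = ET.fromstring(f"<t>{escaped}</t>").text

# Numeric character references work as well and need no declaration.
numeric = ET.fromstring("<t>&#60;&#38;</t>").text
```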
|
|
|
|
|
|
|
|
|
|
<H3><A NAME='Notations'>4.7 Notation Declarations</a></h3>
|
|
|
|
<P><a name='dt-notation'></a><b>Notations</b> identify by
|
|
name the format of <A href='#dt-unparsed'>unparsed
|
|
entities</A>, the
|
|
format of elements which bear a notation attribute,
|
|
or the application to which
|
|
a <A href='#dt-pi'>processing instruction</A> is
|
|
addressed.</P>
|
|
<P><a name='dt-notdecl'></a>
|
|
<b>Notation declarations</b>
|
|
provide a name for the notation, for use in
|
|
entity and attribute-list declarations and in attribute specifications,
|
|
and an external identifier for the notation which may allow an XML
|
|
processor or its client application to locate a helper application
|
|
capable of processing data in the given notation.
|
|
</p>
|
|
<table cellpadding='5' border='1' bgcolor='#f5dcb3' width='100%'><tr align='left'><td><strong>
|
|
Notation Declarations</strong></td></tr><tr><td>
|
|
<table border='0' bgcolor='#f5dcb3'>
|
|
<tr valign='top'><td align='right'><a name='NT-NotationDecl'></a>[82] </td><td align='right'><font><code>NotationDecl</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>'<!NOTATION' <a href='#NT-S'>S</A> <a href='#NT-Name'>Name</A>
|
|
<a href='#NT-S'>S</A>
|
|
(<a href='#NT-ExternalID'>ExternalID</A> |
|
|
<a href='#NT-PublicID'>PublicID</A>)
|
|
<a href='#NT-S'>S</A>? '>'</code></font></td></tr>
|
|
<tr valign='top'><td align='right'><a name='NT-PublicID'></a>[83] </td><td align='right'><font><code>PublicID</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>'PUBLIC' <a href='#NT-S'>S</A>
|
|
<a href='#NT-PubidLiteral'>PubidLiteral</A>
|
|
</code></font></td></tr>
|
|
</table>
|
|
</td></tr></table>
|
|
|
|
<P>XML processors must provide applications with the name and external
|
|
identifier(s) of any notation declared and referred to in an attribute
|
|
value, attribute definition, or entity declaration. They may
|
|
additionally resolve the external identifier into the
|
|
<A href='#dt-sysid'>system identifier</A>,
|
|
file name, or other information needed to allow the
|
|
application to call a processor for data in the notation described. (It
|
|
is not an error, however, for XML documents to declare and refer to
|
|
notations for which notation-specific applications are not available on
|
|
the system where the XML processor or application is running.)</P>
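<P>As an illustration of how a processor can pass this information to an
application, the sketch below uses the handler attributes of Python's
standard <CODE>xml.parsers.expat</CODE> module. The function
<CODE>collect_notations</CODE>, the notation name <CODE>gif</CODE> and the
identifiers in the sample document are all made up for this example.</P>

```python
import xml.parsers.expat

SAMPLE = """<!DOCTYPE doc [
  <!NOTATION gif SYSTEM "viewer.exe">
  <!ENTITY logo SYSTEM "logo.gif" NDATA gif>
]>
<doc/>"""


def collect_notations(document: str):
    """Return the notation and unparsed-entity declarations reported by expat."""
    notations = []
    unparsed = []
    parser = xml.parsers.expat.ParserCreate()

    def on_notation(name, base, system_id, public_id):
        notations.append((name, system_id, public_id))

    def on_unparsed(name, base, system_id, public_id, notation_name):
        unparsed.append((name, system_id, notation_name))

    parser.NotationDeclHandler = on_notation
    parser.UnparsedEntityDeclHandler = on_unparsed
    parser.Parse(document, True)  # True marks the final chunk of input
    return notations, unparsed


notations, unparsed = collect_notations(SAMPLE)
```

<P>The application receives the identifiers only; locating a helper
application for the notation is left to it.</P>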
<H3><A NAME='sec-doc-entity'>4.8 Document Entity</a></h3>
|
|
|
|
<P><a name='dt-docent'></a>The <b>document
|
|
entity</b> serves as the root of the entity
|
|
tree and a starting-point for an <A href='#dt-xml-proc'>XML
|
|
processor</A>.
|
|
This specification does
|
|
not specify how the document entity is to be located by an XML
|
|
processor; unlike other entities, the document entity has no name and might
|
|
well appear on a processor input stream
|
|
without any identification at all.</P>
<H2><A NAME='sec-conformance'>5. Conformance</a></h2>
|
|
|
|
|
|
|
|
<H3><A NAME='proc-types'>5.1 Validating and Non-Validating Processors</a></h3>
|
|
<P>Conforming <A href='#dt-xml-proc'>XML processors</A> fall into two
|
|
classes: validating and non-validating.</P>
|
|
<P>Validating and non-validating processors alike must report
|
|
violations of this specification's well-formedness constraints
|
|
in the content of the
|
|
<A href='#dt-docent'>document entity</A> and any
|
|
other <A href='#dt-parsedent'>parsed entities</A> that
|
|
they read.</P>
|
|
<P><a name='dt-validating'></a>
|
|
<b>Validating processors</b> must report
|
|
violations of the constraints expressed by the declarations in the
|
|
<A href='#dt-doctype'>DTD</A>, and
|
|
failures to fulfill the validity constraints given
|
|
in this specification.
|
|
|
|
To accomplish this, validating XML processors must read and process the entire
|
|
DTD and all external parsed entities referenced in the document.
|
|
</P>
|
|
<P>Non-validating processors are required to check only the
|
|
<A href='#dt-docent'>document entity</A>, including
|
|
the entire internal DTD subset, for well-formedness.
|
|
<a name='dt-use-mdecl'></a>
|
|
While they are not required to check the document for validity,
|
|
they are required to
|
|
<b>process</b> all the declarations they read in the
|
|
internal DTD subset and in any parameter entity that they
|
|
read, up to the first reference
|
|
to a parameter entity that they do <EM>not</EM> read; that is to
|
|
say, they must
|
|
use the information in those declarations to
|
|
<A href='#AVNormalize'>normalize</A> attribute values,
|
|
<A href='#included'>include</A> the replacement text of
|
|
internal entities, and supply
|
|
<A href='#sec-attr-defaults'>default attribute values</A>.
|
|
|
|
They must not <A href='#dt-use-mdecl'>process</A>
|
|
<A href='#dt-entdecl'>entity declarations</A> or
|
|
<A href='#dt-attdecl'>attribute-list declarations</A>
|
|
encountered after a reference to a parameter entity that is not
|
|
read, since the entity may have contained overriding declarations.
|
|
</P>
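<P>The requirement to process declarations in the internal DTD subset can be
observed with a non-validating processor. The sketch below assumes Python's
standard <CODE>xml.etree.ElementTree</CODE> module; the element, attribute
and entity names are invented for the example.</P>

```python
import xml.etree.ElementTree as ET

document = """<!DOCTYPE memo [
  <!ATTLIST memo status CDATA "draft">
  <!ENTITY org "Example Org">
]>
<memo>From &org;</memo>"""

root = ET.fromstring(document)

# The attribute is absent from the start-tag, so its value comes
# from the default in the attribute-list declaration.
status = root.get("status")

# The internal entity's replacement text is included in the content.
body = root.text
```

<P>No validity checking takes place here; the declarations are used only to
supply the default and expand the reference.</P>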
|
|
|
|
|
|
|
|
<H3><A NAME='safe-behavior'>5.2 Using XML Processors</a></h3>
|
|
<P>The behavior of a validating XML processor is highly predictable; it
|
|
must read every piece of a document and report all well-formedness and
|
|
validity violations.
|
|
Less is required of a non-validating processor; it need not read any
|
|
part of the document other than the document entity.
|
|
This has two effects that may be important to users of XML processors:
|
|
<UL>
|
|
<LI>Certain well-formedness errors, specifically those that require
|
|
reading external entities, may not be detected by a non-validating processor.
|
|
Examples include the constraints entitled
|
|
<A href='#wf-entdeclared'>Entity Declared</A>,
|
|
<A href='#wf-textent'>Parsed Entity</A>, and
|
|
<A href='#wf-norecursion'>No Recursion</A>, as well
|
|
as some of the cases described as
|
|
<A href='#forbidden'>forbidden</A> in
|
|
"<A href='#entproc'>4.4 XML Processor Treatment of Entities and References</A>".</LI>
|
|
<LI>The information passed from the processor to the application may
|
|
vary, depending on whether the processor reads
|
|
parameter and external entities.
|
|
For example, a non-validating processor may not
|
|
<A href='#AVNormalize'>normalize</A> attribute values,
|
|
<A href='#included'>include</A> the replacement text of
|
|
internal entities, or supply
|
|
<A href='#sec-attr-defaults'>default attribute values</A>,
|
|
where doing so depends on having read declarations in
|
|
external or parameter entities.</LI>
|
|
</UL>
|
|
|
|
|
|
<P>For maximum reliability in interoperating between different XML
|
|
processors, applications which use non-validating processors should not
|
|
rely on any behaviors not required of such processors.
|
|
Applications which require facilities such as the use of default
|
|
attributes or internal entities which are declared in external
|
|
entities should use validating XML processors.</P>
|
|
|
|
|
|
|
|
|
|
|
|
<H2><A NAME='sec-notation'>6. Notation</a></h2>
|
|
|
|
<P>The formal grammar of XML is given in this specification using a simple
|
|
Extended Backus-Naur Form (EBNF) notation. Each rule in the grammar defines
|
|
one symbol, in the form
|
|
</p><table cellpadding='5' border='1' bgcolor='#80ffff' width='100%'><tr><td><code><font>symbol ::= expression</font></code></td></tr></table><p></P>
|
|
<P>Symbols are written with an initial capital letter if they are
|
|
defined by a regular expression, or with an initial lower case letter
|
|
otherwise.
|
|
Literal strings are quoted.
|
|
|
|
</P>
|
|
|
|
<P>Within the expression on the right-hand side of a rule, the following
|
|
expressions are used to match strings of one or more characters:
|
|
<DL>
|
|
|
|
<DT><B><CODE>#xN</CODE></B></DT>
|
|
<DD>where <CODE>N</CODE> is a hexadecimal integer, the
|
|
expression matches the character in ISO/IEC 10646 whose canonical
|
|
(UCS-4)
|
|
code value, when interpreted as an unsigned binary number, has
|
|
the value indicated. The number of leading zeros in the
|
|
<CODE>#xN</CODE> form is insignificant; the number of leading
|
|
zeros in the corresponding code value
|
|
is governed by the character
|
|
encoding in use and is not significant for XML.</DD>
|
|
|
|
|
|
|
|
<DT><B><CODE>[a-zA-Z]</CODE>, <CODE>[#xN-#xN]</CODE></B></DT>
|
|
<DD>matches any <A href='#dt-character'>character</A>
|
|
with a value in the range(s) indicated (inclusive).</DD>
|
|
|
|
|
|
|
|
<DT><B><CODE>[^a-z]</CODE>, <CODE>[^#xN-#xN]</CODE></B></DT>
|
|
<DD>matches any <A href='#dt-character'>character</A>
|
|
with a value <EM>outside</EM> the
|
|
range indicated.</DD>
|
|
|
|
|
|
|
|
<DT><B><CODE>[^abc]</CODE>, <CODE>[^#xN#xN#xN]</CODE></B></DT>
|
|
<DD>matches any <A href='#dt-character'>character</A>
|
|
with a value not among the characters given.</DD>
|
|
|
|
|
|
|
|
<DT><B><CODE>"string"</CODE></B></DT>
|
|
<DD>matches a literal string <A href='#dt-match'>matching</A>
|
|
that given inside the double quotes.</DD>
|
|
|
|
|
|
|
|
<DT><B><CODE>'string'</CODE></B></DT>
|
|
<DD>matches a literal string <A href='#dt-match'>matching</A>
|
|
that given inside the single quotes.</DD>
|
|
|
|
|
|
</DL>
|
|
|
|
These symbols may be combined to match more complex patterns as follows,
|
|
where <CODE>A</CODE> and <CODE>B</CODE> represent simple expressions:
|
|
<DL>
|
|
|
|
<DT><B>(<CODE>expression</CODE>)</B></DT>
|
|
<DD><CODE>expression</CODE> is treated as a unit
|
|
and may be combined as described in this list.</DD>
|
|
|
|
|
|
|
|
<DT><B><CODE>A?</CODE></B></DT>
|
|
<DD>matches <CODE>A</CODE> or nothing; optional <CODE>A</CODE>.</DD>
|
|
|
|
|
|
|
|
<DT><B><CODE>A B</CODE></B></DT>
|
|
<DD>matches <CODE>A</CODE> followed by <CODE>B</CODE>.</DD>
|
|
|
|
|
|
|
|
<DT><B><CODE>A | B</CODE></B></DT>
|
|
<DD>matches <CODE>A</CODE> or <CODE>B</CODE> but not both.</DD>
|
|
|
|
|
|
|
|
<DT><B><CODE>A - B</CODE></B></DT>
|
|
<DD>matches any string that matches <CODE>A</CODE> but does not match
|
|
<CODE>B</CODE>.
|
|
</DD>
|
|
|
|
|
|
|
|
<DT><B><CODE>A+</CODE></B></DT>
|
|
<DD>matches one or more occurrences of <CODE>A</CODE>.</DD>
|
|
|
|
|
|
|
|
<DT><B><CODE>A*</CODE></B></DT>
|
|
<DD>matches zero or more occurrences of <CODE>A</CODE>.</DD>
|
|
|
|
|
|
|
|
</DL>
|
|
|
|
Other notations used in the productions are:
|
|
<DL>
|
|
|
|
<DT><B><CODE>/* ... */</CODE></B></DT>
|
|
<DD>comment.</DD>
|
|
|
|
|
|
|
|
<DT><B><CODE>[ wfc: ... ]</CODE></B></DT>
|
|
<DD>well-formedness constraint; this identifies by name a
|
|
constraint on
|
|
<A href='#dt-wellformed'>well-formed</A> documents
|
|
associated with a production.</DD>
|
|
|
|
|
|
|
|
<DT><B><CODE>[ vc: ... ]</CODE></B></DT>
|
|
<DD>validity constraint; this identifies by name a constraint on
|
|
<A href='#dt-valid'>valid</A> documents associated with
|
|
a production.</DD>
|
|
|
|
|
|
</DL>
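<P>Readers familiar with regular-expression libraries may find it helpful to
see how these operators map onto one. The Python sketch below transcribes
two productions from this specification, <CODE>Eq</CODE> and
<CODE>PITarget</CODE>; it is an informal aid, and the pattern used for
<CODE>Name</CODE> is an ASCII-only simplification assumed for brevity.</P>

```python
import re

# S ::= (#x20 | #x9 | #xD | #xA)+      alternation inside a group, then A+
S = r"[\x20\x09\x0D\x0A]+"

# Eq ::= S? '=' S?                     A? (optional) and A B (sequence)
EQ = re.compile(rf"(?:{S})?=(?:{S})?")

# Simplified stand-in for Name (assumption: ASCII characters only).
NAME = r"[A-Za-z_:][A-Za-z0-9._:-]*"

# PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))
# A - B becomes a negative lookahead rejecting strings that are exactly B.
PI_TARGET = re.compile(rf"(?![Xx][Mm][Ll]\Z){NAME}")


def matches(pattern, text: str) -> bool:
    """True if the whole string matches the compiled production."""
    return pattern.fullmatch(text) is not None
```

<P>The lookahead is anchored to the end of the string so that names which
merely begin with those three letters are still accepted.</P>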
|
|
|
|
|
|
|
|
<HR>
|
|
|
|
<h1>Appendices</h1>
<H2><A NAME='sec-bibliography'>A. References</a></h2>
|
|
|
|
|
|
<H3><A NAME='sec-existing-stds'>A.1 Normative References</a></h3>
|
|
|
|
<DL>
|
|
<dt><a name='IANA'>IANA</a></dt><dd>
|
|
(Internet Assigned Numbers Authority) <EM>Official Names for
|
|
Character Sets</EM>,
|
|
ed. Keld Simonsen et al.
|
|
See <A HREF='ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets'>ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets</A>.
|
|
</DD>
|
|
|
|
<dt><a name='RFC1766'>IETF RFC 1766</a></dt><dd>
|
|
IETF (Internet Engineering Task Force).
|
|
<EM>RFC 1766: Tags for the Identification of Languages</EM>,
|
|
ed. H. Alvestrand.
|
|
1995.
|
|
</DD>
|
|
|
|
<dt><a name='ISO639'>ISO 639</a></dt><dd>
|
|
(International Organization for Standardization).
|
|
<EM>ISO 639:1988 (E).
|
|
Code for the representation of names of languages.</EM>
|
|
[Geneva]: International Organization for
|
|
Standardization, 1988.</DD>
|
|
|
|
<dt><a name='ISO3166'>ISO 3166</a></dt><dd>
|
|
(International Organization for Standardization).
|
|
<EM>ISO 3166-1:1997 (E).
|
|
Codes for the representation of names of countries and their subdivisions
|
|
-- Part 1: Country codes</EM>
|
|
[Geneva]: International Organization for
|
|
Standardization, 1997.</DD>
|
|
|
|
<dt><a name='ISO10646'>ISO/IEC 10646</a></dt><dd>ISO
|
|
(International Organization for Standardization).
|
|
<EM>ISO/IEC 10646-1993 (E). Information technology -- Universal
|
|
Multiple-Octet Coded Character Set (UCS) -- Part 1:
|
|
Architecture and Basic Multilingual Plane.</EM>
|
|
[Geneva]: International Organization for
|
|
Standardization, 1993 (plus amendments AM 1 through AM 7).
|
|
</DD>
|
|
|
|
<dt><a name='Unicode'>Unicode</a></dt><dd>The Unicode Consortium.
|
|
<EM>The Unicode Standard, Version 2.0.</EM>
|
|
Reading, Mass.: Addison-Wesley Developers Press, 1996.</DD>
|
|
|
|
</DL>
<H3><A NAME='null'>A.2 Other References</a></h3>
|
|
|
|
<DL>
|
|
|
|
<dt><a name='Aho'>Aho/Ullman</a></dt><dd>Aho, Alfred V.,
|
|
Ravi Sethi, and Jeffrey D. Ullman.
|
|
<EM>Compilers: Principles, Techniques, and Tools</EM>.
|
|
Reading: Addison-Wesley, 1986, rpt. corr. 1988.</DD>
|
|
|
|
<dt><a name='Berners-Lee'>Berners-Lee et al.</a></dt><dd>
|
|
Berners-Lee, T., R. Fielding, and L. Masinter.
|
|
<EM>Uniform Resource Identifiers (URI): Generic Syntax and
|
|
Semantics</EM>.
|
|
1997.
|
|
(Work in progress; see updates to RFC1738.)</DD>
|
|
|
|
<dt><a name='ABK'>Brüggemann-Klein</a></dt><dd>Brüggemann-Klein, Anne.
|
|
<EM>Regular Expressions into Finite Automata</EM>.
|
|
Extended abstract in I. Simon, ed., LATIN 1992,
pp. 97-98. Springer-Verlag, Berlin 1992.
Full version in Theoretical Computer Science 120: 197-213, 1993.
|
|
|
|
</DD>
|
|
|
|
<dt><a name='ABKDW'>Brüggemann-Klein and Wood</a></dt><dd>Brüggemann-Klein, Anne,
|
|
and Derick Wood.
|
|
<EM>Deterministic Regular Languages</EM>.
|
|
Universität Freiburg, Institut für Informatik,
|
|
Report 38, October 1991.
|
|
</DD>
|
|
|
|
<dt><a name='Clark'>Clark</a></dt><dd>James Clark.
|
|
Comparison of SGML and XML. See
|
|
<A HREF='http://www.w3.org/TR/NOTE-sgml-xml-971215'>http://www.w3.org/TR/NOTE-sgml-xml-971215</A>.
|
|
</DD>
|
|
<dt><a name='RFC1738'>IETF RFC1738</a></dt><dd>
|
|
IETF (Internet Engineering Task Force).
|
|
<EM>RFC 1738: Uniform Resource Locators (URL)</EM>,
|
|
ed. T. Berners-Lee, L. Masinter, M. McCahill.
|
|
1994.
|
|
</DD>
|
|
|
|
<dt><a name='RFC1808'>IETF RFC1808</a></dt><dd>
|
|
IETF (Internet Engineering Task Force).
|
|
<EM>RFC 1808: Relative Uniform Resource Locators</EM>,
|
|
ed. R. Fielding.
|
|
1995.
|
|
</DD>
|
|
|
|
<dt><a name='RFC2141'>IETF RFC2141</a></dt><dd>
|
|
IETF (Internet Engineering Task Force).
|
|
<EM>RFC 2141: URN Syntax</EM>,
|
|
ed. R. Moats.
|
|
1997.
|
|
</DD>
|
|
|
|
<dt><a name='ISO8879'>ISO 8879</a></dt><dd>ISO
|
|
(International Organization for Standardization).
|
|
<EM>ISO 8879:1986(E). Information processing -- Text and Office
|
|
Systems -- Standard Generalized Markup Language (SGML).</EM> First
|
|
edition -- 1986-10-15. [Geneva]: International Organization for
|
|
Standardization, 1986.
|
|
</DD>
|
|
|
|
|
|
<dt><a name='ISO10744'>ISO/IEC 10744</a></dt><dd>ISO
|
|
(International Organization for Standardization).
|
|
<EM>ISO/IEC 10744-1992 (E). Information technology --
|
|
Hypermedia/Time-based Structuring Language (HyTime).
|
|
</EM>
|
|
[Geneva]: International Organization for
|
|
Standardization, 1992.
|
|
<EM>Extended Facilities Annexe.</EM>
|
|
[Geneva]: International Organization for
|
|
Standardization, 1996.
|
|
</DD>
|
|
|
|
|
|
|
|
</DL>
<H2><A NAME='CharClasses'>B. Character Classes</a></h2>
|
|
<P>Following the characteristics defined in the Unicode standard,
|
|
characters are classed as base characters (among others, these
|
|
contain the alphabetic characters of the Latin alphabet, without
|
|
diacritics), ideographic characters, and combining characters (among
|
|
others, this class contains most diacritics); these classes combine
|
|
to form the class of letters. Digits and extenders are
|
|
also distinguished.
|
|
</p>
|
|
<table cellpadding='5' border='1' bgcolor='#f5dcb3' width='100%'><tr align='left'><td><strong>
|
|
Characters</strong></td></tr><tr><td>
|
|
<table border='0' bgcolor='#f5dcb3'>
|
|
|
|
<tr valign='top'><td align='right'><a name='NT-Letter'></a>[84] </td><td align='right'><font><code>Letter</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code><a href='#NT-BaseChar'>BaseChar</A>
|
|
| <a href='#NT-Ideographic'>Ideographic</A></code></font></td> </tr>
|
|
<tr valign='top'><td align='right'><a name='NT-BaseChar'></a>[85] </td><td align='right'><font><code>BaseChar</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>[#x0041-#x005A]
|
|
| [#x0061-#x007A]
|
|
| [#x00C0-#x00D6]
|
|
| [#x00D8-#x00F6]
|
|
| [#x00F8-#x00FF]
|
|
| [#x0100-#x0131]
|
|
| [#x0134-#x013E]
|
|
| [#x0141-#x0148]
|
|
| [#x014A-#x017E]
|
|
| [#x0180-#x01C3]
|
|
| [#x01CD-#x01F0]
|
|
| [#x01F4-#x01F5]
|
|
| [#x01FA-#x0217]
|
|
| [#x0250-#x02A8]
|
|
| [#x02BB-#x02C1]
|
|
| #x0386
|
|
| [#x0388-#x038A]
|
|
| #x038C
|
|
| [#x038E-#x03A1]
|
|
| [#x03A3-#x03CE]
|
|
| [#x03D0-#x03D6]
|
|
| #x03DA
|
|
| #x03DC
|
|
| #x03DE
|
|
| #x03E0
|
|
| [#x03E2-#x03F3]
|
|
| [#x0401-#x040C]
|
|
| [#x040E-#x044F]
|
|
| [#x0451-#x045C]
|
|
| [#x045E-#x0481]
|
|
| [#x0490-#x04C4]
|
|
| [#x04C7-#x04C8]
|
|
| [#x04CB-#x04CC]
|
|
| [#x04D0-#x04EB]
|
|
| [#x04EE-#x04F5]
|
|
| [#x04F8-#x04F9]
|
|
| [#x0531-#x0556]
|
|
| #x0559
|
|
| [#x0561-#x0586]
|
|
| [#x05D0-#x05EA]
|
|
| [#x05F0-#x05F2]
|
|
| [#x0621-#x063A]
|
|
| [#x0641-#x064A]
|
|
| [#x0671-#x06B7]
|
|
| [#x06BA-#x06BE]
|
|
| [#x06C0-#x06CE]
|
|
| [#x06D0-#x06D3]
|
|
| #x06D5
|
|
| [#x06E5-#x06E6]
|
|
| [#x0905-#x0939]
|
|
| #x093D
|
|
| [#x0958-#x0961]
|
|
| [#x0985-#x098C]
|
|
| [#x098F-#x0990]
|
|
| [#x0993-#x09A8]
|
|
| [#x09AA-#x09B0]
|
|
| #x09B2
|
|
| [#x09B6-#x09B9]
|
|
| [#x09DC-#x09DD]
|
|
| [#x09DF-#x09E1]
|
|
| [#x09F0-#x09F1]
|
|
| [#x0A05-#x0A0A]
|
|
| [#x0A0F-#x0A10]
|
|
| [#x0A13-#x0A28]
|
|
| [#x0A2A-#x0A30]
|
|
| [#x0A32-#x0A33]
|
|
| [#x0A35-#x0A36]
|
|
| [#x0A38-#x0A39]
|
|
| [#x0A59-#x0A5C]
|
|
| #x0A5E
|
|
| [#x0A72-#x0A74]
|
|
| [#x0A85-#x0A8B]
|
|
| #x0A8D
|
|
| [#x0A8F-#x0A91]
|
|
| [#x0A93-#x0AA8]
|
|
| [#x0AAA-#x0AB0]
|
|
| [#x0AB2-#x0AB3]
|
|
| [#x0AB5-#x0AB9]
|
|
| #x0ABD
|
|
| #x0AE0
|
|
| [#x0B05-#x0B0C]
|
|
| [#x0B0F-#x0B10]
|
|
| [#x0B13-#x0B28]
|
|
| [#x0B2A-#x0B30]
|
|
| [#x0B32-#x0B33]
|
|
| [#x0B36-#x0B39]
|
|
| #x0B3D
|
|
| [#x0B5C-#x0B5D]
|
|
| [#x0B5F-#x0B61]
|
|
| [#x0B85-#x0B8A]
|
|
| [#x0B8E-#x0B90]
|
|
| [#x0B92-#x0B95]
|
|
| [#x0B99-#x0B9A]
|
|
| #x0B9C
|
|
| [#x0B9E-#x0B9F]
|
|
| [#x0BA3-#x0BA4]
|
|
| [#x0BA8-#x0BAA]
|
|
| [#x0BAE-#x0BB5]
|
|
| [#x0BB7-#x0BB9]
|
|
| [#x0C05-#x0C0C]
|
|
| [#x0C0E-#x0C10]
|
|
| [#x0C12-#x0C28]
|
|
| [#x0C2A-#x0C33]
|
|
| [#x0C35-#x0C39]
|
|
| [#x0C60-#x0C61]
|
|
| [#x0C85-#x0C8C]
|
|
| [#x0C8E-#x0C90]
|
|
| [#x0C92-#x0CA8]
|
|
| [#x0CAA-#x0CB3]
|
|
| [#x0CB5-#x0CB9]
|
|
| #x0CDE
|
|
| [#x0CE0-#x0CE1]
|
|
| [#x0D05-#x0D0C]
|
|
| [#x0D0E-#x0D10]
|
|
| [#x0D12-#x0D28]
|
|
| [#x0D2A-#x0D39]
|
|
| [#x0D60-#x0D61]
|
|
| [#x0E01-#x0E2E]
|
|
| #x0E30
|
|
| [#x0E32-#x0E33]
|
|
| [#x0E40-#x0E45]
|
|
| [#x0E81-#x0E82]
|
|
| #x0E84
|
|
| [#x0E87-#x0E88]
|
|
| #x0E8A
|
|
| #x0E8D
|
|
| [#x0E94-#x0E97]
|
|
| [#x0E99-#x0E9F]
|
|
| [#x0EA1-#x0EA3]
|
|
| #x0EA5
|
|
| #x0EA7
|
|
| [#x0EAA-#x0EAB]
|
|
| [#x0EAD-#x0EAE]
|
|
| #x0EB0
|
|
| [#x0EB2-#x0EB3]
|
|
| #x0EBD
|
|
| [#x0EC0-#x0EC4]
|
|
| [#x0F40-#x0F47]
|
|
| [#x0F49-#x0F69]
|
|
| [#x10A0-#x10C5]
|
|
| [#x10D0-#x10F6]
|
|
| #x1100
|
|
| [#x1102-#x1103]
|
|
| [#x1105-#x1107]
|
|
| #x1109
|
|
| [#x110B-#x110C]
|
|
| [#x110E-#x1112]
|
|
| #x113C
|
|
| #x113E
|
|
| #x1140
|
|
| #x114C
|
|
| #x114E
|
|
| #x1150
|
|
| [#x1154-#x1155]
|
|
| #x1159
|
|
| [#x115F-#x1161]
|
|
| #x1163
|
|
| #x1165
|
|
| #x1167
|
|
| #x1169
|
|
| [#x116D-#x116E]
|
|
| [#x1172-#x1173]
|
|
| #x1175
|
|
| #x119E
|
|
| #x11A8
|
|
| #x11AB
|
|
| [#x11AE-#x11AF]
|
|
| [#x11B7-#x11B8]
|
|
| #x11BA
|
|
| [#x11BC-#x11C2]
|
|
| #x11EB
|
|
| #x11F0
|
|
| #x11F9
|
|
| [#x1E00-#x1E9B]
|
|
| [#x1EA0-#x1EF9]
|
|
| [#x1F00-#x1F15]
|
|
| [#x1F18-#x1F1D]
|
|
| [#x1F20-#x1F45]
|
|
| [#x1F48-#x1F4D]
|
|
| [#x1F50-#x1F57]
|
|
| #x1F59
|
|
| #x1F5B
|
|
| #x1F5D
|
|
| [#x1F5F-#x1F7D]
|
|
| [#x1F80-#x1FB4]
|
|
| [#x1FB6-#x1FBC]
|
|
| #x1FBE
|
|
| [#x1FC2-#x1FC4]
|
|
| [#x1FC6-#x1FCC]
|
|
| [#x1FD0-#x1FD3]
|
|
| [#x1FD6-#x1FDB]
|
|
| [#x1FE0-#x1FEC]
|
|
| [#x1FF2-#x1FF4]
|
|
| [#x1FF6-#x1FFC]
|
|
| #x2126
|
|
| [#x212A-#x212B]
|
|
| #x212E
|
|
| [#x2180-#x2182]
|
|
| [#x3041-#x3094]
|
|
| [#x30A1-#x30FA]
|
|
| [#x3105-#x312C]
|
|
| [#xAC00-#xD7A3]
|
|
</code></font></td></tr>
|
|
<tr valign='top'><td align='right'><a name='NT-Ideographic'></a>[86] </td><td align='right'><font><code>Ideographic</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>[#x4E00-#x9FA5]
|
|
| #x3007
|
|
| [#x3021-#x3029]
|
|
</code></font></td></tr>
|
|
<tr valign='top'><td align='right'><a name='NT-CombiningChar'></a>[87] </td><td align='right'><font><code>CombiningChar</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>[#x0300-#x0345]
|
|
| [#x0360-#x0361]
|
|
| [#x0483-#x0486]
|
|
| [#x0591-#x05A1]
|
|
| [#x05A3-#x05B9]
|
|
| [#x05BB-#x05BD]
|
|
| #x05BF
|
|
| [#x05C1-#x05C2]
|
|
| #x05C4
|
|
| [#x064B-#x0652]
|
|
| #x0670
|
|
| [#x06D6-#x06DC]
|
|
| [#x06DD-#x06DF]
|
|
| [#x06E0-#x06E4]
|
|
| [#x06E7-#x06E8]
|
|
| [#x06EA-#x06ED]
|
|
| [#x0901-#x0903]
|
|
| #x093C
|
|
| [#x093E-#x094C]
|
|
| #x094D
|
|
| [#x0951-#x0954]
|
|
| [#x0962-#x0963]
|
|
| [#x0981-#x0983]
|
|
| #x09BC
|
|
| #x09BE
|
|
| #x09BF
|
|
| [#x09C0-#x09C4]
|
|
| [#x09C7-#x09C8]
|
|
| [#x09CB-#x09CD]
|
|
| #x09D7
|
|
| [#x09E2-#x09E3]
|
|
| #x0A02
|
|
| #x0A3C
|
|
| #x0A3E
|
|
| #x0A3F
|
|
| [#x0A40-#x0A42]
|
|
| [#x0A47-#x0A48]
|
|
| [#x0A4B-#x0A4D]
|
|
| [#x0A70-#x0A71]
|
|
| [#x0A81-#x0A83]
|
|
| #x0ABC
|
|
| [#x0ABE-#x0AC5]
|
|
| [#x0AC7-#x0AC9]
|
|
| [#x0ACB-#x0ACD]
|
|
| [#x0B01-#x0B03]
|
|
| #x0B3C
|
|
| [#x0B3E-#x0B43]
|
|
| [#x0B47-#x0B48]
|
|
| [#x0B4B-#x0B4D]
|
|
| [#x0B56-#x0B57]
|
|
| [#x0B82-#x0B83]
|
|
| [#x0BBE-#x0BC2]
|
|
| [#x0BC6-#x0BC8]
|
|
| [#x0BCA-#x0BCD]
|
|
| #x0BD7
|
|
| [#x0C01-#x0C03]
|
|
| [#x0C3E-#x0C44]
|
|
| [#x0C46-#x0C48]
|
|
| [#x0C4A-#x0C4D]
|
|
| [#x0C55-#x0C56]
|
|
| [#x0C82-#x0C83]
|
|
| [#x0CBE-#x0CC4]
|
|
| [#x0CC6-#x0CC8]
|
|
| [#x0CCA-#x0CCD]
|
|
| [#x0CD5-#x0CD6]
|
|
| [#x0D02-#x0D03]
|
|
| [#x0D3E-#x0D43]
|
|
| [#x0D46-#x0D48]
|
|
| [#x0D4A-#x0D4D]
|
|
| #x0D57
|
|
| #x0E31
|
|
| [#x0E34-#x0E3A]
|
|
| [#x0E47-#x0E4E]
|
|
| #x0EB1
|
|
| [#x0EB4-#x0EB9]
|
|
| [#x0EBB-#x0EBC]
|
|
| [#x0EC8-#x0ECD]
|
|
| [#x0F18-#x0F19]
|
|
| #x0F35
|
|
| #x0F37
|
|
| #x0F39
|
|
| #x0F3E
|
|
| #x0F3F
|
|
| [#x0F71-#x0F84]
|
|
| [#x0F86-#x0F8B]
|
|
| [#x0F90-#x0F95]
|
|
| #x0F97
|
|
| [#x0F99-#x0FAD]
|
|
| [#x0FB1-#x0FB7]
|
|
| #x0FB9
|
|
| [#x20D0-#x20DC]
|
|
| #x20E1
|
|
| [#x302A-#x302F]
|
|
| #x3099
|
|
| #x309A
|
|
</code></font></td></tr>
|
|
<tr valign='top'><td align='right'><a name='NT-Digit'></a>[88] </td><td align='right'><font><code>Digit</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>[#x0030-#x0039]
|
|
| [#x0660-#x0669]
|
|
| [#x06F0-#x06F9]
|
|
| [#x0966-#x096F]
|
|
| [#x09E6-#x09EF]
|
|
| [#x0A66-#x0A6F]
|
|
| [#x0AE6-#x0AEF]
|
|
| [#x0B66-#x0B6F]
|
|
| [#x0BE7-#x0BEF]
|
|
| [#x0C66-#x0C6F]
|
|
| [#x0CE6-#x0CEF]
|
|
| [#x0D66-#x0D6F]
|
|
| [#x0E50-#x0E59]
|
|
| [#x0ED0-#x0ED9]
|
|
| [#x0F20-#x0F29]
|
|
</code></font></td></tr>
|
|
<tr valign='top'><td align='right'><a name='NT-Extender'></a>[89] </td><td align='right'><font><code>Extender</code></font></td>
|
|
<td align='center'><font><code> ::= </code></font></td><td align='left'><font><code>#x00B7
|
|
| #x02D0
|
|
| #x02D1
|
|
| #x0387
|
|
| #x0640
|
|
| #x0E46
|
|
| #x0EC6
|
|
| #x3005
|
|
| [#x3031-#x3035]
|
|
| [#x309D-#x309E]
|
|
| [#x30FC-#x30FE]
|
|
</code></font></td></tr>
|
|
|
|
|
|
</table>
|
|
</td></tr></table>
|
|
|
|
<P>The character classes defined here can be derived from the
|
|
Unicode character database as follows:
|
|
<UL>
|
|
<LI>
|
|
Name start characters must have one of the categories Ll, Lu,
|
|
Lo, Lt, Nl.
|
|
</LI>
|
|
<LI>
|
|
Name characters other than Name-start characters
|
|
must have one of the categories Mc, Me, Mn, Lm, or Nd.
|
|
</LI>
|
|
<LI>
|
|
Characters in the compatibility area (i.e. with character code
|
|
greater than #xF900 and less than #xFFFE) are not allowed in XML
|
|
names.
|
|
</LI>
|
|
<LI>
|
|
Characters which have a font or compatibility decomposition (i.e. those
|
|
with a "compatibility formatting tag" in field 5 of the database --
|
|
marked by field 5 beginning with a "<") are not allowed.
|
|
</LI>
|
|
<LI>
|
|
The following characters are treated as name-start characters
|
|
rather than name characters, because the property file classifies
|
|
them as Alphabetic: [#x02BB-#x02C1], #x0559, #x06E5, #x06E6.
|
|
</LI>
|
|
<LI>
|
|
Characters #x20DD-#x20E0 are excluded (in accordance with
|
|
Unicode, section 5.14).
|
|
</LI>
|
|
<LI>
|
|
Character #x00B7 is classified as an extender, because the
|
|
property list so identifies it.
|
|
</LI>
|
|
<LI>
|
|
Character #x0387 is added as a name character, because #x00B7
|
|
is its canonical equivalent.
|
|
</LI>
|
|
<LI>
|
|
Characters ':' and '_' are allowed as name-start characters.
|
|
</LI>
|
|
<LI>
|
|
Characters '-' and '.' are allowed as name characters.
|
|
</LI>
|
|
</UL>
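<P>As an illustration only, membership in a character class can be tested by
comparing a character's code against the ranges of the corresponding production.
The sketch below copies the ranges of productions [86], [88], and [89] above;
the names <CODE>IDEOGRAPHIC</CODE>, <CODE>DIGIT</CODE>, <CODE>EXTENDER</CODE>,
and <CODE>in_class</CODE> are hypothetical and are not defined by this
specification.</P>

```python
# Ranges copied from productions [86] Ideographic, [88] Digit and
# [89] Extender. Single characters are written as one-element ranges.
# All names here are illustrative, not part of the specification.
IDEOGRAPHIC = [(0x4E00, 0x9FA5), (0x3007, 0x3007), (0x3021, 0x3029)]

DIGIT = [
    (0x0030, 0x0039), (0x0660, 0x0669), (0x06F0, 0x06F9), (0x0966, 0x096F),
    (0x09E6, 0x09EF), (0x0A66, 0x0A6F), (0x0AE6, 0x0AEF), (0x0B66, 0x0B6F),
    (0x0BE7, 0x0BEF), (0x0C66, 0x0C6F), (0x0CE6, 0x0CEF), (0x0D66, 0x0D6F),
    (0x0E50, 0x0E59), (0x0ED0, 0x0ED9), (0x0F20, 0x0F29),
]

EXTENDER = [
    (0x00B7, 0x00B7), (0x02D0, 0x02D0), (0x02D1, 0x02D1), (0x0387, 0x0387),
    (0x0640, 0x0640), (0x0E46, 0x0E46), (0x0EC6, 0x0EC6), (0x3005, 0x3005),
    (0x3031, 0x3035), (0x309D, 0x309E), (0x30FC, 0x30FE),
]


def in_class(char, ranges):
    """Return True if the single character falls inside any listed range."""
    code = ord(char)
    return any(low <= code <= high for low, high in ranges)
```

<P>A linear scan is sufficient for tables of this size; a processor holding the
much longer BaseChar table might prefer a sorted list and binary search.</P>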
<H2><A NAME='sec-xml-and-sgml'>C. XML and SGML (Non-Normative)</a></h2>
|
|
|
|
<P>XML is designed to be a subset of SGML, in that every
|
|
<A href='#dt-valid'>valid</A> XML document should also be a
|
|
conformant SGML document.
|
|
For a detailed comparison of the additional restrictions that XML places on
|
|
documents beyond those of SGML, see <A href='#Clark'>[Clark]</A>.
|
|
</P>
|
|
|
|
|
|
|
|
<H2><A NAME='sec-entexpand'>D. Expansion of Entity and Character References (Non-Normative)</a></h2>
|
|
<P>This appendix contains some examples illustrating the
|
|
sequence of entity- and character-reference recognition and
|
|
expansion, as specified in "<A href='#entproc'>4.4 XML Processor Treatment of Entities and References</A>".</P>
|
|
<P>
|
|
If the DTD contains the declaration
|
|
</p><table cellpadding='5' border='1' bgcolor='#80ffff' width='100%'><tr><td><code><font><!ENTITY example "<p>An ampersand (&#38;#38;) may be escaped<BR>numerically (&#38;#38;#38;) or with a general entity<BR>(&amp;amp;).</p>" ><BR></font></code></td></tr></table><p>
|
|
then the XML processor will recognize the character references
|
|
when it parses the entity declaration, and resolve them before
|
|
storing the following string as the
|
|
value of the entity "<CODE>example</CODE>":
|
|
</p><table cellpadding='5' border='1' bgcolor='#80ffff' width='100%'><tr><td><code><font><p>An ampersand (&#38;) may be escaped<BR>numerically (&#38;#38;) or with a general entity<BR>(&amp;amp;).</p><BR></font></code></td></tr></table><p>
|
|
A reference in the document to "<CODE>&example;</CODE>"
|
|
will cause the text to be reparsed, at which time the
|
|
start- and end-tags of the "<CODE>p</CODE>" element will be recognized
|
|
and the three references will be recognized and expanded,
|
|
resulting in a "<CODE>p</CODE>" element with the following content
|
|
(all data, no delimiters or markup):
|
|
</p><table cellpadding='5' border='1' bgcolor='#80ffff' width='100%'><tr><td><code><font>An ampersand (&) may be escaped<BR>numerically (&#38;) or with a general entity<BR>(&amp;).<BR></font></code></td></tr></table><p>
|
|
</P>
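<P>The first example above can be tried with an existing processor. The sketch
below assumes Python's <CODE>xml.etree.ElementTree</CODE> module, which expands
internal general entities declared in the internal DTD subset; the wrapping
<CODE>doc</CODE> element and the function name <CODE>expanded_text</CODE> are
hypothetical and serve only to make the document complete.</P>

```python
import xml.etree.ElementTree as ET

# The entity declaration from the example, placed in an internal subset.
# The "doc" root element is an assumption added for illustration.
DOCUMENT = (
    '<!DOCTYPE doc [\n'
    '<!ENTITY example "<p>An ampersand (&#38;#38;) may be escaped '
    'numerically (&#38;#38;#38;) or with a general entity '
    '(&amp;amp;).</p>" >\n'
    ']>\n'
    '<doc>&example;</doc>'
)


def expanded_text(document):
    """Parse the document and return the character data of its "p" child."""
    root = ET.fromstring(document)
    # The reference &example; is reparsed, so "p" appears as a real element.
    return root.find("p").text


print(expanded_text(DOCUMENT))
```

<P>The character data returned should contain one literal ampersand, then the
five characters of a character reference, then the five characters of an
entity reference, matching the final box of the example.</P>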
|
|
<P>A more complex example will illustrate the rules and their
|
|
effects fully. In the following example, the line numbers are
|
|
solely for reference.
|
|
</p><table cellpadding='5' border='1' bgcolor='#80ffff' width='100%'><tr><td><code><font>1 <?xml version='1.0'?><BR>2 <!DOCTYPE test [<BR>3 <!ELEMENT test (#PCDATA) ><BR>4 <!ENTITY % xx '&#37;zz;'><BR>5 <!ENTITY % zz '&#60;!ENTITY tricky "error-prone" >' ><BR>6 %xx;<BR>7 ]><BR>8 <test>This sample shows a &tricky; method.</test><BR></font></code></td></tr></table><p>
|
|
This produces the following:
|
|
<UL>
|
|
<LI>in line 4, the reference to character 37 is expanded immediately,
|
|
and the parameter entity "<CODE>xx</CODE>" is stored in the symbol
|
|
table with the value "<CODE>%zz;</CODE>". Since the replacement text
|
|
is not rescanned, the reference to parameter entity "<CODE>zz</CODE>"
|
|
is not recognized. (And it would be an error if it were, since
|
|
"<CODE>zz</CODE>" is not yet declared.)</LI>
|
|
<LI>in line 5, the character reference "<CODE>&#60;</CODE>" is
|
|
expanded immediately and the parameter entity "<CODE>zz</CODE>" is
|
|
stored with the replacement text
|
|
"<CODE><!ENTITY tricky "error-prone" ></CODE>",
|
|
which is a well-formed entity declaration.</LI>
|
|
<LI>in line 6, the reference to "<CODE>xx</CODE>" is recognized,
|
|
and the replacement text of "<CODE>xx</CODE>" (namely
|
|
"<CODE>%zz;</CODE>") is parsed. The reference to "<CODE>zz</CODE>"
|
|
is recognized in its turn, and its replacement text
|
|
("<CODE><!ENTITY tricky "error-prone" ></CODE>") is parsed.
|
|
The general entity "<CODE>tricky</CODE>" has now been
|
|
declared, with the replacement text "<CODE>error-prone</CODE>".</LI>
|
|
<LI>
|
|
in line 8, the reference to the general entity "<CODE>tricky</CODE>" is
|
|
recognized, and it is expanded, so the full content of the
|
|
"<CODE>test</CODE>" element is the self-describing (and ungrammatical) string
|
|
<EM>This sample shows a error-prone method.</EM>
|
|
</LI>
|
|
</UL>
<H2><A NAME='determinism'>E. Deterministic Content Models (Non-Normative)</a></h2>
|
|
<P><A href='#dt-compat'>For compatibility</A>, it is
|
|
required
|
|
that content models in element type declarations be deterministic.
|
|
</P>
|
|
|
|
<P>SGML
|
|
requires deterministic content models (it calls them
|
|
"unambiguous"); XML processors built using SGML systems may
|
|
flag non-deterministic content models as errors.</P>
|
|
<P>For example, the content model <CODE>((b, c) | (b, d))</CODE> is
|
|
non-deterministic, because given an initial <CODE>b</CODE> the parser
|
|
cannot know which <CODE>b</CODE> in the model is being matched without
|
|
looking ahead to see which element follows the <CODE>b</CODE>.
|
|
In this case, the two references to
|
|
<CODE>b</CODE> can be collapsed
|
|
into a single reference, making the model read
|
|
<CODE>(b, (c | d))</CODE>. An initial <CODE>b</CODE> now clearly
|
|
matches only a single name in the content model. The parser doesn't
|
|
need to look ahead to see what follows; either <CODE>c</CODE> or
|
|
<CODE>d</CODE> would be accepted.</P>
|
|
<P>More formally: a finite state automaton may be constructed from the
|
|
content model using the standard algorithms, e.g. algorithm 3.5
|
|
in section 3.9
|
|
of Aho, Sethi, and Ullman <A href='#Aho'>[Aho/Ullman]</A>.
|
|
In many such algorithms, a follow set is constructed for each
|
|
position in the regular expression (i.e., each leaf
|
|
node in the
|
|
syntax tree for the regular expression);
|
|
if any position has a follow set in which
|
|
more than one following position is
|
|
labeled with the same element type name,
|
|
then the content model is in error
|
|
and may be reported as an error.
|
|
</P>
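<P>The follow-set test described above might be sketched as follows. This is
not the algorithm of <A href='#Aho'>[Aho/Ullman]</A> verbatim but a small
position-based variant written for illustration. It assumes a content model is
given as nested tuples such as <CODE>("seq", "b", ("alt", "c", "d"))</CODE>,
and it also checks the set of initial positions, treating the start of the
model as one more position; the tuple encoding and the names
<CODE>analyse</CODE> and <CODE>is_deterministic</CODE> are hypothetical.</P>

```python
from itertools import count


def analyse(model):
    """Compute the first set, follow sets and labels of a content model.

    A model is an element type name (str) or a tuple whose first item is
    "seq", "alt", "opt", "star" or "plus", followed by its sub-models.
    """
    names = {}    # position -> element type name labelling it
    follow = {}   # position -> positions that may come next
    numbers = count()

    def visit(node):
        # Returns (nullable, first positions, last positions).
        if isinstance(node, str):
            pos = next(numbers)
            names[pos] = node
            follow[pos] = set()
            return False, {pos}, {pos}
        op, *args = node
        parts = [visit(arg) for arg in args]
        if op == "alt":
            return (any(p[0] for p in parts),
                    set().union(*(p[1] for p in parts)),
                    set().union(*(p[2] for p in parts)))
        if op == "seq":
            nullable, first, last = True, set(), set()
            for part_nullable, part_first, part_last in parts:
                # Whatever could end the prefix may be followed by this part.
                for pos in last:
                    follow[pos] |= part_first
                if nullable:
                    first |= part_first
                last = part_last | (last if part_nullable else set())
                nullable = nullable and part_nullable
            return nullable, first, last
        if op in ("opt", "star", "plus"):
            nullable, first, last = parts[0]
            if op != "opt":
                # Repetition: the end of one iteration may start another.
                for pos in last:
                    follow[pos] |= first
            return nullable or op != "plus", first, last
        raise ValueError("unknown operator: %r" % (op,))

    _, first, _ = visit(model)
    return first, follow, names


def is_deterministic(model):
    """True if no first or follow set repeats an element type name."""
    first, follow, names = analyse(model)
    for candidates in [first, *follow.values()]:
        labels = [names[pos] for pos in candidates]
        if len(labels) != len(set(labels)):
            return False
    return True
```

<P>With this encoding, <CODE>((b, c) | (b, d))</CODE> is written
<CODE>("alt", ("seq", "b", "c"), ("seq", "b", "d"))</CODE> and is rejected,
while <CODE>("seq", "b", ("alt", "c", "d"))</CODE> is accepted.</P>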
|
|
<P>Algorithms exist which allow many but not all non-deterministic
|
|
content models to be reduced automatically to equivalent deterministic
|
|
models; see Brüggemann-Klein 1991 <A href='#ABK'>[Brüggemann-Klein]</A>.</P>
|
|
|
|
|
|
|
|
<H2><A NAME='sec-guessing'>F. Autodetection of Character Encodings (Non-Normative)</a></h2>
|
|
<P>The XML encoding declaration functions as an internal label on each
|
|
entity, indicating which character encoding is in use. Before an XML
|
|
processor can read the internal label, however, it apparently has to
|
|
know what character encoding is in use--which is what the internal label
|
|
is trying to indicate. In the general case, this is a hopeless
|
|
situation. It is not entirely hopeless in XML, however, because XML
|
|
limits the general case in two ways: each implementation is assumed
|
|
to support only a finite set of character encodings, and the XML
|
|
encoding declaration is restricted in position and content in order to
|
|
make it feasible to autodetect the character encoding in use in each
|
|
entity in normal cases. Also, in many cases other sources of information
|
|
are available in addition to the XML data stream itself.
|
|
Two cases may be distinguished,
|
|
depending on whether the XML entity is presented to the
|
|
processor without, or with, any accompanying
|
|
(external) information. We consider the first case first.
|
|
</P>
|
|
<P>
|
|
Because each XML entity not in UTF-8 or UTF-16 format <EM>must</EM>
|
|
begin with an XML encoding declaration, in which the first characters
|
|
must be '<CODE><?xml</CODE>', any conforming processor can detect,
|
|
after two to four octets of input, which of the following cases applies.
|
|
In reading this list, it may help to know that in UCS-4, '<' is
|
|
"<CODE>#x0000003C</CODE>" and '?' is "<CODE>#x0000003F</CODE>", and the Byte
|
|
Order Mark required of UTF-16 data streams is "<CODE>#xFEFF</CODE>".</P>
|
|
<P>
|
|
<UL>
|
|
<LI>
|
|
<CODE>00 00 00 3C</CODE>: UCS-4, big-endian machine (1234 order)
|
|
</LI>
|
|
<LI>
|
|
<CODE>3C 00 00 00</CODE>: UCS-4, little-endian machine (4321 order)
|
|
</LI>
|
|
<LI>
|
|
<CODE>00 00 3C 00</CODE>: UCS-4, unusual octet order (2143)
|
|
</LI>
|
|
<LI>
|
|
<CODE>00 3C 00 00</CODE>: UCS-4, unusual octet order (3412)
|
|
</LI>
|
|
<LI>
|
|
<CODE>FE FF</CODE>: UTF-16, big-endian
|
|
</LI>
|
|
<LI>
|
|
<CODE>FF FE</CODE>: UTF-16, little-endian
|
|
</LI>
|
|
<LI>
|
|
<CODE>00 3C 00 3F</CODE>: UTF-16, big-endian, no Byte Order Mark
|
|
(and thus, strictly speaking, in error)
|
|
</LI>
|
|
<LI>
|
|
<CODE>3C 00 3F 00</CODE>: UTF-16, little-endian, no Byte Order Mark
|
|
(and thus, strictly speaking, in error)
|
|
</LI>
|
|
<LI>
|
|
<CODE>3C 3F 78 6D</CODE>: UTF-8, ISO 646, ASCII, some part of ISO 8859,
|
|
Shift-JIS, EUC, or any other 7-bit, 8-bit, or mixed-width encoding
|
|
which ensures that the characters of ASCII have their normal positions,
|
|
width,
|
|
and values; the actual encoding declaration must be read to
|
|
detect which of these applies, but since all of these encodings
|
|
use the same bit patterns for the ASCII characters, the encoding
|
|
declaration itself may be read reliably
|
|
|
|
</LI>
|
|
<LI>
|
|
<CODE>4C 6F A7 94</CODE>: EBCDIC (in some flavor; the full
|
|
encoding declaration must be read to tell which code page is in
|
|
use)
|
|
</LI>
|
|
<LI>
|
|
other: UTF-8 without an encoding declaration, or else
|
|
the data stream is corrupt, fragmentary, or enclosed in
|
|
a wrapper of some kind
|
|
</LI>
|
|
</UL>
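<P>The list above can be sketched as a lookup on the first octets of an
entity. The function name <CODE>sniff_encoding_family</CODE> and the returned
strings are hypothetical; the octet patterns are those listed above and
nothing more, so the result identifies only a family of encodings, not a
specific encoding.</P>

```python
# Four-octet patterns taken from the list above. The descriptive strings
# are illustrative labels, not identifiers defined by the specification.
FOUR_OCTET_PATTERNS = {
    bytes.fromhex("0000003C"): "UCS-4, big-endian (1234 order)",
    bytes.fromhex("3C000000"): "UCS-4, little-endian (4321 order)",
    bytes.fromhex("00003C00"): "UCS-4, unusual octet order (2143)",
    bytes.fromhex("003C0000"): "UCS-4, unusual octet order (3412)",
    bytes.fromhex("003C003F"): "UTF-16, big-endian, no Byte Order Mark",
    bytes.fromhex("3C003F00"): "UTF-16, little-endian, no Byte Order Mark",
    bytes.fromhex("3C3F786D"): "ASCII-compatible family; read the declaration",
    bytes.fromhex("4C6FA794"): "EBCDIC family; read the declaration",
}

OTHER = "other: UTF-8 without a declaration, or corrupt or wrapped data"


def sniff_encoding_family(data):
    """Classify an entity by its first two to four octets."""
    head = bytes(data[:4])
    # The Byte Order Mark cases need only two octets.
    if head[:2] == b"\xfe\xff":
        return "UTF-16, big-endian"
    if head[:2] == b"\xff\xfe":
        return "UTF-16, little-endian"
    return FOUR_OCTET_PATTERNS.get(head, OTHER)
```

<P>For instance, <CODE>sniff_encoding_family(b"&lt;?xml version='1.0'?&gt;")</CODE>
reports the ASCII-compatible family, after which the encoding declaration
itself still has to be read.</P>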
|
|
|
|
|
|
<P>
|
|
This level of autodetection is enough to read the XML encoding
|
|
declaration and parse the character-encoding identifier, which is
|
|
still necessary to distinguish the individual members of each family
|
|
of encodings (e.g. to tell UTF-8 from 8859, and the parts of 8859
|
|
from each other, or to distinguish the specific EBCDIC code page in
|
|
use, and so on).
|
|
</P>
|
|
<P>
|
|
Because the contents of the encoding declaration are restricted to
|
|
ASCII characters, a processor can reliably read the entire encoding
|
|
declaration as soon as it has detected which family of encodings is in
|
|
use. Since in practice, all widely used character encodings fall into
|
|
one of the categories above, the XML encoding declaration allows
|
|
reasonably reliable in-band labeling of character encodings, even when
|
|
external sources of information at the operating-system or
|
|
transport-protocol level are unreliable.
|
|
</P>
|
|
<P>
|
|
Once the processor has detected the character encoding in use, it can
|
|
act appropriately, whether by invoking a separate input routine for
|
|
each case, or by calling the proper conversion function on each
|
|
character of input.
|
|
</P>
|
|
<P>
|
|
Like any self-labeling system, the XML encoding declaration will not
|
|
work if any software changes the entity's character set or encoding
|
|
without updating the encoding declaration. Implementors of
|
|
character-encoding routines should be careful to ensure the accuracy
|
|
of the internal and external information used to label the entity.
|
|
</P>
|
|
<P>The second possible case occurs when the XML entity is accompanied
|
|
by encoding information, as in some file systems and some network
|
|
protocols.
|
|
When multiple sources of information are available,
|
|
|
|
their relative
|
|
priority and the preferred method of handling conflict should be
|
|
specified as part of the higher-level protocol used to deliver XML.
|
|
Rules for the relative priority of the internal label and the
|
|
MIME-type label in an external header, for example, should be part of the
|
|
RFC document defining the text/xml and application/xml MIME types. In
|
|
the interests of interoperability, however, the following rules
|
|
are recommended.
|
|
<UL>
|
|
<LI>If an XML entity is in a file, the Byte-Order Mark
|
|
and encoding-declaration PI are used (if present) to determine the
|
|
character encoding. All other heuristics and sources of information
|
|
are solely for error recovery.
|
|
</LI>
|
|
<LI>If an XML entity is delivered with a
|
|
MIME type of text/xml, then the <CODE>charset</CODE> parameter
|
|
on the MIME type determines the
|
|
character encoding method; all other heuristics and sources of
|
|
information are solely for error recovery.
|
|
</LI>
|
|
<LI>If an XML entity is delivered
|
|
with a
|
|
MIME type of application/xml, then the Byte-Order Mark and
|
|
encoding-declaration PI are used (if present) to determine the
|
|
character encoding. All other heuristics and sources of
|
|
information are solely for error recovery.
|
|
</LI>
|
|
</UL>
|
|
|
|
These rules apply only in the absence of protocol-level documentation;
|
|
in particular, when the MIME types text/xml and application/xml are
|
|
defined, the recommendations of the relevant RFC will supersede
|
|
these rules.
<H2><A NAME='sec-xml-wg'>G. W3C XML Working Group (Non-Normative)</a></h2>
|
|
|
|
<P>This specification was prepared and approved for publication by the
|
|
W3C XML Working Group (WG). WG approval of this specification does
|
|
not necessarily imply that all WG members voted for its approval.
|
|
The current and former members of the XML WG are:</P>
|
|
|
|
|
|
Jon Bosak, Sun (Chair);
|
|
James Clark (Technical Lead);
|
|
Tim Bray, Textuality and Netscape (XML Co-editor);
|
|
Jean Paoli, Microsoft (XML Co-editor);
|
|
C. M. Sperberg-McQueen, U. of Ill. (XML
|
|
Co-editor);
|
|
Dan Connolly, W3C (W3C Liaison);
|
|
Paula Angerstein, Texcel;
|
|
Steve DeRose, INSO;
|
|
Dave Hollander, HP;
|
|
Eliot Kimber, ISOGEN;
|
|
Eve Maler, ArborText;
|
|
Tom Magliery, NCSA;
|
|
Murray Maloney, Muzmo and Grif;
|
|
Makoto Murata, Fuji Xerox Information Systems;
|
|
Joel Nava, Adobe;
|
|
Conleth O'Connell, Vignette;
|
|
Peter Sharpe, SoftQuad;
|
|
John Tigue, DataChannel
|
|
|
|
|
|
|
|
|