<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
|
|
<HTML>
|
|
<HEAD>
|
|
<TITLE>XML XPointer Requirements</TITLE>
|
|
<LINK rel="stylesheet" type="text/css" media="screen"
|
|
href="/StyleSheets/TR/W3C-NOTE">
|
|
</HEAD>
|
|
|
|
<BODY>
|
|
|
|
<DIV class="head">
|
|
<P><A href="http://www.w3.org/">
|
|
<IMG height="48" width="72" alt="W3C"
|
|
src="http://www.w3.org/Icons/WWW/w3c_home"></A></P>
|
|
<H1>XML XPointer Requirements<br>Version 1.0</H1>
|
|
<H2>W3C Note 24-Feb-1999</H2>
|
|
<TABLE>
|
|
<TR valign="baseline"><TD>This version:
|
|
<TD><A href="http://www.w3.org/TR/1999/NOTE-xptr-req-19990224">
|
|
http://www.w3.org/TR/1999/NOTE-xptr-req-19990224</A>
|
|
<TR valign="baseline"><TD>Latest version:
|
|
<TD><A href="http://www.w3.org/TR/NOTE-xptr-req">
|
|
http://www.w3.org/TR/NOTE-xptr-req</A>
|
|
<TR valign="baseline"><TD>Editors:
|
|
<TD>Steven J. DeRose (Inso Corp. &amp; Brown Univ.) &lt;<a
href="mailto:Steven_DeRose@Brown.edu">Steven_DeRose@Brown.edu</a>&gt;.
|
|
</TABLE>
|
|
|
|
<p><small>
|
|
<A HREF="http://www.w3.org/Consortium/Legal/ipr-notice.html#Copyright">Copyright</A>
|
|
©1999 <A HREF="http://www.w3.org">W3C</A> (<A HREF="http://www.lcs.mit.edu">MIT</A>,
|
|
<A HREF="http://www.inria.fr/">INRIA</A>, <A HREF="http://www.keio.ac.jp/">Keio</A>)
|
|
, All Rights Reserved. W3C <A HREF="http://www.w3.org/Consortium/Legal/ipr-notice.html#LegalDisclaimer">liability,</A>
|
|
<A HREF="http://www.w3.org/Consortium/Legal/ipr-notice.html#W3CTrademarks">trademark</A>,
|
|
<A HREF="http://www.w3.org/Consortium/Legal/copyright-documents.html">document
|
|
use</A> and <A HREF="http://www.w3.org/Consortium/Legal/copyright-software.html">software
|
|
licensing</A> rules apply.</small></p>
|
|
</DIV>
|
|
|
|
<HR>
|
|
|
|
<H2><A name="status">Status of this document</A></H2>
|
|
<p>This is a W3C Note produced as
|
|
a deliverable of the <a href="http://www.w3.org/XML/Activity#linking-wg">XML
|
|
Linking WG</a> according to its charter. A list of current W3C
|
|
working drafts and notes can be found at <a href="http://www.w3.org/TR">http://www.w3.org/TR
|
|
</a>.</p>
|
|
<p>This document is a work in progress representing the current consensus
|
|
of the W3C XML Linking Working Group. This version of the XML XPointer
|
|
Requirements document has been approved by the XML Linking working group
|
|
and the XML Plenary to be posted for review by W3C members and other interested
|
|
parties. Publication as a Note does not imply endorsement by the W3C membership.
|
|
Comments should be sent to <a href="mailto:www-xml-linking-comments@w3.org">
|
|
www-xml-linking-comments@w3.org</a>, which is an automatically and publicly
|
|
archived email list.</p><p>This document is being processed according to the
|
|
following review schedule:</p><table border="1" frame="border">
|
|
<caption>Review Schedule</caption>
|
|
<tbody>
|
|
<tr><th>Process</th><th>Closing date</th><th>Status</th><th>Contact</th></tr>
|
|
<tr><td>XML Linking WG signoff</td><td>1999/01/21</td><td>done</td><td><a
|
|
href="http://www.w3.org/XML/Activity#linking-wg">XML Linking WG</a></td></tr>
|
|
<tr><td>XML Plenary signoff</td><td>1999/02/03</td><td>done</td><td><a href="mailto:bill.smith@Sun.COM,veillard@w3.org">
|
|
bill.smith@Sun.COM,veillard@w3.org</a></td></tr>
|
|
<tr><td>Publish as W3C Note</td><td>1999/02/23</td><td>accepting comments
|
|
</td><td><a href="mailto:www-xml-linking-comments@w3.org">www-xml-linking-comments@w3.org
|
|
</a></td></tr>
|
|
<tr><td>Checkpoint of comments</td><td>1999/03/23</td><td> </td><td>
|
|
</td></tr>
|
|
</tbody></table><p>Comments about this document should be submitted to the
|
|
"contact" listed above for each process.</p>
|
|
|
|
<p>Many thanks to Tim Bray, James Clark, Mavis Cournane, David
|
|
Durand, Peter Flynn, Paul Grosso, Chris Maden, Eve Maler, C. M.
|
|
Sperberg-McQueen, and members of the WG and IG in general for
|
|
numerous valuable suggestions and other improvements.</p>
|
|
|
|
<h2><a name="Abstract">Abstract</a></h2>
|
|
|
|
<p>This document presents requirements for the XPointer language.
|
|
XPointer provides ways to directly identify any node, data, or
|
|
selection in any XML document by describing its structure and
|
|
context. An identified data location is called a
|
|
"target." The XPointer specification is particularly
|
|
meant to enable hyperlinks to identify any such data, regardless
|
|
of whether there is (or even could be) an ID on the target or
|
|
not. The XPointer specification is now being developed in the
|
|
XML-Linking Working Group, building on Working Drafts developed
|
|
in the XML Working Group.</p>
|
|
|
|
<p>Because the XPointer language must refer to structural parts
|
|
of XML documents, those structures must be explicit. Document
|
|
structure specifications such as DOM and the XML Information Set
|
|
may wish to consider the XPointer requirements in order to ensure
|
|
interoperability when used with XPointer and XLink.</p>
|
|
|
|
<h2>Related documents</h2>
|
|
|
|
<p><a href="http://www.w3.org/XML/Group/Linking">XML Linking
|
|
Working Group Page</a> [member only], for general information about the
|
|
activities of the WG.</p>
|
|
|
|
<p><a href="http://www.w3.org/TR/1998/WD-xptr-19980303">XML
|
|
Pointer Language (XPointer) Working Draft,</a> prior WDs produced
|
|
by the former XML Working Group, and now under the XML Linking
|
|
WG. Provides a simple yet powerful mechanism for addressing data
|
|
portions in XML documents. It is very closely based on a
|
|
multiply-implemented and widely-used technology, <a
|
|
href="http://etext.virginia.edu/bin/tei-tocs?div=DIV3&id=SAXRS">extended
|
|
pointers,</a> defined in the <a
|
|
href="http://etext.virginia.edu/TEI.html">Text Encoding
|
|
Initiative <i>Guidelines</i></a>.</p>
|
|
|
|
<p><a
|
|
href="http://www.w3.org/TR/NOTE-xptr-infoset-liaison">XPointer-Information
Set Liaison Statement,</a> produced by the
|
|
XML Linking Working Group. This document enumerates perceived
|
|
constraints that work on the XPointer specification has indicated
|
|
may affect the XML Information Set Working Group, since it is
|
|
those information structures that XPointer provides access to.</p>
|
|
|
|
<p><a href="http://www.w3.org/TR/NOTE-xlink-req">XLink
|
|
Requirements,</a> produced by the XML Linking Working Group. This
|
|
document provides requirements governing the work of this WG on
|
|
the XLink specification.</p>
|
|
|
|
<p><a href="http://www.w3.org/TR/1998/WD-xlink-19980303">XML
|
|
Linking Language (XLink) Working Draft,</a> prior WDs produced by
|
|
the former XML Working Group, and now under the XML Linking WG.</p>
|
|
|
|
<p><a
|
|
href="http://www.w3.org/TR/1998/NOTE-xlink-principles-19980303">XML
|
|
Linking Language (XLink) Design Principles,</a> produced by the
|
|
former XML Working Group, and now under the XML Linking WG. This
|
|
document provides general design principles governing the work of
|
|
this WG, involving both the XLink and XPointer specifications.</p>
|
|
|
|
<h2>Table of Contents</h2>
|
|
|
|
<ul>
|
|
<li><a href="#MinReq">Specific (minimalist) XPointer
|
|
requirements </a><ul>
|
|
<li><a href="#X.completeness">A: Completeness
|
|
requirements</a></li>
|
|
<li><a href="#X.expressiveness">B: Expressiveness
|
|
requirements</a></li>
|
|
<li><a href="#X.robustness">C: Robustness
|
|
requirements</a></li>
|
|
<li><a href="#X.user">D: General user requirements</a></li>
|
|
<li><a href="#X.syntax">E: Mechanical and syntactic
|
|
requirements</a></li>
|
|
<li><a href="#X.the.princeton.band">F:
|
|
Non-requirements</a></li>
|
|
</ul>
|
|
</li>
|
|
<li><a href="#Introduction">Background and rationale</a>
|
|
<ul>
|
|
<li><a href="#Kinds">Kinds of pointing</a></li>
|
|
<li><a href="#Lure">The lure of minimalist pointing</a></li>
|
|
<li><a href="#Nature">The nature of the distinction</a></li>
|
|
<li><a href="#Robustness">Robustness issues</a></li>
|
|
<li><a href="#Kinds-Summary">Summary</a></li>
|
|
</ul>
|
|
</li>
|
|
<li><a href="#Bibliography">Bibliography</a> </li>
|
|
</ul>
|
|
|
|
<h1><a name="MinReq">Specific (minimal) XPointer requirements</a></h1>
|
|
|
|
<p>This section lays out specific minimal functional requirements
|
|
for the XPointer specification that aim for an appropriate
|
|
balance of completeness, expressiveness, extensibility, and
|
|
simplicity. These requirements also apply to any higher-end
|
|
location-specification language to the extent that it shares some
|
|
of its functionality objectives with XPointer. Following the
|
|
specific requirements is a section on background and rationale
|
|
underlying them.</p>
|
|
|
|
<h2><a name="X.completeness">A: Completeness requirements</a></h2>
|
|
|
|
<p>This section of the requirements involves the type and variety
|
|
of data locations, or "targets", that an XPointer must
|
|
be able to identify.</p>
|
|
|
|
<p>These requirements make frequent reference to XML information
|
|
objects such as elements, attributes, PIs, and characters. The
|
|
formal definition of these objects, their relationships such as
|
|
ordering, containment, and attribution, and their precise
|
|
correspondence to XML syntax constructs are the domain of the XML
|
|
Information Set Working Group. For more detail on the
|
|
relationship, see the XML Linking Working Group's
|
|
<a href="http://www.w3.org/TR/NOTE-xptr-infoset-liaison">Liaison Statement</a>.</p>
|
|
|
|
<ol>
|
|
<li><p>XPointers must identify XML information objects, rather
|
|
than necessarily their expressions in the raw XML syntax
|
|
of a given file.</p>
|
|
<p>For example, an XPointer can identify
|
|
an element but not a tag, and (potentially) an attribute;
|
|
but not the equal-sign or quotations that expressed it.</p>
|
|
</li>
|
|
<li><p>For any single element, character in content, or PI in an
|
|
XML document, it must be possible to create at least one
|
|
XPointer that specifically identifies it.</p>
|
|
<p>This includes
|
|
special processing instructions such as the XML
|
|
declaration and the stylesheet-attachment PI.</p>
|
|
</li>
|
|
<li><p>For any single attribute or character in an attribute
|
|
value in an XML document, it must be possible to create
|
|
at least one XPointer that specifically identifies it.</p>
|
|
<p>[There
|
|
is not consensus on whether identification of attributes
|
|
is required. DOM does not presently provide a built-in
|
|
way to get from an attribute to the elements bearing it;
|
|
on the other hand, RDF defines attribute-based
representations for which omitting attributes may pose
|
|
problems. The WG seeks additional input on this issue.]</p>
|
|
</li>
|
|
<li><p>For any contiguous selection such as could be clicked or
|
|
dragged by a user in a typical view of the document, it
|
|
must be possible to create at least one XPointer that
|
|
specifically identifies it. This includes point targets
|
|
such as typically selected by a mouse click, entire
|
|
characters, and synchronous and asynchronous spans, as in
|
|
typical word processors and browsers.</p>
|
|
<p>This is
|
|
necessary to model simple everyday user selection, and to
|
|
support the first interface many applications want to
|
|
build: the ability for the user to make a selection in
|
|
the usual way and attach a link to or from it, an
|
|
annotation or bookmark to it, insert it in a path, and so
|
|
on. If users could only select or link whole elements,
|
|
the intuitive "select/act" interface would no
|
|
longer correspond to the system's actual behavior, and
|
|
thus become very confusing.</p>
|
|
<p>[Implementor note: Such a selection may include only
|
|
part of various elements: this is most obvious with a
|
|
selection that includes the end of one element and the
|
|
start of the next, but is also true relative to the
|
|
ancestors of any element. Such a selection may be
|
|
considered to include information about attributes and
|
|
boundary locations for elements that include some but not
|
|
all of the selected range. In the example just given,
|
|
there would thus be access to the attributes and boundary
|
|
locations of both the elements that are partly included,
|
|
even though neither is fully "in" the span; but
|
|
not, say, of a containing DIV. See <a href="#Broo88">Brooks
|
|
(1988)</a> for an extensive analysis of the semantics and
|
|
user interface requirements for selecting and editing in
|
|
tree-structured documents.]</p>
|
|
</li>
|
|
<li><p>For any point immediately preceding or following any
|
|
character of content, and for any point immediately
|
|
inside or outside the beginning or end of any element, or
|
|
PI, it must be possible to create at least one XPointer
|
|
that specifically identifies it.</p>
|
|
<p>This is required
|
|
because it is a standard part of user selection
|
|
semantics, and to provide an unambiguous way to express
|
|
targets for myriad other application purposes: inserting
|
|
or pasting, recording the scope of change or version
|
|
information, and so on. For example, one may wish to
|
|
specify the location in content immediately preceding a
|
|
given PI or sub-element, not only the PI or sub-element
|
|
itself (especially small elements such as italicized
|
|
words), in order to unambiguously specify a cursor
|
|
location, the end of a user selection, the destination of
|
|
a link, etc.</p>
|
|
</li>
|
|
<li><p>[There is not consensus on whether this is a short-term
|
|
requirement] the XPointer specification must provide a
|
|
way to identify targets that include multiple,
|
|
potentially discontiguous data portions.</p></li>
|
|
</ol>
|
|
|
|
<h2><a name="X.expressiveness">B: Expressiveness requirements</a></h2>
|
|
|
|
<p>These requirements involve the type and variety of XPointers
|
|
that can be used to identify a target, which is how the language
achieves greater robustness, reusability, and clarity to humans
(as described in more detail below). Fulfilling the completeness
|
|
requirements above does not guarantee fulfilling the
|
|
expressiveness requirements stated here. These are a different
|
|
but equally important class of requirement. Indeed it is easy to
|
|
design a target-rich system, but it would be much more prone to
|
|
breakage, and far less intuitive and readable to humans (even if
|
|
it managed to have fewer constructs or be terser, such as the
|
|
extreme case of a pair of byte offsets into XML source).</p>
|
|
|
|
<ol>
|
|
<li><p>The XPointer specification must utilize information
|
|
structures users can be expected to perceive or
|
|
understand in documents, such as elements, attributes,
|
|
PIs, characters, and strings; and well-known
|
|
relationships such as containment, siblings, and so on.
|
|
The XPointer specification is not primarily concerned
|
|
with machine-oriented concepts such as offsets, absolute
|
|
nesting depth, and so on.</p></li>
|
|
<li><p>The XPointer specification must provide for identifying
|
|
targets of specified names and types, for example by XML
|
|
IDs, XML PI targets, and element type names. </p></li>
|
|
<li><p>The XPointer specification must (if XPointers to
|
|
attributes are included) provide for identifying an
|
|
attribute by name, given an element it is on. </p></li>
|
|
<li><p>The XPointer specification must provide a way for
|
|
specific XPointers to express that a singleton target is
|
|
expected, for example when identifying an element by an
XML ID attribute or other potential "key"
construct. </p></li>
|
|
<li><p>The XPointer specification must provide for identifying
|
|
elements, PIs, characters, and (if supported) comments,
|
|
by their ordered position in the document structure
|
|
relative to other targets that fulfill specified
|
|
conditions. </p></li>
|
|
<li><p>The XPointer specification must provide ways to constrain
|
|
targets by specifying conditions on them, such as element
|
|
type, attribute values, and the presence of particular
|
|
content strings.</p>
|
|
<p>For example, identifying a <tt>SECTION
|
|
</tt>that contains an <tt>ABSTRACT </tt>with <tt>TYPE=FULL</tt>,
|
|
rather than merely an <tt>ABSTRACT</tt>; or being unable
|
|
to test the <tt>ABSTRACT</tt>'s attribute but still
|
|
identify the <tt>SECTION</tt> as the intended target. Or,
|
|
constraining a target to be directly or indirectly within
|
|
a broader target, or to precede or follow it among the
|
|
children of a common containing element.</p></li>
|
|
<li><p>The XPointer specification should make clear a way that
|
|
it can be extended to support testing datatype-specific
|
|
conditions when XML Datatypes are later available through
|
|
the work of the XML Schemas Working Group.</p>
|
|
<p>For example,
|
|
once it is possible to know which attributes or content
|
|
strings constitute integers, dates, or real numbers, it
|
|
should be clear how to extend the language to accommodate
|
|
appropriate comparisons within its conditional
|
|
constructs. </p></li>
|
|
<li><p>The XPointer specification must provide a way to specify
|
|
what version of a target resource is intended to be
|
|
identified. </p></li>
|
|
<li><p>The XPointer specification must define how it works in
|
|
relation to XML namespaces. Since an element or attribute
|
|
name can be used as part of characterizing locations in
|
|
an XPointer expression, the meaning of those names must
|
|
be unambiguous.</p>
|
|
</li>
|
|
</ol>
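<p>The condition-based identification described in requirement 6 above can
be sketched in a few lines of Python. This is only an illustrative sketch and
not XPointer syntax: the sample document, the function name
<tt>sections_with_full_abstract</tt>, and the use of Python's standard
<tt>xml.etree.ElementTree</tt> module are all assumptions made for the
example.</p>

<pre>
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for illustration.
SAMPLE = """
&lt;DOC&gt;
  &lt;SECTION ID="s1"&gt;&lt;ABSTRACT TYPE="BRIEF"&gt;short&lt;/ABSTRACT&gt;&lt;/SECTION&gt;
  &lt;SECTION ID="s2"&gt;&lt;ABSTRACT TYPE="FULL"&gt;long&lt;/ABSTRACT&gt;&lt;/SECTION&gt;
  &lt;SECTION ID="s3"&gt;&lt;P&gt;no abstract&lt;/P&gt;&lt;/SECTION&gt;
&lt;/DOC&gt;
"""

def sections_with_full_abstract(xml_text):
    """Return SECTION elements that directly contain an ABSTRACT with TYPE=FULL."""
    root = ET.fromstring(xml_text)
    return [
        section
        for section in root.iter("SECTION")
        if any(a.get("TYPE") == "FULL" for a in section.findall("ABSTRACT"))
    ]

matches = sections_with_full_abstract(SAMPLE)
print([s.get("ID") for s in matches])
</pre>

<p>The condition is tested on the <tt>ABSTRACT</tt> child, but the element
returned as the target is the enclosing <tt>SECTION</tt>.</p>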
|
|
|
|
<h2><a name="X.robustness">C: Robustness requirements</a></h2>
|
|
|
|
<ol>
|
|
<li><p>It must be possible, but not mandatory, to create
|
|
XPointers that can be tested for whether they identify
|
|
"the same" target when followed as they did
|
|
when created.</p><p>For example, this may be accomplished by
|
|
providing a checksum of the destination data. This
|
|
massively improves robustness because you can detect when
|
|
a link has broken (although it cannot prevent link
|
|
breakage from ever happening). [There is not consensus on
|
|
whether this requirement should be addressed within
|
|
XPointer or XLink]. </p></li>
|
|
<li><p> All XPointers must survive purely mechanical changes to
|
|
the target resource.</p> <p>For example, changing between
|
|
single and double quote characters around attributes or
|
|
between CR, LF, and CRLF for line-ends; inserting
|
|
extraneous whitespace inside <i>tags</i> (for example,
|
|
between the element type name and the attributes, or
|
|
around attribute equal-signs, etc); re-ordering
|
|
attributes; swapping between CDATA marked sections and
|
|
entities for escaping tag opens; or rearranging the
|
|
division between entities.</p>
|
|
<p>Such changes are not considered to change the logical
|
|
information structure, and should not prevent an XPointer
|
|
from being interpreted as pointing to the same data.
|
|
However, any change that touches the structure may change
the destination of the XPointer, and can be caught (see the
preceding requirement). Just which changes are
considered mechanical is to be worked out in cooperation
|
|
with the XML Information Set Working Group. </p></li>
|
|
<li><p>XPointer must attempt to avoid dependencies on character
|
|
set and internationalization issues, such as its
|
|
definition of "character". </p></li>
|
|
<li><p>The XPointer specification must enumerate any
|
|
dependencies on the presence of a DTD or schema, and
|
|
attempt to minimize such in order to facilitate
|
|
interoperability of XPointers across DTD-supporting and
|
|
non-DTD-supporting XML environments.</p>
|
|
<p>For example, an
|
|
XPointer that identifies a target in part via an
|
|
attribute, may not be interpretable if the target is in a
|
|
non-standalone XML document being processed by a
|
|
non-DTD-supporting parser. This is inherent in XML, but
|
|
needs to be made clear in the XPointer specification. </p></li>
|
|
<li><p>The XPointer specification must be clear about what
|
|
errors may arise, and what they mean. To that end, it
|
|
should attempt to enumerate any dependencies on the XML
|
|
Information Set.</p> <p>In particular, the XPointer
|
|
specification must clearly define the meaning when a
|
|
syntactically correct XPointer does not resolve to any
|
|
data object. [There is no consensus as of this writing on
|
|
whether that is necessarily an error]. There are security
|
|
issues to consider in this, such as that distinguishing
|
|
"data does not exist" from "data access
|
|
not authorized" may itself compromise security.</p></li>
|
|
</ol>
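<p>Requirements 1 and 2 of this section can be illustrated together with a
short Python sketch. It rests on several assumptions that are not part of
this Note: a SHA-256 hash stands in for the "checksum", Canonical XML 2.0 (as
implemented by <tt>xml.etree.ElementTree.canonicalize</tt> in Python 3.8 and
later) stands in for the still-undefined notion of "purely mechanical"
change, and the function name <tt>target_checksum</tt> and the sample
fragments are hypothetical.</p>

<pre>
import hashlib
import xml.etree.ElementTree as ET

def target_checksum(xml_text):
    """Hash the canonical (C14N 2.0) form of an XML fragment."""
    canonical = ET.canonicalize(xml_text)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Target as it looked when the hypothetical link was created.
original = '&lt;SECTION ID="s1" TYPE="FULL"&gt;&lt;P&gt;a &amp;lt; b&lt;/P&gt;&lt;/SECTION&gt;'

# Same information, different raw syntax: single quotes, reordered
# attributes, extra whitespace inside tags, CDATA instead of an entity.
mechanical = "&lt;SECTION  TYPE='FULL'  ID = 's1' &gt;&lt;P&gt;&lt;![CDATA[a &lt; b]]&gt;&lt;/P&gt;&lt;/SECTION &gt;"

# A real edit to the content.
edited = '&lt;SECTION ID="s1" TYPE="FULL"&gt;&lt;P&gt;a &amp;lt; c&lt;/P&gt;&lt;/SECTION&gt;'

survives_mechanical_change = target_checksum(original) == target_checksum(mechanical)
detects_real_change = target_checksum(original) != target_checksum(edited)
print(survives_mechanical_change, detects_real_change)
</pre>

<p>Hashing the parsed, canonical form rather than the raw bytes is what lets
the check ignore quoting, attribute order, and CDATA usage while still
noticing a change to the content itself.</p>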
|
|
|
|
<h2><a name="X.user">D: General user requirements</a></h2>
|
|
|
|
<ol>
|
|
<li><p>XPointers must be reasonably user-readable and
|
|
user-writable, particularly for cases likely to be
|
|
commonly needed by beginning and intermediate users.
|
|
Syntax must scale smoothly, not abruptly, with increasing
|
|
complexity. </p></li>
|
|
<li><p>The XPointer specification must provide for re-use of
|
|
XPointers by other XPointers. [There is not consensus
|
|
that this is a short-term requirement, though there is
|
|
consensus on its desirability in principle.]</p>
|
|
<p>For
|
|
example, it should be possible to express a generic
|
|
XPointer such as "the following sibling of type
|
|
T", and then write other XPointers that re-use it,
|
|
such as "the first ABSTRACT within whatever (that
XPointer) points to". This is particularly useful to
|
|
create generic pointers to relative rather than absolute
|
|
targets, and to support functionality like the HTML BASE
|
|
feature.</p></li>
|
|
</ol>
|
|
|
|
<h2><a name="X.syntax">E: Mechanical and syntactic requirements</a></h2>
|
|
|
|
<ol>
|
|
<li><p>XPointers must identify <i>locations</i> in data, not
|
|
merely the data <i>at</i> those locations.</p> <p>A user who
|
|
selects a word or string in a document and attaches a
|
|
link to it via an XPointer, may be very unhappy if they
|
|
follow the link and get just the phrase: they want the
|
|
target data <i>with access to its context.</i> This is
|
|
why the HTML mechanism of scrolling to a target element
|
|
by appending "#" and its name to the document's
|
|
URL is the way it is: literally returning just the anchor
|
|
is usually not enough. From a location one can get the
|
|
context data when needed; from data alone one cannot
|
|
derive a unique location when needed.</p>
|
|
<p>As an example, a DOM handle to an XML information set
|
|
object such as an element, would be one way to identify
|
|
the object in context; while generating a string of
|
|
content and/or markup that contains no reference to the
|
|
original context, would not. </p></li>
|
|
<li><p>XPointer syntax must not require excessive escaping when
|
|
XPointers are embedded in URLs and external identifiers.
|
|
That is, XPointer must not go nuts with punctuation
|
|
marks. This requirement must be balanced with the next
|
|
one, of course.</p>
|
|
<p>This is to preserve readability in raw
|
|
form (which is of minimal importance, as in XML), and to
|
|
reduce multi-escaping errors such as when pasting between
|
|
external identifiers and URLs, passing URLs through
|
|
scripts, typing URLs in raw, etc, etc.</p>
|
|
<p>Note: There are two kinds of escaping to be
|
|
considered: XML escaping such as for ampersand, angle
|
|
brackets, and quotes when XPointers appear within XML
|
|
documents; and URL escaping such as %20 for space, when
|
|
they appear in URLs. </p></li>
|
|
<li><p>When embedded in URLs, XPointers must be correctly
|
|
processed by current Web software such as servers and
|
|
user agents.</p>
|
|
<p>This obviously does not mean that existing
|
|
software magically becomes XPointer-aware; merely that
|
|
XPointer syntax must survive generic URL processing such
|
|
as performed by servers. </p></li>
|
|
<li><p>XPointers must use syntax in ways that are appropriate
|
|
and familiar for the constructs we use them for.</p>
|
|
<p>For
|
|
example "+", if used, should mean addition, not
|
|
variable substitution. For example, operations from
|
|
languages that deal with XML-like data (ordered trees of
|
|
repeatable typed objects, with document-scope unique
|
|
names) would provide applicable syntax, while languages
|
|
that deal with XML-unlike data (such as unordered sets of
|
|
non-repeatable or untyped objects and names) would not. </p></li>
|
|
<li><p>XPointer syntax must be readily generalizable to more
|
|
powerful specifications needed in the future. This means
|
|
we can't use up all the syntax now, or take up all the
|
|
simple syntax for the one or two simplest cases at the
|
|
cost of a massive increment of complexity to get any
|
|
further. </p></li>
|
|
</ol>
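<p>The two kinds of escaping mentioned in requirement 2, and the
multi-escaping errors it warns about, can be sketched with Python's standard
library. The pointer text below is hypothetical and its syntax is invented
purely for illustration; only the escaping functions
(<tt>xml.sax.saxutils.escape</tt> and <tt>urllib.parse.quote</tt>) are real
library calls.</p>

<pre>
from urllib.parse import quote, unquote
from xml.sax.saxutils import escape, unescape

# Hypothetical pointer text; the syntax is invented for illustration only.
pointer = 'id(intro).string(1,"R&amp;D &lt;draft&gt;")'

# XML escaping, as needed inside an attribute value of an XML document.
xml_form = escape(pointer, {'"': "&amp;quot;"})

# URL escaping, as needed when the pointer is embedded in a URL.
url_form = quote(pointer, safe="")

# Escaping the URL form a second time is the kind of multi-escaping error
# the requirement warns about: one round of unescaping no longer recovers
# the original pointer.
double_escaped = quote(url_form, safe="")

print(xml_form)
print(url_form)
print(unescape(xml_form, {"&amp;quot;": '"'}) == pointer)
print(unquote(double_escaped) == pointer)
</pre>

<p>The more punctuation a pointer syntax uses, the more of it is rewritten by
each escaping layer, which is why the requirement asks the syntax to be
sparing with punctuation marks.</p>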
|
|
|
|
<h2><a name="X.the.princeton.band">F: Non-requirements</a></h2>
|
|
|
|
<p>This section collects potential requirements that have been
|
|
considered and rejected, at least for the initial version of the
|
|
XPointer specification.</p>
|
|
|
|
<ol>
|
|
<li><p>For any entire comment in an XML document, it must be
|
|
possible to create at least one XPointer that
|
|
specifically identifies it. </p></li>
|
|
<li><p>Comments could be included in order to maintain the
|
|
ability to identify <i>any</i> node in the information
|
|
structure, assuming that XML Information Structure can
|
|
include comments. There do not seem to be compelling use
|
|
cases; critical information should not, in principle, be
|
|
embedded in comments. Also, XML does not require that
|
|
comments be passed back to applications at all, and so
|
|
including them in XPointer would introduce ambiguity
|
|
here.</p>
|
|
<p>If comments were supported, the XPointer
|
|
specification would have to define counting and perhaps
|
|
other operations such that XPointers not directly
|
|
pointing at comments would work regardless of whether the
|
|
application had gotten comments from the XML parser or
|
|
not.</p></li>
|
|
</ol>
|
|
<!--
|
|
==========================================================================
|
|
-->
|
|
<h1><a name="Introduction">Background and Rationale</a></h1>
|
|
|
|
<p>This document defines specific requirements for the XPointer
|
|
specification, beyond necessary but "apple pie"
|
|
requirements such as "completeness",
|
|
"clarity", or "brevity". These more detailed
|
|
functional requirements are intended to facilitate
|
|
decision-making during development of the Proposed Recommendation
|
|
from the established Working Draft.</p>
|
|
|
|
<p>For XPointers to interoperate, they must be defined in terms
|
|
of an explicit and consistent conceptual structure. For example,
|
|
any construct involving child numbers breaks badly if one
|
|
application counts PIs, comments, or some other objects as
|
|
children, while another doesn't: an XPointer interpreted by the
|
|
two would identify different places, which is unacceptable.
|
|
Because a consistent structure is required for XPointers to
|
|
interoperate, constraints on the conceptual information structure
|
|
for XML are discussed in a <a
|
|
href="http://www.w3.org/TR/NOTE-xptr-infoset-liaison">related
|
|
liaison paper</a>.</p>
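<p>The child-number problem described above can be made concrete with a short
Python sketch. It assumes Python's standard <tt>xml.dom.minidom</tt> parser,
which reports comments and PIs as child nodes; the sample document and the
function name <tt>nth_child</tt> are hypothetical.</p>

<pre>
from xml.dom import Node
from xml.dom.minidom import parseString

# Hypothetical sample document, invented for illustration.
SAMPLE = "&lt;DOC&gt;&lt;!-- note --&gt;&lt;P&gt;one&lt;/P&gt;&lt;?app hint?&gt;&lt;P&gt;two&lt;/P&gt;&lt;P&gt;three&lt;/P&gt;&lt;/DOC&gt;"

def nth_child(xml_text, n, elements_only):
    """Return the n-th child (1-based) of the root under one counting rule."""
    children = parseString(xml_text).documentElement.childNodes
    if elements_only:
        children = [c for c in children if c.nodeType == Node.ELEMENT_NODE]
    return children[n - 1]

counting_everything = nth_child(SAMPLE, 2, elements_only=False)
counting_elements = nth_child(SAMPLE, 2, elements_only=True)
print(counting_everything.firstChild.data, counting_elements.firstChild.data)
</pre>

<p>The same "second child" resolves to a different paragraph under each
counting rule, which is the interoperability failure a shared information
structure is meant to prevent.</p>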
|
|
|
|
<h2><a name="Kinds">Kinds of pointing</a></h2>
|
|
|
|
<p>A system for identifying data locations can be characterized
|
|
along two basic axes: The <i>set of targets</i> it can express,
|
|
versus the <i>set of methods</i> it uses to express them.</p>
|
|
|
|
<p><a name="#term.range.of.targets"></a>The first axis, the <i>range of
|
|
targets</i>, involves the extent or range of things the system
|
|
can identify at all: just documents as wholes, just elements,
|
|
individual characters, strings, etc. URLs leading to HTML
|
|
generally can identify whole documents (typically files), as well
|
|
as named A elements that the author specifically provided; but
|
|
not other elements, words, or any other kind of selection or
|
|
structure in HTML (unless of course you program via CGI, server
|
|
plug-ins, etc. -- which not surprisingly can do anything any
|
|
computer program can do). </p>
|
|
|
|
<p><a name="#term. range.of.descriptions"></a>The second axis, the <i>range
|
|
of descriptions</i> available for doing the pointing, involves
|
|
how well a system can express the intent of the pointer as
|
|
distinct from what it pointed to. For example, saying "the
first footnote with author='Smith'" rather than "element
3735928559". Weakness of a system on this axis would limit
the means of expressing what you actually <i>mean</i>, as opposed
|
|
to just what you <i>got,</i> even if the result happens to be
|
|
identical. This would have the same problems as a human language
|
|
that tried to remove all synonyms, or a mathematical model that
|
|
tried to define "integer" by listing all integers.
|
|
While a grand, even elegant idea at first glance, it is
|
|
inadequate.</p>
|
|
|
|
<p>This distinction is the most crucial to any data
|
|
identification specification. It is usually quite easy to achieve
|
|
any desired level of target power, and trivial to
|
|
implement. For example, a pair of byte offsets into an HTML file
|
|
has very high target power: it can point to any element, any tag,
|
|
any attribute (name, value, or whole), any character, and any
|
|
string regardless of how it crosses element boundaries.
|
|
Generally, the target power is even too high, because most offset
|
|
pairs point to ill-formed data chunks for which users would
|
|
likely never need a pointer at all (such as "<tt>me='w</tt>"
|
|
in "<tt>&lt;p name='wow'&gt;</tt>"). </p>
|
|
|
|
<p>At the same time, a pair of byte offsets has almost no
|
|
descriptive power: the offsets to a particular paragraph,
|
|
footnote, citation, or other element instance, give no hint of
|
|
what they are: they're just two numbers. There is nothing in such
|
|
a system that makes such a reference any easier, more robust,
|
|
more readable, or simpler to create than an absurd one (such as
|
|
from the middle of the name of the TYPE attribute of some element
|
|
to the middle of the 27th word of its 946th descendant).
|
|
Low-description systems fail to reflect the important fact that
|
|
some things are ubiquitous, coherent, and highly valuable, but
|
|
others are bizarre, ill-formed, or nearly useless. </p>
|
|
|
|
<p>Some other models have the opposite characteristics: IDs have
|
|
fairly low target power (you can only link to things that have
|
|
them, and only whole elements can have them), but very high
|
|
descriptive power (you can say exactly what you mean, since IDs
|
|
typically express a notion of objects that have true names). This
|
|
is why they are so highly robust, but not very general. </p>
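<p>The contrast between offset pairs and IDs can be sketched in Python as
follows. The sketch makes several assumptions: character offsets into an
ASCII string stand in for byte offsets, an attribute that merely happens to
be named <tt>ID</tt> stands in for a declared XML ID (without a DTD a parser
cannot know its type), and the documents and the function names
<tt>by_offsets</tt> and <tt>by_id</tt> are hypothetical.</p>

<pre>
import xml.etree.ElementTree as ET

# Two hypothetical versions of a document; the second has one paragraph added.
V1 = '&lt;DOC&gt;&lt;P ID="n1"&gt;first&lt;/P&gt;&lt;P ID="n2"&gt;second&lt;/P&gt;&lt;/DOC&gt;'
V2 = '&lt;DOC&gt;&lt;P ID="n0"&gt;new&lt;/P&gt;&lt;P ID="n1"&gt;first&lt;/P&gt;&lt;P ID="n2"&gt;second&lt;/P&gt;&lt;/DOC&gt;'

TARGET = '&lt;P ID="n2"&gt;second&lt;/P&gt;'

# Target-rich, description-poor: a pair of offsets into the source text.
start = V1.index(TARGET)
offsets = (start, start + len(TARGET))

def by_offsets(source, pair):
    """Return whatever text lies between the two offsets."""
    return source[pair[0]:pair[1]]

def by_id(source, id_value):
    """Return the text of the P element whose ID attribute matches, or None."""
    element = ET.fromstring(source).find(".//P[@ID='%s']" % id_value)
    return None if element is None else element.text

print(by_offsets(V1, offsets) == TARGET, by_offsets(V2, offsets) == TARGET)
print(by_id(V1, "n2"), by_id(V2, "n2"))
</pre>

<p>After the insertion, the offset pair silently selects a different and
ill-formed chunk of markup, while the ID-based lookup still finds the
intended paragraph. The ID-based lookup, however, can only reach elements
that carry an ID.</p>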
|
|
|
|
<p>These two axes are not necessarily a tradeoff: a system can
|
|
also have neither or both. For example, numbering all the
|
|
elements of a document in order and using that number, is low to
|
|
moderate on <i>both</i> kinds of power (though it affords some
|
|
very nice optimizations for implementors). And a system of
|
|
structural identifiers that mirrors the structure of information
|
|
directly can be quite high in both kinds of power. </p>
|
|
|
|
<p>Systems with high target power but low description power have
|
|
other problems, such as compromising robustness: they break far
|
|
more easily and, even worse, make it nearly impossible to <i>detect</i>
|
|
failures. They also lack reusability: an identifier created for
|
|
one context can generally be used in other contexts only if it is
|
|
highly descriptive. A common SGML and XML example is a document
|
|
available in multiple languages (or multiple drafts or editions).
|
|
If you assign the same IDs to corresponding elements in each
|
|
version, it is trivial to re-use links with any version, or even
|
|
with several at once (say, for parallel text display or
|
|
comparison); a system that can utilize ancestor, child, and
|
|
sibling relationships can frequently get the same result even
|
|
with few IDs around; a purely target-rich system cannot.</p>
|
|
|
|
<p>Many user comments on the existing WDs and implementations
|
|
have advocated increasing both kinds of power, but especially
|
|
description power: to describe the intended destination in more
|
|
abstract, information-oriented or human-oriented terms, rather
|
|
than only in terms of geometry or tree position. Taking best
|
|
advantage of more descriptive pointing may require at least very
|
|
slight knowledge of tagging practices, but description helps even
|
|
without such knowledge; for almost any XML it is likely that
|
|
pointing to an ID-less element by pointing to some nearby element
|
|
<i>with</i> an ID, and stepwise from there to the final target
|
|
element, is a good improvement over pointing merely via
|
|
child-numbers all the way down from the document root element. In
|
|
many XML applications, schemas and use requirements -- such as a
|
|
transaction record, a bibliography, an RDF file, etc. -- provide
|
|
much more information that enables automatic construction of
|
|
highly clear, descriptive pointers -- if the language allows them
|
|
to exist at all. For example, a pointer to the last transaction
|
|
in a set, or the one with the highest price, is clear and
|
|
unambiguous. In general, finding an element with a given ID, or a
|
|
given kind of element type or context, or elements with certain
|
|
combinations of simple characteristics in terms of types,
|
|
attributes, tree relationships to other characterized targets,
|
|
etc. adds robustness.</p>
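<p>The kinds of descriptive pointers mentioned above can be sketched in a
few lines. The following is only an illustration, not XPointer syntax: it
uses Python's standard <code>xml.etree.ElementTree</code> module, and the
sample document, its element names, and its <code>id</code> attribute are
invented for the example. It resolves "the last transaction", "the
transaction with the highest price", and "the element just after the one
with ID t1".</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical transaction record; element and attribute names are
# assumptions made for this illustration only.
DOC = """
<transactions>
  <transaction id="t1"><price>12.50</price></transaction>
  <transaction><price>99.00</price></transaction>
  <transaction><price>40.25</price></transaction>
</transactions>
"""

root = ET.fromstring(DOC)
transactions = root.findall("transaction")

# "The last transaction in the set": described by role, not by a fixed index.
last = transactions[-1]

# "The transaction with the highest price".
priciest = max(transactions, key=lambda t: float(t.findtext("price")))

# "The element right after the one with ID t1": start from a nearby ID and
# step from there, rather than counting children down from the root.
anchor_index = next(
    i for i, t in enumerate(transactions) if t.get("id") == "t1"
)
after_anchor = transactions[anchor_index + 1]

print(last.findtext("price"))          # 40.25
print(priciest.findtext("price"))      # 99.00
print(after_anchor.findtext("price"))  # 99.00
```

<p>Each selection states what is wanted, so appending another transaction
changes what "the last transaction" resolves to in the intended way, without
any change to the pointer itself.</p>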
|
|
|
|
<h2><a name="Lure">On minimalist pointing</a></h2>
|
|
|
|
<p>Target-rich pointing may appear at first glance to be all that
|
|
is needed for XPointer. This is probably because
|
|
'select/create-link' is the first interface some applications
|
|
will implement, and because even completely non-descriptive
|
|
pointing can trivially support that interface. However, there are
|
|
many other applications for XPointer. If the criterion is merely
|
|
ability to point somehow, that would logically lead to byte
|
|
offsets as the simplest solution; but this is widely agreed to be
|
|
absurd. While for one specific application scenario that might
|
|
barely suffice, the situation is clearly more complex; it is
|
|
analogous to some other familiar situations: </p>
|
|
|
|
<ul>
|
|
<li><p>One-way, single-ended links (like HTML &lt;A&gt;) are
|
|
sometimes thought to be the sum total of hypertext
|
|
requirements. They, too, can achieve good power in terms
|
|
of what they <i>can</i> express, because you can always
|
|
decompose multi-ended links, bi-directional links, link
|
|
typing, and other phenomena into more complicated (but
|
|
non-standardized) structures of lots of &lt;A&gt;s; but
|
|
they are weak in descriptive power, since you must resort
|
|
to circuitous ways to accomplish some desired ends. </p></li>
|
|
<li><p>Word processors typically implement procedural formatting
|
|
before descriptive: "after all, you can get all the
|
|
required formatting effects without stylesheets or
|
|
unpredictable sets of tags, right?" This is
|
|
precisely the same distinction at issue here.</p></li>
|
|
</ul>
|
|
|
|
<p>Such "completeness" arguments may be true as far as
|
|
they go, but are not sufficient: much more is required for an
|
|
adequate solution in all these cases. Descriptive power
|
|
requirements matter because human usability requires other
|
|
features such as readability, indirection, re-use, ability to
|
|
change things without countless manual repairs, and generally
|
|
better robustness in the face of change. Thus, more descriptive
|
|
pointing is a vastly better solution.</p>
|
|
|
|
<p>Radically minimalist pointing has never been contemplated in
|
|
XPointer, and should not be. It is extremely easy to implement,
|
|
and attractive for that reason (although descriptive pointing has
|
|
also been repeatedly shown not to be hard). But at the same time,
|
|
minimalist pointing has a number of inherent limitations that
|
|
taken together make descriptive pointing necessary:</p>
|
|
|
|
<ol>
|
|
<li><p>Minimalist pointers are not, in general, easily <b>interpretable
|
|
</b>by humans. Large precise integers, or vectors of
|
|
smaller ones, are just not as understandable to humans as
|
|
"the 5th chapter past here" or "chapter 4
|
|
section 2 paragraph 5"; the latter are common,
|
|
familiar notions. </p></li>
|
|
<li><p>Minimalist pointing decreases the <b>ability to achieve
|
|
robustness.</b> This is because with
|
|
minimalist/procedural pointing, you can only refer to
|
|
what you <i>got,</i> not to what you wanted. This is
|
|
equivalent so long as nothing changes; but as soon as
|
|
something changes (as is frequent), minimalist pointers
|
|
break far more readily than descriptive ones. There is a
|
|
scale of robustness, and only by providing a range of
|
|
descriptive pointing techniques can link authors (human
|
|
or otherwise) take advantage of it. </p></li>
|
|
<li><p>Minimalist pointing is less appropriate for dealing with
|
|
<b>dynamically-generated
|
|
HTML</b> or XML, such as database extractions or
|
|
dynamically-assembled documents (an increasingly common
|
|
scenario). This is because such information is likely to
|
|
be changed in small-scale but widespread ways (replacing
|
|
the stock ticker or visit counter field, etc.), and such
|
|
minor changes will commonly break minimalist pointers to
|
|
unchanged surrounding data, but not break descriptive
|
|
pointers. Also, such data typically has systematic
|
|
regularities that make automated construction of highly
|
|
descriptive pointers easy and highly robust -- but only
|
|
if the pointing language lets such pointers exist. </p></li>
|
|
<li><p>Minimalist pointers have virtually no potential for <b>re-use.</b>
|
|
They cannot describe relative locations, such as
|
|
"the next chapter" or "the chapter 5
|
|
milestone tags earlier", and so cannot be re-used
|
|
in multiple contexts (see <a href="#DeRo89">DeRose 1989</a>
|
|
for more on link reuse). If pointers had too little
|
|
descriptive power, even a trivial "next slide"
|
|
link in presentations could not be made generic: every slide
would need its own explicit, separate, tediously different
"next slide" link, which again seems
|
|
absurd. It is hard to imagine plausible situations where
|
|
a non-descriptive link could be usefully re-used
|
|
("37 elements earlier"???). </p></li>
|
|
<li><p>Minimalist pointers generally provide no selection of
|
|
targets consisting of <b>multiple locations,</b> such as
|
|
the set of all elements of a given type, the set of all
|
|
characters within a given element, etc.</p>
|
|
<p>This is a more
|
|
severe problem than it seems, because it impacts any
|
|
XPointers that involve stepwise specifications (sometimes
|
|
called "location ladders"), even simple ones
|
|
such as "the SEC that contains an ABSTRACT". If
|
|
multiple locations are ruled out even for intermediate
|
|
results in an addressing expression, then such pointers
|
|
are ruled out even when they would end up at a single
|
|
node. This is because the evaluator would find a lot of
|
|
SEC elements first, and only then be able to go on to
|
|
pick the one that is the final result. If the
|
|
implementation must support multiple results in
|
|
intermediate steps, the savings sometimes claimed for
|
|
ruling them out largely disappear. </p></li>
|
|
<li><p>Minimalist pointers are commonly limited to identifying a
|
|
single whole element, a single point or character, or at
|
|
best a string (offsets can "sort of" point to
|
|
more, but really only point to byte ranges that may,
|
|
sometimes, correspond to these units). <b>Normal user
|
|
selections</b> cannot be modeled.</p></li>
|
|
</ol>
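<p>The point about intermediate results can be made concrete. The sketch
below is an illustration only, using Python's standard
<code>xml.etree.ElementTree</code> module and an invented sample document;
it is not XPointer syntax. It resolves "the SEC that contains an ABSTRACT"
in two steps, and the first step necessarily yields several locations even
though the final result is a single node.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the SEC/ABSTRACT names come from the example in
# the text, the rest is assumed for illustration.
DOC = """
<DOC>
  <SEC id="s1"><P>Intro</P></SEC>
  <SEC id="s2"><ABSTRACT>Summary</ABSTRACT><P>Body</P></SEC>
  <SEC id="s3"><P>More</P></SEC>
</DOC>
"""

root = ET.fromstring(DOC)

# Step 1: the intermediate result is a set of locations (every SEC).
candidates = root.findall("SEC")

# Step 2: narrow that set to the SECs that have an ABSTRACT child.
matches = [sec for sec in candidates if sec.find("ABSTRACT") is not None]

print(len(candidates))                   # 3
print([sec.get("id") for sec in matches])  # ['s2']
```

<p>An evaluator that could not hold the three candidate SEC elements from
the first step would be unable to reach the single SEC selected by the
second.</p>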
|
|
|
|
<h2><a name="Nature">The nature of the distinction</a></h2>
|
|
|
|
<p>Many of these differences arise from a single underlying
|
|
cause: namely, inadequate descriptive power. A non-descriptive or
|
|
trivially descriptive pointer language might in theory be able to
|
|
point to all the same objects as a descriptive language. However,
|
|
linking to "the 3rd child of the 4th child of the root"
|
|
does not mean the same thing as linking to "ID chap5"
|
|
or "the element immediately preceding the ABSTRACT".
|
|
They may happen to be the same thing on one day, for one version
|
|
of one location in one dataset; but the meaning is not the same.</p>
|
|
|
|
<p>Note: This fundamental distinction is so important that it has
|
|
names and entire literatures within many fields of study.
|
|
Linguists call such cases de dicto/de re ambiguities; logicians
|
|
and mathematicians call them intensional vs. extensional
|
|
specification; computer scientists call them shallow and deep (or
|
|
weak and strong) equality; markup theorists call them procedural
|
|
and declarative markup. In all these fields, providing formal
|
|
systems that support only one of the two cases is a classic
|
|
error.</p>
|
|
|
|
<h2><a name="Robustness">Robustness issues</a></h2>
|
|
|
|
<p>Another example of the difference in power of descriptive over
|
|
minimalist pointing involves robustness (that is, pointers that
|
|
have a good chance of pointing to the same place even after the
|
|
document has been edited in various ways). It has occasionally
|
|
been suggested that depending on usage scenarios, TREELOCs (which
|
|
specify a node by giving the sequence of child-numbers to walk
|
|
down to it) might be just as robust as IDs (unique names), or
|
|
even more so. While this is true in theory, I find it
|
|
unconvincing because:</p>
|
|
|
|
<ol>
|
|
<li><p>Although one can create such scenarios in theory, they
|
|
are not at all typical of existing practice. </p></li>
|
|
<li><p>IDs require a "positive option" (editing the
|
|
ID) to break them, but TREELOCs break readily (even
|
|
editing a far-distant part of the document) without the
|
|
user having to do anything local or specific to break
|
|
them. </p></li>
|
|
<li><p>Most crucial: with IDs or other descriptive methods,
|
|
authors <i>can</i> create a work practice under which
|
|
they can edit without breaking links. Software can even
|
|
help, for example by tracking deleted IDs so they are not
|
|
accidentally re-used. With TREELOC there is <i>no</i>
|
|
such possibility at all: you cannot manage an editing
|
|
process so that a given node is always the third child,
|
|
even after you inserted a child before it. </p></li>
|
|
</ol>
|
|
|
|
<p>So although it is <i>possible to fail</i> under either
|
|
approach if you make all the worst possible choices, that does
|
|
not make the approaches equally robust. This is because only with
|
|
the ID approach is it <i>possible to protect yourself</i> by
making all the best choices.</p>
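<p>The difference can be demonstrated with a small sketch. It is an
illustration under stated assumptions, not an implementation of TREELOC or
of XML ID semantics: it uses Python's standard
<code>xml.etree.ElementTree</code> module, the helper functions
<code>follow_child_numbers</code> and <code>find_by_id</code> are
hypothetical, and a plain <code>id</code> attribute stands in for a declared
ID. A child-number path and an ID both point at the same chapter, and then
an unrelated element is inserted at the front of the document.</p>

```python
import xml.etree.ElementTree as ET


def follow_child_numbers(root, steps):
    """Walk down from root by 1-based child numbers, TREELOC-style."""
    node = root
    for n in steps:
        node = list(node)[n - 1]
    return node


def find_by_id(root, ident):
    """Return the first element whose 'id' attribute equals ident, else None."""
    for elem in root.iter():
        if elem.get("id") == ident:
            return elem
    return None


# Hypothetical document, invented for this illustration.
root = ET.fromstring(
    "<book><front/><chapter id='chap1'/><chapter id='chap2'/></book>"
)

treeloc = [3]  # "the third child of the root"
before = follow_child_numbers(root, treeloc).get("id")

# An edit far from the target: a preface is inserted as the first child.
root.insert(0, ET.Element("preface"))

after = follow_child_numbers(root, treeloc).get("id")
by_id = find_by_id(root, "chap2").get("id")

print(before, after, by_id)  # chap2 chap1 chap2
```

<p>The child-number path still resolves after the edit, but to a different
chapter, and nothing signals that the link has gone wrong. The ID lookup is
unaffected by the insertion.</p>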
|
|
|
|
<h2><a name="Kinds-Summary">Summary</a></h2>
|
|
|
|
<p>The many advantages of descriptive pointing are crucial for a
|
|
scalable, generic pointing system. Descriptive pointing is
|
|
crucial for all the same reasons that descriptive markup is
|
|
crucial to documents, and that making links first-class objects
|
|
is crucial to linking. It is also clearly feasible, as shown by
|
|
multiple implementations of the prior WDs from the XML WG, and of
|
|
TEI extended pointers. At the same time, in order to get the
|
|
specification out in the time frame required, we wish to keep a
|
|
bound on the size of the language, and not implement all possible
|
|
constructs, tests, filters, and so on. XPointer thus seeks to
|
|
provide a small but rich set of descriptive pointing mechanisms,
|
|
such as walking around trees in terms of their fundamental
|
|
relationships; without taking on the undue task of a
|
|
full-fledged, multi-purpose tool to express every conceivable
|
|
predicate. To do more would take too long; to do less would
|
|
actually complicate and weaken applications, largely by limiting
|
|
XPointer to human-unclear, less robust, and less re-usable
|
|
pointers.</p>
|
|
|
|
<p>Some of the features of descriptive pointing bear some
|
|
similarity to querying in general, but that is because the term
|
|
"querying" covers an awful lot of ground: <a
|
|
href="#Yu98">Yu and Meng (1998)</a> note that "the goal of
|
|
query processing and optimization is to find user-desired data
|
|
from an often very large database efficiently" -- where
|
|
"user-desired" is arbitrarily broad; other definitions
|
|
speak of selecting data that "fulfills arbitrary sets of
|
|
constraints" or "has certain characteristics".
|
|
These all cover a wide range of activities, whose requirements,
|
|
priorities, and consequent design tradeoffs differ greatly. A
|
|
search for an ID is a query (though a very simple one); and there
|
|
are many user and developer requests for XPointer features that
|
|
overlap with what one expects in a full-blown query language
|
|
(among the relevant issues already assigned numbers are 17-21,
|
|
26, 27, 44, and 46-49). </p>
|
|
|
|
<p>This similarity is inevitable because <i>any</i> language that
|
|
selects things out of trees requires certain basic operations,
|
|
such as genetic access to nodes of the tree; without such
|
|
operations any language that deals with trees would be utterly
|
|
crippled. However, XPointer has other requirements that are not
|
|
shared by various other mechanisms that may arise for XML for
|
|
other purposes. Among these are robustness, plus a quite
|
|
different user perspective and priority: the purpose with
|
|
XPointer is to point to a known data object (typically a single
|
|
one or a well-defined group to be treated as if it were one),
|
|
rather than to discover whether any data might be out there
|
|
somewhere and how much.</p>
|
|
|
|
<p>By separating minimalist vs. descriptive pointing models and
|
|
acknowledging our need for both, we can assign our existing
|
|
XPointer issues more clearly into categories that we can deal
|
|
with effectively. This two-level approach allows a natural
|
|
beginning.</p>
|
|
|
|
<h1><a name="Bibliography">Bibliography</a></h1>
|
|
|
|
<p><a name="Abit97">Abiteboul, Serge et al.</a> 1997.
|
|
"Querying Documents in Object Databases." In <i>International
|
|
Journal on Digital Libraries</i> 1(1): 5-19.</p>
|
|
|
|
<p><a name="Andr89">André, Jacques, Richard Furuta, and Vincent
|
|
Quint (eds).</a> 1989. <i>Structured Documents.</i> Cambridge:
|
|
Cambridge University Press. ISBN 0-521-36554-6.</p>
|
|
|
|
<p><a name="Broo88">Brooks, Kenneth P.</a> 1988. "A Two-view
|
|
Document Editor with User-definable Document Structure."
|
|
Dissertation, Stanford University Department of Computer Science.
|
|
Reprinted as <a
|
|
href="http://www.research.digital.com/SRC/publications">Technical
|
|
Report #33</a> by Digital Systems Research Center.</p>
|
|
|
|
<p><a name="Burk91">Burkowski, Forbes J.</a> 1991. "An
|
|
Algebra for Hierarchically Organized Text-Dominated
|
|
Databases." Waterloo, Ontario, Canada: Department of
|
|
Computer Science, University of Waterloo. Manuscript: Portions
|
|
"appeared as part of a paper presented at RIAO '91:
|
|
Intelligent Text and Image Handling, Barcelona, Spain, Apr.
|
|
1991." </p>
|
|
|
|
<p><a name="Conk87">Conklin, Jeff.</a> 1987. "Hypertext: An
|
|
Introduction and Survey." <i>IEEE Computer</i> 20 (9):
|
|
17-41.</p>
|
|
|
|
<p><a name="DeRo89">DeRose, Steven J.</a> 1989. "Expanding
|
|
the Notion of Links." In <i>Proceedings of Hypertext '89,</i>
|
|
Pittsburgh, PA. Baltimore, MD: Association for Computing
|
|
Machinery Press.</p>
|
|
|
|
<p><a name="DeRo95">DeRose, Steven J. and David G. Durand.</a>
|
|
1995. "The TEI Hypertext Guidelines." In <i>Text
|
|
Encoding Initiative: Background and Context.</i> Boston: Kluwer
|
|
Academic Publishers. ISBN 0-7923-3689-5. </p>
|
|
|
|
<p><a name="DeRo98a">DeRose, Steven and Eve Maler (eds).</a>
|
|
1998. <a href="http://www.w3.org/TR/1998/WD-xlink-19980303">"XML
|
|
Linking Language (XLink)."</a> World Wide Web Consortium
|
|
Working Draft. March 1998. </p>
|
|
|
|
<p><a name="DeRo98b">DeRose, Steven and Eve Maler (eds).</a> 1998.
|
|
<a href="http://www.w3.org/TR/1998/WD-xptr-19980303">"XML
|
|
Pointer Language (XPointer)."</a> World Wide Web Consortium
|
|
Working Draft. March 1998. </p>
|
|
|
|
<p><a name="Kahn89">Kahn, Paul.</a> 1989. "Webs, Trees, and
|
|
Stacks: How Hypermedia System Design Affects Hypermedia
|
|
Content." In <i>Proceedings of the Third International
|
|
Conference on Human-Computer Interaction,</i> Boston, MA,
|
|
September 18-22, 1989.</p>
|
|
|
|
<p><a name="Liu77">Liu, C. L.</a> 1977. <i>Elements of Discrete
|
|
Mathematics.</i> New York: McGraw-Hill. ISBN 0-07-038131-3.</p>
|
|
|
|
</body>
|
|
</html>