<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
  PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="EN"><head><title>Legacy extended IRIs for XML resource identification</title><style type="text/css">
code { font-family: monospace; }

div.constraint,
div.issue,
div.note,
div.notice { margin-left: 2em; }

ol.enumar { list-style-type: decimal; }
ol.enumla { list-style-type: lower-alpha; }
ol.enumlr { list-style-type: lower-roman; }
ol.enumua { list-style-type: upper-alpha; }
ol.enumur { list-style-type: upper-roman; }

div.exampleInner pre { margin-left: 1em;
                       margin-top: 0em; margin-bottom: 0em}
div.exampleOuter {border: 4px double gray;
                  margin: 0em; padding: 0em}
div.exampleInner { background-color: #d5dee3;
                   border-top-width: 4px;
                   border-top-style: double;
                   border-top-color: #d3d3d3;
                   border-bottom-width: 4px;
                   border-bottom-style: double;
                   border-bottom-color: #d3d3d3;
                   padding: 4px; margin: 0em }
div.exampleWrapper { margin: 4px }
div.exampleHeader { font-weight: bold;
                    margin: 4px}
</style><link rel="stylesheet" type="text/css" href="http://www.w3.org/StyleSheets/TR/W3C-WG-NOTE.css"/></head><body><div class="head"><p><a href="http://www.w3.org/"><img src="http://www.w3.org/Icons/w3c_home" alt="W3C" height="48" width="72"/></a></p>
<h1><a name="title" id="title"/>Legacy extended IRIs for XML resource identification</h1>
<h2><a name="w3c-doctype" id="w3c-doctype"/>W3C Working Group Note 3 November 2008 (BNF comment style corrected in place 2009-07-09)</h2><dl><dt>This version:</dt><dd>
<a href="http://www.w3.org/TR/2008/NOTE-leiri-20081103/">http://www.w3.org/TR/2008/NOTE-leiri-20081103/</a>
</dd><dt>Latest version:</dt><dd>
<a href="http://www.w3.org/TR/leiri/">http://www.w3.org/TR/leiri/</a>
</dd><dt>Previous version:</dt><dd>
</dd><dt>Editors:</dt><dd>Henry S. Thompson, University of Edinburgh <a href="mailto:ht@inf.ed.ac.uk">&lt;ht@inf.ed.ac.uk&gt;</a></dd><dd>Richard Tobin, University of Edinburgh <a href="mailto:richard@inf.ed.ac.uk">&lt;richard@inf.ed.ac.uk&gt;</a></dd><dd>Norman Walsh, Mark Logic Corporation <a href="mailto:norman.walsh@marklogic.com">&lt;norman.walsh@marklogic.com&gt;</a></dd></dl><p>This document is also available in these non-normative formats: <a href="Overview.xml">XML</a>.</p><p class="copyright"><a href="http://www.w3.org/Consortium/Legal/ipr-notice#Copyright">Copyright</a> © 2008 <a href="http://www.w3.org/"><acronym title="World Wide Web Consortium">W3C</acronym></a><sup>®</sup> (<a href="http://www.csail.mit.edu/"><acronym title="Massachusetts Institute of Technology">MIT</acronym></a>, <a href="http://www.ercim.org/"><acronym title="European Research Consortium for Informatics and Mathematics">ERCIM</acronym></a>, <a href="http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. W3C <a href="http://www.w3.org/Consortium/Legal/ipr-notice#Legal_Disclaimer">liability</a>, <a href="http://www.w3.org/Consortium/Legal/ipr-notice#W3C_Trademarks">trademark</a> and <a href="http://www.w3.org/Consortium/Legal/copyright-documents">document use</a> rules apply.</p></div><hr/><div>
<h2><a name="abstract" id="abstract"/>Abstract</h2><p>For historic reasons, some formats have allowed variants of IRIs that
are somewhat less restricted in syntax, for example XML system identifiers
and W3C XML Schema anyURIs. This document provides a
definition and a name (Legacy Extended IRI or LEIRI) for these variants for
easy reference. These variants have to be used with care; they
require further processing before being fully interchangeable as
IRIs. New protocols and formats should not use Legacy Extended IRIs.</p></div><div>
<h2><a name="status" id="status"/>Status of this Document</h2><p><em>This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the <a href="http://www.w3.org/TR/">W3C technical reports index</a> at http://www.w3.org/TR/.</em></p><p>This document is a W3C Working Group Note. It has been developed
by the <a href="http://www.w3.org/XML/Core/">XML Core Working Group</a>, part of the <a href="http://www.w3.org/XML/">XML Activity</a> in the W3C <a href="http://www.w3.org/UbiWeb/">Ubiquitous Web Domain</a>.</p><p>Publication as a Working Group Note does not imply endorsement by
the W3C Membership. This is a draft document and may be updated,
replaced or obsoleted by other documents at any time. It is
inappropriate to cite this document as other than work in
progress.</p><p>Please send comments about this document to <a href="mailto:xml-editor@w3.org">xml-editor@w3.org</a>
(<a href="http://lists.w3.org/Archives/Public/xml-editor/">archived</a>).</p><p>This document is very closely based on material from <a href="#iribis">[IRI-bis]</a>, specifically section 2.2, "ABNF for IRI References and IRIs" and
section 7, "Legacy Extended IRIs", included here by permission of its
authors. It is intended to provide a basis for a single
normative reference from many XML- and/or HTML-related standards in advance of
the final publication of <a href="#iribis">[IRI-bis]</a> as an RFC. When that publication occurs, this specification will be
re-issued to reference it in place of the extracts given below.</p><p> This document was produced by a group operating under the <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/">5 February 2004 W3C Patent Policy</a>. W3C maintains a <a href="http://www.w3.org/2004/01/pp-impl/18796/status">public list of any patent disclosures</a> made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#def-essential">Essential Claim(s)</a> must disclose the information in accordance with <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#sec-Disclosure">section 6 of the W3C Patent Policy</a>. </p></div><div class="toc">
<h2><a name="contents" id="contents"/>Table of Contents</h2><p class="toc">1 <a href="#intro">Introduction</a><br/>
2 <a href="#notation">Notation</a><br/>
3 <a href="#syntax">Legacy Extended IRI Syntax</a><br/>
4 <a href="#conversion">Conversion of Legacy Extended IRIs to IRIs</a><br/>
5 <a href="#charStatus">Characters allowed in Legacy Extended IRIs but not in IRIs</a><br/>
</p>
<h3><a name="appendices" id="appendices"/>Appendix</h3><p class="toc">A <a href="#refs">References</a><br/>
</p></div><hr/><div class="body"><div class="div1">
<h2><a name="intro" id="intro"/>1 Introduction</h2><p>For historic reasons, some formats have allowed variants of IRIs <a href="#iri">[RFC3987]</a> that
are somewhat less restricted in syntax, for example XML system identifiers
and W3C XML Schema anyURIs. This document provides a
definition and a name (Legacy Extended IRI or LEIRI) for these variants for
easier reference. These variants have to be used with care; they
require further processing before being fully interchangeable as
IRIs. New protocols and formats <strong>should not</strong> use Legacy
Extended IRIs. The provisions in this
document also apply to Legacy Extended IRI references.
</p></div><div class="div1">
<h2><a name="notation" id="notation"/>2 Notation</h2><p>In this document, characters are referenced by
using a prefix of 'U+' followed by four to six hexadecimal digits.</p><p>In this document, the key words <strong>must</strong>, <strong>must not</strong>, <strong>required</strong>,
<strong>shall</strong>, <strong>shall not</strong>, <strong>should</strong>, <strong>should not</strong>, <strong>recommended</strong>, <strong>may</strong>,
and <strong>optional</strong> are to be interpreted as described in <a href="#maymust">[RFC2119]</a>.</p></div><div class="div1">
<h2><a name="syntax" id="syntax"/>3 Legacy Extended IRI Syntax</h2><p>The syntax of Legacy Extended IRIs (LEIRIs) and LEIRI references is
the same as that for IRIs and IRI references except that
<a href="#ucschar">ucschar</a> is redefined. The ABNF notation used
for the productions is described in <a href="#abnf_spec">[RFC5234]</a>. Character numbers are taken from the
UCS, without implying any actual binary encoding. Terminals in the
ABNF are characters, not bytes.</p><p>For consistency with <a href="#iri">[RFC3987]</a> for IRIs,
generic LEIRI software <strong>should not</strong> check
LEIRIs for conformance to this syntax.</p><p>Some productions are ambiguous. The "first-match-wins" (a.k.a.
"greedy") algorithm applies. For details, see <a href="#rfc3986">[RFC3986]</a>.</p>
<h5><a name="d0e201" id="d0e201"/>Productions changed from RFC3986</h5><table class="scrap" summary="Scrap"><tbody><tr valign="baseline"><td><a name="LEIRI" id="LEIRI"/>[1] </td><td><code>LEIRI</code></td><td> ::= </td><td><code><a href="#scheme">scheme</a> ":" <a href="#ihier-part">ihier-part</a> [ "?" <a href="#iquery">iquery</a> ]
[ "#" <a href="#ifragment">ifragment</a> ]</code></td></tr><tr valign="baseline"><td><a name="ihier-part" id="ihier-part"/>[2] </td><td><code>ihier-part</code></td><td> ::= </td><td><code>"//" <a href="#iauthority">iauthority</a> <a href="#ipath-abempty">ipath-abempty</a></code></td></tr><tr valign="baseline"><td/><td/><td/><td><code>/ <a href="#ipath-absolute">ipath-absolute</a></code></td></tr><tr valign="baseline"><td/><td/><td/><td><code>/ <a href="#ipath-rootless">ipath-rootless</a></code></td></tr><tr valign="baseline"><td/><td/><td/><td><code>/ <a href="#ipath-empty">ipath-empty</a></code></td></tr><tr valign="baseline"><td><a name="LEIRI-reference" id="LEIRI-reference"/>[3] </td><td><code>LEIRI-reference</code></td><td> ::= </td><td><code><a href="#LEIRI">LEIRI</a> / <a href="#irelative-ref">irelative-ref</a></code></td></tr><tr valign="baseline"><td><a name="absolute-LEIRI" id="absolute-LEIRI"/>[4] </td><td><code>absolute-LEIRI</code></td><td> ::= </td><td><code><a href="#scheme">scheme</a> ":" <a href="#ihier-part">ihier-part</a> [ "?" <a href="#iquery">iquery</a> ]</code></td></tr><tr valign="baseline"><td><a name="irelative-ref" id="irelative-ref"/>[5] </td><td><code>irelative-ref</code></td><td> ::= </td><td><code><a href="#irelative-part">irelative-part</a> [ "?" 
<a href="#iquery">iquery</a> ] [ "#" <a href="#ifragment">ifragment</a> ]</code></td></tr><tr valign="baseline"><td><a name="irelative-part" id="irelative-part"/>[6] </td><td><code>irelative-part</code></td><td> ::= </td><td><code>"//" <a href="#iauthority">iauthority</a> <a href="#ipath-abempty">ipath-abempty</a></code></td></tr><tr valign="baseline"><td/><td/><td/><td><code>/ <a href="#ipath-absolute">ipath-absolute</a></code></td></tr><tr valign="baseline"><td/><td/><td/><td><code>/ <a href="#ipath-noscheme">ipath-noscheme</a></code></td></tr><tr valign="baseline"><td/><td/><td/><td><code>/ <a href="#ipath-empty">ipath-empty</a></code></td></tr><tr valign="baseline"><td><a name="iauthority" id="iauthority"/>[7] </td><td><code>iauthority</code></td><td> ::= </td><td><code>[ <a href="#iuserinfo">iuserinfo</a> "@" ] <a href="#ihost">ihost</a> [ ":" <a href="#port">port</a> ]</code></td></tr><tr valign="baseline"><td><a name="iuserinfo" id="iuserinfo"/>[8] </td><td><code>iuserinfo</code></td><td> ::= </td><td><code>*( <a href="#iunreserved">iunreserved</a> / <a href="#pct-encoded">pct-encoded</a> / <a href="#sub-delims">sub-delims</a> / ":" )</code></td></tr><tr valign="baseline"><td><a name="ihost" id="ihost"/>[9] </td><td><code>ihost</code></td><td> ::= </td><td><code><a href="#IP-literal">IP-literal</a> / <a href="#IPv4address">IPv4address</a> / <a href="#ireg-name">ireg-name</a></code></td></tr><tr valign="baseline"><td><a name="ireg-name" id="ireg-name"/>[10] </td><td><code>ireg-name</code></td><td> ::= </td><td><code>*( <a href="#iunreserved">iunreserved</a> / <a href="#pct-encoded">pct-encoded</a> / <a href="#sub-delims">sub-delims</a> )</code></td></tr><tr valign="baseline"><td><a name="ipath" id="ipath"/>[11] </td><td><code>ipath</code></td><td> ::= </td><td><code><a href="#ipath-abempty">ipath-abempty</a> <i>; begins with "/" or is empty</i></code></td></tr><tr valign="baseline"><td/><td/><td/><td><code>/ <a href="#ipath-abempty">ipath-absolute</a> <i>; 
begins with "/" but not "//"</i></code></td></tr><tr valign="baseline"><td/><td/><td/><td><code>/ <a href="#ipath-abempty">ipath-noscheme</a> <i>; begins with a non-colon segment</i></code></td></tr><tr valign="baseline"><td/><td/><td/><td><code>/ <a href="#ipath-abempty">ipath-rootless</a> <i>; begins with a segment</i></code></td></tr><tr valign="baseline"><td/><td/><td/><td><code>/ <a href="#ipath-abempty">ipath-empty</a> <i>; zero characters</i></code></td></tr><tr valign="baseline"><td><a name="ipath-abempty" id="ipath-abempty"/>[12] </td><td><code>ipath-abempty</code></td><td> ::= </td><td><code>*( "/" <a href="#isegment">isegment</a> )</code></td></tr><tr valign="baseline"><td><a name="ipath-absolute" id="ipath-absolute"/>[13] </td><td><code>ipath-absolute</code></td><td> ::= </td><td><code>"/" [ <a href="#isegment-nz">isegment-nz</a> *( "/" <a href="#isegment">isegment</a> ) ]</code></td></tr><tr valign="baseline"><td><a name="ipath-noscheme" id="ipath-noscheme"/>[14] </td><td><code>ipath-noscheme</code></td><td> ::= </td><td><code><a href="#isegment-nz-nc">isegment-nz-nc</a> *( "/" <a href="#isegment">isegment</a> )</code></td></tr><tr valign="baseline"><td><a name="ipath-rootless" id="ipath-rootless"/>[15] </td><td><code>ipath-rootless</code></td><td> ::= </td><td><code><a href="#isegment-nz">isegment-nz</a> *( "/" <a href="#isegment">isegment</a> )</code></td></tr><tr valign="baseline"><td><a name="ipath-empty" id="ipath-empty"/>[16] </td><td><code>ipath-empty</code></td><td> ::= </td><td><code>0<<a href="#ipchar">ipchar</a>></code></td></tr><tr valign="baseline"><td><a name="isegment" id="isegment"/>[17] </td><td><code>isegment</code></td><td> ::= </td><td><code>*<a href="#ipchar">ipchar</a></code></td></tr><tr valign="baseline"><td><a name="isegment-nz" id="isegment-nz"/>[18] </td><td><code>isegment-nz</code></td><td> ::= </td><td><code>1*<a href="#ipchar">ipchar</a></code></td></tr><tr valign="baseline"><td><a name="isegment-nz-nc" 
id="isegment-nz-nc"/>[19] </td><td><code>isegment-nz-nc</code></td><td> ::= </td><td><code>1*( <a href="#ipath-abempty">iunreserved</a> / <a href="#ipath-abempty">pct-encoded</a> / <a href="#ipath-abempty">sub-delims</a> / "@" )</code></td></tr><tr valign="baseline"><td/><td/><td/><td><code><i>; non-zero-length segment without any colon ":"</i></code></td></tr><tr valign="baseline"><td><a name="ipchar" id="ipchar"/>[20] </td><td><code>ipchar</code></td><td> ::= </td><td><code><a href="#iunreserved">iunreserved</a> / <a href="#pct-encoded">pct-encoded</a> / <a href="#sub-delims">sub-delims</a> / ":"</code></td></tr><tr valign="baseline"><td/><td/><td/><td><code>/ "@"</code></td></tr><tr valign="baseline"><td><a name="iquery" id="iquery"/>[21] </td><td><code>iquery</code></td><td> ::= </td><td><code>*( <a href="#ipchar">ipchar</a> / <a href="#iprivate">iprivate</a> / "/" / "?" )</code></td></tr><tr valign="baseline"><td><a name="ifragment" id="ifragment"/>[22] </td><td><code>ifragment</code></td><td> ::= </td><td><code>*( <a href="#ipchar">ipchar</a> / "/" / "?" )</code></td></tr><tr valign="baseline"><td><a name="iunreserved" id="iunreserved"/>[23] </td><td><code>iunreserved</code></td><td> ::= </td><td><code><a href="http://tools.ietf.org/html/rfc5234#appendix-B.1">ALPHA</a> / <a href="http://tools.ietf.org/html/rfc5234#appendix-B.1">DIGIT</a> / "-"
/ "." / "_" / "~" / <a href="#ipath-abempty">ucschar</a></code></td></tr><tr valign="baseline"><td><a name="iprivate" id="iprivate"/>[24] </td><td><code>iprivate</code></td><td> ::= </td><td><code>%xE000-F8FF / %xE0000-E0FFF / %xF0000-FFFFD</code></td></tr><tr valign="baseline"><td/><td/><td/><td><code>/ %x100000-10FFFD</code></td></tr></tbody></table>
<h5><a name="d0e522" id="d0e522"/>Productions unchanged from RFC3986</h5><table class="scrap" summary="Scrap"><tbody><tr valign="baseline"><td><a name="scheme" id="scheme"/>[25] </td><td><code>scheme</code></td><td> ::= </td><td><code><a href="http://tools.ietf.org/html/rfc5234#appendix-B.1">ALPHA</a> *( <a href="http://tools.ietf.org/html/rfc5234#appendix-B.1">ALPHA</a> / <a href="http://tools.ietf.org/html/rfc5234#appendix-B.1">DIGIT</a> / "+" / "-" / "." )</code></td></tr><tr valign="baseline"><td><a name="port" id="port"/>[26] </td><td><code>port</code></td><td> ::= </td><td><code>*<a href="http://tools.ietf.org/html/rfc5234#appendix-B.1">DIGIT</a></code></td></tr><tr valign="baseline"><td><a name="IP-literal" id="IP-literal"/>[27] </td><td><code>IP-literal</code></td><td> ::= </td><td><code>"[" ( <a href="#ipath-abempty">IPv6address</a> / <a href="#ipath-abempty">IPvFuture</a> ) "]"</code></td></tr><tr valign="baseline"><td><a name="IPvFuture" id="IPvFuture"/>[28] </td><td><code>IPvFuture</code></td><td> ::= </td><td><code>"v" 1*<a href="http://tools.ietf.org/html/rfc5234#appendix-B.1">HEXDIG</a> "." 
1*( <a href="#ipath-abempty">unreserved</a> / <a href="#ipath-abempty">sub-delims</a> / ":" )</code></td></tr><tr valign="baseline"><td><a name="IPv6address" id="IPv6address"/>[29] </td><td><code>IPv6address</code></td><td> ::= </td><td><code> 6( <a href="#h16">h16</a> ":" ) <a href="#h16">ls32</a></code></td></tr><tr valign="baseline"><td/><td/><td/><td><code>/ "::" 5( <a href="#h16">h16</a> ":" ) <a href="#h16">ls32</a></code></td></tr><tr valign="baseline"><td/><td/><td/><td><code>/ [ <a href="#h16">h16</a> ] "::" 4( <a href="#h16">h16</a> ":" ) <a href="#h16">ls32</a></code></td></tr><tr valign="baseline"><td/><td/><td/><td><code>/ [ *1( <a href="#h16">h16</a> ":" ) <a href="#h16">h16</a> ] "::" 3( <a href="#h16">h16</a> ":" ) <a href="#h16">ls32</a></code></td></tr><tr valign="baseline"><td/><td/><td/><td><code>/ [ *2( <a href="#h16">h16</a> ":" ) <a href="#h16">h16</a> ] "::" 2( <a href="#h16">h16</a> ":" ) <a href="#h16">ls32</a></code></td></tr><tr valign="baseline"><td/><td/><td/><td><code>/ [ *3( <a href="#h16">h16</a> ":" ) <a href="#h16">h16</a> ] "::" <a href="#h16">h16</a> ":" <a href="#h16">ls32</a></code></td></tr><tr valign="baseline"><td/><td/><td/><td><code>/ [ *4( <a href="#h16">h16</a> ":" ) <a href="#h16">h16</a> ] "::" <a href="#h16">ls32</a></code></td></tr><tr valign="baseline"><td/><td/><td/><td><code>/ [ *5( <a href="#h16">h16</a> ":" ) <a href="#h16">h16</a> ] "::" <a href="#h16">h16</a> </code></td></tr><tr valign="baseline"><td/><td/><td/><td><code>/ [ *6( <a href="#h16">h16</a> ":" ) <a href="#h16">h16</a> ] "::"</code></td></tr><tr valign="baseline"><td><a name="h16" id="h16"/>[30] </td><td><code>h16</code></td><td> ::= </td><td><code>1*4<a href="http://tools.ietf.org/html/rfc5234#appendix-B.1">HEXDIG</a></code></td></tr><tr valign="baseline"><td><a name="ls32" id="ls32"/>[31] </td><td><code>ls32</code></td><td> ::= </td><td><code>( <a href="#ipath-abempty">h16</a>
":" <a href="#ipath-abempty">h16</a> ) / <a href="#ipath-abempty">IPv4address</a> </code></td></tr><tr valign="baseline"><td><a name="IPv4address" id="IPv4address"/>[32] </td><td><code>IPv4address</code></td><td> ::= </td><td><code><a href="#dec-octet">dec-octet</a> "." <a href="#dec-octet">dec-octet</a> "." <a href="#dec-octet">dec-octet</a> "." <a href="#dec-octet">dec-octet</a></code></td></tr><tr valign="baseline"><td><a name="dec-octet" id="dec-octet"/>[33] </td><td><code>dec-octet</code></td><td> ::= </td><td><code><a href="http://tools.ietf.org/html/rfc5234#appendix-B.1">DIGIT</a> <i>; 0-9</i></code></td></tr><tr valign="baseline"><td/><td/><td/><td><code>/ %x31-39 <a href="http://tools.ietf.org/html/rfc5234#appendix-B.1">DIGIT</a> <i>; 10-99</i></code></td></tr><tr valign="baseline"><td/><td/><td/><td><code>/ "1" 2<a href="http://tools.ietf.org/html/rfc5234#appendix-B.1">DIGIT</a> <i>; 100-199</i></code></td></tr><tr valign="baseline"><td/><td/><td/><td><code>/ "2" %x30-34 <a href="http://tools.ietf.org/html/rfc5234#appendix-B.1">DIGIT</a> <i>; 200-249</i></code></td></tr><tr valign="baseline"><td/><td/><td/><td><code>/ "25" %x30-35 <i>; 250-255</i></code></td></tr><tr valign="baseline"><td><a name="pct-encoded" id="pct-encoded"/>[34] </td><td><code>pct-encoded</code></td><td> ::= </td><td><code>"%" <a href="http://tools.ietf.org/html/rfc5234#appendix-B.1">HEXDIG</a> <a href="http://tools.ietf.org/html/rfc5234#appendix-B.1">HEXDIG</a></code></td></tr><tr valign="baseline"><td><a name="unreserved" id="unreserved"/>[35] </td><td><code>unreserved</code></td><td> ::= </td><td><code><a href="http://tools.ietf.org/html/rfc5234#appendix-B.1">ALPHA</a> / <a href="http://tools.ietf.org/html/rfc5234#appendix-B.1">DIGIT</a> / "-" / "." 
/ "_" / "~"</code></td></tr><tr valign="baseline"><td><a name="reserved" id="reserved"/>[36]   </td><td><code>reserved</code></td><td>   ::=   </td><td><code><a href="#gen-delims">gen-delims</a> / <a href="#sub-delims">sub-delims</a></code></td></tr><tr valign="baseline"><td><a name="gen-delims" id="gen-delims"/>[37]   </td><td><code>gen-delims</code></td><td>   ::=   </td><td><code>":" / "/" / "?" / "#" / "[" / "]" / "@"</code></td></tr><tr valign="baseline"><td><a name="sub-delims" id="sub-delims"/>[38]   </td><td><code>sub-delims</code></td><td>   ::=   </td><td><code>"!" / "$" / "&amp;" / "'" / "(" / ")"</code></td></tr><tr valign="baseline"><td/><td/><td/><td><code>/ "*" / "+" / "," / ";" / "="</code></td></tr></tbody></table>
<h5><a name="d0e777" id="d0e777"/>Modified ucschar production</h5><table class="scrap" summary="Scrap"><tbody><tr valign="baseline"><td><a name="ucschar" id="ucschar"/>[39]   </td><td><code>ucschar</code></td><td>   ::=   </td><td><code>" " / "&lt;" / "&gt;" / '"' / "{" / "}" / "|"</code></td></tr><tr valign="baseline"><td/><td/><td/><td><code>/ "\" / "^" / "`" / %x0-1F / %x7F-D7FF</code></td></tr><tr valign="baseline"><td/><td/><td/><td><code>/ %xE000-FFFD / %x10000-10FFFF</code></td></tr></tbody></table><p>The restriction on bidirectional formatting characters in
<a href="http://tools.ietf.org/html/rfc3987#section-4.1">Section 4.1</a> of <a href="#iri">[RFC3987]</a>
is lifted. The <a href="#iprivate">iprivate</a> production becomes redundant,
because every code point it matches is also matched by the modified
<a href="#ucschar">ucschar</a>.</p><p>Formats that use Legacy Extended IRIs <strong>may</strong> further restrict the
characters allowed therein, either implicitly, because the
format as such does not allow some characters, or explicitly. An
example of a character that a format may disallow implicitly is the NUL
character (<code>U+0000</code>). However, all the characters allowed in IRIs <strong>must</strong>
still be allowed.</p></div><div class="div1">
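<p>As a non-normative illustration, the modified <code>ucschar</code> production [39] can be sketched as a membership test in Python. The names <code>LEIRI_UCSCHAR_PUNCTUATION</code>, <code>LEIRI_UCSCHAR_RANGES</code> and <code>is_leiri_ucschar</code> are hypothetical and do not come from any specification. Because generic LEIRI software should not check LEIRIs for conformance, a test like this is probably only of interest in diagnostic or test tooling.</p>

```python
# Non-normative sketch of production [39]; all names are illustrative assumptions.

# The ten punctuation characters of production [39], given as code points:
# space, double quote, less-than, greater-than, backslash, circumflex,
# grave accent, left brace, vertical bar, right brace.
LEIRI_UCSCHAR_PUNCTUATION = frozenset(
    [0x20, 0x22, 0x3C, 0x3E, 0x5C, 0x5E, 0x60, 0x7B, 0x7C, 0x7D]
)

# The four code point ranges of production [39] (Python ranges exclude the end).
LEIRI_UCSCHAR_RANGES = (
    range(0x0, 0x1F + 1),
    range(0x7F, 0xD7FF + 1),
    range(0xE000, 0xFFFD + 1),
    range(0x10000, 0x10FFFF + 1),
)


def is_leiri_ucschar(ch: str) -> bool:
    """Return True if the single character ch matches production [39]."""
    code_point = ord(ch)
    if code_point in LEIRI_UCSCHAR_PUNCTUATION:
        return True
    return any(code_point in r for r in LEIRI_UCSCHAR_RANGES)
```

<p>Under these assumptions, <code>is_leiri_ucschar(" ")</code> is true, while <code>is_leiri_ucschar("a")</code> is false because ASCII letters are matched by <code>ALPHA</code> rather than by <code>ucschar</code>.</p>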
<h2><a name="conversion" id="conversion"/>4 Conversion of Legacy Extended IRIs to IRIs</h2><p>To convert a Legacy Extended IRI (reference) to an IRI (reference), each character allowed in a Legacy Extended IRI (reference)
but not allowed in an IRI (reference) (see <a href="#charStatus"><b>5 Characters allowed in Legacy Extended IRIs but not in IRIs</b></a>) <strong>must</strong> be
percent-encoded by applying the following steps:
</p><ol class="enumar"><li><p>Convert the character to a sequence of one or more octets
using UTF-8 <a href="#rfc3629">[RFC3629]</a>.</p></li><li><p>Convert each octet to <code>%HH</code>, where <code>HH</code> is the hexadecimal notation
of the octet value. Note that this is identical to the percent-encoding
mechanism in Section 2.1 of <a href="#rfc3986">[RFC3986]</a>. To reduce variability,
the hexadecimal notation <strong>should</strong> use uppercase letters.</p></li><li><p>Replace the original character with the resulting character
sequence (that is, a sequence of <code>%HH</code> triplets).</p></li></ol><p>Conversion from a LEIRI to an IRI or a URI <strong>must</strong> be performed only when absolutely necessary and as late as possible in a processing chain. In particular, neither the process of converting a relative LEIRI to an absolute one nor the process of passing a LEIRI to a process or software component responsible for dereferencing it <strong>should</strong> trigger percent-encoding.</p></div><div class="div1">
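<p>The three conversion steps can be sketched in Python as follows. This is a non-normative illustration under stated assumptions, not a reference implementation. The names <code>needs_percent_encoding</code> and <code>leiri_to_iri</code> are hypothetical, the <code>IRI_UCSCHAR</code> ranges are transcribed from the <code>ucschar</code> production of [RFC3987], and the input is assumed to be a valid LEIRI (in particular, free of surrogate code units). As a simplification, private use code points are encoded everywhere, although IRIs allow them in the query part.</p>

```python
# Non-normative sketch; names and simplifications are assumptions (see lead-in).

# ASCII characters admitted by the LEIRI ucschar but not allowed in IRIs:
# space, the delimiters, the "unwise" characters, DEL and the C0 controls.
EXTRA_ASCII = frozenset(
    [0x20, 0x22, 0x3C, 0x3E, 0x5C, 0x5E, 0x60, 0x7B, 0x7C, 0x7D, 0x7F]
) | frozenset(range(0x00, 0x20))

# ucschar as defined in RFC 3987, before the LEIRI redefinition.
IRI_UCSCHAR = [range(0xA0, 0xD800), range(0xF900, 0xFDD0), range(0xFDF0, 0xFFF0)]
IRI_UCSCHAR += [range(p * 0x10000, p * 0x10000 + 0xFFFE) for p in range(1, 0xE)]
IRI_UCSCHAR.append(range(0xE1000, 0xEFFFE))

# Bidi formatting characters, excluded from IRIs by prose rather than by ABNF.
BIDI_FORMATTING = frozenset([0x200E, 0x200F]) | frozenset(range(0x202A, 0x202F))


def needs_percent_encoding(ch: str) -> bool:
    """Return True if ch is allowed in a LEIRI but not in an IRI."""
    code_point = ord(ch)
    if code_point in range(0x80):
        return code_point in EXTRA_ASCII
    if code_point in BIDI_FORMATTING:
        return True
    return not any(code_point in r for r in IRI_UCSCHAR)


def leiri_to_iri(leiri: str) -> str:
    """Percent-encode every LEIRI-only character of a LEIRI (reference)."""
    pieces = []
    for ch in leiri:
        if needs_percent_encoding(ch):
            # Steps 1 and 2: UTF-8 octets, each written as %HH in uppercase.
            octets = ch.encode("utf-8")
            pieces.append("".join("%{:02X}".format(octet) for octet in octets))
        else:
            pieces.append(ch)
    # Step 3: the triplets take the place of the original character.
    return "".join(pieces)
```

<p>For example, <code>leiri_to_iri("http://example.com/a b")</code> yields <code>http://example.com/a%20b</code>. In keeping with the requirement to convert as late as possible, a caller would apply such a function only at the point where an IRI is actually required.</p>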
<h2><a name="charStatus" id="charStatus"/>5 Characters allowed in Legacy Extended IRIs but not in IRIs</h2><p>This section provides a list of the groups of characters and code
points that are allowed in Legacy Extended IRIs but are not allowed
in IRIs or are allowed in IRIs only in the query part. For each
group of characters, advice on the usage of these characters is also
given, concentrating on the reasons not to use them.</p><dl><dt class="label">Space (U+0020)</dt><dd><p>Some formats and applications use space as a delimiter, for example, for
items in a list. Appendix C of <a href="#rfc3986">[RFC3986]</a> also mentions that
white space may have to be added when displaying or printing long URIs; the
same applies to long IRIs. This means that spaces can disappear or can cause
the Legacy Extended IRI to be interpreted as two or more separate IRIs.</p></dd><dt class="label">Delimiters "&lt;" (U+003C), "&gt;" (U+003E) and '"' (U+0022)</dt><dd><p>Appendix C of <a href="#rfc3986">[RFC3986]</a> suggests the use of
double-quotes (<code>"http://example.com/"</code>) and angle brackets
(<code>&lt;http://example.com/&gt;</code>) as delimiters for URIs in plain text.
These conventions are often used and also apply to IRIs. Legacy Extended IRIs
containing these characters will be cut off at the wrong place when they are delimited in this way.</p></dd><dt class="label">Unwise characters "\" (U+005C), "^" (U+005E), "`" (U+0060), "{"
(U+007B), "|" (U+007C) and "}" (U+007D)</dt><dd><p>These characters
were originally excluded from URIs because the respective
code points are assigned to different graphic characters in some
7-bit or 8-bit encodings. Despite the move to Unicode, some of
these characters are still occasionally displayed differently on
some systems, for example, <code>U+005C</code> as a Japanese Yen symbol. Also, the
fact that these characters are not used in URIs or IRIs has
encouraged their use outside URIs or IRIs in contexts that may
include URIs or IRIs. If a Legacy Extended IRI containing such a
character is used in such a context, the Legacy Extended IRI will
be interpreted piecemeal.</p></dd><dt class="label">The controls (C0 controls, DEL and C1 controls, U+0000-001F, U+007F-009F)</dt><dd><p>There is no way to transmit these characters reliably
except potentially in electronic form. Even when in electronic
form, some software components might silently filter out some of
these characters or may stop processing altogether when
encountering some of them. These characters may affect text
display in subtle, unnoticeable ways or in drastic, global and
irreversible ways depending on the hardware and software involved.
The use of some of these characters may allow malicious users to
manipulate the display of a Legacy Extended IRI and its context.</p></dd><dt class="label">Bidi formatting characters (U+200E, U+200F, U+202A-202E)</dt><dd><p>These
characters affect the display ordering of characters. Displayed
Legacy Extended IRIs containing these characters cannot be
converted back to electronic form (logical order) unambiguously.
These characters may allow malicious users to manipulate the
display of a Legacy Extended IRI and its context.</p></dd><dt class="label">Specials (U+FFF0-FFFD)</dt><dd><p>These code points provide functionality
beyond that useful in a Legacy Extended IRI, for example byte
order identification, annotation and replacements for unknown
characters and objects. Their use and interpretation in a Legacy
Extended IRI serve no purpose and may lead to confusing display
variations.</p></dd><dt class="label">Private use code points (U+E000-F8FF, U+F0000-FFFFD, U+100000-10FFFD)</dt><dd><p>Display and interpretation of these code points are by
definition undefined without private agreement. Therefore, these
code points are not suited for use on the Internet. They are not
interoperable and may have unpredictable effects.</p></dd><dt class="label">Tags (U+E0000-E0FFF)</dt><dd><p> These characters provide a way to include language
tags in Unicode plain text. They are not appropriate for Legacy
Extended IRIs because language information in identifiers cannot
reliably be input, transmitted (for example, on a visual medium such as
paper), or recognized.</p></dd><dt class="label">Non-characters (U+FDD0-FDEF, U+1FFFE-1FFFF, U+2FFFE-2FFFF,
U+3FFFE-3FFFF, U+4FFFE-4FFFF, U+5FFFE-5FFFF, U+6FFFE-6FFFF,
U+7FFFE-7FFFF, U+8FFFE-8FFFF, U+9FFFE-9FFFF, U+AFFFE-AFFFF,
U+BFFFE-BFFFF, U+CFFFE-CFFFF, U+DFFFE-DFFFF, U+EFFFE-EFFFF,
U+FFFFE-FFFFF, U+10FFFE-10FFFF)</dt><dd><p>These code points are defined as
non-characters. Applications may use some of them internally, but
are not prepared to interchange them.</p></dd></dl><p>For reference, the code points and code units that are not allowed
even in Legacy Extended IRIs are also listed here:</p><dl><dt class="label">Surrogate code units (U+D800-U+DFFF)</dt><dd><p>These do not represent Unicode
code points.</p></dd></dl></div></div><div class="back"><div class="div1">
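<p>The groups listed in section 5 can be summarised, non-normatively, as a lookup from a character to its group. The following Python sketch is an assumption-laden illustration: the function name <code>leiri_only_group</code> and the returned labels are hypothetical, and the code point ranges are transcribed from the list in section 5.</p>

```python
# Non-normative sketch; the function name and group labels are assumptions.
from typing import Optional


def leiri_only_group(ch: str) -> Optional[str]:
    """Return a label for the section 5 group containing ch, or None."""
    cp = ord(ch)
    if cp == 0x20:
        return "space"
    if cp in (0x22, 0x3C, 0x3E):
        return "delimiter"
    if cp in (0x5C, 0x5E, 0x60, 0x7B, 0x7C, 0x7D):
        return "unwise"
    if cp in range(0x00, 0x20) or cp in range(0x7F, 0xA0):
        return "control"
    if cp in (0x200E, 0x200F) or cp in range(0x202A, 0x202F):
        return "bidi-formatting"
    if cp in range(0xFFF0, 0xFFFE):
        return "special"
    if (cp in range(0xE000, 0xF900) or cp in range(0xF0000, 0xFFFFE)
            or cp in range(0x100000, 0x10FFFE)):
        return "private-use"
    if cp in range(0xE0000, 0xE1000):
        return "tag"
    if cp in range(0xFDD0, 0xFDF0):
        return "non-character"
    # The last two code points of each of planes 1 to 16.
    if cp in range(0x1FFFE, 0x110000) and cp % 0x10000 in (0xFFFE, 0xFFFF):
        return "non-character"
    return None
```

<p>A caller might use such a lookup to explain to a user why a particular character in a Legacy Extended IRI is discouraged; characters that are also allowed in IRIs yield <code>None</code>.</p>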
<h2><a name="refs" id="refs"/>A References</h2><dl><dt class="label"><a name="maymust" id="maymust"/>RFC2119</dt><dd>Bradner, S., <em>Key words for use in RFCs to Indicate
Requirement Levels</em>, BCP 14, RFC 2119, IETF, March 1997.
Available online as <a href="http://tools.ietf.org/html/bcp14">http://tools.ietf.org/html/bcp14</a> and <a href="http://tools.ietf.org/html/rfc2119">http://tools.ietf.org/html/rfc2119</a>.</dd><dt class="label"><a name="abnf_spec" id="abnf_spec"/>RFC5234</dt><dd>Crocker, D. and P. Overell, Eds, <em>Augmented BNF for Syntax
Specifications: ABNF</em>, STD 68, RFC 5234, IETF, January 2008.
Available online as <a href="http://tools.ietf.org/html/rfc5234">http://tools.ietf.org/html/rfc5234</a>.
</dd><dt class="label"><a name="rfc3629" id="rfc3629"/>RFC3629</dt><dd>Yergeau, F., <em>UTF-8, a transformation format of ISO
10646</em>, STD 63, RFC 3629, IETF, November 2003. Available
online as <a href="http://tools.ietf.org/html/rfc3629">http://tools.ietf.org/html/rfc3629</a>.</dd><dt class="label"><a name="rfc3986" id="rfc3986"/>RFC3986</dt><dd>Berners-Lee, T., R. Fielding and L. Masinter, <em>Uniform
Resource Identifier (URI): Generic Syntax</em>, STD 66,
RFC 3986, IETF, January 2005. Available online as <a href="http://tools.ietf.org/html/rfc3986">http://tools.ietf.org/html/rfc3986</a>.</dd><dt class="label"><a name="iri" id="iri"/>RFC3987</dt><dd>Dürst, M. and M. Suignard, Eds, <em>Internationalized Resource Identifiers
(IRIs)</em>, RFC 3987, IETF,
2005. Available online as <a href="http://tools.ietf.org/html/rfc3987">http://tools.ietf.org/html/rfc3987</a>.</dd><dt class="label"><a name="iribis" id="iribis"/>IRI-bis</dt><dd>Dürst, M. and M. Suignard, Eds, <em>Internationalized Resource Identifiers
(IRIs)</em>, draft-duerst-iri-bis-04, IETF,
2008. Available online as <a href="http://tools.ietf.org/html/draft-duerst-iri-bis-04">http://tools.ietf.org/html/draft-duerst-iri-bis-04</a>.</dd></dl></div></div></body></html>