

								<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

								<html>

								<head>

								  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

								  <meta name="ProgId" content="FrontPage.Editor.Document">

								  <style type="text/css">

								.unicode     { font-style: normal }

								.unicode:link { color: #FF0000; background-color: #FFFFFF }

								.unicode:visited { color: #808080; background-color: #FFFFFF }

								.unicode:active { color: #0000FF; background-color: #FFFFFF }

								em.unicode   { font-style: normal }

								 </style>

								  <title>Unicode in XML and other Markup Languages</title>

								  <link rel="stylesheet" type="text/css"

								  href="http://www.w3.org/StyleSheets/TR/W3C-WG-NOTE.css">

								</head>


								<body>


								<div class="head">

								<p><a href="http://www.w3.org/"><img alt="W3C"

								src="http://www.w3.org/Icons/w3c_home" align="middle" border="0" height="48"

								width="72"></a> <a href="http://www.unicode.org/"><img alt="Unicode"

								src="http://www.unicode.org/img/unilogo-72.gif" align="middle" border="0"

								height="72" width="72"></a> </p>


								<h1>Unicode in XML and other Markup Languages</h1>


								<h2 class="unicode" id="utr20">Unicode Technical Report #20</h2>


								<h2>W3C Working Group Note 16 May 2007</h2>

								<dl>

								  <dt class="unicode">Revision (Unicode):</dt>

								    <dd>8</dd>

								  <dt>This version:</dt>

								    <dd class="unicode"><a

								      href="http://www.unicode.org/reports/tr20/tr20-8.html">http://www.unicode.org/reports/tr20/tr20-8.html</a></dd>

								    <dd><a

								      href="http://www.w3.org/TR/2007/NOTE-unicode-xml-20070516/">http://www.w3.org/TR/2007/NOTE-unicode-xml-20070516/</a></dd>

								  <dt>Latest version:</dt>

								    <dd class="unicode"><a

								      href="http://www.unicode.org/reports/tr20/">http://www.unicode.org/reports/tr20/</a></dd>

								    <dd><a

								      href="http://www.w3.org/TR/unicode-xml/">http://www.w3.org/TR/unicode-xml/</a></dd>

								  <dt>Previous version:</dt>

								    <dd class="unicode"><a

								      href="http://www.unicode.org/reports/tr20/tr20-7.html">http://www.unicode.org/reports/tr20/tr20-7.html</a></dd>

								    <dd><a

								      href="http://www.w3.org/TR/2003/NOTE-unicode-xml-20030613/">http://www.w3.org/TR/2003/NOTE-unicode-xml-20030613/</a></dd>

								  <dt>Date (Unicode):</dt>

								    <dd>2007-05-16</dd>

								  <dt>Authors:</dt>

								    <dd>Martin Dürst (<a

								      href="mailto:duerst@it.aoyama.ac.jp">duerst@it.aoyama.ac.jp</a>)</dd>

								    <dd>Asmus Freytag (<a

								      href="mailto:asmus@unicode.org">asmus@unicode.org</a>)</dd>

								</dl>


								<p class="copyright">Copyright © 2007 Unicode®, and <a

								href="http://www.w3.org/"><acronym

								title="World Wide Web Consortium">W3C</acronym></a><sup>®</sup> (<a

								href="http://www.csail.mit.edu/"><acronym

								title="Massachusetts Institute of Technology">MIT</acronym></a>, <a

								href="http://www.ercim.org/"><acronym

								title="European Research Consortium for Informatics and Mathematics">ERCIM</acronym></a>,

								<a href="http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. <a

								href="#Copyright">Detailed copyright information</a> is available.</p>

								<hr title="Separator from Header">

								</div>


								<h2><a name="Abstract" id="Abstract"></a>Abstract</h2>


								<p>This document contains guidelines on the use of the Unicode Standard in

								conjunction with markup languages such as XML.</p>


								<h2><a name="CommonStatus">Status of This Document (common)</a></h2>

								<!--PROPOSED UPDATE

								<p><font color="#FF0000">This is a proposed update to a Technical Report

								published jointly by the <a href="http://www.unicode.org/unicode/consortium/utc.html">Unicode

								Technical Committee</a> and by the <a href="http://www.w3.org/International/Group/">W3C

								Internationalization Working Group/Interest Group</a> (<a href="http://cgi.w3.org/MemberAccess/AccessRequest">W3C

								Members only</a>) in the context of the <a href="http://www.w3.org/International/Activity">W3C

								Internationalization Activity</a>. This is a draft document which may be

								updated, replaced, or superseded by other documents at any time. This is not a

								stable document; it is inappropriate to cite this document as other than a work

								in progress.&nbsp;</font></p>

								-->

								<!-- APPROVED -->


								<p>This is a Technical Report published jointly by the <a

								href="http://www.unicode.org/unicode/consortium/utc.html">Unicode Technical

								Committee</a> and by the <a href="http://www.w3.org/International/core/">W3C

								Internationalization Core Working Group</a>, which is part of the <a

								href="http://www.w3.org/International/Activity">W3C Internationalization

								Activity</a>.</p>


								<p>The base version of the Unicode Standard for this document is <a

								href="#Unicode50">Version 5.0</a>. For more information about versions of the

								Unicode Standard, see <a

								href="http://www.unicode.org/unicode/standard/versions/">http://www.unicode.org/unicode/standard/versions/</a>.

								Both the Unicode Standard and markup technologies are evolving. When

								appropriate, a new version of this document may be published.</p>

								<p>Please mail corrigenda and other comments to the authors or use the <a

								href="http://www.unicode.org/reporting.html">reporting form</a>.</p>


								<h2 class="unicode"><a name="UnicodeStatus">Status of This Document (Unicode

								Consortium)</a></h2>


								<div>

								<!-- PROPOSED UPDATE <font color="#FF0000">This document is a proposed

								update of a previously approved <b>Unicode Technical Report</b>. Publication

								does not imply endorsement by the Unicode Consortium. </font>

								-->

								<!-- APPROVED -->

								This document has been reviewed by Unicode members and other interested

								parties, and has been approved by the Unicode Technical Committee as a

								<b>Unicode Technical Report</b>. It is a stable document and may be used as

								reference material or cited as a normative reference from another document. <!-- -->

								 </div>


								<div>


								<blockquote>

								  <p><b>A Unicode Technical Report (UTR) </b>contains informative material.

								  Conformance to the Unicode Standard does not imply conformance to any UTR.

								  Other specifications, however, are free to make normative references to a

								  UTR.</p>

								</blockquote>

								</div>


								<div>

								For a list of current Unicode Technical Reports see <a

								href="http://www.unicode.org/reports/">http://www.unicode.org/reports</a>.


								<h2><a name="W3CStatus">Status of This Document (W3C)</a></h2>


								<p><em>This section describes the status of this document at the time of its

								publication. Other documents may supersede this document. A list of current

								W3C publications and the latest revision of this technical report can be

								found in the <a href="http://www.w3.org/TR/">W3C technical reports index</a>

								at http://www.w3.org/TR/.</em></p>

								<!--PROPOSED UPDATE

								<p><font color="#FF0000">This is a proposed update to a Note that has been

								previously endorsed by the W3C Internationalization Working Group/Interest

								Group, but has not been reviewed or endorsed by W3C Members.</font></p>

								-->

								<!--APPROVED -->


								<p>This document contains guidelines on the use of the Unicode Standard in

								conjunction with markup languages such as XML.</p>


								<p>This <a href="http://www.w3.org/2005/10/Process-20051014/tr.html#q75">W3C

								Working Group Note</a> was produced by the <a

								href="http://www.w3.org/International/core/" shape="rect">i18n Core Working

								Group</a>, part of the <a

								href="http://www.w3.org/International/">Internationalization Activity</a>.

								Please send comments related to this document to <a

								href="mailto:www-i18n-comments@w3.org?subject=%5Bunicode-xml%5D"

								shape="rect">www-i18n-comments@w3.org</a> (<a

								href="http://lists.w3.org/Archives/Public/www-i18n-comments/"

								shape="rect">public archive</a>). Use "[unicode-xml]" in the subject line of

								your email.</p>


								<p>Publication as a <a

								href="http://www.w3.org/2005/10/Process-20051014/tr.html#tr-end">Working

								Group Note</a> does not imply endorsement by the W3C Membership. At the time

								of publication, work on this document was considered complete and no further

								revisions are anticipated. It is a stable document and may be used as

								reference material or cited from another document. However, this document may

								be updated, replaced, or made obsolete by other documents at any time.</p>


								<p>This document was produced by a group operating under the <a

								href="http://www.w3.org/Consortium/Patent-Policy-20040205/">5 February 2004

								W3C Patent Policy</a>. W3C maintains a <a

								href="http://www.w3.org/2004/01/pp-impl/32113/status">public list of any

								patent disclosures</a> made in connection with the deliverables of the group;

								that page also includes instructions for disclosing a patent. An individual

								who has actual knowledge of a patent which the individual believes contains

								<a

								href="http://www.w3.org/Consortium/Patent-Policy-20040205/#def-essential">Essential

								Claim(s)</a> must disclose the information in accordance with <a

								href="http://www.w3.org/Consortium/Patent-Policy-20040205/#sec-Disclosure">section

								6 of the W3C Patent Policy</a>.</p>

								</div>

								<!-- -->


								<h2><a name="Contents">Table of Contents</a></h2>

								<ol>

								  <li><a href="#Introduction">Introduction</a><br>

								    1.1 <a href="#Notation">Notation</a></li>

								  <li><a href="#General">General Considerations</a><br>

								    2.1 <a href="#Linearity">Linearity versus Structure</a><br>

								    2.2 <a href="#Overlap">Overlap of Control Code and Markup

								    Semantics</a><br>

								    2.3 <a href="#Markup">Markup and Styling</a><br>

								    2.4 <a href="#Coincidence">Coincidence of Markup and Functions</a><br>

								    2.5 <a href="#Extensibility">Extensibility of Markup</a><br>

								    2.6 <a href="#Suitability">Suitability of Characters in Markup</a></li>

								  <li><a href="#Suitable">Characters not Suitable for Use With Markup</a><br>

								    3.1 <a href="#Charlist">Table of Characters not Suitable for Use With

								    Markup</a><br>

								    3.2 <a href="#Line">Line and Paragraph Separator</a><br>

								    3.3 <a href="#Bidi">Bidi Embedding Controls</a><br>

								    3.4 <a href="#Deprecated">Deprecated Formatting Characters</a><br>

								    3.5 <a href="#BOM">Byte Order Mark</a><br>

								    3.6 <a href="#Interlinear">Interlinear Annotation Characters</a><br>

								    3.7 <a href="#Object">Object Replacement Character</a><br>

								    3.8 <a href="#Musical">Musical Controls</a><br>

								    3.9 <a href="#Language">Language Tag Characters</a><br>

								    3.10 <a href="#OtherDeprecated">Other Deprecated Characters</a></li>

								  <li><a href="#Format">Format Characters Suitable for Use With Markup</a>

								     <br>

								    4.1 <a href="#Subtending">Subtending Marks</a><br>

								    4.2 <a href="#Fraction">Fraction Slash</a><br>

								    4.3 <a href="#Variation">Variation Selector</a><br>

								    4.4 <a href="#Ideographic">Ideographic Description Characters</a><br>

								    4.5 <a href="#Invisible">Invisible Mathematical Operators</a><br>

								    4.6 <a href="#LineBreak">Line Break Controls</a><br>

								    4.7 <a href="#Fillers">Hangul Fillers</a></li>

								  <li><a href="#Compatibility">Characters with Compatibility Mappings</a><br>

								    5.1 <a href="#Overview">Overview</a><br>

								    5.2 <a href="#Generating">Generating New Text</a><br>

								    5.3 <a href="#List">List item Marker Characters</a><br>

								    5.4 <a href="#Fractions">Fractions</a><br>

								    5.5 <a href="#Squared">Squared or Horizontal</a><br>

								    5.6 <a href="#Superscripts">Superscripts and Subscripts</a><br>

								    5.7 <a href="#Other">Other Characters Marked &lt;compat&gt;</a></li>

								  <li><a href="#Noncharacters">Noncharacters</a></li>

								  <li><a href="#White">White Space</a><br>

								    <a href="#converting-nl-to-ws">7.1 Converting Newline Functions to White

								    Space</a></li>

								  <li><a href="#Versioning">Versioning</a></li>

								  <li><a href="#Conformance">Conformance</a></li>

								  <li><a href="#References">References</a></li>

								  <li><a href="#Acknowledgements">Acknowledgements</a></li>

								  <li><a href="#ChangeHistory">Change History</a></li>

								  <li><a href="#Copyright">Copyright</a></li>

								</ol>


								<h2><a name="Introduction">1. Introduction</a></h2>


								<p>The Unicode Standard [<a href="#Unicode">Unicode</a>] defines the

								universal character set. Its primary goal is to provide an unambiguous

								encoding of the content of plain text, ultimately covering all languages in

								the world, but also major text-based notational systems for science,

								technology, music, and scholarship.</p>


								<p>Currently in its <a href="#Unicode50">fifth major version</a>, Unicode

								contains a large number of characters covering most of the currently used

								scripts in the world. It also contains additional characters for

								interoperability with older character encodings, and characters with

								control-like functions included primarily for reasons of providing

								unambiguous interpretation of plain text. Unicode provides specifications for

								use of all of these characters.</p>


								<p>For document and data interchange, the Internet and the World Wide Web

								make extensive use of marked-up text such as <a href="#html4.01">HTML4.01</a>

								and <a href="#xml10">XML</a>. In many instances, markup provides features that

								are the same as, or essentially similar to, those provided by format characters in the

								Unicode Standard for use in plain text. Compatibility characters are another

								special category of characters provided by Unicode. While there may be valid

								reasons to support these characters and their specifications in plain text,

								their use in marked-up text can conflict with the rules of the markup

								language. Formatting characters are discussed in Section 3, <i><a

								href="#Suitable">Characters not Suitable for Use With Markup</a></i> and

								Section 4, <i><a href="#Format">Format Characters Suitable for Use With

								Markup</a></i>, and compatibility characters in Section 5, <i><a

								href="#Compatibility">Characters with Compatibility Mappings</a></i>.

								Section 6 briefly discusses noncharacters, and Section 7 is devoted to white

								space.</p>


								<p>Issues resulting from canonical equivalences and Normalization [<a

								href="#UTR15">Normalization</a>] as well as the interaction of character

								encoding and methods of escaping characters in markup are discussed in the

								Character Model for the World Wide Web [<a href="#Charmod">Charmod</a>] and

								[<a href="#Charmodnorm">Charmodnorm</a>].</p>


								<p>The issues of using Unicode characters with marked-up text depend to some

								degree on the rules of the markup language in question and the set of

								elements it contains. In a narrow sense, this document concerns itself only

								with XML, and to some extent HTML. However, much of the general information

								presented here should be useful in a broader context, including some page

								layout languages.</p>


								<blockquote>

								  <p><b><a name="Note">Note:</a></b> Many of the recommendations of this

								  report depend on the availability of particular markup or styling. Where

								  possible, appropriate DTDs or Schemas should be used or designed to make

								  such markup or styling available, or the DTDs or Schemas used should be

								  appropriately extended. The current version of this document makes no

								  specific recommendations for the design of DTDs or Schemas, or for the use

								  of particular DTDs or Schemas, but the information presented here may be

								  useful to designers of DTDs and Schemas, and to people selecting DTDs or

								  Schemas for their applications. </p>


								  <p><b>Note: </b>The recommendations of this report do not apply in the case

								  of XML used for blind data transport and similar cases.</p>

								</blockquote>


								<h3><a name="Notation">1.1 Notation</a></h3>


								<p>This report uses XML [<a href="#xml10">XML</a>] as a prominent and general

								example of markup. The XML namespace notation [<a

								href="#Namespace">Namespace</a>] is used to indicate that a certain element

								is taken from a specific markup language. As an example, the prefix 'xhtml:'

								indicates that this element is taken from [<a href="#XHTML">XHTML</a>]. This

								means that the examples containing the namespace prefix 'xhtml:' are assumed

								to include a namespace declaration of xmlns:xhtml="..." </p>


								<p>Characters are denoted using the notation used in the Unicode Standard,

								that is, an optional U+ followed by their hexadecimal number, using at least

								4 digits, such as "U+1234" or "U+10FFFD". In XML or HTML this could be

								expressed as "&amp;#x1234;" or "&amp;#x10FFFD;".</p>
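
								<p>The notation above can be illustrated with a minimal sketch, here in Python. The function names <code>u_notation</code> and <code>char_ref</code> are hypothetical and are not defined by this report or by the Unicode Standard; the sketch only assumes that code points range from 0 to 10FFFF hexadecimal.</p>

```python
MAX_CODE_POINT = 0x10FFFF  # highest Unicode code point


def u_notation(cp: int) -> str:
    """Format a code point as U+XXXX, padded to at least 4 hex digits."""
    if not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError("code point out of range")
    return f"U+{cp:04X}"


def char_ref(cp: int) -> str:
    """Format a code point as a hexadecimal numeric character reference."""
    if not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError("code point out of range")
    return f"&#x{cp:X};"


if __name__ == "__main__":
    for cp in (0x1234, 0x10FFFD):
        print(u_notation(cp), char_ref(cp))
```

								<p>Whether a given character reference is actually permitted depends on the markup language; for example, XML does not allow references to most control characters, which this sketch does not check.</p>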


								<h2><a name="General">2. General Considerations</a></h2>


								<p>There are several general points to consider when looking at the

								interaction between character encoding and markup. </p>

								<ul>

								  <li>Linearity of text vs. hierarchy of markup structure</li>

								  <li>Overlap of control codes and markup semantics</li>

								  <li>Markup <i>vs.</i> Styling</li>

								  <li>Coincidence of semantic markup and functions </li>

								  <li>Extensibility of markup</li>

								</ul>


								<h3 align="left"><a name="Linearity">2.1 Linearity versus Structure</a></h3>


								<p align="left">Encoding text as a sequence of characters without further

								information leads to a linear sequence, commonly called plain text. Character

								follows character, without any particular structure. Markup, on the other

								hand, defines a hierarchical structure for the text or data. In the case of

								XML and most other, similar markup languages, the markup defines a tree

								structure. While this tree structure is linearized for transmission in the

								XML document, once the document has been parsed, the tree is available

								directly.</p>


								<p align="left">Operations that are easy to perform on trees are often

								difficult to perform on linear sequences and vice versa. By separating

								functionality between character encoding and markup appropriately, the

								architecture becomes simpler, more powerful and longer-lasting.</p>


								<p align="left">In particular, operations on hierarchical structures can

								easily make sure that information is kept in context. Attributes assigned to

								parts of a document are moved together with the associated part of the

								document. Assigning an attribute to a part of a document limits the scope of

								the attribute to that part of the document. Performing the same operations on

								linear sequences of characters using control codes to set attributes and to

								delimit their scope requires much more work and is error prone. Locating the

								start or end of a span of text of the same attribute requires scanning

								backwards and forwards for the embedded delimiter or control code. Moving or

								editing text often results in mismatched control codes, so that an attribute

								might suddenly apply to text it was not intended for.</p>
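
								<p>The fragility of scoping by control codes can be seen in a small sketch, here in Python. It uses the bidi embedding controls (U+202A..U+202E) purely as an example of paired control codes; the function name <code>unbalanced_controls</code> is hypothetical, and the simple counting shown is an illustration, not the scoping rule of the Unicode Bidirectional Algorithm.</p>

```python
# Control codes that open a scope: LRE, RLE, LRO, RLO.
OPENERS = {"\u202A", "\u202B", "\u202D", "\u202E"}
# Control code that closes the most recent scope: PDF.
PDF = "\u202C"


def unbalanced_controls(text: str) -> tuple[int, int]:
    """Return (openers left unclosed, PDFs with no matching opener)."""
    depth = 0
    stray_closers = 0
    for ch in text:
        if ch in OPENERS:
            depth += 1
        elif ch == PDF:
            if depth > 0:
                depth -= 1
            else:
                stray_closers += 1
    return depth, stray_closers


if __name__ == "__main__":
    original = "abc \u202Bdef\u202C ghi"      # balanced: one RLE, one PDF
    print(unbalanced_controls(original))       # whole text
    print(unbalanced_controls(original[:7]))   # cut in the middle of the span
    print(unbalanced_controls(original[7:]))   # the remainder
```

								<p>Cutting the linear text inside the span leaves an unclosed opener in one piece and a stray closer in the other, whereas moving a subtree of a parsed document carries its attributes with it.</p>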


								<h3 align="left"><a name="Overlap">2.2 Overlap of Control Code and Markup

								Semantics</a></h3>


								<p align="left">When markup is not available, plain text may require control

								characters. This is usually the case where plain text must contain some

								scoping or attribute information in order to be legible, <i>i.e.</i> to be

								able to transmit the same content between originator and receiver. Many of

								these control characters have direct equivalents in particular markup

								languages, since markup handles these concerns efficiently. If both

								characters and their markup equivalents may be present in the same text, the

								question of priority is raised. Therefore it is important to identify and

								resolve these ambiguities at the time markup is first applied.</p>


								<h3 align="left"><a name="Markup">2.3 Markup and Styling</a></h3>


								<p align="left">Besides the basic character encoding and text markup there is

								a third contributor to text functionality, namely styling. Markup is

								concerned with the logical structure of the text or data, <i>e.g. </i>to

								indicate sections, subsections, and headers in a document, or to indicate the

								various fields of an address record. Styling is used to present the

								information in various ways, <i>e.g.</i> in different fonts, different type

								styles (italic, bold), different colors, <i>etc. </i>Some character codes do

								not encode a generic character, but a styled character. Where these

								characters are used, styling information is frozen, <i>i.e.</i> it is no

								longer possible to alter the appearance of the text by applying style

								information. However, there are many examples where a historically free

								stylistic variation has over time become a semantic distinction that is

								properly encoded as plain text. Sometimes what is a free variation in some

								contexts implies strict semantic differentiation in others. In all such

								instances, altering the appearance of the text by styling information would

								irreparably alter the content of the text. This is of particular concern with

								mathematical notation or systems for phonetic and phonemic transcription

								which make extensive semantic use of styles on a character by character

								basis.</p>


								<h3 align="left"><a name="Coincidence">2.4 Coincidence of Markup and

								Functions</a></h3>


								<p align="left">Dealing with various functionalities on the markup level has

								the additional advantage that in most cases, text portions that need some

								particular attribute (or styling) are actually those text portions identified

								by markup. A paragraph may be in French, a citation may need a bidi

								embedding, a keyword may be in italics, a list number may be circled, and so

								on. This makes it very efficient to associate those attributes with

								markup.</p>


								<p align="left">However, where local or point-like functionality is needed,

								markup is <i>not</i> very efficient and its main benefit, easy manipulation

								of scope, is not required. On the contrary, the intrusion of markup in the

								middle of words can make search or sort operations more difficult. For these

								cases expressing the information as character codes is not only a viable, but

								often the preferred alternative, which needs to be considered in the design

								of markup languages.</p>


								<h3 align="left"><a name="Extensibility">2.5 Extensibility of Markup</a></h3>


								<p align="left">Character encoding works with a range of integers used as

								character codes. This is extremely efficient, but has some limitations.

								Markup, on the other hand, is much more extensible. Using technologies such

								as XML Namespaces [<a href="#Namespace">Namespace</a>] and their application

								in schema languages like [<a href="#XMLSchema">XML Schema</a>], various

								vocabularies can be mixed.</p>


								<h3><a name="Suitability">2.6 Suitability of Characters in Markup</a></h3>


								<p>The suitability of a particular character for markup depends on its status

								in the Unicode Standard, the nature of its behavior in text and the

								availability of equivalent markup. Many format characters that are needed for

								advanced plain text are not suitable for use with markup. <a

								href="#Suitable">Section 3</a> gives a list and detailed descriptions.

								However, not all format characters are unsuitable for use with markup. <a

								href="#Format">Section 4</a> provides a list of format characters that are

								suitable for use with markup and gives some discussion about their use. In

								addition to format characters, the Unicode Standard also has compatibility

								characters, some of which may be replaceable by suitable markup. These

								characters are discussed in <a href="#Compatibility">Section 5</a>.</p>


								<h2><a name="Suitable">3. Characters not Suitable for Use With Markup</a></h2>


								<p>There are characters which are unsuitable in the context of markup in

								XML/HTML and whose use is discouraged, because one or more of the following

								conditions apply:</p>

								<ul>

								  <li>They are deprecated in the Unicode Standard.</li>

								  <li>They are unsupportable without additional data.</li>

								  <li>They are difficult to handle because they are stateful.</li>

								  <li>They are better handled by markup.</li>

								  <li>They are undesirable because of conflict with equivalent markup.</li>

								</ul>


								<p><a href="#Charlist">Section 3.1</a> provides a list of such characters.

								Sections <a href="#Line">3.2</a> through <a href="#OtherDeprecated">3.10</a>

								discuss in more detail the following points for the discouraged

								characters.</p>

								<ul>

								  <li>Short description of semantics</li>

								  <li>Reason for inclusion in Unicode</li>

								  <li>Specific problems when used with markup</li>

								  <li>Other areas where problems may occur (<i>e.g.</i> plain text)</li>

								  <li>What kind of markup to use instead</li>

								  <li>What to do if detected in a particular context</li>

								</ul>


								<h3><a name="Charlist">3.1 Table of Characters not Suitable for Use With

								Markup</a></h3>


								<p>The following table contains the characters currently considered not

								suitable for use with markup in XML or HTML. (See however the <a

								href="#Note">note</a> in the <a href="#Introduction">Introduction</a>.) They

								may also be unsuitable for other markup or page layout languages. For

								determining possible conflict this report uses the markup available in

								HTML.</p>


								<p align="center"><b>Table 3.1 Characters not suitable for use with

								markup</b></p>


								<table border="1" cellpadding="2" cellspacing="0" width="95%">

								  <tbody>

								    <tr>

								      <th align="left" bgcolor="#ccffcc" width="210"><p

								        align="left">Codepoints</p>

								      </th>

								      <th align="left" bgcolor="#ccffcc" width="273"><p

								        align="left">Names/Description</p>

								      </th>

								      <th align="left" bgcolor="#ccffcc" width="341"><p align="left">Short

								        Comment</p>

								      </th>

								    </tr>

								    <tr>

								      <td width="210">U+0340..U+0341</td>

								      <td width="273">Clones of grave and acute</td>

								      <td width="341">Deprecated in Unicode</td>

								    </tr>

								    <tr>

								      <td width="210">U+17A3, U+17D3</td>

								      <td width="273">Obsolete characters for Khmer</td>

								      <td width="341">Deprecated in Unicode</td>

								    </tr>

								    <tr>

								      <td width="210">U+2028..U+2029</td>

								      <td width="273">Line and paragraph separator</td>

								      <td width="341">use &lt;xhtml:br /&gt;,

								        &lt;xhtml:p&gt;&lt;/xhtml:p&gt;, or equivalent</td>

								    </tr>

								    <tr>

								      <td width="210">U+202A..U+202E</td>

								      <td width="273">BIDI embedding controls <br>

								        (LRE, RLE, LRO, RLO, PDF)</td>

								      <td width="341">Strongly discouraged in [<a

								        href="#html4.01">HTML4.01</a>]</td>

								    </tr>

								    <tr>

								      <td width="210">U+206A..U+206B</td>

								      <td width="273">Activate/Inhibit Symmetric swapping</td>

								      <td width="341">Deprecated in Unicode</td>

								    </tr>

								    <tr>

								      <td width="210">U+206C..U+206D</td>

								      <td width="273">Activate/Inhibit Arabic form shaping</td>

								      <td width="341">Deprecated in Unicode</td>

								    </tr>

								    <tr>

								      <td width="210">U+206E..U+206F</td>

								      <td width="273">Activate/Inhibit National digit shapes</td>

								      <td width="341">Deprecated in Unicode</td>

								    </tr>

								    <tr>

								      <td width="210">U+FFF9..U+FFFB</td>

								      <td width="273">Interlinear annotation characters</td>

								      <td width="341">Use ruby markup [<a href="#Ruby">Ruby</a>]</td>

								    </tr>

								    <tr>

								      <td rowspan="2" width="210">U+FEFF</td>

								      <td width="273">as ZWNBSP</td>

								      <td width="341">Use U+2060 Word Joiner instead</td>

								    </tr>

								    <tr>

								      <td width="273">as Byte Order Mark</td>

								      <td width="341">Use only at the start of a file, not as part of

								      markup</td>

								    </tr>

								    <tr>

								      <td width="210">U+FFFC</td>

								      <td width="273">Object replacement character</td>

								      <td width="341">Use markup, e.g. HTML &lt;object&gt; or HTML

								      &lt;img&gt;</td>

								    </tr>

								    <tr>

								      <td width="210">U+1D173..U+1D17A</td>

								      <td width="273">Scoping for Musical Notation</td>

								      <td width="341">Use an appropriate markup language</td>

								    </tr>

								    <tr>

								      <td width="210">U+E0000..U+E007F</td>

								      <td width="273">Language Tag code points </td>

								      <td width="341">Use xhtml:lang or xml:lang</td>

								    </tr>

								  </tbody>

								</table>


								<p>Except for Line and Paragraph Separator, or the Byte Order Mark, it is

								acceptable for browsers and similar user agents to ignore the presence of

								discouraged characters in HTML or XML. It is up to authoring tools to ensure

								proper conversion between these characters and equivalent markup where it

								exists.</p>
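
								<p>As an illustration of how an authoring tool might locate the characters of Table 3.1, the following is a minimal sketch in Python. The names <code>DISCOURAGED</code> and <code>find_discouraged</code> are hypothetical. The sketch assumes that the input is the decoded text of a whole file, so that U+FEFF in the very first position is taken to be a byte order mark and is not reported.</p>

```python
# Code point ranges from Table 3.1 (first, last, short description).
DISCOURAGED = [
    (0x0340, 0x0341, "clones of grave and acute"),
    (0x17A3, 0x17A3, "obsolete character for Khmer"),
    (0x17D3, 0x17D3, "obsolete character for Khmer"),
    (0x2028, 0x2029, "line and paragraph separator"),
    (0x202A, 0x202E, "bidi embedding controls"),
    (0x206A, 0x206F, "deprecated formatting characters"),
    (0xFFF9, 0xFFFB, "interlinear annotation characters"),
    (0xFEFF, 0xFEFF, "ZWNBSP / byte order mark"),
    (0xFFFC, 0xFFFC, "object replacement character"),
    (0x1D173, 0x1D17A, "scoping for musical notation"),
    (0xE0000, 0xE007F, "language tag code points"),
]


def find_discouraged(text: str) -> list[tuple[int, str, str]]:
    """Return (index, 'U+XXXX', description) for each discouraged character."""
    hits = []
    for index, ch in enumerate(text):
        cp = ord(ch)
        if cp == 0xFEFF and index == 0:
            continue  # assumption: a leading U+FEFF is a byte order mark
        for first, last, description in DISCOURAGED:
            if first <= cp <= last:
                hits.append((index, f"U+{cp:04X}", description))
                break
    return hits


if __name__ == "__main__":
    for hit in find_discouraged("\ufeffline one\u2028line two\U000E0001"):
        print(hit)
```

								<p>A tool of this kind would only report positions; what to do with each character is discussed in the sections that follow.</p>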


								<h3><a name="Line">3.2 Line and Paragraph Separator, U+2028..U+2029</a></h3>


								<p><em>Short description</em>: The line and paragraph separator provide

								unambiguous means to denote hard line breaks and paragraph delimiters in

								plain text.</p>


								<p><em>Reason for inclusion</em>: These characters were introduced into the

								Unicode Standard to overcome the ambiguous and widely divergent use of

								control codes for this purpose. See <i>Section

								5.8, Newline Guidelines,</i> in [<a href="#Unicode">Unicode</a>].</p>


								<p><em>Problems when used in markup</em>: Including these characters in

								markup text does not work where it would duplicate the existing markup

								commands for delimiting paragraphs and lines.</p>


								<p><em>Problems with other uses</em>: The separator characters can also be

								problematic when used in plain text, because legacy data is usually converted

								code point for code point into Unicode and all receivers of Unicode plain

								text have to effectively be able to interpret the existing use of control

								codes for this purpose. As a result, fewer Unicode implementations support

								these characters than would be the case otherwise.</p>


								<p><em>Replacement markup</em>: In HTML, use &lt;xhtml:br /&gt; instead of

								U+2028 and surround paragraphs by &lt;xhtml:p&gt; and &lt;/xhtml:p&gt;

								instead of separating them with U+2029.</p>


								<p><em>What to do if detected</em>: In a browser context, treat as white

								space, or ignore. When received in an editing context, replace the character

								by the corresponding markup. </p>
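<p>In an editing context, the replacement described above might look like the following Python sketch. It is a minimal illustration, not a definitive implementation: the function name <code>separators_to_markup</code> is hypothetical, unprefixed <code>p</code> and <code>br</code> element names are assumed, and empty paragraphs are assumed to be droppable.</p>

```python
from html import escape

LINE_SEPARATOR = "\u2028"
PARAGRAPH_SEPARATOR = "\u2029"


def separators_to_markup(text):
    """Convert U+2029 and U+2028 in plain text to <p> and <br /> markup."""
    out = []
    for paragraph in text.split(PARAGRAPH_SEPARATOR):
        if not paragraph:
            continue  # assumption: skip empty paragraphs (e.g. trailing U+2029)
        # Escape markup-significant characters before adding tags.
        lines = [escape(line) for line in paragraph.split(LINE_SEPARATOR)]
        out.append("<p>" + "<br />".join(lines) + "</p>")
    return "\n".join(out)
```

<p>For example, the text "a", U+2028, "b", U+2029, "c" would become one paragraph containing a line break followed by a second paragraph.</p>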


								<h3><a name="Bidi">3.3 Bidi Embedding Controls (LRE, RLE, LRO, RLO, PDF),

								U+202A..U+202E</a></h3>


								<p><em>Short description</em>: The bidi embedding controls are required to

								supplement the Unicode Bidirectional Algorithm in plain text.</p>


								<p><em>Reason for inclusion</em>: The Unicode Bidirectional Algorithm

								unambiguously resolves the display direction for bidirectional text. It does

								so by assigning all characters directional categories and then resolving

								these in context. In a small number of circumstances this <i>implicit </i>

								method does not produce satisfactory results and embedding controls are

								needed to ensure that sender and receiver agree on the display direction for

								a given text. See Unicode Standard Annex #9, The Bidirectional Algorithm <a

								href="#UTR9">[UAX 9]</a>.</p>


								<p><em>Problems when used in markup</em>: These characters duplicate

								available markup, which is better suited to handle the stateful nature of

								their effect. </p>


								<p><em>Problems with other uses</em>: The embedding controls introduce a

								state into the plain text, which must be maintained when editing or

								displaying the text. Processes that are modifying the text without being

								aware of this state may inadvertently affect the rendering of large portions

								of the text, for example by removing a PDF.</p>


								<p><em>Replacement markup</em>: The following table gives the replacement

								markup:<br>

								</p>


								<blockquote>


								  <table border="1" cellspacing="0">

								    <tbody>

								      <tr>

								        <td bgcolor="#ccffcc" width="15"><b>Unicode</b></td>

								        <td bgcolor="#ccffcc" width="30%"><b>Equivalent markup</b></td>

								        <td bgcolor="#ccffcc" width="55%"><b>Comment</b></td>

								      </tr>

								      <tr>

								        <td width="15"><p>RLO</p>

								        </td>

								        <td width="30%">&lt;xhtml:bdo dir = "rtl"&gt;</td>

								        <td width="55%"> </td>

								      </tr>

								      <tr>

								        <td width="15"><p>LRO</p>

								        </td>

								        <td width="30%">&lt;xhtml:bdo dir = "ltr"&gt;</td>

								        <td width="55%"> </td>

								      </tr>

								      <tr>

								        <td width="15">PDF</td>

								        <td width="30%">&lt;/xhtml:bdo&gt;</td>

								        <td width="55%">when used to terminate RLO or LRO only, otherwise

								          ignore</td>

								      </tr>

								      <tr>

								        <td width="15">RLE</td>

								        <td width="30%">dir = "rtl"</td>

								        <td width="55%">attribute on block or inline element</td>

								      </tr>

								      <tr>

								        <td width="15">LRE</td>

								        <td width="30%">dir = "ltr"</td>

								        <td width="55%">attribute on block or inline element</td>

								      </tr>

								    </tbody>

								  </table>

								</blockquote>


								<p>For details on bidi markup, please see Section 8.2 of HTML [<a

								href="#HTML4.0-8.2">HTML 4.0-8.2</a>]. The text of HTML 4.0 gives this

								recommendation: </p>


								<blockquote>

								  <p><em><strong>Using HTML directionality markup with Unicode

								  characters.</strong> Authors and designers of authoring software should be

								  aware that conflicts can arise if the <a

								  href="http://www.w3.org/TR/html401/struct/dirlang.html#adef-dir"

								  class="noxref"><samp class="ainst">dir</samp></a> attribute is used on

								  inline elements (including <a

								  href="http://www.w3.org/TR/html401/struct/dirlang.html#edef-BDO"

								  class="noxref"><samp class="einst">BDO</samp></a>) concurrently with the

								  corresponding <a rel="biblioentry" href="#Unicode"

								  class="normref">[UNICODE]</a> formatting characters. Preferably one or the

								  other should be used exclusively. The markup method offers a better

								  guarantee of document structural integrity and alleviates some problems

								  when editing bidirectional HTML text with a simple text editor, but some

								  software may be more apt at using the <a rel="biblioentry" href="#Unicode"

								  class="normref">[UNICODE]</a> characters. If both methods are used, great

								  care should be exercised to insure proper nesting of markup and directional

								  embedding or override, otherwise, rendering results are undefined.</em></p>

								</blockquote>


								<p>This document goes beyond HTML and recommends that <i>only</i> the markup

								should be used.</p>


								<blockquote>

								  <p><b>Note:</b> The interpretation of how to handle directionality markup

								  for block level elements differs in different versions of [<a

								  href="#CSS">CSS</a>].</p>

								</blockquote>


								<p><em>What to do if detected</em>: In a browser context, ignore. When

								received in an editing context, replace the characters by the appropriate

								markup. </p>
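<p>The stateful nature of these controls suggests a stack-based conversion. The Python sketch below is one possible approach under stated assumptions: overrides become <code>bdo</code> elements as in the table above, while embeddings are wrapped in a <code>span</code> carrying the <code>dir</code> attribute (the table only says "attribute on block or inline element", so the choice of <code>span</code> is an assumption). The function name <code>bidi_controls_to_markup</code> is hypothetical. Unmatched PDF characters are dropped, and anything left open at the end of the text is closed.</p>

```python
from html import escape

LRE, RLE, PDF, LRO, RLO = "\u202A", "\u202B", "\u202C", "\u202D", "\u202E"

# Opening and closing markup for each control that starts a scope.
OPENERS = {
    RLO: ('<bdo dir="rtl">', "</bdo>"),
    LRO: ('<bdo dir="ltr">', "</bdo>"),
    RLE: ('<span dir="rtl">', "</span>"),  # assumption: span as the carrier
    LRE: ('<span dir="ltr">', "</span>"),  # assumption: span as the carrier
}


def bidi_controls_to_markup(text):
    """Replace bidi embedding and override controls with nested markup."""
    out = []
    closers = []  # stack of closing tags for the currently open scopes
    for char in text:
        if char in OPENERS:
            open_tag, close_tag = OPENERS[char]
            out.append(open_tag)
            closers.append(close_tag)
        elif char == PDF:
            if closers:
                out.append(closers.pop())
            # An unmatched PDF terminates nothing and is dropped.
        else:
            out.append(escape(char))
    # Close scopes left open so the generated markup stays well-formed.
    while closers:
        out.append(closers.pop())
    return "".join(out)
```

<p>Because the markup is generated from a stack, the result is always properly nested, which is the property that a stray or missing PDF breaks in plain text.</p>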


								<h3><a name="Deprecated">3.4 Deprecated Formatting Characters,

								U+206A..U+206F</a></h3>


								<p><em>Short description</em>: These characters are deprecated. They were

								originally intended to allow explicit activation of contextual shaping,

								numeric digit rendering and symmetric swapping.</p>


								<p><em>Reason for inclusion</em>: These characters were retained from draft

								versions of ISO 10646.</p>


								<p><em>Problems when used in markup</em>: The processing model for these

								characters is not supported in markup.</p>


								<p><em>Problems with other uses</em>: The Unicode Standard requires that

								symmetric swapping, contextual shaping, and alternate digit shapes are

								enabled by default and no longer supports inhibiting any of them by use of

								these character codes. The most likely effect of their occurrence in

								generated text would be that of a 'garbage' character.</p>


								<p><em>Conversion for use with markup</em>: Apply the appropriate conversion

								to bring the data stream in line with the Unicode text model for

								bidirectional text and cursively-connected scripts.</p>


								<p><em>What to do if detected</em>: When received by a browser as part of

								marked up text, they may be ignored. When received in an editing context,

								they may be removed, possibly with a warning. Alternatively, an appropriate

								conversion from the legacy text model may be provided. This will most likely

								be limited to applications directly interfacing with and knowledgeable of the

								particular legacy implementation that inspired these characters.</p>


								<h3><a name="BOM">3.5 Byte Order Mark, ZWNBSP, U+FEFF</a></h3>


								<p><em>Short description</em>: U+FEFF has two functions. It is formally known

								as <span style="font-variant: small-caps;">zero width no-break space</span>

								(ZWNBSP), and can act as a word joiner, but its primary use is as <i>byte

								order mark (BOM)</i>, to indicate in a file signature at the start of a file

								that a file is in a particular Unicode encoding form and of a particular byte

								order. Using U+FEFF as a word joiner in new data is deprecated as of [<a

								href="#Unicode32">Unicode3.2</a>] in favor of U+2060 <span

								style="font-variant: small-caps;">word joiner</span> (WJ). The use as byte

								order mark remains unaffected.</p>


								<p><em>Reason for inclusion</em>: Originally included in Unicode for the sole

								purpose of indicating byte order or use in file signatures, the character

								acquired the ZWNBSP semantics as part of the merger between ISO/IEC 10646 and

								Unicode. When used as a byte order mark the character is placed at the

								beginning of a file. If a recipient views it as FEFF then the byte orders

								of sender and receiver match. If the recipient views it as FFFE (a

								non-character code point) then the sender used opposite byte order from the

								recipient, and the recipient needs to invert the byte order or refuse to read

								the file. When used as a ZWNBSP the character is intended to prevent breaks

								between adjacent characters. This function is now provided by U+2060 <span

								style="font-variant: small-caps;">word joiner</span> (WJ) making it

								unnecessary to insert U+FEFF in the middle of a file. For more information

								see Chapter 16 of [<a href="#Unicode">Unicode</a>].</p>


								<p><em>Problems when used in markup</em>: Using U+FEFF as ZWNBSP makes it

								impossible to distinguish it from the case where a byte order mark was left

								in the middle of a file inadvertently due to incorrect splicing. U+FEFF can

								and in some cases (XML encoded in UTF-16) must be used at the start of a file

								containing markup, but as a signature, this is not part of actual markup or

								marked-up content. Some older versions of browsers and parsers may not

								correctly recognize U+FEFF at the start of a file encoded in UTF-8. For

								details of how U+FEFF participates in encoding detection of XML files, see

								Appendix F of <a href="#xml10">[XML 1.0]</a>. </p>


								<p><em>Problems with other uses</em>: The use of byte order mark as ZWNBSP is

								also problematic when used in plain text, and has been deprecated for that

								purpose in favor of U+2060 <span style="font-variant: small-caps;">word

								joiner</span>. The use of U+FEFF in file signatures to indicate byte order is

								the only recommended use of this character.</p>


								<p><em>Replacement markup</em>: None. In locations other than the beginning

								of a text file, U+FEFF can be removed or replaced by U+2060 in an editing

								environment.</p>


								<p><em>What to do if detected</em>:  When received by a browser as part of

								marked-up text, treat depending on location. At the start of an external

								entity, treat as byte order mark (i.e. as part of the character encoding, not

								as part of the parsed character stream, see e.g. Section 4.3.3 of <a

								href="#xml10">[XML 1.0]</a>). Otherwise, assume it is older data using it as

								ZWNBSP. When receiving plain text in an editing environment, editors may take

								one or more of several actions: replace ZWNBSP in the middle of a file with

								WJ or issue a warning to the user.</p>
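<p>The two roles of U+FEFF can be illustrated with a short Python sketch. The function names <code>detect_utf16_byte_order</code> and <code>normalize_feff</code> are hypothetical, and the first function assumes the data is already known to be UTF-16 (the same leading bytes can also begin a signature in another encoding form). The second function shows the editing-environment option of replacing ZWNBSP in the middle of a file with WJ.</p>

```python
BOM = "\ufeff"
WORD_JOINER = "\u2060"


def detect_utf16_byte_order(data):
    """Guess the byte order of UTF-16 data from a leading signature."""
    if data[:2] == b"\xfe\xff":
        return "big-endian"      # bytes read as FEFF in big-endian order
    if data[:2] == b"\xff\xfe":
        return "little-endian"   # a big-endian reader would see FFFE
    return None  # no signature; byte order must be determined another way


def normalize_feff(text):
    """Drop a leading signature and replace any other U+FEFF with U+2060."""
    if text.startswith(BOM):
        text = text[1:]
    return text.replace(BOM, WORD_JOINER)
```

<p>An editor could combine the second function with a warning to the user whenever a replacement actually takes place.</p>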


								<h3><a name="Interlinear">3.6 Interlinear Annotation Characters,

								U+FFF9..U+FFFB</a></h3>


								<p><em>Short description</em>: The interlinear annotation characters are used

								to delimit interlinear annotations in certain circumstances. They are

								intended to provide text anchors and delimiters for interlinear annotation

								for in-process use and are not intended for interchange.</p>


								<p><em>Reason for inclusion</em>: The interlinear annotation characters were

								included in Unicode only in order to reserve code points for very frequent

								application-internal use. The interlinear annotation characters are used to

								delimit interlinear annotations in contexts where other delimiters are not

								available, and where non-textual means exist to carry formatting information.

								Many text-processing applications store the text and the associated markup

								(or in some cases styling information) of a document in separate structures.

								The actual text is kept in a single linear structure; additional information

								is kept separately with pointers to the appropriate text positions. This is

								called out-of-band information. The overall implementation makes sure that

								these two structures are kept in sync. If the text contains interlinear

								annotations, it is extremely helpful for implementations to have delimiters

								in the text itself; even though delimiters are not otherwise used for style

								markup. With this method, and unlike the case of the object replacement

								character, all textual information can remain in the standard text stream,

								but any additional formatting information is kept separately. In addition,

								the Interlinear Annotation Anchor serves as a placeholder for formatting

								information for the whole annotation object, the same way a paragraph mark

								can be a placeholder to attach paragraph formatting information.</p>


								<p><em>Problems when used in markup</em>: Including interlinear annotation

								characters in marked-up text does not work because the additional formatting

								information (for example, how to position the annotation) is not available.</p>


								<p><em>Problems with other uses</em>: The interlinear annotation characters

								are also problematic when used in plain text, and are not intended for that

								purpose. In particular, on older display systems that simply ignore or

								replace the Interlinear Annotation Characters, the meaning of the text may be

								changed.</p>


								<p><em>Replacement markup</em>: The markup to be used in place of the

								Interlinear Annotation Characters depends on the formatting and nature of the

								interlinear annotation in question. For ruby, please see [<a

								href="#Ruby">Ruby</a>].</p>


								<p><em>What to do if detected</em>:  When received by a browser as part of

								marked-up text, they may be ignored. When receiving plain text in an editing

								environment, editors may take one or more of several actions: remove U+FFF9

								together with removing all characters between U+FFFA and following U+FFFB;

								ignore U+FFF9 and turn U+FFFA and U+FFFB  into "[" and "]" respectively, or

								into similar characters; issue a warning to the user; or tentatively convert

								into appropriate ruby markup for further editing and formatting by the

								user.</p>
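<p>The editing options listed above can be sketched in Python as follows. This is a tentative illustration only: the function names are hypothetical, annotations are assumed not to be nested, and the <code>ruby</code>, <code>rb</code> and <code>rt</code> element names are assumed to be appropriate for the target document type (see [<a href="#Ruby">Ruby</a>]).</p>

```python
import re
from html import escape

ANCHOR, SEPARATOR, TERMINATOR = "\ufff9", "\ufffa", "\ufffb"

# ANCHOR base-text SEPARATOR annotation-text TERMINATOR, without nesting.
ANNOTATION = re.compile(
    "\ufff9([^\ufff9-\ufffb]*)\ufffa([^\ufff9-\ufffb]*)\ufffb"
)


def strip_annotations(text):
    """Keep the base text and remove the annotation with its delimiters."""
    return ANNOTATION.sub(lambda match: match.group(1), text)


def bracket_annotations(text):
    """Drop U+FFF9 and show the annotation in square brackets."""
    text = text.replace(ANCHOR, "")
    return text.replace(SEPARATOR, "[").replace(TERMINATOR, "]")


def annotations_to_ruby(text):
    """Tentatively convert each annotation into simple ruby markup."""
    def to_ruby(match):
        return "<ruby><rb>%s</rb><rt>%s</rt></ruby>" % match.groups()
    # Escape first; escaping does not touch the annotation characters.
    return ANNOTATION.sub(to_ruby, escape(text, quote=False))
```

<p>The ruby conversion is deliberately minimal; positioning and styling of the annotation would still have to be supplied by the user.</p>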


								<h3><a name="Object">3.7 Object Replacement Character, U+FFFC</a></h3>


								<p><em>Short description</em>: The object replacement character is used to

								stand in place of an object (e.g. an image) included in a text.</p>


								<p><em>Reason for inclusion</em>: The object replacement character was

								included in Unicode only in order to reserve a codepoint for a very frequent

								application-internal use. Many text-processing applications store the text

								and the associated markup (or in some cases styling information) of a

								document in separate structures. The actual text is kept in a single linear

								structure; additional information is kept separately with pointers to the

								appropriate text positions. The overall implementation makes sure that these

								two structures are kept in sync. If the text contains objects such as images,

								it is extremely helpful for implementations to have a sentinel in the text

								itself; any additional information is kept separately.</p>


								<p><em>Problems when used in markup</em>: Including an object replacement

								character in markup text does not work because the additional information

								(for example, what object to include) is not available.</p>


								<p><em>Problems with other uses</em>: The object replacement character is

								also problematic when used in plain text, because there is no way in plain

								text to provide the actual object information or a reference to it.</p>


								<p><em>Replacement markup</em>: The markup to be used in place of the Object

								Replacement Character depends on the object in question and the markup

								context it is used in. Typical cases are &lt;xhtml:img src='...' /&gt;,

								&lt;xhtml:object ...&gt;, or &lt;xhtml:applet ...&gt;. These constructs allow

								providing all additional information needed to identify and use the object in

								question.</p>


								<p><em>What to do if detected</em>: Browsers may ignore this character. When

								received in an editing context, if the actual object is accessible, editors

								may either replace the character by the appropriate markup for that object,

								or otherwise remove it, ideally providing a warning.</p>
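<p>As a sketch of the editing behaviour just described, the following Python function replaces each U+FFFC with image markup when the corresponding object is known, and removes it otherwise. The function name <code>objects_to_markup</code> and the idea of passing the out-of-band objects as an ordered list of image URLs are assumptions made for this example; a real application would draw on its own object store.</p>

```python
from html import escape

OBJECT_REPLACEMENT = "\ufffc"


def objects_to_markup(text, image_urls):
    """Replace each U+FFFC with an <img>, or drop it if no object is known.

    Returns the converted text and the number of characters dropped,
    so that the caller can warn the user.
    """
    urls = iter(image_urls)
    out = []
    dropped = 0
    for char in text:
        if char == OBJECT_REPLACEMENT:
            url = next(urls, None)
            if url is None:
                dropped += 1  # object not accessible: remove the character
            else:
                out.append('<img src="%s" alt="" />' % escape(url))
        else:
            out.append(escape(char))
    return "".join(out), dropped
```

<p>A non-zero second return value indicates that at least one placeholder had no accessible object and could trigger the warning mentioned above.</p>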


								<h3><a name="Musical">3.8 Musical Controls</a>, U+1D173..U+1D17A</h3>


								<p><em>Short description</em>: A series of characters for controlling scope

								in musical notation.</p>


								<p><em>Reason for inclusion</em>: These characters designate the start and

								end of common musical constructs. Full musical layout depends on additional

								information, for example pitch, that cannot be encoded using Unicode.

								However, many musical symbols may be depicted in isolation (and without

								assigning pitch) as part of a textual discussion of music. Plain text use of

								Unicode characters is primarily intended for this latter purpose. The scoping

								operators can be used to support limited renderings of beams, slurs, phrases,

								etc. in this context. However, in the context of markup languages, musical

								scoring calls for a dedicated markup language (analogous to MathML) which

								would be expected to contain markup for these constructs.</p>


								<p><em>Problems when used in markup</em>: These characters duplicate

								information that can in principle be expressed in markup.</p>


								<p><em>Problems with other uses</em>: Their special code range allows them to

								be easily filtered, but applications that do not expect them will treat them

								as garbage characters.</p>


								<p><em>Replacement markup</em>: Replace with equivalent markup if

								available.</p>


								<p><em>What to do if detected</em>: Browsers may ignore these characters.

								When received in an editing context, editors may remove or replace them by

								equivalent markup.</p>


								<h3><a name="Language">3.9 Language Tag Characters</a>, U+E0000..U+E007F</h3>


								<p><em>Short description</em>: A series of characters for expressing language

								tags, based on existing standards for language tags using the rules in

								Chapter 16 of [<a href="#Unicode">Unicode</a>].</p>


								<p><em>Reason for inclusion</em>: These characters allow in-band language

								tagging in situations where full markup is not available, while allowing easy

								filtering by applications that do not support them. They were solely included

								for the benefit of those Internet protocols, such as ACAP, which require a

								standard mechanism for marking language in UTF-8 strings, and at the same

								time to avoid the use of other tagging schemes that relied on specific

								details of the encoding form used.</p>


								<p><em>Problems when used in markup</em>: These characters duplicate

								information that can be expressed in markup.</p>


								<p><em>Problems with other uses</em>: Their special code range allows them to

								be easily filtered, but applications that do not expect them will treat them

								as garbage characters.</p>


								<p><em>Replacement markup</em>: Replace with equivalent language markup. XML

								and XHTML have the xml:lang attribute. HTML has the lang attribute. These

								attributes follow different scoping rules than the tag characters, therefore

								this replacement will generally not be a simple 1:1 substitution.</p>


								<p><em>What to do if detected</em>: Browsers may ignore these characters.

								When received in an editing context, editors may remove or replace them by

								equivalent markup.</p>
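<p>Before tag characters can be replaced by language markup, the tags have to be extracted. The Python sketch below relies on the fact that the tag characters U+E0020..U+E007E mirror ASCII at an offset of 0xE0000, that U+E0001 introduces a language tag, and that U+E007F cancels it. The function name <code>split_language_tags</code> and the shape of the return value are assumptions for this example; mapping the extracted positions onto element scopes is left to the caller, since the scoping rules differ as noted above.</p>

```python
LANGUAGE_TAG = 0xE0001
CANCEL_TAG = 0xE007F
TAG_OFFSET = 0xE0000


def split_language_tags(text):
    """Return (text without tag characters, list of (offset, tag) pairs).

    Each offset is a position in the returned text; a tag of None
    records a cancel tag at that position.
    """
    plain = []
    raw_tags = []
    collecting = False  # True while reading a tag introduced by U+E0001
    for char in text:
        cp = ord(char)
        if cp == LANGUAGE_TAG:
            raw_tags.append([len(plain), ""])
            collecting = True
        elif cp == CANCEL_TAG:
            raw_tags.append([len(plain), None])
            collecting = False
        elif 0xE0020 <= cp <= 0xE007E:
            if collecting:
                raw_tags[-1][1] += chr(cp - TAG_OFFSET)
            # Tag characters outside a tag are dropped in this sketch.
        elif 0xE0000 <= cp <= 0xE007F:
            continue  # remaining code points of the range carry no content
        else:
            collecting = False
            plain.append(char)
    return "".join(plain), [tuple(tag) for tag in raw_tags]
```

<p>An editor could then wrap the text between a tag at one offset and the next tag or cancel position in an element carrying the corresponding <code>xml:lang</code> or <code>lang</code> value.</p>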


								<h3><a name="OtherDeprecated">3.10 Other Characters Deprecated in

								Unicode</a></h3>


								<p><em>Short description</em>: The Unicode Character Database [<a

								href="#UnicodeData">UnicodeData</a>] lists all characters that have been

								deprecated in [<a href="#Unicode">Unicode</a>]. This list may grow (slowly)

								over time. Deprecated characters remain valid characters forever, but their

								use is strongly discouraged. Deprecation of characters is applied only in

								exceptional circumstances. It is never the result of historical changes of a

								writing system: characters no longer in current, modern use are retained in

								Unicode, as they are needed for the representation of historical

								documents.</p>


								<p><em>Reason for inclusion</em>: Usually, characters that are deprecated

								were never needed, but were inadvertently added to the Unicode Standard,

								perhaps based on incomplete information available at the time of encoding.</p>


								<p><em>Problems when used in markup</em>: Except where noted elsewhere in

								this document, their presence in markup presents the same problems as in

								plain text, usually that of an unnecessary duplicate encoding.</p>


								<p><em>Problems with other uses</em>: Depends on the character and the reason

								for its deprecation. For more information see [<a

								href="#Unicode">Unicode</a>].</p>


								<p><em>Conversion for use with markup</em>: For deprecated characters not

								discussed elsewhere in this document, see the relevant descriptions of those

								characters in [<a href="#Unicode">Unicode</a>] for information on the

								recommended alternatives.</p>


								<p><em>What to do if detected</em>:  Unless a specific recommendation is

								given elsewhere, deprecated characters should not be ignored; where possible, in an

								editing environment, a preferred alternate encoding may be substituted.</p>


								<h2><a name="Format">4. Format Characters Suitable for Use with

								Markup</a></h2>


								<p>The following table contains format characters that do not exhibit the

								problems discussed at the start of <a href="#Suitable">Section 3</a>. Despite

								their apparent relation to or similarity with characters in table <a

								href="#Charlist">3.1</a>, they are considered suitable for use with markup.

								It is not acceptable for user agents to ignore the characters in table 4.1.

								For a description of these characters see [<a

								href="#Unicode">Unicode</a>].</p>


								<p align="center"><b>Table 4.1: Some characters that affect text format but

								are suitable for use with markup</b></p>


								<table border="1" cellpadding="2" cellspacing="0" width="95%">

								  <tbody>

								    <tr>

								      <th align="left" bgcolor="#ccffcc" width="198"><p align="left">Code

								        points</p>

								      </th>

								      <th align="left" bgcolor="#ccffcc" width="362"><p

								        align="left">Names/Description</p>

								      </th>

								      <th align="left" bgcolor="#ccffcc" width="280"><p align="left">Short

								        Comment</p>

								      </th>

								    </tr>

								    <tr>

								      <td width="198">U+00A0</td>

								      <td width="362">No-break Space</td>

								      <td width="280">Line break control</td>

								    </tr>

								    <tr>

								      <td width="198">U+00AD</td>

								      <td width="362">Soft Hyphen</td>

								      <td width="280">Line break control</td>

								    </tr>

								    <tr>

								      <td width="198">U+034F</td>

								      <td width="362">Combining Grapheme Joiner</td>

								      <td width="280">Used in sorting</td>

								    </tr>

								    <tr>

								      <td width="198">U+0600</td>

								      <td width="362">Arabic Number Sign</td>

								      <td width="280">Subtending mark</td>

								    </tr>

								    <tr>

								      <td width="198">U+0601</td>

								      <td width="362">Arabic Sign Sanah</td>

								      <td width="280">Subtending mark</td>

								    </tr>

								    <tr>

								      <td width="198">U+0602</td>

								      <td width="362">Arabic Footnote Marker</td>

								      <td width="280">Subtending mark</td>

								    </tr>

								    <tr>

								      <td width="198">U+0603</td>

								      <td width="362">Arabic Sign Safha</td>

								      <td width="280">Subtending mark</td>

								    </tr>

								    <tr>

								      <td width="198">U+06DD</td>

								      <td width="362">Arabic End of Ayah</td>

								      <td width="280">Enclosing mark</td>

								    </tr>

								    <tr>

								      <td width="198">U+070F</td>

								      <td width="362">Syriac Abbreviation Mark (SAM)</td>

								      <td width="280">Supertending mark</td>

								    </tr>

								    <tr>

								      <td width="198">U+0F0C</td>

								      <td width="362">Tibetan Mark Delimiter Tsheg Bstar</td>

								      <td width="280">Non-breaking form of 0F0B</td>

								    </tr>

								    <tr>

								      <td width="198">U+115F..U+1160</td>

								      <td width="362">Hangul Jamo Fillers</td>

								      <td width="280">Filler</td>

								    </tr>

								    <tr>

								      <td width="198">U+180B..U+180E</td>

								      <td width="362">Mongolian Variation Selectors (FVS1..FVS3), Mongolian

								        Vowel Separator</td>

								      <td width="280">Required for Mongolian</td>

								    </tr>

								    <tr>

								      <td width="198">U+200B</td>

								      <td width="362">Zero-width Space</td>

								      <td width="280">Line break control</td>

								    </tr>

								    <tr>

								      <td width="198">U+200C..U+200D</td>

								      <td width="362">Zero-width Join Controls (ZWJ and ZWNJ)</td>

								      <td width="280">Required for Persian and many Indic scripts, among others</td>

								    </tr>

								    <tr>

								      <td width="198">U+200E..U+200F</td>

								      <td width="362">Implicit Directional Marks (LRM and RLM)</td>

								      <td width="280">LRM and RLM are allowed</td>

								    </tr>

								    <tr>

								      <td width="198">U+2011</td>

								      <td width="362">Non-breaking Hyphen</td>

								      <td width="280">Line break control</td>

								    </tr>

								    <tr>

								      <td width="198">U+202F</td>

								      <td width="362">Narrow No-break Space</td>

								      <td width="280">Line break control/Mongolian</td>

								    </tr>

								    <tr>

								      <td width="198">U+2044</td>

								      <td width="362">Fraction Slash</td>

								      <td width="280">Or use markup (MathML)</td>

								    </tr>

								    <tr>

								      <td width="198">U+2060</td>

								      <td width="362">Word Joiner</td>

								      <td width="280">Use for that purpose instead of U+FEFF ZWNBSP</td>

								    </tr>

								    <tr>

								      <td width="198">U+2061..U+2064</td>

								      <td width="362">Invisible Mathematical Operators</td>

								      <td width="280">Mathematical use</td>

								    </tr>

								    <tr>

								      <td width="198">U+2FF0..U+2FFB</td>

								      <td width="362">Ideographic Character Description</td>

								      <td width="280">Graphic characters (not controls)</td>

								    </tr>

								    <tr>

								      <td width="198">U+303E</td>

								      <td width="362">Ideographic Variation Indicator</td>

								      <td width="280">Graphic character (not a control)</td>

								    </tr>

								    <tr>

								      <td width="198">U+FFA0</td>

								      <td width="362">Halfwidth Hangul Filler</td>

								      <td width="280">Filler, not generally required</td>

								    </tr>

								    <tr>

								      <td width="198">U+FE00..U+FE0F</td>

								      <td width="362">Variation Selectors</td>

								      <td width="280">Modify graphic characters</td>

								    </tr>

								    <tr>

								      <td width="198">U+E0100..U+E01EF</td>

								      <td width="362">Variation Selectors</td>

								      <td width="280">Modify graphic characters</td>

								    </tr>

								  </tbody>

								</table>


								<p>The following subsections briefly discuss some of the characters from the

								above list, particularly those that affect more than their immediately

								adjacent neighbors. Please see the Unicode Standard [<a

								href="#Unicode">Unicode</a>] for full details.</p>


								<h3><a name="Subtending">4.1 Subtending Marks</a></h3>


								<p>Subtending marks are needed to represent a common feature in the Arabic

								and Syriac scripts where a mark can be placed below a range of characters,

								for example below a sequence of digits, to indicate a year. The Syriac

								abbreviation mark is placed above a series of characters, making it

								technically a supertending mark, and the <span

								style="font-variant: small-caps;">ARABIC END OF AYAH</span> is an enclosing

								mark. In the character stream, a subtending mark precedes the affected

								characters. The end of the affected range of characters is defined implicitly,

								usually by the first non-alphanumeric character. </p>


								<p align="left">Unlike subtending marks, the scope of combining enclosing

								marks, such as <span

								style="text-transform: uppercase; font-variant: small-caps;">combining

								enclosing circle,</span> is limited to the preceding default grapheme

								cluster. For details on grapheme clusters see Unicode Standard Annex #29:

								"Text Boundaries", [<a href="#UAX29">UAX 29</a>].</p>


								<p align="left">There is currently no markup that can represent the

								scoping and layout functions defined by these characters, so they cannot be

								substituted. It is unresolved to what degree intervening markup affects the

								scope of these marks.</p>


								<h3 align="left"><a name="Fraction">4.2 Fraction Slash</a></h3>


								<p align="left">The fraction slash is used between sequences of decimal

								digits to form fractions. Whether the resulting fraction has a horizontal or

								diagonal fraction line is unspecified. The fallback is to leave the digits

								unchanged and display a regular slash. In order to separate a digit from a

								following fraction, as in 1¾, the use of <span

								style="font-variant: small-caps;">U+2009 THIN SPACE</span> is recommended.</p>


								<p align="left">For better control of fractions the use of [<a

								href="#MathML">MathML</a>] is suggested where appropriate.</p>


								<h3><a name="Variation">4.3 Variation Selectors</a></h3>


								<p>A variation selector is intended to cause a specific variant form (or

								range of variant forms) when applied to a base character. For a variation

								selector to have an effect it must immediately follow its base character.

								Only pre-determined combinations of selected base characters and specific

								variation selectors have a defined effect. All other combinations are

								ill-formed and are to be ignored. The list of standardized combinations is

								documented in the Unicode Character Database, see [<a

								href="#Variants">Variants</a>]. In addition to the 256 generic variation

								selectors, there are 3 Mongolian <i>free variation selectors</i>. They

								function in all other ways like variation selectors, except they only apply

								to base characters from the Mongolian script. Since Mongolian, like Arabic,

								has positional character shapes, the variations are limited to particular

								shaping contexts.</p>
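
								<p>As a rough illustration of the code points involved, the following Python
								sketch classifies variation selectors and pairs each one with the character it
								immediately follows. The ranges (U+FE00..U+FE0F and U+E0100..U+E01EF for the
								256 generic selectors, U+180B..U+180D for the three Mongolian free variation
								selectors) are assumed from the Unicode code charts, and the function names are
								hypothetical. The sketch does not check whether a combination is standardized;
								that requires the list in [<a href="#Variants">Variants</a>].</p>

```python
# Sketch only: ranges assumed from the Unicode code charts; names are illustrative.
GENERIC_VS = (range(0xFE00, 0xFE10), range(0xE0100, 0xE01F0))  # 16 + 240 = 256
MONGOLIAN_FVS = range(0x180B, 0x180E)                          # 3 selectors


def is_variation_selector(ch: str) -> bool:
    """Return True if ch is a generic or Mongolian free variation selector."""
    cp = ord(ch)
    return cp in MONGOLIAN_FVS or any(cp in r for r in GENERIC_VS)


def selector_pairs(text: str) -> list:
    """Return (base, selector) pairs where the selector directly follows a base.

    A selector at the start of the text, or one following another selector,
    has no base character and is skipped.
    """
    pairs = []
    for prev, cur in zip(text, text[1:]):
        if is_variation_selector(cur) and not is_variation_selector(prev):
            pairs.append((prev, cur))
    return pairs
```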


								<h3><a name="Ideographic">4.4 Ideographic Description Characters</a></h3>


								<p>Ideographic Description Characters are included in the Unicode Standard as

								a means to indicate the composition of ideographs from a combination of

								pieces (terms), where each piece or term is either a Unicode character or is

								itself composed. Ordinarily the result would be a human readable description of a

								character, perhaps one for which a font is not available. However, at least

								some vendors are interested in automatic conversion of these sequences into

								single ideographs.</p>


								<h3><a name="Invisible">4.5 Invisible Mathematical Operators</a></h3>


								<p>These characters are needed to convey the intended meaning of a

								mathematical expression to an automated parser whenever two elements are

								simply written next to each other. See Unicode Technical Report #25: "Unicode

								Support for Mathematics" [<a href="#UTR25">UTR25</a>] for more details.</p>


								<h3><a name="LineBreak">4.6 Line Break Controls</a></h3>


								<p>Most of these characters prevent line breaks adjacent to them, but ZWSP

								and SHY provide invisible line break opportunities. The detailed function of

								these characters is described in Unicode Standard Annex #14: "Line Breaking

								Properties" [<a href="#UAX14">UAX14</a>]. While high-end applications may be

								able to deduce line breaking opportunities automatically solely with the help

								of very generic markup or styling properties, the use of these characters

								currently provides the most reliable and straightforward way to control line

								breaking and hyphenation. Note that [<a href="#html4.01">HTML4.01</a>] uses

								U+00A0 NO-BREAK SPACE also as a "hard space" (i.e. a space with a fixed

								width), something that is not part of its character semantics in [<a

								href="#Unicode">Unicode</a>].</p>


								<p>U+2011 NON-BREAKING HYPHEN (NBHY) is used to encode a hyphen that does not

								provide a line break opportunity. In several languages, the sequence &lt;SHY,

								NBHY&gt; may be used to handle special line breaking behavior for explicit

								hyphens, see  [<a href="#UAX14">UAX14</a>].</p>


								<h3><a name="Fillers">4.7 Hangul Fillers</a></h3>


								<p>These should not be needed except for texts that need to have a fixed

								number of jamos per Korean syllable block. See the description of Korean

								Syllable Blocks in [<a href="#Unicode">Unicode</a>].</p>


								<h2><a name="Compatibility">5. Characters with Compatibility Mappings</a></h2>


								<p>The Unicode Standard provides compatibility mappings for a number of

								characters. Compatibility mappings indicate a relationship to another

								character, but the exact nature of the relationship varies. In some cases the

								relationship means "is based on"; in other cases it denotes a property.

								When plain text is marked up, it may make sense to map some of these

								characters to a combination of their compatibility equivalents <em

								style="font-style: normal;">and</em> suitable markup. It is important to

								understand the nature of the distinctions between characters and their

								compatibility equivalents and the context in which these distinctions matter.

								It is never advisable to apply compatibility mappings indiscriminately. This

								section provides guidance on when and how to apply compatibility mappings in

								the case of importing text from non-XML (non-marked-up) sources. The section

								is organized by the "compatibility tag" associated with each compatibility

								mapping.</p>


								<h3><a name="Overview">5.1 Overview</a></h3>


								<p>The following table gives an overview of the various compatibility

								characters, organized by "compatibility tag". The first column, <i>Tag

								value,</i> contains the value of the "compatibility tag" from the Unicode

								Character Database [<a href="#UnicodeData">UnicodeData</a>]. Although these

								tags use "&lt;" and "&gt;", they do not appear as such in markup and should

								not be confused with XML tags. <em>Code range</em> indicates a further

								breakdown by code points. <i>Action</i> summarizes the recommended action to be

								taken whenever markup is first applied to non-XML text. Each entry indicates

								whether the characters can be substituted using the compatibility equivalent

								according to Normalization Form KC of [<a href="#UAX15">UAX 15</a>], can be

								replaced by equivalent markup where available, or should be retained. For

								some cases, instead of or in addition to markup, style information [<a

								href="#CSS">CSS</a>] is needed. <i>Description and usage</i> provides

								additional information. Sections <a href="#List">5.3</a> through <a

								href="#Superscripts">5.6</a> provide additional information for some of these

								sets of compatibility characters including detailed recommended actions.</p>


								<p align="center"><b>Table 5.1 Characters with compatibility mappings</b></p>


								<table border="1" cellpadding="2" cellspacing="0" width="95%">

								  <tbody>

								    <tr>

								      <th align="left" bgcolor="#ccffcc" width="80">Tag value</th>

								      <th align="left" bgcolor="#ccffcc" width="97">Code range</th>

								      <th align="left" bgcolor="#ccffcc" width="83">Action</th>

								      <th align="left" bgcolor="#ccffcc">Description and usage</th>

								    </tr>

								    <tr>

								      <td valign="top" width="80">&lt;circled&gt;</td>

								      <td valign="top" width="97">all</td>

								      <td valign="top" width="83">retain</td>

								      <td valign="top" width="572">Circled letters and digits used for list

								        item markers, and in running text</td>

								    </tr>

								    <tr>

								      <td rowspan="12" valign="top" width="80">&lt;compat&gt;</td>

								      <td valign="top" width="97">2002..200A</td>

								      <td valign="top" width="83">retain</td>

								      <td valign="top" width="572">Fixed width spaces</td>

								    </tr>

								    <tr>

								      <td valign="top" width="97">2100..2101</td>

								      <td valign="top" width="83">retain</td>

								      <td valign="top" width="572">Variant letter forms that are used as

								        symbols</td>

								    </tr>

								    <tr>

								      <td valign="top" width="97">2105..2106</td>

								      <td valign="top" width="83">retain</td>

								      <td valign="top" width="572">Variant letter forms that are used as

								        symbols</td>

								    </tr>

								    <tr>

								      <td valign="top" width="97">2121, 213B</td>

								      <td valign="top" width="83">retain</td>

								      <td valign="top" width="572">For use as single code point in vertical

								        layout</td>

								    </tr>

								    <tr>

								      <td valign="top" width="97">2160..217F</td>

								      <td valign="top" width="83">retain, or use list item marker style, or

								        normalize</td>

								      <td valign="top" width="572">For use as single code point in vertical

								        layout, or as list item marker</td>

								    </tr>

								    <tr>

								      <td valign="top" width="97">2474..249B</td>

								      <td valign="top" width="83">retain, or use list item marker style, or

								        normalize</td>

								      <td valign="top" width="572">Parenthesized or dotted number used as

								        list item marker</td>

								    </tr>

								    <tr>

								      <td valign="top" width="97">249C..24B5</td>

								      <td valign="top" width="83">retain, or use list item marker style, or

								        normalize</td>

								      <td valign="top" width="572">Parenthesized letters used as list item

								        markers</td>

								    </tr>

								    <tr>

								      <td valign="top" width="97">3131..318E</td>

								      <td valign="top" width="83">retain</td>

								      <td valign="top" width="572">Compatibility Hangul Jamo. These do not

								        conjoin</td>

								    </tr>

								    <tr>

								      <td valign="top" width="97">3200..3229</td>

								      <td valign="top" width="83">retain, or use list item marker style, or

								        normalize</td>

								      <td valign="top" width="572">Parenthesized characters used as list item

								        markers</td>

								    </tr>

								    <tr>

								      <td height="26" valign="top" width="97">322A..3243</td>

								      <td height="26" valign="top" width="83">retain</td>

								      <td height="26" valign="top" width="572">Parenthesized characters used

								        as symbols in vertical layout</td>

								    </tr>

								    <tr>

								      <td valign="top" width="97">32C0..32CB</td>

								      <td valign="top" width="83">retain</td>

								      <td valign="top" width="572">String used as single code point in

								        vertical layout</td>

								    </tr>

								    <tr>

								      <td valign="top">all other</td>

								      <td valign="top" width="83">retain</td>

								      <td valign="top" width="572">Maintain, semantic distinctions apply</td>

								    </tr>

								    <tr>

								      <td valign="top" width="80">&lt;final&gt;</td>

								      <td valign="top" width="97">all</td>

								      <td valign="top" width="83">normalize</td>

								      <td valign="top" width="572">Arabic Presentation forms</td>

								    </tr>

								    <tr>

								      <td valign="top" width="80">&lt;font&gt;</td>

								      <td valign="top" width="97">all</td>

								      <td valign="top" width="83">retain</td>

								      <td valign="top" width="572">Variant letter forms that are used as

								        symbols</td>

								    </tr>

								    <tr>

								      <td valign="top" width="80">&lt;fraction&gt;</td>

								      <td valign="top" width="97">all</td>

								      <td valign="top" width="83">normalize</td>

								      <td valign="top" width="572">As long as fraction slash is

								      supported!</td>

								    </tr>

								    <tr>

								      <td valign="top" width="80">&lt;initial&gt;</td>

								      <td valign="top" width="97">all</td>

								      <td valign="top" width="83">normalize</td>

								      <td valign="top" width="572">Arabic Presentation forms</td>

								    </tr>

								    <tr>

								      <td valign="top" width="80">&lt;isolated&gt;</td>

								      <td valign="top" width="97">all</td>

								      <td valign="top" width="83">normalize</td>

								      <td valign="top" width="572">Arabic Presentation forms</td>

								    </tr>

								    <tr>

								      <td valign="top" width="80">&lt;medial&gt;</td>

								      <td valign="top" width="97">all</td>

								      <td valign="top" width="83">normalize</td>

								      <td valign="top" width="572">Arabic Presentation forms</td>

								    </tr>

								    <tr>

								      <td valign="top" width="80">&lt;narrow&gt;</td>

								      <td valign="top" width="97">all</td>

								      <td valign="top" width="83">retain</td>

								      <td valign="top" width="572">Half-width characters</td>

								    </tr>

								    <tr>

								      <td valign="top" width="80">&lt;noBreak&gt;</td>

								      <td valign="top" width="97">all</td>

								      <td valign="top" width="83">retain</td>

								      <td valign="top" width="572">The compatibility mapping merely indicates

								        the equivalent breaking character. The noBreak distinction must be

								        preserved</td>

								    </tr>

								    <tr>

								      <td valign="top" width="80">&lt;small&gt;</td>

								      <td valign="top" width="97">all</td>

								      <td valign="top" width="83">retain</td>

								      <td valign="top" width="572">Precise usage unknown. Maintain, but do

								        not generate</td>

								    </tr>

								    <tr>

								      <td rowspan="4" valign="top" width="80">&lt;square&gt;</td>

								      <td valign="top" width="97">3300..3357</td>

								      <td valign="top" width="83">retain</td>

								      <td valign="top" width="572">Single display cell cluster containing

								        multiple lines of kana for vertical layout</td>

								    </tr>

								    <tr>

								      <td valign="top" width="97">3358..337D</td>

								      <td valign="top" width="83">retain</td>

								      <td valign="top" width="572">For use as single code point in vertical

								        layout</td>

								    </tr>

								    <tr>

								      <td valign="top" width="97">33E0..33FE</td>

								      <td valign="top" width="83">retain</td>

								      <td valign="top" width="572">For use as single code point in vertical

								        layout</td>

								    </tr>

								    <tr>

								      <td valign="top" width="97">all other</td>

								      <td valign="top" width="83">retain</td>

								      <td valign="top" width="572">Variant letter form used as symbol in

								        vertical layout</td>

								    </tr>

								    <tr>

								      <td rowspan="2" valign="top" width="80">&lt;sub&gt;</td>

								      <td valign="top" width="97">2080..208E</td>

								      <td valign="top" width="83">retain, or use markup</td>

								      <td valign="top" width="572">Subscript digits 0-9, as well as minus,

								        plus, equal and parens</td>

								    </tr>

								    <tr>

								      <td valign="top" width="97">all other</td>

								      <td valign="top" width="83">retain</td>

								      <td valign="top" width="572">Subscript characters, usually used as

								        modifier letters in phonetic notation</td>

								    </tr>

								    <tr>

								      <td rowspan="5" valign="top" width="80">&lt;super&gt;</td>

								      <td valign="top" width="97">00B2..00B3</td>

								      <td rowspan="4" valign="top" width="83">retain, or use  markup</td>

								      <td rowspan="4" valign="top" width="572">Superscript digits 0-9, as

								        well as minus, plus, equal and parens</td>

								    </tr>

								    <tr>

								      <td valign="top" width="97">00B9</td>

								    </tr>

								    <tr>

								      <td valign="top" width="97">2070</td>

								    </tr>

								    <tr>

								      <td valign="top" width="97">2074..207E</td>

								    </tr>

								    <tr>

								      <td valign="top" width="97">all other</td>

								      <td valign="top" width="83">retain</td>

								      <td valign="top" width="572">Superscript characters, usually used as

								        modifier letters in phonetic notation</td>

								    </tr>

								    <tr>

								      <td valign="top" width="80">&lt;vertical&gt;</td>

								      <td valign="top" width="97">all</td>

								      <td valign="top" width="83">normalize</td>

								      <td valign="top" width="572">East Asian Presentation forms</td>

								    </tr>

								    <tr>

								      <td valign="top" width="80">&lt;wide&gt;</td>

								      <td valign="top" width="97">all</td>

								      <td valign="top" width="83">retain</td>

								      <td valign="top" width="572">Full-width characters</td>

								    </tr>

								  </tbody>

								</table>


								<blockquote>

								  <p><b>Note: </b>Some symbols used in vertical layout exist as single code

								  points in legacy systems, but can also be composed on the fly by more

								  advanced display engines. There are currently no style properties that

								  could be used to express squared Kana clusters (<i>kumimoji</i>) or

								  horizontal in vertical writing mode (<i>tate-chu-yoko</i>).</p>

								</blockquote>
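
								<p>The default actions in Table 5.1 could be applied mechanically along the
								following lines. This is a minimal Python sketch, not a complete
								implementation: it normalizes only characters whose tag has the action
								"normalize" and retains everything else, including the rows that offer a
								choice. The names <code>NORMALIZE_TAGS</code> and
								<code>apply_table_actions</code> are hypothetical. Normalizing one character
								at a time does not guarantee that the whole result is in Normalization Form
								KC, and the precaution for fractions described in Section 5.4 is not handled
								here.</p>

```python
import unicodedata

# Tags whose action in Table 5.1 is "normalize" (assumption: defaults only).
NORMALIZE_TAGS = {"final", "initial", "isolated", "medial", "fraction", "vertical"}


def compat_tag(ch: str):
    """Return the compatibility tag of ch without angle brackets, or None."""
    fields = unicodedata.decomposition(ch).split()
    if fields and fields[0].startswith("<"):
        return fields[0].strip("<>")
    return None


def apply_table_actions(text: str) -> str:
    """Replace characters tagged for normalization; retain all others."""
    out = []
    for ch in text:
        if compat_tag(ch) in NORMALIZE_TAGS:
            # Substitute the compatibility equivalent of this one character.
            out.append(unicodedata.normalize("NFKC", ch))
        else:
            out.append(ch)
    return "".join(out)
```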


								<h3><a name="Generating">5.2 Generating New Text</a></h3>


								<p>Presentation forms and characters for which adequate representation exists

								as marked up text should never be entered into new data. Many of the

								characters with &lt;font&gt; tag are however suitable for new data, as long

								as they are used in the manner they are intended, that is as symbols, with

								definite semantic differentiation between the different forms. The largest

								set of these characters exists to carry essential semantic distinctions in

								mathematical notation, where any loss of markup during text export would

								compromise the meaning of the text. Most of the characters with &lt;super&gt;

								and &lt;sub&gt; tag have been encoded for use in phonetic or phonemic

								transcriptions, where they act as ordinary letters and the use of style

								markup is therefore deemed inappropriate. However, it is inappropriate to use

								any of these classes of characters to create the appearance of styled text

								runs.</p>


								<p>For example, to write <i>hello,</i> one should use &lt;i&gt;hello&lt;/i&gt;

								and not the sequence of Unicode characters U+210E, U+212F, U+2113, U+2113,

								U+2134. Conversely, to indicate <i>Planck's constant</i> one should use

								U+210E and not &lt;i&gt;h&lt;/i&gt;.</p>
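
								<p>The cost of misusing these characters can be made visible with a short
								Python sketch. It builds the letterlike sequence from the example above and
								applies Normalization Form KC: the result is the plain word, which means
								that indiscriminate normalization erases exactly the distinction that U+210E
								is meant to carry, while un-normalized text of this kind will not match a
								plain search for the word. The variable names are illustrative only.</p>

```python
import unicodedata

# The five code points from the example: U+210E, U+212F, U+2113, U+2113, U+2134.
styled = "\u210e\u212f\u2113\u2113\u2134"

# Character names show that these are symbols, not styled letters.
names = [unicodedata.name(ch) for ch in styled]

# NFKC replaces each symbol by its compatibility equivalent.
folded = unicodedata.normalize("NFKC", styled)

if __name__ == "__main__":
    print(names)
    print(folded)
```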


								<p>When style is applied across entire words, sentences or paragraphs, the

								use of markup is preferred. When style is applied to individual letters,

								especially to letters inside a word, giving them a particular interpretation,

								the use of character codes is preferred. See also <a

								href="#Superscripts">Section 5.6</a>.</p>


								<h3><a name="List">5.3 List Item Marker Characters</a></h3>


								<p><em>Short description</em>: Characters with a &lt;circled&gt; tag or

								characters with &lt;compat&gt; tag and compatibility mapping to a

								parenthesized string.</p>


								<p><em>Reason for inclusion</em>: They are most frequently used for marking

								enumerated list items, but the characters with a &lt;circled&gt; tag often

								occur as dingbats or footnote markers in tables. The same characters are used

								in regular text when citing an item from a corresponding ordered list.</p>


								<p><em>Problems when used in markup</em>: These characters do not cause undue

								interaction with markup.</p>


								<p><em>Problems with other uses</em>: None</p>


								<p><em>Replacement markup</em>: (in text use) these characters are often used

								in running text; sometimes, but not exclusively, in situations where the text

								is to be associated with an item from a nearby numbered list. Replacement

								markup may not be available, and the support for such markup is much more

								limited today than was anticipated when this document was first written.</p>


								<p>(list item style) When generating marked up text, these characters occur

								only internally within the user agent when list item styles are rendered. When

								marking up plain text data they could be converted to suitable list item

								styles, if such use can be properly inferred. The default recommendation is

								to retain the original character.</p>


								<p>(characters with compatibility mappings of the form "(<em>n</em>)" or

								"<em>n</em>." or roman numerals) Unlike circled characters, these could be

								rendered by sequences of regular characters. Using a list item marker style

								would in theory allow the support of longer lists (the Unicode characters are

								limited to the set  (1) to (20) and "1." to "20."). Using regular character

								sequences would also allow the use of fonts that match the text of the

								list.</p>


								<p><em>What to do if detected</em>: No action needs to be taken by browsers.

								When received in an editing context, substitution of a list item marker style

								may be appropriate. However, the same characters are very often used as

								dingbat-like symbols in tables, or may appear in general text, whether or not

								referring to an item from a list. Therefore the user must have the choice of

								whether to replace the character.</p>


								<h3><a name="Fractions">5.4 Fractions</a></h3>


								<p><em>Short description</em>: Single character fractions such as ½ or ¼.</p>


								<p><em>Reason for inclusion</em>: Subsets of these occur in practically all

								legacy character sets.</p>


								<p><em>Problems when used in markup</em>: The character repertoire is limited

								to a few common fractions. When used with more general methods of generating

								fractions such as MathML [<a href="#MathML">MathML</a>] the usual problem of

								dual representation arises.</p>


								<p><em>Problems with other uses</em>: Other than normalization issues, these

								characters present no undue problems in plain text. Where fraction slash is

								supported, these can be expressed by substituting their compatibility

								mappings. </p>


								<p><em>Replacement markup</em>: MathML can represent fractions unambiguously.

								When using fraction slash, care must be taken such that values like 3½ do not

								turn into 31/2 (=15.5).</p>


								<p><em>What to do if detected</em>: No action needs to be taken by browsers

								or editors, except when converting plain text to MathML.</p>
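
								<p>The pitfall mentioned under <em>Replacement markup</em> can be reproduced
								and avoided as in the following Python sketch. It expands single character
								fractions to their compatibility mappings and, when the preceding character
								is a digit, first inserts U+2009 THIN SPACE as recommended in Section 4.2.
								The function names are hypothetical, and the digit test used here
								(<code>str.isdigit</code>) is an assumption that may be broader than a given
								application needs.</p>

```python
import unicodedata

THIN_SPACE = "\u2009"


def is_fraction(ch: str) -> bool:
    """True if ch has a compatibility mapping tagged as a fraction."""
    fields = unicodedata.decomposition(ch).split()
    return bool(fields) and fields[0].strip("<>") == "fraction"


def expand_fractions(text: str) -> str:
    """Expand single character fractions, keeping a preceding digit separate."""
    out = []
    for ch in text:
        if is_fraction(ch):
            # Without a separator, 3 followed by 1/2 would read as 31/2.
            if out and out[-1][-1:].isdigit():
                out.append(THIN_SPACE)
            out.append(unicodedata.normalize("NFKC", ch))
        else:
            out.append(ch)
    return "".join(out)
```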


								<h3><a name="Squared">5.5 Squared or Horizontal</a></h3>


								<p><em>Short description</em>: Symbol characters composed of groups of

								letters (typically kana or Latin), or of digits and a slash, for use in a

								single display cell in vertically laid out text.</p>


								<p><em>Reason for inclusion</em>: Many existing character sets contain these

								as precomposed characters since for simple implementations this is the only

								way to support the common use of providing metric units and other

								abbreviations in a single character cell for vertical text layout. </p>


								<p><em>Problems when used in markup</em>: Proposed markup, including CSS

								styling, would be able to express an unbounded set of these abbreviations,

								obviating the need of cataloguing these in the character encoding standard

								and making them more directly accessible to text based processing, for

								example searching.</p>


								<p><em>Problems with other uses</em>: The repertoire of these legacy

								characters is limited; many more combinations are in actual use than are

								accounted for in character sets. Pre-composed symbols do not make their text

								content available to search engines. They also require re-encoding for text

								laid out horizontally.</p>


								<p><em>Replacement markup</em>: None available.</p>


								<p><em>What to do if detected</em>: No action required. (Subject to change

								pending the outcome of current proposals.)</p>


								<h3><a name="Superscripts">5.6 Superscripts and Subscripts</a></h3>


								<p><em>Short description</em>: Mainly super and subscript digits, but also

								signs, parentheses and a large number of letters.</p>


								<p><em>Reason for inclusion</em>:  Super and subscripted letters and digits

								are quite common in some forms of phonetic or phonemic transcriptions, where

								the use of styles is both awkward and prone to data integrity issues when

								exported to plain text. For super or subscripted letters in phonetic

								transcription in particular, a change from superscript or subscript to

								regular style would alter the meaning. Note that such use in transcription is

								not limited to letters: superscripted small digits are often used to indicate

								tone. When used for these purposes, these characters should be retained and

								markup should <i>not</i> be used. </p>


								<p>A few super and subscript characters, primarily the digits, also occur in

								many legacy character sets, including Latin-1. Their use in pure plain text

								is common for databases, e.g. including metric units for part descriptions

								(viz. cm<sup>2</sup>) or for (usually simplified) formulae as occur in titles

								of scientific publications. </p>


								<p>When used in mathematical context (MathML) it is recommended to

								consistently use style markup for superscripts and subscripts. This is

								because mathematical layout allows not just individual symbols, but entire

								expressions to be superscripted or subscripted in a regular, nested

								manner.</p>


								<p><em>Problems when used in markup</em>: Mixing direct use of these

								characters with the use of style markup provides multiple representations of

								the same text, leading to potentially different treatment by search and

								display engines.</p>


								<p>However, when super and sub-scripts are to reflect semantic distinctions,

								it is easier to work with these meanings encoded in text rather than markup,

								for example, in phonetic or phonemic transcription. Otherwise, they would

								require markup in the middle of words, and  they may also be inadvertently

								changed to normal style text, when exporting to plain text. This applies to

								the majority of super and subscripted characters in Unicode. On the other

								hand, some user agents may support certain superscripted or subscripted

								characters only when used as marked up text, for example because of a lack of

								font support for them.</p>


								<p><em>Problems with other uses</em>: none</p>


								<p><em>Replacement markup</em>: Unless used as letters, &lt;xhtml:sup&gt; and

								&lt;xhtml:sub&gt; or &lt;mathml:msup&gt; and &lt;mathml:msub&gt; may be

								used.</p>


								<p><em>What to do if detected</em>: Both representations (with or without

								style markup) should be equivalent for search purposes. Input methods for

								mathematical texts might enforce the use of styles.  If superscript

								characters are encountered during display of mathematical formulae, it is

								recommended that they be displayed in a manner indistinguishable from that

								achieved by using regular characters with corresponding style markup.</p>
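
								<p>One way to make the two representations equivalent for search is to
								compare folded keys while leaving the stored text untouched. The Python
								sketch below folds only characters with a &lt;super&gt; or &lt;sub&gt; tag,
								so that a key built from cm&#178; matches the text content of
								cm&lt;sup&gt;2&lt;/sup&gt;. The function name <code>search_key</code> is
								hypothetical; the folding loses the semantic distinction and is therefore
								suitable for matching only, not for rewriting phonetic transcriptions.</p>

```python
import unicodedata


def search_key(text: str) -> str:
    """Return a comparison key with superscript and subscript characters folded."""
    out = []
    for ch in text:
        fields = unicodedata.decomposition(ch).split()
        if fields and fields[0].strip("<>") in ("super", "sub"):
            # Use the compatibility equivalent for matching purposes only.
            out.append(unicodedata.normalize("NFKC", ch))
        else:
            out.append(ch)
    return "".join(out)
```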


								<h3><a name="Other">5.7 Other Characters Marked &lt;compat&gt;</a></h3>


								<p><em>Short description</em>: The &lt;compat&gt; label was given to a set of

								compatibility characters whose further classification was not settled at the

								time the standard was created. The largest components are list item marker

								characters.</p>


								<p><em>Reason for inclusion</em>: These characters occur in many legacy

								character sets.</p>


								<p><em>Problems when used in markup</em>: none. There usually is no

								equivalent markup.</p>


								<p><em>Problems with other uses</em>: none</p>


								<p><em>Replacement markup</em>: none.</p>


								<p><em>What to do if detected</em>: No action required.</p>


								<h2><a name="Noncharacters">6.  Noncharacters</a></h2>


								<p>The Unicode Standard defines 66 non-character code points, or

								<i>noncharacters</i>. These are the last two positions on each of the 17

								planes, in other words, all code points that end in ...FFFE or

								...FFFF, as well as the 32 code points from U+FDD0 to U+FDEF. Applications

								are free to use any of these code points internally but should never attempt

								to interchange them. In effect, noncharacters can be thought of as

								application-internal private-use code points.</p>
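
								<p>The definition above translates directly into a test that an application
								could run before interchanging text. The following Python sketch is one
								possible formulation; the function names are hypothetical.</p>

```python
def is_noncharacter(cp: int) -> bool:
    """True for U+FDD0..U+FDEF and for the last two code points of each plane."""
    return cp in range(0xFDD0, 0xFDF0) or cp % 0x10000 in (0xFFFE, 0xFFFF)


def find_noncharacters(text: str) -> list:
    """Return the indexes of noncharacters in text, for example before export."""
    return [i for i, ch in enumerate(text) if is_noncharacter(ord(ch))]


if __name__ == "__main__":
    # 32 code points in the contiguous range plus 2 on each of the 17 planes.
    print(sum(is_noncharacter(cp) for cp in range(0x110000)))
```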


								<h2>7. <a name="White">White Space</a></h2>


								<p>This section presents common issues with white space characters in markup

								languages, mostly based on their difference in function as part of the

								structure of the markup source (syntactic white space) on the one hand and as

								part of the document content on the other hand.</p>


								<p>The set of characters in the Unicode standard that have the property

								"White_Space" (see 'White Space' in the [<a href="#UnicodeData">UCD</a>]) is

								quite large. It includes white space characters with different line breaking

								properties, different ligating properties, and different widths. It is

								appropriate to use these characters as part of markup content for their very

								specific purpose. It is preferable to place them in the markup source so

								that they are surrounded by ordinary characters rather than, for example, by

								line breaks. The set of white space characters defined by typical markup

								language specifications is a subset of the characters that are considered

								white space by [<a href="#Unicode">Unicode</a>].</p>


								<p>Each markup language defines the set of characters that it accepts as part

								of the markup syntax; this is usually a very small set. The XML [<a

								href="#xml10">XML1.0</a>] and [<a href="#xml11">XML1.1</a>] specifications

								define white space as a combination of one or more of the following

								characters: U+0020 SPACE, carriage return (U+000D), line feed (U+000A), or

								tab (U+0009). [<a href="#html4.01">HTML4.01</a>] adds to these the form feed

								character (U+000C), but that character cannot be used in any XHTML

								version.</p>
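
								<p>The difference between syntactic white space and the larger Unicode set
								can be checked with a few lines of Python. In this sketch the set names and
								the function name are hypothetical. Characters such as U+00A0 NO-BREAK SPACE
								or U+2003 EM SPACE have the White_Space property in Unicode but are not XML
								white space, so they are treated as content.</p>

```python
# The four characters that XML 1.0 and XML 1.1 accept as white space.
XML_WHITE_SPACE = {"\u0020", "\u0009", "\u000D", "\u000A"}

# HTML 4.01 additionally accepts form feed.
HTML4_WHITE_SPACE = XML_WHITE_SPACE | {"\u000C"}


def is_xml_white_space(text: str) -> bool:
    """True if text is non-empty and consists only of XML white space."""
    return bool(text) and all(ch in XML_WHITE_SPACE for ch in text)
```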


								<p>In addition, markup languages may use conventions for converting or

								removing some kinds of white space. XML processors replace some combinations

								of end-of-line characters by a single line feed character. [<a

								href="#xml10">XML1.0</a>] normalizes any two character sequences of (U+000D

								U+000A) or any U+000D not followed by U+000A to a single U+000A. [<a

								href="#xml11">XML1.1</a>] also normalizes NEL (U+0085) and U+2028 LINE

								SEPARATOR, but U+2029 PARAGRAPH SEPARATOR is not treated that way. Additional

								processing of white space before it is handed to an application also occurs

								for attribute values: line breaks are replaced by spaces and, for attributes

								whose type is not CDATA, leading and trailing spaces are removed and

								sequences of spaces are replaced by a single space.</p>
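<p>The end-of-line and attribute-value rules above can be approximated as follows. This is a rough sketch under stated assumptions, not a conforming XML processor: the function names are hypothetical, entity and character references are ignored, the XML 1.1 pattern also covers U+000D followed by U+0085 as that specification does, and the trimming and collapsing step is applied only to attributes whose type is not CDATA.</p>

```python
import re

# XML 1.0: CR LF, or a CR not followed by LF, becomes a single LF.
_XML10_EOL = re.compile("\r\n?")

# XML 1.1: also CR NEL, a lone NEL (U+0085) and LINE SEPARATOR (U+2028).
# PARAGRAPH SEPARATOR (U+2029) is deliberately left alone.
_XML11_EOL = re.compile("\r[\n\u0085]?|\u0085|\u2028")


def normalize_eol(text, xml11=False):
    """Replace end-of-line sequences by a single LINE FEED."""
    pattern = _XML11_EOL if xml11 else _XML10_EOL
    return pattern.sub("\n", text)


def normalize_attribute(value, cdata=True):
    """Sketch of white space handling for attribute values.

    Entity and character references are not handled here.
    """
    value = normalize_eol(value)
    # Each remaining white space character becomes a single SPACE.
    value = re.sub("[\t\n\r]", " ", value)
    if not cdata:
        # Non-CDATA types: trim, then collapse runs of spaces.
        value = re.sub(" +", " ", value.strip(" "))
    return value
```

<p>For example, normalize_eol applied to a string that uses U+000D U+000A line ends yields the same text with single U+000A characters, while a U+2028 is changed only when xml11 is true.</p>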


								<p>In XML, white space is purely syntactic inside tags (for example, to

								separate the element name from attributes) and between elements in element

								content models (which are typical for data-oriented applications). White

								space in element content models is used to lay out the markup source, using

								line breaks and indentation, to improve readability. The same use of white

								space is possible in many cases in mixed content (typical for text-oriented

								applications).</p>


								<p>Because XML is used for a very wide range of applications, after the

								processing steps mentioned above it passes all white space to the

								application. Some XML applications such as [<a href="#XHTML">XHTML</a>] may

								have their own white space processing rules. Also, applications and

								software transforming XML (e.g. [<a href="#XSLT">XSLT</a>]) have specific

								conventions for handling white space, and specific ways to control this

								behavior. To use white space characters appropriately, readers are

								advised to examine all involved

								standards and software.</p>


								<p>If the characters U+2028 and U+2029 appear in text, they may be treated as

								zero-width characters without semantic meaning (see Section 3.2).</p>


								<h3 id="converting-nl-to-ws">7.1 Converting Newline Functions to White

								Space</h3>


								<p>White space that is not purely syntactic, including control codes that

								define a newline function (see <i>Section 5.8, Newline Guidelines,</i> in [<a

								href="#Unicode">Unicode</a>]), can be handled in three main ways.</p>

								<ol>

								  <li>For data-oriented applications, the textual content of elements is

								    treated according to the needs of the data type in question. In many

								    cases, processing by the application includes aspects similar to those of

								    the processing of attribute values by the XML parser itself. For some

								    types of data, in particular small data items, some applications may also

								    simply prohibit the use of white space.</li>

								  <li>For running text in text-oriented applications, reflowing is used, i.e.

								    the line breaks in the markup source are removed and the text is reflowed

								    into lines whose length is determined by the output medium and styling

								    properties. In the context of Unicode, this reflowing process requires

								    care; it is described in more detail below.</li>

								  <li>For preformatted text, such as program source code, line breaks must be

								    preserved. Text-oriented applications usually contain special markup for

								    preformatted text, e.g. &lt;xhtml:pre&gt;. XML itself defines an

								    xml:space attribute that applications may use for a similar purpose.</li>

								</ol>


								<p>When reflowing, line breaks and adjacent white space can be treated as

								space, removed, collapsed with adjacent control characters of the same type,

								or treated as zero-width space. Which choice is appropriate depends on the

								script of the surrounding text. The assumption is that line breaks and

								adjacent white space (in particular following white space, used for

								indentation) were added to make the markup source more readable, in particular

								to make each line fit on a line of a plain text editor. For scripts that use

								spaces, line breaks will have been inserted where there originally was a

								space; treating them as spaces therefore preserves the intended separation

								between words. For scripts that do not use spaces, such as ideographic

								scripts or certain South East Asian scripts (for example, Thai), line feeds

								should be removed, or replaced by U+200B ZERO WIDTH SPACE. The choice of treatment

								can depend on the script value of the characters preceding and following the

								line feed character, assuming these characters belong to the same run of

								text.</p>
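<p>A script-sensitive reflow step along these lines might look like the sketch below. It rests on loud assumptions: the code point ranges are a crude, hypothetical stand-in for the Unicode Script property, only U+000A is treated as a line break, and the names NO_SPACE_RANGES, uses_no_spaces and reflow are invented for this illustration.</p>

```python
import re

# Illustrative ranges only; a real implementation would consult the
# Unicode Script property rather than hard-coded blocks.
NO_SPACE_RANGES = (
    (0x0E00, 0x0E7F),  # Thai
    (0x3040, 0x30FF),  # Hiragana and Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
)

# A line feed together with the spaces and tabs adjacent to it.
LINE_BREAK = re.compile("[ \t]*\n[ \t]*")


def uses_no_spaces(ch):
    """Return True if ch falls in one of the illustrative ranges."""
    cp = ord(ch)
    return any(cp in range(lo, hi + 1) for lo, hi in NO_SPACE_RANGES)


def reflow(text, zero_width=False):
    """Replace each line break according to its neighbouring characters."""

    def replace(match):
        # Slicing yields "" at the start or end of the text.
        before = text[match.start() - 1:match.start()]
        after = text[match.end():match.end() + 1]
        if before and after and uses_no_spaces(before) and uses_no_spaces(after):
            # Remove the break, or keep a line-break opportunity.
            return "\u200b" if zero_width else ""
        # Scripts that use spaces: the break stood for a word space.
        return " "

    return LINE_BREAK.sub(replace, text)
```

<p>With this sketch, a break between two Latin words becomes a single space, while a break between two Han or Thai characters is removed or, when zero_width is true, replaced by U+200B.</p>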


								<blockquote>

								  <p><b>Note:</b> The Unicode Standard [<a href="#Unicode">Unicode</a>]

								  specifies that the zero width space is considered a valid line-break point

								  and that if two characters with a zero width space in between are placed on

								  the same line they are placed with no space between them; and that if they

								  are placed on two lines no additional glyph area is created at the

								  line-break.</p>

								</blockquote>


								<p>The details of reflowing are the responsibility of the various markup

								applications (e.g. [<a href="#XHTML">XHTML</a>]). However, there is a

								tendency to move this functionality from markup applications to styling, so

								that it can be shared across applications.</p>


								<p>Authors should be aware that the script-specific treatment of line

								breaks described above for reflowing text is not yet available in all

								implementations (e.g. browsers). For scripts that do not use white space to

								separate words, it may therefore still be advisable not to split long

								lines.</p>


								<p>Editing tools should try to support the user in the appropriate use of

								white space. Some white space characters cannot easily be entered via a

								keyboard, but some others, e.g. U+3000 Ideographic Space, can. Editing tools

								should try to make sure that only line breaks and white space characters that are

								accepted as syntactic white space by the relevant markup language are used to

								improve markup source readability.</p>


								<p>While the styling possibilities provided by CSS and its implementations

								have not reached the level of professional typesetting systems, they offer a

								wide range of ways to control layout and spacing of text. A very simple

								example is text centering, which would have been done by inserting an

								appropriate number of spaces on each line in pure plain text.</p>


								<h2><a name="Versioning">8. Versioning</a></h2>


								<p>This report will be updated by the Unicode Technical Committee in

								cooperation with the W3C Internationalization Activity whenever the tables of

								characters in this document need to be updated as a result of the addition of

								characters to the Unicode Standard, as a result of a revised determination of

								the suitability of a given character for use with markup, or when additional

								background information or recommendations become available.</p>


								<p>Each report carries a revision number, which may be used to refer to a

								specific version of the report. Older versions of the report will remain

								available. Each version of this report specifies the underlying version of

								the Unicode Standard.</p>


								<p>For more information on the Unicode Standard and its versions, see:</p>

								<ul class="unicode">

								  <li><a href="http://www.unicode.org/unicode/standard/versions/">Versions of

								    the Unicode Standard</a> [<a

								  href="#UnicodeVersions">UnicodeVersions</a>]</li>

								  <li><a href="http://www.unicode.org/ucd/">About the Unicode Character

								    Database</a> [<a href="#UCD">UCD</a>]</li>

								  <li><a href="http://www.unicode.org/Public/UNIDATA/UCD.html">Unicode

								    Character Database</a> [<a href="#UnicodeData">UnicodeData</a>]</li>

								</ul>


								<h2><a name="Conformance">9. Conformance</a></h2>


								<p>In the context of the Unicode Standard, the material in this technical

								report is <em>informative</em>. However, other documents, particularly markup

								language specifications, may specify conformance including normative

								references to this document. Such references may have to be updated as a

								result of future updates to this report as discussed in Section 8<i>, <a

								href="#Versioning">Versioning</a>.</i></p>


								<h2><a name="References">10. References</a></h2>

								<dl>

								  <dt><a name="Charmod">[Charmod]</a></dt>


								    <dd>Martin J. Dürst, François Yergeau, Richard Ishida, Misha Wolf, Tex

								      Texin, Eds., <cite>Character Model for the World Wide Web 1.0:

								      Fundamentals</cite>, W3C Recommendation, 15-February-2005, &lt;<a

								      href="http://www.w3.org/TR/2005/REC-charmod-20050215/">http://www.w3.org/TR/2005/REC-charmod-20050215/</a>&gt;.</dd>

								  <dt>[<a name="Charmodnorm">Charmodnorm</a>]</dt>

								    <dd>François Yergeau, Martin J. Dürst, Richard Ishida, Addison Phillips,

								      Misha Wolf, and Tex Texin, Eds., <i>Character Model for the World Wide

								      Web 1.0: Normalization,</i> W3C Working Draft, 27-October-2005, &lt;<a

								      href="http://www.w3.org/TR/2005/WD-charmod-norm-20051027/">http://www.w3.org/TR/2005/WD-charmod-norm-20051027/</a>&gt;.</dd>

								  <dt><a name="CharReq">[CharReq]</a></dt>

								    <dd>Martin J. Dürst, <cite>Requirements for String Identity and Character

								      Indexing Definitions for the WWW</cite>, W3C Working Draft,

								      10-July-1998, &lt;<a

								      href="http://www.w3.org/TR/WD-charreq">http://www.w3.org/TR/WD-charreq</a>&gt;.</dd>

								  <dt>[<a name="CSS">CSS</a>]</dt>

								    <dd>For information on cascading style sheet specifications, see &lt;<a

								      href="http://www.w3.org/Style/CSS/">http://www.w3.org/Style/CSS/</a>&gt;.</dd>

								  <dt>[<a name="Feedback">Feedback</a>]</dt>

								    <dd>Reporting Errors and Requesting Information Online to the Unicode

								      Consortium, &lt;<a

								      href="http://www.unicode.org/reporting.html">http://www.unicode.org/reporting.html</a>&gt;.</dd>

								  <dt><a name="html4.01">[HTML4.01]</a></dt>

								    <dd>Dave Raggett, Arnaud Le Hors, Ian Jacobs, Eds., <cite>HTML 4.01

								      Specification</cite>, W3C Recommendation, 18-Dec-1997 (revised on

								      24-Dec-1999), &lt;<a

								      href="http://www.w3.org/TR/1999/REC-html401-19991224/">http://www.w3.org/TR/1999/REC-html401-19991224/</a>&gt;.</dd>

								  <dt><a name="HTML4.0-8.2">[HTML 4.0 - 8.2]</a></dt>

								    <dd>Section 8.2 of [HTML4.01] <i>Specifying the direction of text and

								      tables: the dir attribute</i> &lt;<a

								      href="http://www.w3.org/TR/1999/REC-html401-19991224/struct/dirlang.html#h-8.2">http://www.w3.org/TR/1999/REC-html401-19991224/struct/dirlang.html#h-8.2</a>&gt;.</dd>

								  <dt><a name="MathML">[MathML]</a></dt>

								    <dd>David Carlisle, Patrick Ion, Robert Miner, Nico Poppelier, Eds.,

								      <i>Mathematical Markup Language (MathML) Version 2.0

								      (Second Edition)</i>, W3C Recommendation, 21-Oct-2003, &lt;<a

								      href="http://www.w3.org/TR/2003/REC-MathML2-20031021/">http://www.w3.org/TR/2003/REC-MathML2-20031021/</a>&gt;.</dd>

								  <dt><a name="Namespace">[Namespace]</a></dt>

								    <dd>Tim Bray, Dave Hollander, Andrew Layman, Eds., <i>Namespaces in XML

								      (Second Edition)</i>, W3C Recommendation, 16-Aug-2006, &lt;<a

								      href="http://www.w3.org/TR/2006/REC-xml-names-20060816/">http://www.w3.org/TR/2006/REC-xml-names-20060816/</a>&gt;.</dd>

								  <dt><a name="Ruby">[Ruby]</a></dt>

								    <dd>Marcin Sawicki, Michel Suignard, Masayasu Ishikawa, Martin Dürst, Tex

								      Texin, Eds., <i>Ruby Annotation</i>, W3C Recommendation, 31-May-2001,

								      &lt;<a

								      href="http://www.w3.org/TR/2001/REC-ruby-20010531/">http://www.w3.org/TR/2001/REC-ruby-20010531/</a>&gt;.</dd>

								  <dt><a name="UTR9">[UAX 9]</a></dt>

								    <dd>Mark Davis, <cite>Unicode Standard Annex #9, The Bidirectional

								      Algorithm</cite>, &lt;<a

								      href="http://www.unicode.org/reports/tr9/">http://www.unicode.org/reports/tr9/</a>&gt;.</dd>

								  <dt>[<a name="UAX14">UAX14</a>]</dt>

								    <dd>Asmus Freytag, <i>Unicode Standard Annex #14,</i> <i>Line Breaking

								      Properties</i>, &lt;<a

								      href="http://www.unicode.org/reports/tr14/">http://www.unicode.org/reports/tr14/</a>&gt;.</dd>

								  <dt><a name="UTR15">[UAX 15]</a><a name="UAX15"></a></dt>

								    <dd>Mark Davis, Martin Dürst, <cite>Unicode Standard Annex #15, Unicode

								      Normalization Forms</cite>, &lt;<a

								      href="http://www.unicode.org/reports/tr15/">http://www.unicode.org/reports/tr15/</a>&gt;.</dd>

								  <dt>[<a name="UAX29">UAX 29</a>]</dt>

								    <dd>Mark Davis, <i>Unicode Standard Annex #29</i>, <i>Text Boundaries</i>,

								      &lt;<a

								      href="http://www.unicode.org/reports/tr29/">http://www.unicode.org/reports/tr29/</a>&gt;.</dd>

								  <dt>[<a name="UCD">UCD</a>]</dt>

								    <dd><cite>About the Unicode Character Database</cite>, &lt;<a

								      href="http://www.unicode.org/ucd/">http://www.unicode.org/ucd/</a>&gt;.</dd>

								  <dt><a name="Unicode">[Unicode]</a></dt>

								    <dd>The Unicode Consortium. <i><a

								      href="http://www.unicode.org/versions/Unicode5.0.0/">The Unicode

								      Standard, Version 5.0</a></i> (Boston, MA, Addison-Wesley, 2007. ISBN

								      0-321-48091-0). </dd>

								  <dt><a name="Unicode32">[Unicode32]</a></dt>

								    <dd><cite>Unicode Standard Annex #28 <a

								      href="http://www.unicode.org/reports/tr28/">Unicode 3.2</a></cite>, The

								      Unicode Consortium, 2002.</dd>

								  <dt><a name="Unicode40">[Unicode40]</a></dt>

								    <dd><cite><a

								      href="http://www.unicode.org/unicode/standard/standard.html">The

								      Unicode Standard</a>, <a

								      href="http://www.unicode.org/unicode/standard/versions/Unicode3.0.html">Version

								      4.0</a></cite>, <i>The Unicode Standard, Version 4.0, </i>(Reading,

								      Massachusetts: Addison-Wesley Developers Press, 2003, ISBN

								      0-321-18578-1) or online as &lt;<a

								      href="http://www.unicode.org/versions/Unicode4.0.0/">http://www.unicode.org/versions/Unicode4.0.0/</a>&gt;.</dd>

								  <dt>[<a name="Unicode50">Unicode50</a>]</dt>

								    <dd>The Unicode Consortium. <i><a

								      href="http://www.unicode.org/versions/Unicode5.0.0/">The Unicode

								      Standard, Version 5.0</a></i> (Boston, MA, Addison-Wesley, 2007. ISBN

								      0-321-48091-0) or online as &lt;<a

								      href="http://www.unicode.org/versions/Unicode5.0.0/">http://www.unicode.org/versions/Unicode5.0.0/</a>&gt;</dd>

								  <dt><a name="UnicodeData">[UnicodeData]</a></dt>

								    <dd><cite>Unicode Character Database</cite>, &lt;<a

								      href="http://www.unicode.org/Public/UNIDATA/UCD.html">http://www.unicode.org/Public/UNIDATA/UCD.html</a>&gt;.</dd>

								  <dt><a name="UnicodeVersions">[UnicodeVersions]</a></dt>

								    <dd><cite>Versions of the Unicode Standard</cite>, &lt;<a

								      href="http://www.unicode.org/unicode/standard/versions/">http://www.unicode.org/unicode/standard/versions/</a>&gt;.</dd>

								  <dt>[<a name="UTR25">UTR25</a>]</dt>

								    <dd>Asmus Freytag, Barbara Beeton, Murray Sargent, <i>Unicode Technical

								      Report #25, Unicode Support for Mathematics, &lt;<a

								      href="http://www.unicode.org/reports/tr25/">http://www.unicode.org/reports/tr25/</a>&gt;</i></dd>

								  <dt>[<a name="Variants">Variants</a>]</dt>

								    <dd>Standardized Variants &lt;<a

								      href="http://www.unicode.org/Public/UNIDATA/StandardizedVariants.html">http://www.unicode.org/Public/UNIDATA/StandardizedVariants.html</a>&gt;.</dd>

								  <dt><a name="XHTML">[XHTML]</a></dt>

								    <dd>Steven Pemberton, et al., Eds.,

								      <cite>XHTML</cite><i><cite>&trade;</cite></i><cite>1.0: The Extensible

								      HyperText Markup Language - A Reformulation of HTML 4.0 in XML

								      1.0</cite>, W3C Recommendation, 01-Aug-2002, &lt;<a

								      href="http://www.w3.org/TR/2002/REC-xhtml1-20020801/">http://www.w3.org/TR/2002/REC-xhtml1-20020801/</a>&gt;.</dd>

								  <dt><a name="xml10">[XML 1.0]</a></dt>

								    <dd>Tim Bray, Jean Paoli, Eve Maler, C. M. Sperberg-McQueen, François

								      Yergeau, Eds., <i>Extensible Markup Language (XML) 1.0 (Fourth

								      Edition)</i>, W3C Recommendation, 16-August-2006, &lt;<a

								      href="http://www.w3.org/TR/2006/REC-xml-20060816/">http://www.w3.org/TR/2006/REC-xml-20060816/</a>&gt;.</dd>

								  <dt>[<a name="XSLT">XSLT</a>]</dt>

								    <dd>Michael Kay, Ed., <i>XSL Transformations (XSLT) Version 2.0</i>, W3C

								      Recommendation, 23-January-2007, &lt;<a

								      href="http://www.w3.org/TR/2007/REC-xslt20-20070123/">http://www.w3.org/TR/2007/REC-xslt20-20070123/</a>&gt;</dd>

								  <dt><a name="xml11">[XML 1.1]</a></dt>

								    <dd>Jean Paoli, Eve Maler, Tim Bray, C. M. Sperberg-McQueen, François

								      Yergeau, John Cowan, Eds., <i>Extensible Markup Language (XML) 1.1

								      (Second Edition)</i>, W3C Recommendation 16-August-2006, &lt;<a

								      href="http://www.w3.org/TR/2006/REC-xml11-20060816/">http://www.w3.org/TR/2006/REC-xml11-20060816/</a>&gt;.

								    </dd>

								  <dt>[<a name="XMLSchema">XML Schema</a>]</dt>

								    <dd>Henry S. Thompson, David Beech, Murray Maloney, Noah Mendelsohn,

								      Eds., <i>XML Schema Part 1: Structures Second Edition</i>, W3C

								      Recommendation 28-October-2004, &lt;<a

								      href="http://www.w3.org/TR/2004/REC-xmlschema-1-20041028/">http://www.w3.org/TR/2004/REC-xmlschema-1-20041028/</a>&gt;.</dd>

								</dl>


								<h2><a name="Acknowledgements">11. Acknowledgements</a></h2>


								<p>Mark Davis and Hideki Hiura contributed to the early drafts. Jukka Korpela

								and Felix Sasaki provided input to the current document.</p>


								<h2><a name="ChangeHistory">12. Change History (last changes first)</a></h2>


								<p>Changes from <a class="unicode"

								href="http://www.unicode.org/reports/tr20/tr20-7.html">http://www.unicode.org/reports/tr20/tr20-7.html</a>

								: Added entries for new characters in Unicode 5.0. Updated references to use

								new chapter/section numbers in Unicode 5.0. Updated the discussion of

								superscript and subscript characters, accounting for the differences between

								their use in phonetic or phonemic transcription and mathematics. Added

								Section 3.10 and 4.5, 4.6 and 4.7. Added a Section 7 on handling white space.

								Updated references to W3C publications (AF). More work on white space

								section; moved everything about BOM to one place (MJD)</p>


								<p>Changes from <a class="unicode"

								href="http://www.unicode.org/reports/tr20/tr20-6.html">http://www.unicode.org/reports/tr20/tr20-6.html</a>

								: Added entries for new characters in Unicode 4.0. Separated out, and

								extended, the discussion of format characters suitable for markup. This

								resulted in a new section 2.6, moving section 3.2 to 4, and renumbering, as

								well as new sections 4.1, 4.2, 4.3, 4.4. Added a discussion on noncharacters

								in a new section 6. Updated reference from Unicode 3.1 and 3.2 to Unicode

								4.0. Improved the layout and description of what is now table 5.1. Changed the

								recommended action in 5.6 to none. Updated the Unicode status section.

								Changed http://www.unicode.org/unicode/reports/ to <a

								href="http://www.unicode.org/reports/">http://www.unicode.org/reports</a>

								throughout to reflect the preferred style of URL (older style URLs continue

								to be valid). Updated references to W3C publications. (AF/MJD)</p>


								<p>Changes from <a class="unicode"

								href="http://www.unicode.org/reports/tr20/tr20-5.html">http://www.unicode.org/reports/tr20/tr20-5.html</a>

								: Updated reference from Unicode 3.0 to 3.1 and 3.2 where appropriate. Added

								sections 3.6 and 3.9. Minor wording fixes in sections 2.3, 3.1, 3.2, 3.6,

								3.10, 4.3, 4.5 and 5. (AF/MJD)</p>


								<p>Changes from <a class="unicode"

								href="http://www.unicode.org/reports/tr20/tr20-4.html">http://www.unicode.org/reports/tr20/tr20-4.html</a>

								: Added a note to the introduction to limit the scope. Reorganized section 3

								and clarified the language. Renamed some sections and tables. Updated the

								document to prepare for publication as Unicode Technical Report and W3C Note

								(AF/MJD). Minor editorial changes to the text, added section 4.7, fixed some

								dates, plus a few typos. (AF)</p>


								<p>Changes from <a class="unicode"

								href="http://www.unicode.org/reports/tr20/tr20-3.html">http://www.unicode.org/reports/tr20/tr20-3.html</a>

								: Minor editorial changes to the introduction, fixed some references, links,

								and dates, plus a few typos. (AF/MJD)</p>


								<p>Changes from <a class="unicode"

								href="http://www.unicode.org/reports/tr20/tr20-2.html">http://www.unicode.org/reports/tr20/tr20-2.html</a>

								: Added sections 2.1-2.6 (MJD), sections 3.1-3.5, and 3.8, as well as

								sections 4.4-4.6 and 8 (AF). Edited text for publication as DRAFT Unicode

								Technical Report. (AF)</p>


								<p>Changes from <a class="unicode"

								href="http://www.unicode.org/reports/tr20/tr20-1.html">http://www.unicode.org/reports/tr20/tr20-1.html</a>

								: Completed references, linked TOC. Various wording changes. Added W3C WD

								stylesheet, logo, copyright, status of this document. Streamlined authors'

								section. (MJD) Added material on compatibility characters. (AF)</p>


								<p>Changes from the initial draft: Fixed the header. Fixed the numbering.

								Fixed the title. Put references to final version of data files based on

								naming conventions. Minor wording changes. Added proposed language on

								annotation characters to match example on FFFC. Posted for internal review by

								UTC and W3C. (AF)</p>


								<h2><a name="Copyright">13. Copyright</a></h2>


								<p>Copyright © 1999-2007 Unicode<sup>®</sup>, Inc. and <a

								href="http://www.w3.org/">W3C</a><sup>®</sup> (<a

								href="http://www.csail.mit.edu/index.php"><acronym

								title="Massachusetts Institute of Technology">MIT</acronym></a>, <a

								href="http://www.ercim.org/"><acronym

								title="European Research Consortium for Informatics and Mathematics">ERCIM</acronym></a>,

								<a href="http://www.keio.ac.jp/">Keio</a>), All Rights Reserved.</p>


								<p>This document is available under the <a

								href="http://www.w3.org/Consortium/Legal/copyright-documents-19990405">W3C

								Document License</a> or the <a

								href="http://www.unicode.org/unicode/copyright.html">Unicode License</a>.

								Documents available from the W3C have additional <a

								href="http://www.w3.org/Consortium/Legal/ipr-notice-20000612#Legal_Disclaimer">warranties,

								liability</a>, and <a

								href="http://www.w3.org/Consortium/Legal/ipr-notice-20000612#W3C_Trademarks">trademark</a>

								policies associated with them. The <a

								href="http://www.unicode.org/unicode/copyright.html">Unicode License</a>

								specifies warranty/liability and trademark terms including:</p>


								<blockquote>

								  <p class="unicode">The Unicode Consortium makes no expressed or implied

								  warranty of any kind, and assumes no liability for errors or omissions. No

								  liability is assumed for incidental and consequential damages in connection

								  with or arising out of the use of the information or programs contained or

								  accompanying this technical report.</p>


								  <p class="unicode">Unicode and the Unicode logo are trademarks of Unicode,

								  Inc., and are registered in some jurisdictions.</p>

								</blockquote>

								</body>

								</html>