<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
|
|
<html lang="en">
|
|
<head>
|
|
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
|
|
<title>XHTML and RDF</title>
|
|
<style type="text/css">
|
|
code { font-family: monospace; }
|
|
div.constraint,
|
|
div.issue,
|
|
div.note,
|
|
div.notice { margin-left: 2em; }
|
|
dt.label { display: run-in; }
|
|
li p { margin-top: 0.3em;
|
|
margin-bottom: 0.3em; }
|
|
div.exampleInner pre { margin-left: 1em;
|
|
margin-top: 0em; margin-bottom: 0em}
|
|
div.exampleOuter {border: 4px double gray;
|
|
margin: 0em; padding: 0em}
|
|
div.exampleInner { background-color: #d5dee3;
|
|
border-top-width: 4px;
|
|
border-top-style: double;
|
|
border-top-color: #d3d3d3;
|
|
border-bottom-width: 4px;
|
|
border-bottom-style: double;
|
|
border-bottom-color: #d3d3d3;
|
|
padding: 4px; margin: 0em }
|
|
div.exampleWrapper { margin: 4px }
|
|
div.exampleHeader { font-weight: bold;
|
|
margin: 4px}
|
|
.xmlverb-default { color: #333333; background-color: #ffffff; font-family: monospace }
|
|
.xmlverb-element-name { color: #990000 }
|
|
.xmlverb-element-nsprefix { color: #666600 }
|
|
.xmlverb-attr-name { color: #660000 }
|
|
.xmlverb-attr-content { color: #000099; font-weight: bold }
|
|
.xmlverb-ns-name { color: #666600 }
|
|
.xmlverb-ns-uri { color: #330099 }
|
|
.xmlverb-text { color: #000000; font-weight: bold }
|
|
.xmlverb-comment { color: #006600; font-style: italic }
|
|
.xmlverb-pi-name { color: #006600; font-style: italic }
|
|
.xmlverb-pi-content { color: #006666; font-style: italic }</style>
|
|
<link rel="stylesheet" type="text/css"
|
|
href="http://www.w3.org/StyleSheets/TR/W3C-NOTE.css">
|
|
</head>
|
|
|
|
<body lang="en">
|
|
|
|
<div class="head">
|
|
<p><a href="http://www.w3.org/"><img src="http://www.w3.org/Icons/w3c_home"
|
|
alt="W3C" height="48" width="72"></a></p>
|
|
|
|
<h1><a name="title" id="title"></a>XHTML and RDF</h1>
|
|
|
|
<h2><a name="w3c-doctype" id="w3c-doctype"></a>Version 14 February 2004</h2>
|
|
<dl>
|
|
<dt>This version: </dt>
|
|
<dd>http://www.w3.org/MarkUp/2004/02/xhtml-rdf.html</dd>
|
|
<dt>Editor:</dt>
|
|
<dd>Mark Birbeck, x-port.net Ltd. <a
|
|
href="mailto:Mark.Birbeck@x-port.net"><Mark.Birbeck@x-port.net></a></dd>
|
|
</dl>
|
|
|
|
<p class="copyright"><a
|
|
href="http://www.w3.org/Consortium/Legal/ipr-notice#Copyright">Copyright</a> © 2004 <a
|
|
href="http://www.w3.org/"><abbr
|
|
title="World Wide Web Consortium">W3C</abbr></a><sup>®</sup> (<a
|
|
href="http://www.lcs.mit.edu/"><abbr
|
|
title="Massachusetts Institute of Technology">MIT</abbr></a>, <a
|
|
href="http://www.ercim.org/"><abbr
|
|
title="European Research Consortium for Informatics and Mathematics">ERCIM</abbr></a>,
|
|
<a href="http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. W3C <a
|
|
href="http://www.w3.org/Consortium/Legal/ipr-notice#Legal_Disclaimer">liability</a>,
|
|
<a
|
|
href="http://www.w3.org/Consortium/Legal/ipr-notice#W3C_Trademarks">trademark</a>,
|
|
<a href="http://www.w3.org/Consortium/Legal/copyright-documents">document
|
|
use</a>, and <a
|
|
href="http://www.w3.org/Consortium/Legal/copyright-software">software
|
|
licensing</a> rules apply.</p>
|
|
</div>
|
|
<hr>
|
|
|
|
<div>
|
|
<h2><a name="abstract" id="abstract"></a>Abstract</h2>
|
|
|
|
<p>The aim of this document is to discuss the relationship between XHTML and
|
|
metadata.</p>
|
|
</div>
|
|
|
|
<div>
|
|
<h2><a name="status" id="status"></a>Status of this Document</h2>
|
|
|
|
<p>This is an internal working draft.</p>
|
|
|
|
<p><em>Last Modified: $Date: 2004/02/27 11:55:44 $</em></p>
|
|
</div>
|
|
|
|
<div class="toc">
|
|
<h2><a name="contents" id="contents"></a>Table of Contents</h2>
|
|
|
|
<p class="toc">1 <a href="#div151383088">Motivation</a><br>
|
|
2 <a href="#div151384000">Types of Metadata in XHTML Documents</a><br>
|
|
3 <a href="#div154317128">Document Metadata</a><br>
|
|
3.1 <a href="#div254317416">Top-level Metadata with
|
|
meta and QNames</a><br>
|
|
3.2 <a href="#div254358472">Statements About
|
|
Top-level Metadata</a><br>
|
|
3.3 <a href="#div254363072">Statements About Other
|
|
Resources</a><br>
|
|
3.4 <a href="#div254370032">String Literals and
|
|
URIs</a><br>
|
|
4 <a href="#div154379976">Markup Metadata</a><br>
|
|
4.1 <a href="#div254386928">What Are We Trying To
|
|
Represent?</a><br>
|
|
4.2 <a href="#div254333552">Identifying
|
|
Resources</a><br>
|
|
4.3 <a href="#div254343520">Representing Type</a><br>
|
|
4.4 <a href="#div254348288">Representing Value</a><br>
|
|
5 <a href="#div154351016">Bibliography</a><br>
|
|
</p>
|
|
</div>
|
|
<hr>
|
|
|
|
<div class="body">
|
|
|
|
<div class="div1">
|
|
<h2><a name="div151383088" id="div151383088"></a>1 Motivation</h2>
|
|
|
|
<p>We have two standards running parallel with each other; HTML is the de
|
|
facto standard for document markup, accounting for millions of items on the
|
|
web. RDF is a standard for expressing metadata, which in turn provides a
|
|
foundation for making use of that metadata, such as reasoning about it. Yet
|
|
the former is very rarely the subject of the latter; meta information placed
|
|
in the HTML family of documents is often encoded in such a way as to make it
|
|
difficult to extract by RDF-related parsers. And if it cannot be extracted,
|
|
then it cannot be used.</p>
|
|
|
|
<p>Our intention here is to make more of the information that is contained
within HTML-family documents available to RDF tools, without placing an
unnecessary burden on authors who are familiar with HTML but not with the
subtleties of triples and statements. However, to discuss how best to do
this, we do need to be familiar with at least the principles of RDF.</p>
|
|
|
|
<p>RDF is about statements and triples. There are a number of syntaxes which
|
|
can be used to express these triples, such as N3 and RDF/XML. This document
|
|
proposes ensuring that the metadata elements already proposed in XHTML 2.0
|
|
can be used to glean useful RDF-based information.</p>
|
|
</div>
|
|
|
|
<div class="div1">
|
|
<h2><a name="div151384000" id="div151384000"></a>2 Types of Metadata in XHTML
|
|
Documents</h2>
|
|
|
|
<p>There are a number of sets of useful meta information that may be
|
|
contained in, or relate to, an XHTML document:</p>
|
|
<ul>
|
|
<li><p>There may be information about the document itself; who wrote it and
|
|
when, the document's subject matter, and so on.</p>
|
|
</li>
|
|
<li><p>There may be further information about some of this document
|
|
metadata; not only might we indicate the name of the author, but we might
|
|
also mark-up metadata to indicate their contact information.</p>
|
|
</li>
|
|
<li><p>There may be information about words used in the text; an
|
|
abbreviation may be marked-up with the full text version, or the word
|
|
"yesterday" may be tagged as being a date with a value of
|
|
"2004-01-01".</p>
|
|
</li>
|
|
<li><p>And finally, there may be information about items completely
|
|
unrelated to the document; there may be meta information about a company
|
|
or country.</p>
|
|
</li>
|
|
</ul>
|
|
|
|
<p>To introduce the issues we'll go through some examples of each of these,
|
|
using the RDF notion of triples.</p>
|
|
</div>
|
|
|
|
<div class="div1">
|
|
<h2><a name="div154317128" id="div154317128"></a>3 Document Metadata</h2>
|
|
|
|
<div class="div2">
|
|
<h3><a name="div254317416" id="div254317416"></a>3.1 Top-level Metadata with
|
|
<code>meta</code> and QNames</h3>
|
|
|
|
<p>Our first scenario involves statements made about the document. This is to
|
|
some extent already catered for with the current use of <code>meta</code> and
|
|
<code>link</code>. Statements can be made about the document, such as its
|
|
publication date, author, revision number, subjects, and so on.</p>
|
|
|
|
<p>An example of using <code>meta</code> is provided in the latest XHTML 2.0
|
|
draft:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre><html ... profile="http://www.acme.com/profiles/core">
|
|
<head>
|
|
<title>How to complete Memorandum cover sheets</title>
|
|
<meta name="author">John Doe</meta>
|
|
<meta name="copyright">&copy; 1997 Acme Corp.</meta>
|
|
<meta name="keywords">corporate,guidelines,cataloging</meta>
|
|
<meta name="date">1994-11-06T08:49:37+00:00</meta>
|
|
</head>
|
|
...
|
|
</pre>
|
|
</div>
|
|
|
|
<p>Whilst this provides useful metadata about the document, the lack of
|
|
qualification of the value in the <code>name</code> attribute means that in
|
|
terms of RDF statements, it is weak. The technique usually used in previous
|
|
versions of HTML is to provide a prefix on the property name, or to use the
|
|
<code>scheme</code> attribute. Adopting this same technique in XHTML 2.0, our
|
|
markup would now look like this:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre><meta name="DC.identifier">http://www.rfc-editor.org/rfc/rfc3236.txt</meta>
|
|
</pre>
|
|
</div>
|
|
|
|
<p>From an RDF standpoint this doesn't help. The statement that the author
|
|
intends - expressed as RDF triples - would be:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre><> <http://purl.org/dc/elements/1.1/identifier> <http://www.rfc-editor.org/rfc/rfc3236.txt> .
|
|
</pre>
|
|
</div>
|
|
|
|
<table border="1" summary="Editorial note">
|
|
<tbody>
|
|
<tr>
|
|
<td align="left" valign="top" width="50%"><b>Editorial note</b></td>
|
|
<td align="right" valign="top" width="50%"> </td>
|
|
</tr>
|
|
<tr>
|
|
<td colspan="2" align="left" valign="top">All RDF statements are
|
|
represented in this document using N3, since it is more compact than
|
|
RDF/XML. Most of the notation I will use is covered in <a
|
|
href="#N3-PRIMER">[N3-PRIMER]</a>.</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
|
|
<p>An easy way to resolve this would be to allow the author to use QNames
|
|
inside <code>name</code>:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre><html xmlns:dc="http://purl.org/dc/elements/1.1/">
|
|
<head>
|
|
<title>How to complete Memorandum cover sheets</title>
|
|
<meta name="dc:creator">John Doe</meta>
|
|
<meta name="dc:rights">&copy; 1997 Acme Corp.</meta>
|
|
<meta name="dc:subject">corporate,guidelines,cataloging</meta>
|
|
<meta name="dc:date">1994-11-06T08:49:37+00:00</meta>
|
|
</head>
|
|
...
|
|
</pre>
|
|
</div>
|
|
|
|
<p>However, for reasons that will become clearer later, I would suggest that
|
|
we drop the use of <code>name</code> and call it <code>property</code>:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre><html xmlns:dc="http://purl.org/dc/elements/1.1/">
|
|
<head>
|
|
<title>How to complete Memorandum cover sheets</title>
|
|
<meta property="dc:creator">John Doe</meta>
|
|
<meta property="dc:rights">&copy; 1997 Acme Corp.</meta>
|
|
<meta property="dc:subject">corporate,guidelines,cataloging</meta>
|
|
<meta property="dc:date">1994-11-06T08:49:37+00:00</meta>
|
|
</head>
|
|
...
|
|
</pre>
|
|
</div>
|
|
|
|
<p>This would create the following statements:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre>@prefix dc: <http://purl.org/dc/elements/1.1/> .
|
|
<> dc:creator "John Doe" ;
|
|
dc:rights "&copy; 1997 Acme Corp." ;
|
|
dc:subject "corporate,guidelines,cataloging" ;
|
|
dc:date "1994-11-06T08:49:37+00:00" .
|
|
</pre>
|
|
</div>
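
<p>As a worked illustration of how a QName in <code>property</code> is
resolved (a sketch, assuming the usual rule of concatenating the namespace
URI bound to the prefix with the local name), the first of these statements
written out in full would be:</p>

<div class="exampleInner">
<pre><> <http://purl.org/dc/elements/1.1/creator> "John Doe" .
</pre>
</div>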
|
|
|
|
<table border="1" summary="Editorial note">
|
|
<tbody>
|
|
<tr>
|
|
<td align="left" valign="top" width="50%"><b>Editorial note</b></td>
|
|
<td align="right" valign="top" width="50%"> </td>
|
|
</tr>
|
|
<tr>
|
|
<td colspan="2" align="left" valign="top">This would mean that in the
|
|
previous example, non-prefixed names would be regarded as being
|
|
properties from the source document - for example <code><>
|
|
:author "John Doe" .</code> - which I would suggest is correct. The
|
|
consequence is that no matter how consistently a name is applied
|
|
across a web site, it never conveys anything other than a 'local'
|
|
concept.</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
</div>
|
|
|
|
<div class="div2">
|
|
<h3><a name="div254358472" id="div254358472"></a>3.2 Statements About
|
|
Top-level Metadata</h3>
|
|
|
|
<p>Our second scenario concerns statements made about the metadata that we
|
|
have just created to describe the document. We might want to add further
|
|
information about the author or publisher, for example.</p>
|
|
|
|
<p>The simplest way to do this would be to place the additional statements in
|
|
an external RDF/XML document, and make reference to them. For example, the
|
|
external reference could, under the current spec, be expressed like this:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre><link rel="meta" href="JohnDoe.rdf" />
|
|
</pre>
|
|
</div>
|
|
|
|
<p>and the metadata in the document being linked to, might look like this:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre><rdf:RDF
|
|
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
|
|
xmlns:con="http://www.w3.org/2000/10/swap/pim/contact#"
|
|
>
|
|
<con:Person rdf:about="http://www.example.org/People/JohnDoe">
|
|
<con:fullName>John Doe</con:fullName>
|
|
<con:mailbox rdf:resource="mailto:JohnDoe@example.org"/>
|
|
</con:Person>
|
|
</rdf:RDF>
|
|
</pre>
|
|
</div>
|
|
|
|
<p>Whilst this makes the metadata available to an RDF/XML parser, this does
|
|
imply that if this information were to be usable <em>within</em> the XHTML
|
|
2.0 browser, then it would need to incorporate an RDF/XML parser. Whilst this
|
|
is certainly a desirable goal in the long term, it is probably an impractical
|
|
requirement in the short. Anyway, there are other ways of structuring the
|
|
data that may give the XHTML 2.0 browser access to the meta information,
|
|
without requiring it to incorporate an RDF/XML parser.</p>
|
|
|
|
<p>Unlike previous HTML mark-up, XHTML 2.0 allows <code>meta</code> to be
|
|
nested. The effect of this in RDF terms is to create an anonymous resource
|
|
inside the containing <code>meta</code> element, and then for nested
|
|
<code>meta</code> elements to be further statements about this anonymous
|
|
resource. This technique would therefore allow the previous information about
|
|
our document's author to be used by the browser, or indeed the author,
|
|
without requiring a full-fledged RDF parser. The structure might look
|
|
something like this (some statements from the external document are not
|
|
reflected):</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre><html xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:con="http://www.w3.org/2000/10/swap/pim/contact#">
|
|
<head>
|
|
<title>How to complete Memorandum cover sheets</title>
|
|
<meta property="dc:creator">
|
|
<meta property="con:fullName">John Doe</meta>
|
|
</meta>
|
|
<meta property="dc:rights">&copy; 1997 Acme Corp.</meta>
|
|
<meta property="dc:subject">corporate,guidelines,cataloging</meta>
|
|
<meta property="dc:date">1994-11-06T08:49:37+00:00</meta>
|
|
</head>
|
|
...
|
|
</pre>
|
|
</div>
|
|
|
|
<p>This would be read in prose as "the document's author is called 'John
|
|
Doe'", and in terms of statements that "this document has a dc:creator
|
|
property, which is an anonymous object, which in turn has a con:fullName
|
|
property, which is 'John Doe'". Note that the implication here is that any
|
|
non-contained <code>meta</code> element is referring to "this document". In
|
|
N3 notation it would be:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre><> dc:creator [ con:fullName "John Doe" ] .
|
|
</pre>
|
|
</div>
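
<p>The square brackets are N3 shorthand for an anonymous resource. As a
sketch, the same two statements can be written with an explicit blank node;
the label <code>_:author</code> is purely illustrative and has no meaning
outside this example:</p>

<div class="exampleInner">
<pre><> dc:creator _:author .
_:author con:fullName "John Doe" .
</pre>
</div>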
|
|
</div>
|
|
|
|
<div class="div2">
|
|
<h3><a name="div254363072" id="div254363072"></a>3.3 Statements About Other
|
|
Resources</h3>
|
|
|
|
<p>Sometimes we need to be able to make statements about resources outside of
|
|
our document. For example, we may know that the resource identified by
|
|
<code>http://people.com/TonyBlair</code> has a <code>con:fullName</code> of
|
|
"Tony Blair". Currently the only choices we have are either to embed RDF/XML
|
|
inside our XHTML document, or to link to an external RDF file, in the manner
|
|
that we saw above.</p>
|
|
|
|
<p>Since neither of these approaches is suitable, I would suggest that we add
a new attribute <code>about</code> that can appear on the <code>meta</code>
|
|
element. Our new markup might look like this:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre><html ...>
|
|
<head>
|
|
<title>How to complete Memorandum cover sheets</title>
|
|
<meta property="dc:creator">
|
|
<meta property="con:fullName">John Doe</meta>
|
|
</meta>
|
|
<meta about="p:TonyBlair" property="con:fullName">Tony Blair</meta>
|
|
</head>
|
|
...
|
|
</pre>
|
|
</div>
|
|
|
|
<p>This time our triples contain one statement about our document (and a
statement about the anonymous resource that is its object), followed by one about some external resource
|
|
over which we have no control:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre><> dc:creator [ con:fullName "John Doe" ] .
|
|
p:TonyBlair con:fullName "Tony Blair" .
|
|
</pre>
|
|
</div>
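
<p>The markup above elides the namespace declarations. As a sketch, assuming
that the <code>p</code> prefix is bound to <code>http://people.com/</code>
(an assumption, chosen so that <code>p:TonyBlair</code> expands to the URI
mentioned earlier), the same statements with their prefixes declared would
be:</p>

<div class="exampleInner">
<pre>@prefix dc:  <http://purl.org/dc/elements/1.1/> .
@prefix con: <http://www.w3.org/2000/10/swap/pim/contact#> .
@prefix p:   <http://people.com/> .

<> dc:creator [ con:fullName "John Doe" ] .
p:TonyBlair con:fullName "Tony Blair" .
</pre>
</div>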
|
|
|
|
<p>Another consequence of adding <code>about</code> is that we can now make
|
|
statements about other parts of our document. For example, if we know that one
|
|
part of the document is actually attributable to someone else, we could use
|
|
the following markup:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre><html ...>
|
|
<head>
|
|
<title>Some quotes</title>
|
|
<meta about="#q1">
|
|
<meta property="dc:source">http://www.example.com/tolkien/twotowers.html</meta>
|
|
</meta>
|
|
</head>
|
|
<body>
|
|
<blockquote id="q1">
|
|
<p>They went in single file, running like hounds on a strong scent,
|
|
and an eager light was in their eyes. Nearly due west the broad
|
|
swath of the marching Orcs tramped its ugly slot; the sweet grass
|
|
of Rohan had been bruised and blackened as they passed.</p>
|
|
</blockquote>
|
|
</body>
|
|
</html>
|
|
</pre>
|
|
</div>
|
|
|
|
<table border="1" summary="Editorial note">
|
|
<tbody>
|
|
<tr>
|
|
<td align="left" valign="top" width="50%"><b>Editorial note</b></td>
|
|
<td align="right" valign="top" width="50%"> </td>
|
|
</tr>
|
|
<tr>
|
|
<td colspan="2" align="left" valign="top">This is a reworking of the
|
|
example at <a
|
|
href="http://www.w3.org/MarkUp/Group/2003/WD-xhtml2-20031029/mod-structural.html#sec_8.3.">8.3</a>.
|
|
Note, however, that we favour the Dublin Core's <code>source</code>
|
|
element over the proposed XHTML 2.0 <code>cite</code> element, which
|
|
was used in the original example.</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
|
|
<p>The triple represented is as follows:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre><#q1> dc:source "http://www.example.com/tolkien/twotowers.html" .
|
|
</pre>
|
|
</div>
|
|
</div>
|
|
|
|
<div class="div2">
|
|
<h3><a name="div254370032" id="div254370032"></a>3.4 String Literals and
|
|
URIs</h3>
|
|
|
|
<p>Although allowing QNames in <code>property</code> and <code>about</code>
|
|
on <code>meta</code> gets us a little closer to usable RDF statements, there
|
|
is still another problem: the object of an RDF statement can be one of two
types - a resource or a string literal - but the syntax that we have built
up so far only allows the object of a statement to be a string literal. The
|
|
consequence is that this:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre><meta property="dc:identifier">http://www.rfc-editor.org/rfc/rfc3236.txt</meta>
|
|
</pre>
|
|
</div>
|
|
|
|
<p>produces the following triples:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre><> dc:identifier "http://www.rfc-editor.org/rfc/rfc3236.txt" .
|
|
</pre>
|
|
</div>
|
|
|
|
<p>As you can see, the object of the statement is a string literal, even
|
|
though we wanted it to be a resource. The statement we really wanted to make,
|
|
was this:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre><> dc:identifier <http://www.rfc-editor.org/rfc/rfc3236.txt> .
|
|
</pre>
|
|
</div>
|
|
|
|
<p>Since XHTML 2.0 brings <code>link</code> with it from HTML to express
|
|
relationships between the source document and some other document, I
|
|
would suggest that we simply allow QNames in the <code>rel</code> attribute
|
|
on <code>link</code>. Our Dublin Core <code>identifier</code> example would
|
|
then be expressed as follows:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre><link rel="dc:identifier" href="http://www.rfc-editor.org/rfc/rfc3236.txt" />
|
|
</pre>
|
|
</div>
|
|
|
|
<table border="1" summary="Editorial note">
|
|
<tbody>
|
|
<tr>
|
|
<td align="left" valign="top" width="50%"><b>Editorial note</b></td>
|
|
<td align="right" valign="top" width="50%"> </td>
|
|
</tr>
|
|
<tr>
|
|
<td colspan="2" align="left" valign="top">To make this consistent, the
|
|
values in <code>rel</code><a
|
|
href="#XHTML-2.0-LINKTYPES">[XHTML-2.0-LINKTYPES]</a> would all need
|
|
to be namespace prefixed, for example <code>xhtml2:next</code> or
|
|
<code>xhtml2:stylesheet</code>. This is no bad thing.</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
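
<p>As an illustration of the note above, a conventional link type might then
be written as follows. This is only a sketch: the namespace URI bound to
<code>xhtml2</code> here is a placeholder, not one defined by any
specification, and <code>chapter2.html</code> is a hypothetical document:</p>

<div class="exampleInner">
<pre><html xmlns:xhtml2="http://www.example.org/xhtml2-linktypes#">
  <head>
    <link rel="xhtml2:next" href="chapter2.html" />
  </head>
  ...
</pre>
</div>

<p>which would yield a statement with a resource as its object:</p>

<div class="exampleInner">
<pre><> xhtml2:next <chapter2.html> .
</pre>
</div>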
|
|
|
|
<p>In addition we would allow <code>link</code> to appear anywhere that
|
|
<code>meta</code> can, since we are saying that the only difference between
|
|
them is that one identifies a property of the document and assigns it a
|
|
string literal value, whilst the other identifies a property of the document
|
|
and assigns it a resource as the value. Our previous example (in which we
|
|
indicated the source of a quote), would then become this:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre><meta about="#q1">
|
|
<link rel="dc:source" href="http://www.example.com/tolkien/twotowers.html" />
|
|
</meta>
|
|
</pre>
|
|
</div>
|
|
|
|
<p>Note that whereas before the triples produced were these:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre><#q1> dc:source "http://www.example.com/tolkien/twotowers.html" .
|
|
</pre>
|
|
</div>
|
|
|
|
<p>now we correctly have these:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre><#q1> dc:source <http://www.example.com/tolkien/twotowers.html> .
|
|
</pre>
|
|
</div>
|
|
|
|
<p>There is no reason why a number of the attributes for <code>meta</code>
|
|
and <code>link</code> cannot appear on the same element. Our example could
|
|
therefore be abbreviated to:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre><link about="#q1" rel="dc:source" href="http://www.example.com/tolkien/twotowers.html" />
|
|
</pre>
|
|
</div>
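
<p>Allowing <code>link</code> wherever <code>meta</code> can appear would
also let us capture a statement from the external RDF/XML document shown
earlier that nested <code>meta</code> alone cannot express, namely the
author's mailbox, whose value is a resource. A sketch, assuming that
<code>link</code> may be nested inside <code>meta</code> in the same way
that <code>meta</code> can:</p>

<div class="exampleInner">
<pre><meta property="dc:creator">
  <meta property="con:fullName">John Doe</meta>
  <link rel="con:mailbox" href="mailto:JohnDoe@example.org" />
</meta>
</pre>
</div>

<p>The statements this is intended to represent are:</p>

<div class="exampleInner">
<pre><> dc:creator
  [
    con:fullName "John Doe" ;
    con:mailbox <mailto:JohnDoe@example.org>
  ] .
</pre>
</div>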
|
|
</div>
|
|
</div>
|
|
|
|
<div class="div1">
|
|
<h2><a name="div154379976" id="div154379976"></a>4 Markup Metadata</h2>
|
|
|
|
<p>The next category of metadata concerns the qualification of text that
|
|
appears in a document. This whole area is often discussed on the HTML mailing
|
|
lists, and is the source of much confusion. Almost every other day someone
|
|
proposes new markup to capture the notion of weight, time, addresses, and so
|
|
on. It's therefore worth looking at what the various proposals are trying to
|
|
capture.</p>
|
|
|
|
<p>If we were simply concerned with how a document <em>appeared</em> then we
|
|
would not need new markup. For example, if we have a corporate website, and
|
|
on the 'contact us' page the address of the company headquarters is
|
|
displayed, then the following would be sufficient:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre><p class="address">
|
|
<span class="street">4 Pear Tree Court</span>
|
|
<span class="city">London</span>
|
|
<span class="country">United Kingdom</span>
|
|
</p></pre>
|
|
</div>
|
|
|
|
<p>We could then add a stylesheet, and have the city shown in bold, red,
|
|
loud, or whatever.</p>
|
|
|
|
<p>However, an automated process that was analysing this 'address' would be
|
|
unlikely to derive any 'meaning' from this markup. To 'make sense' of this
|
|
information, an indexing engine, content management system, e-commerce
|
|
server, or whatever, would need to know in advance what 'street' and 'city'
|
|
meant.</p>
|
|
|
|
<p>Of course, we could achieve this if we all agreed to reserve those words
|
|
in the <code>class</code> attribute to mean only 'parts of an address', but
|
|
that would then break with people who want to use 'city' as a style class,
|
|
with no relation to its semantic meaning. It would also break for people who
|
|
want to use 'address' for an IP address. In other words, we would be imposing
|
|
a meaning onto those that don't want it. Far better then to try to create
|
|
'unique' versions of these identifiers.</p>
|
|
|
|
<p>What is usually proposed is to create some elements that capture the
|
|
notions of <em>addresses</em>. The author's presentational requirements are
|
|
not compromised since they are still able to style the new markup, and of
|
|
course the elements capture the semantics:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre><style>
|
|
city { background-color: teal; }
|
|
</style>
|
|
<address>
|
|
<street>4 Pear Tree Court</street>
|
|
<city>London</city>
|
|
<country>United Kingdom</country>
|
|
</address></pre>
|
|
</div>
|
|
|
|
<p>Whilst the general approach is fine - the use of elements to clearly mark
|
|
metadata - the fact that those elements exist in the XHTML namespace causes a
|
|
number of problems. The main one is that it is not clear where we should draw
|
|
the line when adding new elements to XHTML; as mentioned earlier, debates
frequently arise on the <code>www-html</code> list about whether
|
|
date/time/quantity/happiness elements should be added to XHTML. All of them
|
|
have worthy arguments for their inclusion, but the bigger issue is whether we
|
|
should even bother (since where do we stop), and should we instead focus on
|
|
providing mechanisms by which elements from other namespaces can be used?</p>
|
|
|
|
<p>The second problem with these elements appearing in the XHTML namespace
|
|
concerns their actual 'meaning'. Since all of these elements exist in the
|
|
XHTML 2.0 namespace, then to some metadata processor, what we actually have
|
|
is an 'XHTML date' (or an 'XHTML address', or person, or whatever).</p>
|
|
|
|
<p>One solution to this might appear to be to use the techniques we described
|
|
above - <code>meta</code> and <code>link</code>:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre><html xmlns:con="http://www.w3.org/2000/10/swap/pim/contact#">
|
|
<head>
|
|
<meta property="con:address">
|
|
<meta property="con:street">4 Pear Tree Court</meta>
|
|
<meta property="con:city">London</meta>
|
|
<meta property="con:country">United Kingdom</meta>
|
|
</meta>
|
|
</head>
|
|
...
|
|
</pre>
|
|
</div>
|
|
|
|
<p>The statements represented by this are</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre><> con:address
|
|
[
|
|
con:street "4 Pear Tree Court" ;
|
|
con:city "London" ;
|
|
con:country "United Kingdom"
|
|
] .
|
|
</pre>
|
|
</div>
|
|
|
|
<p>Note however, that this indicates that it is the current document that
|
|
'has an address'. This is something that is often overlooked in the
|
|
discussions about new markup to represent meta information - what exactly is
|
|
it representing? To return to the type of example most often given:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre><style>
|
|
city { background-color: teal; }
|
|
</style>
|
|
<address>
|
|
<street>4 Pear Tree Court</street>
|
|
<city>London</city>
|
|
<country>United Kingdom</country>
|
|
</address>
|
|
</pre>
|
|
</div>
|
|
|
|
<p>what is it that is based in London? And more to the point, what exactly
|
|
<em>is</em> London? Let's turn to these questions now.</p>
|
|
|
|
<div class="div2">
|
|
<h3><a name="div254386928" id="div254386928"></a>4.1 What Are We Trying To
|
|
Represent?</h3>
|
|
|
|
<p>When we are using <code>meta</code> it is clear that this:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre><meta property="dc:creator">John Doe</meta>
|
|
</pre>
|
|
</div>
|
|
|
|
<p>means:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre>"John Doe" is <em>the creator</em> of <em>this document</em></pre>
|
|
</div>
|
|
|
|
<p>or:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre><> dc:creator "John Doe" .
|
|
</pre>
|
|
</div>
|
|
|
|
<p>As we know, <code>meta</code> has an implied subject of the document that
|
|
contains it. But when we say:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre><city>London</city>
|
|
</pre>
|
|
</div>
|
|
|
|
<p>What is it that 'has' a property of city? We are probably not saying that
|
|
the document has a property of city, with a value of 'London'. In fact, the
|
|
only thing that we <em>can</em> be saying is that the text string 'London'
|
|
has a <em>type</em> of <em>city</em>.</p>
|
|
|
|
<p>And why might authors want to emphasise this? In the hope that when
|
|
someone uses some search engine to find documents about 'the city called
|
|
London', they will find this document, and when they search for documents
|
|
about Jack London, they won't.</p>
|
|
|
|
<p>Currently there is no easy way to indicate in a document which of these
|
|
two queries you would like your document to respond to. We could indicate the
|
|
subject of a document with <code>meta</code>, by setting
|
|
<code>property</code> to "dc:subject":</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre><meta property="dc:subject">London</meta>
|
|
</pre>
|
|
</div>
|
|
|
|
<p>but there is nothing here to say whether we are writing about Jack London,
|
|
or the city.</p>
|
|
|
|
<p>Solving this problem would also help internationalisation, since currently
|
|
the only options to help French users to find this document would be to mark
|
|
the document up like this:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre><meta property="dc:subject">London, Londres</meta>
|
|
</pre>
|
|
</div>
|
|
|
|
<p>or this:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre><meta xml:lang="en" property="dc:subject">London</meta>
|
|
<meta xml:lang="fr" property="dc:subject">Londres</meta>
|
|
</pre>
|
|
</div>
|
|
|
|
<p>neither of which is very practical.</p>
|
|
|
|
<p>So, we seem to be stuck; if we start at the top of the document, and say
|
|
that</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre>"London" is <em>a subject</em> of <em>this document</em></pre>
|
|
</div>
|
|
|
|
<p>we don't know if we mean Jack London, or London Town. But if we start at
|
|
the bottom of the document and say that:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre>"London" is <em>a type</em> of <em>city</em></pre>
|
|
</div>
|
|
|
|
<p>we're none the wiser as to how this piece of information relates to
|
|
anything else. Using RDF statements again, we can say that the top-down
|
|
approach is represented by:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre><> dc:subject "London" .
|
|
</pre>
|
|
</div>
|
|
|
|
<p>and the bottom-up, by:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre>[ :city "London" ] .
|
|
</pre>
|
|
</div>
|
|
|
|
<p>Our task then is to see if we can make the two meet in the middle, and our
|
|
first step is to solve the problem of searching for 'London' versus
|
|
'Londres'.</p>
|
|
</div>
|
|
|
|
<div class="div2">
|
|
<h3><a name="div254333552" id="div254333552"></a>4.2 Identifying
|
|
Resources</h3>
|
|
|
|
<p>To recap, the issue is that the type of markup that is often proposed,
|
|
such as:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre><city>London</city>
|
|
</pre>
|
|
</div>
|
|
|
|
<p>is not much use to us, since we do not know what the information relates
|
|
to. In this case, which city is it that has a string literal value of
|
|
"London"? The key to this is of course to use a unique value based on URIs.
|
|
If we had one defined for the city of London, in the UK, such as
|
|
<code>http://www.examples.org/city#london</code>, and we were able to attach
|
|
this in some way to our text, then we could unambiguously identify what we're
|
|
talking about.</p>
|
|
|
|
<p>I propose therefore, that we add an attribute called
|
|
<code>resource</code>, which would allow us to do the following:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre><city resource="city:london">London</city>
|
|
</pre>
|
|
</div>
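
<p>The <code>city</code> prefix is not declared in the fragment above. As a
sketch, assuming it is bound to <code>http://www.examples.org/city#</code>
so that <code>city:london</code> expands to the URI given earlier, the
complete markup might look like this:</p>

<div class="exampleInner">
<pre><html xmlns:city="http://www.examples.org/city#">
  ...
  <city resource="city:london">London</city>
  ...
</pre>
</div>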
|
|
|
|
<p>We have now uniquely identified that the document we're dealing with has a
|
|
reference to the city of London, in the UK, regardless of the actual text
|
|
used in the document:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre>I stayed in the <city resource="city:london">capital</city> for a week.
|
|
Je voudrais visiter <city resource="city:london">Londres</city>.
|
|
</pre>
|
|
</div>
|
|
|
|
<p>Now a search engine could search for all documents that contain a
|
|
reference to <code>http://www.examples.org/city#london</code>, which would
|
|
unambiguously identify documents relating to the city of London, in the UK.
|
|
And provided that the search server had a mapping between "Londres" and
|
|
<code>http://www.examples.org/city#london</code>, French speakers could also
|
|
find this article. (And the indexing server may have deduced the mapping when
|
|
crawling documents.)</p>
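<p>The abbreviated form <code>city:london</code> only stands for the full URI
if the <code>city</code> prefix is bound to a namespace. The examples above
leave that binding implicit. As a sketch, assuming prefixes are declared with
ordinary XML namespace declarations (as <code>dc</code> is in the examples in
section 4.3), the binding might look like this:</p>

<div class="exampleInner">
<pre><html xmlns:city="http://www.examples.org/city#">
  ...
  <city resource="city:london">Londres</city>
  ...
</html>
</pre>
</div>

<p>Under that assumption, <code>city:london</code> expands to
<code>http://www.examples.org/city#london</code> by appending the local name
to the namespace URI.</p>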
|
|
|
|
<p>As the multilingual example shows, this technique is particularly powerful
when the term being searched for is not present in the document. For example,
with this document:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre>Tomorrow the Prime Minister is expected to fly to ...
|
|
</pre>
|
|
</div>
|
|
|
|
<p>Searching for "Tony Blair" would not find this article. However, if a
|
|
search server knows how to establish from a user that they are actually
|
|
searching for <code>http://people.com/TonyBlair</code>, then we could mark up
|
|
our document as follows:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre>Tomorrow the <span resource="p:TonyBlair">Prime Minister</span>
|
|
is expected to fly to ...
|
|
</pre>
|
|
</div>
|
|
|
|
<p>A search for articles relating to Tony Blair would now yield this one,
|
|
even though his name is not mentioned. This becomes particularly useful when
|
|
individuals and places are identified in a number of ways, and the lexical
|
|
description bears no relation to the actual meaning:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre>Today <span resource="p:DianaSpencer">Lady Diana</span> died in a car crash ...
|
|
The <span resource="p:DianaSpencer">Princess of Wales</span> was killed today ...
|
|
</pre>
|
|
</div>
|
|
|
|
<p>One final example concerns distinguishing between two items of text that
|
|
are the same, but represent something different. For example:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre>Yesterday in Parliament the <span resource="p:WinstonChurchill">Prime Minister</span>
|
|
said that we will fight on the beaches ...
|
|
Tomorrow the <span resource="p:TonyBlair">Prime Minister</span>
|
|
is expected to fly to ...</pre>
|
|
</div>
|
|
|
|
<p>In these two examples the string of characters "Prime Minister" is exactly
|
|
the same, but they refer to two different individuals. It is now possible to
|
|
search for "Winston Churchill" and still find the first of these two
|
|
articles, but not the second. Indeed, were we to search for "Prime
|
|
Minister", our search server could easily ask us whether we want to search
|
|
for the words "Prime Minister", or for articles about the current or a
|
|
previous prime minister.</p>
|
|
|
|
<table border="1" summary="Editorial note">
|
|
<tbody>
|
|
<tr>
|
|
<td align="left" valign="top" width="50%"><b>Editorial note</b></td>
|
|
<td align="right" valign="top" width="50%"> </td>
|
|
</tr>
|
|
<tr>
|
|
<td colspan="2" align="left" valign="top">Of course, it is also now
|
|
possible to search for "all articles about British prime ministers
|
|
who served during the second world war", provided that the search
|
|
engine is able to deduce such things. As long as the answer includes
|
|
<code>p:WinstonChurchill</code> then this document can be
|
|
located.</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
|
|
<p>One last point on resources: we still haven't addressed what this
information is attached to. The easiest approach is to say that where any of
our new attributes appears on an element that is not <code>meta</code>, there
is an implied relationship of "xhtml2:reference" to the containing resource
(or the document). The Tony Blair example therefore results in the following
triple:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre><> xhtml2:reference p:TonyBlair .
|
|
</pre>
|
|
</div>
|
|
|
|
<p>This means simply that <em>this document</em> contains <em>a
|
|
reference</em> to <em>Tony Blair</em>. Note that this is not just a fudge:
|
|
we are saying that a search for articles and documents written by, reviewed
|
|
by, criticised by, or summarised by Tony Blair, is very different to seeking
|
|
out articles and documents that <em>make reference</em> to Tony Blair. This
|
|
additional property allows us to make that distinction.</p>
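<p>To illustrate the rule, and assuming that each of the two sentences about
Diana Spencer shown earlier lives in its own document, both documents would
yield the same triple, whatever wording the text uses:</p>

<div class="exampleInner">
<pre><> xhtml2:reference p:DianaSpencer .
</pre>
</div>

<p>It is this shared triple, rather than the words "Lady Diana" or "Princess
of Wales", that a search engine could match against.</p>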
|
|
</div>
|
|
|
|
<div class="div2">
|
|
<h3><a name="div254343520" id="div254343520"></a>4.3 Representing Type</h3>
|
|
|
|
<p>Our second issue was to be able to express the <em>type</em> of our
|
|
element. I would propose that <code>property</code> is allowed on any
|
|
element, not just <code>meta</code>. For example:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre><span property="con:address">
|
|
<span property="con:street">4 Pear Tree Court</span>
|
|
<span property="con:city">London</span>
|
|
<span property="con:country">United Kingdom</span>
|
|
</span></pre>
|
|
</div>
|
|
|
|
<p>This now opens up an unlimited supply of descriptors for our text, and so
|
|
rather than trying to add elements such as address, weight, time and so on we
|
|
can make use of established taxonomies, or devise our own.</p>
|
|
|
|
<p>Note also that the presence of <code>property</code> creates opportunities
|
|
for avoiding repetition. For example, we often find ourselves creating our
|
|
metadata from information that also appears in the document itself:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre><html xmlns:dc="http://purl.org/dc/elements/1.1/">
|
|
<head>
|
|
<title>Prime Minister to Fly Out Tomorrow</title>
|
|
    <meta property="dc:creator">John Doe</meta>
|
|
</head>
|
|
<body>
|
|
<h1>Prime Minister to Fly Out Tomorrow</h1>
|
|
<span>By John Doe</span>
|
|
<p>
|
|
Tomorrow the <span resource="p:TonyBlair">Prime Minister</span> is expected to fly to ...
|
|
</p>
|
|
</body>
|
|
</html>
|
|
</pre>
|
|
</div>
|
|
|
|
<p>This often seems like unnecessary duplication, and may add a
maintenance headache. However, by specifying the <code>property</code>
explicitly rather than using the default, we can abbreviate this document
to:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre><html xmlns:dc="http://purl.org/dc/elements/1.1/">
|
|
<head>
|
|
<title>Prime Minister to Fly Out Tomorrow</title>
|
|
</head>
|
|
<body>
|
|
<h1>Prime Minister to Fly Out Tomorrow</h1>
|
|
<span>By
|
|
<span property="dc:creator">John Doe</span>
|
|
</span>
|
|
<p>
|
|
Tomorrow the <span resource="p:TonyBlair">Prime Minister</span> is expected to fly to ...
|
|
</p>
|
|
</body>
|
|
</html>
|
|
</pre>
|
|
</div>
|
|
|
|
<table border="1" summary="Editorial note">
|
|
<tbody>
|
|
<tr>
|
|
<td align="left" valign="top" width="50%"><b>Editorial note</b></td>
|
|
<td align="right" valign="top" width="50%"> </td>
|
|
</tr>
|
|
<tr>
|
|
<td colspan="2" align="left" valign="top">Note that this means we have
to refine the earlier rule, in which there was an implied
"xhtml2:reference" value. We should now say that this is the default
value for <code>property</code>.</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
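<p>As a sketch of what the abbreviated document might produce, assume that
<code>property</code> on a <code>span</code> behaves as it does on
<code>meta</code> (the element content becomes a literal value attached to
the document), and that the default described in the note applies to the
element carrying only <code>resource</code>. The triples would then be:</p>

<div class="exampleInner">
<pre><> dc:creator "John Doe" ;
   xhtml2:reference p:TonyBlair .
</pre>
</div>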
|
|
</div>
|
|
|
|
<div class="div2">
|
|
<h3><a name="div254348288" id="div254348288"></a>4.4 Representing Value</h3>
|
|
|
|
<p>Just as we sometimes need to refer to a resource, so sometimes we need to
|
|
indicate clearly what the value of something is. In this example:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre>Yesterday in Parliament the <span resource="p:WinstonChurchill">Prime Minister</span>
|
|
said that we will fight on the beaches ...
|
|
</pre>
|
|
</div>
|
|
|
|
<p>we would like to indicate that the speech was made on June 4th, 1940,
|
|
since this would allow sophisticated metadata searches to be made. To do this
|
|
we use the attributes <code>val</code> and <code>datatype</code>:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre><span datatype="xsd:date" val="1940-06-04">Yesterday</span> in Parliament the <span resource="p:WinstonChurchill">Prime Minister</span>
|
|
said that we will fight on the beaches ...
|
|
</pre>
|
|
</div>
|
|
|
|
<p>This would create the following triples:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre><> xhtml2:reference "1940-06-04"^^xsd:date , p:WinstonChurchill .
|
|
</pre>
|
|
</div>
|
|
</div>
|
|
</div>
|
|
|
|
<div class="div1">
|
|
<h2><a name="div154351016" id="div154351016"></a>5 Bibliography</h2>
|
|
<dl>
|
|
<dt class="label"><a name="XHTML-2.0-LINKTYPES"
|
|
id="XHTML-2.0-LINKTYPES"></a>XHTML-2.0-LINKTYPES</dt>
|
|
<dd>XHTML 2.0 Link Types (See <a
|
|
href="http://www.w3.org/TR/2003/WD-xhtml2-20030506/abstraction.html#dt_LinkTypes">http://www.w3.org/TR/2003/WD-xhtml2-20030506/abstraction.html#dt_LinkTypes</a>.)</dd>
|
|
<dt class="label"><a name="N3-PRIMER" id="N3-PRIMER"></a>N3-PRIMER</dt>
|
|
<dd>N3 Primer (See <a
|
|
href="http://www.w3.org/2000/10/swap/Primer">http://www.w3.org/2000/10/swap/Primer</a>.)</dd>
|
|
</dl>
|
|
</div>
|
|
</div>
|
|
</body>
|
|
</html>
|