<?xml version="1.0" encoding="UTF-8"?><!--*- nxml -*-->
|
|
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
|
|
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
|
|
<html xmlns="http://www.w3.org/1999/xhtml">
|
|
<head>
|
|
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
|
|
<title>Gleaning Resource Descriptions from Dialects of Languages
|
|
(GRDDL)</title>
|
|
<style type="text/css">
|
|
.issue {
|
|
background-color:#dfd;
|
|
border: thin solid black;
|
|
color:black;
|
|
}
|
|
|
|
.designSketch {
|
|
background-color:#fdf;
|
|
border: thin solid black;
|
|
color:black;
|
|
}
|
|
|
|
.illustration {
|
|
margin-left:auto;
|
|
margin-right:auto;
|
|
text-align:center;
|
|
}
|
|
|
|
.example {
|
|
margin-left:auto;
|
|
margin-right:auto;
|
|
padding-top:0.5em;
|
|
padding-bottom:0.5em;
|
|
width:70%;
|
|
border-top:thin dashed black;
|
|
border-bottom:thin dashed black;
|
|
}</style>
|
|
<link rel="stylesheet" type="text/css" href="http://www.w3.org/StyleSheets/TR/W3C-CG-NOTE" />
|
|
</head>
|
|
|
|
<body xml:lang="en" lang="en">
|
|
|
|
<div class="head">
|
|
<a href="http://www.w3.org/"><img alt="W3C" src="http://www.w3.org/Icons/w3c_home"
|
|
height="48" width="72" /></a>
|
|
|
|
<h1>Gleaning Resource Descriptions from Dialects of Languages (GRDDL)</h1>
|
|
|
|
<h2>W3C Coordination Group Note 13 April 2004</h2>
|
|
<dl>
|
|
<dt>This Version:</dt>
|
|
<dd><a href="http://www.w3.org/TR/2004/NOTE-grddl-20040413/">http://www.w3.org/TR/2004/NOTE-grddl-20040413/</a></dd>
|
|
<dt>Latest Version:</dt>
|
|
<dd><a
|
|
href="http://www.w3.org/TR/grddl/">http://www.w3.org/TR/grddl/</a></dd>
|
|
<dt>Authors:</dt>
|
|
<dd><a href="/People/Dom/">Dominique Hazaël-Massieux</a></dd>
|
|
<dd><a
|
|
href="/People/Connolly/">Dan Connolly</a></dd>
|
|
</dl>
|
|
|
|
<p class="copyright"><a
|
|
href="http://www.w3.org/Consortium/Legal/ipr-notice#Copyright">Copyright</a>
|
|
© 2003, 2004 <a href="http://www.w3.org/"><acronym
|
|
title="World Wide Web Consortium">W3C</acronym></a><sup>®</sup> (<a
|
|
href="http://www.csail.mit.edu/"><acronym
|
|
title="Massachusetts Institute of Technology">MIT</acronym></a>, <a
|
|
href="http://www.ercim.org/"><acronym
|
|
title="European Research Consortium for Informatics and Mathematics">ERCIM</acronym></a>,
|
|
<a href="http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. W3C <a
|
|
href="http://www.w3.org/Consortium/Legal/ipr-notice#Legal_Disclaimer">liability</a>,
|
|
<a
|
|
href="http://www.w3.org/Consortium/Legal/ipr-notice#W3C_Trademarks">trademark</a>,
|
|
<a href="http://www.w3.org/Consortium/Legal/copyright-documents">document
|
|
use</a> and <a
|
|
href="http://www.w3.org/Consortium/Legal/copyright-software">software
|
|
licensing</a> rules apply.</p>
|
|
</div>
|
|
<hr />
|
|
|
|
<h2>Abstract</h2>
|
|
|
|
<p>This document presents GRDDL, a mechanism for encoding RDF statements in
|
|
XHTML and XML to be extracted by programs such as XSLT transformations.</p>
|
|
|
|
<div>
|
|
<h2>Status of This Document</h2>
|
|
|
|
<p><em>This section describes the status of this document at the time
|
|
of its publication. Other documents may supersede this document. A
|
|
list of current W3C publications and the latest revision of this
|
|
technical report can be found in the <a
|
|
href="http://www.w3.org/TR/">W3C technical reports index</a> at
|
|
<tt>http://www.w3.org/TR/</tt>.</em></p>
|
|
|
|
|
|
<p>As part of the work of the <a
|
|
href="http://www.w3.org/2001/sw/Activity">W3C Semantic Web
|
|
Activity</a>, the <a href="/2001/sw/CG/">Semantic Web Coordination Group</a> (Member-only) and the <a href="/MarkUp/">HTML Working
|
|
Group</a> started a task force on RDF in XHTML. This draft is a snapshot
|
|
of one of the designs discussed in that task force.</p>
|
|
|
|
<p>Please send review comments, implementation experience reports,
|
|
etc. to <a href= "mailto:public-rdf-in-xhtml-tf@w3.org"
|
|
>public-rdf-in-xhtml-tf@w3.org</a>, a mailing list with <a
|
|
href="http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/">public
|
|
archive</a>.</p>
|
|
|
|
<p>The <a
|
|
href="http://esw.w3.org/topic/EmbeddingRDFinHTML">EmbeddingRDFinHTML</a>
|
|
wiki topic is also available as a shared space for collected wisdom on
|
|
related topics.</p>
|
|
|
|
<p>A related <a
|
|
href="http://www.w3.org/2004/01/rdxh/specbg.html">design history and
|
|
rationale</a> discusses the contribution of this draft to RDF issues such
|
|
as <a
|
|
href="http://www.w3.org/2000/03/rdf-tracking/#faq-html-compliance"
|
|
>faq-html-compliance</a> and <a
|
|
href="http://www.w3.org/2000/03/rdf-tracking/#rdfms-validating-embedded-rdf"
|
|
>rdfms-validating-embedded-rdf</a> and Web Architecture issues such as
|
|
<a href="http://www.w3.org/2001/tag/issues.html?type=1#RDFinXHTML-35"
|
|
>RDFinXHTML-35</a> and <a
|
|
href="http://www.w3.org/2001/tag/issues.html?type=1#namespaceDocument-8"
|
|
>namespaceDocument-8</a>.</p>
|
|
|
|
|
|
<p>This is something of a design sketch, but it is backed by running
|
|
code. We provide a pair of online services, <a
|
|
href="http://www.w3.org/2003/11/rdf-in-xhtml-demo">one demo for
|
|
XHTML</a> and <a
|
|
href="http://www.w3.org/2004/01/rdxh/grddl-xml-demo">one demo for
|
|
generic XML</a> on an experimental, best-effort basis.</p>
|
|
|
|
<p>The editors are aware of a few <span class="issue">remaining issues,
|
|
marked up like this <q>@@@</q></span>.</p>
|
|
|
|
<p>A <a href="#changes">log of changes</a> is appended.</p>
|
|
|
|
<p><em>Publication as a Coordination Group Note does not imply
|
|
endorsement by the W3C Membership. This is a draft document and may be
|
|
updated, replaced or obsoleted by other documents at any time. It is
|
|
inappropriate to cite this document as other than work in
|
|
progress.</em></p>
|
|
|
|
|
|
</div>
|
|
|
|
<div>
|
|
<h2 id="toc">Contents</h2>
|
|
<ol>
|
|
<li><a href="#intro">Introduction</a></li>
|
|
<li><a href="#grddl-xhtml">GRDDL for XHTML</a></li>
|
|
<li><a href="#grddl-xml">GRDDL for XML</a></li>
|
|
<li><a href="#ns-bind">GRDDL for XML Namespace Documents</a></li>
|
|
<li><a href="#sec">Security Considerations</a></li>
|
|
<li class="issue">@@ References</li>
|
|
</ol>
|
|
<ul>
|
|
<li><a href="#changes">Changelog</a></li>
|
|
</ul>
|
|
|
|
<h3 id="toc-app">Supplementary Material</h3>
|
|
<ul>
|
|
<li><a
|
|
href="http://www.w3.org/2004/lambda/Sites/index.html">Example
|
|
Homepage with Dublin Core, GeoURL, RSS, Creative Commons, etc.</a></li>
|
|
<li><a id="notes" href="http://www.w3.org/2004/01/rdxh/specbg.html">Design History and Rationale</a></li>
|
|
</ul>
|
|
</div>
|
|
|
|
<div>
|
|
<h2 id="intro"><span class="gen">1.</span> Introduction</h2>
|
|
|
|
<p>An article by J. Kunze in 1999, <cite><a
|
|
href="http://www.ietf.org/rfc/rfc2731.txt">Encoding Dublin Core Metadata in
|
|
HTML</a></cite>, explains one way that the Dublin Core community encodes its
|
|
metadata in HTML documents. This metadata can also be expressed in the
|
|
Resource Description Framework (<a href="http://www.w3.org/RDF/">RDF</a>).</p>
|
|
|
|
<p>The mapping between the HTML encoding and the RDF encoding can be
|
|
represented as an XSLT transformation, <a
|
|
href="http://www.w3.org/2000/06/dc-extract/dc-extract.xsl">dc-extract.xsl</a>:</p>
|
|
|
|
<div class="illustration">
|
|
<img src="dc-extract.png" alt="diagram: HTML to RDF via dc-extract.xsl" /><br
|
|
/>
|
|
Decoding HTML metadata to RDF <br />
|
|
<small>(<a href="dc-extract.svg">svg</a>)</small></div>
|
|
|
|
<p>If the HTML author understood and agreed to these encoding conventions,
|
|
then their HTML document will conform to the syntactic conventions. In this
|
|
case, the mapping preserves the author's meaning. But an author may have
|
|
<em>accidentally</em> conformed to the syntactic conventions without any
|
|
knowledge of Dublin Core at all. In that case, the mapping most likely does
|
|
<em>not</em> preserve the author's meaning.</p>
|
|
|
|
<h2 id="grddl-xhtml"><span class="gen">2.</span> The GRDDL profile for
|
|
XHTML</h2>
|
|
|
|
<p>The HTML specification, in section <a href=
|
|
"http://www.w3.org/TR/1999/REC-html401-19991224/struct/global.html#h-7.4.4.3"
|
|
>7.4.4.3 Meta data profiles</a>, provides a mechanism for authors to
|
|
use particular metadata vocabularies and thereby indicate the author's
|
|
intent to use those terms in accordance with the conventions of the
|
|
community that originated the terms.</p>
|
|
|
|
<blockquote>
|
|
<p>Authors may wish to define additional link types not described in this
|
|
specification. If they do so, they should use a <a
|
|
href="http://www.w3.org/TR/1999/REC-html401-19991224/struct/global.html#profiles">profile</a>
|
|
to cite the conventions used to define the link types.</p>
|
|
</blockquote>
|
|
|
|
<p><dfn>GRDDL</dfn> is such a profile; it's a mechanism for <b>G</b>leaning
|
|
<b>R</b>esource <b>D</b>escriptions from <b>D</b>ialects of <b>L</b>anguages.
|
|
Use of the <tt><a
|
|
href="/2003/g/data-view">http://www.w3.org/2003/g/data-view</a></tt> profile
|
|
indicates that <em>RDF statements that result from transformation of the HTML
|
|
document to RDF by designated algorithms are part of the document's
|
|
meaning.</em></p>
|
|
|
|
<p>In this profile, the <tt>transformation</tt> link relationship relates a
|
|
document to an algorithm for gleaning resource descriptions from the
|
|
dialect the document is written in.</p>
|
|
|
|
<div class="illustration">
|
|
<img src="processing.png" alt="diagram: link to transformation" /><br />
|
|
Decoding HTML metadata to RDF <br />
|
|
<small>(<a href="processing.svg">svg</a>)</small>
|
|
|
|
</div>
|
|
|
|
<p class="issue">@@@ Should we namespace-qualify the token used in
<code>rel</code>? Cf. <a
|
|
href="http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2004Jan/0005.html">Profiles
|
|
attribute: A format to be defined</a> Karl Dubost 15 Jan 2004.</p>
|
|
|
|
<p>For example:</p>
|
|
<pre class="example">&lt;html xmlns="http://www.w3.org/1999/xhtml"&gt;
  &lt;head profile="http://www.w3.org/2003/g/data-view"&gt;
    &lt;title&gt;Some Document&lt;/title&gt;
    &lt;link rel="transformation"
      href="http://www.w3.org/2000/06/dc-extract/dc-extract.xsl" /&gt;
    &lt;meta name="DC.Subject"
      content="ADAM; Simple Search; Index+; prototype" /&gt;
    ...
  &lt;/head&gt;
  ...
&lt;/html&gt;</pre>
|
|
|
|
<p>The following RDF statement is part of the meaning of this document:</p>
|
|
<pre class="example">&lt;rdf:RDF
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
&gt;
  &lt;rdf:Description rdf:about=""&gt;
    &lt;dc:subject&gt;ADAM; Simple Search; Index+; prototype&lt;/dc:subject&gt;
  &lt;/rdf:Description&gt;
&lt;/rdf:RDF&gt;</pre>
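<p>To make the mapping concrete, the following is a minimal sketch of a
transformation that would yield the statement above from the example
document. It is an illustration only, not the actual
<code>dc-extract.xsl</code>: it handles just the <code>DC.Subject</code>
name, and the <code>h</code> prefix is an arbitrary choice made here.</p>
<pre class="example">&lt;xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:h="http://www.w3.org/1999/xhtml"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    exclude-result-prefixes="h"&gt;

  &lt;xsl:output method="xml" indent="yes" /&gt;

  &lt;!-- Emit one description of the source document itself (rdf:about="") --&gt;
  &lt;xsl:template match="/"&gt;
    &lt;rdf:RDF&gt;
      &lt;rdf:Description rdf:about=""&gt;
        &lt;xsl:apply-templates
            select="h:html/h:head/h:meta[@name='DC.Subject']" /&gt;
      &lt;/rdf:Description&gt;
    &lt;/rdf:RDF&gt;
  &lt;/xsl:template&gt;

  &lt;!-- Each selected DC.Subject meta element becomes a dc:subject property --&gt;
  &lt;xsl:template match="h:meta"&gt;
    &lt;dc:subject&gt;&lt;xsl:value-of select="@content" /&gt;&lt;/dc:subject&gt;
  &lt;/xsl:template&gt;

&lt;/xsl:stylesheet&gt;</pre>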
|
|
|
|
<p>Transformation algorithms <b>should</b> be represented in XSLT. While
|
|
JavaScript, C, or any other programming language technically expresses the
|
|
relevant information, XSLT is specifically designed to express XML to XML
|
|
transformations and has some good safety characteristics. Other
|
|
representations <b>may</b> be used by prior agreement of all concerned
|
|
parties.</p>
|
|
|
|
<p>Transformation algorithms <b>should</b> be well-defined functions whose
|
|
only input is the source document. The use of the XSLT
|
|
<code>document()</code> function to incorporate other data at transformation
|
|
time is an <b>error</b>.</p>
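<p>As an illustration of this constraint, the hypothetical template below
reads a second resource at transformation time, so its output is no longer
a function of the source document alone. The URL is a made-up example.</p>
<pre class="example">&lt;!-- Error under this profile: pulls in data from outside the source document --&gt;
&lt;xsl:template match="/"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&gt;
  &lt;xsl:copy-of select="document('http://www.example.org/extra.rdf')" /&gt;
&lt;/xsl:template&gt;</pre>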
|
|
|
|
<p class="issue">Limitations on <code>xsl:import</code>?</p>
|
|
|
|
<p>Note that an XHTML document may conform to a number of dialects
|
|
simultaneously and link to more than one decoding algorithm. For example, the
|
|
fictional <a
|
|
href="http://www.w3.org/2004/lambda/Sites/index.html">Joe
|
|
Lambda's Homepage</a> demonstrates a mixture of Dublin Core, Creative
|
|
Commons, RSS, FOAF, and geoURL dialects.</p>
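<p>One plausible way to link to more than one decoding algorithm is to
repeat the <code>link</code> element, as sketched below. This is an
assumption made for illustration: the second style sheet URL is invented
and does not refer to a published transformation.</p>
<pre class="example">&lt;head profile="http://www.w3.org/2003/g/data-view"&gt;
  &lt;title&gt;Some Document&lt;/title&gt;
  &lt;!-- one link element per decoding algorithm --&gt;
  &lt;link rel="transformation"
    href="http://www.w3.org/2000/06/dc-extract/dc-extract.xsl" /&gt;
  &lt;link rel="transformation"
    href="http://www.example.org/2004/01/foaf-extract.xsl" /&gt;
  ...
&lt;/head&gt;</pre>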
|
|
</div>
|
|
|
|
<div>
|
|
<h2 id="grddl-xml"><span class="gen">3.</span> The GRDDL attribute in XML</h2>
|
|
|
|
<p>The GRDDL profile mechanism is a special case of GRDDL designed to fit
|
|
within the syntax of XHTML 1.0. The general form of GRDDL is an attribute
|
|
suitable for use with a wide variety of XML dialects.</p>
|
|
|
|
<p>Use of the <code>interpreter</code> attribute in the
|
|
<code>http://www.w3.org/2003/g/data-view#</code> namespace on the root
|
|
element of an XML document indicates that <em>RDF statements that result from
|
|
transformation of the XML document to RDF by designated algorithms are part
|
|
of the document's meaning.</em></p>
|
|
|
|
<p>The value of the <code>data-view:interpreter</code> attribute designates a
|
|
list of algorithms by URI reference. <span class="issue">@@@IRI
|
|
reference?</span></p>
|
|
<p>For example: <em class="issue">update to P3Q example?</em></p>
|
|
<pre class="example"><code>&lt;svg xmlns="http://www.w3.org/2000/svg"
    xmlns:data-view="http://www.w3.org/2003/g/data-view#"
    data-view:interpreter="http://www.example.org/2004/01/svg2dc.xsl"
    width="4cm" height="8cm"
    version="1.1" baseProfile="tiny" &gt;</code></pre>
|
|
</div>
|
|
|
|
<div>
|
|
<h2 id="ns-bind"><span class="gen">4.</span> XML Namespaces and embedded RDF</h2>
|
|
|
|
<p>The RDF property
|
|
<code>http://www.w3.org/2003/g/data-view#namespaceTransformation</code>
|
|
links an XML Namespace to an interpreter that may be applied to any document
|
|
which has its root element in that namespace, such that the output of the
|
|
interpreter will be an RDF/XML form of some (or all) of the information
|
|
content of the document.</p>
|
|
|
|
<p>For instance, given the XML Namespace
|
|
<code>http://www.example.net/fooML</code>,</p>
|
|
<div class="example">
|
|
<pre><code>&lt;rdf:Description rdf:about="http://www.example.net/fooML"&gt;
  &lt;namespaceTransformation xmlns='http://www.w3.org/2003/g/data-view#'
      rdf:resource='http://www.example.net/fooML2rdf.xsl' /&gt;
&lt;/rdf:Description&gt;</code></pre>
|
|
</div>
|
|
<p>asserts that if an XML document has a root element in the
|
|
<code>http://www.example.net/fooML</code> namespace, and it is run through
|
|
the XSLT style sheet <code>http://www.example.net/fooML2rdf.xsl</code>,
then the result will be valid RDF/XML whose information can be
considered to have been expressed by the document.</p>
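<p>For illustration, a document covered by this assertion could look like
the following. The element names <code>foo</code> and <code>bar</code> are
invented here; only the namespace name comes from the example above. The
document itself names no transformation: the interpreter is associated with
its namespace.</p>
<pre class="example">&lt;foo xmlns="http://www.example.net/fooML"&gt;
  &lt;bar&gt;...&lt;/bar&gt;
&lt;/foo&gt;</pre>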
|
|
</div>
|
|
|
|
<div>
|
|
<h2 id="sec"><span class="gen">5.</span> Security considerations</h2>
|
|
|
|
<p><a href="http://www.faqs.org/rfcs/rfc2046.html">RFC 2046</a>, in
|
|
section 9. Security Considerations says:</p>
|
|
|
|
<blockquote>
|
|
<p>Implementors should pay special attention to the
|
|
security implications of any media types that can cause the remote
|
|
execution of any actions in the recipient's environment. In such
|
|
cases, the discussion of the "application/postscript" type may serve
|
|
as a model for considering other media types with remote execution
|
|
capabilities.</p>
|
|
</blockquote>
|
|
|
|
<p>Given the expressive power of XSLT, and the possibility of accessing external
resources from an XSLT style sheet (e.g. through the <code>document</code>
|
|
function or the <code>xsl:import</code> mechanism), implementors should take
|
|
the appropriate measures to prevent malicious usage of this mechanism.</p>
|
|
</div>
|
|
|
|
<div>
|
|
<h2 id="changes"><em>Change History</em></h2>
|
|
|
|
<p>The <a href="http://www.w3.org/2003/11/rdf-in-xhtml-proposal">Nov
|
|
2003 draft</a> is a predecessor of this spec.</p>
|
|
|
|
<p>An <a href="http://www.w3.org/2004/01/rdxh/spec">editor's working draft</a> is also available; v1.11 was announced in <a
|
|
href="http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2004Jan/0011.html">a
|
|
message of 16 Jan</a>.</p>
|
|
|
|
</div>
|
|
</body>
|
|
</html>
|