<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content=
"HTML Tidy for Mac OS X (vers 31 October 2006 - Apple Inc. build 13), see www.w3.org" />
<title>
Interpretation properties -- Ideas about Web architecture
</title>
<link rel="Stylesheet" href="di.css" type="text/css" />
<meta http-equiv="Content-Type" content=
"text/html; charset=us-ascii" />
</head>
<body bgcolor="#DDFFDD" text="#000000" lang="en" xml:lang="en">
<address>
Tim Berners-Lee<br />
Date: 1998, last change: $Date: 2009/08/27 21:38:07 $<br />
Status: personal view only. Editing status: first draft.
</address>
<p>
<a href="./">Up to Design Issues</a>
</p>
<h3>
Design Issues - Ideas about Web Architecture
</h3>
<p>
<em>This page assumes an imaginary namespace referred to as
play: which is used only for the sake of example. The reader
is assumed to be able to guess its specification.</em>
</p>
<hr />
<h1>
<a name="Interpreta" id="Interpreta">Interpretation
properties</a>
</h1>
<p>
<em>Abstract: Natural languages, encodings, and similar
relationships between one abstract thing and another are
best modeled in RDF as properties. I call these
interpretation properties, in that they express the
relationship between one value and that value interpreted (or
processed in the imagination) in a specific way.</em>
</p>
<h2>
<a name="problem" id="problem">The problem of annotating
natural language</a>
</h2>
<p>
There has to date (2000/02) been a consistent muddle in the
RDF community about how to represent the natural language of
a string. In XML it is simple, because you never have to
explain exactly what you mean. You can mark up a span of text
and declare it to be French:
</p>
<blockquote>
<p>
His name was &lt;html:span
xml:lang="fr"&gt;Jean-Fran&ccedil;ois&lt;/html:span&gt;
but we called him Dan.
</p>
</blockquote>
<p>
Under pressure from the XML community to conform to XML
practice, the RDF spec adopted this attribute,
<code>xml:lang</code>, as the official RDF way to
record that a string was in a given language. This was a
mistake, as the attribute was thrown into the syntax but not
into the model which the spec was defining.
</p>
<p>
Consider the <a href="Identity.html#this">example</a> in the
<a href="Identity.html">identity section</a>:
</p>
<pre>
&lt;rdf:Description&gt;
  &lt;rdf:type resource="http://www.people.org/types#person" /&gt;
  &lt;play:name&gt;Ora Yrj&ouml; Uolevi Lassila&lt;/play:name&gt;
  &lt;play:mailbox resource="mailto:ora.lassila@research.nokia.com" /&gt;
  &lt;play:homePage resource="http://www.w3.org/People/Lassila" /&gt;
&lt;/rdf:Description&gt;
</pre>
<p>
Now that represents five nodes in the RDF graph: the
anonymous node for Ora himself (who has no web address), and
the four nodes at the other ends of the arcs specifying that
this thing is of type person and has the given common name,
email address and home page.
</p>
<p>
Where do we add the language property? Of course we could add
a language attribute to the XML, but that would be lost on
translation into the RDF model: no triple would result.
</p>
<h3>
<a name="Attempt2" id="Attempt2">Attempt 1: a property of the
person?</a>
</h3>
<p>
Many specifications, such as iCalendar (see my notes@link),
would add another property to the definition of the person.
</p>
<pre>
&lt;rdf:Description&gt;
  &lt;rdf:type resource="http://www.people.org/types#person" /&gt;
  &lt;play:name&gt;Ora Yrj&ouml; Uolevi Lassila&lt;/play:name&gt;
  &lt;play:namelang&gt;fi&lt;/play:namelang&gt;
  &lt;play:mailbox&gt;ora.lassila@research.nokia.com&lt;/play:mailbox&gt;
  &lt;play:homePage&gt;http://www.w3.org/People/Lassila/&lt;/play:homePage&gt;
&lt;/rdf:Description&gt;
</pre>
<p>
Here, the property <em>play:namelang</em> is defined to mean
"A has a name which is in natural language B". In the
iCalendar spec, the definition is more complex, in that the
<em>lang</em> property is in some cases the language of a
name and in other cases that of the object's description.
This is a modeling muddle. The nice thing about doing it this
way is that the structure is kept flat, and pre-XML systems
such as RFC 822 (email etc.) headers have a syntax which can
only cope with a flat structure.
</p>
<p>
There are many drawbacks to this muddle. Ora may have two
names, one in Finnish and another in English, and the model
cannot express that. Because the attribute is
apparently tied to the person and not obviously attached to
the name, automatic processing of such a thing is ruled out.
Clearly, the structure does not reflect the facts of the
case.
</p>
<h3>
<a name="Attempt1" id="Attempt1">Attempt 2: a property of the
string?</a>
</h3>
<p>
The second attempt is to make a graph which expresses the
language as a property of the string itself. Clearly, "Ora
Yrj&ouml; Uolevi Lassila" is Finnish, is it not? Yes, Ora is
Finnish, but that is different. What we need to say is that
the string is in the Finnish language. The problem, then,
becomes that RDF does not allow literal text to be the
subject of a statement. Never mind: RDF in fact provides the
<em>rdf:value</em> property, which allows us to specify that a
node is really text, but say other things about it too. This
is done by introducing an intermediate node.
</p>
<pre>
&lt;rdf:Description&gt;
  &lt;rdf:type resource="http://www.people.org/types#person" /&gt;
  &lt;play:name rdf:parseType="Resource"&gt;
    &lt;rdf:value&gt;Ora Yrj&ouml; Uolevi Lassila&lt;/rdf:value&gt;
    &lt;play:lang&gt;fi&lt;/play:lang&gt;
  &lt;/play:name&gt;
  &lt;play:mailbox resource="mailto:ora.lassila@research.nokia.com" /&gt;
  &lt;play:homePage resource="http://www.w3.org/People/Lassila" /&gt;
&lt;/rdf:Description&gt;
</pre>
<p>
There we have it, and in an RDF graph at least it looks very
pretty. And indeed, we could work with this, apart from the
fact that we have made another modeling error. It is not true
that the language is a property of the text string. After
all, is the string "Tim" English (short for Timothy) or
French (short for Timoth&eacute;)? I needn't add a
long list of text strings which can be interpreted as one
language or as another. A system which asserted
that the string itself was fundamentally English would simply
be misrepresenting the case.
</p>
<h3>
<a name="Attempt" id="Attempt">Attempt 3: a relationship
between them.</a>
</h3>
<p>
In fact, the situation is that Ora's name is a natural
language object, which is the interpretation according to
Finnish of the string "Ora Yrj&ouml; Uolevi Lassila". In
other words, Finnish, the language, is the relationship between
Ora's name and the string. In RDF, we model a binary
relationship with a property.
</p>
<pre>
&lt;rdf:Description&gt;
  &lt;rdf:type resource="http://www.people.org/types#person" /&gt;
  &lt;play:name rdf:parseType="Resource"&gt;
    &lt;lang:fi&gt;Ora Yrj&ouml; Uolevi Lassila&lt;/lang:fi&gt;
  &lt;/play:name&gt;
  &lt;play:mailbox&gt;ora.lassila@research.nokia.com&lt;/play:mailbox&gt;
  &lt;play:homePage&gt;http://www.w3.org/People/Lassila/&lt;/play:homePage&gt;
&lt;/rdf:Description&gt;
</pre>
<p>
This works much better. Ora has a name which is the Finnish
"Ora". This allows an RDF system to create a node for that
string, with a Finnish link from the concept of Ora the
person, maybe a Danish link from the concept of the currency,
an Old English link from the concept of weight (1/15
pound), not to mention a Latin link from the concept of the
shore.
</p>
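<p>
To make the shape of Attempt 3 concrete, the following Python
sketch stores the graph as plain (subject, property, object)
triples rather than using any RDF library. The blank-node labels,
the <code>lang:la</code> property and the helper functions are
hypothetical, chosen only for illustration. It shows that the
language sits on the arc: the same string carries a Finnish
reading from one node and a Latin reading from another, while
neither the person nor the string has a language of its own.
</p>
<pre>
```python
# Attempt 3 as plain triples.  Node labels such as "_:oraName" are
# hypothetical; lang:fi and lang:la follow the imaginary lang: namespace.
graph = [
    ("_:ora", "play:name", "_:oraName"),
    ("_:oraName", "lang:fi", "Ora"),  # Ora's name: Finnish reading of "Ora"
    ("_:shore", "lang:la", "Ora"),    # the shore: Latin reading of "Ora"
]


def readings(graph, text):
    """Map each language property to the node it links to the string text."""
    return {p: s for (s, p, o) in graph if o == text and p.startswith("lang:")}


def names(graph, person):
    """Return (language property, string) pairs for each name of a person."""
    name_nodes = [o for (s, p, o) in graph if s == person and p == "play:name"]
    return [(p, o) for (s, p, o) in graph
            if s in name_nodes and p.startswith("lang:")]


print(readings(graph, "Ora"))  # {'lang:fi': '_:oraName', 'lang:la': '_:shore'}
print(names(graph, "_:ora"))   # [('lang:fi', 'Ora')]
```
</pre>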
<p>
One problem we may feel is that we would like the language to
be a string, so that we can reference the ISO spec for all such
language codes; but there is of course no reason why the spec
for the lang: space should not reference the same ISO spec.
</p>
<p>
Another problem we might feel is that it is reasonable for
play:name to expect a string, and in most cases it may
get a string: what is the poor system supposed to do in order
to accommodate finding a natural language object in place of
a string? I guess making a class which includes all strings
and all natural language objects is the best way to go. Any
use of strings which did not also allow such natural language
objects would make life much more difficult for multilingual
software, so this is a serious problem.
</p>
<p>
<em>[[This leads us on to another interesting question of
packaging in RDF. There is a requirement in XML packaging and
in email packaging, and it seems quite similarly in RDF, that
when you ask me for something of type X I must be able to
give you something of type package which happens to include
the X you asked for and also some information for your
edification. But that is another story. @@@ elaborate and
define properties or syntax @@@]]</em>
</p>
<p>
What is really important is that we are using the ability of
RDF to talk about abstract things, just as when we identified
people by the resources they were associated with, but
avoided pretending that any person had a definitive URI.
</p>
<h2 id="Interpreta1">
Datatypes as interpretation properties<sup><a href="#L380"
name="L382" id="L382">*</a></sup>
</h2>
<p>
By <em>datatypes</em> I mean here the atomic types of a
programming language, or, for example, XML Datatypes
(XML Schema Part 2). Defining datatypes involves defining
constraints on an input string (for example specifying what a
valid date is as a regular expression) and specifying the
mathematical abstract individuals which instances of a type
represent. One can model the relationship between the string
representation and the abstract value using a property.
</p>
<table border="0" width="100%">
<tbody>
<tr>
<td valign="middle">
<pre>
&lt;rdf:Description about="#myshoe"&gt;
  &lt;shoe:size&gt;10&lt;/shoe:size&gt;
&lt;/rdf:Description&gt;
</pre>
</td>
<td valign="middle">
<span class="N3">&lt;#myshoe&gt; shoe:size "10".</span>
</td>
</tr>
</tbody>
</table>
<p>
This doesn't tell us what it is 10 of. We could go through
life without any model of types: we could define a shoe size
as being a decimal string for a number of inches. There are
many questions and tradeoffs which datatype designers face,
for example:
</p>
<ul>
<li>Can you tell the type of a value from the string
representation in every case? (E.g., 1.4e4 vs 1.4d4 for
precision.)
</li>
<li>Are the values of different datatypes distinct? (E.g., is 1
= 1.0?)
</li>
<li>Is the set of datatypes extensible? (E.g., can you add
complex numbers or prime numbers?)
</li>
<li>Does representation equality imply value equality?
</li>
<li>Does value equality imply representation equality? (Is
the only allowed representation the canonical one?)
</li>
</ul>
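<p>
A few of these questions can be seen in miniature by writing
interpretation properties as ordinary functions from a string to
a value. In the Python sketch below the function names are
hypothetical, and Python's own numeric rules stand in for one
possible set of design answers; they are not the answers given by
XML Schema.
</p>
<pre>
```python
from decimal import Decimal


# Hypothetical interpretation properties: each maps a lexical string
# to an abstract value.
def decimal_integer(text):
    return int(text, 10)


def hex_integer(text):
    return int(text, 16)


def decimal_number(text):
    return Decimal(text)


# Same string, different interpretation properties, different values.
print(decimal_integer("10"), hex_integer("10"))  # 10 16

# Different strings, same value: value equality does not imply
# representation equality unless a canonical form is required.
print(decimal_integer("10") == decimal_integer("010"))  # True

# Are values of different datatypes distinct?  Python answers "no": the
# integer 1 equals the decimal 1.0, though Decimal keeps the trailing zero.
print(decimal_integer("1") == decimal_number("1.0"))  # True
print(decimal_number("1.0"))  # 1.0
```
</pre>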
<p>
It would be nice to be able to model these questions in
general in the semantic web, in order to describe the
properties of data in arbitrary systems. We can introduce
interpretation properties which link a string to its decimal
interpretation as a number, or to a length including units.
The problem is that the RDF graph which most folks use is the
one above: the object of shoe:size is "10".
</p>
<p>
The simplistic system, corresponding exactly to <a href=
"#Attempt2">Attempt 1 above</a>, is to declare that shoe:size
is of class integer. This implies (we then say) that any
value is a decimal string. Given the string and the type we
can conclude the abstract value, the integer ten. This works.
It is the system used by XML Datatypes, whose answers to the
questions above are, as I understand it, [No, Yes, Yes, Yes,
No]. A snag is that you can't compare two values unless you
know the datatypes.
</p>
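<p>
A minimal Python sketch of this declared-datatype approach
follows. The range table, the second vocabulary
<code>other:size</code> and the shoe <code>#yourshoe</code> are
hypothetical. The graph holds only strings, and the two sizes can
be compared as numbers only after looking up each property's
declared datatype, which is the snag just described.
</p>
<pre>
```python
from decimal import Decimal

# Hypothetical range declarations: each property names the datatype of
# its values, written here as the Python type that interprets them.
RANGE = {
    "shoe:size": int,        # declared to be of class integer
    "other:size": Decimal,   # a hypothetical vocabulary using decimals
}

# The graph itself holds only strings.
mine = ("#myshoe", "shoe:size", "10")
yours = ("#yourshoe", "other:size", "10.0")


def value(prop, literal):
    """Recover the abstract value of a literal from its property's datatype."""
    return RANGE[prop](literal)


print(mine[2] == yours[2])  # False: the strings differ
print(value(mine[1], mine[2]) == value(yours[1], yours[2]))  # True: both are ten
```
</pre>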
<p>
To model the representation explicitly in the RDF it seems
you have to introduce another node and arc, which is a pain.
</p>
<table border="0" width="100%">
<tbody>
<tr>
<td valign="middle">
<pre>
&lt;rdf:Description about="#myshoe"&gt;
  &lt;shoe:size parseType="Resource"&gt;
    &lt;rdf:value&gt;10&lt;/rdf:value&gt;
  &lt;/shoe:size&gt;
&lt;/rdf:Description&gt;
</pre>
</td>
<td valign="middle">
<span class="N3">&lt;#myshoe&gt; shoe:size [ rdf:value
"10" ].</span>
</td>
</tr>
</tbody>
</table>
<p>
We can then define rdf:value to express that there is some
datatype relation which relates the size of the shoe to "10".
All datatype relations are subProperties of rdf:value in
this system. Once it is in that form, the datatype information
can be added to the graph. You have the choice of asserting
that the object is of a given class, and deducing that the
datatype relation must be a certain one. You can nest
interpretation properties, interpreting a string as a
decimal and then as a length in feet. But this is not
possible without that extra node. One wonders about radically
changing the way all RDF is parsed into triples, so as to
introduce the extra abstract node for every literal:
frightful. One wonders about declaring "10" to be a generic
resource, an abstraction associated with the set of all
things for which "10" is a representation under some datatype
relation. This is frightful too: you no longer have "equals"
in the sense you used to have it.
</p>
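<p>
Nesting interpretation properties amounts to composing functions.
The Python sketch below assumes hypothetical <code>decimal</code>
and <code>feet</code> interpretation functions, and represents a
length as an exact number of metres (one international foot is
0.3048 m by definition); that choice of representation is an
assumption of the example, not part of the design.
</p>
<pre>
```python
from fractions import Fraction

# One international foot is exactly 0.3048 m.
METRES_PER_FOOT = Fraction(3048, 10000)


def decimal(text):
    """Interpretation property: a decimal string read as an exact number."""
    return Fraction(text)


def feet(number):
    """Interpretation property: a number read as a length in feet (held in metres)."""
    return number * METRES_PER_FOOT


def interpret(literal, *steps):
    """Apply interpretation properties in turn, starting from the raw string."""
    value = literal
    for step in steps:
        value = step(value)
    return value


length = interpret("10", decimal, feet)
print(length, float(length))  # 381/125 3.048
```
</pre>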
<p>
Instead of adding an extra arc in series with the original,
we can leave all properties such as shoe:size as being rather
vague relations between the shoe and some string
representation, and then use a functional property (say
<code>rdf:actual</code>) to relate shoe:size to a (more
useful) property whose object is a typed abstract value.
</p>
<pre>
{ &lt;#myshoe&gt; shoe:size "10" } log:implies
{ &lt;#myshoe&gt; [is rdf:actual of shoe:size] [rdf:value "10"] } .
</pre>
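<p>
Procedurally, the rule above derives a typed statement from the
vague, string-valued one. In this Python sketch
<code>shoe:actualSize</code> is a hypothetical name for the
property that is the <code>rdf:actual</code> of
<code>shoe:size</code>, and the <code>[rdf:value "10"]</code> node
is collapsed into the integer it stands for.
</p>
<pre>
```python
# Hypothetical table: each vague property maps to the property that is
# its rdf:actual, together with the interpretation yielding the typed value.
ACTUAL = {
    "shoe:size": ("shoe:actualSize", int),
}


def apply_actual(triples):
    """Return the typed triples implied by the string-valued ones.

    The original triples are left untouched, as with log:implies, which
    only adds conclusions.
    """
    derived = []
    for subject, prop, obj in triples:
        if prop in ACTUAL:
            actual_prop, interpretation = ACTUAL[prop]
            derived.append((subject, actual_prop, interpretation(obj)))
    return derived


graph = [("#myshoe", "shoe:size", "10")]
print(apply_actual(graph))  # [('#myshoe', 'shoe:actualSize', 10)]
```
</pre>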
<p>
<em>@@@ No clear way forward for describing datatypes in
RDF/DAML (2001/1) @@</em>
</p>
<h2>
<a name="More" id="More">More examples</a>
</h2>
<p>
"Interpretation properties" is the name I have arbitrarily
chosen for this sort of use. I am not sure whether it is a
good term, but I want to encourage their use. Base 64
encoding is another example. It comes up everywhere, but XML
Digital Signature is one place.
</p>
<pre>
&lt;rdf:Description&gt;
  &lt;play:name parseType="Resource"&gt;
    &lt;lang:fi parseType="Resource"&gt;
      &lt;enc:base64&gt;jksdfhher78f8e47fy87eysady87f7sea&lt;/enc:base64&gt;
    &lt;/lang:fi&gt;
  &lt;/play:name&gt;
&lt;/rdf:Description&gt;
</pre>
<p>
Another example is type coercion. Suppose there is a need to
take something of type datetime and use it as a date:
</p>
<pre>
&lt;rdf:Description&gt;
  &lt;play:event parseType="Resource"&gt;
    &lt;play:start parseType="Resource"&gt;
      &lt;play:date&gt;2000-01-31 12:00ET&lt;/play:date&gt;
    &lt;/play:start&gt;
    &lt;play:summary&gt;The Bryn Poeth Uchaf Folk festival&lt;/play:summary&gt;
  &lt;/play:event&gt;
&lt;/rdf:Description&gt;
</pre>
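<p>
Read as an interpretation, the coercion takes the datetime value
and yields its calendar date. The Python sketch below (Python 3.9
or later, for <code>str.removesuffix</code>) is illustrative only;
in particular it treats the "ET" suffix as a fixed UTC-5 offset,
an assumption that holds for US Eastern Time in January but
ignores daylight saving in general.
</p>
<pre>
```python
from datetime import datetime, timedelta, timezone

# Assumption: "ET" is treated as a fixed UTC-5 offset.
EASTERN = timezone(timedelta(hours=-5))


def datetime_value(text):
    """Interpret a string such as '2000-01-31 12:00ET' as an aware datetime."""
    naive = datetime.strptime(text.removesuffix("ET"), "%Y-%m-%d %H:%M")
    return naive.replace(tzinfo=EASTERN)


def as_date(moment):
    """Coerce a datetime to a date: its calendar date in its own time zone."""
    return moment.date()


start = datetime_value("2000-01-31 12:00ET")
print(start.isoformat())  # 2000-01-31T12:00:00-05:00
print(as_date(start))     # 2000-01-31
```
</pre>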
<p>
Such properties often have uniqueness and/or unambiguity
properties. <em>enc:base64</em>, for example, is clearly a
reversible transformation. It relates two strings, one
printable and the other a byte string with no other
constraints. The byte string could not in general be
represented in an XML document. The definition of
<em>enc:base64</em> is that A, when encoded in base 64, yields
B. This allows any processor, given B, to derive A. The
specification of the encoding namespace (here referred to by
the prefix <em>enc:</em>) could be that any conforming processor
must be able to accept a base64 encoding of a string in any
place that a string is acceptable.
</p>
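<p>
The reversibility, and the conformance rule suggested above, can
be sketched with Python's standard <code>base64</code> module. The
function names and the <code>("enc:base64", B)</code> tuple used to
mark an encoded value are hypothetical conventions for this example
only.
</p>
<pre>
```python
import base64


def enc_base64(a):
    """A enc:base64 B: the byte string A encodes to the printable string B."""
    return base64.b64encode(a).decode("ascii")


def dec_base64(b):
    """The reverse direction: given B, any processor can recover A."""
    return base64.b64decode(b, validate=True)


def accept_string(value):
    """Accept a plain string, or a hypothetical ("enc:base64", B) wrapper."""
    if isinstance(value, tuple) and value[0] == "enc:base64":
        return dec_base64(value[1]).decode("utf-8")
    return value


raw = bytes([0, 255, 60, 10])          # arbitrary bytes, not all legal in XML
print(enc_base64(raw))                 # AP88Cg==
print(dec_base64("AP88Cg==") == raw)   # True
print(accept_string(("enc:base64", enc_base64("Ora".encode("utf-8")))))  # Ora
```
</pre>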
<p>
Interpretation properties make it clear what is going on. For
example,
</p>
<pre>
&lt;rdf:Description about="http://www.w3.org/"&gt;
  &lt;play:xml-canonicalized parseType="Resource"&gt;
    &lt;enc:hash-sha-1 parseType="Resource"&gt;
      &lt;enc:base64&gt;jd8734djr08347jyd4&lt;/enc:base64&gt;
    &lt;/enc:hash-sha-1&gt;
  &lt;/play:xml-canonicalized&gt;
&lt;/rdf:Description&gt;
</pre>
<p>
clearly makes a statement, using properties quite
independently defined for the various processes, that the
base64 encoding of the SHA-1 hash of the canonicalized form
of the W3C home page is jd8734djr08347jyd4. Compare this
with the HTTP situation, in which the headers cannot be
nested, and the encodings, compression and other things
applied to the body are mentioned as unordered annotations,
so the spec has to provide a way of drawing the right
conclusion about which happened in what order.
</p>
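<p>
Because each arc names one process, the nesting fixes the order in
which they apply, outermost first. A rough Python analogue using
the standard <code>hashlib</code> and <code>base64</code> modules
follows. Real XML canonicalization is replaced by a hypothetical
stand-in that only normalizes line endings, so its digests are not
those of any real canonicalizer, and the page content is a
placeholder.
</p>
<pre>
```python
import base64
import hashlib


def xml_canonicalized(document):
    """Hypothetical stand-in for XML canonicalization: normalise CRLF to LF.

    A real implementation would follow the W3C Canonical XML rules.
    """
    return document.replace(b"\r\n", b"\n")


def hash_sha1(data):
    """The SHA-1 digest of a byte string (20 raw bytes)."""
    return hashlib.sha1(data).digest()


def base64_text(data):
    """The base64 encoding of a byte string, as printable text."""
    return base64.b64encode(data).decode("ascii")


page = b"Hello, Web\r\n"  # placeholder content, not the real W3C home page

# Outermost property first: canonicalize, then hash, then encode.
claimed = base64_text(hash_sha1(xml_canonicalized(page)))

# Leaving out a step gives a different answer.
skipped = base64_text(hash_sha1(page))
print(len(claimed), claimed != skipped)  # 28 True
```
</pre>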
<h2>
Units of Measure (2006)
</h2>
<p>
This pattern applies very well to units of measure.
</p>
<p>
See, for example, a simple ontology of units of measure at <a href=
"http://www.w3.org/2007/ont/unit">http://www.w3.org/2007/ont/unit</a>.
</p>
<h2>
<a name="Conclusion" id="Conclusion">Conclusion</a>
</h2>
<p>
Representing the interpretation of one string as an abstract
thing can be done easily with RDF properties. This helps make
a clean, accurate model. However, using the concept for
datatypes in RDF is incompatible with RDF as we know it
today.
</p>
<hr />
<p>
See also:
</p>
<ul>
<li>
<a href="Identity.html">Expressing the identity of real
things</a>
</li>
</ul>
<p>
<em>@@@Needs circle-and-arrow pictures for each attempt.</em>
</p>
<p>
<a name="L380" href="#L382" id="L380">Note.</a> This section
followed a discussion about "<em><a href=
"/2001/01/ct24">Using XML Schema Datatypes in RDF and
DAML+OIL</a></em>" with DWC.
</p>
<p>
<a href="mailto:gruber@ksl.stanford.edu">Thomas R. Gruber</a>
and Gregory R. Olsen, KSL, <a href=
"http://www-ksl.stanford.edu/knowledge-sharing/papers/engmath.html">
"An Ontology for Engineering Mathematics"</a>, in Jon Doyle,
Piero Torasso, &amp; Erik Sandewall, Eds., <em>Fourth
International Conference on Principles of Knowledge
Representation and Reasoning</em>, Gustav Stresemann
Institut, Bonn, Germany, Morgan Kaufmann, 1994. <em>A non-RDF
but thorough treatment, including units of measure as scalar
quantities.</em>
</p>
<p>
Compare with <a href=
"http://icosym-nt.cvut.cz/kifb/en/ont/sumo-units-of-measure.html">
SUMO units of measure</a>, which seems to have units as
instances, and multipliers such as kilo, giga, etc. as
functions.
</p>
<p>
A little off-topic: on linear and area measure, John Baez's
<a href="http://www.math.ucr.edu/home/baez/inches.html">"Why
are there 63360 inches per mile?"</a> is good reading.
</p>
<hr />
<p>
<a href="Overview.html">Up to Design Issues</a>
</p>
<p>
<a href="../People/Berners-Lee">Tim BL</a>
</p>
<p>
(names of certain characters may have been misspelled to
protect the innocent ;-)
</p>
</body>
</html>