

								<html xmlns="http://www.w3.org/1999/xhtml">

								  <head>

								    <meta name="generator" content=

								    "HTML Tidy for Mac OS X (vers 31 October 2006 - Apple Inc. build 13), see www.w3.org" />

								    <meta http-equiv="Content-Type" content="text/html" />

								    <title>

								      Semantic Web: Why RDF is more than XML

								    </title>

								    <link href="di.css" rel="stylesheet" type="text/css" />

								  </head>

								  <body bgcolor="#DDFFDD" text="#000000">

								    <address>

								      Tim Berners-Lee

								      <p>

								        <small>Date: September 1998. Last modified: $Date:

								        1998/10/14 20:17:13 $</small>

								      </p>

								      <p>

								        Status: An attempt to explain the difference between the

								        XML and RDF models. Editing status: Draft. Comments

								        welcome!

								      </p>

								    </address>

								    <p>

								      <a href="Overview.html">Up to Design Issues</a>

								    </p>

								    <hr />

								    <h1>

								      Why the RDF model is different from the XML model

								    </h1>

								    <p>

								      This note is an attempt to answer the question, "Why should I

								      use RDF - why not just XML?". This question has been around

								      ever since RDF started. At the W3C Query

								      Language workshop, there was a clear difference of view

								      between those who wanted to query documents and those who

								      wanted to extract the "meaning" in some form and query that.

								      This is typical. I wrote this note in a frustrated attempt to

								      explain what the RDF model was for those who thought in terms

								      of the XML model. I later listened to those who thought in

								      terms of the XML model, and tried to write it the other way

								      around in <a href="XML-Semantics.html">another note</a>. This

								      note assumes the XML data model in all its complexity,

								      and the RDF syntax as in RDF Model and Syntax, in all its

								      complexity. It doesn't try to map one directly onto the other

								      -- it expresses the RDF model using XML.

								    </p>

								    <p>

								      Let me take as an example a single RDF assertion. Let's try

								      "The author of the <i>page</i> is <i>Ora</i>". This is

								      traditional. In RDF this is a triple

								    </p>

								    <pre>

								triple(author, page, Ora)

								</pre>

								    <p>

								      which you can think of as represented by the diagram

								    </p>

								    <p align="center">

								      <img src="diagrams/aac.gif" width="265" height="73" alt=

								      "page ---has author---&gt; Ora" border="0" />

								    </p>

								    <p>

								      How would this information typically be represented in

								      XML?

								    </p>

								    <pre>

								&lt;author&gt;

								     &lt;uri&gt;page&lt;/uri&gt;

								     &lt;name&gt;Ora&lt;/name&gt;

								&lt;/author&gt;

								</pre>

								    <p>

								      or maybe

								    </p>

								    <pre>

								&lt;document href="page"&gt;

								   &lt;author&gt;Ora&lt;/author&gt;

								&lt;/document&gt;

								</pre>

								    <p>

								      or maybe

								    </p>

								    <pre>

								&lt;document&gt;

								   &lt;details&gt;

								    &lt;uri&gt;href="page"&lt;/uri&gt;

								    &lt;author&gt;

								        &lt;name&gt;Ora&lt;/name&gt;

								    &lt;/author&gt;

								    &lt;/details&gt;

								&lt;/document&gt;

								</pre>

								    <p>

								      or maybe

								    </p>

								    <pre>

								&lt;document&gt;

								   &lt;author&gt;

								    &lt;uri&gt;href="page"&lt;/uri&gt;

								    &lt;details&gt;

								        &lt;name&gt;Ora&lt;/name&gt;

								    &lt;/details&gt;

								    &lt;/author&gt;

								&lt;/document&gt;

								</pre>

								    <p>

								      or maybe

								    </p>

								    <pre>

								&lt;document href="http://www.w3.org/test/page" author="Ora" /&gt;

								</pre>

								    <h2>

								      The XML Graph

								    </h2>

								    <p>

								      These are all perfectly good XML documents - and to a person

								      reading them they mean the same thing. To a machine parsing

								      them, they produce different XML trees. Suppose you look at

								      the XML tree

								    </p>

								    <pre>

								&lt;v&gt;

								   &lt;x&gt;

								    &lt;y&gt; a="ppppp"&lt;/y&gt;

								    &lt;z&gt;

								        &lt;w&gt;qqqqq&lt;/w&gt;

								    &lt;/z&gt;

								   &lt;/x&gt;

								&lt;/v&gt;

								</pre>

								    <p>

								      It's not so obvious what to make of it. The element names

								      were a big hint for a human reader.

								    </p>

								    <p>

								      <b>Without looking at the schema</b>, you know things about

								      the document structure, but nothing else. You can't tell what

								      to deduce. You don't know whether <i>ppppp</i> is a <i>y</i>

								      of <i>qqqqq</i>, or <i>qqqqq</i> is a <i>z</i> of

								      <i>ppppp</i> or what. You can't even really tell what real

								      questions can be asked. A source of some confusion is that in

								      the xyz example above, there are lots of questions you

								      <i>can</i> ask. They are questions like,

								    </p>

								    <ul>

								      <li>Is there a w element within an x element?

								      </li>

								      <li>What is the content of the w element within the first x

								      element?

								      </li>

								      <li>What is the content of the w element following the first

								      y element which contains an x element whose a attribute is

								      "ppppp"?

								      </li>

								      <li>and so on.

								      </li>

								    </ul>

								    <p>

								      These are all questions about the <i>document</i>. If you

								      know the document schema (a big <i>if</i>), and if that

								      schema only gives you a limited number of ways of

								      expressing the same thing (another big <i>if</i>), then

								      asking these questions can in fact be equivalent to asking

								      questions like

								    </p>

								    <ul>

								      <li>What is the author of <i>page</i>?

								      </li>

								    </ul>

								    <p>

								      This is hairy. It is possible because there is a mapping from

								      XML documents to semantic graphs. In brief, it is hairy

								      because

								    </p>

								    <ul>

								      <li>The mapping is many to one

								      </li>

								      <li>You need a schema to know what the mapping is

								      </li>

								      <li>(The schemas we are talking about for XML at the moment

								      do not include that anyway and would have to have a whole

								      inference language added)

								      </li>

								      <li>The expression you need for querying something in terms

								      of the XML tree is necessarily more complicated than the

								      expression you need for querying something in terms of the

								      RDF tree.

								      </li>

								    </ul>

								    <p>

								      This last is a big one. If you try to write down the

								      expression for the author of a document where the information

								      is in some arbitrary XML schema, you can probably do it

								      though it may or may not be very pretty. If you try to

								      combine more than one property into a combined expression,

								      (give me a list of books by the same author as this one),

								      saying it in XML gets too clumsy to consider.

								    </p>

								    <p>

								      (Think of trying to define the addition of numbers by regular

								      expression operations on the strings. It's possible for

								      addition. When you get to multiplication it gets ridiculous -

								      to solve the problem you would end up reinventing numbers as

								      a separate type.)

								    </p>
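								    <p>

								      By contrast, the "books by the same author" question is short
								      when it is asked of triples. The following is only a minimal
								      sketch in Python: the book identifiers, the "title" property
								      and the helper names are invented for illustration, and the
								      triples are written in the same order as above (property,
								      subject, object).

								    </p>

								    <pre>
```python
# Triples in the order used in this note: (property, subject, object).
# All of the data below is made up for the example.
TRIPLES = [
    ("author", "book1", "Ora"),
    ("author", "book2", "Ora"),
    ("author", "book3", "Tim"),
    ("title", "book1", "First Book"),
]


def objects(prop, subject):
    """All objects o such that triple(prop, subject, o) is asserted."""
    return {o for (p, s, o) in TRIPLES if p == prop and s == subject}


def subjects(prop, obj):
    """All subjects s such that triple(prop, s, obj) is asserted."""
    return {s for (p, s, o) in TRIPLES if p == prop and o == obj}


def same_author_as(book):
    """Books which share an author with `book`, excluding `book` itself."""
    result = set()
    for author in objects("author", book):
        result |= subjects("author", author)
    return result - {book}


print(sorted(same_author_as("book1")))  # prints ['book2']
```
								    </pre>

								    <p>

								      The query combines two properties by following arcs of the
								      graph. Nothing in it depends on how any document happened to
								      nest its elements.

								    </p>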

								    <p>

								      Looking at the simple XML encoding above,

								    </p>

								    <pre>

								&lt;author&gt;

								     &lt;uri&gt;page&lt;/uri&gt;

								     &lt;name&gt;Ora&lt;/name&gt;

								&lt;/author&gt;

								</pre>

								    <p>

								      it could be represented as a graph

								    </p>

								    <p>

								      <img src="diagrams/xml1.gif" alt=

								      "A graph of the XML tree with 3 element nodes each with name and some with content"

								      width="" height="0" />

								    </p>

								    <p>

								      We can represent the tree more concisely if we make a

								      shorthand by writing the name of each element inside its

								      circle:

								    </p>

								    <p>

								      <img src="diagrams/aab.gif" width="" height="0" />

								    </p>

								    <p>

								      Of course the RDF tree which this represents (although it

								      isn't obvious from the XML tree except to those who know) is

								    </p>

								    <p align="center">

								      <img src="diagrams/aac.gif" width="265" height="73" alt=

								      "page ---has author---&gt; Ora" border="0" />

								    </p>

								    <p>

								      Here we have made a shorthand again by making the

								      label for each part its URI.

								    </p>

								    <p>

								      The complexity of querying the XML tree is because there are

								      in general a large number of ways in which the XML maps onto

								      the logical tree, and the query you write has to be

								      independent of the choice of them. So much of the query is an

								      attempt to basically convert the set of all possible

								      representations of a fact into one statement. This is just

								      what RDF does. It gives you some standard ways of writing

								      statements so that however they occur in a document, they

								      produce the same effect in RDF terms. The same RDF tree

								      results from many XML trees.

								    </p>
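								    <p>

								      The many-to-one mapping can be sketched in Python. This is an
								      illustration under stated assumptions, not a general method:
								      three of the document shapes shown earlier are built with the
								      standard xml.etree.ElementTree module, and the function names
								      and the EXTRACTORS table are hypothetical. Each shape needs
								      its own hand-written extraction rule (that is the schema
								      knowledge), yet all of them yield the same triple.

								    </p>

								    <pre>
```python
import xml.etree.ElementTree as ET


def variant_a():
    # An author element wrapping uri and name children.
    author = ET.Element("author")
    ET.SubElement(author, "uri").text = "page"
    ET.SubElement(author, "name").text = "Ora"
    return author


def variant_b():
    # A document element with an href attribute and an author child.
    doc = ET.Element("document", {"href": "page"})
    ET.SubElement(doc, "author").text = "Ora"
    return doc


def variant_c():
    # A document element carrying both facts as attributes.
    return ET.Element("document", {"href": "page", "author": "Ora"})


# One rule per document shape; the result is (property, subject, object).
EXTRACTORS = {
    "a": lambda root: ("author", root.findtext("uri"), root.findtext("name")),
    "b": lambda root: ("author", root.get("href"), root.findtext("author")),
    "c": lambda root: ("author", root.get("href"), root.get("author")),
}


def to_triple(shape, root):
    """Apply the extraction rule which matches the given document shape."""
    return EXTRACTORS[shape](root)


triples = {
    to_triple("a", variant_a()),
    to_triple("b", variant_b()),
    to_triple("c", variant_c()),
}
print(triples)  # prints {('author', 'page', 'Ora')}
```
								    </pre>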

								    <p>

								      Wouldn't it be nice if we could label our XML so that when

								      the parser read it, it could find the assertions (triples)

								      and distinguish their subjects and objects, so as to just

								      deduce the logical assertions without needing RDF? This is

								      just what RDF does, though.

								    </p>

								    <h2>

								      The RDF Graph

								    </h2>

								    <p>

								      In fact RDF is very flexible - it can represent this triple

								      in many ways in XML so as to be able to fit in with

								      particular applications, but just to pick one way, you could

								      write the above as

								    </p>

								    <pre>

								&lt;Description about="http://www.w3.org/test/page" Author ="Ora" /&gt;

								</pre>

								    <p>

								      I have missed out the stuff about namespaces. In fact as

								      anyone can create or own the verbs, subjects and objects in a

								      distributed Web, any term has to be identified by a URI

								      somehow. In real life this example works out

								      more like

								    </p>

								    <pre>

								&lt;?xml version="1.0"?&gt;

								  &lt;Description


								          xmlns="http://www.w3.org/TR/WD-rdf-syntax#"

								          xmlns:s="http://docs.r.us.com/bibliography-info/"


								                  about="http://www.w3.org/test/page"

								                  s:Author ="http://www.w3.org/staff/Ora" /&gt;

								</pre>

								    <p>

								      You can think of the "Description" RDF element as giving the

								      parser the clue as to how to find the subjects, objects

								      and verbs in what follows.

								    </p>
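								    <p>

								      As a rough sketch of what that clue buys a parser, the Python
								      below reads triples off a Description element. It is a
								      deliberate simplification and not an implementation of the RDF
								      syntax: it handles only the abbreviated attribute form shown
								      above, the function names are hypothetical, and the element is
								      built in memory with xml.etree.ElementTree (which writes a
								      qualified name as "{namespace}local") rather than parsed from
								      a file.

								    </p>

								    <pre>
```python
import xml.etree.ElementTree as ET

# Namespace URIs taken from the example above.
RDF_NS = "http://www.w3.org/TR/WD-rdf-syntax#"
S_NS = "http://docs.r.us.com/bibliography-info/"


def build_description():
    """Build the Description element from the example, in memory."""
    return ET.Element(
        "{%s}Description" % RDF_NS,
        {
            "about": "http://www.w3.org/test/page",
            "{%s}Author" % S_NS: "http://www.w3.org/staff/Ora",
        },
    )


def description_to_triples(elem):
    """Return (property, subject, object) for each qualified attribute."""
    if elem.tag != "{%s}Description" % RDF_NS:
        raise ValueError("not an RDF Description element")
    subject = elem.get("about")
    triples = []
    for name, value in elem.attrib.items():
        if name.startswith("{"):
            # Namespace URI plus local name gives the property's URI.
            namespace, local = name[1:].split("}", 1)
            triples.append((namespace + local, subject, value))
    return triples


for triple in description_to_triples(build_description()):
    print(triple)
```
								    </pre>

								    <p>

								      The function needs no knowledge of the bibliography
								      vocabulary. The Description element alone tells it where the
								      subject is, and every namespace-qualified attribute is a
								      property with its value.

								    </p>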

								    <p>

								      This is pretty much the most shorthand way of using the base

								      RDF in XML. There are others which are longer, but more

								      efficient when you have, for instance, sets of many

								      properties of the same object. The useful thing is that of

								      course they all convey the same triple

								    </p>

								    <p align="center">

								      <img src="diagrams/aac.gif" width="265" height="73" alt=

								      "page ---has author---&gt; Ora" border="0" />

								    </p>

								    <p>

								      It is a mess when you use questions about a document to try

								      to ask questions about what the document is trying to convey.

								      It will work. In a way. But flagging the grammar explicitly

								      (RDF syntax is a way of doing this) is a whole lot better.

								    </p>

								    <p>

								      Things you can do with RDF which you can't do with XML

								      include

								    </p>

								    <ul>

								      <li>You can parse the semantic tree, which ends up giving you

								      a set of (possibly mutually referential) triples, and then you

								      can use the ones you want, ignoring the ones you don't

								      understand.

								      </li>

								    </ul>
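								    <p>

								      A minimal Python sketch of "use the ones you want": the
								      property names and the sample triples are made up for the
								      example, and KNOWN stands for whatever vocabulary a particular
								      application happens to understand.

								    </p>

								    <pre>
```python
# Properties this hypothetical application understands.
KNOWN = {"author", "title"}

# Parsed triples as (property, subject, object); the data is invented.
PARSED = [
    ("author", "page", "Ora"),
    ("shelfmark", "page", "QA76.9"),
    ("title", "page", "A test page"),
    ("reviewedBy", "page", "someone"),
]


def understood(triples, known=KNOWN):
    """Keep the triples whose property is known, silently skip the rest."""
    return [t for t in triples if t[0] in known]


for triple in understood(PARSED):
    print(triple)
```
								    </pre>

								    <p>

								      The unknown properties do no harm: each triple stands alone,
								      so dropping some of them does not change the meaning of the
								      others.

								    </p>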

								    <p>

								      Problems with basing your understanding on the structure

								      include

								    </p>

								    <ul>

								      <li>Without having gone to the trouble of getting the schema,

								      or having an application hand-programmed to recognise a

								      particular document type, you can't pick up any semantic

								      information from a document;

								      </li>

								      <li>When an XML schema changes, it could typically introduce

								      new intermediate elements (like "details" in the tree above

								      or "div" in HTML). These may or may not invalidate any

								      query which has been based on the structure of the document.

								      </li>

								      <li>If you haven't gone to the trouble of making a semantic

								      model, then you may not have a well defined one.

								      </li>

								    </ul>

								    <p>

								      I'll end this with some examples of the last problem. Clearly

								      they can be avoided by good design even in an XML system

								      which does not use RDF. Using RDF makes things easier.

								    </p>

								    <h2>

								      Get it right

								    </h2>

								    <p>

								      If you haven't gone to the trouble of making a semantic

								      model, then you may not have a well defined one. What does

								      that mean? I can give some general examples of ambiguities

								      which crop up in practice. In RDF, you need a good idea about

								      what is being said about what, and they would tend not to

								      arise.

								    </p>

								    <p>

								      Look at a label on a jam jar which says: "Expires 1999".

								      What expires: the label, or the jam? Here the ambiguity is

								      between a statement about a statement about a document, and a

								      statement about a document.

								    </p>

								    <p>

								      Another example is an element which qualifies another,

								      apparently independent, element. When information is assembled

								      as a set of records thrown in independently, ambiguities can

								      often arise because of the lack of logic. HTTP headers (or email headers)

								      are a good example. These things can work when one program

								      handles all the records, but when you start mixing records

								      you get trouble. In XML it is all too easy to fall into the

								      trap of having two elements, one describing the author, and a

								      separate one as a flag that the "author" element in fact

								      means not the direct author but that of a work translated to

								      make the book in question. Suddenly, the "author" tag, which

								      used to allow you to conclude that the author of a Finnish

								      document must speak Finnish, now can be invalidated by an

								      element somewhere else on the record.

								    </p>

								    <p>

								      Another symptom of a specification where the actual semantics

								      may not be as obvious as at first sight is ordering. When we

								      hear that the order of a set of records is important, but the

								      records seem to be defined independently, how can that be?

								      Independent assertions are always valid taken individually or

								      in any order. In a server configuration file, for example, the

								      statement which looks like "any member has access to the

								      page" might really mean "any member has access to the page

								      unless an earlier rule in this file has already matched

								      the page". That isn't what the spec said, but it did mention

								      that the rules were processed in order until one applied.

								      Represented logically, in fact there is a large nested

								      conditional. There is implicit ordering when mail headers

								      say, "this message is encrypted", "this message is

								      compressed", "this message is ASCII encoded", "this message

								      is in HTML". In fact the message is an ASCII encoded version

								      of an encrypted version of a compressed version of a message

								      in HTML. In email headers the logic of this has to be written

								      into the fine print of the specification.

								    </p>
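								    <p>

								      The configuration file case can be sketched in a few lines of
								      Python. The rule format, the paths and the default verdict
								      are assumptions made for the example and do not come from any
								      real server. The point is only that the same two "independent"
								      rules give different answers when their order changes, so
								      they were never independent statements.

								    </p>

								    <pre>
```python
# Hypothetical rules: (path prefix, verdict), processed in order.
RULES = [
    ("/private/", "deny"),
    ("/", "allow"),  # reads like "any member has access to the page"
]


def decide(path, rules):
    """Return the verdict of the first rule whose prefix matches the path."""
    for prefix, verdict in rules:
        if path.startswith(prefix):
            return verdict
    return "deny"  # assumed default when no rule applies


print(decide("/private/report", RULES))                  # prints deny
print(decide("/private/report", list(reversed(RULES))))  # prints allow
```
								    </pre>

								    <p>

								      Written out logically, the loop is the nested conditional:
								      the second rule only means "allow" for pages which the first
								      rule did not match.

								    </p>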

								    <h2>

								      Order in documents

								    </h2>

								    <p>

								      There is something fundamentally different between giving a

								      machine a knowledge tree, and giving a person a document. A

								      document for a person is generally serialized so that, when

								      read serially by a human being, the result will be to build

								      up a graph of associations in that person's head. The order

								      is important.

								    </p>

								    <p>

								      For a graph of knowledge, order is not important, so long as

								      the nodes in common between different statements are

								      identified consistently. (There are concepts of ordered lists

								      which are important, although in RDF they break down at the

								      fine level of detail to an unordered set of statements like

								      "the first element of L is x", "the third element of L is z",

								      etc., so order disappears at the lowest level.) In

								      machine-readable documents a list of ostensibly independent

								      statements where order is important often turns out to be

								      statements which are by no means independent.

								    </p>
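								    <p>

								      A small Python sketch of that breakdown follows. The tuple
								      layout (list, position, value) and the function name are
								      invented for illustration, and "y" is an assumed second
								      element. The statements are shuffled on purpose: because each
								      one carries its own position, the list can be rebuilt from
								      the set in whatever order the statements arrive.

								    </p>

								    <pre>
```python
import random

# "The first element of L is x", "the third element of L is z", and so on.
statements = [
    ("L", 1, "x"),
    ("L", 2, "y"),
    ("L", 3, "z"),
]
random.shuffle(statements)  # the order of the statements carries no meaning


def rebuild(list_name, statements):
    """Rebuild an ordered list from unordered positional statements."""
    members = {pos: value for (name, pos, value) in statements
               if name == list_name}
    return [members[pos] for pos in sorted(members)]


print(rebuild("L", statements))  # prints ['x', 'y', 'z']
```
								    </pre>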

								    <p>

								      Some people have been reluctant to consider using an RDF tree

								      because they do not wish to give up the order, but my

								      assumption is that this is from constraints on processing

								      human readable documents. These documents are typically not

								      ripe for RDF conversion anyway.

								    </p>

								    <h2>

								      Conclusion

								    </h2>

								    <p>

								      Sometimes it seems there is a set of people for whom the

								      semantic web is the only graph which they would consider, and

								      another for whom the document tree (or graph if you include

								      links) is all they would consider. But it is important to

								      recognise the difference.

								    </p>

								    <hr />

								    <p>

								      In this series:

								    </p>

								    <ul>

								      <li>

								        <a href="RDFnot.html"><i>What the Semantic Web is

								        not</i></a> - answering some FAQs of the unconvinced.

								      </li>

								      <li>

								        <a href="Evolution.html">Evolvability</a>: properties of

								        the language for evolution of the technology

								      </li>

								      <li>

								        <a href="Architecture.html">Web Architecture from 50,000

								        feet</a>

								      </li>

								    </ul>

								    <h2>

								      Not put in yet:

								    </h2>

								    <p>

								      <i>.@@@ RDF does not have to be serialized in XML but ...</i>

								    </p>


								  </body>

								</html>