server_playground/doc/www.w3.org/DesignIssues/Diff


								<?xml version="1.0" encoding="iso-8859-1"?>

								<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

								  <head>

								    <meta name="generator" content=

								    "HTML Tidy for Mac OS X (vers 31 October 2006 - Apple Inc. build 13), see www.w3.org" />

								    <title>

								      RDF Diff, Patch, Update, and Sync -- Design Issues

								    </title>

								    <style type="text/css">

								/*<![CDATA[*/


								    .definition {text-align: right}

								    .proposition {text-align: right}

								    /*]]>*/

								    </style>

								    <link rel="Stylesheet" href="di.css" type="text/css" />

								    <link rel="Stylesheet" href="lncs04/article.css" type=

								    "text/css" />

								  </head>

								  <body xml:lang="en" lang="en">

								    <div class="online">

								      <a href="./">Up to Design Issues</a>

								    </div>

								    <div class="maketitle">

								      <h1 class="title">

								        Delta: an ontology for the distribution of differences

								        between RDF graphs

								      </h1>

								      <address>

								        <a rel="author" href=

								        "http://www.w3.org/People/Berners-Lee/">Tim Berners-Lee</a>

								        and <a rel="author" href=

								        "http://www.w3.org/People/Connolly/">Dan Connolly</a>,

								        <a rel="institute" href="http://www.csail.mit.edu/">MIT

								        Computer Science and Artificial Intelligence Laboratory

								        (CSAIL)</a><br />

								        <span class="thanks">This work is supported in part by

								        funding from US Defense Advanced Research Projects Agency

								        (DARPA) and Air Force Research Laboratory, Air Force

								        Materiel Command, USAF, under agreement number

								        F30602-00-2-0593, <q>Semantic Web

								        Development</q>.</span><br />

								        <span class="online">Created: 2001, current: $Revision:

								        1.114 $ of <!--linebreak-->

								         $Date: 2009/08/27 21:38:06 $</span><br />

								        <span class="online">Status: personal view only. Editing

								        status: rough. 2004/03: Extended to add pointers to

								        implementations, and details of actual language used. see

								        also: <a href=

								        "http://lists.w3.org/Archives/Team/sw-team/2004Jul/0008">comments

								        from reviewers</a></span>

								      </address>

								      <p class="online">

								        Keywords: RDF, Difference, patch, remote update,

								        synchronization, graph comparison.

								      </p>

								    </div>

								    <hr />

								    <div class="abstract">

								      <h4>

								        Abstract

								      </h4>

								      <p>

								        The problem of updating and synchronizing data in the

								        Semantic Web motivates an analog to text diffs for RDF

								        graphs. This paper discusses the problem of comparing two

								        RDF graphs, generating a set of differences, and updating a

								        graph from a set of differences. It discusses two forms of

								        difference information, the context-sensitive <q>weak</q>

								        patch, and the context-free <q>strong</q> patch. It gives a

								        proposed <strong>update ontology</strong> for patch files

								        for RDF, and discusses experience with proof of concept

								        code.

								      </p>

								    </div>

								    <h2>

								      Introduction

								    </h2>

								    <p>

								      The use of text files to record programs, documents, and

								      other artifacts is supported by version control systems such

								      as RCS<a href="#Tich85">[Tich85]</a> and CVS<a href=

								      "#Ber90">[Ber90]</a> that are based on the ability to compute

								      the difference between two text files and represent it as

								      diff<a href="#Mill85">[Mill85]</a>, i.e. a set of editing

								      instructions. The use of database tables to record bank

								      accounts and records of all sorts is supported by the

								      relational calculus<a href="#Codd70">[Codd70]</a> and its

								      expression as SQL statements. In both cases, the data goes

								      thru a sequence of states; not only are the states

								      represented explicitly (as text files or database tables) but

								      also the transitions from one state to the other can be

								      represented explicitly (either as editing instructions or SQL

								      insert/update statements). Difference (<samp>\Delta</samp>)

								      and sum (<samp>\Sigma</samp>) functions are ubiquitous in

								      computing and, like differentiation and integration, are

								      inverse in the sense that:

								    </p>

								    <p class="eqn-display">

								      v1 = <samp>\Sigma</samp>(v0, <samp>\Delta</samp>(v0, v1))

								    </p>

								    <p>

								      Since the transitions can be represented much more compactly

								      than the pairs of states, and the sigma function is

								      straightforward to compute, the deltas are useful for

								      efficiently updating data distributed among two or more

								      peers.

								    </p>

								    <p>

								      We are developing a Semantic Web Application Platform

								      (<a href="/2000/10/swap/">SWAP</a>) including tools and

								      applications to manipulate RDF graphs much like traditional

								      tools manipulate text files. It includes <code>cwm</code>, a

								      command-line tool for processing RDF in both the standard XML

								      encoding<a href="#RDF04">[RDF04]</a> and an experimental

								      encoding, Notation3 (n3)<a href="#Ber03">[Ber03]</a>.

								    </p>

								    <p>

								      As we build the Semantic Web, using RDF graphs<a href=

								      "#RDFC04">[RDFC04]</a> to represent data such as

								      bibliographies<a href="#DC02">[DC02]</a>, syndication

								      summaries<a href="#RSS">[RSS]</a> and medical

								      terminology<a href="#Gol03">[Gol03]</a>, we see a need for

								      difference and sum functions for RDF graphs. The use of RDF

								      to represent test results<a href="#EARL">[EARL]</a>,<a href=

								      "#OWLT">[OWLT]</a> motivates better ways to compare the

								      actual results of software tests with the intended results

								      and isolate the differences.

								    </p>

								    <h3>

								      <a name="Synchroniz" id="Synchroniz">The Synchronization

								      Problem</a>

								    </h3>

								    <p>

								      One of the most stubborn problems in practical computing is

								      that of synchronizing calendars and address books between

								      different devices. Various combinations of device and

								      program, from the same or different manufacturers, produce

								      very strange results on not-so-rare occasions.

								    </p>

								    <p>

								      The problem has three parts. There is the syntactic problem

								      of extracting the data from the strange device or its storage

								      medium and turning into something manageable, such as RDF.

								      There is the semantic problem of understanding what the

								      fields mean: can one have two home phone numbers? There is

								      the problem of actually synchronizing changes, particularly

								      in the general case that changes have been made on both

								      devices.

								    </p>

								    <p>

								      Because the direct syntactic conversion to RDF often leaves

								      something which has strained and awkward semantics, it is

								      often necessary or tempting to mix the semantic and syntactic

								      conversions. <span class="online">(See <a href=

								      "/2002/12/cal/" class="online">RDF calendaring</a>

								      discussions.)</span> Because the merging of changes requires

								      more application knowledge than the bare RDF data provides,

								      it is tempting to mix the conversion and sync algorithm.

								      However, this mixing reduces the modularity and testability

								      of the resulting program. Perhaps if the three stages were

								      separated, then a more robust system, and one more extensible

								      by the addition of information in new ontologies, would

								      result.

								    </p>

								    <p>

								      In the semantic web architecture, the application constraints

								      on the data can be represented in the ontology, and so can be

								      used by a a generic synchronization system.

								    </p>

								    <p>

								      On the one hand, the syntactic problems are straightforward,

								      if tedious, and the much harder semantic problems may explain

								      why many existing synchronization packages break down. But on

								      the other hand, perhaps it is the combination of the two that

								      result in so many failures; perhaps software that separates

								      the problems, treating synchronization generically, will be

								      more robust. We hope this work contributes to further work on

								      specifications such as SyncML<a href="#Sync02">[Sync02]</a>.

								    </p>

								    <p>

								      And while in the general case, concurrent changes may be

								      completely irreconcilable, the diff mechanisms discussed here

								      solve an interesting part of the problem space.

								    </p>

								    <h3>

								      Problems with the line-oriented approach

								    </h3>

								    <p>

								      RDF graphs can be serialized and used with traditional

								      line-oriented tools. In the general case, with no constraints

								      on how the graphs are serialized, line-oriented deltas can be

								      as large as the data itself, even between files representing

								      the same graph. However, when files are edited by hand, small

								      changes to the data naturally result in small textual diffs.

								      But since the difference is expressed as the difference

								      between two text files, not the difference between two

								      graphs, the delta is dependent on the graph serialization.

								      It's not enough to have the original graph to use the delta;

								      one needs a copy of the particular serialization.

								    </p>

								    <p>

								      Pretty-printing algorithms reduce the large number of

								      possible serializations of an RDF graph to a few actual

								      serializations. The difference engine<a href=

								      "#Kly04">[Kly04]</a> produces human-readable difference

								      descriptions using an algorithm analogous to comparing

								      pretty-printed graphs; its descriptions are not sufficient to

								      reconstruct one graph from the other, however.

								    </p>

								    <p>

								      We find it practical to use CVS to manage both hand-edited

								      and machine-serialized RDF data files in many cases. A

								      notable exception is the reference results for tests:

								      comparison of experimental test results versus reference

								      results yield many false test failures every time we change

								      the pretty-printing algorithm in the slightest. The cost of

								      managing the reference results this way is barely tolerable.

								    </p>

								    <p>

								      The straightforward pretty-printing algorithm works in the

								      obvious way when all the nodes are named (either with URIs or

								      literals): triples are sorted by subject, and those that

								      share a subject are grouped together. Notation3 has syntax

								      for grouping triples that shared predicates. Unlabeled nodes

								      (<em>blank nodes</em> or <em>bnodes</em>) that have no

								      incoming triples are treated like named subjects. Bnodes that

								      have one incoming link serve as internal nodes in the

								      pretty-printing tree. Bnodes that have more than one incoming

								      triple are given arbitrary labels for the purpose of

								      serialization and are hence treated like named subjects. For

								      example, the triples

								    </p>

								    <pre class="example">

								:Bob :pet _:p.

								_:p :size "small".

								:Bob :brother :Pete.

								_:p :mother _:p2.

								:Pete :pet _:p2.

								</pre>

								    <p>

								      are pretty-printed as

								    </p>

								    <pre class="example">

								    :Bob     :brother :Pete;

								         :pet  [

								             :mother _:g0;

								             :size "small" ] .


								    :Pete     :pet _:g0 .

								</pre>

								    <p>

								      The ordering and the identification of bnodes are the two

								      ways which serializations of the same graph can arbitrarily

								      differ. <code>Cwm</code> not only attempts to find a

								      serialization which minimizes the number of arbitrarily named

								      nodes but often happens to regenerate arbitrary names

								      consistently across runs. Even so, diffs of pretty-printed

								      RDF are still unsatisfactory, since changes as small as one

								      triple can lead to arbitrarily large textual diffs if that

								      triple changes the set of bnodes that need arbitrary labels.

								    </p>

								    <p>

								      To completely eliminate the arbitrary choices in how to

								      serialize an RDF graph, we could employ a canonicalization

								      algorithm such as the one<a href="#Car03s">[Car03s]</a> in

								      Jena<a href="#Car03">[Car03]</a>, or <a href=

								      "/2000/10/swap/cant.py">cant.py</a> from our own SWAP

								      toolkit. One problem with this approach is that the canonical

								      form is expressed in the N-Triples<a href=

								      "#RDFT04">[RDFT04]</a> representation. Deltas between

								      N-Triples files are verbose and tedious to read for most

								      practical graphs. Further, the problem of large textual diffs

								      resulting from small changes remains: these canonicalization

								      algorithms work by computing a signature for each blank node

								      based on nearby triples and sorting the results; adding or

								      removing one triple near a blank node will change its

								      signature and hence potentially the labeling of many bnodes.

								    </p>

								    <h2>

								      Goals: Economy and Robustness

								    </h2>

								    <p>

								      SQL statements and text file diffs are attractive because

								      they succinctly represent the difference between two states.

								      If the difference between two text files were not much

								      smaller than either of the text files, it would be of little

								      use. The essential feature of a difference algorithm, then,

								      is <em>economy</em>: small differences between input states

								      should result in small deltas.

								    </p>

								    <p>

								      Much of the popularity of CVS is due to its support of

								      concurrent development. It makes a patch file<a href=

								      "#Wall">[Wall]</a> representing the changes each party has

								      made. These changes are made, in order, to the repository

								      file to generate new versions. In the event that two agents

								      take a copy of the same version <samp>v0</samp> and make

								      different changes to it (<samp>v1a</samp> and

								      <samp>v1b</samp>), the party that commits last attempts to

								      make <samp>v1</samp> which incorporates both diffs:

								    </p>

								    <p class="eqn-display">

								      v1 = <samp>\Sigma</samp>(<samp>\Sigma</samp>(v0,

								      <samp>\Delta</samp>(v0, v1a)), <samp>\Delta</samp>(v0, v1b))

								    </p>

								    <p>

								      Note that <samp>\Delta(v0, v1b)</samp> is applied to

								      something other than v0. The context diff and unidiff formats

								      are sufficiently robust that it does work in most practical

								      cases. When it does not work, then the user is left with the

								      problem of manually reconciling the conflicts. This happens

								      when, for example, one party moves the date of a meeting at

								      the same time as someone else moves or deletes the meeting.

								      It may be that the criterion that a problem needs human

								      involvement is very application-dependent.

								    </p>

								    <p>

								      There are thee failure modes:

								    </p>

								    <ol>

								      <li>Inconsistent changes were made. This failure mode is not

								      automatically soluble.

								      </li>

								      <li>The patch was incapable of finding the appropriate points

								      in v1a at which to make the change <samp>\Delta</samp>(v0,

								      v1b). This form of failure we can eliminate for certain RDF

								      graph deltas.

								      </li>

								      <li>The patch was misapplied: the context was used to

								      determine points at which to make the change, but the wrong

								      point was used, and erroneous data resulted. This is

								      unacceptable.

								      </li>

								    </ol>

								    <p>

								      A <em>robust</em> patch is one which may be applied so a file

								      different to the one it was originally generated from,

								      without being misapplied and hence generating erroneous

								      information. In the line oriented tools, the <em>patch</em>

								      program was introduced to be more robust than simply applying

								      the patch as a series of editor commands.

								    </p>

								    <h2>

								      Delta and Sigma for RDF Graphs

								    </h2>

								    <p>

								      An RDF graph is a set of (subject, predicate, object)

								      triples, i.e. a set of typed links between nodes. Each node

								      may or may not be named (either by a URI or a literal). As a

								      measure of the size of the difference between two RDF graphs

								      <samp>G1</samp> and <samp>G2</samp>, one can use the sum of

								      the size of the set differences <samp>|G1-G3|</samp> and

								      <samp>|G2-G3|</samp> where <samp>G3</samp> is the largest

								      common subgraph of <samp>G1</samp> and of <samp>G2</samp>.

								    </p>

								    <h3>

								      Computing differences between RDF graphs

								    </h3>

								    <p>

								      In the case in which all the nodes are named, computing the

								      difference between two graphs is simple and straightforward:

								    </p>

								    <p class="definition">

								      If <samp>G1</samp> and <samp>G2</samp> are ground RDF graphs,

								      then the <em>ground graph delta</em> of <samp>G1</samp> and

								      <samp>G2</samp> is a pair <samp>(insertions,

								      deletions)</samp> where <samp>insertions</samp> is the set

								      difference <samp>G2-G1</samp> and <samp>deletions</samp> is

								      <samp>G1-G2</samp>.

								    </p>

								    <p>

								      This form of delta is reasonably economical: the storage cost

								      is linear in the size of the difference between the graphs.

								      Straightforward extensions with slightly improved economy

								      might be more specific in expressing differences in which

								      only one or two parts of the triple have changed.

								    </p>

								    <p>

								      It is also completely robust. Each statement is independent,

								      with no variables: there is no cause for ambiguity. The

								      deletion statements may be deleted from, and the insertion

								      statements added to, any graph.

								    </p>

								    <p>

								      In the case where not all of the nodes are named, finding the

								      largest common subgraph becomes a case of the graph

								      isomorphism problem. The arc labels do have names (in a very

								      large set of practical cases, including all those which can

								      be serialized as RDF/XML). Graph isomorphism is in fact a

								      class of difficult problem that cannot be solved in

								      polynomial time but which has not been shown to be NP

								      complete<a href="#Kob93">[Kob93]</a>. While the general graph

								      isomorphism problem has readily available solutions<a href=

								      "#Ski97">[Ski97]</a><a href="#Ski01">[Ski01]</a>, they do not

								      seem to be a good match for the practical cases of RDF graph

								      diff.

								    </p>

								    <p>

								      There is an interesting subset of real cases in which there

								      are a mixture of named and unnamed nodes, but none of the

								      unnamed nodes is very far from a named node. In this case,

								      the unnamed nodes can be indirectly identified by giving a

								      path from a named node. The difference is then expressed by

								      giving this local context and the related changes.

								    </p>

								    <h3>

								      A patch file format for RDF deltas

								    </h3>

								    <p>

								      By analogy to the text diff, there is a need not only for a

								      difference-finding algorithm, but for a patch file format.

								      Such a format needs:

								    </p>

								    <ul>

								      <li>a way to uniquely identify what is changing

								      </li>

								      <li>a way to distinguish between the pieces added and those

								      subtracted

								      </li>

								    </ul>

								    <p>

								      It is straightforward to pinpoint the parts of the graph that

								      have changed when all nodes are named, but less so in the

								      presence of anonymous nodes.

								    </p>

								    <p>

								      To identify what is changing, we use Notation3 expressions

								      for quoted RDF graphs with schema variables, and we introduce

								      three new terms. For example:

								    </p>

								    <pre class="example">

								@prefix diff: &lt;http://www.w3.org/2004/delta#&gt;.

								{ ?x  bank:accountNo "1234578"; bank:balance 4000}

								 diff:replacement

								{ ?x  bank:accountNo "1234578"; bank:balance 3575}.

								</pre>

								    <p>

								      This one new property <code>replacement</code> can express

								      any change. Deletions can be written <code>{...}

								      diff:replacement {}</code> and additions can be written

								      <code>{} diff:replacement {...}</code>.

								    </p>

								    <p>

								      The second alternative is very similar but involves two

								      properties, one for inserting and one for deleting:

								    </p>

								    <pre class="example">

								{ ?x  bank:accountNo "1234578"}

								  diff:deletion  { ?x  bank:balance 4000};

								  diff:insertion { ?x  bank:balance 3575}.

								</pre>

								    <p>

								      The form using <code>diff:insertion</code> and

								      <code>diff:deletion</code> is implemented in <a href=

								      "/2000/10/swap/doc/cwm">cwm</a>.

								    </p>

								    <p>

								      The first and second form are related by

								    </p>

								    <pre class="definition">

								{ ?F replacement ?G }    &lt;=&gt;  { ?F deletion ?F; insertion ?G }

								</pre>

								    <h3>

								      Weak and Strong diffs

								    </h3>

								    <p>

								      To address robustness, we distinguish two types of RDF graph

								      deltas: a <em>weak</em> delta gives enough information to

								      apply it to exactly the graph it was computed from, but a

								      <em>strong</em> delta specifies the changes in a

								      context-independent manner. The difference is not in the

								      patch file format, but in the information a particular patch

								      gives.

								    </p>

								    <p>

								      Returning to the bank example, if bank account numbers are

								      globally unique, then the replacement pattern will bind ?x to

								      a node identifying a particular bank account. In OWL<a href=

								      "#OWL">[OWL]</a> terms, if <code>bank:accountNumber</code> is

								      an <code>owl:InverseFunctionalProperty</code>, then the node

								      must be the <code>owl:sameAs</code> any other node with the

								      same account number. In that case, the patch will be strong.

								    </p>

								    <p>

								      If, however, many accounts can have the same number, applying

								      that patch to another knowledge base may inadvertently alter

								      the wrong account. The patch would be weak.

								    </p>

								    <p>

								      In normal information processing, of course, numbers such as

								      bank account numbers are used to avoid this confusion.

								      Consider those graphs in which every blank node is in fact

								      unambiguously identified by one functional or inverse

								      functional property. Further, that property is invariant

								      under any changes represented by the deltas.

								    </p>

								    <p>

								      The pattern for terms goes as follows:

								    </p>

								    <p class="definition">

								      Given a background ontology <samp>W</samp> and a graph

								      <samp>G</samp>, if a blank node <samp>b</samp> in

								      <samp>G</samp> is the object of a triple whose subject

								      <samp>v</samp> is <em>functionally ground</em> and whose

								      predicate <samp>p</samp> is an

								      <code>owl:FunctionalProperty</code> according to

								      <samp>W</samp>, then <samp>v.p</samp> is a <em>functional

								      term label</em> for <samp>b</samp> in <samp>G</samp> with

								      respect to <samp>W</samp>. Likewise, <samp>v\uparrow q</samp>

								      is a functional term label for <samp>b</samp> if

								      <samp>q</samp> is an

								      <code>owl:InverseFunctionalProperty</code>, b is the subject,

								      and v is the object. Recursively, v is functionally ground if

								      it is a name (URI or literal) or a bnode with a functional

								      term label.

								    </p>

								    <p>

								      Then we can rewrite certain graphs:

								    </p>

								    <p class="definition">

								      With respect to a background ontology <samp>W</samp>, a graph

								      <samp>G</samp> is <em>fully labeled</em> iff every node in

								      <samp>G</samp> is functionally ground. A <em>functional RDF

								      graph</em> is a set of triples whose terms are URIs,

								      literals, or functional terms. A functional RDF graph

								      <samp>F</samp> is a <em>functional analog</em> of an RDF

								      graph <samp>G</samp> iff <samp>G</samp> is fully labeled and

								      <samp>F</samp> can be obtained from <samp>G</samp> by

								      replacing each bnode b in <samp>G</samp> with a functional

								      term label for b.

								    </p>

								    <p>

								      The diffs of functional RDF graphs are just as simple to make

								      as ground RDF deltas:

								    </p>

								    <p class="definition">

								      Given a background ontology <samp>W</samp>, a <em>strong</em>

								      delta between fully labeled graphs <samp>G1</samp> and

								      <samp>G2</samp> is a pair <samp>(insertions,

								      deletions)</samp> where <samp>insertions</samp> is the set

								      difference <samp>F2-F1</samp>, deletions is

								      <samp>F1-F2</samp>, and <samp>F1</samp> and <samp>F2</samp>

								      are functional analogs of <samp>G1</samp> and <samp>G2</samp>

								      respectively.

								    </p>

								    <p class="online">

								      (@@need to define sigma for strong deltas?) It is actually

								      the same as for any delta: horn match and delete or insert.

								    </p>

								    <p>

								      A strong delta is like a context diff that cannot be

								      mis-applied.

								    </p>

								    <p class="proposition">

								      If <samp>D</samp> is a strong delta between fully labeled

								      graphs <samp>k1</samp> and <samp>k2</samp>, and

								      <samp>k3</samp> is a subset of <samp>k1</samp>, then

								      <samp>\Sigma(k3, D)</samp> is consistent with

								      <samp>k2</samp>. <span class="online">@@TODO: proof</span>

								    </p>

								    <p>

								      One advantage of a strong patch is, then, that one can take a

								      patch from any true knowledge base change and apply it to a

								      subset knowledge base, and the result will be true. For

								      example, if changes to a knowledge base are represented by a

								      sequence of strong diffs, one can subscribe to the diffs from

								      any given point on, and acquire a subset of the final

								      knowledge base.

								    </p>

								    <p>

								      As a practical matter, achieving fully labeled graphs

								      requires care in building and using the ontology. As a

								      supplement to the good practice of using URIs to distributing

								      data, it is useful to identify things indirectly by using

								      terms with published ontologies that say whether they are

								      many-many, many-1, 1-many or 1-1. The <a href=

								      "/2000/10/swap/diff.py">diff.py</a> program from <a href=

								      "/2000/10/swap/Overview.html">SWAP</a> will generate a strong

								      diff between two files, provided it can find sufficient

								      information in the Web to fully label the input graphs.

								    </p>

								    <p class="online">

								      We note in passing that the ontologies we used all involved

								      inverse functional datatype properties, which are OWL/Full

								      but not OWL/DL.

								    </p>

								    <h2>

								      Application to Update and Sync

								    </h2>

								    <p>

								      Though we have made small scale tests, we are interested in

								      pursuing strong diffs, and suspect they will be are useful in

								      a variety of applications.

								    </p>

								    <h3>

								      Peer-peer update and sync

								    </h3>

								    <p>

								      The algorithm for synchronizing two databases can be

								      straightforwardly generalized to N. In a decentralized

								      peer-peer network such as Network News Transfer

								      Protocol<a href="#NNTP">[NNTP]</a> (or many others), messages

								      are timestamped and distributed eventually to every party,

								      though a message may be received by different parties at

								      different times. When the network is reliable, there may be a

								      well-defined maximum delivery time.

								    </p>

								    <p>

								      A crude algorithm is to apply the patches in order of the

								      time-stamp. If a message arrives with a timestamp preceding

								      the recent ones already taken into account, they are unwound

								      so that the new version can be built in the proper order. A

								      patch which fails (as in a CVS conflict) is rejected. In the

								      case of RDF graphs, failure can be a pre-agreed form of

								      consistency, such as (for example) OWL-DL consistency. The

								      sender of the failed patch will realize this as they will be

								      running the same algorithm on the same patches, and will have

								      to take recovery action.

								    </p>

								    <p>

								      A new version can be given a version id by hashing the

								      version id of its predecessor with the message id of the

								      patch used to make the new from the old. The community can

								      refer to versions by these ids, and if they want to refer to

								      a commonly held document, then one only has to wait for the

								      maximum delivery time to know that everyone in the community

								      will know the value of the knowledge base for that version.

								      Even without waiting, anyone who knows of a version with that

								      ID will know they have the same contents.

								    </p>

								    <h3>

								      Patches as knowledge

								    </h3>

								    <p>

								      The idea of the strong patch file format is interesting

								      because a patch is a little bit of knowledge. A patch for

								      example that where my phone number was 1234 it should now be

								      5678, when in the context in which it is known to be a change

								      to a valid knowledge base between one week and the next,

								      indicates that my phone number has actually changed. One

								      might conclude, say, that I moved or changed jobs. A strong

								      patch has meaning in itself, and distributing and filtering

								      these becomes an interesting way of processing knowledge. In

								      some areas (like houses for sale) it is the new changed

								      information which is of most interest, and in some areas

								      (like currency rates) if you listen to a stream of changes

								      you will in fact accumulate a working knowledge of the area.

								    </p>

								    <h3>

								      Patches as news

								    </h3>

								    <p>

								      From the historical <em>NCSA Mosaic What's New</em> page to

								      the current syndication of RSS streams <a href=

								      "#RSS">[RSS]</a>, the interest in news on (or off) the Web

								      demonstrates that there is great interest in changes to the

								      status quo. We speculate that this will also be the case on

								      the Semantic Web. When the state is represented in RDF, then

								      RDF diffs represent news. The W3C Technical Reports list is

								      available as RDF, and the W3C RSS feed is partly,

								      effectively, a list of changes to the Tech Reports list. This

								      could be formalized by explicitly distributing RDF diffs.

								    </p>

								    <h2>

								      Future directions

								    </h2>

								    <p>

								      The algorithm developed to date produces difference files

								      only on graphs which are labeled directly with URIs or

								      indirectly with functional properties or inverse functional

								      properties.

								    </p>

								    <p>

								      It may be useful to extend the algorithm to cope with graphs

								      which are not completely labeled, but where the unlabeled

								      bits are the same in each graph, and so a strong diff can

								      still be produced. Another avenue would be to look at using

								      more than one property to label a node when one is not

								      sufficient.

								    </p>

								    <p>

								      Applications which do not need robustness can use weak

								      patches. The algorithm could be extended to do more of a

								      canonicalization-style signature-based match to optionally

								      give a weak diff where a strong diff cannot be given.

								    </p>

								    <p class="online">

								      In practice, while RDF fundamentally has a graph structure,

								      the graph is often used to encode ordered lists (RDF

								      collections). While lists are in fact represented by a

								      structure of <em>first</em> and <em>rest</em> links within

								      the graph, when serialized they are normally represented

								      directly as lists, and within software implementations they

								      may be stored specially. The representation of changes to

								      lists may merit a special syntax in the difference file, to

								      avoid a mess of <em>rdf:first</em> and <em>rest:rest</em>

								      statements. (@@DanC: first/rest are functional, so I don't

								      think this case mertis anything special.)

								    </p>

								    <p>

								      RDF does not contain the notion of an unordered set, though

								      one can with OWL create a class which has an enumerated set

								      of members. If the use of unordered sets becomes common,

								      which the authors suspect would be wise in the long run, then

								      a difference engine should be aware of such sets and be able

								      to express differences between them.

								    </p>

								    <p>

								      This application, like the rule language, demonstrates the

								      usefulness of the quoted formulae of n3. The authors believe

								      that many applications will need this ability to quote RDF

								      graphs within graphs. As n3 becomes a language of

								      communication, difference files will of course have to

								      express changes to nested formulae. As these are graphs, this

								      is basically a straightforward recursive use of the

								      difference system for single graphs. A simple though verbose

								      alternative is to reify the n3 before building differences.

								    </p>

								    <p>

								      With these extensions, the simple difference file format may

								      lose the elegance of its current simplicity. However, even

								      with these extensions, most data and ontologies shipped

								      around the web -- the bottom layers of the semantic web layer

								      cake -- will be plain RDF graphs and so have simple

								      difference files.

								    </p>

								    <p>

								      Clearly there are many algorithms which can be imagined for

								      efficiently generating deltas for RDF graphs. The ones

								      written are not particularly efficient, having being designed

								      as proof of concept.

								    </p>

								    <h2>

								      Conclusions

								    </h2>

								    <p>

								      There are many uses for technology of communicating

								      differences between graphs or changes to a graph. While in

								      general the generation of differences is basically a graph

								      isomorphism problem, in a wide set of practical cases, one

								      can efficiently generate a difference, or patch file.

								      So-called strong patch files are particularly interesting,

								      and open up a new series of applications based on the

								      syndication of change information. However, to be able to

								      generate them, one needs either a well-labeled graph, which

								      in turn needs an ontological knowledge of inverse functional

								      properties to allow nodes to be indirectly labeled. The patch

								      file format proposed is simple, being a new ontology of only

								      two (or three) new properties, and directly uses Notation3

								      syntax and semantics, which itself is a simple extension of

								      RDF. This format can be generated by all sorts of

								      difference-finding algorithms. It can be absorbed by any

								      system capable of matching RDF subgraphs. The patch file

								      ontology is a candidate for a future standard for remote

								      update of RDF data.

								    </p>

								    <div>

								      <h2>

								        References

								      </h2>

								      <p class="online">

								        see <a href="lncs04/Diffbib.bib">Diffbib.bib</a>

								      </p>

								      <dl class="bib">

								        <dt class="misc">

								          [<a name="RDF04" id="RDF04">RDF04</a>]

								        </dt>

								        <dd>

								          <span class="author">Beckett, D.</span> <cite><a href=

								          "http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/">

								          RDF/XML Syntax Specification (Revised)</a></cite>

								          <span class="institution">W3C</span> <span class=

								          "type">Recommendation</span>, 10 <span class=

								          "month">February</span> <span class="year">2004</span>.

								          <p class="online">

								            <a href=

								            "http://www.w3.org/TR/rdf-syntax-grammar">Latest

								            version</a> available at

								            <code>http://www.w3.org/TR/rdf-syntax-grammar</code>

								          </p>

								        </dd>

								        <dt class="misc">

								          [<a name="DC02" id="DC02">DC02</a>]

								        </dt>

								        <dd>

								          <span class="author">Beckett, D. and Miller, E. and

								          Brickley, D.</span> <a href=

								          "http://dublincore.org/documents/2002/07/31/dcmes-xml/"><cite>

								          Expressing Simple Dublin Core in RDF/XML</cite></a>

								          <span class="institution">Dublin Core Metadata

								          Initiative</span> <span class=

								          "type">Recommendation</span> 31 <span class=

								          "month">July</span> <span class="year">2002</span>

								        </dd>

								        <dt class="misc">

								          [<a name="RSS" id="RSS">RSS</a>]

								        </dt>

								        <dd>

								          <span class="author">Beged-Dov, Gabe et. al.</span>

								          <cite><a href="http://web.resource.org/rss/1.0/">RDF Site

								          Summary (RSS) 1.0</a></cite> 6 <span class=

								          "month">December</span> <span class="year">2000</span>

								        </dd>

								        <dt class="inproceedings">

								          [<a name="Ber90" id="Ber90">Ber90</a>]

								        </dt>

								        <dd>

								          <span class="author">Berliner, Brian</span> <cite>CVS II:

								          Parallelizing Software Development</cite> <span class=

								          "booktitle"><a href="http://www.usenix.org/">USENIX</a>

								          Conference Proceedings</span> pp <span class=

								          "pages">341--352</span> <span class=

								          "month">January</span> 22-26, <span class=

								          "year">1990</span> <span class="address">Washington,

								          D.C.</span>

								          <p class="online">

								            <a href=

								            "http://www.hpcc.ecs.soton.ac.uk/hpci/tools/cvs/html/cvs-paper.html">

								            online copy</a>; <a href=

								            "http://cvsweb.xfree86.org/cvsweb/cvs/doc/cvs-paper.ms">

								            ms source</a>

								          </p>

								        </dd>

								        <dt class="misc">

								          [<a name="Ber03" id="Ber03">Ber03</a>]

								        </dt>

								        <dd>

								          <span class="author">Berners-Lee, Tim and Hawke, Sandro

								          and Connolly, Dan</span> <cite><a href=

								          "http://www.w3.org/2000/10/swap/doc/">Semantic Web

								          Tutorial Using N3</a></cite> <span class=

								          "howpublished">Twelfth International World Wide Web

								          Conference</span> <span class="address">Budapest,

								          Hungary</span> <span class="month">May</span>

								          <span class="year">2003</span>

								        </dd>

								        <dt class="techreport">

								          [<a name="Car03" id="Car03">Car03</a>]

								        </dt>

								        <dd>

								          <span class="author">Carroll, Jeremy J. and Dickinson,

								          Ian and Dollin, Chris and Reynolds, Dave and Seaborne,

								          Andy and Wilkinson, Kevin</span> <cite><a href=

								          "http://www.hpl.hp.com/techreports/2003/HPL-2003-146.html">

								          Jena: Implementing the Semantic Web

								          Recommendations</a></cite> <span class=

								          "institution">Hewlett-Packard</span> <span class=

								          "number">HPL-2003-146</span> <span class=

								          "month">Dec</span> <span class="year">2003</span>

								          <p class="online">

								            <a href=

								            "http://www.hpl.hp.com/semweb/jena.htm">Jena</a>

								            includes a graph diff program <code>rdfcompare</code>

								            in the <a href=

								            "http://jena.sourceforge.net/tools.html">command line

								            tools</a>.

								          </p>

								        </dd>

								        <dt class="TechReport">

								          [<a name="Car03s" id="Car03s">Car03s</a>]

								        </dt>

								        <dd>

								          <span class="author">Caroll, Jeremy J.</span>

								          <cite><a href=

								          "http://www.hpl.hp.com/techreports/2003/HPL-2003-142.html">

								          Signing RDF Graphs</a></cite> <span class=

								          "institution">Hewlett-Packard</span> <span class=

								          "number">HPL-2003-142</span> <span class=

								          "month">Jul</span> <span class="year">2003</span>

								        </dd>

								        <dt class="Article">

								          [<a name="Codd70" id="Codd70">Codd70</a>]

								        </dt>

								        <dd>

								          <span class="author">Codd, E. F.</span> <cite><a href=

								          "http://www.acm.org/classics/nov95/">A Relational Model

								          of Data for Large Shared Data Banks</a></cite>,

								          <span class="journal">Communications of the ACM</span>,

								          Vol. <span class="volume">13</span>, No. <span class=

								          "number">6</span>, <span class="month">June</span>

								          <span class="year">1970</span>, pp. <span class=

								          "pages">377--387</span>.

								        </dd>

								        <dt class="Article">

								          [<a name="Gol03" id="Gol03">Gol03</a>]

								        </dt>

								        <dd>

								          <span class="author">Golbeck, Jennifer and Fragoso,

								          Gilberto and Hartel, Frank and Hendler, James and Parsia,

								          Bijan and Oberthaler, Jim</span> <cite><a href=

								          "http://www.mindswap.org/papers/WebSemantics-NCI.pdf">The

								          national cancer institute's thesaurus and

								          ontology</a></cite>. <span class="journal">Journal of Web

								          Semantics</span>, <span class=

								          "volume">1</span>(<span class="number">1</span>),

								          <span class="month">Dec</span> <span class=

								          "year">2003</span>.

								        </dd>

								        <dt class="misc">

								          [<a name="RDFT04" id="RDFT04">RDFT04</a>]

								        </dt>

								        <dd>

								          <span class="author">Grant, J. and Beckett, D.</span>

								          <cite><a href=

								          "http://www.w3.org/TR/2004/REC-rdf-testcases-20040210/">RDF

								          Test Cases</a></cite>, <span class=

								          "institution">W3C</span> <span class=

								          "type">Recommendation</span>, 10 <span class=

								          "month">February</span> <span class="year">2004</span>.

								          <p class="online">

								            <a href="http://www.w3.org/TR/rdf-testcases">Latest

								            version</a> available at

								            <tt>http://www.w3.org/TR/rdf-testcases</tt>

								          </p>

								        </dd>

								        <dt class="misc">

								          [<a name="RDFC04" id="RDFC04">RDFC04</a>]

								        </dt>

								        <dd>

								          <span class="author">Klyne, G. and Carroll, J. J.</span>

								          <cite><a href=

								          "http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/">Resource

								          Description Framework (RDF): Concepts and Abstract

								          Syntax</a></cite>, <span class="institution">W3C</span>

								          <span class="type">Recommendation</span>, 10 <span class=

								          "month">February</span> <span class="year">2004</span>.

								          <p class="online">

								            <a href="http://www.w3.org/TR/rdf-concepts/">Latest

								            version</a> available at

								            <code>http://www.w3.org/TR/rdf-concepts/</code>

								          </p>

								        </dd>

								        <dt class="TechReport">

								          [<a name="NNTP" id="NNTP">NNTP</a>]

								        </dt>

								        <dd>

								          <span class="author">Kantor, Brian and Lapsley,

								          Phil</span> <cite><a href=

								          "http://www.ietf.org/rfc/rfc977">Network News Transfer

								          Protocol</a></cite> <span class="institution">IETF</span>

								          <span class="number">RFC977</span> <span class=

								          "month">February</span> <span class="year">1986</span>

								        </dd>

								        <dt class="misc">

								          [<a name="Kly04" id="Kly04">Kly04</a>]

								        </dt>

								        <dd>

								          <span class="author">Klyne, Graham</span> <cite><a href=

								          "http://www.ninebynine.org/RDFNotes/Swish/Intro.html">Semantic

								          Web Inference Scripting in Haskell</a></cite>

								          <span class="month">Feb</span> <span class=

								          "year">2004</span>

								          <p class="online">

								            see esp. section <a href=

								            "http://www.ninebynine.org/RDFNotes/Swish/Intro.html#GraphDiff">

								            Comparing graphs</a>

								          </p>

								        </dd>

								        <dt class="Book">

								          [<a name="Kob93" id="Kob93">Kob93</a>]

								        </dt>

								        <dd>

								          <span class="author">Johannes K<span title=

								          "\&quot;o">&ouml;</span>bler and Uwe Sch<span title=

								          "\&quot;o">&ouml;</span>ning and Jacobo Tor<span title=

								          "\'a">&aacute;</span>n</span> <cite><a href=

								          "http://www.birkhauser.com/cgi-win/ISBN/0-8176-3680-3">The

								          Graph Isomorphism Problem: Its Structural

								          Complexity</a></cite> <span class="series">Progress in

								          Theoretical Computer Science</span>. <span class=

								          "publisher">Birkh<span title=

								          "\&quot;a">&auml;</span>user</span>, <span class=

								          "address">Boston, MA</span>, (<span class=

								          "year">1993</span>).

								          <p class="online">

								            <a href=

								            "http://www.informatik.hu-berlin.de/Institut/struktur/algorithmenII/Buecher/GI/">

								            preface, TOC, etc.</a>. cited in <a href=

								            "http://www.math.tu-berlin.de/~schwartz/papers/KaibelSchwartz2002.references.bib">

								            KaibelSchwartz2002.references.bib</a>

								          </p>

								        </dd>

								        <dt class="Article">

								          [<a name="Mill85" id="Mill85">Mill85</a>]

								        </dt>

								        <dd>

								          <span class="author">Miller, Webb and Myers, Eugene

								          W.</span> <cite>A File Comparison Program</cite>

								          <span class="journal">Software---Practice and

								          Experience</span>, <span class=

								          "volume">15</span>(<span class="number">11</span>), pp.

								          <span class="pages">1025--1040</span>, <span class=

								          "month">November</span> <span class="year">1985</span>.

								          <p class="online">

								            <a href=

								            "http://liinwww.ira.uka.de/cgi-bin/bibshow?e=TF0tqf/fyqboefe%7d789658&amp;r=bibtex&amp;mode=intra">

								            bib</a>

								          </p>

								        </dd>

								        <dt class="Book">

								          [<a name="Ski97" id="Ski97">Ski97</a>]

								        </dt>

								        <dd>

								          <span class="author">Skiena, Steve</span> <cite>The

								          Algorithm Design Manual</cite> <span class=

								          "publisher"><a href="http://www.telospub.com/">Telos

								          Pr</a></span> <span class="address">New York</span>

								          <span class="year">1997</span>

								        </dd>

								        <dt class="incollection">

								          [<a name="Ski01" id="Ski01">Ski01</a>]

								        </dt>

								        <dd>

								          <span class="author"><a href=

								          "http://www.cs.sunysb.edu/~skiena/">Skiena,

								          Steve</a></span> <span class="chapter">1.5.9</span>

								          <cite><a href=

								          "http://www.cs.sunysb.edu/~algorith/files/graph-isomorphism.shtml">

								          Graph Isomorphism</a></cite> in the <span class=

								          "booktitle"><a href=

								          "http://www.cs.sunysb.edu/~algorith/index.html">Stony

								          Brook Algorithm Repository</a></span> <span class=

								          "publisher">Stony Brook University</span> <span class=

								          "year">2001</span>

								          <p class="online">

								            with reference to <a href=

								            "http://www.cs.sunysb.edu/~algorith/implement/gmt/implement.shtml">

								            GMT - Graph Matching Toolkit</a>

								          </p>

								        </dd>

								        <dt class="Article">

								          [<a name="Tich85" id="Tich85">Tich85</a>]

								        </dt>

								        <dd>

								          <span class="author">Tichy, W.</span> <a href=

								          "http://portal.acm.org/citation.cfm?id=4202&amp;dl=ACM&amp;coll=GUIDE">

								          <cite>RCS--a system for version control</cite></a>

								          <span class="journal">Software Practice <span class=

								          "amp">&amp;</span> Experience</span> Volume <span class=

								          "volume">15</span> , Issue <span class="number">7</span>

								          (<span class="month">July</span> <span class=

								          "year">1985</span>) Pages: <span class=

								          "pages">637--654</span>

								        </dd>

								        <dt class="misc">

								          [<a name="Sync02" id="Sync02">Sync02</a>]

								        </dt>

								        <dd>

								          <cite><a href=

								          "http://www.openmobilealliance.org/tech/affiliates/syncml/syncmlindex.html">

								          SyncML Specifications, Version 1.1</a></cite>

								          <span class="month">Feb</span> <span class=

								          "year">2002</span> <span class="publisher"><a href=

								          "http://www.openmobilealliance.org/">Open Mobile Alliance

								          (OMA)</a></span>

								        </dd>

								        <dt class="misc">

								          [<a name="Wall" id="Wall">Wall</a>]

								        </dt>

								        <dd>

								          <span class="author">Wall, Larry et. al.</span>

								          <cite><a href=

								          "http://www.gnu.org/software/patch/patch.html">patch</a></cite>

								          <span class="publisher">Free Software Foundation</span>

								          27 <span class="month">Jun</span> <span class=

								          "year">2000</span>

								        </dd>

								        <dt class="misc">

								          [<a name="EARL" id="EARL">EARL</a>]

								        </dt>

								        <dd>

								          <span class="author">Chisholm, W. and Palmer, S.

								          B.</span> Editors: <cite><a href=

								          "http://www.w3.org/TR/2002/WD-EARL10-20021206/">Evaluation

								          and Report Language (EARL) 1.0</a></cite> <span class=

								          "institution">W3C</span> <span class="type">Working

								          Draft</span> (work in progress), 6 <span class=

								          "month">December</span> <span class="year">2002</span>

								          <p class="online">

								            <a href="http://www.w3.org/TR/EARL10/">Latest

								            version</a> available at http://www.w3.org/TR/EARL10/

								          </p>

								        </dd>

								        <dt class="misc">

								          [<a name="OWLT" id="OWLT">OWLT</a>]

								        </dt>

								        <dd>

								          <span class="author">Carroll, J. J. and De Roo, J.</span>

								          Editors: <cite><a href=

								          "http://www.w3.org/TR/2004/REC-owl-test-20040210/">OWL

								          Web Ontology Language Test Cases</a></cite> <span class=

								          "institution">W3C</span> <span class=

								          "type">Recommendation</span> , 10 <span class=

								          "month">February</span> <span class="year">2004</span>.

								          <p class="online">

								            <a href="http://www.w3.org/TR/owl-test/">Latest

								            version</a> available at http://www.w3.org/TR/owl-test/

								          </p>

								        </dd>

								        <dt class="misc">

								          [<a name="OWL" id="OWL">OWL</a>]

								        </dt>

								        <dd>

								          <span class="author">Schreiber, G. and Dean, M.</span>

								          Editors: <cite><a href=

								          "http://www.w3.org/TR/2004/REC-owl-ref-20040210/">OWL Web

								          Ontology Language Reference</a></cite> <span class=

								          "institution">W3C</span> <span class=

								          "type">Recommendation</span> , 10 <span class=

								          "month">February</span> <span class="year">2004</span>.

								          <p class="online">

								            <a href="http://www.w3.org/TR/owl-ref/">Latest

								            version</a> available at http://www.w3.org/TR/owl-ref/

								          </p>

								        </dd>

								      </dl>

								    </div>

								    <hr />

								    <div class="online">

								      <a href="Overview.html">Up to Design Issues</a>

								    </div>

								  </body>

								</html>