<?xml version="1.0" encoding="iso-8859-1"?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta name="generator" content=
"HTML Tidy for Mac OS X (vers 31 October 2006 - Apple Inc. build 13), see www.w3.org" />
<title>
RDF Diff, Patch, Update, and Sync -- Design Issues
</title>
<style type="text/css">
/*<![CDATA[*/
.definition {text-align: right}
.proposition {text-align: right}
/*]]>*/
</style>
<link rel="Stylesheet" href="di.css" type="text/css" />
<link rel="Stylesheet" href="lncs04/article.css" type=
"text/css" />
</head>
<body xml:lang="en" lang="en">
<div class="online">
<a href="./">Up to Design Issues</a>
</div>
<div class="maketitle">
<h1 class="title">
Delta: an ontology for the distribution of differences
between RDF graphs
</h1>
<address>
<a rel="author" href=
"http://www.w3.org/People/Berners-Lee/">Tim Berners-Lee</a>
and <a rel="author" href=
"http://www.w3.org/People/Connolly/">Dan Connolly</a>,
<a rel="institute" href="http://www.csail.mit.edu/">MIT
Computer Science and Artificial Intelligence Laboratory
(CSAIL)</a><br />
<span class="thanks">This work is supported in part by
funding from US Defense Advanced Research Projects Agency
(DARPA) and Air Force Research Laboratory, Air Force
Materiel Command, USAF, under agreement number
F30602-00-2-0593, <q>Semantic Web
Development</q>.</span><br />
<span class="online">Created: 2001, current: $Revision:
1.114 $ of <!--linebreak-->
$Date: 2009/08/27 21:38:06 $</span><br />
<span class="online">Status: personal view only. Editing
status: rough. 2004/03: Extended to add pointers to
implementations, and details of actual language used. see
also: <a href=
"http://lists.w3.org/Archives/Team/sw-team/2004Jul/0008">comments
from reviewers</a></span>
</address>
<p class="online">
Keywords: RDF, Difference, patch, remote update,
synchronization, graph comparison.
</p>
</div>
<hr />
<div class="abstract">
<h4>
Abstract
</h4>
<p>
The problem of updating and synchronizing data in the
Semantic Web motivates an analog to text diffs for RDF
graphs. This paper discusses the problem of comparing two
RDF graphs, generating a set of differences, and updating a
graph from a set of differences. It discusses two forms of
difference information, the context-sensitive <q>weak</q>
patch, and the context-free <q>strong</q> patch. It gives a
proposed <strong>update ontology</strong> for patch files
for RDF, and discusses experience with proof of concept
code.
</p>
</div>
<h2>
Introduction
</h2>
<p>
The use of text files to record programs, documents, and
other artifacts is supported by version control systems such
as RCS<a href="#Tich85">[Tich85]</a> and CVS<a href=
"#Ber90">[Ber90]</a> that are based on the ability to compute
the difference between two text files and represent it as
diff<a href="#Mill85">[Mill85]</a>, i.e. a set of editing
instructions. The use of database tables to record bank
accounts and records of all sorts is supported by the
relational calculus<a href="#Codd70">[Codd70]</a> and its
expression as SQL statements. In both cases, the data goes
thru a sequence of states; not only are the states
represented explicitly (as text files or database tables) but
also the transitions from one state to the other can be
represented explicitly (either as editing instructions or SQL
insert/update statements). Difference (<samp>\Delta</samp>)
and sum (<samp>\Sigma</samp>) functions are ubiquitous in
computing and, like differentiation and integration, are
inverse in the sense that:
</p>
<p class="eqn-display">
v1 = <samp>\Sigma</samp>(v0, <samp>\Delta</samp>(v0, v1))
</p>
<p>
Since the transitions can be represented much more compactly
than the pairs of states, and the sigma function is
straightforward to compute, the deltas are useful for
efficiently updating data distributed among two or more
peers.
</p>
<p>
We are developing a Semantic Web Application Platform
(<a href="/2000/10/swap/">SWAP</a>) including tools and
applications to manipulate RDF graphs much like traditional
tools manipulate text files. It includes <code>cwm</code>, a
command-line tool for processing RDF in both the standard XML
encoding<a href="#RDF04">[RDF04]</a> and an experimental
encoding, Notation3 (n3)<a href="#Ber03">[Ber03]</a>.
</p>
<p>
As we build the Semantic Web, using RDF graphs<a href=
"#RDFC04">[RDFC04]</a> to represent data such as
bibliographies<a href="#DC02">[DC02]</a>, syndication
summaries<a href="#RSS">[RSS]</a> and medical
terminology<a href="#Gol03">[Gol03]</a>, we see a need for
difference and sum functions for RDF graphs. The use of RDF
to represent test results<a href="#EARL">[EARL]</a>,<a href=
"#OWLT">[OWLT]</a> motivates better ways to compare the
actual results of software tests with the intended results
and isolate the differences.
</p>
<h3>
<a name="Synchroniz" id="Synchroniz">The Synchronization
Problem</a>
</h3>
<p>
One of the most stubborn problems in practical computing is
that of synchronizing calendars and address books between
different devices. Various combinations of device and
program, from the same or different manufacturers, produce
very strange results on not-so-rare occasions.
</p>
<p>
The problem has three parts. There is the syntactic problem
of extracting the data from the strange device or its storage
medium and turning it into something manageable, such as RDF.
There is the semantic problem of understanding what the
fields mean: can one have two home phone numbers? There is
the problem of actually synchronizing changes, particularly
in the general case that changes have been made on both
devices.
</p>
<p>
Because the direct syntactic conversion to RDF often leaves
something which has strained and awkward semantics, it is
often necessary or tempting to mix the semantic and syntactic
conversions. <span class="online">(See <a href=
"/2002/12/cal/" class="online">RDF calendaring</a>
discussions.)</span> Because the merging of changes requires
more application knowledge than the bare RDF data provides,
it is tempting to mix the conversion and sync algorithm.
However, this mixing reduces the modularity and testability
of the resulting program. Perhaps if the three stages were
separated, then a more robust system, and one more extensible
by the addition of information in new ontologies, would
result.
</p>
<p>
In the semantic web architecture, the application constraints
on the data can be represented in the ontology, and so can be
used by a generic synchronization system.
</p>
<p>
On the one hand, the syntactic problems are straightforward,
if tedious, and the much harder semantic problems may explain
why many existing synchronization packages break down. But on
the other hand, perhaps it is the combination of the two that
results in so many failures; perhaps software that separates
the problems, treating synchronization generically, will be
more robust. We hope this work contributes to further work on
specifications such as SyncML<a href="#Sync02">[Sync02]</a>.
</p>
<p>
And while in the general case, concurrent changes may be
completely irreconcilable, the diff mechanisms discussed here
solve an interesting part of the problem space.
</p>
<h3>
Problems with the line-oriented approach
</h3>
<p>
RDF graphs can be serialized and used with traditional
line-oriented tools. In the general case, with no constraints
on how the graphs are serialized, line-oriented deltas can be
as large as the data itself, even between files representing
the same graph. However, when files are edited by hand, small
changes to the data naturally result in small textual diffs.
But since the difference is expressed as the difference
between two text files, not the difference between two
graphs, the delta is dependent on the graph serialization.
It's not enough to have the original graph to use the delta;
one needs a copy of the particular serialization.
</p>
<p>
Pretty-printing algorithms reduce the large number of
possible serializations of an RDF graph to a few actual
serializations. The difference engine<a href=
"#Kly04">[Kly04]</a> produces human-readable difference
descriptions using an algorithm analogous to comparing
pretty-printed graphs; its descriptions are not sufficient to
reconstruct one graph from the other, however.
</p>
<p>
We find it practical to use CVS to manage both hand-edited
and machine-serialized RDF data files in many cases. A
notable exception is the reference results for tests:
comparison of experimental test results versus reference
results yields many false test failures every time we change
the pretty-printing algorithm in the slightest. The cost of
managing the reference results this way is barely tolerable.
</p>
<p>
The straightforward pretty-printing algorithm works in the
obvious way when all the nodes are named (either with URIs or
literals): triples are sorted by subject, and those that
share a subject are grouped together. Notation3 has syntax
for grouping triples that also share a predicate. Unlabeled nodes
(<em>blank nodes</em> or <em>bnodes</em>) that have no
incoming triples are treated like named subjects. Bnodes that
have one incoming link serve as internal nodes in the
pretty-printing tree. Bnodes that have more than one incoming
triple are given arbitrary labels for the purpose of
serialization and are hence treated like named subjects. For
example, the triples
</p>
<pre class="example">
:Bob :pet _:p.
_:p :size "small".
:Bob :brother :Pete.
_:p :mother _:p2.
:Pete :pet _:p2.
</pre>
<p>
are pretty-printed as
</p>
<pre class="example">
:Bob :brother :Pete;
     :pet [
         :mother _:g0;
         :size "small" ] .
:Pete :pet _:g0 .
</pre>
<p>
The ordering and the identification of bnodes are the two
ways in which serializations of the same graph can arbitrarily
differ. <code>Cwm</code> not only attempts to find a
serialization which minimizes the number of arbitrarily named
nodes but often happens to regenerate arbitrary names
consistently across runs. Even so, diffs of pretty-printed
RDF are still unsatisfactory, since changes as small as one
triple can lead to arbitrarily large textual diffs if that
triple changes the set of bnodes that need arbitrary labels.
</p>
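<p>
The treatment of bnodes described above can be sketched in a few lines of Python. This is only an illustration, not the <code>cwm</code> pretty-printer: the tuple representation of triples, the <code>_:</code> prefix test and the names <code>classify_bnodes</code>, <code>root</code>, <code>nested</code> and <code>labelled</code> are assumptions made for the example.
</p>

```python
from collections import Counter

# Triples from the example above; blank nodes are written with a "_:" prefix.
triples = [
    (":Bob", ":pet", "_:p"),
    ("_:p", ":size", '"small"'),
    (":Bob", ":brother", ":Pete"),
    ("_:p", ":mother", "_:p2"),
    (":Pete", ":pet", "_:p2"),
]

def is_bnode(term):
    return term.startswith("_:")

def classify_bnodes(triples):
    """Classify each bnode by the number of triples that point at it."""
    incoming = Counter(o for (_s, _p, o) in triples if is_bnode(o))
    bnodes = {t for (s, _p, o) in triples for t in (s, o) if is_bnode(t)}
    kinds = {}
    for b in sorted(bnodes):
        n = incoming[b]
        if n == 0:
            kinds[b] = "root"      # treated like a named subject
        elif n == 1:
            kinds[b] = "nested"    # internal node, printed inline as [ ... ]
        else:
            kinds[b] = "labelled"  # needs an arbitrary label such as _:g0
    return kinds

kinds = classify_bnodes(triples)
```

<p>
For the example graph, <code>_:p</code> has a single incoming triple and is nested inside <code>:Bob</code>, while <code>_:p2</code> has two and so receives an arbitrary label. Adding or removing one incoming triple moves a bnode between these classes, which is why a one-triple change can alter the printed text far from the change itself.
</p>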
<p>
To completely eliminate the arbitrary choices in how to
serialize an RDF graph, we could employ a canonicalization
algorithm such as the one<a href="#Car03s">[Car03s]</a> in
Jena<a href="#Car03">[Car03]</a>, or <a href=
"/2000/10/swap/cant.py">cant.py</a> from our own SWAP
toolkit. One problem with this approach is that the canonical
form is expressed in the N-Triples<a href=
"#RDFT04">[RDFT04]</a> representation. Deltas between
N-Triples files are verbose and tedious to read for most
practical graphs. Further, the problem of large textual diffs
resulting from small changes remains: these canonicalization
algorithms work by computing a signature for each blank node
based on nearby triples and sorting the results; adding or
removing one triple near a blank node will change its
signature and hence potentially the labeling of many bnodes.
</p>
<h2>
Goals: Economy and Robustness
</h2>
<p>
SQL statements and text file diffs are attractive because
they succinctly represent the difference between two states.
If the difference between two text files were not much
smaller than either of the text files, it would be of little
use. The essential feature of a difference algorithm, then,
is <em>economy</em>: small differences between input states
should result in small deltas.
</p>
<p>
Much of the popularity of CVS is due to its support of
concurrent development. It makes a patch file<a href=
"#Wall">[Wall]</a> representing the changes each party has
made. These changes are made, in order, to the repository
file to generate new versions. In the event that two agents
take a copy of the same version <samp>v0</samp> and make
different changes to it (<samp>v1a</samp> and
<samp>v1b</samp>), the party that commits last attempts to
make <samp>v1</samp> which incorporates both diffs:
</p>
<p class="eqn-display">
v1 = <samp>\Sigma</samp>(<samp>\Sigma</samp>(v0,
<samp>\Delta</samp>(v0, v1a)), <samp>\Delta</samp>(v0, v1b))
</p>
<p>
Note that <samp>\Delta(v0, v1b)</samp> is applied to
something other than v0. The context diff and unidiff formats
are sufficiently robust that it does work in most practical
cases. When it does not work, then the user is left with the
problem of manually reconciling the conflicts. This happens
when, for example, one party moves the date of a meeting at
the same time as someone else moves or deletes the meeting.
It may be that the criterion that a problem needs human
involvement is very application-dependent.
</p>
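<p>
The merge equation can be tried out on a toy model in which a state is simply a set of facts, so that difference and sum are set operations. This is a hedged sketch, not what CVS does with text files: the functions <code>delta</code> and <code>sigma</code> and the meeting data are invented for illustration.
</p>

```python
def delta(old, new):
    """Return (insertions, deletions) taking `old` to `new`."""
    return (new - old, old - new)

def sigma(state, d):
    """Apply a delta: remove the deletions, then add the insertions."""
    insertions, deletions = d
    return (state - deletions) | insertions

v0 = {("meeting1", "date", "Mon"), ("meeting1", "room", "A")}
v1a = {("meeting1", "date", "Tue"), ("meeting1", "room", "A")}  # one party moves the date
v1b = {("meeting1", "date", "Mon"), ("meeting1", "room", "B")}  # another moves the room

# The second delta is applied to v1a, not to the v0 it was computed from.
v1 = sigma(sigma(v0, delta(v0, v1a)), delta(v0, v1b))

# If the second party had moved the date as well, the merge would go through
# mechanically but leave the meeting with two dates.
v1c = {("meeting1", "date", "Wed"), ("meeting1", "room", "A")}
conflict = sigma(v1a, delta(v0, v1c))
dates = sorted(o for (_s, p, o) in conflict if p == "date")
```

<p>
Here <code>v1</code> contains both changes, whereas <code>dates</code> ends up holding two values for the same meeting. Whether that second outcome needs human involvement depends on what the application knows about meetings and dates.
</p>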
<p>
There are three failure modes:
</p>
<ol>
<li>Inconsistent changes were made. This failure mode is not
automatically soluble.
</li>
<li>The patch was incapable of finding the appropriate points
in v1a at which to make the change <samp>\Delta</samp>(v0,
v1b). This form of failure we can eliminate for certain RDF
graph deltas.
</li>
<li>The patch was misapplied: the context was used to
determine points at which to make the change, but the wrong
point was used, and erroneous data resulted. This is
unacceptable.
</li>
</ol>
<p>
A <em>robust</em> patch is one which may be applied to a file
different from the one it was originally generated from,
without being misapplied and hence generating erroneous
information. In the line oriented tools, the <em>patch</em>
program was introduced to be more robust than simply applying
the patch as a series of editor commands.
</p>
<h2>
Delta and Sigma for RDF Graphs
</h2>
<p>
An RDF graph is a set of (subject, predicate, object)
triples, i.e. a set of typed links between nodes. Each node
may or may not be named (either by a URI or a literal). As a
measure of the size of the difference between two RDF graphs
<samp>G1</samp> and <samp>G2</samp>, one can use the sum of
the size of the set differences <samp>|G1-G3|</samp> and
<samp>|G2-G3|</samp> where <samp>G3</samp> is the largest
common subgraph of <samp>G1</samp> and of <samp>G2</samp>.
</p>
<h3>
Computing differences between RDF graphs
</h3>
<p>
In the case in which all the nodes are named, computing the
difference between two graphs is simple and straightforward:
</p>
<p class="definition">
If <samp>G1</samp> and <samp>G2</samp> are ground RDF graphs,
then the <em>ground graph delta</em> of <samp>G1</samp> and
<samp>G2</samp> is a pair <samp>(insertions,
deletions)</samp> where <samp>insertions</samp> is the set
difference <samp>G2-G1</samp> and <samp>deletions</samp> is
<samp>G1-G2</samp>.
</p>
<p>
This form of delta is reasonably economical: the storage cost
is linear in the size of the difference between the graphs.
Straightforward extensions with slightly improved economy
might be more specific in expressing differences in which
only one or two parts of the triple have changed.
</p>
<p>
It is also completely robust. Each statement is independent,
with no variables: there is no cause for ambiguity. The
deletion statements may be deleted from, and the insertion
statements added to, any graph.
</p>
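<p>
For ground graphs the definitions above map directly onto set operations. The following Python sketch represents a graph as a set of string triples; the function names <code>ground_delta</code>, <code>apply_delta</code> and <code>delta_size</code> and the sample data are assumptions for illustration, not part of any SWAP tool.
</p>

```python
def ground_delta(g1, g2):
    """Ground graph delta: triples to insert and to delete to turn g1 into g2."""
    return {"insertions": g2 - g1, "deletions": g1 - g2}

def apply_delta(graph, delta):
    """Sigma: delete the deletion triples, then add the insertion triples."""
    return (graph - delta["deletions"]) | delta["insertions"]

def delta_size(g1, g2):
    """|G1-G3| + |G2-G3|; for ground graphs G3 is the set intersection."""
    g3 = g1.intersection(g2)
    return len(g1 - g3) + len(g2 - g3)

g1 = {(":Bob", ":brother", ":Pete"), (":Bob", ":age", '"41"')}
g2 = {(":Bob", ":brother", ":Pete"), (":Bob", ":age", '"42"')}
d = ground_delta(g1, g2)

# Sigma inverts Delta on the graph the delta was computed from.
roundtrip = apply_delta(g1, d)

# The same delta can also be applied to a different ground graph.
other = {(":Bob", ":age", '"41"'), (":Pete", ":pet", ":Rex")}
patched_other = apply_delta(other, d)
```

<p>
The delta holds two triples however large the shared part of the graphs is, and applying it to <code>other</code> changes Bob's age without touching the unrelated triple.
</p>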
<p>
In the case where not all of the nodes are named, finding the
largest common subgraph becomes a case of the graph
isomorphism problem. The arc labels do have names (in a very
large set of practical cases, including all those which can
be serialized as RDF/XML). Graph isomorphism is in fact a
difficult problem for which no polynomial-time algorithm is
known, but which has not been shown to be NP
complete<a href="#Kob93">[Kob93]</a>. While the general graph
isomorphism problem has readily available solutions<a href=
"#Ski97">[Ski97]</a><a href="#Ski01">[Ski01]</a>, they do not
seem to be a good match for the practical cases of RDF graph
diff.
</p>
<p>
There is an interesting subset of real cases in which there
are a mixture of named and unnamed nodes, but none of the
unnamed nodes is very far from a named node. In this case,
the unnamed nodes can be indirectly identified by giving a
path from a named node. The difference is then expressed by
giving this local context and the related changes.
</p>
<h3>
A patch file format for RDF deltas
</h3>
<p>
By analogy to the text diff, there is a need not only for a
difference-finding algorithm, but for a patch file format.
Such a format needs:
</p>
<ul>
<li>a way to uniquely identify what is changing
</li>
<li>a way to distinguish between the pieces added and those
subtracted
</li>
</ul>
<p>
It is straightforward to pinpoint the parts of the graph that
have changed when all nodes are named, but less so in the
presence of anonymous nodes.
</p>
<p>
To identify what is changing, we use Notation3 expressions
for quoted RDF graphs with schema variables, and we introduce
three new terms. For example:
</p>
<pre class="example">
@prefix diff: &lt;http://www.w3.org/2004/delta#&gt;.
{ ?x bank:accountNo "1234578"; bank:balance 4000}
diff:replacement
{ ?x bank:accountNo "1234578"; bank:balance 3575}.
</pre>
<p>
This one new property <code>replacement</code> can express
any change. Deletions can be written <code>{...}
diff:replacement {}</code> and additions can be written
<code>{} diff:replacement {...}</code>.
</p>
<p>
The second alternative is very similar but involves two
properties, one for inserting and one for deleting:
</p>
<pre class="example">
{ ?x bank:accountNo "1234578"}
diff:deletion { ?x bank:balance 4000};
diff:insertion { ?x bank:balance 3575}.
</pre>
<p>
The form using <code>diff:insertion</code> and
<code>diff:deletion</code> is implemented in <a href=
"/2000/10/swap/doc/cwm">cwm</a>.
</p>
<p>
The first and second form are related by
</p>
<pre class="definition">
{ ?F replacement ?G } &lt;=&gt; { ?F deletion ?F; insertion ?G }
</pre>
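<p>
To make the reading of these patch forms concrete, here is a minimal Python sketch of applying one: the pattern is matched against the graph, and for each binding of the variables the deletion triples are removed and the insertion triples added. It is an illustration under simplifying assumptions (terms are strings, variables start with <code>?</code>, and <code>match</code>, <code>apply_patch</code> and <code>apply_replacement</code> are hypothetical names); it is not the <code>cwm</code> implementation.
</p>

```python
def match(pattern, graph, binding=None):
    """Yield each variable binding under which all pattern triples occur in graph."""
    binding = binding or {}
    if not pattern:
        yield dict(binding)
        return
    first, rest = pattern[0], pattern[1:]
    for triple in graph:
        trial = dict(binding)
        ok = True
        for p_term, g_term in zip(first, triple):
            if p_term.startswith("?"):
                # Bind the variable, or check it against its earlier binding.
                if trial.setdefault(p_term, g_term) != g_term:
                    ok = False
                    break
            elif p_term != g_term:
                ok = False
                break
        if ok:
            yield from match(rest, graph, trial)

def substitute(pattern, binding):
    return {tuple(binding.get(t, t) for t in triple) for triple in pattern}

def apply_patch(graph, where, deletion, insertion):
    """{ where } diff:deletion { deletion }; diff:insertion { insertion }."""
    result = set(graph)
    for b in list(match(where + deletion, graph)):
        result -= substitute(deletion, b)
        result |= substitute(insertion, b)
    return result

def apply_replacement(graph, old, new):
    """{ old } diff:replacement { new }, read as deleting old and inserting new."""
    return apply_patch(graph, old, old, new)

graph = {
    ("_:acct1", "bank:accountNo", '"1234578"'),
    ("_:acct1", "bank:balance", "4000"),
    ("_:acct2", "bank:accountNo", '"7654321"'),
    ("_:acct2", "bank:balance", "4000"),
}
where = [("?x", "bank:accountNo", '"1234578"')]
deletion = [("?x", "bank:balance", "4000")]
insertion = [("?x", "bank:balance", "3575")]

patched = apply_patch(graph, where, deletion, insertion)
replaced = apply_replacement(graph, where + deletion, where + insertion)
```

<p>
Both calls change the balance of the account numbered 1234578 only, which mirrors the equivalence between the one-property and two-property forms.
</p>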
<h3>
Weak and Strong diffs
</h3>
<p>
To address robustness, we distinguish two types of RDF graph
deltas: a <em>weak</em> delta gives enough information to
apply it to exactly the graph it was computed from, but a
<em>strong</em> delta specifies the changes in a
context-independent manner. The difference is not in the
patch file format, but in the information a particular patch
gives.
</p>
<p>
Returning to the bank example, if bank account numbers are
globally unique, then the replacement pattern will bind ?x to
a node identifying a particular bank account. In OWL<a href=
"#OWL">[OWL]</a> terms, if <code>bank:accountNo</code> is
an <code>owl:InverseFunctionalProperty</code>, then the node
must be the <code>owl:sameAs</code> any other node with the
same account number. In that case, the patch will be strong.
</p>
<p>
If, however, many accounts can have the same number, applying
that patch to another knowledge base may inadvertently alter
the wrong account. The patch would be weak.
</p>
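<p>
The risk can be seen by checking how many nodes in a given graph carry the same key value. The Python sketch below is only a check on one graph; whether a patch is strong depends on what the ontology says about the property, not on any single data set. The function name <code>shared_key_values</code> and the sample accounts are invented for the example.
</p>

```python
from collections import defaultdict

def shared_key_values(graph, key_property):
    """Return {value: subjects} for values of key_property held by several subjects."""
    subjects = defaultdict(set)
    for s, p, o in graph:
        if p == key_property:
            subjects[o].add(s)
    return {o: sorted(ss) for o, ss in subjects.items() if len(ss) > 1}

unique = {
    ("_:a", "bank:accountNo", '"1234578"'),
    ("_:b", "bank:accountNo", '"7654321"'),
}
clashing = {
    ("_:a", "bank:accountNo", '"1234578"'),
    ("_:c", "bank:accountNo", '"1234578"'),
}

no_clash = shared_key_values(unique, "bank:accountNo")
clash = shared_key_values(clashing, "bank:accountNo")
```

<p>
In the second graph the pattern <code>?x bank:accountNo "1234578"</code> binds to two nodes, so a patch keyed on the account number would alter both.
</p>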
<p>
In normal information processing, of course, numbers such as
bank account numbers are used to avoid this confusion.
Consider those graphs in which every blank node is in fact
unambiguously identified by one functional or inverse
functional property. Further, that property is invariant
under any changes represented by the deltas.
</p>
<p>
The pattern for terms goes as follows:
</p>
<p class="definition">
Given a background ontology <samp>W</samp> and a graph
<samp>G</samp>, if a blank node <samp>b</samp> in
<samp>G</samp> is the object of a triple whose subject
<samp>v</samp> is <em>functionally ground</em> and whose
predicate <samp>p</samp> is an
<code>owl:FunctionalProperty</code> according to
<samp>W</samp>, then <samp>v.p</samp> is a <em>functional
term label</em> for <samp>b</samp> in <samp>G</samp> with
respect to <samp>W</samp>. Likewise, <samp>v\uparrow q</samp>
is a functional term label for <samp>b</samp> if
<samp>q</samp> is an
<code>owl:InverseFunctionalProperty</code>, b is the subject,
and v is the object. Recursively, v is functionally ground if
it is a name (URI or literal) or a bnode with a functional
term label.
</p>
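<p>
The recursive definition can be computed as a fixed point: keep labelling bnodes until no new label can be derived. The Python sketch below is a hedged illustration, not the algorithm in <code>diff.py</code>. Triples are string tuples, the background ontology is reduced to two sets of property names, and the label syntax (<code>v.p</code> written with a dot, the upward arrow written as <code>^</code>) is an assumption of the example.
</p>

```python
def functional_labels(graph, functional, inverse_functional):
    """Return {bnode: functional term label} for every functionally ground bnode."""
    labels = {}

    def is_bnode(term):
        return term.startswith("_:")

    def ground(term):
        # Names (URIs, literals) are ground; a bnode is ground once labelled.
        return not is_bnode(term) or term in labels

    def show(term):
        return labels.get(term, term)

    changed = True
    while changed:
        changed = False
        for s, p, o in sorted(graph):
            # v.p : b is the object of a functional property of a ground subject.
            if p in functional and is_bnode(o) and o not in labels and ground(s):
                labels[o] = "(" + show(s) + "." + p + ")"
                changed = True
            # v^q : b is the subject of an inverse functional property with ground object.
            if p in inverse_functional and is_bnode(s) and s not in labels and ground(o):
                labels[s] = "(" + show(o) + "^" + p + ")"
                changed = True
    return labels

graph = {
    ("_:a", "bank:accountNo", '"1234578"'),
    ("_:a", "bank:owner", "_:o"),
    ("_:o", "foaf:name", '"Bob"'),
    ("_:u", "foaf:name", '"Carol"'),
}
labels = functional_labels(
    graph,
    functional={"bank:owner"},
    inverse_functional={"bank:accountNo"},
)
```

<p>
The account node is labelled through its account number, the owner node through the account, and <code>_:u</code> stays unlabelled because <code>foaf:name</code> is neither functional nor inverse functional in this toy ontology.
</p>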
<p>
Then we can rewrite certain graphs:
</p>
<p class="definition">
With respect to a background ontology <samp>W</samp>, a graph
<samp>G</samp> is <em>fully labeled</em> iff every node in
<samp>G</samp> is functionally ground. A <em>functional RDF
graph</em> is a set of triples whose terms are URIs,
literals, or functional terms. A functional RDF graph
<samp>F</samp> is a <em>functional analog</em> of an RDF
graph <samp>G</samp> iff <samp>G</samp> is fully labeled and
<samp>F</samp> can be obtained from <samp>G</samp> by
replacing each bnode b in <samp>G</samp> with a functional
term label for b.
</p>
<p>
The diffs of functional RDF graphs are just as simple to make
as ground RDF deltas:
</p>
<p class="definition">
Given a background ontology <samp>W</samp>, a <em>strong</em>
delta between fully labeled graphs <samp>G1</samp> and
<samp>G2</samp> is a pair <samp>(insertions,
deletions)</samp> where <samp>insertions</samp> is the set
difference <samp>F2-F1</samp>, deletions is
<samp>F1-F2</samp>, and <samp>F1</samp> and <samp>F2</samp>
are functional analogs of <samp>G1</samp> and <samp>G2</samp>
respectively.
</p>
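<p>
Given functional term labels for every bnode, the strong delta reduces to the set differences in the definition. In the Python sketch below the labels are supplied by hand, and the names <code>functional_analog</code> and <code>strong_delta</code> are assumptions made for illustration.
</p>

```python
def functional_analog(graph, labels):
    """Replace each bnode by its functional term label."""
    def rename(term):
        if term.startswith("_:"):
            return labels[term]  # a KeyError means the graph is not fully labeled
        return term
    return {tuple(rename(t) for t in triple) for triple in graph}

def strong_delta(g1, labels1, g2, labels2):
    """Return (insertions, deletions) between the functional analogs of g1 and g2."""
    f1 = functional_analog(g1, labels1)
    f2 = functional_analog(g2, labels2)
    return (f2 - f1, f1 - f2)

# The two graphs use different arbitrary bnode names for the same account.
g1 = {
    ("_:a", "bank:accountNo", '"1234578"'),
    ("_:a", "bank:balance", "4000"),
}
g2 = {
    ("_:b", "bank:accountNo", '"1234578"'),
    ("_:b", "bank:balance", "3575"),
}
account = '("1234578"^bank:accountNo)'
insertions, deletions = strong_delta(g1, {"_:a": account}, g2, {"_:b": account})
```

<p>
Although the bnode names differ between the two graphs, the delta contains only the changed balance, because both bnodes map to the same functional term.
</p>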
<p class="online">
(@@need to define sigma for strong deltas?) It is actually
the same as for any delta: horn match and delete or insert.
</p>
<p>
A strong delta is like a context diff that cannot be
mis-applied.
</p>
<p class="proposition">
If <samp>D</samp> is a strong delta between fully labeled
graphs <samp>k1</samp> and <samp>k2</samp>, and
<samp>k3</samp> is a subset of <samp>k1</samp>, then
<samp>\Sigma(k3, D)</samp> is consistent with
<samp>k2</samp>. <span class="online">@@TODO: proof</span>
</p>
<p>
One advantage of a strong patch is, then, that one can take a
patch from any true knowledge base change and apply it to a
subset knowledge base, and the result will be true. For
example, if changes to a knowledge base are represented by a
sequence of strong diffs, one can subscribe to the diffs from
any given point on, and acquire a subset of the final
knowledge base.
</p>
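<p>
A small check of this property, restricted to ground graphs, where the strong delta coincides with the ground graph delta. The data is invented for the example and the snippet illustrates the claim rather than proving it.
</p>

```python
k1 = {
    (":alice", ":phone", '"1234"'),
    (":alice", ":office", '"G512"'),
    (":bob", ":phone", '"2222"'),
}
k2 = {
    (":alice", ":phone", '"5678"'),
    (":alice", ":office", '"G512"'),
    (":bob", ":phone", '"2222"'),
}
insertions, deletions = k2 - k1, k1 - k2

# A subscriber who holds only part of k1.
k3 = {(":alice", ":phone", '"1234"')}

# Applying the delta to the partial copy yields part of k2.
result = (k3 - deletions) | insertions
```

<p>
The subscriber never sees Bob's number or Alice's office, but what it holds after the patch is a subset of <code>k2</code>.
</p>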
<p>
As a practical matter, achieving fully labeled graphs
requires care in building and using the ontology. As a
supplement to the good practice of using URIs when distributing
data, it is useful to identify things indirectly by using
terms with published ontologies that say whether they are
many-many, many-1, 1-many or 1-1. The <a href=
"/2000/10/swap/diff.py">diff.py</a> program from <a href=
"/2000/10/swap/Overview.html">SWAP</a> will generate a strong
diff between two files, provided it can find sufficient
information in the Web to fully label the input graphs.
</p>
<p class="online">
We note in passing that the ontologies we used all involved
inverse functional datatype properties, which are OWL/Full
but not OWL/DL.
</p>
<h2>
Application to Update and Sync
</h2>
<p>
Though we have made only small scale tests, we are interested in
pursuing strong diffs, and suspect they will be useful in
a variety of applications.
</p>
<h3>
Peer-peer update and sync
</h3>
<p>
The algorithm for synchronizing two databases can be
straightforwardly generalized to N. In a decentralized
peer-peer network such as Network News Transfer
Protocol<a href="#NNTP">[NNTP]</a> (or many others), messages
are timestamped and distributed eventually to every party,
though a message may be received by different parties at
different times. When the network is reliable, there may be a
well-defined maximum delivery time.
</p>
<p>
A crude algorithm is to apply the patches in order of the
time-stamp. If a message arrives with a timestamp preceding
the recent ones already taken into account, they are unwound
so that the new version can be built in the proper order. A
patch which fails (as in a CVS conflict) is rejected. In the
case of RDF graphs, failure can be a pre-agreed form of
consistency, such as (for example) OWL-DL consistency. The
sender of the failed patch will realize this as they will be
running the same algorithm on the same patches, and will have
to take recovery action.
</p>
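<p>
The crude algorithm can be sketched as follows. This is an illustration under stated assumptions, not a protocol specification: patches are ground deltas, unwinding is done by replaying the whole log from a shared base state, and the pre-agreed consistency test is a toy stand-in (<code>one_date_per_meeting</code>) for something like OWL-DL consistency. The class name <code>Replica</code> and the message ids are hypothetical.
</p>

```python
def one_date_per_meeting(graph):
    """Toy consistency rule: a subject may have at most one "date" triple."""
    seen = set()
    for s, p, _o in graph:
        if p == "date":
            if s in seen:
                return False
            seen.add(s)
    return True

class Replica:
    def __init__(self, base, consistent):
        self.base = frozenset(base)
        self.consistent = consistent
        self.log = []        # (timestamp, message_id, insertions, deletions)
        self.state = set(self.base)
        self.rejected = []

    def receive(self, timestamp, message_id, insertions, deletions):
        self.log.append((timestamp, message_id, frozenset(insertions), frozenset(deletions)))
        self.log.sort(key=lambda m: (m[0], m[1]))
        # Crude unwinding: replay every patch from the base in timestamp order.
        state, rejected = set(self.base), []
        for _ts, mid, ins, dels in self.log:
            candidate = (state - dels) | ins
            if self.consistent(candidate):
                state = candidate
            else:
                rejected.append(mid)  # every replica rejects the same patch
        self.state, self.rejected = state, rejected

base = {("m1", "date", "Mon")}
msg_a = (1, "msg-a", {("m1", "date", "Tue")}, {("m1", "date", "Mon")})
msg_b = (2, "msg-b", {("m1", "date", "Wed")}, {("m1", "date", "Mon")})

r1 = Replica(base, one_date_per_meeting)
r1.receive(*msg_a)
r1.receive(*msg_b)

r2 = Replica(base, one_date_per_meeting)
r2.receive(*msg_b)  # arrives first although it was sent later
r2.receive(*msg_a)
```

<p>
Both replicas converge on the same state and reject the same patch, even though the second one briefly held a different value before the earlier message arrived.
</p>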
<p>
A new version can be given a version id by hashing the
version id of its predecessor with the message id of the
patch used to make the new from the old. The community can
refer to versions by these ids, and if they want to refer to
a commonly held document, then one only has to wait for the
maximum delivery time to know that everyone in the community
will know the value of the knowledge base for that version.
Even without waiting, anyone who knows of a version with that
ID will know they have the same contents.
</p>
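<p>
A version id chain of this kind might be computed as below. The choice of SHA-256, the newline separator and the names <code>next_version_id</code> and <code>version_chain</code> are assumptions for illustration; the text above does not prescribe a hash function.
</p>

```python
import hashlib

def next_version_id(previous_id, message_id):
    """Hash the predecessor's version id together with the patch's message id."""
    h = hashlib.sha256()
    h.update(previous_id.encode("utf-8"))
    h.update(b"\n")  # separator so that ("ab", "c") and ("a", "bc") differ
    h.update(message_id.encode("utf-8"))
    return h.hexdigest()

def version_chain(initial_id, message_ids):
    """Return the list of version ids reached by applying the patches in order."""
    ids = [initial_id]
    for mid in message_ids:
        ids.append(next_version_id(ids[-1], mid))
    return ids

chain = version_chain("v0", ["msg-1@example.org", "msg-2@example.org"])
same = version_chain("v0", ["msg-1@example.org", "msg-2@example.org"])
swapped = version_chain("v0", ["msg-2@example.org", "msg-1@example.org"])
```

<p>
Two parties that applied the same patches in the same order arrive at the same id, while a different order gives a different id.
</p>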
<h3>
Patches as knowledge
</h3>
<p>
The idea of the strong patch file format is interesting
because a patch is a little bit of knowledge. For example, a
patch saying that where my phone number was 1234 it should now
be 5678, when it is known to be a change to a valid knowledge
base between one week and the next,
indicates that my phone number has actually changed. One
might conclude, say, that I moved or changed jobs. A strong
patch has meaning in itself, and distributing and filtering
these becomes an interesting way of processing knowledge. In
some areas (like houses for sale) it is the new changed
information which is of most interest, and in some areas
(like currency rates) if you listen to a stream of changes
you will in fact accumulate a working knowledge of the area.
</p>
<h3>
Patches as news
</h3>
<p>
From the historical <em>NCSA Mosaic What's New</em> page to
the current syndication of RSS streams <a href=
"#RSS">[RSS]</a>, the interest in news on (or off) the Web
demonstrates that there is great interest in changes to the
status quo. We speculate that this will also be the case on
the Semantic Web. When the state is represented in RDF, then
RDF diffs represent news. The W3C Technical Reports list is
available as RDF, and the W3C RSS feed is partly,
effectively, a list of changes to the Tech Reports list. This
could be formalized by explicitly distributing RDF diffs.
</p>
<h2>
Future directions
</h2>
<p>
The algorithm developed to date produces difference files
only on graphs which are labeled directly with URIs or
indirectly with functional properties or inverse functional
properties.
</p>
<p>
It may be useful to extend the algorithm to cope with graphs
which are not completely labeled, but where the unlabeled
bits are the same in each graph, and so a strong diff can
still be produced. Another avenue would be to look at using
more than one property to label a node when one is not
sufficient.
</p>
<p>
Applications which do not need robustness can use weak
patches. The algorithm could be extended to do more of a
canonicalization-style signature-based match to optionally
give a weak diff where a strong diff cannot be given.
</p>
<p class="online">
In practice, while RDF fundamentally has a graph structure,
the graph is often used to encode ordered lists (RDF
collections). While lists are in fact represented by a
structure of <em>first</em> and <em>rest</em> links within
the graph, when serialized they are normally represented
directly as lists, and within software implementations they
may be stored specially. The representation of changes to
lists may merit a special syntax in the difference file, to
avoid a mess of <em>rdf:first</em> and <em>rdf:rest</em>
statements. (@@DanC: first/rest are functional, so I don't
think this case merits anything special.)
</p>
<p>
RDF does not contain the notion of an unordered set, though
one can with OWL create a class which has an enumerated set
of members. If the use of unordered sets becomes common,
which the authors suspect would be wise in the long run, then
a difference engine should be aware of such sets and be able
to express differences between them.
</p>
<p>
This application, like the rule language, demonstrates the
usefulness of the quoted formulae of n3. The authors believe
that many applications will need this ability to quote RDF
graphs within graphs. As n3 becomes a language of
communication, difference files will of course have to
express changes to nested formulae. As these are graphs, this
is basically a straightforward recursive use of the
difference system for single graphs. A simple though verbose
alternative is to reify the n3 before building differences.
</p>
<p>
With these extensions, the simple difference file format may
lose the elegance of its current simplicity. However, even
with these extensions, most data and ontologies shipped
around the web -- the bottom layers of the semantic web layer
cake -- will be plain RDF graphs and so have simple
difference files.
</p>
<p>
Clearly there are many algorithms which can be imagined for
efficiently generating deltas for RDF graphs. The ones
written are not particularly efficient, having been designed
as proof of concept.
</p>
<h2>
Conclusions
</h2>
<p>
There are many uses for technology of communicating
differences between graphs or changes to a graph. While in
general the generation of differences is basically a graph
isomorphism problem, in a wide set of practical cases, one
can efficiently generate a difference, or patch file.
So-called strong patch files are particularly interesting,
and open up a new series of applications based on the
syndication of change information. However, to be able to
generate them, one needs a well-labeled graph, which
in turn needs ontological knowledge of inverse functional
properties to allow nodes to be indirectly labeled. The patch
file format proposed is simple, being a new ontology of only
two (or three) new properties, and directly uses Notation3
syntax and semantics, which itself is a simple extension of
RDF. This format can be generated by all sorts of
difference-finding algorithms. It can be absorbed by any
system capable of matching RDF subgraphs. The patch file
ontology is a candidate for a future standard for remote
update of RDF data.
</p>
<div>
<h2>
References
</h2>
<p class="online">
see <a href="lncs04/Diffbib.bib">Diffbib.bib</a>
</p>
<dl class="bib">
<dt class="misc">
[<a name="RDF04" id="RDF04">RDF04</a>]
</dt>
<dd>
<span class="author">Beckett, D.</span> <cite><a href=
"http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/">
RDF/XML Syntax Specification (Revised)</a></cite>
<span class="institution">W3C</span> <span class=
"type">Recommendation</span>, 10 <span class=
"month">February</span> <span class="year">2004</span>.
<p class="online">
<a href=
"http://www.w3.org/TR/rdf-syntax-grammar">Latest
version</a> available at
<code>http://www.w3.org/TR/rdf-syntax-grammar</code>
</p>
</dd>
<dt class="misc">
[<a name="DC02" id="DC02">DC02</a>]
</dt>
<dd>
<span class="author">Beckett, D. and Miller, E. and
Brickley, D.</span> <a href=
"http://dublincore.org/documents/2002/07/31/dcmes-xml/"><cite>
Expressing Simple Dublin Core in RDF/XML</cite></a>
<span class="institution">Dublin Core Metadata
Initiative</span> <span class=
"type">Recommendation</span> 31 <span class=
"month">July</span> <span class="year">2002</span>
</dd>
<dt class="misc">
[<a name="RSS" id="RSS">RSS</a>]
</dt>
<dd>
<span class="author">Beged-Dov, Gabe et al.</span>
<cite><a href="http://web.resource.org/rss/1.0/">RDF Site
Summary (RSS) 1.0</a></cite> 6 <span class=
"month">December</span> <span class="year">2000</span>
</dd>
<dt class="inproceedings">
[<a name="Ber90" id="Ber90">Ber90</a>]
</dt>
<dd>
<span class="author">Berliner, Brian</span> <cite>CVS II:
Parallelizing Software Development</cite> <span class=
"booktitle"><a href="http://www.usenix.org/">USENIX</a>
Conference Proceedings</span> pp <span class=
"pages">341--352</span> <span class=
"month">January</span> 22-26, <span class=
"year">1990</span> <span class="address">Washington,
D.C.</span>
<p class="online">
<a href=
"http://www.hpcc.ecs.soton.ac.uk/hpci/tools/cvs/html/cvs-paper.html">
online copy</a>; <a href=
"http://cvsweb.xfree86.org/cvsweb/cvs/doc/cvs-paper.ms">
ms source</a>
</p>
</dd>
<dt class="misc">
[<a name="Ber03" id="Ber03">Ber03</a>]
</dt>
<dd>
<span class="author">Berners-Lee, Tim and Hawke, Sandro
and Connolly, Dan</span> <cite><a href=
"http://www.w3.org/2000/10/swap/doc/">Semantic Web
Tutorial Using N3</a></cite> <span class=
"howpublished">Twelfth International World Wide Web
Conference</span> <span class="address">Budapest,
Hungary</span> <span class="month">May</span>
<span class="year">2003</span>
</dd>
<dt class="techreport">
[<a name="Car03" id="Car03">Car03</a>]
</dt>
<dd>
<span class="author">Carroll, Jeremy J. and Dickinson,
Ian and Dollin, Chris and Reynolds, Dave and Seaborne,
Andy and Wilkinson, Kevin</span> <cite><a href=
"http://www.hpl.hp.com/techreports/2003/HPL-2003-146.html">
Jena: Implementing the Semantic Web
Recommendations</a></cite> <span class=
"institution">Hewlett-Packard</span> <span class=
"number">HPL-2003-146</span> <span class=
"month">Dec</span> <span class="year">2003</span>
<p class="online">
<a href=
"http://www.hpl.hp.com/semweb/jena.htm">Jena</a>
includes a graph diff program <code>rdfcompare</code>
in the <a href=
"http://jena.sourceforge.net/tools.html">command line
tools</a>.
</p>
</dd>
<dt class="TechReport">
[<a name="Car03s" id="Car03s">Car03s</a>]
</dt>
<dd>
<span class="author">Carroll, Jeremy J.</span>
<cite><a href=
"http://www.hpl.hp.com/techreports/2003/HPL-2003-142.html">
Signing RDF Graphs</a></cite> <span class=
"institution">Hewlett-Packard</span> <span class=
"number">HPL-2003-142</span> <span class=
"month">Jul</span> <span class="year">2003</span>
</dd>
<dt class="Article">
[<a name="Codd70" id="Codd70">Codd70</a>]
</dt>
<dd>
<span class="author">Codd, E. F.</span> <cite><a href=
"http://www.acm.org/classics/nov95/">A Relational Model
of Data for Large Shared Data Banks</a></cite>,
<span class="journal">Communications of the ACM</span>,
Vol. <span class="volume">13</span>, No. <span class=
"number">6</span>, <span class="month">June</span>
<span class="year">1970</span>, pp. <span class=
"pages">377--387</span>.
</dd>
<dt class="Article">
[<a name="Gol03" id="Gol03">Gol03</a>]
</dt>
<dd>
<span class="author">Golbeck, Jennifer and Fragoso,
Gilberto and Hartel, Frank and Hendler, James and Parsia,
Bijan and Oberthaler, Jim</span> <cite><a href=
"http://www.mindswap.org/papers/WebSemantics-NCI.pdf">The
National Cancer Institute's Thesaurus and
Ontology</a></cite>. <span class="journal">Journal of Web
Semantics</span>, <span class=
"volume">1</span>(<span class="number">1</span>),
<span class="month">Dec</span> <span class=
"year">2003</span>.
</dd>
<dt class="misc">
[<a name="RDFT04" id="RDFT04">RDFT04</a>]
</dt>
<dd>
<span class="author">Grant, J. and Beckett, D.</span>
<cite><a href=
"http://www.w3.org/TR/2004/REC-rdf-testcases-20040210/">RDF
Test Cases</a></cite>, <span class=
"institution">W3C</span> <span class=
"type">Recommendation</span>, 10 <span class=
"month">February</span> <span class="year">2004</span>.
<p class="online">
<a href="http://www.w3.org/TR/rdf-testcases">Latest
version</a> available at
<code>http://www.w3.org/TR/rdf-testcases</code>
</p>
</dd>
<dt class="misc">
[<a name="RDFC04" id="RDFC04">RDFC04</a>]
</dt>
<dd>
<span class="author">Klyne, G. and Carroll, J. J.</span>
<cite><a href=
"http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/">Resource
Description Framework (RDF): Concepts and Abstract
Syntax</a></cite>, <span class="institution">W3C</span>
<span class="type">Recommendation</span>, 10 <span class=
"month">February</span> <span class="year">2004</span>.
<p class="online">
<a href="http://www.w3.org/TR/rdf-concepts/">Latest
version</a> available at
<code>http://www.w3.org/TR/rdf-concepts/</code>
</p>
</dd>
<dt class="TechReport">
[<a name="NNTP" id="NNTP">NNTP</a>]
</dt>
<dd>
<span class="author">Kantor, Brian and Lapsley,
Phil</span> <cite><a href=
"http://www.ietf.org/rfc/rfc977">Network News Transfer
Protocol</a></cite> <span class="institution">IETF</span>
<span class="number">RFC977</span> <span class=
"month">February</span> <span class="year">1986</span>
</dd>
<dt class="misc">
[<a name="Kly04" id="Kly04">Kly04</a>]
</dt>
<dd>
<span class="author">Klyne, Graham</span> <cite><a href=
"http://www.ninebynine.org/RDFNotes/Swish/Intro.html">Semantic
Web Inference Scripting in Haskell</a></cite>
<span class="month">Feb</span> <span class=
"year">2004</span>
<p class="online">
see esp. section <a href=
"http://www.ninebynine.org/RDFNotes/Swish/Intro.html#GraphDiff">
Comparing graphs</a>
</p>
</dd>
<dt class="Book">
[<a name="Kob93" id="Kob93">Kob93</a>]
</dt>
<dd>
<span class="author">K<span title=
"\&quot;o">&ouml;</span>bler, Johannes and Sch<span title=
"\&quot;o">&ouml;</span>ning, Uwe and Tor<span title=
"\'a">&aacute;</span>n, Jacobo</span> <cite><a href=
"http://www.birkhauser.com/cgi-win/ISBN/0-8176-3680-3">The
Graph Isomorphism Problem: Its Structural
Complexity</a></cite> <span class="series">Progress in
Theoretical Computer Science</span>. <span class=
"publisher">Birkh<span title=
"\&quot;a">&auml;</span>user</span>, <span class=
"address">Boston, MA</span>, (<span class=
"year">1993</span>).
<p class="online">
<a href=
"http://www.informatik.hu-berlin.de/Institut/struktur/algorithmenII/Buecher/GI/">
preface, TOC, etc.</a>. cited in <a href=
"http://www.math.tu-berlin.de/~schwartz/papers/KaibelSchwartz2002.references.bib">
KaibelSchwartz2002.references.bib</a>
</p>
</dd>
<dt class="Article">
[<a name="Mill85" id="Mill85">Mill85</a>]
</dt>
<dd>
<span class="author">Miller, Webb and Myers, Eugene
W.</span> <cite>A File Comparison Program</cite>
<span class="journal">Software---Practice and
Experience</span>, <span class=
"volume">15</span>(<span class="number">11</span>), pp.
<span class="pages">1025--1040</span>, <span class=
"month">November</span> <span class="year">1985</span>.
<p class="online">
<a href=
"http://liinwww.ira.uka.de/cgi-bin/bibshow?e=TF0tqf/fyqboefe%7d789658&amp;r=bibtex&amp;mode=intra">
bib</a>
</p>
</dd>
<dt class="Book">
[<a name="Ski97" id="Ski97">Ski97</a>]
</dt>
<dd>
<span class="author">Skiena, Steve</span> <cite>The
Algorithm Design Manual</cite> <span class=
"publisher"><a href="http://www.telospub.com/">Telos
Pr</a></span> <span class="address">New York</span>
<span class="year">1997</span>
</dd>
<dt class="incollection">
[<a name="Ski01" id="Ski01">Ski01</a>]
</dt>
<dd>
<span class="author"><a href=
"http://www.cs.sunysb.edu/~skiena/">Skiena,
Steve</a></span> <span class="chapter">1.5.9</span>
<cite><a href=
"http://www.cs.sunysb.edu/~algorith/files/graph-isomorphism.shtml">
Graph Isomorphism</a></cite> in the <span class=
"booktitle"><a href=
"http://www.cs.sunysb.edu/~algorith/index.html">Stony
Brook Algorithm Repository</a></span> <span class=
"publisher">Stony Brook University</span> <span class=
"year">2001</span>
<p class="online">
with reference to <a href=
"http://www.cs.sunysb.edu/~algorith/implement/gmt/implement.shtml">
GMT - Graph Matching Toolkit</a>
</p>
</dd>
<dt class="Article">
[<a name="Tich85" id="Tich85">Tich85</a>]
</dt>
<dd>
<span class="author">Tichy, W.</span> <a href=
"http://portal.acm.org/citation.cfm?id=4202&amp;dl=ACM&amp;coll=GUIDE">
<cite>RCS--a system for version control</cite></a>
<span class="journal">Software Practice <span class=
"amp">&amp;</span> Experience</span> Volume <span class=
"volume">15</span>, Issue <span class="number">7</span>
(<span class="month">July</span> <span class=
"year">1985</span>) Pages: <span class=
"pages">637--654</span>
</dd>
<dt class="misc">
[<a name="Sync02" id="Sync02">Sync02</a>]
</dt>
<dd>
<cite><a href=
"http://www.openmobilealliance.org/tech/affiliates/syncml/syncmlindex.html">
SyncML Specifications, Version 1.1</a></cite>
<span class="month">Feb</span> <span class=
"year">2002</span> <span class="publisher"><a href=
"http://www.openmobilealliance.org/">Open Mobile Alliance
(OMA)</a></span>
</dd>
<dt class="misc">
[<a name="Wall" id="Wall">Wall</a>]
</dt>
<dd>
<span class="author">Wall, Larry et al.</span>
<cite><a href=
"http://www.gnu.org/software/patch/patch.html">patch</a></cite>
<span class="publisher">Free Software Foundation</span>
27 <span class="month">Jun</span> <span class=
"year">2000</span>
</dd>
<dt class="misc">
[<a name="EARL" id="EARL">EARL</a>]
</dt>
<dd>
<span class="author">Chisholm, W. and Palmer, S.
B.</span> Editors: <cite><a href=
"http://www.w3.org/TR/2002/WD-EARL10-20021206/">Evaluation
and Report Language (EARL) 1.0</a></cite> <span class=
"institution">W3C</span> <span class="type">Working
Draft</span> (work in progress), 6 <span class=
"month">December</span> <span class="year">2002</span>
<p class="online">
<a href="http://www.w3.org/TR/EARL10/">Latest
version</a> available at <code>http://www.w3.org/TR/EARL10/</code>
</p>
</dd>
<dt class="misc">
[<a name="OWLT" id="OWLT">OWLT</a>]
</dt>
<dd>
<span class="author">Carroll, J. J. and De Roo, J.</span>
Editors: <cite><a href=
"http://www.w3.org/TR/2004/REC-owl-test-20040210/">OWL
Web Ontology Language Test Cases</a></cite> <span class=
"institution">W3C</span> <span class=
"type">Recommendation</span>, 10 <span class=
"month">February</span> <span class="year">2004</span>.
<p class="online">
<a href="http://www.w3.org/TR/owl-test/">Latest
version</a> available at <code>http://www.w3.org/TR/owl-test/</code>
</p>
</dd>
<dt class="misc">
[<a name="OWL" id="OWL">OWL</a>]
</dt>
<dd>
<span class="author">Schreiber, G. and Dean, M.</span>
Editors: <cite><a href=
"http://www.w3.org/TR/2004/REC-owl-ref-20040210/">OWL Web
Ontology Language Reference</a></cite> <span class=
"institution">W3C</span> <span class=
"type">Recommendation</span>, 10 <span class=
"month">February</span> <span class="year">2004</span>.
<p class="online">
<a href="http://www.w3.org/TR/owl-ref/">Latest
version</a> available at <code>http://www.w3.org/TR/owl-ref/</code>
</p>
</dd>
</dl>
</div>
<hr />
<div class="online">
<a href="Overview.html">Up to Design Issues</a>
</div>
</body>
</html>