<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <meta name="generator" content=
  "HTML Tidy for Mac OS X (vers 31 October 2006 - Apple Inc. build 13), see www.w3.org" />
  <meta http-equiv="content-type" content=
  "text/html; charset=us-ascii" />
  <title>
    Relational Databases and the Semantic Web (in Design Issues)
  </title>
  <style type="text/css">
/*<![CDATA[*/

  .work {background-color: #FFFFC1}
/*]]>*/
  </style>
  <link href="di.css" rel="stylesheet" type="text/css" />
</head>
<body bgcolor="#DDFFDD" text="#000000">
<address>
  Tim Berners-Lee Created
  <p>
    <small>Date: September 1998.</small>
  </p>
</address>
<p>
  $Id: RDB-RDF.html,v 1.25 2009/08/27 21:38:09 timbl Exp $
</p>
<address>
  <p>
    Editing status: Comments please. A parenthetical
    discussion to the <a href="Architecture.html">Web
    Architecture at 50,000 feet</a> and the <a href=
    "Semantic.html">Semantic Web roadmap</a>.
  </p>
</address>
<p>
  <a href="Overview.html">Up to Design Issues</a>
</p>
<hr />
<h1>
  Relational Databases on the Semantic Web
</h1>
<p>
  There are many other data models which RDF's Directed
  Labelled Graph (DLG) model compares closely with, and maps
  onto. See a summary in
</p>
<ul>
  <li>
    <a href="RDFnot.html">What the Semantic Web can
    represent</a>
  </li>
</ul>
<p>
  One is the Relational Database (RDB) model.
</p>
<h2>
  <a name="ER" id="ER">The Semantic Web and Entity-Relationship
  models</a>
</h2>
<p>
  Is the RDF model an entity-relationship model? Yes and no. It
  is great as a basis for ER-modelling, but because RDF is used
  for other things as well, RDF is more general. RDF is a model
  of entities (nodes) and relationships. If you are used to the
  "ER" modelling system for data, then the RDF model is
  basically an opening of the ER model to work on the Web. A
  typical ER model involves entity types, and for each entity
  type there is a set of relationships (slots in the typical
  ER diagram). The RDF model is the same, except that
  relationships are first class objects: they are identified by
  a URI, and so anyone can make one. Furthermore, the set of
  slots of an object is not defined when the class of an object
  is defined. The Web works through anyone being (technically)
  allowed to say anything about anything. This means that a
  relationship between two objects may be stored apart from any
  other information about the two objects. This is different
  from object-oriented systems often used to implement ER
  models, which generally assume that information about an
  object is stored in an object: the definition of the class of
  an object defines the storage implied for its properties.
</p>
<p>
  For example, one person may define a vehicle as having a
  number of wheels and a weight and a length, but not foresee a
  color. This will not stop another person making the assertion
  that a given car is red, using the color vocabulary from
  elsewhere.
</p>
<p>
  Apart from this simple but significant change, many concepts
  involved in ER modelling carry across directly onto the
  Semantic Web model.
</p>
<h2>
  The Semantic Web and Relational Databases
</h2>
<p>
  The Semantic Web data model is very directly connected with
  the model of relational databases. A relational database
  consists of tables, which consist of rows, or records. Each
  record consists of a set of fields. The record is nothing but
  the content of its fields, just as an RDF node is nothing but
  the connections: the property values. The mapping is very
  direct:
</p>
<ul>
  <li>a record is an RDF node;
  </li>
  <li>the field (column) name is an RDF propertyType; and
  </li>
  <li>the record field (table cell) is a value.
  </li>
</ul>
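<p>
  The mapping in the list above can be sketched in a few lines of
  Python. This is only an illustration under stated assumptions:
  the function name <code>rows_to_triples</code>, the sample row
  and the choice of <code>empno</code> as key column are invented
  here, and the URI shapes borrow the example.com namespace and
  the "#item" convention that appear later in this note.
</p>

```python
from collections.abc import Iterable

# Hypothetical table URI, reusing the example namespace from later in this note
EMPLOYEES = "http://www.example.com/mycat/personnel/employees"


def rows_to_triples(table_uri: str, key_column: str, rows: Iterable[dict]) -> list[tuple]:
    """Turn records into (node, property, value) triples."""
    triples = []
    for row in rows:
        # a record is an RDF node (named here from its key value; an assumption)
        node = f"{table_uri}/{row[key_column]}#item"
        for column, cell in row.items():
            # the field (column) name is the property
            prop = f"{table_uri}#{column}"
            # the record field (table cell) is the value
            triples.append((node, prop, cell))
    return triples


sample_rows = [{"empno": 1234, "email": "joe@example.com", "age": 45}]
triples = rows_to_triples(EMPLOYEES, "empno", sample_rows)
for triple in triples:
    print(triple)
```

<p>
  Each row contributes one triple per column, so nothing about the
  record is lost, and nothing stops someone else adding further
  triples about the same node later.
</p>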
<p>
  Indeed, one of the main driving forces for the Semantic Web
  has always been the expression, on the Web, of the vast
  amount of relational database information in a way that can
  be processed by machines.
</p>
<p>
  RDF's serialization format -- its syntax in XML -- is a very
  suitable format for expressing relational database
  information.
</p>
<h3>
  Special aspects of the RDB model
</h3>
<p>
  Relational database systems manage RDF data, but in a
  specialized way. In a table, there are many records with the
  same set of properties. An individual cell (which corresponds
  to an RDF property) is not often thought of on its own. SQL
  queries can join tables and extract data from tables, and the
  result is generally a table. So, in practice, RDB software
  is typically optimized for doing operations with a small
  number of tables, some of which may have a large number of
  elements.
</p>
<p>
  A fundamental aspect of a database table is that often the
  data in a table can be definitive. Neither RDF nor RDB models
  have simple ways of expressing this. For example, not only
  does a row in a table indicate that there is a red car whose
  Massachusetts plate is "123XYZ", but the table may also carry
  the unwritten semantics that if any car has a Massachusetts
  plate then it must be in the table. (If any RDF node has a
  "Massachusetts plate number" property then that node is a
  member of the table.) The scope of the uniqueness of a value
  is in fact a very interesting property.
</p>
<p>
  The original RDB model defined by E.F. Codd included
  datatyping with inheritance, which he had intended would be
  implemented in the RDB products to a greater extent than it
  has. For example, typically a person's home address house
  number may be typed as an integer, and their shoe size may
  also be typed as an integer. One can as a result join
  two tables through those fields, or list people whose shoe
  size equals their house number. Practical RDB systems leave
  it to the application builder to only make operations which
  make sense. Once a database is exported onto the Web, it
  becomes possible to do all kinds of strange combinations, so
  a stronger typing becomes very useful: it becomes a set of
  inference rules.
</p>
<p>
  In a pure RDB model, every table has a primary key: a column
  whose value can be used to uniquely identify every row. Some
  products do not enforce this, leading to an ambiguity in the
  significance of duplicate rows. A curious feature is that the
  primary key can be changed without changing the identity of a
  row. (A person can change their name, for example.) SQL allows
  tables to be set up so that such changes can cascade through
  the local system to preserve referential integrity. This
  clearly won't work on the Web. One solution is to use a row
  ID -- which many systems do in fact use although SQL doesn't
  expose it in a standard way. Another is for the application
  to constrain the primary key not to change. Another is to
  put up with links breaking.
</p>
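<p>
  To make the cascading point concrete, here is a small sketch
  using Python's built-in <code>sqlite3</code> module. The table
  and column names (<code>people</code>, <code>likes</code>) and
  the external link are hypothetical. It shows a primary key
  change propagating to a local foreign key, while a URI minted
  from the old key and held outside the database is simply left
  behind.
</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys (and so cascades) when this is switched on
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE people (name TEXT PRIMARY KEY, shoe INTEGER)")
conn.execute(
    "CREATE TABLE likes ("
    " person TEXT REFERENCES people(name) ON UPDATE CASCADE,"
    " music TEXT)"
)
conn.execute("INSERT INTO people VALUES ('Joe', 9)")
conn.execute("INSERT INTO likes VALUES ('Joe', 'blues')")

# Hypothetical link made by someone else on the Web, built from the primary key
external_link = "personnel/people/Joe"

# The person changes their name: the key changes, the row's identity does not
conn.execute("UPDATE people SET name = 'Joseph' WHERE name = 'Joe'")

# Inside the database the reference was rewritten by the cascade
local_refs = conn.execute("SELECT person FROM likes").fetchall()

# The external link still carries the old key, which no longer matches any row
old_key = external_link.rsplit("/", 1)[1]
rows_for_old_key = conn.execute(
    "SELECT count(*) FROM people WHERE name = ?", (old_key,)
).fetchone()[0]

print(local_refs, rows_for_old_key)
```

<p>
  The cascade reaches only what the database itself stores, which
  is why a stable row ID or an unchanging key is the safer basis
  for a URI.
</p>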
<p>
  RDB systems have datatypes at the atomic (unstructured)
  level, as RDF and XML will/do. Combination rules tend in RDBs
  to be loosely enforced, in that a query can join tables by
  any columns which match by datatype -- without any check on
  the semantics. You could for example create a list of houses
  that have the same number of rooms as an employee's shoe
  size, for every employee, even though the sense of that would
  be questionable.
</p>
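<p>
  The questionable join is easy to reproduce. The sketch below
  again uses Python's <code>sqlite3</code>; the tables, the sample
  data and the <code>COLUMN_TYPES</code> dictionary are all
  invented for illustration. The last part hints at how a stronger
  typing could act as a rule that rejects the combination, and is
  not a feature of any particular RDB product.
</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE emps (name TEXT, shoe INTEGER);
    CREATE TABLE houses (address TEXT, rooms INTEGER);
    INSERT INTO emps VALUES ('Joe', 9), ('Ann', 6);
    INSERT INTO houses VALUES ('1 Elm St', 9), ('2 Oak St', 4);
    """
)

# Both columns are integers, so the database accepts the join without complaint
matches = conn.execute(
    "SELECT emps.name, houses.address"
    " FROM emps JOIN houses ON emps.shoe = houses.rooms"
).fetchall()

# Hypothetical stronger typing: each column says what its integers measure
COLUMN_TYPES = {"emps.shoe": "ShoeSize", "houses.rooms": "RoomCount"}


def join_makes_sense(left: str, right: str) -> bool:
    """A minimal rule: only join columns that measure the same kind of thing."""
    return COLUMN_TYPES[left] == COLUMN_TYPES[right]


print(matches, join_makes_sense("emps.shoe", "houses.rooms"))
```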
<p>
  The new SQL99 standard is going to include new
  object-oriented features, such as inherited typing and
  structured contents of cells - arrays and structs. This extends
  the RDB model with things from the OO world. I don't deal with
  that here, in that the RDF model works as a lowest common
  denominator, being able to express either and both.
</p>
<h3>
  Schemas and Schemas
</h3>
<p>
  A difference between XML/RDF schemas (and SGML) on the one
  hand and database schemas on the other is the expectation
  that there will be a relatively small number of XML/RDF
  schemas. Many web sites will export documents whose structure
  is defined by the same schema, and this is in fact what
  provides the interoperability.
</p>
<p>
  A database schema is, as far as I know, created
  independently for each database. Even if a million companies
  clone the same form of employee database, there will be a
  million schemas, one for each database.
</p>
<p>
  It may be that RDF will fill a simple role in
  expressing the equivalence of the terms in each database
  schema.
</p>
<h3>
  Exposing a database on the Web
</h3>
<p>
  In order to be able to access a table, and make extra
  statements about it which will enable its use in more and
  more ways, the essential objects of the table must be
  exported as first class objects on the Web.
</p>
<p>
  When mapping any system onto the Web, the mapping into URI
  space is critical. Here we are doing this common operation
  generically for all relational databases. It would obviously
  be useful for this to be done in a consistent way across
  multiple vendors - an area for possible
  standardization.
</p>
<p>
  Here is a random example I may have gotten wrong, based on
  what I understand of the naming within databases. The database
  itself is defined within a schema which is listed in a
  catalog.
</p>
<table border="1">
  <caption>
    Mapping an RDB into the Web - strawman
  </caption>
  <tbody>
    <tr>
      <td>Catalog</td>
      <td>http://www.acme.com/mycat</td>
      <td></td>
    </tr>
    <tr>
      <td>Schema</td>
      <td>http://www.acme.com/mycat/schema1</td>
      <td></td>
    </tr>
    <tr>
      <td>Database</td>
      <td>http://www.acme.com/mycat/schema1/empdb/</td>
      <th>Relative:</th>
    </tr>
    <tr>
      <td>Table</td>
      <td>/mycat/schema1/empdb/emps</td>
      <td>emps</td>
    </tr>
    <tr>
      <td>Column name</td>
      <td>/mycat/schema1/empdb/emps/shoe</td>
      <td>emps/shoe</td>
    </tr>
    <tr>
      <td>View</td>
      <td>/mycat/schema1/empdb/emps2</td>
      <td>emps2</td>
    </tr>
    <tr>
      <td>Row</td>
      <td>/mycat/schema1/empdb/emps/rowid=123</td>
      <td>emps/rowid=123</td>
    </tr>
    <tr>
      <td>Cell</td>
      <td>/mycat/schema1/empdb/emps/rowid=123;col=shoe</td>
      <td>emps/rowid=123;col=shoe</td>
    </tr>
    <tr>
      <td>Arbitrary query</td>
      <td>/mycat/schema1/empdb/?select+empno+from<em>[...]</em></td>
      <td>?select<em>[...]</em></td>
    </tr>
  </tbody>
</table>
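<p>
  As a rough sketch of the strawman mapping, the helpers below
  build the relative forms from the right-hand column and resolve
  them against the database URI with the standard
  <code>urllib.parse.urljoin</code>. The helper names are
  hypothetical; only the URI shapes come from the table. The
  query row is left out because its syntax is elided above.
</p>

```python
from urllib.parse import urljoin

# Database URI from the strawman table; the trailing slash matters for resolution
DATABASE = "http://www.acme.com/mycat/schema1/empdb/"


def table_uri(table: str) -> str:
    return urljoin(DATABASE, table)


def column_uri(table: str, column: str) -> str:
    return urljoin(DATABASE, f"{table}/{column}")


def row_uri(table: str, rowid: int) -> str:
    return urljoin(DATABASE, f"{table}/rowid={rowid}")


def cell_uri(table: str, rowid: int, column: str) -> str:
    return urljoin(DATABASE, f"{table}/rowid={rowid};col={column}")


print(table_uri("emps"))
print(column_uri("emps", "shoe"))
print(row_uri("emps", 123))
print(cell_uri("emps", 123, "shoe"))
```

<p>
  Because every name is relative to the database, a document
  served from the database URI can refer to tables, rows and
  cells with the short forms alone.
</p>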
<p>
  2002 version, see <a href=
  "http://www.w3.org/2000/10/swap/dbork/dbview.py">real
  code</a> implemented by Dan Connolly:
</p>
<table border="1">
  <tbody>
    <tr>
      <th>
        <a name="table" id="table">What</a>
      </th>
      <th>
        Uriref relative to http://www.acme.com/wherever/
      </th>
      <th>
        rdf:type
      </th>
    </tr>
    <tr class="work">
      <td>
        <p>
          Database description of database "personnel"
        </p>
      </td>
      <td>
        personnel
        <p>
          (say - whatever)
        </p>
      </td>
      <td>
        soc:Work, rdfdocument, db:DatabaseDescription
      </td>
    </tr>
    <tr>
      <td>
        The conceptual database (a table of tables??)
      </td>
      <td>
        personnel#_database
        <p>
          (Arbitrary, must not clash, linked by
          <code><strong>db:describes</strong></code> from
          personnel)
        </p>
      </td>
      <td></td>
    </tr>
    <tr class="work">
      <td>
        A document giving all the data in the database. May
        support PUT?
      </td>
      <td>
        personnel/_data
        <p>
          (Arbitrary, must not clash with table names, linked
          by <strong><code>db:allData</code></strong> from
          personnel)
        </p>
      </td>
      <td>
        soc:Work, rdfdocument
      </td>
    </tr>
    <tr>
      <td>
        The concept of the table "employees": The class of
        exactly those things which are in the table.
      </td>
      <td>
        <p>
          personnel/employees#.table
        </p>
        <p>
          (was: personnel#employees, but changed to allow it to
          be deref'd to give useful data)
        </p>
        <p>
          (defined in personnel)
        </p>
      </td>
      <td>
        rdfs:Class, db:Table
      </td>
    </tr>
    <tr class="work">
      <td>
        A description of the table. Optimization: includes the
        current size of the table. Identifies primary key if
        any.
      </td>
      <td>
        personnel/employees
        <p>
          (<strong>Convention</strong>. The bit of the
          classname before the #)
        </p>
      </td>
      <td>
        soc:Work, rdfdocument, db:TableDescription
      </td>
    </tr>
    <tr class="work">
      <td>
        A description of all the tables. Just an (optional)
        optimization.
      </td>
      <td>
        personnel/_all
        <p>
          (Arbitrary, must not clash, linked by
          <code><strong>db:tableSchemas</strong></code> from
          personnel/employees)
        </p>
      </td>
      <td>
        soc:Work, rdfdocument, db:TableDescription
      </td>
    </tr>
    <tr>
      <td>
        The concept of a column in the table, the Property
        something has iff that is recorded in the table.
      </td>
      <td>
        personnel/employees#email
        <p>
          (Defined in personnel/employees)
        </p>
      </td>
      <td>
        rdf:Property, db:Column
      </td>
    </tr>
    <tr class="work">
      <td>
        A document giving all the data in the table. May
        support PUT
      </td>
      <td>
        personnel/employees/_data
        <p>
          (Arbitrary, must not clash, linked by
          <strong><code>db:tableData</code></strong> from
          personnel/employees)
        </p>
      </td>
      <td>
        soc:Work, rdfdocument
      </td>
    </tr>
    <tr class="work">
      <td>
        A document giving the data in the row for which the
        primary key is 1234. (Iff primary key exists). May
        support PUT
      </td>
      <td>
        personnel/employees/1234
        <p>
          (<strong>Convention.</strong> Note the primary key
          value must be encoded suitably!)
        </p>
      </td>
      <td>
        soc:Work, rdfdocument
      </td>
    </tr>
    <tr>
      <td>
        The concept of the thing described by that row.
      </td>
      <td>
        <p>
          personnel/employees/1234#item
        </p>
        <p>
          (<strong>Convention</strong>)
        </p>
        <p>
          (when primary key exists, then employees#_data etc
          use this URIref for the item 1234 instead of making
          anonymous nodes)
        </p>
        <p>
          (employees/_data#1234?@@)
        </p>
      </td>
      <td>
        personnel/employees#_Class
      </td>
    </tr>
    <tr class="work">
      <td>
        A document giving the information in just one cell
      </td>
      <td>
        personnel/employees/1234/email
        <p>
          (<strong>Convention</strong>)
        </p>
      </td>
      <td>
        [ is rdf:domain of personnel/employees#email ]
      </td>
    </tr>
    <tr class="work">
      <td>
        Arbitrary query
      </td>
      <td>
        personnel/_sql?select+empno+from<em>[...]</em>
        <p>
          (arbitrary, linked by
          <code><strong>db:sqlService</strong></code> from
          personnel if supported.)
        </p>
      </td>
      <td>
        soc:Work, rdfdocument
      </td>
    </tr>
    <tr class="work">
      <td>
        Arbitrary HTML form field match (select * from employees
        where email like "*fred*") [@details]
      </td>
      <td>
        personnel/_fquery?email=*fred*;name=Joe
        <p>
          (arbitrary, linked by
          <code><strong>db:formService</strong></code> from
          personnel if supported)
        </p>
      </td>
      <td>
        soc:Work, rdfdocument
      </td>
    </tr>
    <tr>
      <td>
        POST point for RDF data, either new data, or assertions
        that some (n3) Formula is a log:Falsehood.
      </td>
      <td>
        <p>
          personnel/_postme
        </p>
        <p>
          (arbitrary, linked by
          <code><strong>db:deltaService</strong></code> from
          personnel if supported. Could be same URI
          <code>personnel</code> in fact, as we are dealing
          with a different method)
        </p>
      </td>
      <td>
        db:postable
      </td>
    </tr>
  </tbody>
</table>
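<p>
  The conventions in the table above can be written down as a
  handful of URI-building helpers. This is a sketch, not the code
  linked above: the function names are hypothetical, and
  percent-encoding with <code>urllib.parse.quote</code> is just
  one assumed reading of "encoded suitably" for primary key
  values.
</p>

```python
from urllib.parse import quote

# Base used in the table heading above
BASE = "http://www.acme.com/wherever/"


def table_description(db: str, table: str) -> str:
    """Document describing the table, e.g. personnel/employees."""
    return f"{BASE}{db}/{table}"


def table_class(db: str, table: str) -> str:
    """The class of exactly those things which are in the table."""
    return f"{table_description(db, table)}#.table"


def column_property(db: str, table: str, column: str) -> str:
    """The property corresponding to a column."""
    return f"{table_description(db, table)}#{column}"


def row_document(db: str, table: str, key: object) -> str:
    """Document for one row; the key is percent-encoded (an assumption)."""
    return f"{table_description(db, table)}/{quote(str(key), safe='')}"


def row_item(db: str, table: str, key: object) -> str:
    """The thing described by the row, as opposed to the document about it."""
    return f"{row_document(db, table, key)}#item"


def cell_document(db: str, table: str, key: object, column: str) -> str:
    """Document giving the information in just one cell."""
    return f"{row_document(db, table, key)}/{column}"


print(row_item("personnel", "employees", 1234))
print(row_document("personnel", "employees", "a/b #1"))
```

<p>
  Encoding the key keeps characters such as "/" and "#" in a key
  value from being mistaken for path or fragment delimiters.
</p>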
<p>
  @@@ How to use typing to indicate that the URI in the table
  is a (relative?) URI to another object, not a string?
</p>
<p>
  @@@ This works fine when implemented live on a database.
  However, it is a little tricky to emulate in a typical
  file-based web server because of the use of "personnel" in
  this case both as directory and as
</p>
<p>
  One of the things which makes life easier is to make the
  mapping so that the relative URI syntax can be used to
  advantage. For example, here, everything within the database
  (the scope of an SQL statement) can be written as a short
  URI.
</p>
<p>
  There is a question as to how much of the SQL query syntax
  should be turned into an identifier. For example, is a query on
  a primary key really an identifier? Is the extraction of a
  single cell really an identifier? It would be useful to be
  able to treat them as such. However, it would be wiser to use
  the "?" convention to indicate a generalized SQL idempotent
  query. (A URL should <a href="Axioms.html#get">of course</a>
  <em>never</em> be used to refer to the results of a
  table-changing operation such as UPDATE or DELETE. In this
  case, if HTTP were used, an SQL query should IMHO be POSTed
  to the database URI. Of course, you can use your favorite
  networked database access protocol.)
</p>
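<p>
  A minimal sketch of that rule follows, assuming the
  <code>personnel/_sql</code> service and the
  http://www.acme.com/wherever/ base from the table above. The
  function name <code>request_for</code> is hypothetical, and
  looking only at the first keyword is a deliberate
  simplification: a real system would need a better test of
  whether a statement changes anything.
</p>

```python
from urllib.parse import quote_plus

# Assumed locations, following the table of conventions above
DATABASE_URI = "http://www.acme.com/wherever/personnel"
SQL_SERVICE = DATABASE_URI + "/_sql"


def request_for(sql: str) -> tuple:
    """Return (method, uri, body) for an SQL statement."""
    words = sql.split()
    verb = words[0].upper() if words else ""
    if verb == "SELECT":
        # A read-only query can be named by a URI using the "?" convention
        return ("GET", f"{SQL_SERVICE}?{quote_plus(sql)}", None)
    # Anything else may change tables, so it is POSTed to the database URI
    return ("POST", DATABASE_URI, sql)


print(request_for("select empno from emps"))
print(request_for("DELETE FROM emps WHERE empno = 1234"))
```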
<p>
  In the above, the column name of the table could be referred to
  using the table as a namespace, a row for example being
</p>
<pre>
&lt;foo
    xmlns:t="http://www.example.com/mycat/personnel/employees"&gt;
  &lt;t:email&gt;joe@example.com&lt;/t:email&gt;
  &lt;t:age&gt;45&lt;/t:age&gt;
&lt;/foo&gt;
</pre>
<p>
  and one row of the result of joining this table (of
  people) and another table (about people) by their primary
  keys would use namespaces from both tables:
</p>
<pre>
&lt;foo
    xmlns:t="http://www.example.com/mycat/personnel/employees"
    xmlns:u="http://www.acme.com/mycat/schema1/empdb/likes"&gt;
  &lt;t:email&gt;joe@example.com&lt;/t:email&gt;
  &lt;t:age&gt;45&lt;/t:age&gt;
  &lt;u:music&gt;blues&lt;/u:music&gt;
&lt;/foo&gt;
</pre>
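<p>
  For what it is worth, a joined row of this shape can be
  produced with Python's standard
  <code>xml.etree.ElementTree</code>. The function
  <code>joined_row</code> is a hypothetical helper; the two
  namespace URIs and the cell values are the ones used in the
  examples above.
</p>

```python
import xml.etree.ElementTree as ET

# Namespace URIs taken from the two examples above
T = "http://www.example.com/mycat/personnel/employees"
U = "http://www.acme.com/mycat/schema1/empdb/likes"

# Ask ElementTree to use the same prefixes as the examples when serializing
ET.register_namespace("t", T)
ET.register_namespace("u", U)


def joined_row(left: dict, right: dict) -> ET.Element:
    """Build one row whose cells keep the namespace of the table they came from."""
    row = ET.Element("foo")
    for namespace, cells in ((T, left), (U, right)):
        for column, value in cells.items():
            cell = ET.SubElement(row, f"{{{namespace}}}{column}")
            cell.text = str(value)
    return row


row = joined_row({"email": "joe@example.com", "age": 45}, {"music": "blues"})
xml_text = ET.tostring(row, encoding="unicode")
print(xml_text)
```

<p>
  Since each column name is qualified by its table's namespace,
  columns from the two tables cannot collide even if they happen
  to share a local name.
</p>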
<hr />
<h2>
  Later related work:
</h2><a href=
"http://www.cs.man.ac.uk/~ocorcho/documents/SWDB2004_BarrasaEtAl.pdf">R2O,
an Extensible and Semantically Based Database-to-Ontology
Mapping Language.</a> Barrasa J, Corcho O,
Gómez-Pérez A. Second Workshop on
Semantic Web and Databases (SWDB2004). Toronto, Canada. August
2004.
<hr />
<p>
  <em>This has been elaborated with help of an RDB tutorial and
  discussion from Andrew Eisenberg/Sybase</em>.
</p>
<hr />
<p>
  See also: <a href="RDF-XML.html">Why RDF is more than XML</a>
</p>
<p>
  <a href="Overview.html">Up to Design Issues</a>; back to
  <a href="Architecture.html">Architecture from 50,000ft</a>
</p>
<p>
  timbl
</p>
</body>
</html>