<html>
<head><title>W3C QL98 Query Position Paper: RDF - Enabling Inferencing</title></head>

<body bgcolor="#ffffff">

<h1>Enabling Inferencing</h1>

Authors:<BR>
R.V. Guha (Netscape) <guha@netscape.com><BR>
Ora Lassila (Nokia) <ora.lassila@research.nokia.com><BR>
Eric Miller (OCLC) <emiller@oclc.org><BR>
Dan Brickley (ILRT, University of Bristol)
<daniel.brickley@bristol.ac.uk><BR>
<BR>
Date: 18 Nov 1998<BR>

<P>
Status of this document:<BR>
This is a position paper for the W3C Query Languages meeting in
Boston, December 3-4, 1998.</P>
<h3>Abstract</h3>

<BLOCKQUOTE>
The World Wide Web today is a network of hyperlinked resources. The
content of these resources is for the most part opaque to
computers. Browsers display them and search
engines locate occurrences of words within them, but the level
of "machine understanding" of the content, if any, is
very limited. A search engine, for example, might know that a resource
contained the textual string "<code>lion</code>" but not that it was a
representation of a <em>lion</em>, where lions are known to be members of
the class of <em>mammals</em>. By enabling richer representations such as
this, RDF makes it possible to express queries that go beyond simple
text matching.
<BR><BR>
This paper presents an overview of the
query services that might be built on top of XML/RDF data. It does not
present a specific proposal for an RDF query language; instead, it
argues for a query language that is expressed in terms of the RDF
logical data model rather than in terms of one particular concrete syntax.
</BLOCKQUOTE>
<h4>Evolving the Web from a Document Repository to a Knowledge Base</h4>

With the advent of RDF and XML, we now have an opportunity
to encode or annotate web content in a more machine-understandable
way. This will help the web evolve from
a set of opaque pages to a rich knowledge base.
This in turn will enable many interesting new applications,
one of the most important of which will be the
precise search and retrieval of content.
<P>

<h5>Example</h5>

<blockquote>
The content of images is typically opaque to computers.
Searching for images that contain particular kinds of scenes
or items is usually done by searching for words that might occur
on a page that refers to the image. This method is highly
inaccurate. If the image were associated with a piece of RDF
that clearly specified its content, significantly more precise
retrieval would be possible. For example, a photo of a lion could
be annotated as depicting a lion. The following piece of RDF
does this.
</blockquote>
<center><table cellpadding="5" border="1" bgcolor="#80ffff" width="95%">
<tr><td><pre>

<!-- somewhere on the web... some RDF statements about a picture -->

<rdf:RDF xmlns:cx="http://www.wwc.org/cat.rdf#"
         xmlns:P="http://www.images.org/image-desc-schema.rdf#"
         xmlns:vocab="http://vocab.org/useful#"
         xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#">

  <P:Photograph rdf:about="http://www.imagelib.com/lion1.jpg">
    <P:depicts>
      <cx:Lion>
        <vocab:color rdf:resource="http://vocab.org/useful#tan"/>
        <vocab:gender rdf:resource="http://vocab.org/useful#female"/>
      </cx:Lion>
    </P:depicts>
    <P:depicts rdf:resource="http://registries.org/people/Fred"/>
  </P:Photograph>

</rdf:RDF>

<!-- a picture of a tan-coloured female lion and a
     person identified only by URI -->

<!-- somewhere else on the web... -->

<rdf:RDF xmlns:vocab="http://vocab.org/useful#"
         xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#">

  <vocab:Person rdf:about="http://registries.org/people/Fred">
    <vocab:gender rdf:resource="http://vocab.org/useful#male"/>
    <vocab:name>Fred</vocab:name>
  </vocab:Person>

</rdf:RDF>

</pre>
See the <A HREF="#footnote">footnote</A> for an explanation of the syntax.
</table></center>
<P>
With this information, a search engine could perform a very
precise search for pictures of lions. Searching for the word
"lion", on the other hand, retrieves 784,150 pages in AltaVista,
most of which are references to the Lions Club, The Lion King, etc.
</P>

<P>
While this kind of simple matching-based retrieval
would be useful, it does not come close to exploiting
the full potential of having machine-understandable content (MUC).
</P>

<P>
To exploit this potential fully, we need to be able to
build inferential services on top of this MUC. Such a service
would combine the "raw" MUC with a set of axioms or rules,
enabling machines to infer knowledge that is implicit
in the MUC.
</P>
<h5>Example: inference-based image retrieval</h5>

<P>Imagine an appropriately captioned photograph of
a child's birthday party. Now consider someone searching
for an image with decorations. Children's birthday
parties typically have decorations. However, since the caption does
not explicitly state that the scene contains decorations,
a simple matching-based algorithm will not find this image.
<BR>
An inferential service could draw on a rich set of rules
about the world (including events like birthday parties) to
infer that the photograph probably includes some decorations,
thereby improving retrieval.
</P>
<h5>Example: class hierarchies</h5>

<P>The RDF Schema specification language provides facilities for
specifying machine-readable vocabularies using a hierarchical type
system. This allows a resource to be described as a member of some
specific class
(e.g., 'Snow Leopard') and have its membership in more general classes (e.g.,
'Big Cat', 'Mammal') implied by the RDF type system. This makes it
feasible to express searches for resources using general categories and
have the results include resources whose membership in those broad
categories is inferred from their membership in some more detailed
sub-category.</P>
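<P>The class-hierarchy inference described above can be sketched in a few lines of Python. This is an illustrative sketch, not code from any RDF toolkit: the class names and the <code>SUBCLASS_OF</code> table are hypothetical, and classes are plain strings rather than URIs. A resource's inferred types are its asserted class plus every class reachable by following subclass links.</P>

```python
# Minimal sketch of RDF Schema-style class-hierarchy inference.
# Hypothetical vocabulary: plain-string class names, not real schema URIs.
SUBCLASS_OF: dict[str, set[str]] = {
    "SnowLeopard": {"BigCat"},
    "BigCat": {"Mammal"},
    "Mammal": {"Animal"},
}


def superclasses(cls: str) -> set[str]:
    """Return every class that `cls` is (transitively) a subclass of."""
    seen: set[str] = set()
    stack = [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), set()):
            if parent not in seen:  # `seen` also guards against cycles
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_types(asserted: str) -> set[str]:
    """The asserted class plus every class implied by the hierarchy."""
    return {asserted} | superclasses(asserted)


def search(resources: dict[str, str], category: str) -> list[str]:
    """Find resources whose asserted or inferred types include `category`."""
    return sorted(r for r, cls in resources.items() if category in inferred_types(cls))


if __name__ == "__main__":
    photos = {"photo1.jpg": "SnowLeopard", "photo2.jpg": "BigCat", "photo3.jpg": "Oak"}
    print(search(photos, "Mammal"))  # ['photo1.jpg', 'photo2.jpg']
```

<P>A search for 'Mammal' thus returns the snow-leopard photo even though no statement about it mentions 'Mammal'.</P>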
<h3>Logical vs Physical Models</h3>

Inferencing engines typically work by applying a set of "rules"
(statements such as "if a and b, then infer c") to a set of
ground atomic facts. (This is a gross simplification of inferencing, but it
suffices for the purposes of this paper.) The application of
rules to derive new conclusions can occur either when
a query is posed or when new facts are obtained. The set of rules
and facts can be centralized or distributed.
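<P>The two strategies mentioned above, applying rules when new facts arrive or when a query is posed, correspond to forward and backward chaining. The sketch below is a deliberately propositional toy in Python: each fact is an opaque string and each rule has the form "if a and b, then infer c". The fact and rule names, loosely modelled on the birthday-party example, are hypothetical.</P>

```python
# Propositional toy: facts are strings; a rule is (premises, conclusion).
Rule = tuple[frozenset[str], str]

# Hypothetical facts and rules, loosely modelled on the birthday-party example.
FACTS = {"birthday_party", "children_present"}
RULES: list[Rule] = [
    (frozenset({"birthday_party", "children_present"}), "childrens_birthday_party"),
    (frozenset({"childrens_birthday_party"}), "decorations_likely"),
]


def forward_chain(facts: set[str], rules: list[Rule]) -> set[str]:
    """Eager strategy: derive every consequence up front, until a fixpoint."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived


def backward_chain(goal: str, facts: set[str], rules: list[Rule],
                   pending: frozenset[str] = frozenset()) -> bool:
    """Lazy strategy: work backwards from a query goal to known facts."""
    if goal in facts:
        return True
    if goal in pending:  # already trying to prove this goal: avoid looping
        return False
    return any(
        all(backward_chain(p, facts, rules, pending | {goal}) for p in premises)
        for premises, conclusion in rules
        if conclusion == goal
    )


if __name__ == "__main__":
    print(sorted(forward_chain(FACTS, RULES)))
    print(backward_chain("decorations_likely", FACTS, RULES))  # True
    print(backward_chain("balloons", FACTS, RULES))            # False
```

<P>Forward chaining pays its cost when facts arrive and then answers queries by lookup; backward chaining does no work until asked, which matters when the full set of consequences is large or cannot be materialized at all.</P>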
<P>
Inferencing engines always work on a "logical model".
The logical model is a (typically set-theoretic) abstraction.

<P>
The logical model is by definition an abstract entity.
Logical models are typically grounded in
one or more concrete syntaxes (also known as physical models).
<P>
W3C logical models are based on RDF, and syntax models are based on XML.
<P>
The distinction between the logical model and the syntax model
has evolved over decades of work in mathematics and computer science and
is found wherever the representation of information is involved.
<P>
<UL>
<LI> <B>Analogy:</B>
In relational databases, we have <em>relational algebra</em>,
which provides the logical model, and syntaxes such as tab-delimited
tables, which provide the physical model. The former is important
for defining query languages (such as SQL) and the latter is important
for transferring content across the wire.
</UL>
<P>
Any particular concrete manipulation is always performed on a physical model.
Therefore, it is often tempting either to confuse the
two or to try to make do with just the physical model. However,
there are several reasons why complex applications such as
inferencing engines are based on the logical model.

<OL>

<LI>There is a one-to-many mapping from a logical model to the concrete
representations of that logical model. A concrete syntax needs
to make certain commitments that a logical model need not.
The logical model hides details of the physical model
that don't carry any semantics. This in turn makes it easier
to build applications such as inferencing engines and higher-level
query languages.
<P>
Examples:
<UL>
<LI> In relational databases, the order of rows in a table carries
no semantics. However, in a tab-delimited file the rows have
to appear in <em>some</em> order. By operating at the relational
level (as opposed to the file format level), SQL hides an aspect
of the data that does not carry any semantics.

<LI> In predicate logic, "A or B" is semantically equivalent to "B or A",
even though the two physical strings are different.
A logical model for predicate logic (such as Tarskian semantics)
hides this difference.

</UL>

<P>
<LI> Logical models provide support for certain operations that may
be difficult or impossible to define properly at the physical level.
<P>
For example, if the logical model of a knowledge base is a directed
labelled graph (as with RDF), the aggregation of multiple knowledge
bases can be defined cleanly as graph superposition at the logical
level, even though it would be hard to define the concept of "aggregating"
two XML files.

<P>
<LI> Most interesting inferencing systems have infinite deductive
closures. This means that the deductive closure (the set of
all the statements implied) can never have a complete concrete representation.
In such cases, a query language for determining whether a
proposition is in the deductive closure
necessarily needs to be expressed in terms of the logical model.
</OL>
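<P>The first two points can be illustrated with a toy Python model in which an RDF graph is a set of (subject, predicate, object) triples. The triples and abbreviated names below are hypothetical, hand-transcribed stand-ins for parsed RDF/XML. Two serializations that list statements in different orders differ at the physical level but denote the same set, and aggregation is simply set union. The sketch assumes every node has a URI; blank nodes (such as the anonymous lion in the earlier example) would have to be renamed apart before taking the union.</P>

```python
Triple = tuple[str, str, str]  # (subject, predicate, object)

# Two hypothetical serializations of the same statements, in different orders.
SERIALIZATION_A: list[Triple] = [
    ("lion1.jpg", "rdf:type", "P:Photograph"),
    ("lion1.jpg", "P:depicts", "people:Fred"),
]
SERIALIZATION_B: list[Triple] = [
    ("lion1.jpg", "P:depicts", "people:Fred"),
    ("lion1.jpg", "rdf:type", "P:Photograph"),
]

# A second, independently published source about Fred.
FRED_SOURCE: list[Triple] = [
    ("people:Fred", "vocab:gender", "vocab:male"),
    ("lion1.jpg", "P:depicts", "people:Fred"),  # repeats a statement above
]


def to_graph(serialization: list[Triple]) -> frozenset[Triple]:
    """Logical view: order and duplicates in the serialization are irrelevant."""
    return frozenset(serialization)


def aggregate(*graphs: frozenset[Triple]) -> frozenset[Triple]:
    """Graph superposition: the union of the graphs' triple sets."""
    merged: set[Triple] = set()
    for graph in graphs:
        merged |= graph
    return frozenset(merged)


if __name__ == "__main__":
    print(SERIALIZATION_A == SERIALIZATION_B)                      # False
    print(to_graph(SERIALIZATION_A) == to_graph(SERIALIZATION_B))  # True
    kb = aggregate(to_graph(SERIALIZATION_A), to_graph(FRED_SOURCE))
    print(len(kb))  # 3: the repeated statement appears only once
```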
<P>
<B>
Given the importance of the logical model, it is clear that we need
query languages not just for XML but also for RDF.
</B>
<BR>
<BR>
<h3>Query languages for RDF</h3>

<P>
This position paper suggests a general outline for an RDF querying
system.
RDF's simple yet powerful data model allows for an equally simple
yet powerful query language. The query language is based on a single
query mechanism: <em>subgraph matching</em>.
</P>

<P>
Every query is posed against an RDF knowledge base (KB), which in turn could be
an aggregation
of two or more RDF knowledge bases. Every RDF/XML block (i.e., the RDF
within a <RDF> ...</RDF> element) can be thought of as a serialised
RDF knowledge base.
</P>
<P>
The query is itself simply an RDF model (i.e., a directed labelled
graph), some of whose resources and properties may represent
<em>variables</em>.
Every query has two outputs:

<OL>
<LI> A subgraph (of the KB against which the query is issued) that
matches the query.

<LI> A table of sets of legal bindings for the variables, i.e., bindings
that, when applied to the variables in the query, yield (1).
</OL>
</P>
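<P>A minimal sketch of subgraph matching in Python follows. It assumes a toy representation rather than any existing RDF library: the KB is a set of (subject, predicate, object) triples, a query is a list of triple patterns, and any term beginning with '$' is a variable. A backtracking search binds each pattern to some KB triple in turn, and <code>run_query</code> returns both outputs listed above: the matched subgraph and the table of bindings. The resource names are hypothetical abbreviations of those in the lion example, and '_:lion' is this sketch's label for the anonymous lion.</P>

```python
from typing import Iterator, Optional

Triple = tuple[str, str, str]
Bindings = dict[str, str]


def is_var(term: str) -> bool:
    """In this sketch, variables are terms that start with '$', e.g. '$x'."""
    return term.startswith("$")


def unify(pattern: Triple, triple: Triple, bindings: Bindings) -> Optional[Bindings]:
    """Extend `bindings` so that `pattern` matches `triple`, or return None."""
    extended = dict(bindings)
    for term, value in zip(pattern, triple):
        if is_var(term):
            if extended.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None  # constant does not match
    return extended


def solve(query: list[Triple], kb: set[Triple], bindings: Bindings) -> Iterator[Bindings]:
    """Backtracking search over the query patterns, left to right."""
    if not query:
        yield bindings
        return
    for triple in kb:
        extended = unify(query[0], triple, bindings)
        if extended is not None:
            yield from solve(query[1:], kb, extended)


def run_query(query: list[Triple], kb: set[Triple]) -> tuple[set[Triple], list[Bindings]]:
    """Return the matched subgraph and the table of variable bindings."""
    table = list(solve(query, kb, {}))
    subgraph = {
        (b.get(s, s), b.get(p, p), b.get(o, o))  # apply bindings to each pattern
        for b in table
        for s, p, o in query
    }
    return subgraph, table


if __name__ == "__main__":
    kb = {
        ("lion1.jpg", "rdf:type", "P:Photograph"),
        ("lion1.jpg", "P:depicts", "_:lion"),
        ("_:lion", "rdf:type", "cx:Lion"),
        ("lion1.jpg", "P:depicts", "Fred"),
        ("Fred", "vocab:gender", "vocab:male"),
    }
    query = [("$x", "P:depicts", "$y"), ("$y", "vocab:gender", "vocab:male")]
    subgraph, table = run_query(query, kb)
    print(table)  # [{'$x': 'lion1.jpg', '$y': 'Fred'}]

    # The matched subgraph is itself a KB, so it can be queried again.
    _, nested = run_query([("$who", "vocab:gender", "$g")], subgraph)
    print(nested)  # [{'$who': 'Fred', '$g': 'vocab:male'}]
```

<P>Because set iteration order is arbitrary, rows of the bindings table may come back in any order. A practical engine would also index triples (for example by predicate) instead of scanning the whole KB for every pattern.</P>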
<P>
Here are a couple of salient points about the query language outlined above.
</P>

<UL>
<LI> It can be used for a wide range of queries, from simple graph traversal to
complex Datalog-like queries. For the sake of efficiency, a concrete query
language might add a number of "utility" functions for simple graph
traversal.
<LI> One of the results of a query is itself an RDF knowledge base. This means
that it is possible to issue a query against the result of another query. In this
sense, this query language is similar to relational query languages. This feature
will make it possible to construct recursive queries.
</UL>
<P>
RDF Schema constructs such as <em>subClassOf</em> and <em>subPropertyOf</em>
allow some simple inferences. In the future, more complex rules will be
expressible and more powerful inference engines will become possible.
Ideally, the query language used by an inferencing system to access
the knowledge base should be the same as the query language the inferencing
system responds to.
</P>

<P>
To enable this, a query can take an additional parameter that specifies
whether its answer should be based on the "raw" RDF graph or on
the deductive closure of the knowledge base.
</P>
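<P>The raw-versus-closure parameter can be sketched in Python over a toy set of triples. Only two RDF Schema rules are implemented (rdf:type propagates up rdfs:subClassOf, and rdfs:subClassOf is transitive), queries are ground (variable-free) for brevity, and the flag name <code>use_closure</code> and the class names are hypothetical. Because this tiny rule set has a finite closure, the sketch can simply compute it; richer rule sets may not allow that.</P>

```python
Triple = tuple[str, str, str]

SUBCLASS, TYPE = "rdfs:subClassOf", "rdf:type"


def rdfs_closure(kb: set[Triple]) -> set[Triple]:
    """Apply two RDF Schema rules until no new triples appear:
    (c1 subClassOf c2) and (x type c1)         => (x type c2)
    (c1 subClassOf c2) and (c2 subClassOf c3)  => (c1 subClassOf c3)
    """
    closure = set(kb)
    while True:
        sub = {(c1, c2) for c1, p, c2 in closure if p == SUBCLASS}
        new = {(x, TYPE, c2) for x, p, c1 in closure if p == TYPE
               for a, c2 in sub if a == c1}
        new |= {(c1, SUBCLASS, c3) for c1, c2 in sub for b, c3 in sub if b == c2}
        if new <= closure:
            return closure  # fixpoint: nothing new was derived
        closure |= new


def ask(kb: set[Triple], statement: Triple, use_closure: bool = False) -> bool:
    """Answer against the raw graph, or against its deductive closure."""
    return statement in (rdfs_closure(kb) if use_closure else kb)


if __name__ == "__main__":
    kb = {
        ("cx:Lion", SUBCLASS, "cx:BigCat"),
        ("cx:BigCat", SUBCLASS, "cx:Mammal"),
        ("_:lion", TYPE, "cx:Lion"),
    }
    q = ("_:lion", TYPE, "cx:Mammal")
    print(ask(kb, q))                    # False: not stated explicitly
    print(ask(kb, q, use_closure=True))  # True: inferred via subClassOf
```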
<h4>Examples</h4>

<P><B>Note:</B> These queries could easily be represented in RDF/XML
syntax. For the purposes of this paper we use a simple syntax
in which '$x' and '$y' represent variables and properties are shown
using namespace prefixes.</P>

<H5>Query 1: A lion</H5>

The query: <B>rdf:type($y, cx:Lion).</B>
<BR><BR>
<EM>
(The example syntax used here means
"Find resources 'y' that have an
http://www.w3.org/TR/WD-rdf-syntax#type
property whose value is the resource
http://www.wwc.org/cat.rdf#Lion".)</EM>
<BR>
<BR>returns:
<center><table cellpadding="5" border="1" bgcolor="#80ffff" width="95%">
<tr><td><pre>

<rdf:RDF xmlns:cx  = "http://www.wwc.org/cat.rdf#"
         xmlns:rdf = "http://www.w3.org/TR/WD-rdf-syntax#">
  <cx:Lion/>
</rdf:RDF>

and

(($y . [anonymous-resource]))
</pre>
</table></center>
<H5>Query 2: A photograph depicting a male</H5>

The query: <B>P:depicts($x, $y) and rdf:type($x, P:Photograph)
and vocab:gender($y, vocab:male)
</B>
<BR><BR>
(meaning: <EM>"Find values for 'x' and 'y' where resource
'x' has an http://www.images.org/image-desc-schema.rdf#depicts property
whose value is 'y', 'x' has an
http://www.w3.org/TR/WD-rdf-syntax#type property with value
http://www.images.org/image-desc-schema.rdf#Photograph, and
'y' has an http://vocab.org/useful#gender property whose value is
http://vocab.org/useful#male"
</EM>)
<BR><BR>
returns:
<center><table cellpadding="5" border="1" bgcolor="#80ffff" width="95%">
<tr><td><pre>

<rdf:RDF
    xmlns:cx    = "http://www.wwc.org/cat.rdf#"
    xmlns:P     = "http://www.images.org/image-desc-schema.rdf#"
    xmlns:vocab = "http://vocab.org/useful#"
    xmlns:rdf   = "http://www.w3.org/TR/WD-rdf-syntax#">

  <P:Photograph rdf:about="http://www.imagelib.com/lion1.jpg">
    <P:depicts>
      <cx:Lion>
        <vocab:color rdf:resource="http://vocab.org/useful#tan"/>
        <vocab:gender rdf:resource="http://vocab.org/useful#female"/>
      </cx:Lion>
    </P:depicts>
    <P:depicts>
      <vocab:Person rdf:about="http://registries.org/people/Fred">
        <vocab:gender rdf:resource="http://vocab.org/useful#male"/>
      </vocab:Person>
    </P:depicts>
  </P:Photograph>
</rdf:RDF>

<!-- note that the sub-graph returned here includes information
     from two sources; statements about the photograph and about
     Fred, taken together, tell us that this is a photograph of
     a male -->
</pre>
and<BR>
(($x . [http://www.imagelib.com/lion1.jpg])($y . [http://registries.org/people/Fred]))

</table></center>
<BR>
Similarly, the query "<B>P:depicts($x, $y) and rdf:type($y, cx:Lion) and
vocab:gender($y, vocab:female)</B>" would retrieve only illustrations of
female <em>lions</em>.
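<P>Query 2 only succeeds because the two RDF sources are aggregated: the photograph's description says whom it depicts, and only the second source says that Fred is male. The Python sketch below checks this under stated assumptions: the triples are hand-transcribed from the example's RDF/XML (namespace prefixes kept as abbreviations, '_:lion' standing for the anonymous lion), and the small matcher is a toy in which terms beginning with '$' are variables. It runs Query 2 against each source alone and against their union, then runs the female-lion query.</P>

```python
from typing import Iterator, Optional

Triple = tuple[str, str, str]
PHOTO = "http://www.imagelib.com/lion1.jpg"
FRED = "http://registries.org/people/Fred"

# First RDF block, transcribed by hand ('_:lion' names the anonymous lion).
PHOTO_SOURCE: set[Triple] = {
    (PHOTO, "rdf:type", "P:Photograph"),
    (PHOTO, "P:depicts", "_:lion"),
    ("_:lion", "rdf:type", "cx:Lion"),
    ("_:lion", "vocab:color", "vocab:tan"),
    ("_:lion", "vocab:gender", "vocab:female"),
    (PHOTO, "P:depicts", FRED),
}
# Second RDF block, published "somewhere else on the web".
FRED_SOURCE: set[Triple] = {
    (FRED, "rdf:type", "vocab:Person"),
    (FRED, "vocab:gender", "vocab:male"),
    (FRED, "vocab:name", "Fred"),
}

QUERY_2 = [("$x", "P:depicts", "$y"), ("$x", "rdf:type", "P:Photograph"),
           ("$y", "vocab:gender", "vocab:male")]
FEMALE_LIONS = [("$x", "P:depicts", "$y"), ("$y", "rdf:type", "cx:Lion"),
                ("$y", "vocab:gender", "vocab:female")]


def unify(pattern: Triple, triple: Triple, b: dict[str, str]) -> Optional[dict[str, str]]:
    """Extend bindings `b` so that `pattern` matches `triple`, or return None."""
    b = dict(b)
    for term, value in zip(pattern, triple):
        if term.startswith("$"):
            if b.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return b


def solve(query: list[Triple], kb: set[Triple], b: dict[str, str]) -> Iterator[dict[str, str]]:
    """Yield every set of bindings under which all query patterns are in `kb`."""
    if not query:
        yield b
        return
    for triple in kb:
        extended = unify(query[0], triple, b)
        if extended is not None:
            yield from solve(query[1:], kb, extended)


if __name__ == "__main__":
    print(list(solve(QUERY_2, PHOTO_SOURCE, {})))  # []
    print(list(solve(QUERY_2, FRED_SOURCE, {})))   # []
    merged = PHOTO_SOURCE | FRED_SOURCE
    print(list(solve(QUERY_2, merged, {})))        # one row: $x=PHOTO, $y=FRED
    print(list(solve(FEMALE_LIONS, merged, {})))   # one row: $x=PHOTO, $y='_:lion'
```

<P>Neither source alone yields a row; only the aggregated KB reproduces the bindings shown for Query 2 ($x bound to the photograph, $y bound to Fred).</P>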
<BR><BR>

<h3>Conclusion</h3>
With the advent of simple and powerful data models such as RDF and formal,
flexible syntaxes such as XML, we now have an opportunity to encode or
annotate web content in a more machine-understandable way. These
standards
make it possible to layer inferencing services that will facilitate
the evolution of the web from a set of opaque pages to a rich knowledge base.
As such, the web of today, a vast unstructured mass of information, may
in the future be transformed into something more manageable, and thus
something far more useful.

<P>
<BR><BR>
</P>
<HR NOSHADE />

<h3>Notes</h3>

<A NAME="footnote"></A>

<P>The following is a human-readable interpretation of the RDF used in the
example.</P>

<P>The first block of RDF uses four vocabularies to state that there is
a resource (http://www.imagelib.com/lion1.jpg) that is a member of the class
'Photograph' and that depicts an object that is a member of the class
'Lion', which in turn has a color property with value 'tan' and a gender
property with value 'female'. The photograph also depicts a second object,
identified only by URI (http://registries.org/people/Fred). A second
source of information provides further RDF statements about
[http://registries.org/people/Fred]. In this case, we learn a name
("Fred") and that Fred is male.</P>
<h3>References</h3>

W3C Data Formats; W3C Note, 29 October 1997;
<A HREF="http://www.w3.org/TR/NOTE-rdfarch">http://www.w3.org/TR/NOTE-rdfarch</A>
<BR><BR>
Resource Description Framework (RDF) Schemas;
<A HREF="http://www.w3.org/TR/WD-rdf-schema">http://www.w3.org/TR/WD-rdf-schema</A>
<BR><BR>
Resource Description Framework (RDF) Model and Syntax;
<A HREF="http://www.w3.org/TR/WD-rdf-syntax/">http://www.w3.org/TR/WD-rdf-syntax/</A>
<BR><BR>
Extensible Markup Language (XML) 1.0;
<A HREF="http://www.w3.org/TR/1998/REC-xml-19980210">http://www.w3.org/TR/1998/REC-xml-19980210</A>

</body>

</html>