

								<html>

								<head><title>W3C QL98 Query Position Paper: RDF - Enabling Inferencing</title></head>


								<body bgcolor="#ffffff">


								<h1>Enabling Inferencing</h1>


								Authors:<BR>

								R.V. Guha (Netscape) &lt;guha@netscape.com&gt;<BR>

								Ora Lassila (Nokia) &lt;ora.lassila@research.nokia.com&gt;<BR>

								Eric Miller (OCLC) &lt;emiller@oclc.org&gt;<BR>

								Dan Brickley (ILRT, University of Bristol)

								&lt;daniel.brickley@bristol.ac.uk&gt;<BR>

								<BR>

								Date: 18 Nov 1998<BR>


								<P>

								Status of this document:<BR>

								This is a position paper for the W3C Query Languages meeting in

Boston, December 3-4, 1998.</P>


								<h3>Abstract</h3>


								<BLOCKQUOTE>

The World Wide Web today is a network of hyperlinked resources. The
content of these resources is, for the most part, opaque to
computers. Browsers display them and search
engines locate occurrences of words within them, but the level

								of "machine understanding" of the content, if any, is

								very limited. A search engine, for example, might know that a resource

								contained the textual string "<code>lion</code>" but not that it was a

								representation of a <em>lion</em>, where lions are known to be members of

								the class of <em>mammals</em>. By enabling richer representation such as

								this, RDF makes it possible to express queries that go beyond simple

								text-matching.

								<BR><BR>

								This paper presents an overview of the

								query services that might be built on top of XML/RDF data. It does not

								present a specific proposal for an RDF query language; instead, it

								argues for a query language that is expressed in terms of the RDF

logical data model rather than in terms of one particular concrete syntax.


								</BLOCKQUOTE>


								<h4>Evolving the Web from a Document Repository to a Knowledge Base</h4>


With the advent of RDF and XML, we now have an opportunity
to encode or annotate content in a more machine-understandable
way. This will help the web evolve from
a set of opaque pages to a rich knowledge base.
This in turn will enable many interesting new applications,

								one of the most important of which will be the

								precise search and retrieval of content.

								<P>


								<h5>Example</h5>


								<blockquote>

								The content of images is typically very opaque to computers.

								Searching for images that contain particular kinds of scenes

								or items is usually done by searching for words which might occur

								on a page which refers to the image. This method is highly

								inaccurate. If the image were associated with a piece of RDF

								that clearly specified its content, significantly more precise

								retrieval would be possible. E.g., a photo of a lion could

								be annotated as depicting a lion. The following piece of RDF

								does this.

								</blockquote>


								<center><table cellpadding="5" border="1" bgcolor="#80ffff" width="95%">

								<tr><td><pre>


 &lt;!-- somewhere on the web... some RDF statements about a picture --&gt;

 &lt;rdf:RDF xmlns:cx="http://www.wwc.org/cat.rdf#"
          xmlns:P="http://www.images.org/image-desc-schema.rdf#"
          xmlns:vocab="http://vocab.org/useful#"
          xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#"&gt;

   &lt;P:Photograph rdf:about="http://www.imagelib.com/lion1.jpg"&gt;
     &lt;P:depicts&gt;
       &lt;cx:Lion&gt;
         &lt;vocab:color rdf:resource="http://vocab.org/useful#tan"/&gt;
         &lt;vocab:gender rdf:resource="http://vocab.org/useful#female"/&gt;
       &lt;/cx:Lion&gt;
     &lt;/P:depicts&gt;
     &lt;P:depicts rdf:resource="http://registries.org/people/Fred"/&gt;
   &lt;/P:Photograph&gt;
 &lt;/rdf:RDF&gt;

 &lt;!-- a picture of a tan-coloured female lion and a
      person identified only by URI --&gt;

 &lt;!-- somewhere else on the web... --&gt;

 &lt;rdf:RDF xmlns:vocab="http://vocab.org/useful#"
          xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#"&gt;

   &lt;vocab:Person rdf:about="http://registries.org/people/Fred"&gt;
     &lt;vocab:gender rdf:resource="http://vocab.org/useful#male"/&gt;
     &lt;vocab:name&gt;Fred&lt;/vocab:name&gt;
   &lt;/vocab:Person&gt;

 &lt;/rdf:RDF&gt;


								</pre>


See the <A HREF="#footnote">footnote</A> for an explanation of the syntax.

								</table></center>


								<P>

With this information, a search engine could do a very
precise search for pictures of lions. Searching for the word
"lion", on the other hand, retrieves 784,150 pages in AltaVista,
most of which refer to the Lions Club, The Lion King, and so on.

								</P>


								<P>

While this kind of simple matching-based retrieval

								would be useful, it does not even come close to exploiting

								the full potential of having machine understandable content (MUC).

								</P>


								<P>

To exploit this potential fully, we need to be able to
build inferential services on top of MUC. Such a service

								would combine the "raw" MUC with a set of axioms/rules,

								enabling machines to infer knowledge that is implicit

								in the MUC.

								</P>


								<h5>Example: inference-based image retrieval</H5>


<P>Imagine an appropriately captioned photograph of
a child's birthday party. Now consider someone searching
for an image with decorations. Typically, children's birthday
parties have decorations. However, since the caption does
not explicitly state that the image scene contains decorations,
a simple matching-based algorithm will not find this image.

								<BR>

								An inferential service could draw on a rich set of rules

								about the world (including events like birthday parties) to

								infer that the photograph probably includes some decorations,

								thereby improving the retrieval.

								</P>


								<h5>Example: class hierarchies</h5>


<P>The RDF Schema specification language provides facilities for
specifying machine-readable vocabularies using a hierarchical type
system. This allows a resource to be described as a member of some
specific class
(e.g., 'Snow Leopard') and have its membership of more general classes (e.g.,
'Big Cat', 'Mammal') implied by the RDF type system. This makes it

								feasible to express searches for resources using general categories, and

								have the results include resources whose membership of those broad

								categories is inferred from their membership of some more detailed

								sub-category.</P>
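<P>To make this concrete, here is a minimal sketch, in Python, of how class membership can be inferred from a subclass hierarchy. It assumes a knowledge base represented as a set of (subject, property, object) string triples and uses <code>rdfs:subClassOf</code> and <code>rdf:type</code> as shorthand property names; the resource and class names (<code>ex:leopard1</code>, <code>ex:SnowLeopard</code>, and so on) are hypothetical. It illustrates the inference only and is not the mechanism defined by the RDF Schema specification.</P>

<pre>
```python
# Sketch: inferring class membership from a subclass hierarchy.
# Assumption: a knowledge base is a set of (subject, property, object)
# string triples; "ex:" names are hypothetical examples.
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def superclasses(kb, cls):
    """Return every class reachable from cls by following subClassOf links."""
    found = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in kb:
            # 'found' also guards against cycles in the hierarchy.
            if p == SUBCLASS and s == current and o not in found:
                found.add(o)
                stack.append(o)
    return found


def members(kb, cls):
    """Resources typed as cls directly or via any of its subclasses."""
    result = set()
    for s, p, o in kb:
        if p == TYPE and (o == cls or cls in superclasses(kb, o)):
            result.add(s)
    return sorted(result)


kb = {
    ("ex:SnowLeopard", SUBCLASS, "ex:BigCat"),
    ("ex:BigCat", SUBCLASS, "ex:Mammal"),
    ("ex:leopard1", TYPE, "ex:SnowLeopard"),
}

print(members(kb, "ex:Mammal"))  # ['ex:leopard1']
```
</pre>

<P>Here the hierarchy is walked each time a search is made; an engine could equally precompute the implied memberships when the facts are first loaded.</P>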


								<h3>Logical vs Physical Models</h3>


Inferencing engines typically work by applying a set of "rules"
(statements such as "if a and b, then infer c") to a set of
ground atomic facts. (This is a gross simplification of inferencing, but it will
suffice for the purposes of this paper.) Rules can be applied to derive
new conclusions either when a query is posed or when new facts are
obtained. The set of rules and facts can be centralized or distributed.
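<P>The fact-driven strategy can be illustrated with a small forward-chaining sketch in Python. It assumes facts are tuples of strings, rules are (premises, conclusion) pairs, and terms beginning with "?" are rule variables; every rule is applied repeatedly until no new facts appear. The vocabulary (<code>depicts</code>, <code>ChildBirthdayParty</code>, <code>probablyDepicts</code>) is hypothetical and echoes the birthday-party example above. A real engine would also have to represent the uncertainty that "probably" implies, which this sketch does not attempt.</P>

<pre>
```python
# Sketch: naive forward chaining to a fixpoint.
# Assumptions: facts are tuples of strings; rule variables start with "?".


def unify(pattern, fact, bindings):
    """Extend bindings so that pattern matches fact, or return None."""
    if len(pattern) != len(fact):
        return None
    result = dict(bindings)
    for term, value in zip(pattern, fact):
        if term.startswith("?"):
            # Bind a fresh variable, or check an existing binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def forward_chain(facts, rules):
    """Apply (premises, conclusion) rules until no new facts are derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Collect every set of bindings that satisfies all premises.
            matches = [{}]
            for premise in premises:
                matches = [b2 for b in matches for fact in known
                           if (b2 := unify(premise, fact, b)) is not None]
            for b in matches:
                new_fact = tuple(b.get(t, t) for t in conclusion)
                if new_fact not in known:
                    known.add(new_fact)
                    changed = True
    return known


facts = {
    ("depicts", "photo1", "party1"),
    ("type", "party1", "ChildBirthdayParty"),
}
rules = [
    # If ?p depicts ?e and ?e is a child's birthday party,
    # then infer that ?p probably depicts decorations.
    ([("depicts", "?p", "?e"), ("type", "?e", "ChildBirthdayParty")],
     ("probablyDepicts", "?p", "decorations")),
]

derived = forward_chain(facts, rules)
print(("probablyDepicts", "photo1", "decorations") in derived)  # True
```
</pre>

<P>Running this loop whenever new facts arrive corresponds to deriving conclusions "when new facts are obtained"; the alternative is to work backwards from a query's goal only when the query is posed.</P>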


								<P>

								 Inferencing engines always work on a "logical model".

The logical model is a (typically set-theoretic) abstraction.


								<P>

								 The logical model is by definition an abstract entity.

								Logical models are typically grounded in

one or more concrete syntaxes (also known as physical models).

								<P>

								W3C logical models are based on RDF and syntax models are based on XML.

								<P>

The distinction between the logical model and the syntax model

								has evolved over decades of work in math and computer science and

								is found wherever representation of information is involved.

								<P>

								<UL>

								<LI> <B>Analogy:</B>

In relational databases, we have <em>relational algebra</em>,
which provides the logical model, and syntaxes such as tab-delimited
files, which provide the physical model. The former is important
for defining query languages (such as SQL) and the latter is important
for transferring content across the wire.

								</UL>

								<P>

Any particular concrete manipulation always operates on a physical model.
Therefore, it is often tempting either to confuse the
two or to try to make do with just the physical model. However,

								there are several reasons why complex applications such as

								inferencing engines are based on the logical model.


								<OL>


								 <LI>There is a one-to-many mapping from a logical model to the  concrete

								     representations of that logical model. A concrete syntax needs

								      to make certain commitments that a logical model need not.

								       The logical model hides details of the physical model

								        that don't carry any semantics. This in turn makes it easier

								        to build applications such as inferencing engines and higher

								        level query languages.

								   <P>

								   Examples:

								    <UL>

<LI> In relational databases, the "order" of rows in a table carries
no semantics. However, in a tab-delimited file the rows have
to appear in <i>some</i> order. By operating at the relational

								           level (as opposed to the file format level), SQL hides an aspect

								           of the data that does not carry any semantics.


<LI> In predicate logic, "A or B" is semantically equivalent to "B or A",
even though the two physical strings differ.
A logical model for predicate logic (such as Tarskian semantics)

								           hides this difference.


								     </UL>


								<P>

								 <LI> Logical models provide support for certain operations which may

								   be difficult or impossible to properly define at a physical level.

								<P>

For example, if the logical model of a knowledge base is a directed
labelled graph (as with RDF), the aggregation of multiple knowledge
bases can be defined cleanly as graph superposition at the logical
level, even though it would be hard to define the concept of "aggregating"
two XML files.


								<P>

<LI> Most interesting inferencing systems have infinite deductive
closures. This means that the deductive closure (the set of
all the statements implied) can never have a complete concrete representation.
In such cases, a query language for determining whether a
proposition is in the deductive closure
must be expressed in terms of the logical model.

								</OL>
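<P>The first two points can be illustrated with a short Python sketch that treats an RDF knowledge base, at the logical level, as a set of (subject, property, object) triples. Two serializations that list the same statements in different orders yield equal sets, and aggregation becomes set union. The sketch assumes string triples and writes anonymous (blank) nodes with a "_:" prefix; because such labels are only meaningful within one source, the hypothetical <code>merge</code> function renames them apart before taking the union, so that two unrelated anonymous resources are not accidentally identified.</P>

<pre>
```python
# Sketch: the logical model of RDF as a set of triples.
# Assumptions: triples are (subject, property, object) strings and
# anonymous (blank) nodes are written with a "_:" prefix.


def as_graph(serialization):
    """Forget the statement order that any concrete syntax must impose."""
    return frozenset(serialization)


def merge(*sources):
    """Aggregate knowledge bases by graph superposition (set union).

    Blank-node labels are local to one source, so each source's blank
    nodes are renamed apart before the union.
    """
    merged = set()
    for i, source in enumerate(sources):
        for triple in source:
            merged.add(tuple(
                f"_:g{i}_{term[2:]}" if term.startswith("_:") else term
                for term in triple))
    return frozenset(merged)


# The same two statements, serialized in different orders.
doc_a = [("lion1.jpg", "rdf:type", "P:Photograph"),
         ("lion1.jpg", "P:depicts", "_:a")]
doc_b = [("lion1.jpg", "P:depicts", "_:a"),
         ("lion1.jpg", "rdf:type", "P:Photograph")]
print(as_graph(doc_a) == as_graph(doc_b))  # True

# Two sources that happen to reuse the blank-node label "_:a".
doc1 = [("_:a", "rdf:type", "cx:Lion")]
doc2 = [("_:a", "rdf:type", "vocab:Person")]
kb = merge(doc1, doc2)
print(len({s for s, p, o in kb}))  # 2 distinct anonymous resources
```
</pre>

<P>Nothing in these comparisons depends on which concrete syntax the triples were read from, which is the point of defining queries and aggregation at the logical level.</P>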


								<P>

								<B>

								Given the importance of the logical model, it is clear that we need

								query languages not just for XML but also for RDF.

								</B>


								<BR>

								<BR>


<h3>Query Languages for RDF</h3>


								<P>

								This position paper suggests a general outline for an RDF querying

								system.

								RDF's simple yet powerful data model allows for an equally simple

								yet powerful query language. The query language is based on a single

query mechanism: <em>subgraph matching</em>.

								</P>


								<P>

								Every query is against an RDF knowledge base (KB), which in turn could be

								an aggregation

								of two or more RDF knowledge bases. Every RDF/XML block (i.e., the RDF

								within a &lt;RDF&gt; ...&lt;/RDF&gt;) can be thought of as a serialised

								RDF knowledge base.

								</P>


								<P>

								The query is itself simply an RDF model (i.e., a directed labelled

								graph), some of whose resources and properties may represent

								<em>variables</em>.


Every query has two outputs:


								<OL>

								  <LI> A subgraph (of the KB against which the query is issued) which

								matches the query.


<LI> A table of legal sets of bindings for the variables, such that
substituting these bindings for the variables in the query yields (1).

								</OL>

								</P>
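<P>The matching mechanism can be sketched in Python as follows. This is not the query language itself, which this paper deliberately leaves open; it is an illustration under stated assumptions: the knowledge base is a set of (subject, property, object) string triples, a query is a list of triple patterns, and terms beginning with "$" are variables, following the notation of the examples below. The hypothetical <code>match</code> function returns both outputs, the matched subgraph and the table of variable bindings. The data are the two RDF blocks of the lion example, aggregated, with the anonymous lion written as "_:lion".</P>

<pre>
```python
# Sketch: answering a query by subgraph matching.
# Assumptions: triples are (subject, property, object) strings,
# a query is a list of triple patterns, and "$"-terms are variables.


def match(query, kb):
    """Return (matched subgraph, binding table) for a conjunctive query."""
    table = [{}]
    for pattern in query:
        extended = []
        for bindings in table:
            for triple in kb:
                candidate = dict(bindings)
                for term, value in zip(pattern, triple):
                    if term.startswith("$"):
                        if candidate.setdefault(term, value) != value:
                            break  # variable already bound to something else
                    elif term != value:
                        break  # constant does not match
                else:
                    extended.append(candidate)
        table = extended
    # Substituting each row of bindings into the query yields the subgraph.
    subgraph = {tuple(row.get(t, t) for t in pattern)
                for row in table for pattern in query}
    return subgraph, table


PHOTO = "http://www.imagelib.com/lion1.jpg"
FRED = "http://registries.org/people/Fred"
kb = {
    (PHOTO, "rdf:type", "P:Photograph"),
    (PHOTO, "P:depicts", "_:lion"),
    ("_:lion", "rdf:type", "cx:Lion"),
    ("_:lion", "vocab:color", "vocab:tan"),
    ("_:lion", "vocab:gender", "vocab:female"),
    (PHOTO, "P:depicts", FRED),
    (FRED, "rdf:type", "vocab:Person"),
    (FRED, "vocab:gender", "vocab:male"),
    (FRED, "vocab:name", "Fred"),
}

# P:depicts($x, $y) and rdf:type($x, P:Photograph)
# and vocab:gender($y, vocab:male)
query2 = [("$x", "P:depicts", "$y"),
          ("$x", "rdf:type", "P:Photograph"),
          ("$y", "vocab:gender", "vocab:male")]
subgraph, table = match(query2, kb)
print(table)  # a single row binding $x to PHOTO and $y to FRED

# The matched subgraph is itself a knowledge base, so queries compose.
_, genders = match([("$y", "vocab:gender", "$g")], subgraph)
```
</pre>

<P>Because <code>match</code> returns a set of triples, its first result can be passed straight back in as a knowledge base, mirroring the point that one result of a query is itself an RDF knowledge base. The nested loops are the simplest possible strategy; a practical engine would index triples by property and order the patterns to prune early.</P>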


								<P>

								Here are a couple of salient points about the query language outlined above.

								</P>


								<UL>

								<LI> It can be used for a wide range of queries from simple graph traversal to

complex Datalog-like queries. For the sake of efficiency, a concrete query

								language might add a number of "utility" functions for the simple graph

								traversal.

								<LI> One of the results of a query is itself an RDF knowledge base. This means

								that it is possible to issue a query against the result of another query. In this

								sense, this query language is similar to relational query languages. This feature

								will make it possible to construct recursive queries.

								</UL>


								<P>

								RDF Schema constructs such as <em>subClassOf</em> and <em>subPropertyOf</em>

allow some simple inferences. In the future, more complex rules will be
expressible and more powerful inference engines will become possible.
Ideally, the query language used by an inferencing system to access
the knowledge base should be the same as the query language to which the
inferencing system responds.

								</P>


								<P>

								To enable this, a query can take an additional parameter which specifies

whether its answer should be based on the "raw RDF graph" or on
the deductive closure of the knowledge base.

								</P>
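<P>A query with such a parameter might look like the following Python sketch. It assumes string triples, "$"-prefixed variables, and a closure computed with only two RDF Schema-style rules: property inheritance via <code>rdfs:subPropertyOf</code> and class inheritance via <code>rdfs:subClassOf</code>. The parameter name <code>use_closure</code> and the <code>ex:</code> vocabulary are hypothetical. Because these two rules only recombine terms already present, the closure here is finite and can be materialized; as noted earlier, that is not true of inferencing systems in general.</P>

<pre>
```python
# Sketch: one query, answered over the raw graph or its deductive closure.
# Assumptions: triples are (subject, property, object) strings, "$"-terms
# are variables, and only two RDF Schema-style rules are applied.
SUBPROP = "rdfs:subPropertyOf"
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def closure(kb):
    """Saturate kb under the two inheritance rules (a naive fixpoint)."""
    facts = set(kb)
    while True:
        new = set()
        for s, p, o in facts:
            for s2, p2, o2 in facts:
                # If p subPropertyOf o2 and (s p o), infer (s o2 o).
                if p2 == SUBPROP and s2 == p:
                    new.add((s, o2, o))
                # If o subClassOf o2 and (s type o), infer (s type o2).
                if p == TYPE and p2 == SUBCLASS and s2 == o:
                    new.add((s, TYPE, o2))
        if new.issubset(facts):
            return facts
        facts |= new


def query(kb, pattern, use_closure=False):
    """Match a single triple pattern against the raw graph or its closure."""
    graph = closure(kb) if use_closure else kb
    results = []
    for triple in graph:
        bindings = {}
        for term, value in zip(pattern, triple):
            if term.startswith("$"):
                if bindings.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(bindings)
    return results


kb = {
    ("ex:mother", SUBPROP, "ex:parent"),
    ("ex:Ann", "ex:mother", "ex:Bob"),
}
print(query(kb, ("$x", "ex:parent", "$y")))                    # []
print(query(kb, ("$x", "ex:parent", "$y"), use_closure=True))  # [{'$x': 'ex:Ann', '$y': 'ex:Bob'}]
```
</pre>

<P>With the parameter off, the query sees only what is literally stated; with it on, the same query also sees what the schema implies.</P>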


								<h4>Examples</h4>


<P><B>Note:</B> These queries could easily be represented in RDF/XML

								syntax. For the purposes of this paper we use a simple syntax

								in which '$x' and '$y' represent variables and properties are shown

								using namespace prefixes.</P>


								<H5>Query 1: A lion</H5>


								The query:  <B>rdf:type($y, cx:Lion).</B>

								<BR><BR>


								<EM>

								(the example syntax used here means

								"Find resources 'y' which have a

								http://www.w3.org/TR/WD-rdf-syntax#type

								property whose value is the resource

								http://www.wwc.org/cat.rdf#Lion").</EM>


								<BR>

								<BR>returns:

								<center><table cellpadding="5" border="1" bgcolor="#80ffff" width="95%">

								<tr><td><pre>


        &lt;rdf:RDF xmlns:cx = "http://www.wwc.org/cat.rdf#"
             xmlns:rdf = "http://www.w3.org/TR/WD-rdf-syntax#"&gt;

								               &lt;cx:Lion/&gt;

								        &lt;/rdf:RDF&gt;


								and

								  (($y . [anonymous-resource]))

								</pre>

								</table></center>


<H5>Query 2: A photograph depicting a male</H5>


								The query: <B>P:depicts($x, $y) and rdf:type($x, P:Photograph)

								and vocab:gender($y, vocab:male)

								</B>


								<BR><BR>

								(meaning: <EM>"Find values for 'x' and 'y' where resource

								'x' has an http://www.images.org/image-desc-schema.rdf#depicts property

								whose value is 'y', and where 'x' has an

								http://www.w3.org/TR/WD-rdf-syntax#type property with value

								http://www.images.org/image-desc-schema.rdf#Photograph and

								'y' has an http://vocab.org/useful#gender property whose value is

http://vocab.org/useful#male".

								</EM>)


								<BR><BR>


								returns:


								<center><table cellpadding="5" border="1" bgcolor="#80ffff" width="95%">

								<tr><td><pre>


        &lt;rdf:RDF
             xmlns:P = "http://www.images.org/image-desc-schema.rdf#"
             xmlns:vocab = "http://vocab.org/useful#"
             xmlns:rdf = "http://www.w3.org/TR/WD-rdf-syntax#"&gt;

          &lt;P:Photograph rdf:about="http://www.imagelib.com/lion1.jpg"&gt;
                &lt;P:depicts&gt;
                 &lt;rdf:Description rdf:about="http://registries.org/people/Fred"&gt;
                       &lt;vocab:gender rdf:resource="http://vocab.org/useful#male"/&gt;
                 &lt;/rdf:Description&gt;
               &lt;/P:depicts&gt;
            &lt;/P:Photograph&gt;
        &lt;/rdf:RDF&gt;


								        &lt;!-- note that the sub-graph returned here includes information

								             from two sources; statements about the photograph and about

								             Fred when taken together tell us that this is a photograph of

								             a male --&gt;

								</pre>

								and<BR>

								(($x . [http://www.imagelib.com/lion1.jpg])($y . [http://registries.org/people/Fred]))


								</table></center>


								<BR>

								Similarly, the query "<B>P:depicts($x, $y) and rdf:type($y, cx:Lion) and

vocab:gender($y, vocab:female)</B>" would retrieve only illustrations of
female <em>lions</em>.


								<BR><BR>


								<h3>Conclusion</h3>

With the advent of simple and powerful data models such as RDF and formal,
flexible syntaxes such as XML, we now have an opportunity to encode or
annotate web content in a more machine-understandable way. These
standards
provide the ability to layer inferencing services that will facilitate
the evolution of the web from a set of opaque pages to a rich knowledge base.
In this way, the web of today, a vast unstructured mass of information, may
in the future be transformed into something more manageable, and thus
something far more useful.


								<P>

								<BR><BR>

								</P>


								<HR NOSHADE />


								<h3>Notes</H3>


								<A NAME="footnote"></A>


								<P>The following is a human-readable interpretation of the RDF used in the

								example...</P>


								<P>The first block of RDF uses four vocabularies to state that there is

								a resource (http://www.imagelib.com/lion1.jpg) which is a member of the class

'Photograph' and which depicts an object that is a member of the class

								'Lion' and which in turn has a color property with value 'tan', and a

								gender

								property with value 'female'. The photograph also depicts a second object

								identified only by URI (http://registries.org/people/Fred). A second

								source of information provides further RDF statements about

								[http://registries.org/people/Fred]. In this case, we learn a name

								("Fred") and that Fred is male.</P>


								<h3>References</h3>


W3C Data Formats (W3C Note, 29 October 1997);

								<A

								HREF="http://www.w3.org/TR/NOTE-rdfarch">http://www.w3.org/TR/NOTE-rdfarch</A>


								<BR><BR>

								Resource Description Framework (RDF) Schemas;  <A

								HREF="http://www.w3.org/TR/WD-rdf-schema">http://www.w3.org/TR/WD-rdf-schema</A>

								<BR><BR>

								Resource Description Framework (RDF) Model and Syntax;

								<A

								HREF="http://www.w3.org/TR/WD-rdf-syntax/">http://www.w3.org/TR/WD-rdf-syntax/</A>


								<BR><BR>

								Extensible Markup Language (XML) 1.0;

<A HREF="http://www.w3.org/TR/1998/REC-xml-19980210">http://www.w3.org/TR/1998/REC-xml-19980210</A>


								</body>


								</html>