<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=iso-8859-1">
<!-- style borrowed from NOTE in parts -->
<STYLE type=text/css>
.example {
BACKGROUND-COLOR: #f9f5de; BORDER-BOTTOM: 1px solid; BORDER-LEFT: 1px solid; BORDER-RIGHT: 1px solid; BORDER-TOP: 1px solid; COLOR: #5d0091; MARGIN-LEFT: 10%; WIDTH: 65%
}
BODY {
margin: 2em 1em 2em 70px;
font-family: sans-serif;
color: black;
background: white;
background-position: top left;
background-attachment: fixed;
background-repeat: no-repeat;
}
</STYLE>
<TITLE>Nodes and Arcs 1989-1999: WWW history and RDF</TITLE>
</HEAD>
<BODY>
<DIV class="head">

<!-- lose that official looking icon for now -->

<!--IMG src="http://www.w3.org/Icons/WWW/w3c_home" ALT ="W3C" class="W3CIcon" -->

<H1>
Nodes and Arcs 1989-1999
</H1>
</DIV>

<H2>The WWW Proposal and RDF: Then and Now</H2>

<P>
Initial version: 1999-11-12, Dan Brickley
<A HREF="mailto:danbri@w3.org"><TT>danbri@w3.org</TT></A><BR>

<P>
<STRONG>Status:</STRONG> <BR>
This is a work in progress and a personal view of the
technical relationship between RDF and older ideas from Web architecture.
It is an early release as an informal discussion document for
feedback from the <A HREF="/RDF/Interest/">RDF Interest Group</A>. This is <EM>not</EM> a formal
publication of any working group, or of the W3C itself. Some typos
remain...

</P>

<P>This document is provided as a background discussion motivating
the <A HREF="/1999/11/11-WWWProposal/">WWW Proposal in RDF</A>
document. It was originally a sub-section of that work but grew too long
and was reworked as a standalone commentary. As such, there is some
duplication with that document which should be removed in any future
versions.
</P>

<H3>Information Management: Then and Now</H3>

<P>
The <A HREF="http://www.w3.org/History/1989/proposal.html">original
proposal of the WWW</A> from 1989 included a figure showing how
information about a Web of relationships amongst named objects could unify
a number of information management tasks.
</P>

<P>
<IMG SRC="/History/1989/Image1.gif" ALT="nodes and arcs figure from the
WWW proposal" >
</P>

<H3>RDF, WWW and Knowledge Management</H3>

<P>
Having <A HREF="/1999/11/110-WWWProposal/">
re-represented this data using RDF</A>, what can we do with
it that we couldn't before? The <A HREF="/RDF/">RDF</A> pages list a
number of query and logic oriented applications that suggest approaches to
WWW knowledge management unavailable in 1989. For example, we can show a
simple <A HREF="rdfqdemo.html">JavaScript-based RDF Query demonstrator</A>
that queries this RDF database. (Note that this is a work in progress
and currently functions in only a subset of JavaScript/ECMAScript
browsers.)

</P>

<P>
The remainder of this document revisits some of the initial aims of the
WWW, and connects these to the architecture adopted for the Resource
Description Framework.
</P>

<H3>A digression: RDF in context</H3>

<P>
<STRONG>Note</STRONG>:
The following discussion is only one interpretation of the relationship
between the RDF data modeling system and the original system of knowledge
management outlined in the WWW proposal.
Readers are encouraged to consult the <A
HREF="http://www.w3.org/History/1989/proposal.html">original WWW
proposal</A> before continuing, and to reach their own conclusions about
this perspective on RDF.
</P>

<P>
A few relevant excerpts from the WWW proposal are reproduced here for convenience.
</P>

<BLOCKQUOTE>
CERN is a wonderful organisation. It involves several thousand people,
many of them very creative, all working toward common goals.
Although they are nominally organised into a hierarchical management
structure,this does not constrain the way people will communicate,
and share information, equipment and software across groups.
</BLOCKQUOTE>

<BLOCKQUOTE>

The actual observed working structure of the organisation is a multiply
connected "web" whose interconnections evolve with time. In this
environment, a new person arriving, or someone taking on a new task, is
normally given a few hints as to who would be useful people to
talk to. Information about what facilities exist and how to find out about
them travels in the corridor gossip and occasional newsletters, and
the details about what is required to be done spread in a similar way. All
things considered, the result is remarkably successful, despite
occasional misunderstandings and duplicated effort.
</BLOCKQUOTE>

<BLOCKQUOTE>
A problem, however, is the high turnover of people. When two years is a
typical length of stay, information is constantly being lost. The
introduction of the new people demands a fair amount of their time and
that of others before they have any idea of what goes on. The
technical details of past projects are sometimes lost forever, or only
recovered after a detective investigation in an emergency. Often, the
information has been recorded, it just cannot be found.
</BLOCKQUOTE>

<P>
This scenario is a familiar one. The challenges faced by CERN in 1989 are
common to many companies and organizations in 1999. We now have widespread
access to Internet information sources, typically accessed via the World
Wide Web. However, the WWW has not yet provided a solution to the
challenges it was initially proposed to address.</P>

<P>
Word-of-mouth information
is supplemented by online information sources, but access to these is
still through relatively crude search systems; indeed, the crudeness of
the 'search engines' which provide most users with
information discovery facilities is a common complaint about the WWW.

Searching for
keywords and phrases amongst the Web
pages of a large company or organization, let alone the <EM>entire</EM> Web,
will often result in a huge number of documents being discovered. Often
these bear no obvious relationship to the information needs of the user.
</P>

<P>
The original WWW proposal suggested that it should be possible to pose
questions to an information management system and have them answered by a
mechanism that understands something of the complex
web of interrelationships that exist between people, documents,
organizations and other entities.
</P>

<H3>Ask the Web?</H3>

<P>
Currently, users search for data on the Web by asking questions that are
of the form: "which documents contain <EM>these</EM> words and
phrases?"
</P>

<P>
The Resource Description Framework (<A HREF="/RDF/">RDF</A>), following
the original WWW design, suggests that we can do better than this. What
questions might we want to ask the Web? A few were sketched in the WWW
proposal...
</P>

<BLOCKQUOTE>

<P>
The sort of information we are discussing answers, for example, questions
like
</P>

<UL>
<LI> Where is this module used?
<LI> Who wrote this code? Where does he work?
<LI> What documents exist about that concept?
<LI> Which laboratories are included in that project?
<LI> Which systems depend on this device?
<LI> What documents refer to this one?
</UL>
</BLOCKQUOTE>

<P>
With the exception of the last item on this wishlist ('which documents
refer to this one'), the current Web (or Web search engines) does not allow
such questions to be easily answered. There is, however, a close affinity between
the model recently adopted in RDF and the structures described (but
until recently unimplemented) in the WWW proposal. The WWW
proposal notes that 'Linked Information Systems' can be applied to this
set of problems...
</P>

<BLOCKQUOTE>
In providing a system for manipulating this sort of information, the hope
would be to allow a pool of information to develop which could
grow and evolve with the organisation and the projects it describes.
For this to be possible, the method of storage must not place its own
restraints on the information. This is why a "web" of notes with links
(like references) between them is far more useful than a fixed
hierarchical system. When describing a complex system, many people resort
to diagrams with circles and arrows. Circles and arrows leave
one free to describe the interrelationships between things in a way that
tables, for example, do not. The system we need is like a diagram of
circles and arrows, where circles and arrows can stand for anything.
</BLOCKQUOTE>

<P>
The proposal then goes on to describe a number of 'node' types and 'arrow'
types such as might be used to represent diagrammatically the entities and
relationships typical of a complex organisation such as CERN...
</P>

<BLOCKQUOTE>
We can call the circles nodes, and the arrows links. Suppose each node is
like a small note, summary article, or comment. I'm not over
concerned here with whether it has text or graphics or both. Ideally, it
represents or describes one particular person or object. Examples of
nodes can be:
</BLOCKQUOTE>

<BLOCKQUOTE>
<UL>
<LI> People </LI>
<LI> Software modules </LI>
<LI> Groups of people </LI>
<LI> Projects </LI>
<LI> Concepts </LI>
<LI> Documents </LI>
<LI> Types of hardware </LI>
<LI> Specific hardware objects </LI>
</UL>
</BLOCKQUOTE>

<P>
The proposal also lists a number of relationship types that might hold
between these various types of thing. Any pair of entities A and B
might stand in any of a number of relationships. It might be true
that 'A'...
</P>

<BLOCKQUOTE>
<UL>
<LI> depends on B
<LI> is part of B
<LI> made B
<LI> refers to B
<LI> uses B
<LI> is an example of B
</UL>
</BLOCKQUOTE>

<P>
In doing so, the WWW proposal makes an interesting claim: that the complex
mesh of information relating people, software, documents, concepts,
organizations and other types of stuff could be understood through a very
simple metaphor. The metaphor is that of a <EM>web</EM> of named
relationships connecting uniquely identified things. This is, not by
coincidence, the same model for representing information as that
adopted in RDF.
</P>

<H3>Nodes and Arrows; Entities and Relationships</H3>

<P>
There are a number of different terminologies for talking about the same
broad family of approaches to information management. The WWW proposal uses the terminology
of 'node and arrow' diagrams, such as that reproduced above. Many in the
database and data modeling communities talk of 'entity - relationship'
modeling. RDF models are often represented graphically as 'node and arc'
diagrams. In RDF contexts we also talk about the entities represented by
nodes as 'Resources', and the relationships and attributes shown as
arcs/arrows are called 'Properties'.
</P>

<P>
Despite terminological differences, RDF can be seen as the eventual
formalization of this long-delayed component of the Web architecture. RDF
is the W3C's <A HREF="/Press/1999/RDF-REC">recommended</A> technology for
describing 'data about data', or metadata. The notion of 'data about data'
is somewhat confusing in a Web context. It is often useful to think about
RDF models as a form of 'self describing' data. To understand this, it is
important to appreciate the central role played by <EM>identifiers</EM> in
the Web architecture.
</P>

<BLOCKQUOTE>
"The Web works best when anything of value and identity is a first
class object. If something does not have a URI, you can't refer to it,
and the power of the Web is the less for that."<BR>
-- TimBL, Dec 1996<BR>
<A HREF="http://www.w3.org/DesignIssues/Axioms">http://www.w3.org/DesignIssues/Axioms</A>
</BLOCKQUOTE>

<P>
On the Web, everything is considered to be a 'resource', ie. a thing
that can be identified and, through identification, be used. The vast 'nodes and
arrows' diagram that constitutes the current World Wide Web consists mostly of documents
connected by links whose type is relatively meaningless (the label is "href",
which merely means "links to"). With the development of RDF and XML, we
can anticipate a richer Web in which these nameable interrelationships are modelled in
RDF and written down in XML syntax using RDF and XLink. </P>

<P>The Web model is for all online resources to have unique
identifiers. In addition, unique identifiers can be assigned to a variety
of non-electronic resources. The URI specification defines a
convention for representing these identifiers as short textual
strings; social and legal conventions define policies for assigning these
identifiers to resources of all kinds: eg. documents, concepts and
countless other entities. The URI system, like the Web itself, is
designed to be extensible: as new ways of identifying objects (eg. DOIs,
URNs etc) are proposed, Web URIs can accommodate these.
</P>

<P>
The crucial point is that every individual node, every <EM>type of
node</EM>, and every <EM>type of arc</EM> in the 'nodes and arcs' diagram be
uniquely identifiable. The WWW familiar to users in 1999 is built on this
principle: everything that exists to the Web is identified on the Web
by a URI. For example, a mailbox is identified with a
'mailto:' identifier, while web pages are typically identified using 'http:'
names. The power of the Web comes from this simple, almost trivial,
principle: that <EM> unique identification is extremely useful for
information management</EM>.
</P>
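<P>
To make the 'mailto:' and 'http:' examples concrete, here is a minimal sketch
in Python (a language chosen purely for illustration). It uses the standard
library function <CODE>urlsplit</CODE> to read off the scheme of two identifiers
that appear in this document; the helper name <CODE>scheme_of</CODE> is our own
invention.
</P>

```python
from urllib.parse import urlsplit

# Identifiers taken from this document: a mailbox and a Web page.
identifiers = [
    "mailto:danbri@w3.org",
    "http://www.w3.org/History/1989/proposal.html",
]


def scheme_of(uri: str) -> str:
    """Return the URI scheme, i.e. the part before the first colon."""
    return urlsplit(uri).scheme


# Map each identifier to the scheme that says what kind of name it is.
schemes = {uri: scheme_of(uri) for uri in identifiers}

for uri, scheme in schemes.items():
    print(f"{scheme:7} {uri}")
```

<P>
Whatever the scheme, each string names exactly one resource, which is all the
nodes-and-arcs model asks of an identifier.
</P>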

<H3>RDF: Metadata as self-describing data</H3>

<P>
RDF is about self describing data in the sense that the principle of
unique identification which underpins the Web is applied to the practice
of modeling information. Although the RDF model of 'nodes and arcs' is
almost unchanged from that outlined in the WWW proposal document, RDF
takes things much further. By combining the principle of unique
identification with the nodes and arrows representation system, we gain a
powerfully simple perspective on information management.
</P>

<P>
We say that RDF's information model is self-describing because both the
types of relationships (arrows, arcs) and the types of nodes that
we see in node and arrow diagrams are themselves considered 'first class'
objects, uniquely identifiable and therefore describable. We make the
building blocks of our data modeling system into identifiable things on
the Web by giving them URI names, so that different computer systems
across the world can each make unambiguous use of the same types of nodes
and arcs.
</P>

<P> For example, when two
objects are connected, as in the original diagram, by a 'wrote' arrow (eg. "Tim Berners-Lee"
--wrote--> "This document" ), the relationship we call "wrote" is given
a Web identifier of its own. In 1999, we can use RDF and URIs to do this,
and the <A HREF="/XML/">XML</A> data format to interchange such
information between computers. The
<A HREF="http://purl.org/dc/">Dublin Core Metadata Initiative</A>, for
example, has defined a set of concepts such as
'Title', 'Creator', 'Description', 'Date'. So, instead of just writing the
simple label 'creator', RDF uses a Web
identifier: 'http://purl.org/dc/elements/1.0/Creator'. This gives us a
node on the Web which represents the relationship
'Creator' that holds between creative agents (persons, organizations) and
the works they create. Since we now have a URI for the notion of
'Creator', other communities can describe relationships between this and
other nodes in the Web.
</P>
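<P>
As a rough sketch of what this looks like as data, the following Python
fragment records a single arc whose label is the Dublin Core URI quoted above
rather than the bare word 'creator'. The <CODE>Statement</CODE> type and the
<CODE>uses_property</CODE> helper are names invented for this illustration, and
writing the author as a plain string is an assumption, since no URI is given
for him here.
</P>

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One arc in the graph: subject --prop--> object."""
    subject: str
    prop: str
    object: str


# URI quoted in the text for the Dublin Core 'Creator' relationship.
DC_CREATOR = "http://purl.org/dc/elements/1.0/Creator"

# The 1989 proposal, identified by its Web address.
PROPOSAL = "http://www.w3.org/History/1989/proposal.html"

# Assumption: the author is written as a plain string, because the
# text does not supply a URI for him.
statement = Statement(PROPOSAL, DC_CREATOR, "Tim Berners-Lee")


def uses_property(s: Statement, property_uri: str) -> bool:
    """True when the arc is labelled with exactly this property URI."""
    return s.prop == property_uri


print(uses_property(statement, DC_CREATOR))
print(uses_property(statement, "creator"))
```

<P>
Comparing full URIs rather than local labels is what lets two independent
systems agree that they mean the same relationship.
</P>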

<P>
Why is this self-describing? Since the notion of Creator here is just
another node or resource on the Web, RDF (ie. nodes and arrows) itself can be used to make
statements about that thing. We might want to annotate it with a label, or textual
description (in one or more natural languages). Or we might want to relate it to other resources. This is
exactly what we see in the original WWW diagram: the node drawn as
"Hypertext" is shown as having an "includes" arrow pointing to "Linked
Information". This is a representation of the notion of Linked Information
Systems, such as the proposed WWW itself. A number of nodes are also drawn
representing <EM>examples</EM> (or instances) of linked information
systems, eg. ENQUIRE, Hypercard. Similarly, the node representing the
category of "Hierarchical Systems" (examples being GroupTalk, UUCP/News,
CERNDOC, VAX/Notes) is itself a "first class resource" in the diagram.
</P>
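<P>
One way to picture this in code: the same URI can appear as the arc of one
statement and as the subject of another. The Python sketch below is
illustrative only. The two <CODE>example.org</CODE> property URIs are
hypothetical, and the node names from the figure are written as plain strings
rather than URIs.
</P>

```python
from typing import List, Tuple

Statement = Tuple[str, str, str]  # (subject, property, object)

DC_CREATOR = "http://purl.org/dc/elements/1.0/Creator"
PROPOSAL = "http://www.w3.org/History/1989/proposal.html"

# Hypothetical property URIs, invented for this sketch only.
EX_LABEL = "http://example.org/terms/label"
EX_INCLUDES = "http://example.org/terms/includes"

statements: List[Statement] = [
    # An ordinary statement that *uses* the Creator property as its arc.
    (PROPOSAL, DC_CREATOR, "Tim Berners-Lee"),
    # A statement *about* the Creator property: here it is the subject.
    (DC_CREATOR, EX_LABEL, "Creator"),
    # From the original figure: "Hypertext" --includes--> "Linked Information".
    ("Hypertext", EX_INCLUDES, "Linked Information"),
]


def describe(resource: str, data: List[Statement]) -> List[Statement]:
    """Return every statement whose subject is the given resource."""
    return [s for s in data if s[0] == resource]


for s in describe(DC_CREATOR, statements):
    print(s)
```

<P>
Nothing special is needed to describe a property: it is looked up in the same
list, by the same function, as any other resource.
</P>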

<H3>Asking the Web and RDF Query</H3>

<P>
So... assuming we compose node-and-arc views of our diverse information
systems, and assuming we give unique identifiers to everything that
matters to our information management needs, what does this buy us?
</P>

<P>
If we give unique identifiers (URIs) to...:
</P>
<UL>
<LI>types of thing (eg. the set of Hypertext systems, the set of People or organizations)</LI>
<LI>the relationships that stand between those things (eg. 'describes',
'wrote', 'includes', 'unifies'...)</LI>
<LI>particular examples of those types of thing (eg. CERNDOC, Hypercard)</LI>
</UL>

<P>
...then we have an RDF-ready information system. We can use the universal
syntax provided by XML to write down and exchange messages containing
information that can be interpreted according to
this model, and we can use the nodes-and-arcs model to provide a common
'interpretation strategy' for a wide range of information management
scenarios.
</P>

<H4>For example...</H4>

<P>
If we want to ask for the identifiers of all things that are 'information
systems' which are 'unified' by a system described by some named individual,
we could couch this as a query consisting of URI identifiers and 'question
marks' or variables.
</P>

<P>
For example (in a fictional syntax):
</P>

<P>
<CODE>
type(?X, InformationSystem), unifies(?X,?Y), describes(?Z,?X), wrote(?P,?Z).
</CODE>
</P>
<P>or</P>
<P>
<CODE>
type(?X, InformationSystem), unifies(?X,?Y), describes(?Z,?X), wrote("Tim Berners-Lee",?Z).
</CODE>
</P>

<P>
This is a computerish way of asking for groups of objects 'X','Y','Z','P',
where P wrote Z, Z describes X, X 'unifies' Y, and X is an Information
System. Applied to the original figure in the WWW proposal,
this query matches several combinations of nodes.
Written out in full, the answer to this query might look
something like the following.
</P>

<PRE>
<CODE>
X= A Proposal: Mesh
Y= ENQUIRE
Z= 'This document' (ie. http://www.w3.org/History/1989/proposal.html)
P= Tim Berners-Lee

X= A Proposal: Mesh
Y= CERNDOC
Z= 'This document' (ie. http://www.w3.org/History/1989/proposal.html)
P= Tim Berners-Lee

X= A Proposal: Mesh
Y= VAX/Notes
Z= 'This document' (ie. http://www.w3.org/History/1989/proposal.html)
P= Tim Berners-Lee

X= A Proposal: Mesh
Y= UUCP/News
Z= 'This document' (ie. http://www.w3.org/History/1989/proposal.html)
P= Tim Berners-Lee
</CODE>
</PRE>
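<P>
For readers who prefer running code to fictional syntax, the query above can be
approximated by a naive pattern matcher in Python. This is a sketch under
stated assumptions, not a description of any existing RDF query system: the
toy data is simply transcribed from the answer listing above, short names stand
in for full URIs, and the function names <CODE>unify</CODE> and
<CODE>query</CODE> are our own.
</P>

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)
Bindings = Dict[str, str]

# Toy data transcribed from the answer listing; short names stand in
# for full URIs and are illustrative only.
TRIPLES: List[Triple] = [
    ("A Proposal: Mesh", "type", "InformationSystem"),
    ("A Proposal: Mesh", "unifies", "ENQUIRE"),
    ("A Proposal: Mesh", "unifies", "CERNDOC"),
    ("A Proposal: Mesh", "unifies", "VAX/Notes"),
    ("A Proposal: Mesh", "unifies", "UUCP/News"),
    ("This document", "describes", "A Proposal: Mesh"),
    ("Tim Berners-Lee", "wrote", "This document"),
]


def is_var(term: str) -> bool:
    """Variables are written with a leading question mark, e.g. '?X'."""
    return term.startswith("?")


def unify(pattern: Triple, triple: Triple,
          bindings: Bindings) -> Optional[Bindings]:
    """Match one pattern against one triple, extending the bindings.

    Returns the extended bindings, or None when the triple does not fit.
    """
    new = dict(bindings)
    for term, value in zip(pattern, triple):
        if is_var(term):
            # Bind the variable, or check it against its earlier binding.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def query(patterns: List[Triple], triples: List[Triple],
          bindings: Optional[Bindings] = None) -> Iterator[Bindings]:
    """Yield every set of bindings that satisfies all the patterns."""
    bindings = {} if bindings is None else bindings
    if not patterns:
        yield bindings
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        extended = unify(first, triple, bindings)
        if extended is not None:
            yield from query(rest, triples, extended)


# type(?X, InformationSystem), unifies(?X,?Y), describes(?Z,?X), wrote(?P,?Z).
PATTERNS: List[Triple] = [
    ("?X", "type", "InformationSystem"),
    ("?X", "unifies", "?Y"),
    ("?Z", "describes", "?X"),
    ("?P", "wrote", "?Z"),
]

results = list(query(PATTERNS, TRIPLES))
for row in results:
    print(row)
```

<P>
Replacing <CODE>"?P"</CODE> in the last pattern with the constant
<CODE>"Tim Berners-Lee"</CODE> gives the second form of the query. A real system
would index its statements rather than scan the whole list for every pattern;
the scan is kept here because it makes the matching rule easy to see.
</P>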

<H3>Conclusions: Querying RDF information models</H3>

<P>
We have seen a few brief examples based around the <A
HREF="/History/1989/Image1.gif">image</A> included in the original WWW
proposal. The simple query presented here shows the way in which a
question might be asked of a system that is organised around the 'nodes
and arcs' model common to both the WWW proposal and RDF.
</P>

<P>
The RDF system does not yet include a specification for querying RDF
models. However, a number of <A HREF="/RDF/#sw">projects and
applications</A> exist that are exploring mechanisms for implementing RDF
query. Most of them take a similar form to the above scenario; the only
difference is that within the formal RDF model, URI identifiers must be
used to unambiguously identify each node and each relationship-type
(eg. 'creator' becomes 'http://purl.org/dc/elements/1.0/Creator'). In the
simple query example above, we abbreviate these URIs for increased
readability.
</P>
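<P>
The abbreviation step can be sketched as a simple lookup. The Python fragment
below assumes a <CODE>prefix:LocalName</CODE> shorthand, which this document
does not itself define; the <CODE>dc</CODE> prefix and the
<CODE>expand</CODE> function are illustrative names only. The namespace is the
Dublin Core one quoted above.
</P>

```python
from typing import Dict

# Assumption: a "prefix:LocalName" shorthand, introduced for this sketch.
PREFIXES: Dict[str, str] = {
    "dc": "http://purl.org/dc/elements/1.0/",
}


def expand(name: str, prefixes: Dict[str, str] = PREFIXES) -> str:
    """Expand an abbreviated name to a full URI.

    Names whose prefix is not registered (including names that are
    already full URIs) are returned unchanged.
    """
    prefix, sep, local = name.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return name


print(expand("dc:Creator"))
print(expand("http://www.w3.org/History/1989/proposal.html"))
```

<P>
Expanding every name before comparison keeps the readable short forms out of
the stored model, so that only full URIs are ever matched against one another.
</P>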

<ADDRESS>
<A HREF="mailto:danbri@w3.org">danbri@w3.org</A> November 1999
</ADDRESS>
</BODY>
</HTML>