<html xmlns="http://www.w3.org/1999/xhtml">
|
|
<head>
|
|
<meta name="generator" content=
|
|
"HTML Tidy for Mac OS X (vers 31 October 2006 - Apple Inc. build 13), see www.w3.org" />
|
|
<title>
|
|
The meaning of a document -- Axioms of Web architecture
|
|
</title>
|
|
<link rel="Stylesheet" href="di.css" type="text/css" />
|
|
<meta http-equiv="Content-Type" content=
|
|
"text/html; charset=us-ascii" />
|
|
</head>
|
|
<body bgcolor="#DDFFDD" text="#000000" lang="en" xml:lang="en">
|
|
<address>
|
|
Tim Berners-Lee<br />
|
|
Date: 1999, last change: $Date: 2009/08/27 21:38:08 $<br />
|
|
Status: personal view only. Editing status: first draft.
|
|
<em>Written partly when the Namespace argument came around
|
|
again and I realized that where there</em>
|
|
</address>
|
|
<p>
|
|
<a href="./">Up to Design Issues</a>
|
|
</p>
|
|
<h3>
|
|
Axioms of Web Architecture: the meaning of a document
|
|
</h3>
|
|
<p>
|
|
<em>Abstract: The meaning of a document is the product
of some text (in some language) and the meaning of the
language. The text is found in a document, and the language
is defined in a document called a schema.</em>
|
|
</p>
|
|
<hr />
|
|
<h1>
|
|
Meaning
|
|
</h1>
|
|
<p>
|
|
<em>Grounding the meaning of a document in URI space.</em>
|
|
</p>
|
|
<p>
|
|
What is the meaning of a document?
|
|
</p>
|
|
<p>
|
|
The meaning of a document on the Web can be defined more
|
|
precisely than an arbitrary paper document. Because we have
|
|
the benefit of a global namespace (URIs), things become
|
|
possible which were not before. One example is global
|
|
hypertext; another is the rigid (though rarely absolute)
|
|
specification of meaning. Just as a hypertext document can
|
|
now exactly point to another document when it makes a
|
|
reference (instead of making some vague natural language
|
|
reference to it), so can a formal document make a precise
|
|
reference to the language it uses.
|
|
</p>
|
|
<p>
|
|
A writer of a document uses the language to convey his intent
|
|
to the reader. It is essential that the intent of the writer
|
|
can be well defined for both parties and in general for a
|
|
third party.
|
|
</p>
|
|
<p>
|
|
By "<dfn>language</dfn>" here I mean the set of symbols,
the syntactic rules which constrain their combination, and
some semantics which are conveyed by defining their
interpretation in one or more other formal languages, or in
some natural language.
|
|
</p>
|
|
<table border="5">
|
|
<tbody>
|
|
<tr>
|
|
<td>
|
|
The meaning of a document is then the product of the
|
|
text of the document (in some language) and the meaning
|
|
of the language.
|
|
</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<p>
|
|
On the Web, <a href="Axioms.html#Universality2">important
|
|
things are identified by URIs</a>. This should clearly apply
|
|
both to the document itself and to the language. The party
|
|
which defines what a URI refers to I call the publisher, or
|
|
owner of the URI. HTTP allows a delegated system of authority
|
|
for ownership (DNS) to define ownership of URIs, and it also
|
|
provides a network protocol to retrieve documents
representing whatever is identified by the URI. The text of a
document is defined by its publisher and the meaning of the
language is defined by the publisher of the language.
|
|
</p>
|
|
<p>
|
|
Natural languages are constantly evolving and rather vague,
|
|
in that no one (except <em>Scrabble</em> players) uses a
|
|
particular dictionary as a definitive set of meanings. In
|
|
practice, the meaning of a word in a natural language is the
|
|
sum of the associations of that word -- logical or poetic --
|
|
in the mind of the reader or writer. Of course society works
|
|
on the basis of a very strong similarity of the webs of
|
|
association in different people's minds.
|
|
</p>
|
|
<p>
|
|
In the semantic web, however, meaning is not vague: the idea
|
|
is that languages must be defined formally and as precisely
|
|
as possible. The semantic web consists of some "terminal"
|
|
languages which are defined solely in natural language terms,
|
|
and some languages for which there are machine-readable
|
|
interpretations into other formal languages. Whereas programs
|
|
processing documents in the first sort of language will
|
|
typically have to be hand coded, documents in the second set
|
|
may be processed automatically to convert them into languages
|
|
in the first set.
|
|
</p>
|
|
<p>
|
|
URIs can be of various sorts, with various properties
|
|
depending on their scheme (and, for http URIs, the
|
|
publisher), but some URIs can be dereferenced to a definitive
|
|
document. The document resulting from dereferencing the URI
|
|
for a language is a place where the publisher of the language
|
|
can put definitive information about the meaning of a
|
|
language.
|
|
</p>
|
|
<h3>
|
|
<a name="Language1" id="Language1">Language and document
|
|
subsets</a>
|
|
</h3>
|
|
<p>
|
|
As languages evolve, there can be many languages which are
|
|
similar. "Similarity" doesn't mean much, but something which
|
|
is well defined is when a document in one language A can be
|
|
treated precisely as though it had been in another language
|
|
B.
|
|
</p>
|
|
<h3>
|
|
<a name="Meaning1" id="Meaning1">Meaning in XML</a>
|
|
</h3>
|
|
<p>
|
|
In XML, a language is a "namespace", and the document about
|
|
the language is called a "schema". In XML, one document can
|
|
contain a mixture of languages, and so the schema, if written
in XML, may contain information about syntactic constraints
|
|
(in XML-schema language) and/or RDF properties (in rdf-schema
|
|
language), or any combination of the above. (<a href=
|
|
"#Language">note</a>)
|
|
</p>
|
|
<p>
|
|
XML puts no constraints on a language apart from syntactic
|
|
structure. There is not (without RDF and logic or some other
|
|
higher level) any overall framework into which new languages
|
|
can be introduced. So, the question of <strong>what an XML
|
|
document means depends</strong> first upon the fully
|
|
qualified name of the <strong>document element</strong>. No
|
|
semantics can be attached to any of its descendants in the
|
|
document tree except in as much as is defined by the
|
|
specification of that element type in that namespace. One
|
|
cannot talk about the "meaning" of a subtree of a document
|
|
without understanding the semantics of the language. In fact,
|
|
because languages only necessarily define meaning for
|
|
documents, the only way one can talk about the meaning of a
|
|
subset of a document is to define how those parts of the
|
|
document can be reassembled into a second whole document.
|
|
This is what must be done when a digital signature is applied
|
|
to a document.
|
|
</p>
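<p>
As a rough illustration of the point that interpretation starts
from the fully qualified name of the document element, here is a
minimal Python sketch. The namespace URIs, the element names and
the <code>INTERPRETERS</code> table are all hypothetical, invented
for this example; only the standard library's ElementTree module
is assumed.
</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, invented for this sketch.
INVOICE_NS = "http://example.org/ns/invoice"
LETTER_NS = "http://example.org/ns/letter"


def qualified_name(element):
    """Split ElementTree's '{namespace}local' tag into (namespace, local)."""
    tag = element.tag
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return None, tag


# Hypothetical table: the meaning of a whole document is looked up
# by the qualified name of its document element, and nothing else.
INTERPRETERS = {
    (INVOICE_NS, "invoice"): lambda root: "an invoice with %d line(s)" % len(root),
    (LETTER_NS, "letter"): lambda root: "a letter",
}


def interpret(root):
    """Return a description of the document, or report that none is defined."""
    key = qualified_name(root)
    handler = INTERPRETERS.get(key)
    if handler is None:
        return "no defined meaning for %r" % (key,)
    return handler(root)


# Build a tiny document whose document element is {INVOICE_NS}invoice.
root = ET.Element("{%s}invoice" % INVOICE_NS)
ET.SubElement(root, "{%s}line" % INVOICE_NS)
print(interpret(root))
```

<p>
In this sketch a child element is never interpreted on its own:
it only contributes to the meaning through the handler chosen for
the document element.
</p>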
|
|
<h3>
|
|
<a name="Meaning" id="Meaning">The Meaning of Digital
|
|
Signature</a>
|
|
</h3>
|
|
<p>
|
|
The language defines semantics. On the simple philosophy that
|
|
one place is enough, it is not the place of a digital
|
|
signature to define semantics. A digital signature on a
|
|
document may give a party reason to use the information
|
|
therein for purposes it would not have otherwise. The issuer
|
|
of a public key may also put constraints on what sort of
|
|
guarantees are made by signature with a given key. But the
|
|
signature itself must not affect the semantics - the meaning
|
|
- of a document. To allow it to would be to create an
|
|
inconsistency between the intent of the writer of the
|
|
original document and the meaning of the signed document. So,
|
|
signatures themselves have no meaning. The meaning has to be
|
|
ascribed to them by other documents. For example, I may say,
|
|
"If an organization is a member of W3C according to a
|
|
document signed with this key, then that organization is
|
|
indeed a member". That is a trust statement which gives the
|
|
key a connection into the world of meaning of documents.
|
|
</p>
|
|
<h3>
|
|
<a name="Style" id="Style">Style as meaning</a>
|
|
</h3>
|
|
<p>
|
|
(Although few people would think of presentation style of a
|
|
document as its "meaning", and many of us spend a lot of time
|
|
emphasising the difference between style and content and
|
|
semantics, in fact much of what applies to style applies to
|
|
semantics. Therefore the "meaning in terms of presentation"
|
|
is a good test case for the architecture of the system. (For
|
|
many documentation systems, the only semantics required is
|
|
"H2 means a big bold block on the left"!) Style sheets
|
|
provide an "interpretation" of a document by mapping it onto
|
|
another well-defined language of formatting properties. The
|
|
style sheet language gives a good definition (in English) of
|
|
what is needed. This is an interesting comparison, and I
|
|
mention it as a place where architectural consistency should
|
|
be maintained, but it isn't what I normally mean by
|
|
"meaning".)
|
|
</p>
|
|
<h3>
|
|
<a name="Logical" id="Logical">Logical meaning</a>
|
|
</h3>
|
|
<p>
|
|
When XML is used to encode logic, then a document is a
|
|
formula and the (see <a href="Logic.html">Logic on the
|
|
web</a>). Then, the way new predicates and constants interact
|
|
is defined by the logic. The way fundamental new parts of the
|
|
language (such as quantification) are added is part of a more
|
|
general question of how arbitrary languages interact.
|
|
Examples we have seen are the mixing of XHTML and XSL. What
|
|
is the result - XHTML or XSL? A document or a style sheet?
|
|
Both?
|
|
</p>
|
|
<h3>
|
|
<a name="Mixing" id="Mixing">Mixing Languages</a>
|
|
</h3>
|
|
<p>
|
|
XML puts no constraints on a language apart from syntactic
|
|
structure. There is not (without for example RDF and logic)
|
|
some overall framework into which new languages can be
|
|
introduced. This means that every language has to define how
|
|
it can be extended by mixing with other languages. Typically
|
|
it will indicate the element types which can be subclassed by
|
|
extensions and therefore incorporated into documents wherever
|
|
that element type is allowed.
|
|
</p>
|
|
<p>
|
|
One particular example of such a type is common to almost all
|
|
languages. This is the sentence, the fully qualified
|
|
assertion or statement, the formula with no free variables.
|
|
Almost all whole documents count as such, though an
|
|
interesting counterexample is a style sheet which represents a
function: it specifies the result document as a function of an
input document, and so itself cannot be said to be a
stand-alone statement. (If I sent you a message consisting
only of a style sheet with no cover letter, what would it
|
|
signify? What would it mean if I digitally signed it?)
|
|
</p>
|
|
<p>
|
|
With that exception, it clearly makes sense to allow any
|
|
language which has the concept of a sentence -- maybe any
|
|
language at all - to allow sentences from other languages to
|
|
be included anywhere where a sentence of its own could go.
|
|
<strong>This should be a generic feature of XML
|
|
schemas</strong>.
|
|
</p>
|
|
<p>
|
|
(It would be against the minimalist principle for XML
|
|
generically to define other common subclasses. Note that the
|
|
RDF spec does define properties and node types and the
|
|
concept of subclassing in RDF. HTML defines things like block
|
|
and inline elements, which can be subclassed in extensions;
|
|
SVG and SMIL probably define similar concepts. The
|
|
significance of this when looking at downloaded support code
|
|
would be that, for example, in a set of Java classes
|
|
implementing HTML, any subclass of "Inline element"
|
|
would export the same software API to allow it to be
|
|
justified and line wrapped in a text flow object. So there is
|
|
a natural correspondence between element type subclassing and
|
|
support class subclassing, but the two must remain distinct.
|
|
Language specifications must always define what a language
|
|
means without referring to implementations if they can
possibly avoid it.)
|
|
</p>
|
|
<p>
|
|
Note that without the assurances given by such information
|
|
you cannot just go around embedding one language in another.
|
|
Every language has to address the issue which the concept of
|
|
RDF transparency potentially solves for RDF. A surrounding
|
|
XML context must have the ability to quote, deny, negate or
|
|
whatever any element. In fact, nothing in XML says that the
|
|
meaning of a fragment is not affected by things anywhere else
|
|
in a document. Nothing suggests that the process of removing
|
|
sub-trees creates a valid document. (How does XML fragment
|
|
deal with this?)
|
|
</p>
|
|
<h3>
|
|
<a name="Grounded" id="Grounded">Grounded documents</a>
|
|
</h3>
|
|
<p>
|
|
We can say a document is "grounded" if its meaning is
completely defined because every term used is, directly or
indirectly, an explicit reference to its definition in a
document on the Web. Clearly
|
|
a definition of "grounding" depends on the set of documents
|
|
one considers acceptable definitions. "Grounded in W3C
|
|
Recommendations" would imply that the closure under [i.e. set
|
|
of all the things you can possibly end up with by repeated
|
|
applications of] the operation of looking up definitions
|
|
would be a subset of the set of W3C Recommendations.
|
|
</p>
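<p>
The closure described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions:
the <code>DEFINED_BY</code> table, the example URIs and the
function names are hypothetical, and real documents would have
to be dereferenced and parsed to discover which definitions they
refer to.
</p>

```python
# Hypothetical data: each document URI maps to the URIs of the documents
# that define the terms it uses. None of these URIs come from the essay.
DEFINED_BY = {
    "http://example.org/doc": ["http://example.org/spec/a"],
    "http://example.org/spec/a": ["http://example.org/spec/b"],
    "http://example.org/spec/b": [],
    "http://example.org/loose": ["http://example.net/unknown"],
}


def definition_closure(start, defined_by):
    """Return every document reachable from `start` by repeatedly
    looking up definitions (the start document itself is not included
    unless some definition leads back to it)."""
    seen = set()
    pending = list(defined_by.get(start, []))
    while pending:
        uri = pending.pop()
        if uri in seen:
            continue  # already visited; this also tolerates cyclic definitions
        seen.add(uri)
        pending.extend(defined_by.get(uri, []))
    return seen


def is_grounded(start, defined_by, acceptable):
    """True if the closure of definitions stays inside the acceptable set."""
    return definition_closure(start, defined_by).issubset(acceptable)


# The set of documents one is prepared to accept as definitions.
ACCEPTABLE = {"http://example.org/spec/a", "http://example.org/spec/b"}

print(is_grounded("http://example.org/doc", DEFINED_BY, ACCEPTABLE))
print(is_grounded("http://example.org/loose", DEFINED_BY, ACCEPTABLE))
```

<p>
Whether a document counts as grounded in this sketch depends
entirely on the choice of <code>ACCEPTABLE</code>, which mirrors
the dependence on the set of documents one considers acceptable
definitions.
</p>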
|
|
<p>
|
|
This is the basis for the entire web and internet
|
|
architecture stack today. (See also: <a href=
|
|
"Stack.html">Stack</a>). All commercial use on the web is
|
|
largely to be considered in this light, that the meaning of
|
|
each message sent across the Internet is well-defined by a
|
|
series of specifications.
|
|
</p>
|
|
<p>
|
|
(A sense of grounding also can be applied separately to
|
|
different sorts of "understanding". When "understanding"
|
|
means presentation to a human for human understanding, a
|
|
presentation-grounded document points to all information
|
|
such as schemata and style sheets which will enable it to be
|
|
presented.)
|
|
</p>
|
|
<h3>
|
|
Grounding as a myth: the Web of Meaning
|
|
</h3>
|
|
<p>
|
|
The concept of grounded documents is important for
|
|
predictable systems, but it is a bad model for the web -- or
|
|
for life -- in the long run. Words in a <em>natural</em>
|
|
language such as English are not grounded in a unique base
|
|
set<a href="#Grounding">*</a>. Every time you look one up in
|
|
the dictionary all you find are more words. The world is
|
|
web-like, and any attempt by the Web to constrain it to be
|
|
tree-like is bound to force a misrepresentation of reality.
|
|
This is the Wittgenstein view of meaning. Understanding this
|
|
view sometimes confuses people about the very systematic way
|
|
in which meaning in Internet protocols is defined by layers
|
|
and layers of specs.
|
|
</p>
|
|
<p>
|
|
In fact, the two views both apply, one nested inside the
|
|
other. Yes, meaning is use - but in the Internet protocols,
|
|
society has set up social constraints - laws and other
|
|
expectations - which constrain use to be according to the
|
|
specs. This is a social constraint which your computer is
|
|
under when you use the Internet, just as when you fill out a
|
|
tax form you don't have a choice as to how to interpret the
|
|
meaning of "Adjusted Gross Income on line 39 of a US IRS form
|
|
1040". There is a whole department of the government which
|
|
defines what it is and which socially owns the term. So while
|
|
the
|
|
</p>
|
|
<p>
|
|
What will change with the Semantic Web's development is that
|
|
its grounding in legacy systems will fade into history. Right
|
|
now, the meaning of "Invoice total value" is effectively
|
|
defined by the software which you plug your RDF document
|
|
into, and how it treats invoices. This is an important way to
|
|
bootstrap the semantic web with useful terms. That will
|
|
become less important as many different software products
share the same term. In the end, it is weblike form which
|
|
will characterize the semantic web. Everyone will be defining
|
|
things in terms of other things which they feel are useful
|
|
and stable enough. It will be impossible to insist that there
|
|
be a global ordering between more basic and less basic
|
|
specifications -- and to do so would stop the web scaling. No
|
|
one will agree on a directed <em>acyclic</em> graph
|
|
determining what terms are "more basic" than others. For any
|
|
set of definitions in one direction, there can always be some
|
|
reverse definitions which can be seen by others as just as
|
|
valid.
|
|
</p>
|
|
<p>
|
|
So, while the concept of documents grounded in a given base
|
|
set is important for interoperability, it must not be seen as
|
|
a goal to force the semantic web into an acyclic structure.
|
|
There will be no single Dewey Decimal system for the semantic
|
|
web. The concepts of well-defined stable specifications will
|
|
still be essential. So will respect for the definitions of
|
|
terms. The difference will be that anyone will choose their
own set of languages they consider "basic", and find ways of
|
|
defining other languages they come across in terms of those.
|
|
A rich web of conversions and translations will grow up to
support this. The web of trust will provide tools for
|
|
navigating within and selecting from this web in a safe way.
|
|
And of course, global standards will always make life much
|
|
easier where they can be made.
|
|
</p>
|
|
<h3>
|
|
FAQ: Surely meaning is only defined by use?
|
|
</h3>
|
|
<p>
|
|
<em>This is all very well</em>, runs a popular line,
|
|
<em>except that to talk about "meaning" at all is basically
|
|
bogus</em>. <em>The meaning of words, and therefore
|
|
languages, is defined by use - by how people actually respond
|
|
to them, by how they are processed. Surely the only way I can
|
|
guarantee that someone will interpret a document in a
|
|
particular way is to have some out-of-band agreement with
|
|
them first?</em>
|
|
</p>
|
|
<p>
|
|
Philosophically, it is indeed the case that you need some
|
|
out-of-band (not in the message itself) agreement. In real
|
|
life, though, there are in fact a lot of widely-held agreements.
|
|
In fact, the law is a set of agreements which you are deemed
|
|
to accept whether you formally agree or not. So when you are
|
|
sent a tax form, you can't argue that the language of the tax
|
|
form is not one you interpret in that way. They just stick
|
|
you in jail.
|
|
</p>
|
|
<p>
|
|
The web works like one big agreement. By connecting your
|
|
computer to it and getting email from POP and IMAP ports,
|
|
there is an understanding that what you get are MIME
|
|
messages, and the same thing when you pick up a web page using
|
|
HTTP. So by using the web you are entering a world where the
|
|
assumption can be made that messages are to be interpreted by
|
|
a set of specifications. The specifications are (currently)
generally written in English, and imperfect, but basically
debate about them is practically about details, not about the
|
|
philosophy as to whether they apply. So that is why one can
|
|
in practice talk about meaning.
|
|
</p>
|
|
<h3>
|
|
FAQ: Doesn't the meaning of a document depend on its context?
|
|
</h3>
|
|
<p>
|
|
Of course it does. If I enclose a photocopy of a document as
|
|
an attachment, it doesn't mean I am sending you that letter.
|
|
</p>
|
|
<p>
|
|
However, there are a lot of contexts for a document which
|
|
have the same implication for the meaning of that document.
|
|
Publication, by email to a public list, or HTTP, or FTP, or
|
|
printing on paper and nailing to a tree, in each case leaves
|
|
the meaning of a document defined in the same way. These
|
|
contexts, in which a document is published by a party, or a
|
|
message conveyed from one party to another, are so common
|
|
and basic that the meaning of the document in these contexts
|
|
is referred to simply as the meaning of the document (or
|
|
message).
|
|
</p>
|
|
<p>
|
|
The Web architecture separately enumerates the ways in which
these contexts actually work under the hood (publication using
HTTP, etc.) and the way documents are interpreted and dealt
with once published. That way, XML languages don't have to
|
|
keep referring to "meaning when received with a 200 code in
|
|
HTTP".
|
|
</p>
|
|
<hr />
|
|
<h2>
|
|
See also
|
|
</h2>
|
|
<ul>
|
|
<li>
|
|
<a href="Metadata.html#Self-descr">Self-describing
|
|
information in "Metadata"</a>
|
|
</li>
|
|
<li>
|
|
<a href="Evolution.html">Evolvability</a>
|
|
</li>
|
|
</ul>
|
|
<h2>
|
|
Footnotes
|
|
</h2>
|
|
<h3>
|
|
<a name="Name-less" id="Name-less">Name-less and Address-less
|
|
systems</a>
|
|
</h3>
|
|
<p>
|
|
(Technically, it is possible to create a network with
|
|
"source-based routing" in which everything whether server or
|
|
document is identified by an md5 checksum or other random
|
|
unique ID, and network nodes learn to send packets with full
|
|
routing instructions. This is a little like the old email
|
|
addresses which specified a routing path like
|
|
timbl@cernvax!mcvax!mitmail!whatever. The process of
|
|
hypertext link involves the client A contacting the server B
|
|
of the source document of the link and finding the path which
|
|
B had stored as a way to get from B to the server C of the
|
|
link's destination document. Then the client A can contact C
|
|
first through the route ABC but then from local information
|
|
and information from B and C can maybe derive a more
|
|
efficient route AC. Such a system has different scaling
|
|
properties as a subset of the information about the network
|
|
must reside in the network hosts rather than in the routers.
|
|
Its efficiency and scaling properties rely on features of the
|
|
topology of the web such as locality of reference.)
|
|
</p>
|
|
<h3>
|
|
<a name="Language" id="Language">Language identity crisis in
|
|
XML</a>
|
|
</h3>
|
|
<p>
|
|
(There is currently (1999/9) much debate in the XML world
|
|
over exactly what defines a language, the proposed answers
|
|
ranging through: the publisher of the namespace including any
|
|
information in the definitive schema; a separate note of a
|
|
schema; a schema plus a different namespace URI document plus
|
|
a version plus an HTML profile; and "nothing". If this debate
resolves itself such that the identity of a language is not
clearly defined, then the XML namespace mechanism may
|
|
prove an insufficiently firm foundation for the semantic web,
|
|
or any application of data on the web.)
|
|
</p>
|
|
<h3 id="Grounding">
|
|
Grounding of words in English
|
|
</h3>
|
|
<p>
|
|
(Distraction: Is there a set of English words in the OED
|
|
which, if understood, allow one to understand any definition
|
|
by sufficient recursive dereferencing?)
|
|
</p>
|
|
<h3>
|
|
References:
|
|
</h3>
|
|
<p>
|
|
DNS mess: Weaving the Web p126, etc.
|
|
</p>
|
|
<p>
|
|
<a href=
|
|
"http://www.ietf.org/mail-archive/ietf-announce/msg05299.html">
|
|
Carpenter, Brian, et al., "IAB Technical Comment on the
|
|
Unique DNS Root", IETF-announce, 1999/9/27.</a>
|
|
</p>
|
|
<h2>
|
|
Fodder
|
|
</h2>
|
|
<p>
|
|
[@@ Dan's quote (Ted N?) about all things being hopelessly?
intertwingled@@ :-) .. maybe some Buddhist quotation about
|
|
interconnectedness...]
|
|
</p>
|
|
<p>
|
|
"I'm very glad you asked me that, Mrs Rawlinson. The term
|
|
`holistic' refers to my conviction that what we are concerned
|
|
with here is the fundamental interconnectedness of all
|
|
things. I do not concern myself with such petty things as
|
|
fingerprint powder, telltale pieces of pocket fluff and inane
|
|
footprints. I see the solution to each problem as being
|
|
detectable in the pattern and web of the whole. The
|
|
connections between causes and effects are often much more
|
|
subtle and complex than we with our rough and ready
|
|
understanding of the physical world might naturally suppose,
|
|
Mrs Rawlinson. Let me give you an example. If you go to an
|
|
acupuncturist with toothache he sticks a needle instead into
|
|
your thigh. Do you know why he does that, Mrs Rawlinson? No,
|
|
neither do I, Mrs Rawlinson, but we intend to find out. A
|
|
pleasure talking to you, Mrs Rawlinson. Goodbye." -- Douglas
|
|
Adams, <em>Dirk Gently's Holistic Detective Agency</em>
|
|
</p>
|
|
<p>
|
|
<a href="http://www.xent.com/nov99/0596.html">quoted in
|
|
Fork</a>
|
|
</p>
|
|
<p>
|
|
@@ Statistics from OED
|
|
</p>
|
|
<p>
|
|
<a href=
|
|
"http://www.eastgate.com/ht99/slides/Welcome.htm">Mark
|
|
Bernstein, "Everything is intertwingled"</a>. Opening Keynote,
|
|
Hypertext '99, Darmstadt, Germany. February 23, 1999.
|
|
</p>
|
|
<hr />
|
|
<p>
|
|
<a href="Overview.html">Up to Design Issues</a>
|
|
</p>
|
|
<p>
|
|
<a href="../People/Berners-Lee">Tim BL</a>
|
|
</p>
|
|
</body>
|
|
</html>
|