You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
435 lines
20 KiB
435 lines
20 KiB
<html xmlns="http://www.w3.org/1999/xhtml">
|
|
<head>
|
|
<meta name="generator" content=
|
|
"HTML Tidy for Mac OS X (vers 31 October 2006 - Apple Inc. build 13), see www.w3.org" />
|
|
<meta http-equiv="Content-Type" content="text/html" />
|
|
<title>
|
|
The Name Myth -- Axioms of Web architecture
|
|
</title>
|
|
<link href="di.css" rel="stylesheet" type="text/css" />
|
|
</head>
|
|
<body bgcolor="#DDFFDD" text="#000000">
|
|
<address>
|
|
Tim Berners-Lee
|
|
<p>
|
|
Date: December 19, 1996
|
|
</p>
|
|
<p>
|
|
Status: personal view. Editing status: Italic text is
|
|
rough. Reques complete edit and possibly massaging, but
|
|
content is basically there.
|
|
</p>
|
|
</address>
|
|
<p>
|
|
<a href="Overview.html">Up to Design Issues</a>
|
|
</p>
|
|
<h3>
|
|
Axioms of Web Architecture: 2
|
|
</h3>
|
|
<hr />
|
|
<h1>
|
|
The Myth of Names and Addresses<br />
|
|
</h1>
|
|
<p>
|
|
The discussion above about the universality of URIs
|
|
(Universal Resource Identifiers) mentions briefly how URIs
|
|
are designed to encompass both things we think of as
|
|
addresses and those we think of as names. Much of the
|
|
discussion of this issue has been clouded by attempts to
|
|
distinguish names from addresses. The term "identifier" was
|
|
picked in an attempt to side-step this issue but
|
|
historically, that did not prevent a quagmire of circular
|
|
discussion which in some circles paralyzed any forward
|
|
progress. Therefore, in this section let me state the
|
|
philosophy which to my mind sets this problem in the right
|
|
light and should prevent further fruitless discussion.
|
|
<i><br /></i>
|
|
</p>
|
|
<p>
|
|
There is the commonly held belief that names and addresses
|
|
are different and distinct. We learn the importance of the
|
|
difference between identifiers in a programming language and
|
|
addresses within a computer memory. We learn the difference
|
|
in properties between fully qualified domain names on the
|
|
internet and internet protocol addresses. This can lead us
|
|
easily into imagining that there are two types of objects:
|
|
Names, which once attached to an object follow it for its
|
|
life wherever it should reside, and "addresses" which change
|
|
frequently whenever an object moves or is copied or
|
|
replicated from one "location" to another.
|
|
</p>
|
|
<p>
|
|
However, the only true location is a point in three
|
|
dimensional space, and within computer systems and especially
|
|
networked computer systems there is a very large number of
|
|
complex indirection between almost anything we would call a
|
|
name <i>or</i> an address and the actual physical location of
|
|
the memory cell which stores it. At one end of the spectrum a
|
|
computer memory address often is really an address within a
|
|
virtual memory space allocated to a particular project, and
|
|
when used is translated by the hardware into a physical
|
|
memory address, or for that matter into an address, into a
|
|
piece of memory which is being moved out into somewhere and
|
|
swapping the file on disk storage. Filenames are mapped
|
|
though mount tables and directory files into "inodes" which
|
|
are mapped onto track and sector locations. Internet protocol
|
|
addresses [IP Addresses] similarly are not bound absolutely
|
|
to a given computer: they can be re-allocated within the
|
|
constraints that because they are used for routing, there is
|
|
information connecting parts of the IP address with routing
|
|
information and so the computer corresponding to a given IP
|
|
address cannot be moved far in the routing structure. So, we
|
|
see that the constraint on how you can re-use an address is a
|
|
function of what information is in the address. When most
|
|
programs or people mention IP addresses, they simply quote
|
|
four decimal numbers, each between naught and 255 without
|
|
worrying about the internal structure. So, the information
|
|
within the IP address which prevents it being re-used in a
|
|
different area is to most people not explicit: It is, if you
|
|
like, hidden within there as the reason why IP addresses
|
|
can't be used. When we want to use something to refer to a
|
|
computer but still be able to move the computer or at least
|
|
the thing corresponding to that identification across from
|
|
one part of the internet to another, we use our domain name.
|
|
The domain name system, being completely independent of the
|
|
routing system, allows us to allocate any IP address at all
|
|
to a computer of a given domain name. Therefore, if we
|
|
believe the naming myth the domain name is a name and the IP
|
|
address is truly an address.
|
|
</p>
|
|
<h2>
|
|
<a name="Anecdotes" id="Anecdotes">Two anecdotes about names
|
|
and addresses</a>
|
|
</h2>
|
|
<p>
|
|
Two real-life anecdotes illustrate the dangers of making this
|
|
assumption. When there were only a few web servers and I kept
|
|
a registry of all those which I knew, I was contacted by a
|
|
group in Australia who were putting up a server with some
|
|
interesting botanical information. They sent me some details
|
|
of the server to be put into the list and they gave me the IP
|
|
address of the machine. My email reply explained that I
|
|
always prefer to refer to servers by their domain name rather
|
|
than their IP address and asked them for the domain name of
|
|
the server. They replied that the domain name they would use
|
|
would depend on the department within the university which
|
|
was responsible for maintaining the server but due to a
|
|
university re-organization, it was not at this point clear
|
|
which department that would be. However, they explained that
|
|
they could guarantee that the IP address of the server would
|
|
remain unchanged for a long time.
|
|
</p>
|
|
<p>
|
|
Several years later, the list of servers now abandoned as a
|
|
single list of all World Wide Web servers was among the
|
|
now-extensive web of information maintained on the server
|
|
known as info.cern.ch, the first World Wide Web server set up
|
|
at the start of the World Wide Web project. At this time the
|
|
responsibility for the coordination of World Wide Web
|
|
protocols was shifting from CERN to MIT/LCS and the embryonic
|
|
World Wide Web Consortium. For a while, CERN continued to
|
|
maintain the server, but later the master sources for that
|
|
information were maintained in America. Soon after this the
|
|
authorities at CERN requested that the name info.cern.ch
|
|
should no longer be used to refer to this information, as it
|
|
was no longer under control of CERN and they could no longer
|
|
assume responsibility for it. In fact, there was a policy
|
|
that names in the cern.ch domain should never be allowed to
|
|
refer to Internet addresses which were not physically on the
|
|
CERN site. Therefore all hypertext pointers into the
|
|
info.cern.ch space have had to be changed over the course of
|
|
time to point to the <code>w3.org</code> space.
|
|
</p>
|
|
<p>
|
|
These two examples show the "name" of objects having to be
|
|
changed even though the objects retained their essential
|
|
identity. The reason was in each case imbedded information in
|
|
the name: the domain name on the server contains authority
|
|
information about the maintainer of the computer whose
|
|
address corresponds to the domain name. If the authority for
|
|
an object changes, whether it "moves" on not, then there may
|
|
be a need to change its name under these circumstances. It
|
|
turns out that for almost any naming or addressing system in
|
|
which there is some information (other than random numbers or
|
|
dates of creation of the objects) built into the name that
|
|
the name might have to be changed when the facts
|
|
corresponding to that information change. Therefore it
|
|
becomes simply a matter of choice between naming or
|
|
addressing systems as to what sort of information you wish to
|
|
include implicitly or explicitly within your "name" or
|
|
"address".
|
|
</p>
|
|
<h2>
|
|
<a name="Why" id="Why">Why Names Change</a><br />
|
|
</h2>
|
|
<p>
|
|
<small>See also:</small>
|
|
</p>
|
|
<ul>
|
|
<li>
|
|
<small>In the Syyle Guide for Online Hypertext, <a href=
|
|
"../Provider/Style/Overview.html"><i>Cool URLs don't
|
|
change</i></a></small>
|
|
</li>
|
|
</ul>
|
|
<p>
|
|
It is worth looking at some of the reasons for names in
|
|
practical use to change or need to be changed. Some World
|
|
Wide Web servers have unwisely simply mapped the URL space
|
|
onto a Unix filename space, and the results of this,
|
|
especially in the early days, were URLs which might look like
|
|
this:
|
|
</p>
|
|
<p>
|
|
http://pegasus.cs.foo.edu/disk1/students/romeo/cool/latest/readthis.html
|
|
</p>
|
|
<p>
|
|
Looking at the segments of this name we can see as many
|
|
reasons for the name to need to be changed.
|
|
</p>
|
|
<p>
|
|
The "http:" will only be changed if the document is later
|
|
served up using a different protocol and, in fact, that is
|
|
probably one of the least likely pieces to change.
|
|
</p>
|
|
<p>
|
|
"Pegasus", the name of the computer, probably has a
|
|
significance within the university as a computer dedicated to
|
|
some particular tasks such as supporting personal student
|
|
activities, and maybe maintained by a particular department
|
|
or may even be a name from a project for which the computer
|
|
was originally put into use before it became shared with
|
|
general user space. So, "pegasus" will be changed whenever
|
|
the function of supporting this particular student's web
|
|
pages has to be shared with other functions.
|
|
</p>
|
|
<p>
|
|
"Cs" indicates the computer science department, so the
|
|
document is bound to the computer science department. It may
|
|
not be something which the computer science department has a
|
|
lot of interest in, and the student may well transfer his or
|
|
her interests to other departments in the future.
|
|
</p>
|
|
<p>
|
|
The name of the university "foo.edu" will probably last for a
|
|
good while, though whether the university wants to continue
|
|
to be associated with the document for more than two or three
|
|
years is questionable.
|
|
</p>
|
|
<p>
|
|
The next section of the path, "disk1", is clearly a mistake.
|
|
In fact, of course, disc1 is just a name which can be
|
|
attached to any physical disk, but by grouping together all
|
|
the students on a certain disk in this arbitrary way, one
|
|
makes a binding between all the documents which they create
|
|
which will have to be broken whenever the computer is
|
|
reorganized. In fact, the relocation tables which most
|
|
servers support allow much translation of names to take place
|
|
and make this sort of path quite unnecessary.
|
|
</p>
|
|
<p>
|
|
The next element identifies Romeo as a student which may
|
|
change even though he continues to study for the rest of his
|
|
life, and then the next path element "romeo" identifies the
|
|
author of the document. As in the case with CERN above, the
|
|
original author of a document may later not wish to keep
|
|
maintenance or responsibility for ongoing versions. For
|
|
example, the document may be submitted to an organization
|
|
which publishes it and formally takes over responsibility for
|
|
its upkeep; it may achieve a status of some kind as a
|
|
standard or an accepted thesis which causes its maintainers
|
|
to change. The original author may in fact deliberately
|
|
simply pass on authorship of the document to someone else. In
|
|
any of these cases the name would have to change, and all
|
|
references to that name would break.
|
|
</p>
|
|
<p>
|
|
The student himself has not been very wise with his choice of
|
|
path name. For many people, what is "cool" changes with time
|
|
and for most people what is "latest" changes with time.
|
|
</p>
|
|
<p>
|
|
Perhaps the unlikely to change piece of information in the
|
|
URL "readthis" as it contains no information at all, just
|
|
like the proverbial "click here". Effectively, it is a random
|
|
name assigned to the document and as such, is perhaps the
|
|
safest part of the path.
|
|
</p>
|
|
<p>
|
|
The last element of the path, "html" is not strictly
|
|
necessary with most servers, as at least some servers will,
|
|
given a URL of "readthis" , serve up the data
|
|
from a file which is called "readthis.html". Here the student
|
|
is making it difficult for himself later to change the format
|
|
or formats in which the file is available, without at least
|
|
some confusion. Suppose, for example, that he later decides
|
|
that the information is worth providing in audio format for
|
|
blind readers. The CERN server can easily be configured so
|
|
that clients specifically requesting audio formats in
|
|
preference to HTML can be served as preferentially whereas
|
|
more normal clients will get the HTML. So, here again is a
|
|
part of the path which may be later regretted.
|
|
</p>
|
|
<p>
|
|
You can play this game with almost any name and address in
|
|
any system, and it is interesting to ask yourself in each
|
|
case: to what extent do I call this a "name" and to what
|
|
extend do I call it an "address"? So, in conclusion we see
|
|
that any information explicitly owned or implicitly included
|
|
in a name is a threat to its longevity. We see
|
|
that the difference between a "name" and an "address"
|
|
is not so fundamental. That is why
|
|
</p>
|
|
<table border="1" cellpadding="2">
|
|
<tbody>
|
|
<tr>
|
|
<td>
|
|
When a new URI scheme is defined, the specification
|
|
defining ity should describe the name-like and
|
|
address-like properties of URIs in the new scheme, so
|
|
that that those using them can know what to be able to
|
|
expect.
|
|
</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<h2>
|
|
<a name="What" id="What">What's in a name?</a><br />
|
|
</h2>
|
|
<p>
|
|
Why is information included then? Generally, the information
|
|
is included because in order to discover anything about the
|
|
name, one has to "dereference" the name. Typically this uses
|
|
some official or unofficial set of indexes distributed or
|
|
otherwise to look up the name. Many names are hierarchical in
|
|
the authority which allocates them. DNS names are a good
|
|
example. Road names within towns are another good example.
|
|
Therefore to find out where the new "North Street" is located
|
|
in small town one goes to the town for the definitive answer.
|
|
For information as to where the server "pegasus.cs.foo.edu"
|
|
is, one must send a message directly or indirectly to a
|
|
server controlled by the Foo University.
|
|
</p>
|
|
<p>
|
|
Is it possible to omit all such information from a name?
|
|
Certainly. Message identifiers in mail have only the need to
|
|
be unique. So, whereas hierarchical names and time stamps may
|
|
be used to help make such identifiers unique, you cannot
|
|
dereference the names at all. Perhaps we should call these
|
|
"identifiers" rather than "names". Within a certain context,
|
|
it is extremely useful to be able to refer to a mail message
|
|
by its mail identifier. We say that these identifiers support
|
|
the notion of equality: even though they cannot be
|
|
dereferenced, you can test two mail messages to find out
|
|
whether they are in fact the same simply by testing their
|
|
identifiers. You can also within a finite set of mail
|
|
messages look up a message of a given identifier. You just
|
|
can't do this on a global scale. So this then is the essence
|
|
of the naming problem:<br />
|
|
</p>
|
|
<table border="1" cellpadding="2">
|
|
<tbody>
|
|
<tr>
|
|
<td>
|
|
The naming problem: if you put information in a name,
|
|
it decreases its longevity; if you don't you can't
|
|
dereference it to a resource.
|
|
</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<h3>
|
|
<a name="social" id="social">Naming: A social and contracual
|
|
Issue</a>
|
|
</h3>
|
|
<p>
|
|
Many, many solutions to the naming problem have been
|
|
attempted and successfully deployed in different
|
|
circumstances. At one end of the scale, it would be in fact
|
|
possible using a huge network of hash tables around the
|
|
world, to keep a hash index of all randomly generated unique
|
|
names. The problem with this idea is that there would have to
|
|
be one single funding model and one homogeneous quality of
|
|
service for all names. There would be no way to pay more for
|
|
a more persistent name.
|
|
</p>
|
|
<p>
|
|
At the other end of the scale, hierarchical systems such as
|
|
the domain name system, and the x500 name system, have been
|
|
implemented. Suppose one wants to use a name which can be
|
|
dereferenced and therefore must put some information in it.
|
|
That information will lead us to some authority or some root
|
|
to dereferencing the name. How can we maintain the lifetime
|
|
of that name as something which can be dereferenced? The only
|
|
way is that we have a contract with all the agencies which
|
|
are involved in supporting the systems which dereference that
|
|
name that they should continue their operation giving a
|
|
certain quality of service for a certain period of time.
|
|
</p>
|
|
<p>
|
|
Suppose the Foo Alumni Association ran a URL service in which
|
|
a special name such as
|
|
"http://alumni.foo.edu/1998/romeo/202-aab" would be available
|
|
to any graduating paying their dues, and maintained
|
|
indefinitely (perpetual care) on receipt of a suitable
|
|
endowment.
|
|
</p>
|
|
<p>
|
|
Of course, as organizations disolve and mutate, there is
|
|
nothing to stop one organization from taking over the support
|
|
of the archives another. Forthis purpose, it
|
|
would be very useful to have a syntax for putting a date into
|
|
a domain name. This would allow a system to find an
|
|
archive server. Imaging that, failing to find
|
|
"info.cern.ch", one could search back and find an entry
|
|
"info.cern.ch.1994" which pointed to www.w3.org as a current
|
|
server holding archive information for info.cern.ch as it was
|
|
in 1994, with, of course, pointers to newer versions of
|
|
the documents.
|
|
</p>
|
|
<h3>
|
|
<a name="QoS" id="QoS">Quality of Service</a>
|
|
</h3>
|
|
<p>
|
|
Looking at an "http:" URL, while some look more sensible than
|
|
others, it is not immediately evident whether great pains are
|
|
being taken to make the name very persistent. We have
|
|
just discussed such a range of reasons why names can change,
|
|
and clearly the social and contractual arrangements can be
|
|
quite involved, so it is clearly difficult to simply define a
|
|
quality of service for naming. However, defining some
|
|
well known quality of service levels would be a very useful
|
|
task. This is the sort of task ideally suited to a group of
|
|
trechnologies, librraians or archivists.
|
|
</p>
|
|
<p>
|
|
In any event, for identifiers in the http space and
|
|
many others, it would be useful to be able to assert what the
|
|
quality of service is. This is information about a URI and a
|
|
resource. Like the <a href=
|
|
"Generic.html#Dimensions">information about generic URIs</a>,
|
|
it is about the sort of identity between the URI and the
|
|
resource.
|
|
</p>
|
|
<table border="1" cellpadding="2">
|
|
<tbody>
|
|
<tr>
|
|
<td>
|
|
Metadata should be used to express the quality of
|
|
service for the binding between a URI and a resource.
|
|
</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<h2>
|
|
<i><br /></i>
|
|
</h2>
|
|
<hr />
|
|
<p>
|
|
<a href="Metadata.html">Next: Metadata architecture</a>
|
|
</p>
|
|
<p>
|
|
<a href="Overview.html">Up to Design Issues</a>
|
|
</p>
|
|
</body>
|
|
</html>
|