

								<html xmlns="http://www.w3.org/1999/xhtml">

								  <head>

								    <meta name="generator" content=

								    "HTML Tidy for Mac OS X (vers 31 October 2006 - Apple Inc. build 13), see www.w3.org" />

								    <meta http-equiv="Content-Type" content="text/html" />

								    <title>

								      The Name Myth -- Axioms of Web architecture

								    </title>

								    <link href="di.css" rel="stylesheet" type="text/css" />

								  </head>

								  <body bgcolor="#DDFFDD" text="#000000">

								    <address>

								      Tim Berners-Lee

								      <p>

								        Date: December 19, 1996

								      </p>

								      <p>

								        Status: personal view. Editing status: Italic text is

								        rough. Requires complete edit and possibly massaging, but

								        content is basically there.

								      </p>

								    </address>

								    <p>

								      <a href="Overview.html">Up to Design Issues</a>

								    </p>

								    <h3>

								      Axioms of Web Architecture: 2

								    </h3>

								    <hr />

								    <h1>

								      The Myth of Names and Addresses<br />

								    </h1>

								    <p>

								      The discussion above about the universality of URIs

								      (Universal Resource Identifiers) mentions briefly how URIs

								      are designed to encompass both things we think of as

								      addresses and those we think of as names. Much of the

								      discussion of this issue has been clouded by attempts to

								      distinguish names from addresses. The term "identifier" was

								      picked in an attempt to side-step this issue but

								      historically, that did not prevent a quagmire of circular

								      discussion which in some circles paralyzed any forward

								      progress. Therefore, in this section let me state the

								      philosophy which to my mind sets this problem in the right

								      light and should prevent further fruitless discussion.


								    </p>

								    <p>

								      There is the commonly held belief that names and addresses

								      are different and distinct. We learn the importance of the

								      difference between identifiers in a programming language and

								      addresses within a computer memory. We learn the difference

								      in properties between fully qualified domain names on the

								      internet and internet protocol addresses. This can lead us

								      easily into imagining that there are two types of objects:

								      Names, which once attached to an object follow it for its

								      life wherever it should reside, and "addresses" which change

								      frequently whenever an object moves or is copied or

								      replicated from one "location" to another.

								    </p>

								    <p>

								      However, the only true location is a point in
								      three-dimensional space, and within computer systems, and

								      especially networked computer systems, there are a great many

								      complex indirections between almost anything we would call a

								      name <i>or</i> an address and the actual physical location of

								      the memory cell which stores it. At one end of the spectrum, a

								      computer memory address is often really an address within a

								      virtual memory space allocated to a particular process, and

								      when used is translated by the hardware into a physical

								      memory address or, for that matter, into an address in a

								      piece of memory which has been moved out to a swap file on

								      disk storage. Filenames are mapped

								      through mount tables and directory files into "inodes" which

								      are mapped onto track and sector locations. Internet protocol

								      addresses [IP Addresses] similarly are not bound absolutely

								      to a given computer: they can be re-allocated, within the

								      constraint that, because they are used for routing, parts of

								      the IP address are tied to routing information, and so the

								      computer corresponding to a given IP address cannot be moved

								      far in the routing structure. So we see that the constraint

								      on how you can re-use an address is a function of what

								      information is in the address. When most programs or people

								      mention IP addresses, they simply quote four decimal numbers,

								      each between 0 and 255, without worrying about the internal

								      structure. So the information within the IP address which

								      prevents it being re-used in a different area is, to most

								      people, not explicit: it is, if you like, hidden within there

								      as the reason why IP addresses can't be re-used. When we want

								      to use something to refer to a computer but still be able to

								      move the computer, or at least the thing corresponding to

								      that identification, from one part of the internet to

								      another, we use a domain name.

								      The domain name system, being completely independent of the

								      routing system, allows us to allocate any IP address at all

								      to a computer of a given domain name. Therefore, if we

								      believe the naming myth, the domain name is a name and the IP

								      address is truly an address.

								    </p>
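								    <p>
								      As a minimal sketch of the point about hidden structure, the
								      following Python fragment uses the standard
								      <code>ipaddress</code> module to separate the routing part of
								      an address from the host part. The address and the two
								      prefixes are assumptions chosen from documentation ranges,
								      not values taken from any real network.
								    </p>
<pre>
```python
import ipaddress

# Hypothetical address and prefix, taken from ranges reserved for documentation.
address = ipaddress.ip_address("192.0.2.57")
network = ipaddress.ip_network("192.0.2.0/24")

# Written out, the address is just four decimal numbers between 0 and 255.
octets = [int(part) for part in str(address).split(".")]

# The leading bits carry routing information; only the rest identifies the host.
same_network = address in network
host_part = int(address) - int(network.network_address)

# Another part of the routing structure uses another prefix, so a machine
# that moves there cannot keep this address.
other_network = ipaddress.ip_network("198.51.100.0/24")
can_keep_address = address in other_network

print(octets, same_network, host_part, can_keep_address)
```
</pre>
								    <p>
								      Nothing in the four quoted numbers marks where the routing
								      information ends; that knowledge lives in the prefix, which
								      is why the constraint on re-use is not explicit to most
								      people.
								    </p>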

								    <h2>

								      <a name="Anecdotes" id="Anecdotes">Two anecdotes about names

								      and addresses</a>

								    </h2>

								    <p>

								      Two real-life anecdotes illustrate the dangers of making this

								      assumption. When there were only a few web servers and I kept

								      a registry of all those which I knew, I was contacted by a

								      group in Australia who were putting up a server with some

								      interesting botanical information. They sent me some details

								      of the server to be put into the list and they gave me the IP

								      address of the machine. My email reply explained that I

								      always prefer to refer to servers by their domain name rather

								      than their IP address and asked them for the domain name of

								      the server. They replied that the domain name they would use

								      would depend on the department within the university which

								      was responsible for maintaining the server but due to a

								      university re-organization, it was not at this point clear

								      which department that would be. However, they explained that

								      they could guarantee that the IP address of the server would

								      remain unchanged for a long time.

								    </p>

								    <p>

								      Several years later, the list of servers, by now abandoned as a

								      single list of all World Wide Web servers, was among the

								      now-extensive web of information maintained on the server

								      known as info.cern.ch, the first World Wide Web server set up

								      at the start of the World Wide Web project. At this time the

								      responsibility for the coordination of World Wide Web

								      protocols was shifting from CERN to MIT/LCS and the embryonic

								      World Wide Web Consortium. For a while, CERN continued to

								      maintain the server, but later the master sources for that

								      information were maintained in America. Soon after this the

								      authorities at CERN requested that the name info.cern.ch

								      should no longer be used to refer to this information, as it

								      was no longer under control of CERN and they could no longer

								      assume responsibility for it. In fact, there was a policy

								      that names in the cern.ch domain should never be allowed to

								      refer to Internet addresses which were not physically on the

								      CERN site. Therefore all hypertext pointers into the

								      info.cern.ch space have had to be changed over the course of

								      time to point to the <code>w3.org</code> space.

								    </p>

								    <p>

								      These two examples show the "name" of objects having to be

								      changed even though the objects retained their essential

								      identity. The reason was in each case embedded information in

								      the name: the domain name of the server contains authority

								      information about the maintainer of the computer whose

								      address corresponds to the domain name. If the authority for

								      an object changes, whether it "moves" or not, then there may

								      be a need to change its name. It

								      turns out that in almost any naming or addressing system in

								      which there is some information (other than random numbers or

								      dates of creation of the objects) built into the name,

								      the name might have to be changed when the facts

								      corresponding to that information change. Therefore it

								      becomes simply a matter of choice between naming or

								      addressing systems as to what sort of information you wish to

								      include implicitly or explicitly within your "name" or

								      "address".

								    </p>

								    <h2>

								      <a name="Why" id="Why">Why Names Change</a><br />

								    </h2>

								    <p>

								      <small>See also:</small>

								    </p>

								    <ul>

								      <li>

								        <small>In the Style Guide for Online Hypertext, <a href=

								        "../Provider/Style/Overview.html"><i>Cool URLs don't

								        change</i></a></small>

								      </li>

								    </ul>

								    <p>

								      It is worth looking at some of the reasons for names in

								      practical use to change or need to be changed. Some World

								      Wide Web servers have unwisely simply mapped the URL space

								      onto a Unix filename space, and the results of this,

								      especially in the early days, were URLs which might look like

								      this:

								    </p>

								    <p>

								      http://pegasus.cs.foo.edu/disk1/students/romeo/cool/latest/readthis.html

								    </p>

								    <p>

								      Looking at the segments of this name, we can see as many

								      reasons for the name to need to be changed as there are
								      segments.

								    </p>
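								    <p>
								      To make the segments concrete, here is a small sketch in
								      Python that splits the example URL with the standard
								      <code>urllib.parse</code> module. The variable names are my
								      own, and the split into host labels and path elements is just
								      one reasonable way to cut the name up.
								    </p>
<pre>
```python
from urllib.parse import urlsplit

url = "http://pegasus.cs.foo.edu/disk1/students/romeo/cool/latest/readthis.html"
parts = urlsplit(url)

# Host labels: machine name, department, then the university's domain.
host_labels = parts.hostname.split(".")

# Path elements, dropping the empty string before the leading slash.
path_segments = [segment for segment in parts.path.split("/") if segment]

# Separate the document name from its format suffix.
document, _, extension = path_segments[-1].rpartition(".")

# Every entry here is a piece of information that may later stop being true.
segments = [parts.scheme] + host_labels + path_segments[:-1] + [document, extension]
for segment in segments:
    print(segment)
```
</pre>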

								    <p>

								      The "http:" will only be changed if the document is later

								      served up using a different protocol and, in fact, that is

								      probably one of the least likely pieces to change.

								    </p>

								    <p>

								      "Pegasus", the name of the computer, probably has a

								      significance within the university as a computer dedicated to

								      some particular tasks such as supporting personal student

								      activities, and may be maintained by a particular department

								      or may even be a name from a project for which the computer

								      was originally put into use before it became shared with

								      general user space. So, "pegasus" will be changed whenever

								      the function of supporting this particular student's web

								      pages has to be shared with other functions.

								    </p>

								    <p>

								      "Cs" indicates the computer science department, so the

								      document is bound to the computer science department. It may

								      not be something which the computer science department has a

								      lot of interest in, and the student may well transfer his or

								      her interests to other departments in the future.

								    </p>

								    <p>

								      The name of the university "foo.edu" will probably last for a

								      good while, though whether the university wants to continue

								      to be associated with the document for more than two or three

								      years is questionable.

								    </p>

								    <p>

								      The next section of the path, "disk1", is clearly a mistake.

								      In fact, of course, disk1 is just a name which can be

								      attached to any physical disk, but by grouping together all

								      the students on a certain disk in this arbitrary way, one

								      makes a binding between all the documents which they create

								      which will have to be broken whenever the computer is

								      reorganized. In fact, the relocation tables which most

								      servers support allow much translation of names to take place

								      and make this sort of path quite unnecessary.

								    </p>

								    <p>

								      The next element identifies Romeo as a student, which may

								      change even though he continues to study for the rest of his

								      life, and then the next path element, "romeo", identifies the

								      author of the document. As in the case with CERN above, the

								      original author of a document may later not wish to keep

								      maintenance or responsibility for ongoing versions. For

								      example, the document may be submitted to an organization

								      which publishes it and formally takes over responsibility for

								      its upkeep; it may achieve a status of some kind as a

								      standard or an accepted thesis which causes its maintainers

								      to change. The original author may in fact deliberately

								      simply pass on authorship of the document to someone else. In

								      any of these cases the name would have to change, and all

								      references to that name would break.

								    </p>

								    <p>

								      The student himself has not been very wise with his choice of

								      path name. For many people, what is "cool" changes with time

								      and for most people what is "latest" changes with time.

								    </p>

								    <p>

								      Perhaps the piece of the URL least likely to change is

								      "readthis", as it contains no information at all, just

								      like the proverbial "click here". Effectively, it is a random

								      name assigned to the document and as such, is perhaps the

								      safest part of the path.

								    </p>

								    <p>

								      The last element of the path, "html", is not strictly

								      necessary, as at least some servers will,

								      given a URL ending in "readthis", serve up the data

								      from a file which is called "readthis.html". Here the student

								      is making it difficult for himself later to change the format

								      or formats in which the file is available, without at least

								      some confusion. Suppose, for example, that he later decides

								      that the information is worth providing in audio format for

								      blind readers. The CERN server can easily be configured so

								      that clients specifically requesting audio formats in

								      preference to HTML are served audio preferentially, whereas

								      more normal clients will get the HTML. So, here again is a

								      part of the path which may be later regretted.

								    </p>
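								    <p>
								      The idea of serving one name in several formats can be
								      sketched as follows. This is not the CERN server's
								      configuration: the <code>VARIANTS</code> table, the file
								      names and both functions are hypothetical, and the parsing
								      handles only a simplified <code>Accept</code> header without
								      wildcards.
								    </p>
<pre>
```python
# Hypothetical files stored behind the extension-free name "readthis".
VARIANTS = {
    "text/html": "readthis.html",
    "audio/basic": "readthis.au",
}


def parse_accept(header):
    """Return (media_type, quality) pairs from a simplified Accept header."""
    preferences = []
    for item in header.split(","):
        fields = [field.strip() for field in item.split(";")]
        quality = 1.0  # quality defaults to 1 when no q parameter is given
        for field in fields[1:]:
            if field.startswith("q="):
                quality = float(field[2:])
        preferences.append((fields[0], quality))
    return preferences


def choose_variant(accept_header):
    """Pick the stored file whose media type the client prefers most."""
    candidates = [
        (quality, VARIANTS[media_type])
        for media_type, quality in parse_accept(accept_header)
        if media_type in VARIANTS and quality != 0
    ]
    if not candidates:
        return None
    # On a tie, max() keeps the type listed first in the header.
    return max(candidates, key=lambda pair: pair[0])[1]


print(choose_variant("audio/basic, text/html;q=0.5"))
print(choose_variant("text/html"))
```
</pre>
								    <p>
								      Because the published name carries no suffix, adding or
								      withdrawing a format changes only the table, not the name
								      that others have linked to.
								    </p>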

								    <p>

								      You can play this game with almost any name and address in

								      any system, and it is interesting to ask yourself in each

								      case: to what extent do I call this a "name" and to what

								      extent do I call it an "address"? So, in conclusion, we see

								      that any information explicitly or implicitly included

								      in a name is a threat to its longevity. We see

								      that the difference between a "name" and an "address"

								      is not so fundamental. That is why:

								    </p>

								    <table border="1" cellpadding="2">

								      <tbody>

								        <tr>

								          <td>

								            When a new URI scheme is defined, the specification

								            defining it should describe the name-like and

								            address-like properties of URIs in the new scheme, so

								            that those using them can know what to

								            expect.

								          </td>

								        </tr>

								      </tbody>

								    </table>

								    <h2>

								      <a name="What" id="What">What's in a name?</a><br />

								    </h2>

								    <p>

								      Why is information included then? Generally, the information

								      is included because in order to discover anything about the

								      name, one has to "dereference" the name. Typically this uses

								      some official or unofficial set of indexes distributed or

								      otherwise to look up the name. Many names are hierarchical in

								      the authority which allocates them. DNS names are a good

								      example. Road names within towns are another good example.

								      Therefore, to find out where the new "North Street" is located

								      in a small town, one goes to the town for the definitive answer.

								      For information as to where the server "pegasus.cs.foo.edu"

								      is, one must send a message directly or indirectly to a

								      server controlled by the Foo University.

								    </p>

								    <p>

								      Is it possible to omit all such information from a name?

								      Certainly. Message identifiers in mail need only

								      be unique. So, whereas hierarchical names and time stamps may

								      be used to help make such identifiers unique, you cannot

								      dereference the names at all. Perhaps we should call these

								      "identifiers" rather than "names". Within a certain context,

								      it is extremely useful to be able to refer to a mail message

								      by its mail identifier. We say that these identifiers support

								      the notion of equality: even though they cannot be

								      dereferenced, you can test two mail messages to find out

								      whether they are in fact the same simply by testing their

								      identifiers. You can also within a finite set of mail

								      messages look up a message of a given identifier. You just

								      can't do this on a global scale. So this then is the essence

								      of the naming problem:<br />

								    </p>

								    <table border="1" cellpadding="2">

								      <tbody>

								        <tr>

								          <td>

								            The naming problem: if you put information in a name,

								            it decreases its longevity; if you don't you can't

								            dereference it to a resource.

								          </td>

								        </tr>

								      </tbody>

								    </table>
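								    <p>
								      The two things a mail message identifier does support,
								      equality and lookup within a finite set, can be sketched with
								      Python's standard <code>email</code> package. The domain
								      <code>example.org</code> and the tiny "mailbox" dictionary
								      are assumptions made only for illustration.
								    </p>
<pre>
```python
from email import message_from_string
from email.utils import make_msgid

# Hypothetical domain; the identifier only needs to be unique.
message_id = make_msgid(domain="example.org")

raw = "Message-ID: " + message_id + "\nSubject: hello\n\nbody\n"
first_copy = message_from_string(raw)
second_copy = message_from_string(raw)
other = message_from_string(
    "Message-ID: " + make_msgid(domain="example.org") + "\n\nbody\n"
)

# Equality: two copies are the same message if their identifiers match.
same_message = first_copy["Message-ID"] == second_copy["Message-ID"]
different_message = first_copy["Message-ID"] != other["Message-ID"]

# Lookup works within a finite set of messages, such as one mailbox.
mailbox = {message["Message-ID"]: message for message in (first_copy, other)}
found = mailbox.get(message_id)

# An identifier from outside the set cannot be dereferenced at all.
missing = mailbox.get(make_msgid(domain="example.org"))

print(same_message, different_message, found is first_copy, missing)
```
</pre>
								    <p>
								      The domain inside the identifier helps make it unique, but
								      nothing here uses it to locate the message.
								    </p>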

								    <h3>

								      <a name="social" id="social">Naming: A social and contractual

								      Issue</a>

								    </h3>

								    <p>

								      Many, many solutions to the naming problem have been

								      attempted and successfully deployed in different

								      circumstances. At one end of the scale, it would in fact be

								      possible, using a huge network of hash tables around the

								      world, to keep a hash index of all randomly generated unique

								      names. The problem with this idea is that there would have to

								      be one single funding model and one homogeneous quality of

								      service for all names. There would be no way to pay more for

								      a more persistent name.

								    </p>

								    <p>

								      At the other end of the scale, hierarchical systems such as

								      the domain name system and the X.500 name system have been

								      implemented. Suppose one wants to use a name which can be

								      dereferenced and therefore must put some information in it.

								      That information will lead us to some authority or some root

								      for dereferencing the name. How can we maintain the lifetime

								      of that name as something which can be dereferenced? The only

								      way is that we have a contract with all the agencies which

								      are involved in supporting the systems which dereference that

								      name that they should continue their operation giving a

								      certain quality of service for a certain period of time.

								    </p>

								    <p>

								      Suppose the Foo Alumni Association ran a URL service in which

								      a special name such as

								      "http://alumni.foo.edu/1998/romeo/202-aab" would be available

								      to any graduate paying their dues, and maintained

								      indefinitely (perpetual care) on receipt of a suitable

								      endowment.

								    </p>

								    <p>

								      Of course, as organizations dissolve and mutate, there is

								      nothing to stop one organization from taking over the support

								      of the archives of another. For this purpose, it

								      would be very useful to have a syntax for putting a date into

								      a domain name. This would allow a system to find an

								      archive server. Imagine that, failing to find

								      "info.cern.ch", one could search back and find an entry

								      "info.cern.ch.1994" which pointed to www.w3.org as a current

								      server holding archive information for info.cern.ch as it was

								      in 1994, with, of course, pointers to newer versions of

								      the documents.

								    </p>
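								    <p>
								      A rough sketch of such a search, in Python, might look like
								      this. No dated syntax of this kind exists in the real DNS;
								      the <code>REGISTRY</code> dictionary stands in for a name
								      service, and <code>find_server</code> and its year range are
								      hypothetical.
								    </p>
<pre>
```python
# Hypothetical registry standing in for a name service.
REGISTRY = {
    "www.w3.org": "www.w3.org",
    "info.cern.ch.1994": "www.w3.org",
}


def find_server(name, latest_year, earliest_year):
    """Return (year, host): year is None when the name is still live."""
    if name in REGISTRY:
        return None, REGISTRY[name]
    # The plain name failed, so search back through dated entries, newest first.
    for year in range(latest_year, earliest_year - 1, -1):
        dated_name = name + "." + str(year)
        if dated_name in REGISTRY:
            return year, REGISTRY[dated_name]
    return None


print(find_server("info.cern.ch", 1996, 1990))
print(find_server("www.w3.org", 1996, 1990))
```
</pre>
								    <p>
								      The year returned tells the client which state of the
								      archive it has found, so that it can then follow pointers to
								      newer versions.
								    </p>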

								    <h3>

								      <a name="QoS" id="QoS">Quality of Service</a>

								    </h3>

								    <p>

								      Looking at an "http:" URL, while some look more sensible than

								      others, it is not immediately evident whether great pains are

								      being taken to make the name very persistent. We have

								      just discussed such a range of reasons why names can change,

								      and clearly the social and contractual arrangements can be

								      quite involved, so it is difficult to simply define a

								      quality of service for naming. However, defining some

								      well known quality of service levels would be a very useful

								      task. This is the sort of task ideally suited to a group of

								      technologists, librarians or archivists.

								    </p>

								    <p>

								      In any event, for identifiers in the http space and

								      many others, it would be useful to be able to assert what the

								      quality of service is. This is information about a URI and a

								      resource. Like the <a href=

								      "Generic.html#Dimensions">information about generic URIs</a>,

								      it is about the sort of identity between the URI and the

								      resource.

								    </p>

								    <table border="1" cellpadding="2">

								      <tbody>

								        <tr>

								          <td>

								            Metadata should be used to express the quality of

								            service for the binding between a URI and a resource.

								          </td>

								        </tr>

								      </tbody>

								    </table>


								    <hr />

								    <p>

								      <a href="Metadata.html">Next: Metadata architecture</a>

								    </p>

								    <p>

								      <a href="Overview.html">Up to Design Issues</a>

								    </p>

								  </body>

								</html>