server_playground/doc/www.w3.org/DesignIssues/HTTP-URI.html


								<html xmlns="http://www.w3.org/1999/xhtml">

								  <head>

								    <meta name="generator" content=

								    "HTML Tidy for Mac OS X (vers 31 October 2006 - Apple Inc. build 13), see www.w3.org" />

								    <title>

								      What do HTTP URIs Identify? - Design Issues

								    </title>

								    <link rel="Stylesheet" href="di.css" type="text/css" />

								    <meta http-equiv="Content-Type" content=

								    "text/html; charset=us-ascii" />

								  </head>

								  <body bgcolor="#DDFFDD" text="#000000" lang="en" xml:lang="en">

								    <address>

								      Tim Berners-Lee<br />

								      Date: 2002-07-27, last change: $Date: 2007/01/15 20:05:15

								      $<br />

								      Status: personal view only. Editing status: first draft. This

								      was a result of my being in a minority with this opinion on

								      the Technical Architecture Group, and yet finding it the only

								      one I could accept. This is related to TAG issue

								      HTTPRange-14.

								    </address>

								    <p>

								      <a href="./">Up to Design Issues</a>

								    </p>

								    <p>

								      <strong>Note: (2006). This architectural question has now

								      been <a href=

								      "http://lists.w3.org/Archives/Public/www-tag/2005Jun/0039.html">

								      decided</a> by the W3C TAG, in a compromise which I think

								      works quite well, and is described in a <a href=

								      "HTTP-URI2">later short note</a> and a TAG finding.</strong>

								    </p>

								    <hr />

								    <h1>

								      What do HTTP URIs Identify?

								    </h1>

								    <h3>

								      Background Note

								    </h3>

								    <p>

								      This question has been addressed only vaguely in the

								      specifications. However, the lack of very concise logical

								      definition of such things had not been a problem, until the

								      formal systems started to use them. There were no formal

								      systems addressing this sort of issue (as far as I know,

								      except for Dan Connolly's Larch work [@@]), until the

								      <a href="/2001/sw">Semantic Web</a> introduced languages such

								      as RDF which have well-defined logical properties and are

								      used to describe (among other things) web operations.

								    </p>

								    <p>

								      The efforts of the <a href="/2001/tag">Technical Architecture

								      Group</a> to create an architecture document with common

								      terms highlighted this problem. (It demonstrates the

								      ambiguity of natural language that no significant problem had

								      been noticed over the past decade, even though the original

								      author or HTTP , and later co-author of HTTP 1.1 who also did

								      his PhD thesis on an analysis of the web, and both of whom

								      have worked with Web protocols ever since, had had

								      conflicting ideas of what the various terms actually mean.)

								    </p>

								    <p>

								      This document explains why the author find it difficult to

								      work in the alternative proposed philosophies. If it

								      misrepresents those others' arguments, then it fails, for

								      which I apologize in advance and will endeavor to correct.

								    </p>

								    <h2>

								      1. Web Concepts as here proposed

								    </h2>

								    <p>

								      The WWW is a space of information objects. The URI was

								      originally called a UDI, and originally all URIs identified

								      information objects. Now, URI schemes exist which identify

								      more or less anything (e.g. UUIDs) or electronic mailboxes

								      (mailto:) but is we look purely at HTTP URIs, they define a

								      web of information objects. Information objects -- perhaps in

								      Cyc terms <a href="">ConceptualWorks</a> -- are normally

								      things which

								    </p>

								    <ul>

								      <li>Carry some sort of message, and

								      </li>

								      <li>Can be represented, to a greater or lesser authenticity,

								      in bits

								      </li>

								    </ul>

								    <p>

								      I want to make it clear that such things are generic (See

								      <a href="/DesignIssues/Generic">Generic Resources)</a> --

								      while they are documents, they generally are abstractions

								      which may have many different bit representations, as a

								      function of, for example:

								    </p>

								    <ul>

								      <li>Time -- the contents can vary with revision --

								      </li>

								      <li>Content-type in which the bits are encoded

								      </li>

								      <li>Natural language in which a human-readable document is

								      written

								      </li>

								      <li>Machine language in which a machine-processable document

								      is written

								      </li>

								      <li>and a few more

								      </li>

								    </ul>

								    <p>

								      but the philosophy is that an HTTP URI may identify something

								      with a vagueness as to the dimensions above, but it still

								      must be used to refer to a unique conceptual object whose

								      various representations have a very large a mount in common.

								      Formally, it is the publisher which defines the what an HTTP

								      URI identifies, and so one should look to the publisher for a

								      commitment as to the exact nature of the identity along these

								      axes.

								    </p>

								    <p>

								      I'm going to refer to this as a <strong>document</strong>,

								      because it needs a term and that is the best I have to date,

								      but the reader should be sure to realize that this does not

								      mean a conventional office document, it can be for example

								    </p>

								    <ul>

								      <li>A poem

								      </li>

								      <li>An order for ball bearings

								      </li>

								      <li>A painting

								      </li>

								      <li>A Movie

								      </li>

								      <li>A review of a movie

								      </li>

								      <li>A sound clip

								      </li>

								      <li>A record of the temperature of the furnace

								      </li>

								      <li>An array a million integers, all zero

								      </li>

								    </ul>

								    <p>

								      and so on, as limited only by our imagination.

								    </p>

								    <p>

								      The Web works because, given an HTTP URI, one can in a large

								      number of cases, get a representation of the document. For a

								      human readable document, the person is presented with the

								      information by virtue of some gadget which is given the bits

								      of a representation. In the case of a hypertext document, a

								      reference to another document is encoded such that, upon user

								      request, the referenced document can in turn be automatically

								      presented. In the case of a machine-readable document,

								      identifiers of concepts, being HTTP URIs, will often allow

								      definitive reference information about those concepts to be

								      pulled in to guide further actions.

								    </p>

								    <p>

								      The web, then, is made of documents as the internet is made

								      of cables and routers. The documents can be about anything,

								      so when we move to talk about the contents of documents we

								      break away from talking about information space and the whole

								      universe of human -- and machine -- discourse is open to us.

								      Web pages can compare a renaissance choral works with jazz

								      pop hits, and discuss whether pigs have wings.

								      Machine-processable documents can encode information about

								      shoes, and ships, and sealing-wax. Until recently, the

								      Internet protocol standards out of which the Web is built had

								      little to say about such things. They were concerned only

								      with the human-readable side, so it was people, reading

								      natural language (not internet specs) who formed and

								      communicated the concepts at this level. Nowadays, however,

								      semantic web languages allow information to be expressed not

								      only about URIs, TCP ports and documents, but also about

								      arbitrary concepts - the shoes, and ships and sealing wax,

								      and whether pigs have wings. Simple semantic web application

								      allow one to order shoes and travel on ships, and determine

								      that, given the data, pigs do not have wings.

								    </p>

								    <p>

								      For these purposes it is of course quite essential to

								      distinguish between something described by a document and the

								      document itself. Now that we -- for the first time -- have

								      not only internet protocols which can talk about document but

								      also those which talk about real world things, we must either

								      distinguish or be hopelessly fuzzy.

								    </p>

								    <p>

								      And is this bad, is it an inhibition to have to work our way

								      though documents before we can talk about whatever we desire?

								      I would argue not, because it is very important not to lose

								      track of the reasons for our taking and processing any piece

								      of information. The process of publishing and reading is a

								      real social process between social entities, not mechanical

								      agents. To be socially responsible, to be able to handle

								      trust, and so on, we must be aware of these operations. The

								      difference between a car and what some web page says about it

								      is crucial - not only when you are buying a car.

								    </p>

								    <p>

								      Some have opined that the abstraction of the document is

								      nonsense, and all that exists, when a web page describes a

								      car, is the car and various representations of it, the HTML,

								      PNG and GIF bit streams. This is however very weak in my

								      opinion. The various representations have much more in common

								      than simply the car. And the relationship to the car can be

								      many and varied: home page, picture, catalog entry, invoice,

								      remote control panel, weblog, and so on. The document itself

								      is an important part of society - to dismiss its existence is

								      to prevent us being aware of human and aspects of information

								      without which we are impoverished. By contrast, the

								      difference between different representations of the document

								      (GIF or PNG image for example) is very small, and the

								      relationship between versions of a document which changes

								      through time a very strong one.

								    </p>

								    <h2>

								      2. Trying out the Alternatives

								    </h2>

								    <p>

								      The folks who disagree with the model do so for a number of

								      different arguments. This article, therefore will have to

								      take them one by one but the ones which come to mind are as

								      follows:

								    </p>

								    <ol>

								      <li>

								        <a href="#L728">Every web page (or many of therm) are in

								        fact themselves representations of some abstract thing, and

								        the URI really identifies that</a> thing, not a document at

								        all.

								      </li>

								      <li>

								        <a href="#L876">There are many levels of identification

								        (representation as a set of bits, document, car which the

								        web page is about) and the URI publisher, as owner of the

								        URI, has the right to define it to mean whatever he or she

								        likes;</a>

								      </li>

								      <li>

								        <a href="#L883">Actually the URI has to, like in English,

								        identify these different things ambiguously. Machines have

								        to disambiguate using common sense and logic</a>

								      </li>

								      <li>

								        <a href="#L890">Actually the URI has to, like in English,

								        identify these different things ambiguously. Machines have

								        to disambiguate using the fact that different properties

								        will refer to different levels</a>.

								      </li>

								      <li>

								        <a href="#L897">Actually the URI has to, like in English,

								        identify these different things ambiguously. Machines have

								        to disambiguate using extra information which will be

								        provided in other ways along with the URI</a>

								      </li>

								      <li>

								        <a href="#L909">Actually the URI has to, like in English,

								        identify these different things ambiguously. Machines have

								        to disambiguate them by context: A catalog card will talk

								        about a document. A car catalog will talk about a car</a>.

								      </li>

								      <li>

								        <a href="#L920">They may have been used to identify

								        documents up till now, but for RDF and the Semantic Web, we

								        should change that and start to use them as the Dublin Core

								        and RDF Core groups have for abstract concepts</a>.

								      </li>

								    </ol>

								    <h3 id="L728">

								      2.1 Identify abstract things not documents

								    </h3>

								    <p>

								      Let's take the alternatives in order. These alternatives all

								      make sense. Each one, however, has problems I can't see any

								      way around when we consider them as a basis as

								    </p>

								    <p>

								      The first was,

								    </p>

								    <blockquote>

								      <p>

								        Every web page (or many of them) are in fact themselves

								        representations of some abstract thing, and the URI really

								        identifies that thing, not a document at all.

								      </p>

								    </blockquote>

								    <p>

								      Well, that wasn't the model I had when URIs were invented and

								      HTTP was written. However, let's see how it flies. If we

								      stick with the principle that a URI (or URIref) must

								      unambiguously identify the same thing in any context, then we

								      come to the conclusion that URIs can not identify the web

								      page. If a web page is about a car, then the URI can't be

								      used to refer to the web page.

								    </p>

								    <h4>

								      2.1.1 <a name="s2.1.1" id="s2.1.1">Same URI can identify a

								      web page and a car</a>

								    </h4>

								    <p>

								      What, a web page can't be a car? At this point a pedantic

								      line reasoning suggests that we should allow web pages and

								      cars to conceptually overlap, so that something can be both.

								      This is counterintuitive, as a web page is in common sense,

								      not a concrete object whereas a car is. But sure, we could

								      construct a mathematics in which we use the terms rather

								      specially and something can be at the same time a web page

								      and a car.

								    </p>

								    <p>

								      Frankly, this doesn't serve the social purpose of the

								      semantic web, to be able to deal with common sense concepts

								      and objects. A web page about a car and a car are in most

								      people's minds quite distinct (as I argue further below). A

								      philosophy in which they are identical does not allow me to

								      distinguish between them. not only conflicts with reality as

								      I see it, but also leaves us no way to make statements

								      individually about the two things.

								    </p>

								    <h4>

								      <img alt=

								      "A car has a different identifier -- and very different properties."

								      src="diagrams/http-uri-1.png" />

								    </h4>

								    <h4>

								      2.1.2 <a name="identifies" id="identifies">The URI identifies

								      the car, not the web page</a>

								    </h4>

								    <p>

								      So lets fall back on the idea that the URI identifies the

								      <em>subject</em> of the web page, but not the web page

								      itself. This makes sense. We can build the semantic web on

								      top of that easily.

								    </p>

								    <p>

								      The problem with this is that there are a large number of

								      systems which already do use URIs to identify the document.

								      This is the whole metadata world. Think of a few:

								    </p>

								    <ul>

								      <li>The Dublin Core

								      </li>

								      <li>RSS

								      </li>

								      <li>The HTTP headers

								      </li>

								      <li>The Adobe XML system

								      </li>

								      <li>Access control systems

								      </li>

								    </ul>

								    <p>

								      (I'm sticking with the machine-processable languages as

								      examples because human-processable ones like HTML have a

								      level of ambiguity traditional in human natural language but

								      quite out of place in the WWW infrastructure -- or the

								      Semantic Web. You can argue that people say "I work for

								      w3.org" or "http://www.amazon.com/shrdlu?asin=314159265359"

								      is a great book, just as they happily say "<em>Moby Dick</em>

								      weighs over three thousand tons", "<em>Moby Dick</em> was

								      finished over a century ago" and "I left <em>Moby Dick</em>

								      on the beach" without expecting to be misunderstood. So we

								      won't use human language as a guide when defining

								      unambiguously the question of what a URI identifies. If we

								      want to do that on the Semantic Web, we will say "I work for

								      <em>the organization whose home page is</em>

								      http://www.ww3.org.)

								    </p>

								    <p>

								      Some argue the the URI which I associate with someone's home

								      page actually identifies that person. They argue that

								      conventionally people use the identifier to identify the

								      person. However, consider another page put together by

								      friends who found a photograph of the same person. A lot of

								      content filtering systems would collect that URI and put put

								      into their list. Even though the photo had many

								      representations which different devices could download using

								      content negotiation and/or CC/PP (color or black and white

								      and versions of different resolutions) the URI itself would

								      be listed as containing nudity. The public are very aware of

								      different works on the web, even though they have the same

								      topic.

								    </p>

								    <h4>

								      2.1.3 <a name="Indirect" id="Indirect">Indirect

								      identification</a>

								    </h4>

								    <p>

								      You can argue that a web page <em>indirectly</em> identifies

								      something, of course, and I am quite happy with that. If you

								      identify an organization as that which has home page

								      http://www.w3.org, then you are not saying that

								      http://www.w3.org/ itself is that organization. This scenario

								      is very very common, just as we identify people and things by

								      their "unambiguous properties": books by ISBN, people by

								      email address, and so forth. So long as we don't think that

								      the person <em>is</em> an email address, we are fine. Some

								      people have thought that in saying "An HTTP URI can't

								      identify an organization" I was ruling out this indirect

								      identification, but not so: I am very much in favor of it.

								      The whole SQL world, after all, only identified things

								      indirectly by a key property. This causes no contradiction.

								      Perhaps I should say "An HTTP URI can't directly identify an

								      organization". But by "identify" I mean "directly identify",

								      and "identity" is a fairly direct word and concept, so I will

								      stick with it.

								    </p>

								    <p>

								      Conclusion so far: the idea that a URI identifies the thing

								      the document is about doesn't work because we can only use a

								      URI to identify one thing and we have and already do use it

								      to identify documents on the web.

								    </p>

								    <h4>

								      2.1.4 <a name="argument" id="argument">The argument for HTTP

								      URIs identifying a Conceptual Work</a>

								    </h4>

								    <p>

								      So what's wrong with the URI being taken to identify whatever

								      the owner says?

								    </p>

								    <p>

								      Let's look at what we mean by <em>identifies</em>. When we

								      say there is identity, that means that there is some form of

								      sameness that we associate with the identifier. Now, for all

								      the philosophical argument, we can never test the identity of

								      an abstract thing. What we can test is a representation which

								      has been returned by the server when given that URI. When we

								      use aURI, and get back several possible representations of

								      it, then what expectation do we have about those

								      representations?

								    </p>

								    <p>

								      Take the test case that I see the web page which has a

								      picture of a car, and I see in the URI in the URI bar in the

								      browser. I email you the URI, "you see, the car is a

								      Toyota?". You click on the link. Your browser shows the same

								      URI as mine in the "URL bar" but you see a table of the car's

								      weight, length, height, color, and registration number. We

								      are confused. The web didn't work because you didn't get the

								      same information as me. I expected you to get the same

								      information, basically. That is how the Web works. That is

								      the expectation behind every hypertext link - that the

								      follower of the link should get basically the same

								      information as the person who made the link. I say,

								      "basically" because I would not have cared whether you saw or

								      JPEG or a GIF. It probably wouldn't have mattered if you had

								      seen a lower resolution or even black-and-white copy of the

								      picture. If you are visually impaired, you may have been able

								      to manage with a well-written description of the picture. But

								      the the essential information is the same, not just the

								      subject of the page.

								    </p>

								    <p>

								      So now we have put the four corners on the expectation we

								      have of a URI -- that all representations have essentially

								      the same <em>information content</em>. And what we mean by

								      "essentially" allows in fact some wriggle room, and in the

								      end it rests on a common understanding between publisher of

								      the information and quoter of the URI. The sameness we are

								      after is the sameness of information content. <em>That</em>

								      is what is identified by the URI. That is why we say that the

								      URI identifies that conceptual information content,

								      irrespective of its particular representation: the

								      <em>conceptual work</em>. Without that common understanding,

								      the web does not work.

								    </p>

								    <p>

								      Some people have said, "If we say that URIs identify people,

								      nothing breaks". But all the time they, day to day, rely on

								      sameness of the information things on the web, and use URIs

								      with that implicit assumption. As we formalize how the web

								      works, we have to make that assumption explicit.

								    </p>

								    <h3 id="L876">

								      2.2 Author definition

								    </h3>

								    <p>

								      So how can we break free of that line of reasoning? We can

								      try throwing away the rule that a URI identifies only one

								      thing.

								    </p>

								    <blockquote>

								      <p>

								        There are many levels of identification (representation as

								        a set of bits, document, car which the web page is about)

								        and the URI publisher, as owner of the URI, has the right

								        to define it to mean whatever he or she likes.

								      </p>

								    </blockquote>

								    <p>

								      Well, this one is tempting from the point of view that the

								      owner of an identifier should reign supreme when it comes to

								      saying what it identifies. It is quite a logically consistent

								      position to take. After all, isn't this the case with

								      <code>uuid</code>'s? And for a new scheme, this would be

								      interesting. How can we do it though, with HTTP? the problem

								      is an engineering one: I can't in practice use a URI until I

								      have some definitive information from the publisher as to

								      what it identifies.

								    </p>

								    <p>

								      2.2.1 Default

								    </p>

								    <p>

								      Why can't a URI default to identifying a web page until you

								      know otherwise? Because the web is open and you will never

								      know when you might lean some other information which will

								      make the default incorrect. (You can't use such "closed

								      world" reasoning).

								    </p>

								    <p>

								      2.2.2 Web operation

								    </p>

								    <p>

								      Why can't a URI identify a web page until you have done some

								      well-defined operation -- such as HTTP HEAD or GET -- and

								      checked for information in that? Well, that would certainly

								      work logically. Suppose we we define a return code or HTTP

								      header which means "abstract object requested". It would mean

								      that every web application which deals with web pages as web

								      pages would actually be working under an ambiguity, and RDF

								      processors could be programmed to look for that special

								      information. We can't retrofit the millions of web servers

								      out there, I assume.

								    </p>

								    <p>

								      I feel that there is a great benefit to fixing this question

								      at the spec level. Otherwise, what happens? I read a web

								      page, I like it and I am going to annotate it as being a

								      great one -- but first I have to find out whether the URI my

								      browser is used, conceptually by the author of the page, to

								      represent some abstract idea? Before I recommend the

								      <em>Vietnam War</em> page, I have to be careful I am not

								      recommending the Vietnam War.

								    </p>

								    <p>

								      There has been no way to do this before RDF, but then

								      similarly no real need for it. (What, is this just a problem

								      with RDF? No, it will happen with any webized knowledge

								      representation system.). We really need to have communication

								      in which two people use the same URI to mean the same thing.

								      If there

								    </p>

								    <p>

								      We could fix HTTP so that it would return me some extra

								      semantic headers explaining the whole thing. And in the case

								      that the URI was deemed to be some abstract thing, I would

								      not have the option of recommending the web page. Too bad: it

								      has no URI.

								    </p>

								    <p>

								      The authors of document

								      &lt;http://www.w3.org/2000/10/rdf-tests/rdfcore/Manifest.rdf&gt;

								      certainly thought that they could use

								      "http://www.w3.org/2000/10/rdf-tests/TestSchema/NegativeParserTest"

								      to identify an abstract thing which is a type of software

								      test. Now they have a choice as to what to make the server

								      return for them when I ask for it. It returns 404 "doesn't

								      match anything we have available". It can't really, because

								      HTTP doesn't allow one to return a class, only a document.

								      And if it were to return a document, then I wouldn't be able

								      to refer to that document without accidentally referring to

								      the class of negative parser tests.

								    </p>

								    <p>

								      So, we could change HTTP to make this work. We could make a

								      new form of redirect, <em>343 Abstract Object, please see . .

								      .</em>, which would tell the client that the thing requested

								      was abstract, and would suggest a document to read about it.

								      This avenue of argument is still outstanding. We could take

								      it. It isn't the status quo, but we could make changes in

								      HTTP if the community felt that this was they way to go.

								    </p>

								    <h3 id="L883">

								      2.3 Logic disambiguates

								    </h3>

								    <p>

								      Otherwise,we have to try another way of letting the URI mean

								      sometimes one thing and sometimes another. Here is another.

								    </p>

								    <blockquote>

								      <p>

								        Actually the URI has to, like in English, identify these

								        different things ambiguously. Machines have to disambiguate

								        using common sense and logic

								      </p>

								    </blockquote>

								    <p>

								      This is possible in theory. It is a mess. It fails

								      particularly spectacularly when a URI is used ambiguously to

								      refer to a web page and the thing that web page is about,

								      which happens to be another web page. <em>Anyone can write

								      anything about anything</em> is a Web motto, but here it

								      falls down. <em>Anyone can write anything about anything

								      except those things which might get confused with the

								      document they are writing</em>. It breaks the axiom that we

								      mean the same thing by a URI - in all contexts. (And RDF has

								      a model theory in which necessarily in any interpretation, a

								      symbol always denotes one thing).

								    </p>

								    <h3 id="L890">

								      2.4 Different Properties

								    </h3>

								    <blockquote>

								      <p>

								        Actually the URI has to, like in English, identify these

								        different things ambiguously. Machines have to disambiguate

								        using the fact that different properties will refer to

								        different levels.

								      </p>

								    </blockquote>

								    <p>

								      One way of getting here is to start by considering that HTTP

								      headers can be divided into those which refer to the

								      representation (or the document) and those that refer to,

								      say, a car or a donkey. We can look at all RDF properties and

								      other attributes in other languages and divide them in in

								      such a way. So, when I say "http://example.com/albert is a

								      color photo", I am referring to the representation; when I

								      say "http://example.com/albert used to work down the mill" I

								      am referring to the person; when I say

								      "http://example.com/albert was taken on a rainy day" I am

								      revering to the original photograph, which is basically the

								      representation of Albert.

								    </p>

								    <p>

								      This one has the problem when a web page refers to a web

								      page. It can still be pursued, by having different verbs for

								      talking about ownership of the web page and ownership of the

								      car. This is a classic example of the 2-level syndrome (see

								      also <em>Dictionaries in the Library</em>). The basic fallacy

								      is that you can make the system general by introducing a

								      second level - a new set of attributes, properties, or

								      whatever, which allow you to refer to the metadata of

								      something separately from the thing itself. These systems

								      either turn out to be just limited 2-level systems (like XML

								      and DTDs) or have to be extended to be recursive in some way

								      later on such that in fact the two levels become unnecessary.

								    </p>

								    <h3 id="L897">

								      2.5 Extra info with URI

								    </h3>

								    <blockquote>

								      <p>

								        Actually the URI has to, like in English, identify these

								        different things ambiguously. Machines have to disambiguate

								        using extra information which will be provided in other

								        ways along with the URI

								      </p>

								    </blockquote>

								    <p>

								      This twist now relies on sending extra information with a

								      URI. Effectively, the URI scheme has now failed to identify

								      anything by itself. Those most familiar URIs as used by HTML

								      sometimes suggested adding new attributes to the anchor tags

								      of HTML documents to disambiguate a reference. I guess it

								      would work if HTML anchors were the only uses of URIs. By

								      contrast, they are used in thousands of places and way, many

								      of which I am unaware. The architecture, however, is not that

								      way: the architecture of the WWW is that a URI is a global

								      unambiguous identifier. Not a URI and something else.

								    </p>

								    <p>

								      (The various designs such a WebDav's propfind which use HTTP

								      methods apart from GET to retreive information suffer from

								      this same problem. the information does not have a URI: it is

								      not on the web.)

								    </p>

								    <h3 id="L909">

								      2.6 Different meaning in different context

								    </h3>

								    <blockquote>

								      <p>

								        Actually the URI has to, like in English, identify these

								        different things ambiguously. Machines have to disambiguate

								        them by context: A catalog card will talk about a document.

								        A car catalog will talk about a car.

								      </p>

								    </blockquote>

								    <p>

								      This works in the short term, when the two contexts are

								      disjoint groups who do not need to communicate. It is in fact

								      the current state: the groups of people who use HTTP URIs to

								      talk about documents, and those who have just started to use

								      them to talk about abstract concepts haven't collided yet.

								      (Well, they have in my code. I need to be able to model the

								      metadata about an HTTP URI as that about a document, and it

								      being a class at the same time doesn't jive.)

								    </p>

								    <p>

								      It doesn't work in the long term because it breaks the axiom

								      that a URI must identify one thing,

								    </p>

								    <h3 id="L920">

								      2.7 Change it for the Semantics Web

								    </h3>

								    <blockquote>

								      <p>

								        They may have been used to identify documents up till now,

								        but for RDF and the Semantic Web, we should change that and

								        start to use them as the Dublin Core and RDF Core groups

								        have for abstract concepts.

								      </p>

								    </blockquote>

								    <p>

								      I think that we would have to design a new URI scheme before

								      we change things that much. That is tempting of course. But

								      then -- building a semantic web out of what we have is

								      tempting too. It was tempting to rehash TCP a little when

								      making HTTP. It wasn't practical, and we would have lost a

								      lot more than we would have gained. There is a lot to be said

								      for using common technology. We've got an infrastructure of

								      documents. We want to build an infrastructure of knowledge.

								      Let's build it using the documents. We might find that the

								      commonality with the web of human-readable information is a

								      boon.

								    </p>

								    <h3 id="L735">

								      2.8 Abandon any identification of abstract things

								    </h3>

								    <p>

								      An argument which surprised me is that yes, HTTP URIs

								      identify documents, but in fact the frgament identifier must

								      only be used to identify parts -- fragments -- of documents.

								      This means that RDF cannot in fact use HTTP URI schemes at

								      all. A completely different system would have to be put

								      together -- either a new set of URIs, or RDF conventions in

								      which the relationship to the part of a document in which

								      something was described became explicit. In N3 this would

								      like like

								    </p>

								    <p>

								      [ is rdf:referent of &lt;#fmyCar&gt; ] [ is rdf:referent of

								      &lt;#color&gt; ] [ is rdf:referent of &lt;#blue&gt; ]

								    </p>

								    <p>

								      Of course, languages would quickly generate special syntax

								      for this. Alternatively, the RDF system would built entirely

								      on the understanding that we were referring always to that

								      denoted by a given bit of document, not the bit of document

								      itself. This would mean that there would be no way for the

								      RDF system to refer to documents themselves directly.

								    </p>

								    <p>

								      This is actually a consistent way of working. It would be a

								      change only for those people who use RDF to talk about

								      documents as documents. We could change.

								    </p>

								    <h2>

								      <a name="L409" id="L409">3. Conclusion</a>

								    </h2>

								    <p>

								      I didn't have this thought out a few years ago. It has only

								      been in actually building a relatively formal system on top

								      of the web infrastructure that I have had to clarify these

								      concepts my own mind. I am forced to conclude that modeling

								      the HTTP part of the web as a web of abstract documents if

								      the only way to go which is practical and, by the

								      philosophical underpinnings of the WWW, tenable.

								    </p>

								    <p>

								      I apologize again if I have misunderstood or misrepresented

								      other's arguments in this process of this explanation of my

								      own position.

								    </p>

								    <p>

								      Tim Berners-Lee

								    </p>

								    <p>

								      2002-07-28Z

								    </p>

								    <hr />

								    <h3>

								      FAQ

								    </h3>

								    <p>

								      <em>Q: But surely, if a document is identified by a namespace

								      URI, then when we look up an RDF namespace will millions of

								      words in it we will have too long a document to be

								      practical!</em>

								    </p>

								    <p>

								      A: It is arguable, for such as situation, whether the

								      namespace itself is more cumbersome to manage than the

								      document is to deliver. You can make an analogy with

								      hypertext: Isn't the model of retrieving a document going to

								      be inefficient when the documents are huge? Answers are

								      twofold in each case,

								    </p>

								    <p>

								      Firstly, yes it is likely to be less convenient, but that is

								      no reason to skew which is a good engineering design for the

								      vast proportion of namespaces (or hypertext documents) which

								      are not huge.

								    </p>

								    <p>

								      Secondly, the HTTP protocol actually does have methods of

								      retrieving parts of a large document.

								    </p>

								    <p>

								      <em>Q: It seems strange that an HTTP URI should be limited to

								      referring to documents, but that all one has to add is this

								      little hash mark and suddenly you say it can be used to

								      identify anything.</em>

								    </p>

								    <p>

								      A: The hash is not a minor appendage to the URI: It is the

								      most significant piece of punctuation in the whole URIref.

								      The hash adds a whole new level of abstraction and

								      specification! It is true that in a hypertext page and that

								      page scrolled to a given point seem very similar. The same

								      applies to a graphic chart and an object within that chart,

								      especially when it is displayed in the context of the

								      original document. So I suppose it may be a shock when the

								      technique is used with a semantic web language to refer to

								      not the document, but something which the document discusses.

								      That does allow it to break out of the whole concept of

								      documents and into -- anything. But no one promised the

								      Semantic Web would be boring. :-)

								    </p>

								    <p>

								      <em>Q: I thought you said "anything should be able to have a

								      URI"?</em>

								    </p>

								    <p>

								      Yes, and it should. There is nothing in the URI spec to say

								      what an individual scheme should or should not be created to

								      identify. A new URI scheme could for example be ale to

								      identify anything. But here we are talking about HTTP URIs.

								      And remember that with semantic web languages, you can use a

								      URIref (very different from a URI) to identify anything, for

								      example with HTTP and RDF.

								    </p>

								    <p>

								      <em>Q: But what about CGI scripts? Surely you don't mean the

								      HTTP URI identifies the script?</em>

								    </p>

								    <p>

								      A: Of course not. When we talk about the "document"

								      identified by a URI it is very often an virtual document

								      produced by, for example, a CGI script. The URI identifies

								      the document on the web, with no regard to the process which

								      causes representations of it to be served.

								    </p>

								    <p>

								      <em>Q: Some HTTP URIs can be POSTed to. Can you still say

								      they identify documents?</em>

								    </p>

								    <p>

								      A: Well, some HTTP URIs can't be accessed at all, and some

								      access is not allowed, and yes, some URIs are not only

								      documents but also can be posted to. So they object is more

								      complex than simply a document. But that it has this extra

								      functionality doesn't make it any less a HTTP document

								      formally. Something can have extra features and still remain

								      in the same class of things.

								    </p>

								    <p>

								      <em>Q: What do you mean by "identify", anyway, in Model

								      Theory terms? (2003)</em>

								    </p>

								    <p>

								      The closest term used in Model Theory to the way I am using

								      <em>identify</em> is <em>denote</em>. Model theory analyses

								      communication and understanding by imagining a set of

								      <em>interpretations</em>, where an interpretation is a

								      mapping from a symbol to that which it denotes. Model

								      theorists and linguists tend to complain that one cannot talk

								      about the meaning of a term, as you can never know what

								      anyone means by anything, you can only see how they react. A

								      given agent may have many possible interpretations, but new

								      information the agent believes which mentions a symbol will

								      rule out interpretations with which are inconsistent with the

								      symbol. By the process of exchange of a lot of information,

								      one arrives at a state in which one behaves as though other

								      agents has the effectively the same set of interpretations.

								      Under these conditions, one can think of the thing

								      <em>identified</em> by the symbol in the community as being

								      the set of things denoted by the symbol in the

								      interpretations which agents in the community are left with.

								      There has been much more discussion of this process (which is

								      the essence of the writing of a standard and the purpose of

								      documents like this) in email on www-tag with Pat Hayes and

								      others in 2003.

								    </p>

								    <address>

								      The rest are from Aaron Swartz

								    </address>

								    <p>

								      <em>Q: Can you point to something in the spec that says HTTP

								      URIs must identify a document?</em>

								    </p>

								    <p>

								      There are many answers. I can point to things which could be

								      interpreted to say that. The HTTP spec defines resources as

								      <em>network data objects</em>. To me that "data" indicates

								      the information nature of the thing. It precludes, in most

								      people's minds, a car or the Andromeda Galaxy.

								    </p>

								    <p>

								      I could explain that, as I originally wrote the HTTP spec,

								      that was the author's intent.

								    </p>

								    <p>

								      But I think the fairest thing is to say that the spec was

								      written it was not sufficiently clear about this particular

								      ambiguity, and for reasons mentioned above, this hasn't been

								      a problem until now.

								    </p>

								    <p>

								      <em>Q: Isn't it a little weird to start making pronouncements

								      about the entire HTTP Web when neither the spec nor the other

								      TAG members agree?</em>

								    </p>

								    <p>

								      Pronouncements about the whole Web are really important where

								      they are needed. In that case the TAG has a duty to make

								      them. And so do I. It seems to me that this assumption is one

								      we have been implicitly making and are now breaking, in a way

								      which will make the semantic web either inconsistent or much

								      less efficient. The TAG members do not agree on this: that is

								      why they asked my to write this document. It is written as a

								      TAG action item about tag issue HTTPRange-14. Things get a

								      lot weirder than that. ;-)

								    </p>

								    <p>

								      <em>Q: Why do we need to use URI-refs to identify abstract

								      concepts in a protocol where we can get more information

								      about them? .I thought URIs were doing just fine. If we have

								      to resort to UUIDs to identify things, I'll get annoyed

								      because I won't be able to put them in my browser.</em>

								    </p>

								    <p>

								      Well, there you are... you want to be able to put something

								      in your browser, then you must have a representation of it.

								      So somewhere in the picture, representations aside, is a

								      ConceptualWork. If the ConceptualWork is important, then it

								      needs a URI, in my opinion. The alternatives are attractive

								      when you start to look at them, but each has a different

								      snag. I have tried to explain above.

								    </p>

								    <p>

								      <em>Q: How can you say that the Semantic Web can use the hash

								      mark to make a URI-ref identify anything when the URI RFC is

								      very clear that hash marks only work when you dereference the

								      document.</em>

								    </p>

								    <p>

								      I wouldn't say that hash marks "only work when you deference

								      a document" any more than your street address "only works

								      when I visit you", or your date of birth "only worked when

								      you were born". I can use your street address -- or your data

								      of birth -- to help identify you. What the spec defines is a

								      way of using this particular URI to get some information over

								      the Internet. The whole web works by what someone recently

								      referred to as a "confusion" between name and address. It

								      isn't a confusion. It is a connection between two pieces of

								      architecture without which the web would not be. Rethink. It

								      is primarily a name. We have made a way of looking it up. So

								      you don't have to look it up for the name to "work" as an

								      identifier. Just as you don't go and look it up when someone

								      quotes the RDF namespace -- it works because the same

								      identifier identifies the same thing in any context. Looked

								      up or not. The same thing is true for foo#bar. If the

								      document foo is never served, one can still (if one owns it)

								      talk about foo#bar with authority. It is of course good

								      practice to serve documents.

								    </p>

								    <p>

								      <em>Q: Are all Semantic Web agents going to start

								      dereferencing every document they hear about?</em>

								    </p>

								    <p>

								      No, any more than you have to dereference every hypertext

								      link you see.

								    </p>

								    <p>

								      <em>Q: Isn't the Semantic Web broken if we have to start

								      disagreeing with major specifications like this?</em>

								    </p>

								    <p>

								      This philosophy is quite consistent with the HTTP spec as it

								      is.

								    </p>

								    <h3>

								      Exercises

								    </h3>

								    <p>

								      1) What does "<a href=

								      "http://www.amazon.com/exec/obidos/ASIN/0679600108/qid=1027958807/sr=2-3/ref=sr_2_3/103-4363499-9407855">http://www.amazon.com/exec/obidos/ASIN/0679600108/qid=1027958807/sr=2-3/ref=sr_2_3/103-4363499-9407855</a>"

								      identify?

								    </p>

								    <ol>

								      <li>A whale

								      </li>

								      <li>"Moby Dick or the Whale" by Herman Melville

								      </li>

								      <li>A web page on Amazon offering a book for sale

								      </li>

								      <li>A URI string

								      </li>

								      <li>All the above

								      </li>

								    </ol>

								    <p>

								      When was the thing it identified last changed?

								    </p>

								    <p>

								      Have you read the thing it identifies?

								    </p>

								    <p>

								      2) What does "<a href=

								      "http://www.vrc.iastate.edu/magritte.gif">http://www.vrc.iastate.edu/magritte.gif</a>"

								      identify?

								    </p>

								    <ol>

								      <li>A pipe

								      </li>

								      <li>I don't know, but whatever it is it isn't not a pipe.

								      </li>

								      <li>A contradiction

								      </li>

								      <li>

								        <strong>A picture by Magritte</strong>

								      </li>

								      <li>

								        <strong>A photograph of a picture by Magritte</strong>

								      </li>

								      <li>

								        <strong>A representation as a series of 341632 bits in of a

								        photo of a painting</strong>

								      </li>

								      <li>Validly 4, 5 and 6 but not 1

								      </li>

								    </ol>

								    <p>

								      <img alt="Hint: This is not a pipe" src=

								      "http://www.vrc.iastate.edu/magritte.gif" />

								    </p>

								    <p>

								      3) What does "<a href=

								      "http://dm93.org/2002/03/dans-car-23423423">http://dm93.org/2002/03/dans-car-23423423"</a>

								      identify?

								    </p>

								    <ol>

								      <li>An inaccessible web page

								      </li>

								      <li>A black Toyota

								      </li>

								    </ol>

								    <p>

								      4) What does "<a href=

								      "http://dm93.org/y2002/myCar-232">http://dm93.org/y2002/myCar-232</a>"

								      identify?

								    </p>

								    <ol>

								      <li>A black toyota

								      </li>

								      <li>A web page

								      </li>

								    </ol>

								    <p>

								      When was the thing identified last changed?

								    </p>

								    <p>

								      What does the writing on Dan's car say?

								    </p>

								    <p>

								      Answers: 1:3. 2:7 Note here the web tolerates vagueness along

								      the axis of different representations of the same image, but

								      not of semantic level between the image and the pipe. 3:1;

								      4:2

								    </p>

								    <h3>

								      References

								    </h3>

								    <p>

								      @@@links

								    </p>

								    <ul>

								      <li>The huge discussion of this issue on www-tag@w3.org

								      </li>

								      <li>

								        <a href="http://www.textuality.com/tag/s1.1.html">Tim

								        Bray's text</a>

								      </li>

								      <li>RFC 1634 and points west

								      </li>

								      <li>Roy Fielding's short history of URI specifications

								      </li>

								      <li>Weaving the Web

								      </li>

								      <li>

								        <a href=

								        "http://www.cyc.com/cycdoc/vocab/info-vocab.html">Cyc's

								        page about Conceptual Works</a> cyc:ConceptualWork <a href=

								        "http://ilrt.org/discovery/chatlogs/rdfig/2002-07-31.html#T15-56-58-1">

								        proposed as what I mean by document by DanC</a>.

								      </li>

								    </ul>

								    <hr />

								    <p>

								      <a href="Overview.html">Up to Design Issues</a>

								    </p>

								    <p>

								      <a href="../People/Berners-Lee">Tim BL</a>

								    </p>

								  </body>

								</html>