You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
505 lines
20 KiB
505 lines
20 KiB
<html xmlns="http://www.w3.org/1999/xhtml">
|
|
<head>
|
|
<meta name="generator" content=
|
|
"HTML Tidy for Mac OS X (vers 31 October 2006 - Apple Inc. build 13), see www.w3.org" />
|
|
<title>
|
|
Univeral Resource Identifiers -- Axioms of Web architecture
|
|
</title>
|
|
<link href="di.css" rel="stylesheet" type="text/css" />
|
|
<meta http-equiv="Content-Type" content=
|
|
"text/html; charset=us-ascii" />
|
|
</head>
|
|
<body bgcolor="#DDFFDD" text="#000000" lang="en" xml:lang="en">
|
|
<address>
|
|
Tim Berners-Lee
|
|
<p>
|
|
Date: January 1998
|
|
</p>
|
|
<p>
|
|
Status: personal view. Editing status: Spellchecked.
|
|
</p>
|
|
</address>
|
|
<p>
|
|
<a href="Overview.html">Up to Design Issues</a>
|
|
</p>
|
|
<h3>
|
|
Axioms of Web Architecture: 0
|
|
</h3>
|
|
<ul>
|
|
<li>
|
|
<a href="Model.html#Model">The Web model</a>
|
|
</li>
|
|
<li>
|
|
<a href="Model.html#Resource">Resources</a>
|
|
</li>
|
|
<li>
|
|
<a href="Model.html#Fragement">Fragment IDs</a>
|
|
</li>
|
|
<li>
|
|
<a href="Model.html">Document sets and relative
|
|
addressing</a>
|
|
</li>
|
|
<li>...
|
|
</li>
|
|
</ul>
|
|
<hr />
|
|
<h1>
|
|
<a name="Model" id="Model">The Web Model</a>
|
|
</h1>
|
|
<p>
|
|
The web is a very general concept -- one universal space of
|
|
information. The concepts it requires such as identifiers and
|
|
information resources (documents) are as general and abstract
|
|
as possible. However, there have been some design decisions
|
|
made which define some interfaces, and effectively define
|
|
modules or agents which are independent. These agents are
|
|
independent in many ways
|
|
</p>
|
|
<ul>
|
|
<li>There is knowledge they have individually but do not
|
|
share
|
|
</li>
|
|
<li>There is knowledge their designers had individually but
|
|
did not share
|
|
</li>
|
|
</ul>
|
|
<p>
|
|
This is basic modularity. The interfaces are defined by the
|
|
data formats and protocols, and the important features to
|
|
understand about the design I have ranted about in the linked
|
|
articles in this series. This modularity, ability for
|
|
different parts of the system, shows up when different specs
|
|
are independent, such that you could change one without
|
|
having to change the other.
|
|
</p>
|
|
<h2>
|
|
<a name="Resource" id="Resource">The Information Resource</a>
|
|
</h2>
|
|
<p>
|
|
(Formerly, <a href="#Resource1">Resource</a>)
|
|
</p>
|
|
<p>
|
|
This is the current term for a certain unit of information in
|
|
the Web. In many cases on the current Web, thinking
|
|
"document" will do. It is something which conveys
|
|
information. The Web model is that information in the
|
|
information space is in the abstract chunked into addressable
|
|
things known as resources.
|
|
</p>
|
|
<p>
|
|
In the technical architecture, resources have identifiers,
|
|
Universal Resource Identifiers, and the properties of these
|
|
identifiers are elaborated later. In fact the concept of a
|
|
unit of information is central, not only in the technical
|
|
architecture, but in society's concepts of information, as a
|
|
document is not only the unit for reference, retrieval and
|
|
presentation (typically), but also the unit of ownership,
|
|
license to use, payment, confidentiality, endorsement, etc.
|
|
So though technically we can derive such things as compound
|
|
document, generic documents, and resources which look
|
|
anything but the typical notion of a "document", we have to
|
|
be able to support these social aspects of information at the
|
|
same time, so we can't mess with it too much.
|
|
</p>
|
|
<h2>
|
|
<a name="Fragement" id="Fragement">Fragment Id and "#"</a>
|
|
</h2>
|
|
<p>
|
|
In the hypertext architecture, when making a reference, such
|
|
as a hypertext link, we don't just refer to an information
|
|
resource. Well, we can, but we can also refer to a particular
|
|
part of or view of a resource. The string which, within the
|
|
document, defines the other end of the link has two parts. It
|
|
has the identifier of the document as a whole, and then
|
|
optionally it has a hash sign "#" and a string representing
|
|
the view of the object required. This suffix is called
|
|
a fragment identifier. (Even though it doesn't
|
|
represent necessarily a fragment of the document: it could
|
|
represent how the document should be viewed.). The fragment
|
|
identifier only has relevance in the context of the web page
|
|
in question. This has an implication how the software is
|
|
built. For example, An "access" module can be given just the
|
|
bit of the URI without the fragment identifier. It gets the
|
|
information, and creates a software object for the hypertext
|
|
page. That object is passed the fragment identifier.
|
|
</p>
|
|
<p>
|
|
<img src="ParseHash.png" width="100%" alt=
|
|
"The URI is split off at the hash into a fragement ID and the rest"
|
|
border="0" />
|
|
</p>
|
|
<p>
|
|
In fact, analyzing the system a little more, the access
|
|
function can be broken into the underlying access which
|
|
creates the object by passing two things to some kind of
|
|
object creator ("factory"): a data stream and a MIME type.
|
|
</p>
|
|
<h3>
|
|
Generally
|
|
</h3>
|
|
<p>
|
|
Hypertext is a specific application, but this principle works
|
|
for other applications on the Web. In fact, when we discuss
|
|
<a href="Webize">webizing</a> an application, we take some
|
|
computer language, and we take what were document-global
|
|
things, say global variables in a programming language, and
|
|
make them truly global by appending the URI of the document
|
|
and "#".
|
|
</p>
|
|
<p>
|
|
Clearly, in different applications the fragment identifier
|
|
will have completely different function. The independence
|
|
here means that new applications (such as the Semantic Web)
|
|
can be built, just like hypertext web, just by introducing
|
|
new types of document.
|
|
</p>
|
|
<h2>
|
|
Independence
|
|
</h2>
|
|
<p>
|
|
The model of how the web works is that there are two separate
|
|
functions. The part (blue in the picture) which
|
|
accesses the document deals with its identifier, but does not
|
|
know what view will be required. It creates some
|
|
software object which represents and presents the resource.
|
|
That object does not need to know how it was created
|
|
(necessarily), and so does not need to know the URI it was
|
|
identified by. However, it does know how to interpret the
|
|
Fragment ID.
|
|
</p>
|
|
<p>
|
|
So we have two axioms:
|
|
</p>
|
|
<table border="1" cellpadding="2">
|
|
<tbody>
|
|
<tr>
|
|
<td>
|
|
The access machinery does not need to look at the
|
|
fragment ID.
|
|
</td>
|
|
<td></td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<table border="1" cellpadding="2">
|
|
<tbody>
|
|
<tr>
|
|
<td>
|
|
The presentation object does not need to know the URI
|
|
of the resource
|
|
</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<p>
|
|
The equivalent axioms when we are talking about
|
|
specifications amount to:
|
|
</p>
|
|
<table border="1" cellspacing="5" cellpadding="5">
|
|
<tbody>
|
|
<tr>
|
|
<td>
|
|
The specifications for access protocols are independent
|
|
of the specifications for fragment identifiers.
|
|
</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<h3>
|
|
Why?
|
|
</h3>
|
|
<p>
|
|
For one thing, consider the special case of a link within a
|
|
document. In this case, the link <b>only</b> specifies
|
|
a fragment identifier. The object can follow the link
|
|
itself. It doesn't have to consult the access code in
|
|
order to figure out where the link goes to.
|
|
Because the "#" syntax s universal to all access
|
|
methods, the object can process the link internally.
|
|
For a static HTML file, for example, this means that
|
|
you can write and HTMl file with internal links without
|
|
worrying or knowing about exactly what URIs the file will
|
|
get. It means you don't have to alter the file if you
|
|
chose to serve it in some new name or address space. If
|
|
the "#" syntax was not a universal specification for the web,
|
|
this would break: you couldn't do it. As Jim Gettys points
|
|
out, as the era of digitally signed documents comes upon us,
|
|
changing a signed document will break the signature on it. So
|
|
allowing one to make a self-consistent document with internal
|
|
links in a way independent of the namespace is even more
|
|
essential.
|
|
</p>
|
|
<h3>
|
|
Why else?
|
|
</h3>
|
|
<p>
|
|
This independence is very important for the evolution of the
|
|
Web. It means that people can go off and design all
|
|
kinds of new systems for naming, addressing and accessing
|
|
documents, without having to worry about what sort of
|
|
documents will be moved. It means that people can go
|
|
off and make new media types (MIME types), each of which can
|
|
have different concepts for views and fragments, without
|
|
having to talk to the people developing the access
|
|
technology. This has already (1998) proved incredibly
|
|
enabling to the community, as HTTP has advanced in parallel
|
|
with many other ways of accessing data, and the number of
|
|
exciting media types has grown very rapidly, and will be the
|
|
key to many new revolutions built on top of the basic Web
|
|
idea.
|
|
</p>
|
|
<p>
|
|
If you look at the diagram you ill notice how the fragment
|
|
IDs are generated by and understood by just the one module.
|
|
You see how, when designing a new MIME type, one is
|
|
quite free to be creative in making new and powerful forms of
|
|
fragment ID, knowing hat no other specifications will refer
|
|
to them, and nothing else will break.
|
|
</p>
|
|
<h2>
|
|
Document sets and relative addressing
|
|
</h2>
|
|
<p>
|
|
Now let us look at what happens when we follow a link.
|
|
For example, say a hypertext page is clicked on.
|
|
The page has a representation of the end point of the
|
|
link. It hands it to the application. In fact,
|
|
often, there are links between pages whose URIs are very
|
|
similar and only differ in the right hand part. This
|
|
isn't true of all name spaces: for example, when making links
|
|
between news articles identifies by the news id (news:foo)
|
|
unique ID, you have to specify the whole thing. However, if
|
|
you restrict publication of a set of documents to a
|
|
hierarchical name or address space, then you can arrange for
|
|
documents which are very related and have many links to be in
|
|
the same part of the tree.
|
|
</p>
|
|
<p>
|
|
In this case, the links between these documents are "relative
|
|
URIs".
|
|
</p>
|
|
<p>
|
|
What happens then is that the relative URI, which only has
|
|
the locally different part of the URI in it, is handed back
|
|
to what in the diagram I have called the "application", to be
|
|
turned into an absolute URI by being combined with the
|
|
absolute URI of the resource, which the application has
|
|
remembered.
|
|
</p>
|
|
<p>
|
|
Note that the application is aware of the absolute URI but
|
|
still the resource does not have to.
|
|
</p>
|
|
<p>
|
|
Note that the fragment id is still circulated around a loop
|
|
between the object (green) which understands it and the
|
|
applications (yellow) which handles it transparently but does
|
|
not understand or change it.
|
|
</p>
|
|
<p>
|
|
Now there was a design decision that the application could
|
|
have passed to the access module both the relative URI and
|
|
the absolute URI. Then, different namespaces would have been
|
|
able to have different algorithms for resolving a base URI
|
|
and a relative URI into a new absolute URI. But the decision
|
|
was made that the relative address format should be common
|
|
across all name spaces.
|
|
</p>
|
|
<p>
|
|
<img src="Parse2.png" width="100%" alt=
|
|
"The URI is split off at the hash into a fragement ID and the rest"
|
|
border="0" />
|
|
</p>
|
|
<h3>
|
|
Why?
|
|
</h3>
|
|
<p>
|
|
Just as we considered internal links above, now consider
|
|
relative links between a bunch of documents, like the
|
|
sections of a book, which are close in the tree. In
|
|
practice, such document sets are moved from place to place,
|
|
from file systems into HTTP space or FTP space, and because
|
|
the relative address rules are universal, the documents do
|
|
not have to be modified every time they are moved. (Yes, if
|
|
you move half the set to one place and half to another, you
|
|
have to fix links). This is happening all the time.
|
|
People are creating and programs are generating
|
|
hypertext with relative links without knowing or caring what
|
|
absolute URI will be used to refer to the material.
|
|
</p>
|
|
<h2>
|
|
The access scheme
|
|
</h2>
|
|
<p>
|
|
<img src="Parse3.png" width="100%" alt=
|
|
"The URI is split off at the hash into a fragement ID and the rest"
|
|
border="0" />
|
|
</p>
|
|
<p>
|
|
The so-called "access scheme" is the first part of the URI.
|
|
As we have seen above, you don't have to know anything about
|
|
it to parse relative URIs or to process the fragment
|
|
identifier of a URI. The knowledge of particular schemes is
|
|
limited to the "access" function (blue in the above diagram).
|
|
</p>
|
|
<p>
|
|
The scheme is a very important flexibility point, and should
|
|
not be abused. Anyone dereferencing a URI must have a
|
|
knowledge of the scheme it uses.
|
|
</p>
|
|
<p>
|
|
The access scheme defines a huge part of URI space. The
|
|
scheme defines a subspace with particular properties
|
|
</p>
|
|
<p>
|
|
The access scheme is <i>by definition</i> the highest point
|
|
of flexibility. What does that mean? It means that if the
|
|
whole Web develops problems which we cannot solve within the
|
|
existing protocols, or if new spaces are designed which
|
|
really can't be accessed through or mapped into existing
|
|
spaces, then we can create a new space. We have faith that we
|
|
will be able to use this flexibility point in the future,
|
|
because it worked successfully for integrating the older
|
|
spaces such as Gopher and FTP spaces into the Web.
|
|
</p>
|
|
<table border="1" cellpadding="2">
|
|
<tbody>
|
|
<tr>
|
|
<td>
|
|
If you have ported a concept between environments in
|
|
the past, then there is a better hope that you can in
|
|
the future.
|
|
</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<h3>
|
|
The danger of too many access schemes
|
|
</h3>
|
|
<p>
|
|
However, we do not do this lightly. When we introduce a new
|
|
space, it may have very different properties and we expect
|
|
that the deployment of new software will be needed to allow
|
|
access to it. Some spaces may be gatewayable into HTTP space,
|
|
and this will often provide a transition path. This is why
|
|
early browsers allowed one to declare in a configuration file
|
|
what gateways to use for what new spaces.
|
|
</p>
|
|
<p>
|
|
If we use this extension point frivolously, ironically, it
|
|
will cease to work. Suppose very many schemes are introduced.
|
|
The access scheme space itself becomes a namespace with all
|
|
the problems which current namespaces such as DNS are trying
|
|
to solve, but which are very hard problems:
|
|
</p>
|
|
<ul>
|
|
<li>Clashes in the namespace would destroy interoperability;
|
|
</li>
|
|
<li>Ownership of the space becomes commercially valuable;
|
|
</li>
|
|
<li>Democratic and fair management becomes essential and
|
|
difficult;
|
|
</li>
|
|
</ul>
|
|
<p>
|
|
Worse, though, technology will be needed to automatically
|
|
dereference the schemes themselves and download code to
|
|
handle them. Something like DNS will be needed. The top level
|
|
namespace then becomes in fact DNS, or something like it.
|
|
This, however, begs the question. What happens if later DNS
|
|
needs to be replaced? There is no top-level extension switch
|
|
left. The world is stuck with whatever form of access-scheme
|
|
name service exists.
|
|
</p>
|
|
<p>
|
|
Therefore, I conclude that access schemes should not be open
|
|
to trivial extension, and that the access scheme should only
|
|
be extended by the introduction of new standards with full
|
|
open review by the entire community.
|
|
</p>
|
|
<h3>
|
|
Alternatives to new schemes
|
|
</h3>
|
|
<p>
|
|
Whereas some schemes (like "data:") are clearly neat and new
|
|
and orthogonal to HTTP, many schemes could in fact be
|
|
integrated into http, using HTTP extension mechanisms.
|
|
</p>
|
|
<p>
|
|
In fact, is HTTP is to be taken as a general computing
|
|
protocol, then use of an <a href="Extensible.html">extensible
|
|
language system</a> for the HTTP request message would allow
|
|
a huge amount of extension, covering protocols with different
|
|
functionality (exporting different interfaces).
|
|
</p>
|
|
<h3>
|
|
Evolving scheme spaces
|
|
</h3>
|
|
<p>
|
|
When considering the evolution of a space, it is important to
|
|
remember that primarily the access scheme refers to a part of
|
|
the URI space, and secondarily it refers to a protocol.
|
|
Therefore, one can in fact change the protocols used to
|
|
access resources within a scheme's namespace, without
|
|
changing the space. For example, a new DNS protocol could be
|
|
introduced which over time would replace the current one,
|
|
without changing the DNS space. This would effectively
|
|
redefine the HTTP and FTP protocols, but would not harm the
|
|
namespaces. When touch-tone dialing was introduced, the
|
|
telephone numbering system remained the same. So an indexing
|
|
system could be introduced which, when deployed, would allow
|
|
http:// space objects to be found with greater reliability or
|
|
speed than the current protocols, while maintaining the HTTP
|
|
space as being the concatenation of a DNS name and an opaque
|
|
string.
|
|
</p>
|
|
<hr />
|
|
<h2>
|
|
Footnote
|
|
</h2>
|
|
<h4>
|
|
<a name="Resource1" id="Resource1">Resource</a>
|
|
</h4>
|
|
<p>
|
|
The word "document" in the original "Universal Document
|
|
Identifier" in the first web spec was changed to "Resource"
|
|
in the IETF discussions, because (a) the word "document"
|
|
didn't seem to cover all kinds of information resources such
|
|
as movies and sounds, and (b) actually URIs exist for
|
|
communication endpoints such as mailboxes (mailto:) and login
|
|
ports (telnet:). "Resource" was, though, later used by RDF as
|
|
a term for anything - the top class which is the superclass
|
|
of all classes. This stemmed from RDF's initial use as a
|
|
language for describing information resources on the Web,
|
|
although RDF was designed to be used to describe anything as
|
|
a general knowledge representation system. The term
|
|
"Information Resource" was adopted by the TAG for the Web
|
|
Architecture document. When people, including the author in
|
|
the article above, refer to an information resource, they
|
|
often
|
|
</p>
|
|
<h2>
|
|
Related material elsewhere in these notes
|
|
</h2>
|
|
<p>
|
|
<i>Content/Version negotiation and Fragment ID persistence:
|
|
warnings and awareness.</i> See <a href=
|
|
"Fragment.html">Fragment Identifiers</a>
|
|
</p>
|
|
<p>
|
|
<i> If you negotiate between MIME types which have
|
|
different fragment ID representations, you run a risk &
|
|
should warn the client.</i>
|
|
</p>
|
|
<p>
|
|
To be added:
|
|
</p>
|
|
<p>
|
|
<i>Level breaking with care: optimizing in HTTPNG etc</i>
|
|
</p>
|
|
<hr />
|
|
<p>
|
|
<a href="Overview.html">Up to Design Issues</a>, On to URIs
|
|
</p>
|
|
</body>
|
|
</html>
|