<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
<meta name="Author" content="Tim Berners-Lee">
<title>The World Wide Web: Past, Present and Future</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>

<body bgcolor="#FFFFFF" lang="en">
<h1>The World Wide Web: Past, Present and Future</h1>

<h4>Tim Berners-Lee</h4>

<p><i>August 1996</i></p>

<p>The author is the Director of the World Wide Web Consortium and a principal
research scientist at the Laboratory for Computer Science, Massachusetts
Institute of Technology, 545 Technology Square, Cambridge MA 02139 U.S.A.
http://www.w3.org</p>

<p><i>Draft response to invitation to publish in IEEE Computer special issue
of October 1996</i>. <i>The special issue was I think later abandoned</i>.</p>

<h4>Abstract</h4>

<blockquote>
The World Wide Web was designed originally as an interactive world of shared
information through which people could communicate with each other and with
machines. Since its inception in 1989 it has grown initially as a medium for
the broadcast of read-only material from heavily loaded corporate servers to
the mass of Internet connected consumers. Recent commercial interest in its use
within the organization under the "Intranet" buzzword takes it into the
domain of smaller, closed, groups, in which greater trust allows more
interaction. In the future we look toward the web becoming a tool for even
smaller groups, families, and personal information systems. Other
interesting developments would be the increasingly interactive nature of the
interface to the user, and the increasing use of machine-readable
information with defined semantics allowing more advanced machine processing
of global information, including machine-readable signed
assertions.</blockquote>

<h2>Introduction</h2>

<p><i>This paper represents the personal views of the author, not those of the
World Wide Web Consortium members, nor of host institutes.</i></p>

<p>This paper gives an overview of the history, the current state, and
possible future directions for the World Wide Web. The Web is simply defined
as the universe of global network-accessible information. It is an abstract
space with which people can interact, and is currently chiefly populated by
interlinked pages of text, images and animations, with occasional sounds,
three dimensional worlds, and videos. Its existence marks the end of an era of
frustrating and debilitating incompatibilities between computer systems. The
explosion of availability and the potential social and economic impact have
not passed unnoticed by a much larger community than has previously used
computers. The commercial potential in the system has driven a rapid pace of
development of new features, making the maintenance of the global
interoperability which the Web brought a continuous task for all concerned. At
the same time, it highlights a number of research areas whose solutions will
become more and more pressing, which we will only be able to mention in
passing in this paper. Let us start, though, as promised, with a mention of
the original goals of the project, conceived as it was as an answer to the
author's personal need, and the perceived needs of the organization and larger
communities of scientists and engineers, and the world in general.</p>

<h2>History</h2>

<h3>Before the web</h3>

<p>The origins of the ideas on hypertext can be traced back to historic work
such as Vannevar Bush's famous article "As We May Think" in The Atlantic
Monthly in 1945, in which he proposed the "Memex" machine which would, by a
process of binary coding, photocells and instant photography, allow microfilm
cross-references to be made and automatically followed. It continues with Doug
Engelbart's "NLS" system, which used digital computers and provided hypertext
email and documentation sharing, and with Ted Nelson's coining of the word
"hypertext". For all these visions, the real world in which the
technologically rich field of High Energy Physics found itself in 1980 was one
of incompatible networks, disk formats, data formats, and character encoding
schemes, which made any attempt to transfer information between unlike
systems a daunting and generally impractical task. This was particularly
frustrating given that to a greater and greater extent computers were being
used directly for most information handling, and so almost anything one might
want to know was almost certainly recorded magnetically somewhere.</p>

<h3>Design Criteria</h3>

<p>The goal of the Web was to be a shared information space through which
people (and machines) could communicate.</p>

<p>The intent was that this space should span from a private information
system to a public information system, from high value carefully checked and
designed material, to off-the-cuff ideas which make sense only to a few people
and may never be read again.</p>

<p>The design of the world-wide web was based on a few criteria.</p>

<p></p>
<ul>
<li>An information system must be able to record random associations between
any arbitrary objects, unlike most database systems;</li>
<li>If two sets of users started to use the system independently, to make a
link from one system to another should be an incremental effort, not
requiring unscalable operations such as the merging of link
databases.</li>
<li>Any attempt to constrain users as a whole to the use of particular
languages or operating systems was always doomed to failure;</li>
<li>Information must be available on all platforms, including future
ones;</li>
<li>Any attempt to constrain the mental model users have of data into a
given pattern was always doomed to failure;</li>
<li>If information within an organization is to be accurately represented in
the system, entering or correcting it must be trivial for the person
directly knowledgeable.</li>
</ul>

<p>The author's experience had been with a number of proprietary systems,
systems designed by physicists, and with his own <i>Enquire</i> program
(1980) which allowed random links, and had been personally useful, but had not
been usable across a wide area network.</p>

<p>Finally, a goal of the Web was that, if the interaction between person and
hypertext could be so intuitive that the machine-readable information space
gave an accurate representation of the state of people's thoughts,
interactions, and work patterns, then machine analysis could become a very
powerful management tool, seeing patterns in our work and facilitating our
working together through the typical problems which beset the management of
large organizations.</p>

<h2>Basic Architectural Principles</h2>

<p>The World Wide Web architecture was proposed in 1989 and is illustrated in
the figure. It was designed to meet the criteria above, and according to
well-known principles of software design adapted to the network situation.</p>

<p><img src="../../../Talks/9603tbl/arch1a.gif" width="500" height="350"></p>

<p><i>Fig: Original WWW architecture diagram from 1990. The pink arrow shows
the common standards: URL, and HTTP, with format negotiation of the data
type</i>.</p>

<h4>Independence of specifications</h4>

<p>Flexibility was clearly a key point. Every specification needed to ensure
interoperability placed constraints on the implementation and use of the Web.
Therefore, as few things should be specified as possible (minimal constraint)
and those specifications which had to be made should be made independent
(modularity and information hiding). The independence of specifications would
allow parts of the design to be replaced while preserving the basic
architecture. A test of this ability was to replace them with older
specifications, and demonstrate the ability to intermix those with the new.
Thus, the old FTP protocol could be intermixed with the new HTTP protocol in
the address space, and conventional text documents could be intermixed with
new hypertext documents.</p>

<p>It is worth pointing out that this principle of minimal constraint was a
major factor in the web's adoption. At any point, people needed to make only
minor and incremental changes to adopt the web, first as a parallel technology
to existing systems, and then as the principal one. The ability to evolve from
the past to the present within the general principles of architecture gives
some hope that evolution into the future will be equally smooth and
incremental.</p>

<h4>Universal Resource Identifiers</h4>

<p>Hypertext as a concept had been around for a long time. Typically, though,
hypertext systems were built around a database of links. This did not scale in
the sense of the requirements above. However, it did guarantee that links
would be consistent, and links to documents would be removed when documents
were removed. The removal of this feature was the principal compromise made in
the W3 architecture, which then, by allowing references to be made without
consultation with the destination, allowed the scalability which the later
growth of the web exploited.</p>

<p>The power of a link in the Web is that it can point to any document (or,
more generally, resource) of any kind in the universe of information. This
requires a global space of identifiers. These Universal Resource Identifiers
are the primary element of Web architecture. The now well-known structure
starts with a prefix such as "http:" to indicate into which space the rest of
the string points. The URI space is universal in that any new space of any
kind which has some kind of identifying, naming or addressing syntax can be
mapped into a printable syntax and given a prefix, and can then become part of
URI space. The properties of any given URI depend on the properties of the
space into which it points. Depending on these properties, some spaces tend to
be known as "name" spaces, and some as "address" spaces, but the actual
properties of a space depend not only on its definition, syntax and support
protocols, but also on the social structure supporting it and defining the
allocation and reallocation of identifiers. The web architecture, fortunately,
does not depend on the decision as to whether a URI is a name or an address,
although the phrase URL (locator) was coined in IETF circles to indicate that
most URIs actually in use were considered more like addresses than names. We
await the definition of more powerful name spaces, but note that this is not a
trivial problem.</p>

<h4>Opaqueness of identifiers</h4>

<p>An important principle is that URIs are generally treated as opaque
strings: client software is not allowed to look inside them and to draw
conclusions about the object referenced.</p>

<h4>Generic URIs</h4>

<p>Another interesting feature of URIs is that they can identify objects (such
as documents) generically: One URI can be given, for example, for a book,
which is available in several languages and several data formats. Another
URI could be given for the same book in a specific language, and another URI
could be given for a bit stream representing a specific edition of the book in
a given language and data format. Thus the concept of "identity" of a Web
object allows for genericity, which is unusual in object-oriented systems.</p>

<h4>HTTP</h4>

<p>As protocols went for accessing remote data, a standard did exist in the
<em>File Transfer Protocol</em> (FTP). However, this was not optimal for the
web, in that it was too slow and not sufficiently rich in features, so a new
protocol designed to operate with the speed necessary for traversing hypertext
links, HyperText Transfer Protocol, was designed. The HTTP URIs are resolved
into the addressed document by splitting them into two halves. The first half
is applied to the Domain Name Service [ref] to discover a suitable server, and
the second half is an opaque string which is handed to that server.</p>
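
<p>The two-halves resolution described above can be sketched as follows. This
is a minimal illustration, not the original implementation: it uses Python's
standard <tt>urllib.parse</tt> module, the function name
<tt>split_http_uri</tt> and the path in the sample URI are made up for the
example, and the default port of 80 is an assumption not stated in the
text.</p>

```python
from urllib.parse import urlsplit


def split_http_uri(uri):
    """Split an HTTP URI into the server half and the opaque half."""
    parts = urlsplit(uri)
    if parts.scheme != "http":
        raise ValueError("not an HTTP URI: " + uri)
    # First half: the host name a client would look up in the DNS.
    host = parts.hostname
    port = parts.port or 80  # assumed default when no port is written
    # Second half: passed to the server unchanged; the client attaches
    # no meaning to its internal structure.
    opaque = parts.path or "/"
    if parts.query:
        opaque += "?" + parts.query
    return host, port, opaque


# The path below is illustrative only.
host, port, opaque = split_http_uri("http://www.w3.org/some/opaque/string")
print(host, port, opaque)
```

<p>Only the first half is interpreted by the client; everything after the
host is simply sent to the server, which is what lets each server organize
its own part of the space freely.</p>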
<p>A feature of HTTP is that it allows a client to specify preferences in
terms of language and data format. This allows a server to select a suitable
specific object when the URI requested was generic. This feature is
implemented in various HTTP servers but tends to be underutilized by clients,
partly because of the time overhead in transmitting the preferences, and
partly because historically generic URIs have been the exception. This
feature, known as format negotiation, is one key element of independence
between the HTTP specification and the HTML specification.</p>
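
<p>As a rough sketch of format negotiation, the following shows how a server
might pick one specific variant of a generic resource from the client's
stated preferences. The preference strings follow the general shape of HTTP's
<tt>Accept</tt> and <tt>Accept-Language</tt> headers, but the parsing is
deliberately simplified (no wildcards), and the variant table, file names and
scoring rule are assumptions made for the example rather than anything a
particular server does.</p>

```python
def parse_preferences(header):
    """Parse 'text/html, text/plain;q=0.5' into (value, quality) pairs."""
    prefs = []
    for item in header.split(","):
        fields = [f.strip() for f in item.split(";")]
        value = fields[0]
        quality = 1.0  # a value with no q parameter is fully acceptable
        for field in fields[1:]:
            if field.startswith("q="):
                quality = float(field[2:])
        if value:
            prefs.append((value, quality))
    return prefs


def choose_variant(variants, accept, accept_language):
    """Return the variant the client rates highest, or None if none is acceptable."""
    format_q = dict(parse_preferences(accept))
    language_q = dict(parse_preferences(accept_language))
    best, best_score = None, 0.0
    for variant in variants:
        # Unlisted formats or languages score zero in this simplified model.
        score = (format_q.get(variant["type"], 0.0)
                 * language_q.get(variant["language"], 0.0))
        if score > best_score:
            best, best_score = variant, score
    return best


# Hypothetical variants behind one generic URI for a book.
book = [
    {"file": "book.en.html", "type": "text/html", "language": "en"},
    {"file": "book.fr.html", "type": "text/html", "language": "fr"},
    {"file": "book.fr.txt", "type": "text/plain", "language": "fr"},
]
chosen = choose_variant(book, "text/html, text/plain;q=0.5", "fr, en;q=0.8")
print(chosen["file"])
```

<p>Because the choice depends only on the declared data types, the server
never needs to know which browser is asking, which is the independence
between specifications that the text describes.</p>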
<h4>HTML</h4>

<p>For the interchange of hypertext, the <em>Hypertext Markup Language</em>
was defined as a data format to be transmitted over the wire. Given the
presumed difficulty of encouraging the world to use a new global information
system, HTML was chosen to resemble some SGML-based systems in order to
encourage its adoption by the documentation community, among whom SGML was a
preferred syntax, and the hypertext community, among whom SGML was the only
syntax considered as a possible standard. Though adoption of SGML did allow
these communities to accept the Web more easily, SGML turned out to have a very
complex and not very well defined syntax, and the attempt to find a compromise
between full SGML compatibility and ease of use of HTML bedeviled the experts
for a long time.</p>

<h2>Early History</h2>

<p>The road from conception to adoption of an idea is often tortuous, and for
the Web it certainly had its curves. It was clearly impossible to convince
anyone to use the system as it was, having a small audience and content only
about itself. Some of the steps were as follows.</p>
<ul>
<li>The initial prototype was written in NeXTStep (October-December 1990).
This allowed the simple addition of new links and new documents, as a
"wysiwyg" editor which browsed at the same time. However, the limited
deployment of NeXTStep limited its visibility. The initial Web describing
the Web was written using this tool, with links to sound and graphic
files, and was published by a simple HTTP server.</li>
<li>To ensure global acceptance, a "line mode" browser was written by Nicola
Pellow, a very portable hypertext browser which allows web information to
be retrieved on any platform. This was all many people at the time saw of
the Web. (1991)</li>
<li>In order to seed the Web with data, a second server was written which
provided a gateway into a "legacy" phonebook database on a mainframe at
CERN. This was the first "useful" Web application, and so many people at
that point saw the web as a phone book program with a strange user
interface. However, it got the line mode browser onto a few desks. This
gateway server was followed by a number of others, making a web client a
useful tool within the Physics community at least.</li>
<li>No further resources being available at CERN, the Internet community at
large was encouraged to port the WorldWideWeb program to other platforms.
"Erwise", "Midas", "Viola-WWW" for X windows and "Cello" for Windows(tm)
were various resulting clients which unfortunately were only browsers,
though Viola-WWW, by Pei Wei, was interestingly based on an interpreted
mobile code language (Viola) and comparable in some respects to the later
Hot Java(TM).</li>
<li>The Internet Gopher was seen for a long time as a preferable information
system, avoiding the complexities of HTML, but rumors of the technology
being licensable provoked a general re-evaluation.</li>
<li>In 1993, Marc Andreessen of the National Center for Supercomputing
Applications, having seen ViolaWWW, wrote "Mosaic", a WWW client for X.
Mosaic was easy to install, and later allowed inline images, and became
very popular.</li>
<li>In 1994, Navisoft Inc created a browser/editor more reminiscent of the
original WorldWideWeb program, being able to browse and edit in the same
mode. [This is currently known as "AOLPress"].</li>
</ul>

<p>An early metric of web growth was the load on the first web server
<tt>info.cern.ch</tt> (originally running on the same machine as the first
client, now replaced by <tt>www.w3.org</tt>). Curiously, this grew as a steady
exponential as the graph (on a log scale) shows, at a factor of ten per year,
over three years. Thus the growth was clearly an explosion, though one could
not put a finger on any particular date as being more significant than
others.</p>

<p><img src="../../../Talks/9603tbl/bang.gif" alt="Graph of hits on
info.cern.ch 1991-94, rising by factor of 10 each year." width="411"
height="331"></p>

<p><i>Figure. Web client growth from July 1991 to July 1994. Missing points
are lost data. Even the ratio between weekend and weekday growth remained
remarkably steady.</i></p>
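
<p>To put the quoted rate in more familiar terms, the arithmetic below
converts "a factor of ten per year, over three years" into an overall factor,
an equivalent monthly factor and a doubling time. It assumes the growth was
perfectly steady, which the graph only approximates; the variable names are
chosen for the example.</p>

```python
import math

yearly_factor = 10.0  # from the text: load grew tenfold each year
years = 3             # from the text: sustained over three years

# Overall growth across the whole period.
total_factor = yearly_factor ** years

# Equivalent month-on-month factor under steady exponential growth.
monthly_factor = yearly_factor ** (1 / 12)

# Number of months for the load to double at that rate.
doubling_months = 12 * math.log(2) / math.log(yearly_factor)

print(total_factor, round(monthly_factor, 2), round(doubling_months, 1))
```

<p>Under that assumption the load rose a thousandfold over the three years,
by about 21% a month, doubling roughly every 3.6 months.</p>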
<p>That server included suggestions on finding and running clients and
servers. It included a page on Etiquette, which included such conventions as
the email address "webmaster" as a point of contact for queries about a
server, and the fact that the URL consisting only of the name of the server
should be a default entry point, no matter what the topology of a server's
internal links.</p>

<p>This takes development to the point where the general public became aware
of it, and the rest is well documented. HTML, which was intended to be the
warp and weft of a hypertext tapestry crammed with rich and varied data types,
became surprisingly ubiquitous. Rather than relying on the extent of computer
availability and Internet connectivity, the Web started to drive it. The URL
syntax of the "http:" type became as self-describing to the public as 800
numbers.</p>

<h2>Current situation</h2>

<p>Now we summarize the current state of web deployment, and some of the
recent developments.</p>

<h4>Incompatibilities and tensions</h4>

<p>The common standards of URIs, HTTP and HTML have allowed growth of the web,
and have also allowed the development resources of companies and universities
across the world to be applied to the exploitation and extension of the web.
This has resulted in a mass of new data types and protocols.</p>

<p>In the case of new data formats, the ability of HTTP to handle arbitrary
data formats has allowed easy expansion, so the introduction, for example, of
the three-dimensional scene description language "VRML", or the Java(tm) byte
code format for the transfer of mobile program code, has been easy. What has
been less easy has been for servers to know what clients support, as the
format negotiation system has not been widely deployed in clients. This has
led, for example, to the deplorable engineering practice, in the server, of
checking the browser make and version against a table kept by the server. This
makes it difficult to introduce new clients, and is of course very difficult
to maintain. It has led to the "spoofing" of well-known clients by new less
well known ones in order to extract sufficiently rich data from servers. This
has been accompanied by an insufficiency in the MIME types used to describe
data: text/html is used to refer to many levels of HTML; image/png is used to
refer to any PNG format graphic, when it is interesting to know how many
colors it encodes; Java(tm) files are shipped around without any visible
indication of the runtime support they will require to execute.</p>

<h4>Forces toward compatibility and progress</h4>

<p>Throughout the industry, from 1992 on, there was a strong worry that a
fragmentation of the Web standards would eventually destroy the universe of
information upon which so many developments, technical and commercial, were
being built. This led to the formation in 1994 of the World Wide Web
Consortium. At the time of writing, the Consortium has around 150 members
including all the major developers of Web technology, and many others whose
businesses are increasingly based on the ubiquity and functionality of the
Web. Based at the Massachusetts Institute of Technology in the USA and at the
<i>Institut National de Recherche en Informatique et en Automatique</i> in
Europe, the Consortium provides a vendor-neutral forum where competing
companies can meet to agree on common specifications for the common good. The
Consortium's mission, taken broadly, is to realize the full potential of the
Web, and the directions in which this is interpreted are described later
on.</p>

<h4>From Protecting Minors to Ensuring Quality: PICS</h4>

<p>Developments to web protocols are driven sometimes by technical
needs of the infrastructure, such as those of efficient caching, sometimes by
particular applications, and sometimes by the connection between the Web and
the society which can be built around it. Sometimes these become interleaved.
An example of the latter was the need to address worries of parents, schools,
and governments that young children would gain access to material which through
indecency, violence or other reason, was judged harmful to them. Under threat
of government restrictions of internet use, or worse, government censorship,
the community reacted rapidly in the form of W3C's Platform for Internet
Content Selection (PICS) initiative. PICS introduces new protocol elements and
data formats to the web architecture, and is interesting in that the
principles involved may apply to future developments.</p>

<p>Essentially, PICS allows parents to set up filters for their children's
information intake, where the filters can refer to the parents' choice of
independent rating services. Philosophically, this allows parents (rather than
centralized government) to define what is too "indecent" for their children.
It is, like the Internet and the Web, a decentralized solution.</p>

<p>Technically, PICS involves a specification for a machine readable "label".
Unlike HTML, PICS labels are designed to be read by machine, by the filter
software. They are sets of attribute-value pairs, and are self-describing in
that any label carries a URL which, when dereferenced, provides both
machine-readable and human-readable explanations of the semantics of the
attributes and their possible values.</p>
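
<p>The idea of a label as a set of attribute-value pairs tied to a rating
service can be sketched as below. This models the concept only and does not
reproduce the PICS label syntax: the dictionary layout, the service URL, the
attribute names and the function <tt>allowed</tt> are all hypothetical.</p>

```python
def allowed(label, policy):
    """Return True if the labeled document passes the parent's policy."""
    # The label's service URL says which rating scheme defines its attributes;
    # only labels from the service the parent chose are trusted here.
    if label["service"] != policy["service"]:
        return False
    for attribute, ceiling in policy["max"].items():
        if attribute not in label["ratings"]:
            return False  # unrated on this attribute: refuse, to be cautious
        if label["ratings"][attribute] > ceiling:
            return False
    return True


# Hypothetical rating service and attribute names.
policy = {
    "service": "http://ratings.example.org/v1",
    "max": {"violence": 1, "language": 2},
}
label = {
    "service": "http://ratings.example.org/v1",
    "for": "http://www.example.com/page.html",
    "ratings": {"violence": 0, "language": 2},
}
print(allowed(label, policy))
```

<p>The filter needs no knowledge of what the attributes mean; it only compares
values, while the meaning is left to the description published at the
service's URL.</p>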
<p><i>Figure: The RSAC-i rating scheme. An example of a PICS scheme.</i></p>

<p>PICS labels may be obtained in a number of ways. They may be transported on
CD-ROM, or they may be sent by a server along with labeled data. (PICS labels
may be digitally signed, so that their authenticity can be verified
independently of their method of delivery). They may also be obtained in real
time from a third party. This required a specification for a protocol for a
party A to ask a party B for any labels which refer to information originated
by party C.</p>

<p>Clearly, this technology, which is expected soon to be well deployed under
pressure about communications decency, is easily applied to many other uses.
The label querying protocol is the same as an annotation retrieval protocol.
Once deployed, it will allow label servers to present annotations as well as
normal PICS labels. PICS labels may of course be used for many different
things. Material will be able to be rated for quality for adult or scholarly
use, forming "Seals of Approval" and allowing individuals to select their
reading, buying, etc, wisely.</p>

<h4>Security and Ecommerce</h4>

<p>If the world works by the exchange of information and money, the web allows
the exchange of information, and so the interchange of money is a natural next
step. In fact, exchanging cash in the sense of unforgeable tokens is
impossible digitally, but many schemes which cryptographically or otherwise
provide assurances of promises to pay allow check book, credit card, and a
host of new forms of payment scheme to be implemented. This article does not
have space for a discussion of these schemes, nor of the various ways proposed
to implement security on the web. The ability of cryptography to ensure
confidentiality, authentication, non-repudiation, and message integrity is not
new. The current situation is that a number of proposals exist for specific
protocols for security, and for payment a fairly large and growing number of
protocols and research ideas are around. One protocol, Netscape's "Secure
Socket Layer", which gives confidentiality of a session, is well deployed. For
the sake of progress, the W3 Consortium is working on protocols to negotiate
the security and payment protocols which will be used.</p>

<h4>Machine interaction with the web</h4>

<p>To date, the principal machine analysis of material on the web has been its
textual indexing by search engines. Search engines have proven remarkably
useful, in that large indexes can be searched very rapidly, and obscure
documents found. They have proved to be remarkably useless, in that their
searches generally take only vocabulary of documents into account, and have
little or no concept of document quality, and so produce a lot of junk. Below
we discuss how adding documents with defined semantics to the web should
enable much more powerful tools.</p>

<p>Some promising new ideas involve analysis not only of the web, but of
people's interaction with it, to automatically reap more idea of quality and
relevance. Some of these programs, sophisticated search tools, have been
described as "agents" (because they act on behalf of the user), though the
term is normally used for programs that are actually mobile. There is
currently little generally deployed use of mobile agents. Mobile code is used
to create interesting human interfaces for data (such as Java "applets"), and
to bootstrap the user into new distributed applications. Potentially,
mobile code has a much greater impact on the architecture of software
on client and server machines. However, without a web of trust to allow mobile
programs (or indeed fixed web-searching programs) to act on a user's behalf,
progress will be very limited.</p>

<h2>Future directions</h2>

<p>Having summarized the origins of the Web, and its current state, we now
look at some possible directions in which developments could take it in the
coming years. One can separate these into three long term goals. The first
involves the improvement of the infrastructure, to provide a more functional,
robust, efficient and available service. The second is to enhance the web as a
means of communication and interaction between people. The third is to allow
the web, apart from being a space browseable by humans, to contain rich data
in a form understandable by machines, thus allowing machines to take a
stronger part in analyzing the web, and solving problems for us.</p>

<h3>Infrastructure</h3>

<p>When the web was designed, the fact that anyone could start a server, and
it could run happily on the Internet without regard to registration with any
central authority or with the number of other HTTP servers which others might
be running was seen as a key property, which enabled it to "scale". Today,
such scaling is not enough. The number of clients is so great that the need
is for a server to be able to operate more or less independently of the number
of clients. There are cases when the readership of documents is so great that
the load on servers becomes quite unacceptable.</p>

<p>Further, for the web to be a useful mirror of real life, it must be
possible for the emphasis on various documents to change rapidly and
dramatically. If a popular newscast refers by chance to the work of a
particular schoolchild on the web, the school cannot be expected to have the
resources to serve copies of it to all the suddenly interested parties.</p>

<p>Another cause for evolution is the fact that business is now relying on the
Web to the extent that outages of servers or network are not considered
acceptable. An architecture is required allowing fault tolerance. Both these
needs are addressed by the automatic, and sometimes preemptive, replication of
data. At the same time, one would not wish to see an exacerbation of the
situation suffered by Usenet News administrators who have to manually
configure the disk and caching times for different classes of data. One would
prefer an adaptive system which would configure itself so as to best use the
resources available to the various communities to optimize the quality of
service perceived. This is not a simple problem. It includes the problems
of</p>
<ul>
<li>categorizing documents and users so as to be able to treat them in
groups;</li>
<li>anticipating high usage of groups of documents by groups of users;</li>
<li>deciding on optimal placement of copies of data for rapid access;</li>
<li>an algorithm for finding the cheapest or nearest copy, given a URL;</li>
</ul>

<p>Resolution of these problems must occur within a context in which different
areas of the infrastructure are funded through different bodies with different
priorities and policies.</p>

<p>These are some of the long term concerns about the infrastructure, the
basic architecture of the web. In the shorter term, protocol designers are
increasing the efficiency of HTTP communication, particularly for the case of
a user whose performance limiting item is a telephone modem.</p>

<h3>Human Communication</h3>

<p>In the short term, work at W3C and elsewhere on improving the web as a
communications medium has mainly centered around the data formats for various
displayable document types: continued extensions to HTML, the new Portable
Network Graphics (PNG) specification, the Virtual Reality Modeling Language
(VRML), etc. Presumably this will continue, and though HTML will be considered
part of the established infrastructure (rather than an exciting new toy),
there will always be new formats coming along, and it may be that a more
powerful and perhaps a more consistent set of formats will eventually displace
HTML. In the longer term, there are other changes to the Web which will be
necessary for its potential for human communication to be realized.</p>

<p>We have seen that the Web initially was designed to be a space within which
people could work on an expression of their shared knowledge. This was seen as
being a powerful tool, in that</p>
<ul>
|
|
<li>when people combine to build a hypertext of their shared understanding,
|
|
they have it at all times to refer to, to allay misunderstandings of
|
|
one-time messages.</li>
|
|
<li>when new people join a team, they have all the legacy of decisions and
|
|
hopefully reasons available for their inspection;</li>
|
|
<li>when people leave a team, their work is captured and integrated already,
|
|
a "debriefing" not being necessary;</li>
|
|
<li>with all the workings of a project on the web, machine analysis of the
|
|
organization becomes very enticing, perhaps allowing us to draw
|
|
conclusions about management and reorganization which an individual person
|
|
would find hard to elucidate;</li>
|
|
</ul>
|
|
|
|
<p>The intention was that the Web should be used as a personal information
system and as a group tool at all scales, from the team of two to the world
population deciding on ecological issues. An essential power of the system, as
mentioned above, was the ability to move and link information between these
layers, bringing the links between them into clear focus, and helping maintain
consistency when the layers are blurred.</p>

<p>At the time of writing, the most famous aspect of the web is the corporate
site which addresses the general consumer population. Increasingly, the power
of the web within an organization is being appreciated, under the buzzword of
the "Intranet". It is of course by definition difficult to estimate the amount
of material on private parts of the web. However, when there were only a few
hundred public servers in existence, one large computer company had over a
hundred internal servers. Although setting up a private server needs some
attention to access control, once it is done its use is accelerated by the
fact that the participants share a level of trust, by being already part of a
company or group. This encourages information sharing at a more spontaneous
and direct level than the publication rites of passage appropriate for
public material.</p>

<p>A recent workshop shed light on a number of areas in which the Web
protocols could be improved to aid collaborative use:</p>
<ul>
<li>Better editors to allow direct interaction with web data;</li>
<li>Notification of those interested when information has changed;</li>
<li>Integration of audio and video internet conferencing technologies;</li>
<li>Hypertext links which represent in a visible and analyzable way the
semantics of human processes such as argument, peer review, and workflow
management;</li>
<li>Third party annotation servers;</li>
<li>Verifiable authentication, allowing group membership to be established
for access control;</li>
<li>The representation of links as first class objects with version control,
authorship and ownership;</li>
</ul>

<p>among others.</p>
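
<p>To make two of the items above concrete, the following is a minimal sketch
of a link treated as a first class object: it carries its own authorship and
version, and a relation type that a program can analyze. The
<code>Link</code> record, its field names, the relation names, and the
example URLs are all hypothetical; nothing here corresponds to an existing
Web protocol.</p>

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Link:
    """A link as an object in its own right (hypothetical record)."""
    source: str
    target: str
    relation: str  # semantics of the link, e.g. "supports" or "reviewed-by"
    author: str
    version: int = 1


def revise(link, **changes):
    """Return a new version of link; the earlier version is left untouched."""
    return replace(link, version=link.version + 1, **changes)


def links_of_type(links, relation):
    """Select the links that carry a given relation, for machine analysis."""
    return [link for link in links if link.relation == relation]


original = Link(
    source="http://example.org/proposal",
    target="http://example.org/review-1",
    relation="reviewed-by",
    author="alice",
)
revised = revise(original, target="http://example.org/review-2")
print(revised.version, revised.target)
```

<p>Because each revision is a new immutable record, the history of a link can
be kept alongside the history of the documents it connects.</p>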
<p>At the microcosmic end of the scale, the web should be naturally usable as
a personal information system. Indeed, it will not be natural to use the Web
until global data and personal data are handled in a consistent way. From the
human interface point of view, this means that the basic computer interface
which typically uses a "desktop" metaphor must be integrated with hypertext.
It is not as though there are many big differences: file systems have links
("aliases", "shortcuts") just like web documents. Useful information
management objects such as folders and nested lists will need to be
transferable in standard ways to exist on the web. The author also feels that
the importance of the filename in computer systems will decrease until the
ubiquitous filename dialog box disappears. What is important about information
can best be stated in its title and the links which exist in various forms,
such as enclosure of a file within a folder, appearance of an email address in
a "To:" field of a message, the relationship of a document to its author, etc.
These semantically rich assertions make sense to a person. If the user
specifies essential information such as the availability and reliability
levels required of access to a document, and the domain of visibility of a
document, then that leaves the system to manage the niceties of disk space in
such a way as to give the required quality of service.</p>

<p>The end result, one would hope, will be a consistent and intuitive universe
of information, some part of which is what one sees whenever one sees a
computer screen, whether it be a pocket screen, a living room screen, or an
auditorium screen.</p>

<h3>Machine interaction with the web</h3>

<p>As mentioned above, an early but long-term goal of the web development was
that, if the web came to accurately reflect the knowledge and interworkings of
teams of people, machine analysis would become a tool enabling us to
analyze the ways in which we interact, and facilitating our working together.
With the growth of commercial applications of the web, this extends to the
ideal of allowing computers to facilitate business, acting as agents with
power to act financially.</p>

<p>The first significant change required for this to happen is that data on
the web which is potentially useful to such a program must be available in a
machine-readable form with defined semantics. This could be done along the
lines of Electronic Data Interchange (EDI) [ref], in which a number of
forms such as offers for sale, bills of sale, title deeds, and invoices are
devised as digital equivalents of the paper documents. In this case, the
semantics of each form is defined by a human readable specification document.
Alternatively, general purpose languages could be defined in which assertions
could be made, within which axiomatic concepts could be defined from time to
time in human readable documents. In this case, the power of the language to
combine concepts originating from different areas could lead to a very much
more powerful system on which one could base machine reasoning systems.
Knowledge Representation (KR) languages are something which, while
interesting academically, have not had a wide impact on applications of
computers. But then, the same was true of hypertext before the Web gave it
global scope.</p>
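
<p>The second approach described above, a general purpose language of
assertions, can be sketched very roughly as follows. Each assertion is
written here as a (subject, predicate, object) triple, which is an assumed
representation chosen for brevity and not one proposed in this paper; the
identifiers and predicate names are likewise hypothetical. The point of the
sketch is only that a program can combine assertions from different areas, in
this case commerce and geography, to answer a question that neither set
answers alone.</p>

```python
def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


# Hypothetical assertions, as might be gathered from different documents.
ASSERTIONS = {
    ("doc:invoice-17", "type", "Invoice"),
    ("doc:invoice-17", "issued-by", "org:acme"),
    ("doc:offer-3", "issued-by", "org:initech"),
    ("org:acme", "located-in", "place:geneva"),
    ("org:initech", "located-in", "place:boston"),
}


def documents_issued_from(triples, place):
    """Combine 'issued-by' and 'located-in' assertions to find documents."""
    located = match(triples, predicate="located-in", obj=place)
    orgs = {subject for (subject, _, _) in located}
    issued = match(triples, predicate="issued-by")
    return sorted(doc for (doc, _, org) in issued if org in orgs)


print(documents_issued_from(ASSERTIONS, "place:geneva"))
```

<p>A real knowledge representation language would also need agreed
definitions for each predicate, which is the role played by the human
readable documents mentioned above.</p>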
<p>There is a bi-directional connection between developments in machine
processing of global data and in cryptographic security. For machine
reasoning over a global domain to be effective, machines must be able to
verify the authenticity of assertions found on the web: this requires a global
security infrastructure allowing signed documents. Similarly, a global
security infrastructure seems to need the ability to include, in the
information about cryptographic keys and trust, the manipulation of fairly
complex assertions. It is perhaps this chicken-and-egg interdependence which
has, along with government restrictions on the use of cryptography, delayed
the deployment of either kind of system to date.</p>

<p>The PICS system may be a first step in this direction, as its labels are
machine-readable.</p>

<h3>Ethical and social concerns</h3>

<p>At the first International World Wide Web Conference in Geneva in May 1994,
the author made a closing comment that, rather than working in a purely
academic or technical field, the engineers would find that many ethical and
social issues were being addressed by the kinds of protocol they designed, and
that they should therefore not consider those issues to be somebody else's
problem. In the short time since then, such issues have appeared with
increasing frequency. The PICS initiative showed that the form of network
protocols can affect the form of a society which one builds within the
information space.</p>

<p>Now we have concerns over privacy. Is the right to a really private
conversation one which we enjoy only in the middle of a large open space, or
should we give it to individuals connected across the network? Concepts of
intellectual property, central to our culture, are not expressed in a way
which maps onto the abstract information space. In an information space, we
can consider the authorship of materials, and their perception; but we have
seen above how there is a need for the underlying infrastructure to be able to
make copies of data simply for reasons of efficiency and reliability. The
concept of "copyright" as expressed in terms of copies made makes little
sense. Furthermore, once those copies have been made automatically by the
system, there is the possibility of their being seized, and of a conversation
considered private being later exposed. Indeed, it is difficult to list all
the ways in which privacy can be compromised, as operations which were
previously manual can be done in bulk extremely easily. How can content
providers get feedback about the demographic make-up of those browsing their
material, without compromising individual privacy? Though boring in small
quantities, the questions individuals ask of search engines, in bulk, could be
compromising information.</p>

<p>In the long term, there are questions as to what will happen to our
cultures when geography becomes weakened as a diversifying force. Will the net
lead to a monolithic (American) culture, or will it foster even more disparate
interest groups than exist today? Will it enable a true democracy by informing
the voting public of the realities behind state decisions, or in practice will
it harbor ghettos of bigotry where emotional intensity rather than truth gains
the readership? It is for us to decide, but it is not trivial to assess the
impact of simple engineering decisions on the answers to such questions.</p>

<h3>Conclusion</h3>

<p>The Web, like the Internet, is designed so as to create the desired "end to
end" effect, whilst hiding to as large an extent as possible the intermediate
machinery which makes it work. If the law of the land can respect this, and
be couched in "end to end" terms, such that no government or other
interference in the mechanisms is legal that would break the end to end rules,
then it can continue in that way. If not, engineers will have to learn the
art of designing systems so that the end to end functionality is guaranteed
whatever happens in between. What TCP did for reliable delivery (providing
it end-to-end when the underlying network itself did not provide it),
cryptography is doing for confidentiality. Further protocols may do this for
information ownership, payment, and other facets of interaction which are
currently bound by geography. For the information space to be a powerful place
in which to solve the problems of the next generations, its integrity,
including its independence of hardware, packet route, operating system, and
application software brand, is essential. Its properties must be consistent,
reliable, and fair, and the laws of our countries will have to work hand in
hand with the specifications of network protocols to make that so.</p>

<h2>References</h2>

<p>Space is insufficient for a bibliography for a field involving so much work
by so many. The World Wide Web has a dedicated series of conferences run by an
independent committee. For papers on advances and proposals on Web related
topics, the reader is directed to past and future conferences. The proceedings
of the last two conferences to date are as below.</p>

<p><i>Proceedings of the Fourth International World Wide Web Conference</i>
<i>(Boston 1995)</i>, The World Wide Web Journal, Vol. 1, Iss. 1, O'Reilly,
Nov. 1995. ISSN 1085-2301, ISBN: 1-56592-169-0. [Later issues may also be of
interest.]</p>

<p><i>Proceedings of the Fifth International World Wide Web Conference</i>,
Computer Networks and ISDN Systems, Vol. 28, Nos. 7-11, Elsevier, May 1996.</p>

<p>Also referred to in the text:</p>

<p>[1] Bush, Vannevar, "As We May Think", <i>Atlantic Monthly</i>, July 1945.
(Reprinted also in the following:)</p>

<p>[2] Nelson, Theodore, <i>Literary Machines 90.1</i>, Mindful Press,
1990</p>

<p>[3] Engelbart, Douglas, <i>Boosting Our Collective IQ - Selected
Readings</i>, Bootstrap Institute/BLT Press, 1995, &lt;AUGMENT,133150,&gt;,
ISBN:1-895936-01-2</p>

<p>[5] On Gopher, see F. Anklesaria, M. McCahill, P. Lindner, D. Johnson, D.
John, D. Torrey, B. Alberti, "The Internet Gopher Protocol (a distributed
document search and retrieval protocol)", RFC 1436, March 1993,
http://ds.internic.net/rfc/rfc1436.txt</p>

<p>[6] On EDI, see http://polaris.disa.org/edi/edihome.htp</p>

</body>
</html>