<?xml version="1.0" encoding="iso-8859-1"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content=
"HTML Tidy for Mac OS X (vers 31 October 2006 - Apple Inc. build 13), see www.w3.org" />
<title>
Fractal Web - Commentary on Web Architecture
</title>
<link rel="Stylesheet" href="di.css" type="text/css" />
<meta http-equiv="content-type" content=
"text/html; charset=us-ascii" />
</head>
<body bgcolor="#DDFFDD" xml:lang="en" lang="en" text="#000000">
<address>
Tim Berners-Lee<br />
Date: 1998, last change: $Date: 2011/09/27 22:31:21 $<br />
Status: personal view only. Editing status: Mature. Appended
to at intervals when new things turn up.
</address>
<p>
<a href="./">Up to Design Issues</a>
</p>
<h3>
Commentary on Architecture
</h3>
<hr />
<h1>
The Scale-free nature of the Web
</h1>
<p>
This article was originally entitled "The Fractal nature of
the web". Since then, I have been assured that while many
people seem to use <em>fractal</em> to refer to a Zipf (1/f)
distribution, it should really only be used in spaces of
finite dimension, like the two-dimensional planes of
Mandelbrot sets. The correct term for the Web, then, is
<em>scale-free</em>.
</p>
<p>
This isn't an observation so much as a requirement.
</p>
<p>
I have <a href="#Berners-Lee">discussed elsewhere</a> how we
must avoid the two opposite social deaths of a global
monoculture and a set of isolated cults, and how the fractal
patterns found in nature seem to present themselves as a good
compromise. It seems that the compromise between stability
and diversity is served by there being the same amount of
structure at all scales. I have no mathematical theory to demonstrate
that this is an optimization of some metric for the
resilience of society and its effectiveness as an organism,
nor have I even that metric. (Mail me if you do!)
</p>
<p>
However, it seems from experience that groups are stable when
they have a set of peers, when they have a substructure.
Neither the set of peers nor the substructure must involve
huge numbers, as groups cannot "scale", that is, work
effectively with a very large number of liaisons with peers,
or when composed as a set of a very large number of parts. If
this is the case then by induction there must be a continuum
of group sizes from the very largest to the very smallest.
</p>
<p>
This seems to be a general rule which can guide our design,
and against which we can measure actual patterns of use.
</p>
<p>
It is in fact another aspect of the tension between many
languages and one global language. Locally defined languages
are easy to create, needing local consensus about meaning:
only a limited number of people have to share a mental
pattern of relationships which define the meaning. However,
global languages are so much more effective at communication,
reaching the parts that local languages cannot. This tension
is exemplified in the standards process, when ideas have to
be exposed to successively larger and larger groups, with
friction and hard work at each stage.
</p>
<p>
Other interesting things to model passing through a fractal
system include DNA traits in intermarrying populations.
(Someone suggested (who?) that the invention of the bicycle
made a great difference to average health in the Welsh
valleys because it allowed greater intermarrying and so
increased the effective gene pool size. Clearly, global travel
could end up reducing the diversity.) Viruses propagating
through schools and traveling business people, and problems
propagating to someone who has a solution, are more good
exercises (State your assumptions!).
</p>
<h3>
Zipf happens
</h3>
<p>
Whether we like it or not, early measurements of web traffic
by the DEC WRL firewall showed DEC employees browsing sites
with a Zipf (1/n) distribution of popularity. (Anyone got any
other measurements? [Nielsen 1997]). Recent analyses seem to
suggest that the Web is becoming smaller for its size.
</p>
<p>
How can we use knowledge of the Web's fractal nature? By
planning network bandwidth between long-range and short-range
communication, planning for cache usage, etc. The physical
network can be expected to have a variety of scale
geographically, like the road system. However, the structure
of the Web is interestingly different because of the lack of
two-dimensional constraint. The challenge is to use this
flexibility in building an effective society on top of the
Web.
</p>
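<p>
To make the cache-planning point concrete, here is a minimal
sketch. It assumes an idealized Zipf law in which the n-th most
popular site draws traffic exactly proportional to 1/n, so a
cache holding the top k of N sites serves a fraction H(k)/H(N)
of requests, where H is the harmonic number. The function names
and the figure of one million sites are illustrative
assumptions, not measurements.
</p>

```python
def harmonic(n):
    """Return the n-th harmonic number 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / i for i in range(1, n + 1))


def zipf_hit_ratio(cached_sites, total_sites):
    """Fraction of requests that hit a cache of the most popular sites,
    assuming request popularity is exactly proportional to 1/rank."""
    return harmonic(cached_sites) / harmonic(total_sites)


if __name__ == "__main__":
    total = 1_000_000  # hypothetical number of distinct sites
    for cached in (10, 1_000, 100_000):
        print(cached, round(zipf_hit_ratio(cached, total), 3))
```

<p>
Under this assumption each tenfold increase in cache size adds
roughly the same amount to the hit ratio, which is the
scale-free pattern showing up again.
</p>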
<h3>
Looking for a metric
</h3>
<p>
What do we mean by "effective"? We mean we would like to
combine scientists' creative ability and knowledge to find a
cure for AIDS. We would like to preserve world peace by
allowing xenophobia to disperse in a web of understanding,
while at the same time preserving the diversity of culture
which gives the human race its richness. These are of course
the same classic problems of the management of a large
organization, of combining individual creativity with
corporate vision.
</p>
<p>
If the web of society has an imbalance, we pay for it. We pay
for insufficient global understanding with war. We pay for
insufficient family communication with broken families and
unsupported individuals. At any level of scale, missing
social structure at that scale will prevent problems at that
scale being addressed, and also prevent resources at that
scale being used. It would therefore be great to have a way
of measuring for a given web the degree to which it has a
balanced fractal pattern, and if not where its weaknesses
are.
</p>
<p>
Those looking for the "small world" effect chose metrics such
as the maximum or mean value of the shortest path between any
two points. This gives us a metric for effectiveness at the
global scale, but not of the chewiness.
</p>
<p>
Clustering algorithms can produce globs of various sizes, and
a measure of the chewiness of a web may be that the cluster
sizes have a Zipf distribution. For example, using Jon
Kleinberg's algorithm (which for a link matrix A associates
concepts with the eigenvectors of A*A), the strength of the
cluster is the value of the eigenvalue, and (while this does
not directly indicate size) an interesting test would be on
the relative absolute values (squares?) of successive
eigenvalues.
</p>
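<p>
The eigenvalue test suggested above can be tried on a toy
graph. The sketch below reads A*A as the product of the
transpose of A with A, as in the hubs-and-authorities
formulation, and finds the largest eigenvalues by plain power
iteration with deflation. The helper names and the eight-page
example graph are assumptions made for illustration only.
</p>

```python
import math


def gram_matrix(links):
    """Return M = A^T A for a square 0/1 link matrix A,
    where links[i][j] == 1 means page i links to page j."""
    n = len(links)
    return [[sum(links[k][i] * links[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]


def dominant_eigenpair(matrix, iterations=500):
    """Power iteration on a symmetric positive semi-definite matrix:
    approximate its largest eigenvalue and a unit eigenvector."""
    n = len(matrix)
    vector = [1.0 + i / n for i in range(n)]  # arbitrary positive start
    norm = math.sqrt(sum(x * x for x in vector))
    vector = [x / norm for x in vector]
    value = 0.0
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(n))
                   for i in range(n)]
        value = math.sqrt(sum(x * x for x in product))
        if value == 0.0:  # nothing left to find
            break
        vector = [x / value for x in product]
    return value, vector


def leading_eigenvalues(matrix, count):
    """Return the `count` largest eigenvalues, found one at a time.
    After each one, its component is subtracted (deflation) so that
    the next-largest eigenvalue dominates the following iteration."""
    n = len(matrix)
    work = [row[:] for row in matrix]
    values = []
    for _ in range(count):
        value, vector = dominant_eigenpair(work)
        values.append(value)
        work = [[work[i][j] - value * vector[i] * vector[j]
                 for j in range(n)] for i in range(n)]
    return values


if __name__ == "__main__":
    size = 8
    links = [[0] * size for _ in range(size)]
    # Pages 0, 1, 2 all cite pages 3 and 4; pages 5 and 6 cite page 7.
    for source, target in [(0, 3), (0, 4), (1, 3), (1, 4), (2, 3), (2, 4),
                           (5, 7), (6, 7)]:
        links[source][target] = 1
    print(leading_eigenvalues(gram_matrix(links), 3))
```

<p>
In this toy graph the two groups of co-cited pages appear as
eigenvalues close to 6 and 2, and everything after that is
close to zero. On real data one would look at how the ratios of
successive eigenvalues fall off.
</p>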
<p>
Looking at it from the point of view of an individual (a
graph node), an interesting question is the proportion of the
traffic which is to local or more distant nodes. In
Marchiori's model [<a href="#marchiori">Marchiori</a>]
traffic flows between two nodes in inverse proportion to the
resistance of the shortest path. The total "efficiency" is
deemed to be the total flow between all pairs of nodes. Can
we measure a "chewiness" which measures the approximation of
the system to a fractal distribution of long and short range
communication? If the Marchiori model were modified to use
parallel conductance (more like a real signal flow system)
then would this be simpler?
</p>
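<p>
As a rough illustration of the shortest-path version of this
measure, the sketch below treats every link as having unit
resistance, so the flow between two nodes is taken as one over
the number of hops between them. Averaging over all ordered
pairs is a normalization chosen here for convenience, and the
eight-node ring is a made-up example.
</p>

```python
from collections import deque


def hop_distances(graph, source):
    """Breadth-first search: hop count from source to each reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


def efficiency(graph):
    """Mean of 1/hops over all ordered pairs of distinct nodes.
    Unreachable pairs contribute nothing. Needs at least two nodes."""
    nodes = list(graph)
    total = 0.0
    for source in nodes:
        for target, hops in hop_distances(graph, source).items():
            if target != source:
                total += 1.0 / hops
    return total / (len(nodes) * (len(nodes) - 1))


def with_link(graph, a, b):
    """Return a copy of the graph with an extra two-way link a-b."""
    copy = {node: list(neighbours) for node, neighbours in graph.items()}
    copy[a].append(b)
    copy[b].append(a)
    return copy


if __name__ == "__main__":
    # A ring of eight nodes: purely local links.
    ring = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
    print("ring only:", round(efficiency(ring), 4))
    print("one long-range link:", round(efficiency(with_link(ring, 0, 4)), 4))
```

<p>
Adding a single long-range link raises the figure, but the
number alone does not say whether the mix of short and long
links is balanced, which is the "chewiness" question.
</p>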
<p>
Suppose for example we look at the amount of connection we
have with nodes whose distance, or groups whose size, is of
each order of magnitude and look for smoothness up to the
global level.
</p>
<h3>
Stop Press
</h3>
<p>
<em>2000/03</em>
</p>
<p>
Well, here I was thinking that while it is intuitively clear
that society has to be fractal, I didn't know a mathematical
justification for it, when <a href=
"http://www.cs.cornell.edu/home/kleinber/kleinber.html">Jon
Kleinberg</a> comes up with what for me is his second cool
web result.
</p>
<p>
This paper takes the case of a two-dimensional grid. It
imagines each cell having a certain distribution of links of
various lengths. It demonstrates that in order to achieve the
connectivity a la <em>6 degrees of separation</em> which
scales with the log of the size of the system, then the
distribution of link density as a function of distance must
be precisely an inverse-square law. That is, each cell must
have the same number of links (on average) to cells 1-10
squares away as to cells 10-100 away, etc. Anything more
local or more global leads to less of a small-world
phenomenon: this is the only scalable solution.
</p>
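<p>
The "same number of links per decade of distance" statement can
be checked numerically. The sketch below assumes an unbounded
square grid measured in Manhattan distance, where there are 4d
cells at distance d, and weights a link to each such cell by d
raised to minus the exponent. The function name and the choice
of bands are illustrative.
</p>

```python
def links_in_band(inner, outer, exponent=2.0):
    """Relative expected number of links from one cell to the cells
    whose Manhattan distance d lies in range(inner, outer), when a
    link to a cell at distance d has weight d ** -exponent.
    There are 4 * d cells at Manhattan distance d on a square grid."""
    return sum(4 * d * d ** -exponent for d in range(inner, outer))


if __name__ == "__main__":
    bands = [(1, 10), (10, 100), (100, 1000)]
    for exponent in (1.0, 2.0, 3.0):
        counts = [round(links_in_band(a, b, exponent), 2) for a, b in bands]
        print("exponent", exponent, counts)
```

<p>
With exponent 2 the three bands come out nearly equal, and they
settle toward 4 ln 10 as the distances grow. A smaller exponent
piles the links up at long range and a larger one keeps them
local, which matches the "more global" and "more local" cases
described above.
</p>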
<p>
True, this applies to a geographical grid, and a square
rather uniform one at that. However, he does generalize it to
more dimensions. Furthermore, you can see logically how the
system works. To get a postcard to an arbitrary person in
Massachusetts through a network of friends, you must have
enough local friends to be able to find someone who will know
someone in Massachusetts. The person they find in
Massachusetts must be able to pass it to people successively
closer and closer to the target. This only works if there is
connectivity on each scale. True, no one has derived the
metric of the number of hops a message takes as being an
essential metric for systems, but on the other hand there is
a clear analogy with the number of hops between a problem and
a solution in a large organization.
</p>
<p>
Other work:
</p>
<ul>
<li>
<a href="http://dmag.upf.es/livingsw">Living semantic
web</a>
</li>
</ul>
<h3>
Data from Swoogle April 2005
</h3><img style="width: 500px; height: 400px; float: right;"
alt="Yes, zipf dist from Swoogle" src=
"diagrams/swoogle/figure6-2005-04.png" /><br />
Nice to see some Zipf-shaped curves. &nbsp;Swoogle <a href=
"http://swoogle.umbc.edu/modules.php?name=Swoogle_Statistics&amp;file=figure&amp;figurename=figure6">
notes</a>:
<ul>
<li>All these series follows Zipf's distribution, except the
tail
</li>
<li>The sharp decrease the tail in "class populated" shows
that the most populated classes highly correlated such that
their are populated by almost the same amount of SWDs.
Similar situation can be observed in other series.
</li>
<li>The closeness of the sharp decrease of "class populated"
and "property populated" is caused by the co-existence of
certain classes and certain properties.
</li>
</ul>
<h2 id="personal">
Postscript - A personal exercise
</h2>
<p>
There will I am sure be a lot of ways in which the fractal
requirement is used in web design. You can also use it in
the task of figuring out how you fit into society at large
(and at small). Do your personal interactions spread across
the scales? Here is a self-help chart to help think about
this. You fill in the groups in your life.
</p>
<table border="1">
<tbody>
<tr>
<th>
Scale
</th>
<td>
1
</td>
<td>
10
</td>
<td>
1000
</td>
<td>
10k
</td>
<td>
100k
</td>
<td>
1M
</td>
<td>
10M
</td>
<td>
100M
</td>
<td>
1G
</td>
</tr>
<tr>
<th>
Group
</th>
<td>
You
</td>
<td>
family,
<p>
group
</p>
</td>
<td>
...
</td>
<td>
...
</td>
<td>
town?
</td>
<td>
city?
</td>
<td>
country?
</td>
<td>
USA
</td>
<td>
World population
</td>
</tr>
<tr>
<th>
Time spent
</th>
<td>
?
</td>
<td>
?
</td>
<td>
?
</td>
<td>
?
</td>
<td>
?
</td>
<td>
?
</td>
<td>
?
</td>
<td>
?
</td>
<td>
?
</td>
</tr>
<tr>
<th>
Money spent
</th>
<td>
?
</td>
<td>
?
</td>
<td>
?
</td>
<td>
?
</td>
<td>
?
</td>
<td>
?
</td>
<td>
?
</td>
<td>
?
</td>
<td>
?
</td>
</tr>
<tr>
<th>
etc
</th>
<td>
?
</td>
<td>
?
</td>
<td>
?
</td>
<td>
?
</td>
<td>
?
</td>
<td>
?
</td>
<td>
?
</td>
<td>
?
</td>
<td>
?
</td>
</tr>
</tbody>
</table>
<p>
Another way to do this is find 11 jars, and label one with
each scale in powers of 10. (You don't have to paint them but
it helps).
</p>
<p>
<img src="diagrams/jars.png" alt="11 jars from 1 to 1G" />
</p>
<p>
Put marbles in each jar for each time period you spend on
matters at a given scale, such as an international meeting,
or a school sportsfield, or with your family, or alone in a
treehouse. How well balanced do the jars become?
</p>
<p>
As a social person, do you spend enough time with groups of
each size? If not, are there people one click from you who
do, and through whom you are indirectly present in those
groups? One of the concerns is that the last column - the
global column - tends in my observation to get the smallest
amount of money at least, as in the US federal and state and
town taxes are spread around the other areas but the level of
international aid is very much lower. The cool thing is that
I think people are born with DNA which gives them a healthy
interest at all these levels. People who stick at one scale
all their lives feel very uncomfortable. Maybe our
preferences have evolved to form naturally a fractal society.
</p>
<h3>
<a name="tco" id="tco">Total Cost of Ontologies (2005)</a>
</h3>(I can't remember where I originally brought this up, I
think at the Web Science workshop in London 2005/9. This is
from ISWC 2005 slides.)
<p>
One of the interesting things about assuming a fractal
distribution is you can think about the number of ontologies
and the time it takes to make them, and the total cost of
using ontologies. So let us for example naively assume
that<br />
ontologies are evenly spread across orders of magnitude;
committee size goes as log(community), time
as committee^2, cost is shared across community.<br />
</p>
<table style="text-align: left; width: 100%;" border="1"
cellpadding="2" cellspacing="2">
<tbody>
<tr>
<td>
Scale
</td>
<td>
Eg
</td>
<td>
Committee size
</td>
<td>
Cost per ontology (weeks)
</td>
<td>
Cost for me
</td>
</tr>
<tr>
<td>
1
</td>
<td>
Me
</td>
<td>
1
</td>
<td>
1
</td>
<td>
1.000000
</td>
</tr>
<tr>
<td>
10
</td>
<td>
My team
</td>
<td>
4
</td>
<td>
16
</td>
<td>
1.600000
</td>
</tr>
<tr>
<td>
100
</td>
<td>
Group
</td>
<td>
7
</td>
<td>
49
</td>
<td>
0.490000
</td>
</tr>
<tr>
<td>
1000
</td>
<td></td>
<td>
10
</td>
<td>
100
</td>
<td>
0.100000
</td>
</tr>
<tr>
<td>
10k
</td>
<td>
Enterprise
</td>
<td>
13
</td>
<td>
169
</td>
<td>
0.016900
</td>
</tr>
<tr>
<td>
100k
</td>
<td>
Business area
</td>
<td>
16
</td>
<td>
256
</td>
<td>
0.002560
</td>
</tr>
<tr>
<td>
1M
</td>
<td></td>
<td>
19
</td>
<td>
361
</td>
<td>
0.000361
</td>
</tr>
<tr>
<td>
10M
</td>
<td></td>
<td>
22
</td>
<td>
484
</td>
<td>
0.000048
</td>
</tr>
<tr>
<td>
100M
</td>
<td>
National, State
</td>
<td>
25
</td>
<td>
625
</td>
<td>
0.000006
</td>
</tr>
<tr>
<td>
1G
</td>
<td>
EU, US
</td>
<td>
28
</td>
<td>
784
</td>
<td>
0.000001
</td>
</tr>
<tr>
<td>
10G
</td>
<td>
Planet
</td>
<td>
31
</td>
<td>
961
</td>
<td>
0.000000
</td>
</tr>
</tbody>
</table><br />
Total cost of 10 ontologies: 3.2 weeks. Serious project: 30
ontologies, TCO = 10 weeks.<br />
Lesson: <span style="font-weight: bold;">Do your bit. Others
will do theirs.</span><br />
Thank those who do working groups.
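<p>
The table can be rebuilt from the stated assumptions with a few
lines of code. The committee sizes in the table fit
3 * log10(community) + 1. That formula is inferred from the
numbers rather than stated in the slides, so treat it as an
assumption. Time is committee size squared, in weeks, and my
share is that time divided by the community size.
</p>

```python
def ontology_costs(max_exponent=10):
    """Rebuild the table rows as tuples of (community size,
    committee size, cost per ontology in weeks, my share of that cost)."""
    rows = []
    for exponent in range(max_exponent + 1):
        community = 10 ** exponent
        committee = 3 * exponent + 1  # inferred fit: 3 * log10(community) + 1
        weeks = committee ** 2        # time goes as committee size squared
        rows.append((community, committee, weeks, weeks / community))
    return rows


if __name__ == "__main__":
    rows = ontology_costs()
    for community, committee, weeks, share in rows:
        print(f"{community:12d} {committee:3d} {weeks:4d} {share:.6f}")
    total = sum(share for _, _, _, share in rows)
    print(f"one ontology at each scale: {total:.1f} weeks")
    print(f"three ontologies at each scale: {3 * total:.1f} weeks")
```

<p>
Summing the last column gives about 3.2 weeks, matching the
total quoted above, and three ontologies at every scale comes
to about 9.6 weeks, in line with the round figure of 10.
</p>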
<h3>
<a name="exp" id="exp">Q: How can the semantic web
work...</a>
</h3>
<p>
<em>... when we are all in one big domain of discourse but
people are all making their own local ontologies?</em>
(2007/3/3)
</p>
<p>
Rather than 'domain of discourse', or set of things
considered, I think of 'community', set of agents
communicating using certain terms. When one thinks in terms
of domain of discourse, one tends to conclude that everyone
who talks at all about a car (say) has cars in their domain of
discourse and so everyone must share the model which includes
the single class Car.
</p>
<p>
It isn't like that though. An agent plays a role in many
different overlapping communities. When I tag a photo as
being of my car, or I agree to use my car in a car pool, or
when I register the car with the Registry of Motor Vehicles,
I probably use different ontologies. There is some finite
effort it would take to integrate the ontologies, to
establish some OWL (or rules, etc) to link them.
</p>
<ul>
<li>Everyone is encouraged to reuse other people's classes
and properties to the greatest extent they can.
</li>
<li>Some ontologies will already exist and be publicly shared
by many, such as ical:dtstart, geo:longitude, etc. This is
the single global community.
</li>
<li>Some ontologies will be established by smaller
communities of many sizes.
</li>
</ul>
<p>
Why do I think the structure should be, and will be, fractal?
Clearly there will be many more small communities, local
ontologies, than global ones. Why a 1/f distribution? Well,
it seems to occur in many systems including the web, and may
be optimal for some problems. That we should design for a
fractal distribution of ontologies is a hunch. But it does
solve the issue you raise. Some aspects of the web have been
shown to be fractal already.
</p>Here are some properties of the interconnections:
<ul>
<li>The connections between the ontologies may be made
after their creation, not necessarily involving the original
ontology designers.
</li>
<li>There is a cost of connecting ontologies, figuring out
how they connect, which people will pay when and only when
they need the benefit of extra interoperability.
</li>
<li>Sometimes when connecting ontologies, it is so awkward
there is pressure to change the terms that one community uses
to fit in better with the other community. Again, a finite
cost to make the change, against a benefit of more interop.
</li>
</ul>
<p>
Yes, if web-based means an overlapping set of many ontologies
in a fractal distribution. In this fractal tangle, there will
be several recurring patterns at different scales. One
pattern is a local integration within (say) an enterprise,
which starts point-point (problems scale as n^2) and then
shifts with EIA to a hub-and-spoke as you say, where the
effort scales as n. Then the hub is converted to use RDF, and
that means the hub then plugs into an external bus, as it
connects to shared ontologies.
</p>
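<p>
The difference between the two integration patterns is easy to
tabulate. This sketch assumes one two-way mapping per pair of
systems in the point-to-point case and one mapping per system
onto a shared hub model in the hub-and-spoke case. The function
names and system counts are hypothetical.
</p>

```python
def point_to_point_mappings(systems):
    """Every pair of systems needs its own mapping: n * (n - 1) / 2."""
    return systems * (systems - 1) // 2


def hub_and_spoke_mappings(systems):
    """Each system is mapped once onto the shared hub model: n."""
    return systems


if __name__ == "__main__":
    for systems in (5, 20, 100):
        print(systems,
              point_to_point_mappings(systems),
              hub_and_spoke_mappings(systems))
```

<p>
For 100 systems that is 4950 pairwise mappings against 100
mappings to the hub, which is why the pattern shifts as the
enterprise grows.
</p>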
<p>
So the idea is that in any one message, some of the terms
will be from a global ontology, some from subdomains. The
amount of data which can be reused by another agent will
depend on how many communities they have in common, how many
ontologies they share.
</p>
<p>
In other words, one global ontology is not a solution to the
problem, and a local subdomain is not a solution either. But
if each agent uses a mix of a few ontologies of different
scales, that forms a global solution to the problem.
</p>
<h2>
Conjecture
</h2>
<p>
The conjecture is that there is some model which describes
these systems reasonably well, and that given that model one
can show that the scale-free distribution of communities is
optimal.
</p>
<p>
There are many other questions. Of course existing systems on
the earth may be very much influenced by the geographical
reality of a two-dimensional surface. Historical groups have
been nested geographically. So though there may be aspects in
which community size is scale-free, that may be a completely
different optimisation problem from the one we have when on
the Internet anyone can connect to anyone. If you could
devise an algorithm for connecting people into groups, so
that they each participated in communities of different sizes
in a scale-free way, then how much more effective (at solving
problems, etc) can you make a web-based society which ignores
geographical borders? To what extent does humanity as
currently connected by the web in fact deviate from
geographical nesting anyway?
</p>
<hr />
<h2>
References
</h2>
<p>
Jakob Nielsen "<a href=
"http://www.useit.com/alertbox/zipf.html">Zipf Curves and
Website Popularity</a>", (Sidebar to column <a href=
"http://www.useit.com/alertbox/9704b.html">Increasing returns
for websites</a>)
</p>
<p>
<a name="R&Eacute;KA" id="R&Eacute;KA">R&Eacute;KA ALBERT</a>
<em>et al:</em> <a href=
"http://www.nature.com/server-java/Propub/nature/401130A0.frameset">
Diameter of the World-Wide Web,</a> Nature
<strong>401</strong>, 130 (1999) <em>Brief
communications</em>
</p>
<p>
<a name="Berners-Le" id="Berners-Le">Berners-Lee, T</a>,
"<a href="/People/Berners-Lee/Weaving">Weaving the Web</a>",
HarperSanFrancisco 1999, pp199-204
</p>
<p>
<a href="http://doi.acm.org/10.1145/572326.572328">Dill, S,
et al., "Self-similarity in the web"</a> ACM Transactions on
Internet Technology (TOIT) Volume 2, Issue 3 (August
2002). Thanks Jim Hendler for the pointer. Findings seem to
justify the ideas above.
</p>
<p>
DECWRL results, presented at an early WWW conference.
</p>
<p>
<a name="marchiori" id="marchiori">Marchiori M &amp; Latora
V, "</a><a href=
"http://axpfct.ct.infn.it/%7Elatora/harmony_physicaA2000.pdf">Harmony
in the small world</a>". Private communication 1999. Later
published in <em>Physica A</em>, vol. 285 (pages 539-546),
2000.
</p>
<p>
<a name="Kleinberg" href=
"http://www.cs.cornell.edu/home/kleinber/kleinber.html" id=
"Kleinberg">Jon Kleinberg</a>, <a href=
"http://www.cs.cornell.edu/home/kleinber/swn.ps">The
small-world phenomenon: An algorithmic perspective.</a>
Cornell Computer Science Technical Report 99-1776, October
1999. (<a href=
"http://www.cs.cornell.edu/home/kleinber/swn.ps">ps</a>,
&nbsp;In)
</p>
<p>
Daniel A. Menasc&eacute; et al., <em><a href=
"http://www2002.org/CDROM/alternate/724/">Fractal
Characterization of Web Workloads</a></em>,
</p>
<h2>
Follow up
</h2>
<p>
Things which turned up later, not necessarily referencing this.
</p>
<p>
T. Berners-Lee and L.Kagal, <a href="http://dig.csail.mit.edu/2007/Papers/AIMagazine/fractal-paper.pdf">
The Fractal Nature of the Semantic Web</a>
AI Magazine, 2007.
</p>
<p>
Tim Berners-Lee, "Its just like a bag of chips", in Gov 2.0 Expo 2010.<br/>
<object width="640" height="385"><param name="movie"
value="http://www.youtube.com/v/ga1aSJXCFe0?fs=1&amp;hl=en_US"></param><param
name="allowFullScreen" value="true"></param><param name="allowscriptaccess"
value="always"></param><embed src="http://www.youtube.com/v/ga1aSJXCFe0?fs=1&amp;hl=en_US"
type="application/x-shockwave-flash" allowscriptaccess="always"
allowfullscreen="true" width="640" height="385"></embed></object>
</p>
<p>
Joab Jackson, <a href="http://www.itworld.com/software/109194/berners-lee-deconstructs-a-bag-chips">
<em>Berners-Lee deconstructs a bag of chips</em></a> IT World, May 27, 2010
</p>
<p>
Paul Barford and Sally Floyd, <a href=
"http://www.cs.bu.edu/pub/barford/ss_lrd.html"><em>Self-similarity
and long range dependence in networks</em></a> web site.
</p>
<p>
Clay Shirky, <a href=
"http://www.shirky.com/writings/powerlaw_weblog.html"><em>Power
Laws, Weblogs, and Inequality</em></a>
</p>
<p>
Kottke, <a href=
"http://www.kottke.org/03/02/weblogs-and-power-laws"><em>Weblogs
and power laws</em></a>, February 09, 2003 at 06:39 pm.
Distribution of links to the top blogs follows a power law.
</p>
<p>
Richard McManus, <a href=
"http://www.readwriteweb.com/archives/fractal_web_app.php"><em>
Fractal Web applied to Blogging</em></a>, January 15, 2004.
<cite>"As you have seen, the Tim Berners-Lee interview [with
Christopher Lydon] has inspired me to think and write about
how I can improve my 'fractibility' (if there is such a
word)!)"</cite>
</p>
<hr />
<p>
<a href="Overview.html">Up to Design Issues</a>
</p>
<p>
<a href="../People/Berners-Lee">Tim BL</a>
</p>
</body>
</html>