You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
860 lines
34 KiB
860 lines
34 KiB
<html xmlns="http://www.w3.org/1999/xhtml">
|
|
<head>
|
|
<meta name="generator" content=
|
|
"HTML Tidy for Mac OS X (vers 31 October 2006 - Apple Inc. build 13), see www.w3.org" />
|
|
<title>
|
|
Web architecture: Metadata
|
|
</title>
|
|
<link href="di.css" rel="stylesheet" type="text/css" />
|
|
<meta http-equiv="Content-Type" content="text/html" />
|
|
</head>
|
|
<body bgcolor="#DDFFDD" text="#000000">
|
|
<address>
|
|
Tim Berners-Lee
|
|
<p>
|
|
Date started: January 6, 1997
|
|
</p>
|
|
<p>
|
|
. Status: personal view, but corresponds generally to
|
|
the W3C architecture for metadata.
|
|
</p>
|
|
<p>
|
|
.
|
|
</p>
|
|
<p>
|
|
Additions are at the end about consistency in
|
|
label/metaset/collection syntax and semantics.
|
|
</p>
|
|
<p>
|
|
The syntaxes used in this document are meant to illustrate
|
|
the architecture and be clear but are otherwise random.
|
|
This note was written before the more general <a href=
|
|
"Semantic.html">Semantic Web</a> note.
|
|
</p>
|
|
</address>
|
|
<p>
|
|
<a href="Overview.html">Up to Design Issues</a>
|
|
</p>
|
|
<h3>
|
|
Axioms of Web Architecture: Metadata
|
|
</h3>
|
|
<hr />
|
|
<h1>
|
|
Metadata Architecture
|
|
</h1>
|
|
<h4 id="Preface">
|
|
Preface
|
|
</h4>
|
|
<p>
|
|
<em>This document was written before the Semantic Web
|
|
Roadmap, but is an introduction to the same ideas. Both
|
|
introduce the world of machine-readable data on the web. This
|
|
document introduces the concepts in the historical sequence
|
|
at W3C, where the first driving applications of semantic web
|
|
were metadat, and the first driving metadata applications
|
|
were endorsement labels (<a href="#PICS">PICS</a>)</em>.
|
|
</p>
|
|
<h2>
|
|
Documents, Metadata, and Links<br />
|
|
</h2>
|
|
<p>
|
|
The thing which you get when you follow a link, when you
|
|
de-reference a URI, has a lot of names. Formally we call it a
|
|
<b>resource</b>. Sometimes it is referred to as a document
|
|
because many of the things currently on the Web are human
|
|
readable documents. Sometimes it is referred to as an object
|
|
when the object is something which is more machine readable
|
|
in nature or has hidden state. I will use the words document
|
|
and resource interchangeably in what follows and sometimes
|
|
may slip into using "object".
|
|
</p>
|
|
<p>
|
|
One of the characteristics of the World Wide Web is that
|
|
resources, when you retrieve them, do not stand simply by
|
|
themselves without explanation, but there is information
|
|
about the resource. Information about information is
|
|
generally known as <b>Metadata</b>. Specifically, in the web
|
|
design,
|
|
</p>
|
|
<h4>
|
|
Definition
|
|
</h4>
|
|
<table border="1" cellpadding="2">
|
|
<tbody>
|
|
<tr>
|
|
<td>
|
|
Metadata is machine understandable information about
|
|
web resources or other things
|
|
</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<p>
|
|
The phrase "machine understandable" is key. We are
|
|
talking here about information which software agents can use
|
|
in order to make life easier for us, ensure we obey our
|
|
principles, the law, check that we can trust what we are
|
|
doing, and make everything work more smoothly and rapidly.
|
|
Metadata has well defined semantics and structure.
|
|
</p>
|
|
<p>
|
|
Metadata was called "Metadata" because it started life, and
|
|
is currently still chiefly, information about web resources,
|
|
so data about data. In the future, when the metadata
|
|
languages and engines are more developed, it should also form
|
|
a strong basis for a web of machine understandable
|
|
information about anything: about the people, things,
|
|
concepts and ideas. We keep this fact in our minds in
|
|
the design, even though the first step is to make a system
|
|
for information about information.
|
|
</p>
|
|
<p>
|
|
For an example of metadata, when an object is retrieved using
|
|
the HTTP protocol, the protocol allows information about its
|
|
date, its expiry date, its owner, and other arbitrary
|
|
information to be sent by the server. The world of the World
|
|
Wide Web is therefore a world of information and some of that
|
|
information is information about information. In order to
|
|
have a coherent picture of this, we need a few axioms about
|
|
metadata. The first axiom is that :
|
|
</p>
|
|
<h4>
|
|
Axiom
|
|
</h4>
|
|
<table border="1" cellpadding="2">
|
|
<tbody>
|
|
<tr>
|
|
<td>
|
|
metadata is data.
|
|
</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<p>
|
|
That is to say, information about information is to be
|
|
counted in all respects as information. There are various
|
|
parts of this.
|
|
</p>
|
|
<p>
|
|
One is that metadata can be stored regarded as data, it can
|
|
be stored in a resource. So, one resource may contain
|
|
information about itself or about another resource. In
|
|
current practice on the World Wide Web there are three ways
|
|
in which one gets metadata. The first is the data about a
|
|
document contained within the document itself, for example in
|
|
the HEAD part of an HTML documents or within word processor
|
|
documents. The second is that during the HTTP transfer the
|
|
server transfers some metadata to the client about the object
|
|
which is being transferred. This, during an http GET, is
|
|
transferred from the server to the client and, during a PUT
|
|
or a POST, is transferred from the client to the server. One
|
|
of the things which we have to rationalize in our
|
|
architecture of the World Wide Web is who exactly is making
|
|
the statement. Whose statement, whose property is that
|
|
metadata. The third way in which metadata is found is when it
|
|
is looked up in another document. This practice has not been
|
|
very common until the PICS initiative was to define label
|
|
formats specifically for representing information about World
|
|
Wide Web resources. The PICS architecture specifically allows
|
|
for PICS labels which are resources about other resources to
|
|
be buried within the resource itself, to be retrieved as
|
|
separate resources, or to be passed over during the http
|
|
transaction. To conclude,
|
|
</p>
|
|
<table border="1" cellpadding="2">
|
|
<tbody>
|
|
<tr>
|
|
<td>
|
|
Metadata about one document can occur within the
|
|
document, or within a separate document, or it may be
|
|
transferred accompanying the document.<br />
|
|
</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<p>
|
|
Put another way, metadata can be a first class object.
|
|
</p>
|
|
<p>
|
|
The second part of the above axiom is:
|
|
</p>
|
|
<table border="1" cellpadding="2">
|
|
<tbody>
|
|
<tr>
|
|
<td>
|
|
Metadata can describe metadata
|
|
</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<p>
|
|
That is, metadata itself may have attributes such as
|
|
ownership and an expiry date, and so there is meta-metadata
|
|
but we don't distinguish many levels, we just say that
|
|
metadata is data and that from that it follows that it can
|
|
have other data about itself. This gives the Web a certain
|
|
consistency.
|
|
</p>
|
|
<h2>
|
|
The Form of Metadata<br />
|
|
</h2>
|
|
<p>
|
|
Metadata consists of assertions about data, and such
|
|
assertions typically, when represented in computer systems,
|
|
take the form of a name or type of assertion and a set of
|
|
parameters, just as in the natural language a sentence takes
|
|
the form of a verb and a subject, an object and various
|
|
clauses.
|
|
</p>
|
|
<h4>
|
|
<a name="independent" id="independent">Axiom</a>
|
|
</h4>
|
|
<table border="1" cellpadding="2">
|
|
<tbody>
|
|
<tr>
|
|
<td>
|
|
The architecture is of metadata represented as a set of
|
|
independent assertions.
|
|
</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<p>
|
|
This model implies that in general, two assertions about the
|
|
same resource can stand alone and independently. When they
|
|
are grouped together in one place, the combined assertion is
|
|
simply the sum (actually the logical AND) of the independent
|
|
ones. Therefore (because AND is commutative) collections of
|
|
assertions are essentially unordered sets. This design
|
|
decision rules out for example, in simple sets of data,
|
|
assertions which are somehow cumulative or later ones
|
|
override earlier ones. Each assertion stands independently of
|
|
others.
|
|
</p>
|
|
<p>
|
|
We will see below how logical expressions are formed to
|
|
combine assertions in more varied ways, and syntactic rules
|
|
which allow the subject at least of the assertion to be made
|
|
implicit. But neither of these change the basic operation of
|
|
combining assertions in unordered AND lists.
|
|
</p>
|
|
<h3>
|
|
<a name="Attributes" id="Attributes">Attributes</a>
|
|
</h3>
|
|
<p>
|
|
Assertions about resources are often referred to as
|
|
attributes of the resource. That is, the type of assertion is
|
|
an assertion that the object, the resource in question, has a
|
|
particular named property such as it's author, and in that
|
|
case the parameter is the name or identity of the author.
|
|
Similarly, if the attribute is the document's date of expiry
|
|
then the parameter is that date.
|
|
</p>
|
|
<p>
|
|
Often, a group of assertions about the same resource occur
|
|
together, in which case the syntax generally omits the URI of
|
|
that resource as it is implicit. In these cases, when it is
|
|
clear from the context about which resource the assertion is
|
|
being made, the assertion often takes the form of a list of
|
|
attributes and values. In RFC822 format messages, such as
|
|
mail messages and HTTP messages, metadata is transferred
|
|
where the attribute name is an RFC822 header name and the
|
|
rest of the RFC822 line is the value of the attribute, such
|
|
as Date: and From: and To: information. The attribute value
|
|
pair model is that used by most activities defining the
|
|
semantics of metadata today.<br />
|
|
</p>
|
|
<p>
|
|
I use the word "assertion" to emphasize the fact that the
|
|
attribute value pair when it is transferred is a statement
|
|
made by some party. It does not simply and directly imply
|
|
that the resource at any given time has that value for the
|
|
given attribute. It must be seen as a statement by a
|
|
particular party with or without implicit or explicit
|
|
guarantees as to validity. Throughout the World Wide Web, as
|
|
trust becomes an important issue, it will be important for
|
|
software -- and people -- to keep track of and take into
|
|
account who said what in terms of data and metadata. So, our
|
|
model of data of a resource is something about which
|
|
typically we know the creator or the person responsible, and
|
|
typically the date of which the information was created,
|
|
which implies, in the case of a piece of information which
|
|
makes an assertion, the date at which the assertion was made.
|
|
</p>
|
|
<p>
|
|
An assertion
|
|
</p>
|
|
<blockquote>
|
|
(A u1, p, q...)
|
|
</blockquote>
|
|
<p>
|
|
typically has as explicit parameters,
|
|
</p>
|
|
<ul>
|
|
<li>the URI of the resource about which the assertion is made
|
|
(u1).
|
|
</li>
|
|
<li>some identifier (A) for the type of assertion being made,
|
|
such as author or date or expiry date.
|
|
</li>
|
|
<li>other parameters (p, q,...) according to the type of
|
|
assertion.
|
|
</li>
|
|
</ul>
|
|
<p>
|
|
As implicit or explicit or implicit parameters,
|
|
</p>
|
|
<ul>
|
|
<li>The party making the assertion
|
|
</li>
|
|
<li>The date/time of the assertion
|
|
</li>
|
|
<li>etc...
|
|
</li>
|
|
</ul>
|
|
<p>
|
|
We can often make an analogy with programming languages. An
|
|
assertion in metadata can be compared with a function call in
|
|
a programing language. In object oriented languages, the
|
|
object of the function has a special place among the
|
|
parameters just as the subject of an assertion does in
|
|
metadata. In object oriented languages, though, the set of
|
|
possible functions depends on the object, whereas in metadata
|
|
the set of assertion types is more or less unlimited, defined
|
|
by independent choice of vocabulary. <em>Anyone can say
|
|
anything about anything</em>.
|
|
</p>
|
|
<h3>
|
|
A space for attribute names
|
|
</h3>
|
|
<p>
|
|
It is appropriate for the Web architecture to define like
|
|
this the topology and the general concepts of links and
|
|
metadata. What about the significance of individual
|
|
relationships? Sometimes, as above, these are special,
|
|
defined in the architecture, and having an architectural
|
|
significance or a significance to the protocols. In other
|
|
cases, the significance of relationships or indeed of
|
|
attributes is part of other specifications, other design, or
|
|
other applications, and must be defined easily by third
|
|
parties. Therefore, the set of such relationship and
|
|
attributes names must be extremely easily extensible and
|
|
therefore extensible in a decentralized manner. This is why
|
|
</p>
|
|
<table border="1" cellpadding="2">
|
|
<tbody>
|
|
<tr>
|
|
<td>
|
|
the URL space is an appropriate space for the
|
|
definition of attribute names.
|
|
</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<p>
|
|
We have already (1997) several vocabularies of attribute
|
|
names: for example, the HTML elements which can occur within
|
|
the HEAD element, or as another example, the headers in an
|
|
HTTP request which specify attributes of the object. These
|
|
are defined within the scope of particular specifications.
|
|
There is always pressure to extend these specifications in a
|
|
flexible way. HTTP header names are generally extended
|
|
arbitrarily by those doing experiments. The same can also be
|
|
true of HTML elements and extension mechanisms have been
|
|
proposed for both. If we look generically at the very wide
|
|
space of all such metadata attribute names, we find something
|
|
in which the dictionary would be so large that ad hoc
|
|
arbitrary extension would be just as chaotic as central
|
|
registration would be stifling.
|
|
</p>
|
|
<blockquote>
|
|
<b>Aside: Comparison with Entity-Relationship models</b>.
|
|
<p>
|
|
This architecture, in which the assertion identifier is
|
|
taken from (basically) URL space differs from the
|
|
"Entity-relationship" (ER) model and many similar models
|
|
like it, including most object-oriented programming
|
|
systems. In an ER model, typically every object is typed
|
|
and the type of an object defines the attributes can have,
|
|
and therefore the assertions which are being made about it.
|
|
Once a person is defined as having a name, address and
|
|
phone number, then the schema has to be altered or a new
|
|
derived type of person must be introduced before one can
|
|
make assertions about the race, color or credit card number
|
|
of a person. The scope of the attribute name is the entity
|
|
type, just as in OOP the scope of a method name is an
|
|
object type (or interface)By contrast, in the web, the
|
|
hypertext link allows statements of new forms to be made
|
|
about any object, even though (before anything other than
|
|
syntax checking) this may lead to nonsense or paradox. One
|
|
can define a property "coolness" within one's own part of
|
|
the web, and then make statements about the "coolness" of
|
|
any object on the web.
|
|
</p>
|
|
<p>
|
|
This design difference is in essence a resurfacing of the
|
|
decision to make links mondirectional, sacrificing
|
|
consistency for scalability.
|
|
</p>
|
|
<p>
|
|
An advantage of ER systems is that they allow one to work,
|
|
in the user interface for example, with a set of properties
|
|
which "should" be defined for each entity. You can define
|
|
these in the Metadata's predicate calculus by defining an
|
|
expression for a "well specified" object. ("For all
|
|
<i>X</i> such that <i>X</i> is a customer <i>X</i> is
|
|
well-specified if there exists <i>n</i> such that <i>n</i>
|
|
is the name of <i>X</i> and there exists <i>t</i> such that
|
|
<i>t</i> is the telephone number of <i>X</i> and...)
|
|
</p>
|
|
<p>
|
|
end of aside.
|
|
</p>
|
|
</blockquote>
|
|
<h3>
|
|
<a name="MetadataHeaders" id="MetadataHeaders">Metadata
|
|
("Entity") headers in HTTP</a>
|
|
</h3>
|
|
<p>
|
|
In the above it is important to realize that the HTTP headers
|
|
which contain what can be considered as metadata ("entity
|
|
headers") should be separated quite distinctly from HTTP
|
|
headers which do not. HTTP headers which contain metadata
|
|
contain information which can follow the document around. For
|
|
example, it is reasonable for a cache to pass such
|
|
information on without treatment, it is reasonable for
|
|
clients or other programs which process data to store those
|
|
headers as metadata with the document for later processing.
|
|
The content of those headers do not have to be associated
|
|
with that particular HTTP transaction. By contrast, the
|
|
RFC822 headers in HTTP which deal specifically with the
|
|
transaction or deal specifically with the TCP link between
|
|
the two application programs have a shorter scope and can
|
|
only be regarded as parameters of the HTTP method. To make
|
|
this separation clear will be to make it easier not only to
|
|
understand HTTP and how it should be processed, it will also
|
|
make it clear which pieces of HTTP can be used easily and
|
|
transparently by other protocols which may use different
|
|
methods with different parameters. The clarification of the
|
|
architecture of HTTP such that both the metadata and the
|
|
methods can be extended into other domains is an important
|
|
part of the work of the World Wide Web Consortium. The
|
|
Internet protocols SMTP and NNTP and HTTP as well as many new
|
|
and proposed protocols share much of the semantics of the
|
|
RFC822 headers. Formalizing the shared space and making it
|
|
clear that there is a single design for a particular header,
|
|
rather than four designs which are independent and happen to
|
|
look very similar, requires a general architecture, some
|
|
careful thought, and is essential for the future design of
|
|
protocols. It will allow protocol design to happen in small
|
|
groups which can take for granted the bulk of previous work
|
|
and concentrate on independent new design.
|
|
</p>
|
|
<h4>
|
|
Authorship of HTTP entity headers
|
|
</h4>
|
|
<p>
|
|
It may be possible to remove or at least encompass the
|
|
apparent anomaly of metadata transferred from an HTTP server
|
|
by creating a special link type which links the document
|
|
itself to the set of attributes which the server would give
|
|
in the HTTP headers. In other words, the server would be able
|
|
to say, "here is a document, here is some metadata about it,
|
|
and the metadata about it has the following URL". This would
|
|
allow one, for example, request a signed copy of the HTTP
|
|
headers. It would allow one to ask about the intellectual
|
|
property rights of those headers, and the authorship of those
|
|
headers.
|
|
</p>
|
|
<p>
|
|
It is important to be completely clear about the authorship
|
|
of the HTTP headers. The server should be seen as a software
|
|
agent acting on behalf of a party which is the publisher or
|
|
document author: the definer of the URI to resource identity
|
|
mapping. The webmaster is only an administrator who is
|
|
responsible for ensuing that (through an appropriately
|
|
configured server) the transactions on the wire faithfully
|
|
represent the statements and wishes of that party.
|
|
</p>
|
|
<h2>
|
|
Links<br />
|
|
</h2>
|
|
<p>
|
|
An assertion of relationship between two resources is known
|
|
as a <b>link</b>.
|
|
</p>
|
|
<p>
|
|
In this case, it is a triple
|
|
</p>
|
|
<blockquote>
|
|
(<i>A u1 u2</i>)
|
|
</blockquote>
|
|
<p>
|
|
of:
|
|
</p>
|
|
<ul>
|
|
<li>the type of assertion being made, that is, the
|
|
relationship which is being asserted,
|
|
</li>
|
|
<li>the first URI,
|
|
</li>
|
|
<li>and the second URI.
|
|
</li>
|
|
</ul>
|
|
<p>
|
|
These sorts of assertions, links, are the basis of navigation
|
|
in the World Wide Web; they can be used for building
|
|
structure within the World Wide Web and also for creating a
|
|
semantic Web which can express knowledge about the world
|
|
itself. That is to say, links may be used both for the
|
|
structure of data, in which case they are metadata, but also
|
|
they may be used as a form of data.
|
|
</p>
|
|
<p>
|
|
Links, like all metadata can be transferred in three ways.
|
|
They can be embedded in a document, which is one end of the
|
|
link, they can be transferred in an HTTP message, for example
|
|
what is called the header of the document, and they can be
|
|
stored in a third document. This latter method has not been
|
|
used widely on the World Wide Web to date.
|
|
</p>
|
|
<h2>
|
|
Goal: <a name="Self-descr" id="Self-descr">Self-describing
|
|
information</a><br />
|
|
</h2>
|
|
<p>
|
|
A critical part of the design of the whole system is the way
|
|
that the semantics of metadata or indeed of data are defined.
|
|
The semantics of metadata in our RFC822 headers in mail
|
|
messages and in http messages are defined by hand in english
|
|
in the specifications of those protocols. The PICS system
|
|
takes this to one stage further in terms of flexibility by
|
|
allowing a message to contain a pointer to the document which
|
|
defines, in human readable terms, the semantics of each
|
|
assertion made within a <a href="#PICS">PICS</a> label. In
|
|
the future we would like to move toward a state in which any
|
|
metadata or eventually any form of machine readable data
|
|
carries a reference to the specification of the semantics of
|
|
all the assertions made within it.
|
|
</p>
|
|
<p>
|
|
For example, suppose that when a link is defined between two
|
|
documents, the relationship which is being asserted is
|
|
defined in a such way that it can be looked up on the World
|
|
Wide Web (i.e. using some form of URI), and someone or some
|
|
program, which has not come across that relationship before
|
|
can follow the link and extend its understanding or
|
|
functionality to take advantage of this new form of
|
|
assertion.
|
|
</p>
|
|
<p>
|
|
In the case of PICS, one can dynamically pick up a human
|
|
readable definition of what that assertion really means. In
|
|
PICS (and in theory in SGML using DTDs), one can also pick up
|
|
a machine readable definition of what form that assertion can
|
|
take, what syntax, what types of parameters it can take. This
|
|
allows a human interface to a new PICS scheme to built on the
|
|
fly. To go one step further, one could, given a suitable
|
|
logic or knowledge representation language, pick up a machine
|
|
readable definition of the semantics of that assertion in
|
|
terms of other relationships.
|
|
</p>
|
|
<p>
|
|
The advantages of such self describing information is that it
|
|
allows development of new applications and new functionality
|
|
independently by many groups across the web. Without
|
|
self-describing information, development must wait for large
|
|
companies or standards committees to meet and agree on the
|
|
commonly agreed semantics.
|
|
</p>
|
|
<p>
|
|
Of course a pragmatic way of extending software to handle new
|
|
forms of information is to dynamically download the code to
|
|
support a software object which can handle such data for one.
|
|
Whereas this is a powerful technique, and one which will be
|
|
used increasingly, it is not sufficient. It is not sufficient
|
|
because one has to trust the implementation of the object,
|
|
and the state.
|
|
</p>
|
|
<h4>
|
|
Goal
|
|
</h4>
|
|
<table border="1" cellpadding="2">
|
|
<tbody>
|
|
<tr>
|
|
<td>
|
|
As much as possible of the syntax and semantics should
|
|
be able to be acquired by reference from a metadata
|
|
document.
|
|
</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<h3>
|
|
Building Applications using Link Relationships
|
|
</h3>
|
|
<p>
|
|
It turns out that a very large number of applications both
|
|
built on top of the web and also built within the
|
|
infrastructure of the Web can largely be built by defining
|
|
new relationship types. Examples of these are the document
|
|
versioning problem which can be largely solved by defining
|
|
link values relating documents to previous and future
|
|
versions and to lists of versions; intellectual property
|
|
rights, distribution terms, and other labeling which can be
|
|
solved by making a link from one document to the document
|
|
containing the metadata.
|
|
</p>
|
|
<hr />
|
|
<h3>
|
|
Summary so far
|
|
</h3>
|
|
<ol>
|
|
<li>Metadata is data
|
|
</li>
|
|
<li>Metadata may refer to any resource which has a URI
|
|
</li>
|
|
<li>Metadata may be stored in any resource no matter to which
|
|
resource it refers
|
|
</li>
|
|
<li>Metadata can be regarded as a set of assertions, each
|
|
assertion being about a resource (A <i>u1</i>
|
|
...).
|
|
</li>
|
|
<li>Assertions which state a named relationship between two
|
|
resources are known links (A <i>u1 u2</i>)
|
|
</li>
|
|
<li>Assertion types (including link relationships) should be
|
|
first class objects in the sense that they should be able to
|
|
be defined in addressable resources and referred to by the
|
|
address of that resource A in { u }
|
|
</li>
|
|
<li>The development of new assertion types and link
|
|
relationships should be done in a consistent manner so that
|
|
these sort of assertions can be treated generically by people
|
|
and by software.
|
|
</li>
|
|
</ol>
|
|
<hr />
|
|
<p>
|
|
<i>Rough from here on down</i>
|
|
</p>
|
|
<h3 id="Label">
|
|
Label syntax: Assertions about a common subject
|
|
</h3>
|
|
<p>
|
|
When labeling information, it is often useful to make a lot
|
|
of statements about the same object. It is also useful to be
|
|
able to make the same set of statements about a set of
|
|
resources. For example, the assertions
|
|
</p>
|
|
<pre>
|
|
(A1 u1 a b ... )
|
|
(A2 u1 c d )
|
|
(A2 u1 a f g h )
|
|
|
|
</pre>
|
|
<p>
|
|
might be written
|
|
</p>
|
|
<pre>
|
|
(for u1
|
|
(A1 a b ... )
|
|
(A2 c d )
|
|
(A3 a f g h )
|
|
)
|
|
|
|
</pre>
|
|
<p>
|
|
Therefore in the syntax of an actual assertion the subject is
|
|
implicit. This is just the case with RFC822 headers which
|
|
implicitly refer to the following body, and with HTML "HEAD"
|
|
element contents which implicitly refer to the containing
|
|
document. (Though notice there is a fundamental
|
|
difference, discussed <a href=
|
|
"w:/DesignIssues/temp.html#mesages">below</a>, between a
|
|
general label and a message header because the message header
|
|
is definitive.)
|
|
</p>
|
|
<p>
|
|
So it is wise to recognise the label as case which it is wise
|
|
to specifically optimize in the syntax. <em>[In RDF this
|
|
indeed the case, that the subject is established as a
|
|
context, and then many properties are given within that
|
|
context. -2000/9]</em>
|
|
</p>
|
|
<p>
|
|
Assertions, when the subject is implicit, are known as
|
|
attribute-value pairs as discussed above. Let's use the term
|
|
"label" for a set of assertions with the subject extracted.
|
|
Like the label on a jam jar, it contains information
|
|
but there must be something else (in this case if its
|
|
placement on the jar) which tells you to what it applies.
|
|
(The PICS label in fact contained other information
|
|
too, including the subject and meta-meta-data about the
|
|
authorship of the label.)
|
|
</p>
|
|
<p>
|
|
Local definition:
|
|
</p>
|
|
<table border="1" cellpadding="2">
|
|
<tbody>
|
|
<tr>
|
|
<td>
|
|
A label is a set of assertions with a common implicit
|
|
subject. In this architecture it is a set of
|
|
attribute-value pairs
|
|
</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<p>
|
|
<i>(There is a convention that you can write "Jam" on a jam
|
|
jar label. You don't write "Jam jar" or "Jam Jar
|
|
label". Even though I once saw a label on a cardboard
|
|
box with the words "Equipment shipping box label" on it!)</i>
|
|
</p>
|
|
<h3>
|
|
Authorship of Metadata
|
|
</h3>
|
|
<p>
|
|
It follows from the fact that metadata is data that here can
|
|
be metadata about it. Some of this metadata becomes
|
|
crucial when we consider a trust model. The logic we
|
|
need includes the author of metadata
|
|
</p>
|
|
<p>
|
|
p1: (A u1 . . .)
|
|
</p>
|
|
<p>
|
|
where p1 is ,in a system with low trust, the author as
|
|
stated, but in a cryptographically secure system is a
|
|
principle represented by a key.
|
|
</p>
|
|
<p>
|
|
On the web, the granularity of information is the resource.
|
|
Authorship and access control genrally use this granularity.
|
|
Therefore, typically, the trust one places in an assertion is
|
|
function the document which asserted it, and the metadata
|
|
about that document. However, when information is then
|
|
combined from many resources, one needs a language which
|
|
allows the source of the original to be recorded. Like
|
|
blockquote in HTML, this separates the data itself from the
|
|
resource, so the resource does assert the data directly but
|
|
asserts that it was asserted.
|
|
</p>
|
|
<h2>
|
|
Analysing labels
|
|
</h2>
|
|
<p>
|
|
See <a href="Labels.html">Analysing PICS labels as generic
|
|
Metadata</a>
|
|
</p>
|
|
<p>
|
|
where we look at PICS labels and try to sift out the actual
|
|
semantics of them. This is a thought experiment generating
|
|
requiremnts. The conclusions are that information such as
|
|
authorship and date information in fact form a tree of
|
|
assertions about assertions, and it is important to be clear
|
|
about the structure of that tree. The notion of a message is
|
|
brought up there too, but not followed up as it is not
|
|
germaine to the discussion at this point.
|
|
</p>
|
|
<h2>
|
|
Algebraic Manipulations
|
|
</h2>
|
|
<p>
|
|
If you can make assumptions about the properties of labels
|
|
then you can manipulate them, possibly without knowing
|
|
everything about their meaning. Properties such as
|
|
commutativity, transitivity and associativity would be very
|
|
useful to have easily available: perhaps in the syntax, or
|
|
failing that in the schema.
|
|
</p>
|
|
<p>
|
|
[See <a href="Semantic.html">Semantic Web roadmap</a> for
|
|
higher levels of logic]
|
|
</p>
|
|
<p>
|
|
For example, given a label saying a pair of jeans has a 32
|
|
inch waist and a price of $28, I can deduce a label which
|
|
just has the price of $28. But given a label which says
|
|
that the punishment for the crime is a 2 month in jail and a
|
|
fine of $3000, I can't deduce one that says that that
|
|
the punishment is 2 months in jail.
|
|
</p>
|
|
<p>
|
|
A typical use of metadata will be to provide a statement
|
|
along with its proof to be verified by another party.
|
|
Being able to process these things efficiently and with
|
|
limited knowledge will be crucial.
|
|
</p>
|
|
<p>
|
|
The most practical way to do this is to create a basic
|
|
commonvocabulary for the logical functions. Sometimes known
|
|
as the "RDF upper layers", these are mentioned in the
|
|
<a href="Semantic.html">note on the Semantic Web.</a>
|
|
</p>
|
|
<h4>
|
|
Ordered/Unordered
|
|
</h4>
|
|
<p>
|
|
The <a href="#independent">axiom of independence of
|
|
assertions</a> above gives us that in any set of assertions,
|
|
as assertions are independently true, specific assertions may
|
|
be removed or reordered, leaving the document just as valid
|
|
(though possibly less informative).
|
|
</p>
|
|
<p>
|
|
Examples of unordered things currently are: RFC822 message
|
|
header lines, SGML attributes. Examples of ordered things
|
|
are: HTTP header lines and SGML elements.
|
|
</p>
|
|
<p>
|
|
Do we need a form in which we can make an assertion which has
|
|
many parameters which are in fact not mutable in any way?
|
|
</p>
|
|
<h2>
|
|
Summary of Requirements
|
|
</h2>
|
|
<p>
|
|
There are ways of representing the above things:
|
|
messages, labels, specifying labels, and statements and
|
|
distinguish between them.
|
|
</p>
|
|
<p>
|
|
As much as possible of the syntax and semantics should be
|
|
able to be acquired by reference from a metadata document.
|
|
</p>
|
|
<p>
|
|
It must be possible to mix multiple vocabularies within the
|
|
same scope.
|
|
</p>
|
|
<p>
|
|
The syntax and structure should be such that as many
|
|
manipulations as possible can be done without having to know
|
|
the semantics of the vocabulary in use.
|
|
</p>
|
|
<p>
|
|
A common voabulary for basic logic and knowledge
|
|
representation functionality will be required.
|
|
</p>
|
|
<hr />
|
|
<h2>
|
|
References
|
|
</h2>
|
|
<p>
|
|
<a name="PICS" id="PICS">PICS</a> - The PICS project was a
|
|
project to define standards for interchange of endorsement
|
|
information, aimed at the content filterting problem. See the
|
|
PICS home page.
|
|
</p>
|
|
<hr />
|
|
<address>
|
|
Tim BL, January 1997
|
|
<p>
|
|
Last edit $Date: 2009/08/27 21:38:08 $
|
|
</p>
|
|
</address>
|
|
</body>
|
|
</html>
|