

								<html xmlns="http://www.w3.org/1999/xhtml">

								  <head>

								    <meta name="generator" content=

								    "HTML Tidy for Mac OS X (vers 31 October 2006 - Apple Inc. build 13), see www.w3.org" />

								    <title>

								      Web architecture: Metadata

								    </title>

								    <link href="di.css" rel="stylesheet" type="text/css" />

								    <meta http-equiv="Content-Type" content="text/html" />

								  </head>

								  <body bgcolor="#DDFFDD" text="#000000">

								    <address>

								      Tim Berners-Lee

								      <p>

								        Date started: January 6, 1997

								      </p>

								      <p>

								        Status: personal view, but corresponds generally to

								        the W3C architecture for metadata.

								      </p>


								      <p>

								        Additions are at the end about consistency in

								        label/metaset/collection syntax and semantics.

								      </p>

								      <p>

								        The syntaxes used in this document are meant to illustrate

								        the architecture and be clear, but are otherwise arbitrary.

								        This note was written before the more general <a href=

								        "Semantic.html">Semantic Web</a> note.

								      </p>

								    </address>

								    <p>

								      <a href="Overview.html">Up to Design Issues</a>

								    </p>

								    <h3>

								      Axioms of Web Architecture: Metadata

								    </h3>

								    <hr />

								    <h1>

								      Metadata Architecture

								    </h1>

								    <h4 id="Preface">

								      Preface

								    </h4>

								    <p>

								      <em>This document was written before the Semantic Web

								      Roadmap, but is an introduction to the same ideas. Both

								      introduce the world of machine-readable data on the web. This

								      document introduces the concepts in the historical sequence

								      at W3C, where the first driving applications of semantic web

								      were metadata, and the first driving metadata applications

								      were endorsement labels (<a href="#PICS">PICS</a>)</em>.

								    </p>

								    <h2>

								      Documents, Metadata, and Links<br />

								    </h2>

								    <p>

								      The thing which you get when you follow a link, when you

								      de-reference a URI, has a lot of names. Formally we call it a

								      <b>resource</b>. Sometimes it is referred to as a document

								      because many of the things currently on the Web are human

								      readable documents. Sometimes it is referred to as an object

								      when the object is something which is more machine readable

								      in nature or has hidden state. I will use the words document

								      and resource interchangeably in what follows and sometimes

								      may slip into using "object".

								    </p>

								    <p>

								      One of the characteristics of the World Wide Web is that

								      resources, when you retrieve them, do not stand simply by

								      themselves without explanation, but there is information

								      about the resource. Information about information is

								      generally known as <b>Metadata</b>. Specifically, in the web

								      design,

								    </p>

								    <h4>

								      Definition

								    </h4>

								    <table border="1" cellpadding="2">

								      <tbody>

								        <tr>

								          <td>

								            Metadata is machine understandable information about

								            web resources or other things

								          </td>

								        </tr>

								      </tbody>

								    </table>

								    <p>

								      The phrase "machine understandable" is key. &nbsp;We are

								      talking here about information which software agents can use

								      in order to make life easier for us, ensure we obey our

								      principles and the law, check that we can trust what we are

								      doing, and make everything work more smoothly and rapidly.

								      Metadata has well defined semantics and structure.

								    </p>

								    <p>

								      Metadata was called "Metadata" because it started life as, and

								      is currently still chiefly, information about web resources,

								      so data about data. &nbsp;In the future, when the metadata

								      languages and engines are more developed, it should also form

								      a strong basis for a web of machine understandable

								      information about anything: about the people, things,

								      concepts and ideas. &nbsp;We keep this fact in our minds in

								      the design, even though the first step is to make a system

								      for information about information.

								    </p>

								    <p>

								      For an example of metadata, when an object is retrieved using

								      the HTTP protocol, the protocol allows information about its

								      date, its expiry date, its owner, and other arbitrary

								      information to be sent by the server. The world of the World

								      Wide Web is therefore a world of information and some of that

								      information is information about information. In order to

								      have a coherent picture of this, we need a few axioms about

								      metadata. The first axiom is that:

								    </p>

								    <h4>

								      Axiom

								    </h4>

								    <table border="1" cellpadding="2">

								      <tbody>

								        <tr>

								          <td>

								            metadata is data.

								          </td>

								        </tr>

								      </tbody>

								    </table>

								    <p>

								      That is to say, information about information is to be

								      counted in all respects as information. There are various

								      parts of this.

								    </p>

								    <p>

								      One is that metadata can be regarded as data: it can

								      be stored in a resource. So, one resource may contain

								      information about itself or about another resource. In

								      current practice on the World Wide Web there are three ways

								      in which one gets metadata. The first is the data about a

								      document contained within the document itself, for example in

								      the HEAD part of an HTML document or within word processor

								      documents. The second is that during the HTTP transfer the

								      server transfers some metadata to the client about the object

								      which is being transferred. This, during an HTTP GET, is

								      transferred from the server to the client and, during a PUT

								      or a POST, is transferred from the client to the server. One

								      of the things which we have to rationalize in our

								      architecture of the World Wide Web is who exactly is making

								      the statement. Whose statement, whose property, is that

								      metadata? The third way in which metadata is found is when it

								      is looked up in another document. This practice was not

								      very common until the PICS initiative set out to define label

								      formats specifically for representing information about World

								      Wide Web resources. The PICS architecture specifically allows

								      for PICS labels which are resources about other resources to

								      be buried within the resource itself, to be retrieved as

								      separate resources, or to be passed over during the HTTP

								      transaction. To conclude,

								    </p>

								    <table border="1" cellpadding="2">

								      <tbody>

								        <tr>

								          <td>

								            Metadata about one document can occur within the

								            document, or within a separate document, or it may be

								            transferred accompanying the document.<br />

								          </td>

								        </tr>

								      </tbody>

								    </table>

								    <p>

								      Put another way, metadata can be a first class object.

								    </p>

								    <p>

								      The second part of the above axiom is:

								    </p>

								    <table border="1" cellpadding="2">

								      <tbody>

								        <tr>

								          <td>

								            Metadata can describe metadata

								          </td>

								        </tr>

								      </tbody>

								    </table>

								    <p>

								      That is, metadata itself may have attributes such as

								      ownership and an expiry date, and so there is meta-metadata.

								      But we don't distinguish many levels: we just say that

								      metadata is data, and from that it follows that it can

								      have other data about itself. This gives the Web a certain

								      consistency.

								    </p>

								    <h2>

								      The Form of Metadata<br />

								    </h2>

								    <p>

								      Metadata consists of assertions about data, and such

								      assertions typically, when represented in computer systems,

								      take the form of a name or type of assertion and a set of

								      parameters, just as in the natural language a sentence takes

								      the form of a verb and a subject, an object and various

								      clauses.

								    </p>

								    <h4>

								      <a name="independent" id="independent">Axiom</a>

								    </h4>

								    <table border="1" cellpadding="2">

								      <tbody>

								        <tr>

								          <td>

								            The architecture is of metadata represented as a set of

								            independent assertions.

								          </td>

								        </tr>

								      </tbody>

								    </table>

								    <p>

								      This model implies that in general, two assertions about the

								      same resource can stand alone and independently. When they

								      are grouped together in one place, the combined assertion is

								      simply the sum (actually the logical AND) of the independent

								      ones. Therefore (because AND is commutative) collections of

								      assertions are essentially unordered sets. This design

								      decision rules out, for example, in simple sets of data,

								      assertions which are somehow cumulative, or in which later

								      ones override earlier ones. Each assertion stands independently of

								      others.

								    </p>
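								    <p>
								      As a minimal sketch of this axiom (not part of any W3C
								      syntax), a collection of assertions can be modelled as an
								      unordered set whose truth is the AND of its members. The
								      tuple layout, the example URIs and the helper
								      <code>holds</code> are assumptions made purely for
								      illustration.
								    </p>
<pre>
```python
# Hypothetical model: an assertion is a tuple (type, subject, parameter).
a1 = ("author", "http://example.org/doc", "Alice")
a2 = ("expires", "http://example.org/doc", "1997-12-31")

# A collection of assertions is an unordered set: grouping order is irrelevant.
label_one = frozenset([a1, a2])
label_two = frozenset([a2, a1])


def holds(collection, known_true):
    """A collection holds only if every member holds (logical AND)."""
    return all(assertion in known_true for assertion in collection)


# Combining two collections is set union, the counterpart of AND-ing them.
a3 = ("title", "http://example.org/doc", "Metadata")
merged = label_one | frozenset([a3])
```
</pre>
								    <p>
								      Because the two labels compare equal whatever the order in
								      which they were written, nothing in this model can express
								      "later overrides earlier", which is exactly what the axiom
								      rules out.
								    </p>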

								    <p>

								      We will see below how logical expressions are formed to

								      combine assertions in more varied ways, and syntactic rules

								      which allow the subject at least of the assertion to be made

								      implicit. But neither of these changes the basic operation of

								      combining assertions in unordered AND lists.

								    </p>

								    <h3>

								      <a name="Attributes" id="Attributes">Attributes</a>

								    </h3>

								    <p>

								      Assertions about resources are often referred to as

								      attributes of the resource. That is, the type of assertion is

								      an assertion that the object, the resource in question, has a

								      particular named property such as its author, and in that

								      case the parameter is the name or identity of the author.

								      Similarly, if the attribute is the document's date of expiry

								      then the parameter is that date.

								    </p>

								    <p>

								      Often, a group of assertions about the same resource occur

								      together, in which case the syntax generally omits the URI of

								      that resource as it is implicit. In these cases, when it is

								      clear from the context about which resource the assertion is

								      being made, the assertion often takes the form of a list of

								      attributes and values. In RFC822 format messages, such as

								      mail messages and HTTP messages, metadata is transferred

								      where the attribute name is an RFC822 header name and the

								      rest of the RFC822 line is the value of the attribute, such

								      as Date: and From: and To: information. The attribute value

								      pair model is that used by most activities defining the

								      semantics of metadata today.<br />

								    </p>
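								    <p>
								      To illustrate the attribute value pair model, the sketch
								      below reads the headers of an RFC822 style message with
								      Python's standard <code>email</code> package. The message
								      text and addresses are invented for the example; the subject
								      of each pair is implicit, being the body that follows the
								      headers.
								    </p>
<pre>
```python
from email import message_from_string

# An invented RFC822-style message: header lines, a blank line, then the body.
raw = (
    "From: alice@example.org\n"
    "To: bob@example.org\n"
    "Date: Mon, 6 Jan 1997 10:00:00 +0000\n"
    "\n"
    "Body text.\n"
)

message = message_from_string(raw)

# Each header name is an attribute name; the rest of the line is its value.
attributes = dict(message.items())

for name, value in attributes.items():
    print(name, "=", value)
```
</pre>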

								    <p>

								      I use the word "assertion" to emphasize the fact that the

								      attribute value pair when it is transferred is a statement

								      made by some party. It does not simply and directly imply

								      that the resource at any given time has that value for the

								      given attribute. It must be seen as a statement by a

								      particular party with or without implicit or explicit

								      guarantees as to validity. Throughout the World Wide Web, as

								      trust becomes an important issue, it will be important for

								      software -- and people -- to keep track of and take into

								      account who said what in terms of data and metadata. So, our

								      model of data of a resource is something about which

								      typically we know the creator or the person responsible, and

								      typically the date on which the information was created,

								      which implies, in the case of a piece of information which

								      makes an assertion, the date at which the assertion was made.

								    </p>

								    <p>

								      An assertion

								    </p>

								    <blockquote>

								      (A u1 p q ...)

								    </blockquote>

								    <p>

								      typically has as explicit parameters,

								    </p>

								    <ul>

								      <li>the URI of the resource about which the assertion is made

								      (u1).

								      </li>

								      <li>some identifier (A) for the type of assertion being made,

								      such as author or date or expiry date.

								      </li>

								      <li>other parameters (p, q,...) according to the type of

								      assertion.

								      </li>

								    </ul>

								    <p>

								      and, as implicit or explicit parameters,

								    </p>

								    <ul>

								      <li>The party making the assertion

								      </li>

								      <li>The date/time of the assertion

								      </li>

								      <li>etc...

								      </li>

								    </ul>
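								    <p>
								      One possible way to picture the parameters listed above is
								      as a small record type. This is only a sketch: the field
								      names (<code>kind</code>, <code>subject</code>,
								      <code>asserted_by</code>, <code>asserted_at</code>) and the
								      example URIs are hypothetical and are not drawn from any
								      specification.
								    </p>
<pre>
```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Tuple


@dataclass(frozen=True)
class Assertion:
    kind: str                               # A: identifier for the type of assertion
    subject: str                            # u1: URI of the resource described
    params: Tuple[str, ...] = ()            # p, q, ...: depend on the assertion type
    asserted_by: Optional[str] = None       # party making the assertion, if stated
    asserted_at: Optional[datetime] = None  # date/time of the assertion, if stated


# An assertion whose author and date are explicit.
claim = Assertion(
    kind="author",
    subject="http://example.org/doc",
    params=("Alice",),
    asserted_by="http://example.org/people/publisher",
    asserted_at=datetime(1997, 1, 6, tzinfo=timezone.utc),
)

# The same statement with those parameters left implicit.
bare = Assertion("author", "http://example.org/doc", ("Alice",))
```
</pre>
								    <p>
								      Keeping the asserting party and the date in the record,
								      even when empty, is what lets software later take into
								      account who said what.
								    </p>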

								    <p>

								      We can often make an analogy with programming languages. An

								      assertion in metadata can be compared with a function call in

								      a programming language. In object oriented languages, the

								      object of the function has a special place among the

								      parameters just as the subject of an assertion does in

								      metadata. In object oriented languages, though, the set of

								      possible functions depends on the object, whereas in metadata

								      the set of assertion types is more or less unlimited, defined

								      by independent choice of vocabulary. <em>Anyone can say

								      anything about anything</em>.

								    </p>

								    <h3>

								      A space for attribute names

								    </h3>

								    <p>

								      It is appropriate for the Web architecture to define in

								      this way the topology and the general concepts of links and

								      metadata. What about the significance of individual

								      relationships? Sometimes, as above, these are special,

								      defined in the architecture, and having an architectural

								      significance or a significance to the protocols. In other

								      cases, the significance of relationships or indeed of

								      attributes is part of other specifications, other design, or

								      other applications, and must be defined easily by third

								      parties. Therefore, the set of such relationship and

								      attribute names must be extremely easily extensible, and

								      therefore extensible in a decentralized manner. This is why

								    </p>

								    <table border="1" cellpadding="2">

								      <tbody>

								        <tr>

								          <td>

								            the URL space is an appropriate space for the

								            definition of attribute names.

								          </td>

								        </tr>

								      </tbody>

								    </table>
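								    <p>
								      A small sketch of why this helps, under the assumption that
								      each party mints attribute names beneath a URL it controls:
								      two independently defined attributes can share a short name
								      without colliding. The vocabularies and values here are
								      invented.
								    </p>
<pre>
```python
# Hypothetical vocabularies, each rooted in a URL its owner controls.
ALICE = "http://alice.example.org/terms#"
BOB = "http://bob.example.org/vocab#"
DOC = "http://example.org/doc"

# Both parties chose the short name "rating"; the full names differ.
assertions = {
    (ALICE + "rating", DOC, "5"),
    (BOB + "rating", DOC, "poor"),
}


def values(kind, subject):
    """Return the values asserted for one fully named attribute of a subject."""
    return sorted(v for (k, s, v) in assertions if k == kind and s == subject)
```
</pre>
								    <p>
								      Neither party had to consult a central registry, and a
								      reader who meets an unfamiliar name at least knows where to
								      look for its definition.
								    </p>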

								    <p>

								      We have already (1997) several vocabularies of attribute

								      names: for example, the HTML elements which can occur within

								      the HEAD element, or as another example, the headers in an

								      HTTP request which specify attributes of the object. These

								      are defined within the scope of particular specifications.

								      There is always pressure to extend these specifications in a

								      flexible way. HTTP header names are generally extended

								      arbitrarily by those doing experiments. The same can also be

								      true of HTML elements and extension mechanisms have been

								      proposed for both. If we look generically at the very wide

								      space of all such metadata attribute names, we find something

								      in which the dictionary would be so large that ad hoc

								      arbitrary extension would be just as chaotic as central

								      registration would be stifling.

								    </p>

								    <blockquote>

								      <b>Aside: Comparison with Entity-Relationship models</b>.

								      <p>

								        This architecture, in which the assertion identifier is

								        taken from (basically) URL space differs from the

								        "Entity-relationship" (ER) model and many similar models

								        like it, including most object-oriented programming

								        systems. In an ER model, typically every object is typed

								        and the type of an object defines the attributes it can have,

								        and therefore the assertions which are being made about it.

								        Once a person is defined as having a name, address and

								        phone number, then the schema has to be altered or a new

								        derived type of person must be introduced before one can

								        make assertions about the race, color or credit card number

								        of a person. The scope of the attribute name is the entity

								        type, just as in OOP the scope of a method name is an

								        object type (or interface). By contrast, in the web, the

								        hypertext link allows statements of new forms to be made

								        about any object, even though (before anything other than

								        syntax checking) this may lead to nonsense or paradox. One

								        can define a property "coolness" within one's own part of

								        the web, and then make statements about the "coolness" of

								        any object on the web.

								      </p>

								      <p>

								        This design difference is in essence a resurfacing of the

								        decision to make links monodirectional, sacrificing

								        consistency for scalability.

								      </p>

								      <p>

								        An advantage of ER systems is that they allow one to work,

								        in the user interface for example, with a set of properties

								        which "should" be defined for each entity. You can define

								        these in the Metadata's predicate calculus by defining an

								        expression for a "well specified" object. ("For all

								        <i>X</i> such that <i>X</i> is a customer <i>X</i> is

								        well-specified if there exists <i>n</i> such that <i>n</i>

								        is the name of <i>X</i> and there exists <i>t</i> such that

								        <i>t</i> is the telephone number of <i>X</i> and...")

								      </p>

								      <p>

								        end of aside.

								      </p>

								    </blockquote>

								    <h3>

								      <a name="MetadataHeaders" id="MetadataHeaders">Metadata

								      ("Entity") headers in HTTP</a>

								    </h3>

								    <p>

								      In the above it is important to realize that the HTTP headers

								      which contain what can be considered as metadata ("entity

								      headers") should be separated quite distinctly from HTTP

								      headers which do not. HTTP headers which contain metadata

								      contain information which can follow the document around. For

								      example, it is reasonable for a cache to pass such

								      information on without treatment, it is reasonable for

								      clients or other programs which process data to store those

								      headers as metadata with the document for later processing.

								      The content of those headers does not have to be associated

								      with that particular HTTP transaction. By contrast, the

								      RFC822 headers in HTTP which deal specifically with the

								      transaction or deal specifically with the TCP link between

								      the two application programs have a shorter scope and can

								      only be regarded as parameters of the HTTP method. Making

								      this separation clear will not only make it easier to

								      understand HTTP and how it should be processed; it will also

								      make it clear which pieces of HTTP can be used easily and

								      transparently by other protocols which may use different

								      methods with different parameters. The clarification of the

								      architecture of HTTP such that both the metadata and the

								      methods can be extended into other domains is an important

								      part of the work of the World Wide Web Consortium. The

								      Internet protocols SMTP and NNTP and HTTP as well as many new

								      and proposed protocols share much of the semantics of the

								      RFC822 headers. Formalizing the shared space and making it

								      clear that there is a single design for a particular header,

								      rather than four designs which are independent and happen to

								      look very similar, requires a general architecture, some

								      careful thought, and is essential for the future design of

								      protocols. It will allow protocol design to happen in small

								      groups which can take for granted the bulk of previous work

								      and concentrate on independent new design.

								    </p>
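								    <p>
								      The separation can be sketched as a filter a cache or client
								      might apply before storing a response. The set below is an
								      illustrative subset of the fields that the HTTP/1.1
								      specification of the period (RFC 2616) classed as entity
								      headers; treating every other header as belonging to the
								      transaction is a simplifying assumption of this example, and
								      <code>split_headers</code> is a hypothetical helper.
								    </p>
<pre>
```python
# Illustrative subset of RFC 2616 entity header fields; not exhaustive.
ENTITY_HEADERS = {
    "content-type",
    "content-language",
    "content-encoding",
    "expires",
    "last-modified",
}


def split_headers(headers):
    """Separate headers that can follow the document from transaction ones."""
    metadata, transaction = {}, {}
    for name, value in headers.items():
        # Header names are case-insensitive, so compare in lower case.
        if name.lower() in ENTITY_HEADERS:
            metadata[name] = value
        else:
            # Assumption: anything unrecognised is scoped to this transaction.
            transaction[name] = value
    return metadata, transaction


response_headers = {
    "Content-Type": "text/html",
    "Expires": "Thu, 01 Jan 1998 00:00:00 GMT",
    "Connection": "keep-alive",
}
metadata, transaction = split_headers(response_headers)
```
</pre>
								    <p>
								      Only the first group would be stored with the document for
								      later processing; the second has no meaning once the
								      connection is closed.
								    </p>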

								    <h4>

								      Authorship of HTTP entity headers

								    </h4>

								    <p>

								      It may be possible to remove or at least encompass the

								      apparent anomaly of metadata transferred from an HTTP server

								      by creating a special link type which links the document

								      itself to the set of attributes which the server would give

								      in the HTTP headers. In other words, the server would be able

								      to say, "here is a document, here is some metadata about it,

								      and the metadata about it has the following URL". This would

								      allow one, for example, to request a signed copy of the HTTP

								      headers. It would allow one to ask about the intellectual

								      property rights of those headers, and the authorship of those

								      headers.

								    </p>

								    <p>

								      It is important to be completely clear about the authorship

								      of the HTTP headers. The server should be seen as a software

								      agent acting on behalf of a party which is the publisher or

								      document author: the definer of the URI to resource identity

								      mapping. The webmaster is only an administrator who is

								      responsible for ensuring that (through an appropriately

								      configured server) the transactions on the wire faithfully

								      represent the statements and wishes of that party.

								    </p>

								    <h2>

								      Links<br />

								    </h2>

								    <p>

								      An assertion of relationship between two resources is known

								      as a <b>link</b>.

								    </p>

								    <p>

								      In this case, it is a triple

								    </p>

								    <blockquote>

								      (<i>A u1 u2</i>)

								    </blockquote>

								    <p>

								      of:

								    </p>

								    <ul>

								      <li>the type of assertion being made, that is, the

								      relationship which is being asserted,

								      </li>

								      <li>the first URI,

								      </li>

								      <li>and the second URI.

								      </li>

								    </ul>

								    <p>

								      These sorts of assertions, links, are the basis of navigation

								      in the World Wide Web; they can be used for building

								      structure within the World Wide Web and also for creating a

								      semantic Web which can express knowledge about the world

								      itself. That is to say, links may be used both for the

								      structure of data, in which case they are metadata, but also

								      they may be used as a form of data.

								    </p>

								    <p>

								      Links, like all metadata, can be transferred in three ways.

								      They can be embedded in a document, which is one end of the

								      link, they can be transferred in an HTTP message, for example

								      what is called the header of the document, and they can be

								      stored in a third document. This latter method has not been

								      used widely on the World Wide Web to date.

								    </p>

								    <h2>

								      Goal: <a name="Self-descr" id="Self-descr">Self-describing

								      information</a><br />

								    </h2>

								    <p>

								      A critical part of the design of the whole system is the way

								      that the semantics of metadata or indeed of data are defined.

								      The semantics of metadata in our RFC822 headers in mail

								      messages and in HTTP messages are defined by hand in English

								      in the specifications of those protocols. The PICS system

								      takes this one stage further in terms of flexibility by

								      allowing a message to contain a pointer to the document which

								      defines, in human readable terms, the semantics of each

								      assertion made within a <a href="#PICS">PICS</a> label. In

								      the future we would like to move toward a state in which any

								      metadata or eventually any form of machine readable data

								      carries a reference to the specification of the semantics of

								      all the assertions made within it.

								    </p>

								    <p>

								      For example, suppose that when a link is defined between two

								      documents, the relationship which is being asserted is

								      defined in such a way that it can be looked up on the World

								      Wide Web (i.e. using some form of URI). Then someone, or some

								      program, which has not come across that relationship before,

								      can follow the link and extend its understanding or

								      functionality to take advantage of this new form of

								      assertion.

								    </p>

								    <p>

								      In the case of PICS, one can dynamically pick up a human

								      readable definition of what that assertion really means. In

								      PICS (and in theory in SGML using DTDs), one can also pick up

								      a machine readable definition of what form that assertion can

								      take, what syntax, what types of parameters it can take. This

								      allows a human interface to a new PICS scheme to be built on the

								      fly. To go one step further, one could, given a suitable

								      logic or knowledge representation language, pick up a machine

								      readable definition of the semantics of that assertion in

								      terms of other relationships.

								    </p>

								    <p>

								      The advantage of such self-describing information is that it

								      allows development of new applications and new functionality

								      independently by many groups across the web. Without

								      self-describing information, development must wait for large

								      companies or standards committees to meet and agree on the

								      common semantics.

								    </p>

								    <p>

								      Of course a pragmatic way of extending software to handle new

								      forms of information is to dynamically download the code to

								      support a software object which can handle such data for one.

								      Whereas this is a powerful technique, and one which will be

								      used increasingly, it is not sufficient. It is not sufficient

								      because one has to trust the implementation of the object,

								      and the state.

								    </p>

								    <h4>

								      Goal

								    </h4>

								    <table border="1" cellpadding="2">

								      <tbody>

								        <tr>

								          <td>

								            As much as possible of the syntax and semantics should

								            be able to be acquired by reference from a metadata

								            document.

								          </td>

								        </tr>

								      </tbody>

								    </table>

								    <h3>

								      Building Applications using Link Relationships

								    </h3>

								    <p>

								      It turns out that a very large number of applications both

								      built on top of the web and also built within the

								      infrastructure of the Web can largely be built by defining

								      new relationship types. Examples of these are the document

								      versioning problem which can be largely solved by defining

								      link values relating documents to previous and future

								      versions and to lists of versions; intellectual property

								      rights, distribution terms, and other labeling which can be

								      solved by making a link from one document to the document

								      containing the metadata.

								    </p>
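								    <p>
								      As a rough sketch of the versioning case, suppose a single
								      relationship type "previous version" has been defined at
								      some URI. The relationship name, the document URIs and the
								      helpers <code>follow</code> and <code>history</code> are all
								      hypothetical; the point is only that a list of versions falls
								      out of ordinary links (A u1 u2).
								    </p>
<pre>
```python
# Hypothetical relationship type, named by a URI so that it can be looked up.
PREVIOUS = "http://example.org/rel#previous-version"

# Each link is a triple (A, u1, u2): relationship, first URI, second URI.
links = {
    (PREVIOUS, "http://example.org/doc-v3", "http://example.org/doc-v2"),
    (PREVIOUS, "http://example.org/doc-v2", "http://example.org/doc-v1"),
}


def follow(relation, start):
    """Return the targets reached from start by one link of the given type."""
    return sorted(u2 for (a, u1, u2) in links if a == relation and u1 == start)


def history(start):
    """Walk 'previous version' links back from start, newest first."""
    chain = [start]
    while True:
        # Skip anything already visited so a cyclic set of links cannot loop.
        earlier = [u for u in follow(PREVIOUS, chain[-1]) if u not in chain]
        if not earlier:
            return chain
        chain.append(earlier[0])


versions = history("http://example.org/doc-v3")
```
</pre>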

								    <hr />

								    <h3>

								      Summary so far

								    </h3>

								    <ol>

								      <li>Metadata is data

								      </li>

								      <li>Metadata may refer to any resource which has a URI

								      </li>

								      <li>Metadata may be stored in any resource no matter to which

								      resource it refers

								      </li>

								      <li>Metadata can be regarded as a set of assertions, each

								      assertion being about a resource &nbsp;(A &nbsp;<i>u1</i>

								      &nbsp;...).

								      </li>

								      <li>Assertions which state a named relationship between two

								      resources are known as links &nbsp;(A <i>u1 u2</i>)

								      </li>

								      <li>Assertion types (including link relationships) should be

								      first class objects in the sense that they should be able to

								      be defined in addressable resources and referred to by the

								      address of that resource &nbsp;A in { u }

								      </li>

								      <li>The development of new assertion types and link

								      relationships should be done in a consistent manner so that

								      these sorts of assertions can be treated generically by people

								      and by software.

								      </li>

								    </ol>

								    <hr />

								    <p>

								      <i>Rough from here on down</i>

								    </p>

								    <h3 id="Label">

								      Label syntax: Assertions about a common subject

								    </h3>

								    <p>

								      When labeling information, it is often useful to make a lot

								      of statements about the same object. It is also useful to be

								      able to make the same set of &nbsp;statements about a set of

								      resources. For example, the assertions

								    </p>

								    <pre>

								(A1 u1  a b ... )

								 (A2 u1  c d )

								 (A3 u1  a f g h )


								</pre>

								    <p>

								      might be written

								    </p>

								    <pre>

								(for u1

								         (A1 a b ... )

								         (A2 c d )

								         (A3 a f g h )

								 )


								</pre>

								    <p>

								      Therefore in the syntax of an actual assertion the subject is

								      implicit. This is just the case with RFC822 headers which

								      implicitly refer to the following body, and with HTML "HEAD"

								      element contents which implicitly refer to the containing

								      document. &nbsp;(Though notice there is a fundamental

								      difference, discussed <a href=

								      "w:/DesignIssues/temp.html#mesages">below</a>, between a

								      general label and a message header because the message header

								      is definitive.)

								    </p>
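								    <p>
								      The factoring shown above can be sketched in Python. This is
								      an illustration only: assertions are modelled as tuples of
								      (type, subject, parameters...), and the function names
								      <code>to_labels</code> and <code>from_label</code> are
								      hypothetical.
								    </p>
								    <pre>
```python
def to_labels(assertions):
    """Group (type, subject, *params) assertions by subject.

    Returns {subject: [(type, *params), ...]}, so that inside each group
    the subject is implicit.
    """
    labels = {}
    for a_type, subject, *params in assertions:
        labels.setdefault(subject, []).append((a_type, *params))
    return labels


def from_label(subject, label):
    """Restore the explicit subject to every assertion in a label."""
    return [(a_type, subject, *params) for a_type, *params in label]


assertions = [
    ("A1", "u1", "a", "b"),
    ("A2", "u1", "c", "d"),
    ("A3", "u1", "a", "f", "g", "h"),
]
labels = to_labels(assertions)
```
								    </pre>
								    <p>
								      Because <code>from_label</code> recovers the original
								      assertions exactly, nothing is lost by leaving the subject
								      implicit, provided that something outside the label says what
								      the subject is.
								    </p>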

								    <p>

								      So it is wise to recognise the label as a case which should

								      be specifically optimized in the syntax. <em>[In RDF this is

								      indeed the case: the subject is established as a

								      context, and then many properties are given within that

								      context. -2000/9]</em>

								    </p>

								    <p>

								      Assertions, when the subject is implicit, are known as

								      attribute-value pairs as discussed above. Let's use the term

								      "label" for a set of assertions with the subject extracted.

								      &nbsp;Like the label on a jam jar, it contains information

								      but there must be something else (in this case its

								      placement on the jar) which tells you to what it applies.

								      &nbsp;(The PICS label in fact contained other information

								      too, including the subject and meta-meta-data about the

								      authorship of the label.)

								    </p>

								    <p>

								      Local definition:

								    </p>

								    <table border="1" cellpadding="2">

								      <tbody>

								        <tr>

								          <td>

								            A label is a set of assertions with a common implicit

								            subject. &nbsp;In this architecture it is a set of

								            attribute-value pairs

								          </td>

								        </tr>

								      </tbody>

								    </table>

								    <p>

								      <i>(There is a convention that you can write "Jam" on a jam

								      jar label. &nbsp;You don't write "Jam jar" or "Jam Jar

								      label". &nbsp; Even though I once saw a label on a cardboard

								      box with the words "Equipment shipping box label" on it!)</i>

								    </p>

								    <h3>

								      Authorship of Metadata

								    </h3>

								    <p>

								      It follows from the fact that metadata is data that there can

								      be metadata about it. &nbsp;Some of this metadata becomes

								      crucial when we consider a trust model. &nbsp;The logic we

								      need includes the author of metadata:

								    </p>

								    <p>

								      p1: (A u1 . . .)

								    </p>

								    <p>

								      where p1 is, in a system with low trust, the author as

								      stated, but in a cryptographically secure system is a

								      principal represented by a key.

								    </p>

								    <p>

								      On the web, the granularity of information is the resource.

								      Authorship and access control generally use this granularity.

								      Therefore, typically, the trust one places in an assertion is

								      a function of the document which asserted it, and of the metadata

								      about that document. However, when information is then

								      combined from many resources, one needs a language which

								      allows the source of the original to be recorded. Like

								      blockquote in HTML, this separates the data itself from the

								      resource, so the resource does not assert the data directly but

								      asserts that it was asserted.

								    </p>
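								    <p>
								      One way to picture this, as a sketch rather than a proposed
								      format: when assertions from several documents are merged,
								      each is wrapped in a record of where it came from, and trust
								      is then decided per source. The <code>Assertion</code> and
								      <code>Quoted</code> types, the function names, the "rating"
								      assertion type and the example.org URIs are all hypothetical.
								    </p>
								    <pre>
```python
from collections import namedtuple

# A simplified assertion, and a wrapper recording which resource asserted it.
Assertion = namedtuple("Assertion", ["type", "subject", "value"])
Quoted = namedtuple("Quoted", ["source", "assertion"])


def combine(documents):
    """Merge assertions from many documents, keeping the source of each one.

    documents maps a source URI to the list of assertions found there.
    """
    return [
        Quoted(source, assertion)
        for source, assertions in documents.items()
        for assertion in assertions
    ]


def accepted(quoted, trusted_sources):
    """Keep only the assertions whose recorded source is trusted."""
    return [q.assertion for q in quoted if q.source in trusted_sources]


documents = {
    "http://example.org/labels/a": [Assertion("rating", "http://example.org/page", "safe")],
    "http://example.org/labels/b": [Assertion("rating", "http://example.org/page", "unsafe")],
}
merged = combine(documents)
```
								    </pre>
								    <p>
								      The merged list asserts neither rating itself. It records
								      only that each rating was asserted by a particular resource,
								      so two contradictory ratings can sit side by side until a
								      reader chooses which sources to believe.
								    </p>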

								    <h2>

								      Analysing labels

								    </h2>

								    <p>

								      See <a href="Labels.html">Analysing PICS labels as generic

								      Metadata</a>

								    </p>

								    <p>

								      where we look at PICS labels and try to sift out the actual

								      semantics of them. This is a thought experiment generating

								      requirements. The conclusions are that information such as

								      authorship and date information in fact form a tree of

								      assertions about assertions, and it is important to be clear

								      about the structure of that tree. The notion of a message is

								      brought up there too, but not followed up as it is not

								      germane to the discussion at this point.

								    </p>

								    <h2>

								      Algebraic Manipulations

								    </h2>

								    <p>

								      If you can make assumptions about the properties of labels

								      then you can manipulate them, possibly without knowing

								      everything about their meaning. &nbsp;Properties such as

								      commutativity, transitivity and associativity would be very

								      useful to have easily available: perhaps in the syntax, or

								      failing that in the schema.

								    </p>

								    <p>

								      [See <a href="Semantic.html">Semantic Web roadmap</a> for

								      higher levels of logic]

								    </p>

								    <p>

								      For example, given a label saying a pair of jeans has a 32

								      inch waist and a price of $28, I can deduce a label which

								      just has the price of $28. &nbsp;But given a label which says

								      that the punishment for the crime is 2 months in jail and a

								      fine of $3000, &nbsp;I can't deduce one that says that

								      the punishment &nbsp;is 2 months in jail.

								    </p>
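								    <p>
								      The difference between the two cases can be sketched as
								      follows, under the assumption (made here for illustration)
								      that a label is a mapping from attribute names to values. The
								      function <code>weaken</code> and the attribute names are
								      hypothetical.
								    </p>
								    <pre>
```python
def weaken(label, keep):
    """Return the label restricted to the attributes named in keep.

    This is only sound if every attribute-value pair is an independent
    assertion about the subject.
    """
    return {name: value for name, value in label.items() if name in keep}


# Two independent assertions about the jeans: either may be dropped.
jeans = {"waist_inches": 32, "price_usd": 28}
price_only = weaken(jeans, {"price_usd"})

# One assertion whose value is the complete punishment. It can be kept or
# dropped as a whole, but its parts cannot be separated without knowing
# what "punishment" means.
crime = {"punishment": ("2 months in jail", "fine of $3000")}
whole = weaken(crime, {"punishment"})
```
								    </pre>
								    <p>
								      Generic software can apply <code>weaken</code> safely only if
								      the syntax or the schema tells it which structures are
								      conjunctions of independent assertions and which are single
								      compound values.
								    </p>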

								    <p>

								      A typical use of metadata will be to provide a statement

								      along with its proof to be verified by another party.

								      &nbsp;Being able to process these things efficiently and with

								      limited knowledge will be crucial.

								    </p>

								    <p>

								      The most practical way to do this is to create a basic

								      common vocabulary for the logical functions. Sometimes known

								      as the "RDF upper layers", these are mentioned in the

								      <a href="Semantic.html">note on the Semantic Web.</a>

								    </p>

								    <h4>

								      Ordered/Unordered

								    </h4>

								    <p>

								      The <a href="#independent">axiom of independence of

								      assertions</a> above gives us that in any set of assertions,

								      as assertions are independently true, specific assertions may

								      be removed or reordered, leaving the document just as valid

								      (though possibly less informative).

								    </p>
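								    <p>
								      Read this way, a document of independent assertions behaves
								      like a mathematical set. The short sketch below is an
								      illustration only: it treats each assertion as a tuple, and
								      the function names are hypothetical.
								    </p>
								    <pre>
```python
def same_document(a, b):
    """Two lists of independent assertions say the same thing if they
    hold the same assertions, whatever the order."""
    return frozenset(a) == frozenset(b)


def still_valid(reduced, original):
    """A document stays valid (if less informative) after assertions
    are removed, so any subset of the original is acceptable."""
    return frozenset(reduced).issubset(original)


original = [
    ("A1", "u1", "a", "b"),
    ("A2", "u1", "c", "d"),
    ("A3", "u1", "a", "f", "g", "h"),
]
reordered = [original[2], original[0], original[1]]
reduced = [original[1]]
```
								    </pre>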

								    <p>

								      Examples of unordered things currently are: RFC822 message

								      header lines, SGML attributes. Examples of ordered things

								      are: HTTP header lines and SGML elements.

								    </p>

								    <p>

								      Do we need a form in which we can make an assertion which has

								      many parameters which are in fact not mutable in any way?

								    </p>

								    <h2>

								      Summary of Requirements

								    </h2>

								    <p>

								      There must be ways of representing &nbsp;the above things

								      (messages, labels, specifying labels, and statements) and of

								      distinguishing between them.

								    </p>

								    <p>

								      As much as possible of the syntax and semantics should be

								      able to be acquired by reference from a metadata document.

								    </p>

								    <p>

								      It must be possible to mix multiple vocabularies within the

								      same scope.

								    </p>

								    <p>

								      The syntax and structure should be such that as many

								      manipulations as possible can be done without having to know

								      the semantics of the vocabulary in use.

								    </p>

								    <p>

								      A common vocabulary for basic logic and knowledge

								      representation functionality will be required.

								    </p>

								    <hr />

								    <h2>

								      References

								    </h2>

								    <p>

								      <a name="PICS" id="PICS">PICS</a> - The PICS project was a

								      project to define standards for interchange of endorsement

								      information, aimed at the content filtering problem. See the

								      PICS home page.

								    </p>

								    <hr />

								    <address>

								      Tim BL, &nbsp;January 1997

								      <p>

								        Last edit $Date: 2009/08/27 21:38:08 $

								      </p>

								    </address>

								  </body>

								</html>