<?xml version="1.0" encoding="iso-8859-1"?>
|
|
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
|
|
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
|
|
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
|
|
<head>
|
|
<title>Common HTTP Implementation Problems</title>
|
|
<link rel="stylesheet" type="text/css" href="style.css" />
|
|
<link rel="stylesheet" type="text/css" href="http://www.w3.org/StyleSheets/TR/W3C-NOTE" />
|
|
<meta name="Keywords" content="QA, quality assurance, conformance,
|
|
validity, server, http, uri, cache, content-negotiation" />
|
|
<meta name="Description" content="Better understanding server-side
|
|
Web technologies to avoid misusing them" />
|
|
<link rel="contents" href="#contents" />
|
|
<link rel="copyright" href="http://www.w3.org/Consortium/Legal/" />
|
|
<link rel="schema.DC" href="http://purl.org/dc" />
|
|
<meta name="DC.Subject" lang="en" content="server,http, uri, cache, content-negotiation" />
|
|
<meta name="DC.Title" lang="en" content="Common HTTP Implementation Problems" />
|
|
<meta name="DC.Description.Abstract" lang="en" content="Better understanding
|
|
server-side Web technologies to avoid misusing them" />
|
|
<meta name="DC.Date.Created" content="2002-08-22" />
|
|
<meta name="DC.Language" scheme="RFC1766" content ="en" />
|
|
<meta name="DC.Creator" content="Olivier Thereaux" />
|
|
<meta name="DC.Publisher" content="W3C - World Wide Web Consortium - http://www.w3.org" />
|
|
<meta name="DC.Rights" content="http://www.w3.org/Consortium/Legal/" />
|
|
</head>
|
|
<body>
|
|
|
|
<div class="head">
|
|
<p>
|
|
<a href="http://www.w3.org/">
|
|
<img
|
|
src="http://www.w3.org/Icons/w3c_home" alt="W3C" height="48"
|
|
width="72" /></a></p>
|
|
|
|
<h1>Common <acronym title="Hypertext Transfer Protocol">HTTP</acronym> Implementation Problems</h1>
|
|
<h2>W3C Note 28 January 2003</h2>
|
|
<dl>
|
|
<dt>This version:</dt>
|
|
<dd><a href="http://www.w3.org/TR/2003/NOTE-chips-20030128/">http://www.w3.org/TR/2003/NOTE-chips-20030128/</a></dd>
|
|
<dt>Latest version:</dt>
|
|
<dd><a href="http://www.w3.org/TR/chips">http://www.w3.org/TR/chips</a></dd>
|
|
<dt>Previous version:</dt>
|
|
<dd>n/a</dd>
|
|
<dt>Translations of this document:</dt>
|
|
<dd><a href="http://www.w3.org/QA/translations#chips">
|
|
http://www.w3.org/QA/translations#chips</a></dd>
|
|
<dt>Techniques for this document:</dt>
|
|
<dd><a href="http://www.w3.org/QA/2002/12/chips-techniques">
|
|
http://www.w3.org/QA/2002/12/chips-techniques</a></dd>
|
|
<dt>Editor:</dt>
|
|
<dd><a href="http://www.w3.org/People/olivier/">Olivier Théreaux</a>, W3C</dd>
|
|
<dt>Authors and contributors:</dt>
|
|
<dd>See <a href="#acknowledgments">Acknowledgments</a>.</dd>
|
|
</dl>
|
|
|
|
<p class="copyright"><a href="http://www.w3.org/Consortium/Legal/ipr-notice#Copyright"> Copyright</a> © 2003 <a href="http://www.w3.org/"><acronym title="World Wide Web Consortium">W3C</acronym></a><sup>®</sup> (<a href="http://www.lcs.mit.edu/"><acronym title="Massachusetts Institute of Technology">MIT</acronym></a>, <a href="http://www.ercim.org/"><acronym title="European Research Consortium for Informatics and Mathematics">ERCIM</acronym></a>, <a href="http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. W3C <a href="http://www.w3.org/Consortium/Legal/ipr-notice#Legal_Disclaimer">liability</a>, <a href="http://www.w3.org/Consortium/Legal/ipr-notice#W3C_Trademarks">trademark</a>, <a href="http://www.w3.org/Consortium/Legal/copyright-documents">document use</a> and <a href="http://www.w3.org/Consortium/Legal/copyright-software">software licensing</a> rules apply.</p>
|
|
|
|
</div>
|
|
|
|
<hr />
|
|
<h2><a name="abstract" id="abstract">Abstract</a></h2>
|
|
<p>This document is a set of good practices to improve implementations
|
|
of <acronym title="the Hypertext Transfer Protocol">HTTP</acronym> and related standards
|
|
as well as their use. It explains a few basic concepts, points out common mistakes
|
|
and misbehaviors, and suggests "best practices".</p>
|
|
|
|
<p>This document does <strong>not</strong> incriminate any specific product.
|
|
W3C does not track bugs or errors in implementations.
|
|
That information is generally tracked by the vendors themselves,
|
|
or third parties.</p>
|
|
|
|
<h2><a name="sotd" id="sotd">Status of this document</a></h2>
|
|
<h3><a name="sotd-pub-status" id="sotd-pub-status">Publication status</a></h3>
|
|
|
|
<p>This document is the first public version of a Note,
|
|
published on January 28th, 2003, and made available for
|
|
discussion only by the editor and authors as part of their work as
|
|
W3C Team participants in the <a href="http://www.w3.org/QA/">Quality Assurance</a>
|
|
<a href="http://www.w3.org/QA/Activity">Activity</a>.
|
|
Publication of this Note by W3C does not imply endorsement by W3C,
|
|
including the W3C Team and Membership.
|
|
</p>
|
|
|
|
<p>This document may be updated, replaced, or obsoleted by other documents
|
|
at any time.</p>
|
|
|
|
<h3><a name="sotd-comments" id="sotd-comments">Comments</a></h3>
|
|
<p> No formal commitment is made by W3C to invest additional
|
|
resources in topics addressed by this Note. However, comments are welcome
|
|
and the W3C <a href="http://www.w3.org/QA/">Quality Assurance</a> Team
|
|
may publish an amended version should the amount and quality
|
|
of the received comments prove it worthwhile or necessary.</p>
|
|
|
|
<p>Please send comments to the
|
|
<a href="http://lists.w3.org/Archives/Public/www-qa/">publicly archived</a>
|
|
mailing-list of the
|
|
<a href="http://www.w3.org/QA/IG/">Quality Assurance Interest Group</a>:
|
|
<a href="mailto:www-qa@w3.org">www-qa@w3.org</a>.
|
|
</p>
|
|
|
|
<p>A list of <a href="http://www.w3.org/QA/2002/12/chips-errata">acknowledged
|
|
errors and proposed corrections</a> can be found at
|
|
http://www.w3.org/QA/2002/12/chips-errata.</p>
|
|
|
|
<h3><a name="sotd-translat" id="sotd-translat">Translation</a></h3>
|
|
<p>Translation of this document is welcome. However, before
|
|
starting a translation of this document, please be sure to read the
|
|
<a href="http://www.w3.org/Consortium/Legal/IPR-FAQ.html#translate">
|
|
information on translations</a>, in our
|
|
<a href="http://www.w3.org/Consortium/Legal/IPR-FAQ.html">
|
|
Copyright <abbr title="Frequently Asked Questions">FAQ</abbr></a>,
|
|
and check the <a href="http://www.w3.org/QA/translations#chips">
|
|
list of existing translations</a> of this document (available at
|
|
http://www.w3.org/QA/translations#chips).
|
|
</p>
|
|
|
|
<h3><a name="sotd-othertr" id="sotd-othertr">Other W3C Technical Reports and publications</a></h3>
|
|
<p>A list of current <a href="http://www.w3.org/TR/">W3C technical
|
|
reports and publications</a>, including Working Drafts and Notes,
|
|
can be found at http://www.w3.org/TR/.</p>
|
|
|
|
|
|
|
|
|
|
<hr />
|
|
|
|
<h2><a name="contents" id="contents">Table of Contents</a></h2>
|
|
<ul class="toc">
|
|
<li><a href="#intro">Introduction</a>
|
|
<ul>
|
|
<li><a href="#scope">Scope of this document</a>
|
|
<ul>
|
|
<li><a href="#organization">Organization of this document : Guidelines, checkpoints</a></li>
|
|
<li><a href="#cp-target">Targets associated with the checkpoints</a></li>
|
|
<li><a href="#glcpex">An example of guideline and checkpoint</a></li>
|
|
</ul>
|
|
</li>
|
|
<li><a href="#conformance">Conformance to this document</a></li>
|
|
<li><a href="#techniques">Techniques related to this note</a></li>
|
|
</ul>
|
|
</li>
|
|
<li><a href="#uri">1. Understanding <acronym title="Uniform Resource Identifier">URI</acronym>s</a>
|
|
<ul>
|
|
<li><a href="#gl1">Guideline 1: Choose <acronym title="Uniform Resource Identifier">URI</acronym>s wisely</a></li>
|
|
<li><a href="#gl2">Guideline 2: Allow <acronym title="Uniform Resource Identifier">URI</acronym> management</a></li>
|
|
<li><a href="#gl3">Guideline 3: Use independent <acronym title="Uniform Resource Identifier">URI</acronym>s</a></li>
|
|
<li><a href="#gl4">Guideline 4: Use standard redirects
|
|
for content that changes</a></li>
|
|
<li><a href="#gl5">Guideline 5: Provide indexing agents
|
|
with useful information</a></li>
|
|
<li><a href="#gl6">Guideline 6: Provide appropriate
|
|
caching information</a></li>
|
|
</ul>
|
|
</li>
|
|
<li><a href="#content">2. Serving content appropriately</a>
|
|
<ul>
|
|
<li><a href="#gl7">Guideline 7: Server-driven content
|
|
negotiation</a></li>
|
|
<li><a href="#gl8">Guideline 8: Provide useful metadata
|
|
in addition to content negotiation</a></li>
|
|
<li><a href="#gl9">Guideline 9: Provide default and
|
|
fall-back solutions</a></li>
|
|
<li><a href="#gl10">Guideline 10: Serve resources with correct
|
|
content-type and character encoding information</a></li>
|
|
<li><a href="#gl11">Guideline 11: Use flexible technology instead of
|
|
client sniffing/blocking</a></li>
|
|
<li><a href="#gl12">Guideline 12: Enrich and Enhance</a></li>
|
|
</ul>
|
|
</li>
|
|
<li><a href="#checklists">Tabular checklist of guidelines and checkpoints</a></li>
|
|
<li><a href="#acknowledgments">Acknowledgments</a></li>
|
|
<li><a href="#references">References</a></li>
|
|
</ul>
|
|
|
|
|
|
<hr />
|
|
|
|
<h2><a name="intro" id="intro">Introduction</a></h2>
|
|
<p><acronym title="the Hypertext Transfer Protocol">HTTP</acronym>
|
|
and <acronym title="Uniform Resource Identifier">URI</acronym>s
|
|
are the basis of the World Wide Web, yet they are often
|
|
misunderstood, and their implementations and uses are sometimes
|
|
incomplete or incorrect.</p>
|
|
<p>This document tries to improve this situation by providing
|
|
a set of good practices to improve implementations
|
|
of <acronym title="the Hypertext Transfer Protocol">HTTP</acronym> and related standards
|
|
(Web servers, server-side Web engines), as well as their use.</p>
|
|
|
|
<p>This document only deals with the server-side aspect of
<acronym title="the Hypertext Transfer Protocol">HTTP</acronym>.
People looking for <acronym title="the Hypertext Transfer Protocol">HTTP</acronym>
implementation problems in Web user agents should have a look at the
user-agent counterpart of this document:
"<a href="/TR/cuap.html">Common User-Agent Problems</a>" [<a href="#ref-CUAP">CUAP</a>].</p>
|
|
|
|
<h3><a name="scope" id="scope">Scope of this document</a></h3>
|
|
<p id="scope-targets">This document is a set of known problems and/or good practices
|
|
for <acronym title="the Hypertext Transfer Protocol">HTTP</acronym>
|
|
implementations and their use, aimed at:</p>
|
|
<ul>
|
|
<li>developers implementing Web servers or proxies,</li>
|
|
<li>developers implementing server-side scripting languages and engines,
|
|
web content management or generation systems, etc.
|
|
(referred to, across this document, as "Server-side engine developers"),</li>
|
|
<li>and webmasters, Web site managers (referred to,
|
|
across this document, as "Content Managers").</li>
|
|
</ul>
|
|
|
|
<p>Unless specifically mentioned, what is referred to throughout this document as
|
|
"<acronym title="Hypertext Transfer Protocol">HTTP</acronym>" is RFC2616,
|
|
<abbr title="also known as">a.k.a.</abbr>
|
|
<acronym title="Hypertext Transfer Protocol">HTTP</acronym>/1.1
|
|
[<a href="#ref-RFC2616">RFC2616</a>].</p>
|
|
|
|
<h4 id="organization">Organization of this document : Guidelines, checkpoints</h4>
|
|
|
|
<p>This document's organization is inspired by the
<acronym title="the Web Accessibility Initiative">WAI</acronym> guidelines, especially
|
|
<acronym title="User Agent Accessibility Guidelines"><a href="http://www.w3.org/TR/UAAG10/">UAAG</a>
|
|
</acronym>.</p>
|
|
|
|
|
|
<p>This document is divided into 12 guidelines and associated checkpoints.
|
|
Each guideline is a general good practice, whereas the associated checkpoints
|
|
are practical applications of the guideline. Checkpoints are themselves divided into
|
|
one or more provision(s). </p>
|
|
|
|
<p>A guideline can, and will in most cases, have several associated checkpoints.</p>
|
|
|
|
<h4 id="cp-target">Targets associated with the checkpoints</h4>
|
|
<p>Checkpoints and their provisions are tagged according to their <a href="#scope-targets">primary target</a>.</p>
|
|
<ul>
|
|
<li id="target-expl-si">Checkpoints targeted at server (Web servers or proxies)
|
|
implementors are tagged as <span class="cp-target">SI</span>,</li>
|
|
<li id="target-expl-ss">Checkpoints targeted at server-side engine
|
|
(server-side scripting languages and engines,
|
|
web content management or generation systems, etc.)
|
|
developers are tagged as <span class="cp-target">SS</span>,</li>
|
|
<li id="target-expl-cm">Checkpoints targeted at content managers
|
|
(webmasters, Web site managers)
|
|
are tagged as <span class="cp-target">CM</span>.</li>
|
|
</ul>
|
|
<p>If a checkpoint is applicable to several or all of these targets,
|
|
it will have several tags. The target of a checkpoint is the sum of
the targets of its provisions.</p>
|
|
|
|
<h4 id="glcpex">An example of guideline and checkpoint</h4>
|
|
<p>Here is an example of a guideline, with an associated checkpoint. Note the
|
|
way they are presented, the multiple tags for the multiple targets of the checkpoint, etc.:</p>
|
|
<div id="example-of-gl-and-cp" style="border: 1px solid red; padding : 1em">
|
|
<h3 class="gl">Guideline 0 (example): <a name="gl-example" id="gl-example">Show, don't tell</a></h3>
|
|
<div class="checkpoint" id="cp0.1" title="Example">
|
|
<div class="cp-head">
|
|
<span class="cp-number">0.1: </span>
|
|
<span class="cp-title">Example</span>
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
</div>
|
|
|
|
<p class="cp">An example can be worth thousands of explanations.</p>
|
|
<ol>
|
|
<li class="cp-prov"><p><span class="cp-title">sample provision for this checkpoint</span>
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
</p>
|
|
|
|
<p class="cp">Here is a sample checkpoint text, within a sample guideline,
|
|
with the actual markup used for guidelines and checkpoints.</p>
|
|
</li>
|
|
<li class="cp-prov"><p><span class="cp-title">another sample provision for this checkpoint</span>
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
</p>
|
|
<p class="cp">In our example, the checkpoint has two provisions.</p>
|
|
<p class="cp, example"><span class="example-good">Example</span>:<br />
|
|
Checkpoints may include examples, too.</p>
|
|
</li>
|
|
</ol>
|
|
</div>
|
|
</div>
|
|
<h3><a name="conformance" id="conformance">Conformance to this document</a></h3>
|
|
<p><strong>This document is informative</strong>.</p>
|
|
|
|
<p>This document has no conformance <em>per se</em>, but since it is about
|
|
implementation and use of normative specifications (such as
|
|
<acronym title="the Hypertext Transfer Protocol">HTTP</acronym>/1.1),
|
|
one should consider following this set of guidelines
|
|
as a good step toward conformance to these specifications.</p>
|
|
|
|
<p>When possible, normative references will be mentioned for each checkpoint.</p>
|
|
|
|
<p>This document uses RFC 2119 [<a href="#ref-RFC2119">RFC2119</a>] keywords
|
|
(capitalized MUST, MAY, SHOULD etc.) when referring to behaviors clearly defined
|
|
by a normative specification. When not capitalized, these words should be interpreted
|
|
as regular language and not as RFC2119 keywords.</p>
|
|
|
|
|
|
|
|
<h3><a name="techniques" id="techniques">Techniques related to this note</a></h3>
|
|
|
|
<p>As specified in the <a href="#abstract">abstract</a>, <cite>This
|
|
document does not incriminate any specific product. W3C does not
|
|
track bugs or errors in implementations</cite>. However,
|
|
we welcome implementors and advanced users of such technologies to
|
|
contribute to this document by providing techniques related to this
|
|
note's applicable guidelines and checkpoints for a specific
|
|
implementation.</p>
|
|
|
|
<p>Contributions are welcome in the
|
|
<a href="http://lists.w3.org/Archives/Public/www-qa/"><strong>publicly archived</strong></a>
|
|
mailing-list of the <a href="http://www.w3.org/QA/IG/">Quality Assurance Interest Group</a>:
|
|
<a href="mailto:www-qa@w3.org">www-qa@w3.org</a>.
|
|
The <a href="http://lists.w3.org/Archives/Public/www-qa/">public archives
|
|
for this list</a> act as a repository of contributions.
|
|
A <a href="http://www.w3.org/QA/2002/12/chips-techniques">list of acknowledged
|
|
contributions</a> is available at http://www.w3.org/QA/2002/12/chips-techniques.</p>
|
|
|
|
<hr />
|
|
|
|
<h2>1. <a name="uri" id="uri">Understanding URIs</a></h2>
|
|
|
|
<p>
|
|
We shall start by explaining in detail <acronym
|
|
title="Uniform Resource Identifier">URI</acronym>s, and their underlying concepts.
|
|
</p>
|
|
|
|
|
|
|
|
<p><acronym title="Uniform Resource Identifier">URI</acronym>s are defined in:</p>
|
|
<ul>
|
|
<li><a href="http://www.ietf.org/rfc/rfc1630.txt">RFC1630: "Universal Resource
|
|
Identifiers in WWW"</a> [<a href="#ref-RFC1630">RFC1630</a>],
|
|
and </li>
|
|
<li><a href="http://www.ietf.org/rfc/rfc2396.txt">RFC2396: "Uniform Resource
|
|
Identifiers (URI): Generic Syntax"</a> [<a href="#ref-RFC2396">RFC2396</a>].
|
|
</li>
|
|
</ul>
|
|
|
|
<p>A common mistake, responsible for many <acronym title="the Hypertext Transfer Protocol">HTTP</acronym>
implementation problems, is to think that a
<acronym title="Uniform Resource Identifier">URI</acronym> is equivalent
to a filename within a computer system.
This is wrong. <acronym title="Uniform Resource Identifier">URI</acronym>s have, conceptually, nothing to do with a file system.
One should remember that at all times when dealing with the World Wide Web.
|
|
</p>
|
|
|
|
<p>To understand properly what a <acronym title="Uniform Resource Identifier">URI
|
|
</acronym> is, one has to think of the
|
|
World Wide Web as a giant warehouse with an enormous amount
|
|
of merchandise stored in boxes.</p>
|
|
<p>In this warehouse, a <acronym title="Uniform Resource Identifier">URI</acronym>
|
|
is not "row 12, 42nd box". A <acronym title="Uniform Resource Identifier">URI</acronym>
|
|
is not "that big black box over there", nor the content of the box. The URI is exactly
|
|
"The toothbrush can be found at row 12, 42nd box".
|
|
</p>
|
|
|
|
<p>A <acronym title="Uniform Resource Identifier">URI</acronym> is,
|
|
actually, a <em>reference to a resource, with fixed and independent semantics</em>.
|
|
An interpretation of this definition is that the
|
|
<acronym title="Uniform Resource Identifier">URI</acronym> is some sort of
|
|
serial number for one of the many items of merchandise in the warehouse.
|
|
"Fixed semantics" means that we know that in a box referenced by this serial number,
|
|
there will be a specific product (we'll use a toothbrush for our metaphor).
|
|
<strong>Always</strong>. We know neither the color nor
|
|
the shape of the toothbrush, but we are certain that whenever and however we
|
|
<em>dereference</em> the <acronym title="Uniform Resource Identifier">URI</acronym>
|
|
(that is, whatever way anyone chooses to learn which box is referenced
by the <acronym title="Uniform Resource Identifier">URI</acronym>),
the resource will <strong>always</strong> be a toothbrush.</p>
|
|
|
|
<p>Note that the <acronym title="Uniform Resource Identifier">URI</acronym>
|
|
is not <strong>exactly</strong> a serial number, since a serial number
|
|
does not have any specific semantics, and it can be a reference to
|
|
multiple instances.</p>
|
|
|
|
<p>Also, if you upgrade from toothbrush to a newer version of the toothbrush ("toothbrush v2"),
|
|
the serial number may change. However, its definition "our toothbrush" will not change.
|
|
One may thus think of the <acronym title="Uniform Resource Identifier">URI</acronym>
|
|
as being the identification of a specific semantic, and the
|
|
<acronym title="the Hypertext Transfer Protocol">HTTP</acronym> <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.19"><code>ETag</code></a> ([<a href="#ref-RFC2616">RFC2616</a>] section 14.19) being the real serial number.
|
|
</p>
|
|
<p>
|
|
The <acronym title="Uniform Resource Identifier">URI</acronym>
|
|
http://www.example.com/products/toothbrush is then a fixed reference to a
|
|
specific semantic, rather than being a serial number.
|
|
</p>
|
|
|
|
<p>
|
|
Note also that the <acronym title="the Hypertext Transfer Protocol">HTTP</acronym>
|
|
ETag can be shared by identical resources that
|
|
have different <acronym title="Uniform Resource Identifier">URI</acronym>s.
|
|
For example, if <code>http://mirror1.example.org/foo</code>
|
|
and <code>http://mirror2.example.org/foo</code> share the
|
|
same ETags, you can then deduce that those are equivalent resources.</p>
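<p>As a rough illustration of this comparison, here is a minimal Python sketch.
The function name <code>same_entity</code> and the sample ETag values are assumptions
made for this example only. The sketch applies a strong comparison and deliberately
treats weak validators (those prefixed with <code>W/</code>) as not comparable.</p>

```python
def same_entity(etag_a: str, etag_b: str) -> bool:
    """Return True when two ETag header values match under strong comparison."""
    a = etag_a.strip()
    b = etag_b.strip()
    # Weak validators (prefixed with W/) only promise semantic equivalence,
    # so this sketch refuses to treat them as proof of identical content.
    if a.startswith("W/") or b.startswith("W/"):
        return False
    return a == b


# Hypothetical ETag values, as if returned by two mirrors for /foo.
mirror1_etag = '"5e8c-3a1f"'
mirror2_etag = '"5e8c-3a1f"'
print(same_entity(mirror1_etag, mirror2_etag))
```

<p>In practice the two values would come from the <code>ETag</code> response headers
of the two mirrors; how they are fetched is left out of this sketch.</p>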
|
|
|
|
|
|
|
|
<div>The warehouse metaphor highlighted three major points about
|
|
<acronym title="Uniform Resource Identifier">URI</acronym>s:
|
|
|
|
<ol>
|
|
<li>A <acronym title="Uniform Resource Identifier">URI</acronym>
|
|
is a reference to a resource</li>
|
|
<li>The reference has fixed semantics</li>
|
|
<li>The reference has independent semantics</li>
|
|
</ol>
|
|
</div>
|
|
|
|
<p>The fixed semantics of a <acronym title="Uniform Resource Identifier">URI</acronym>
|
|
is one of the most important, yet often overlooked, concepts about
|
|
<acronym title="Uniform Resource Identifier">URI</acronym>s.
|
|
</p>
|
|
<p>
|
|
<a href="http://www.w3.org/People/Berners-Lee/">Tim Berners-Lee</a>, creator
|
|
of the World Wide Web, wrote an article in 1998 named
|
|
"<a href="http://www.w3.org/Provider/Style/URI.html">Cool
|
|
<acronym title="Uniform Resource Identifier">URI</acronym>s don't change</a>"
|
|
[<a href="#ref-COOLURIs">COOLURIs</a>]
|
|
stressing this point and explaining how to use
|
|
<acronym title="Uniform Resource Identifier">URI</acronym>s properly.
|
|
</p>
|
|
|
|
<p>Thanks to our warehouse metaphor, it is obvious that
|
|
<acronym title="Uniform Resource Identifier">URI</acronym>s should not change:
|
|
people looking for a resource will have a lot of trouble finding it if
|
|
the actual reference for the resource changes, leaving the original
reference pointing to... nothing. </p>
|
|
|
|
<p>This is all the more important on the Web (more than in our warehouse example)
|
|
because the Web is built upon hyperlinks, which themselves use
|
|
<acronym title="Uniform Resource Identifier">URI</acronym>s.
|
|
When <acronym title="Uniform Resource Identifier">URI</acronym>s are broken,
|
|
following hyperlinks (or "bookmarks",
which are a form of hyperlink) does not lead to the expected resource.
|
|
In other words, from a server point of view, this means that the resource
|
|
would miss some traffic. Since traffic is the final aim of any content provider
(as selling toothbrushes is the final goal of the warehouse owner),
behaviors resulting in a loss of traffic should be avoided. </p>
|
|
|
|
<p>As Tim Berners-Lee points out, <q cite="http://www.w3.org/Provider/Style/URI.html">
|
|
When you change a <acronym title="Uniform Resource Identifier">URI</acronym> on your server,
|
|
you can never completely tell who will have links to the old
|
|
<acronym title="Uniform Resource Identifier">URI</acronym>.
|
|
They might have made links from regular Web pages. They might have bookmarked your page.
|
|
They might have scrawled the <acronym title="Uniform Resource Identifier">URI</acronym>
|
|
in the margin of a letter to a friend.</q>
In other words, as Jakob Nielsen writes, <q cite="http://www.useit.com/alertbox/990321.html">
|
|
Persistent URLs Attract Links, Link-rot equals lost business</q>.</p>
|
|
|
|
<p>We have seen why one should avoid breaking <acronym title="Uniform Resource Identifier">
|
|
URI</acronym>s. The following guidelines focus on techniques and strategies to avoid breaking
|
|
<acronym title="Uniform Resource Identifier">URI</acronym>s, or to fix them.</p>
|
|
|
|
<h3 class="gl">Guideline 1: <a name="gl1" id="gl1">Choose
|
|
<acronym title="Uniform Resource Identifier">URI</acronym>s wisely</a> </h3>
|
|
|
|
<p>This section summarizes, paraphrases and extends the section called
|
|
"So what should I do? Designing <acronym title="Uniform Resource Identifier">URI</acronym>s" in
|
|
<a href="http://www.w3.org/Provider/Style/URI.html">Cool <acronym>URI</acronym>s don't change</a>
|
|
[<a href="#ref-COOLURIs">COOLURIs</a>].
|
|
</p>
|
|
|
|
<ul>
|
|
<li>Do not put too much meaning in a <acronym title="Uniform Resource Identifier">URI</acronym>.
|
|
As Berners-Lee writes,
|
|
<q cite="http://www.w3.org/Provider/Style/URI.html">Designing mostly
|
|
means leaving information out</q>.
|
|
If you put too much meaning, too much semantics in your
|
|
<acronym title="Uniform Resource Identifier">URI</acronym>,
|
|
chances are your resource will evolve outside of the semantic frame,
|
|
resulting in an unnecessary division of the resource or change of
|
|
<acronym title="Uniform Resource Identifier">URI</acronym>.</li>
|
|
<li>Use simple <acronym title="Uniform Resource Identifier">URI</acronym>s,
|
|
easy to type, write down, spell, or at least easy to cut and paste.
|
|
They are likely to be easy to remember if you follow this rule.
|
|
</li>
|
|
</ul>
|
|
|
|
|
|
<div class="checkpoint" id="cp1.1" title="Use short URIs">
|
|
<div class="cp-head">
|
|
<span class="cp-number">1.1: </span>
|
|
<span class="cp-title">Short <acronym title="Uniform Resource Identifier">URI</acronym>s </span>
|
|
<span class="cp-target"><a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a></span>
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
</div>
|
|
<ol>
|
|
<li class="cp-prov"><p><span class="cp-title">Use short URIs as much as possible</span>
|
|
<span class="cp-target"><a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a></span>
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
</p>
|
|
<p class="cp">In order to make <acronym title="Uniform Resource Identifier">URI</acronym>s
|
|
easy to type, write down, spell, or remember, they should be short enough.</p>
|
|
|
|
<p class="cp">This checkpoint is not easy to quantify.
|
|
However, we can take into account the fact that e-mail will be used to send
|
|
<acronym title="Uniform Resource Identifier">URI</acronym>s, and e-mail clients
|
|
(sender or receiver) are supposed to wrap at 70-80 characters:
|
|
even though they are not supposed to wrap long
|
|
<acronym title="Uniform Resource Identifier">URI</acronym>s, some do.
|
|
As a result 80 characters is a reasonable total length for
|
|
<acronym title="Uniform Resource Identifier">URI</acronym>s
|
|
(including <acronym title="Uniform Resource Identifier">URI</acronym> scheme,
|
|
e.g. "http://", and host name).</p>
|
|
|
|
<p class="cp">Please note, however, that this length limit is by no means a technical limitation,
|
|
but rather, a practical goal to pursue.</p>
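<p class="cp">A content manager could check this goal automatically. The following
Python sketch is only an illustration: the function name <code>is_short_enough</code>
and the constant <code>RECOMMENDED_MAX_LENGTH</code> are assumptions of this example,
and the 80-character figure is the practical target discussed above, not a protocol limit.</p>

```python
RECOMMENDED_MAX_LENGTH = 80  # practical goal from this checkpoint, not a technical limit


def is_short_enough(uri: str, limit: int = RECOMMENDED_MAX_LENGTH) -> bool:
    """Return True when the whole URI, scheme and host name included, fits the limit."""
    return limit >= len(uri)


print(is_short_enough("http://www.example.com/products/toothbrush"))
```

<p class="cp">Such a check fits naturally into a publishing workflow, for instance
as a warning raised when a new <acronym title="Uniform Resource Identifier">URI</acronym>
is created.</p>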
|
|
</li>
|
|
</ol>
|
|
</div>
|
|
|
|
<div class="checkpoint" id="cp1.2">
|
|
<div class="cp-head">
|
|
<span class="cp-number">1.2: </span>
|
|
<span class="cp-title"><acronym title="Uniform Resource Identifier">URI</acronym> case policy</span>
|
|
<span class="cp-target"><a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a></span>
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
</div>
|
|
<ol>
|
|
<li class="cp-prov"><p><span class="cp-title">Choose a case policy </span>
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
</p>
|
|
|
|
<p class="cp">URIs are partly case sensitive, which means that, for example,
|
|
<code>http://www.example.com/foo</code> and
|
|
<code>http://www.example.com/FOO</code> are different
|
|
<acronym title="Uniform Resource Identifier">URI</acronym>s and may refer to different resources.</p>
|
|
|
|
<p class="cp">Again, in order for the <acronym title="Uniform Resource Identifier">URI</acronym>s
|
|
to be easy to spell and remember, their case should not only be sensible (see following provisions
|
|
of this checkpoint) but also consistent.
|
|
It is thus recommended to choose a case policy, and enforce its use.
|
|
</p>
|
|
</li>
|
|
<li class="cp-prov"> <span class="cp-title"> Avoid <acronym title="Uniform Resource Identifier">URI</acronym>s
|
|
in Mixed case</span>
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
<p class="cp">A case policy should be chosen,
|
|
and enforced. All policies are, however, not equally preferable.
|
|
Mixed-case <acronym title="Uniform Resource Identifier">URI</acronym>s
|
|
should be avoided.</p>
|
|
|
|
<p class="cp, example">
|
|
<span class="example-bad">Example of a URI following a mixed-case policy</span>:<br />
|
|
<code> http://example.com/QAfOo/baRRoX</code></p>
|
|
|
|
</li>
|
|
<li class="cp-prov"><p><span class="cp-title">As a case policy choose either "all lowercase" or
|
|
"first letter uppercase".</span>
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
</p>
|
|
|
|
<p class="cp">
|
|
We suggest that either "all lower-case" or "first-letter uppercase" policy be chosen.
|
|
Among these two, "all lower-case" may be preferred for its simplicity.</p>
|
|
|
|
<p class="cp, example"><span class="example-good">Example, "all lower-case"</span>:<br />
|
|
<code>http://www.example.com/foo/bar-bar</code></p>
|
|
<p class="cp, example"><span class="example-good">Example, "first-letter uppercase"</span>:<br />
|
|
<code>http://www.example.com/Foo/Bar-bar</code></p>
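<p class="cp">To enforce a case policy, one could classify each
<acronym title="Uniform Resource Identifier">URI</acronym> before publishing it.
The Python sketch below is one possible approach, not a reference implementation:
the function name <code>case_policy</code> and its return values are assumptions of
this example. It looks only at the path, and it reads "first-letter uppercase" as
"no uppercase letter after the first character of a path segment".</p>

```python
from urllib.parse import urlsplit


def case_policy(uri: str) -> str:
    """Classify the path of a URI as 'lower', 'first-upper' or 'mixed'."""
    path = urlsplit(uri).path
    segments = [s for s in path.split("/") if s]
    if all(s == s.lower() for s in segments):
        return "lower"
    # Assumed reading of "first-letter uppercase": only the first
    # character of each path segment may be an uppercase letter.
    if all(s[1:] == s[1:].lower() for s in segments):
        return "first-upper"
    return "mixed"


for uri in (
    "http://www.example.com/foo/bar-bar",
    "http://www.example.com/Foo/Bar-bar",
    "http://example.com/QAfOo/baRRoX",
):
    print(uri, case_policy(uri))
```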
|
|
</li>
|
|
</ol>
|
|
|
|
</div>
|
|
|
|
|
|
|
|
<h3 class="gl">Guideline 2:
|
|
<a name="gl2" id="gl2">Allow <acronym title="Uniform Resource Identifier">URI</acronym>
|
|
management</a>
|
|
</h3>
|
|
|
|
<p>As we said in the beginning of this chapter,
|
|
a <acronym title="Uniform Resource Identifier">URI</acronym> is not a filename, and you do not need
|
|
to tie your <acronym title="Uniform Resource Identifier">URI</acronym> structure to the file system
|
|
on the Web server. However, chances are the resources served by a Web server will be available on a
|
|
specific file system, and thus there should be flexible ways to map one onto the other.</p>
|
|
|
|
<div class="checkpoint" id="cp2.1">
|
|
<div class="cp-head">
|
|
<span class="cp-number">2.1:</span>
|
|
<span class="cp-title"><acronym title="Uniform Resource Identifier">URI</acronym> mapping</span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
</div>
|
|
<ol>
|
|
<li class="cp-prov">
|
|
<p><span class="cp-title">Provide mechanisms for File System to
|
|
<acronym title="Uniform Resource Identifier">URI</acronym> mapping</span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
</p>
|
|
<p class="cp">Content managers should be able to re-organize
|
|
the file system without modifying the <acronym title="Uniform Resource Identifier">URI</acronym>
|
|
structure. Servers should therefore allow the content manager to map the
|
|
documents to <acronym title="Uniform Resource Identifier">URI</acronym>s.</p>
|
|
|
|
<div class="cp, example"><span class="example-good">Examples</span>:<br />
|
|
Here are a few technologies that may be used for this purpose:
|
|
<ul>
|
|
<li>Aliases</li>
|
|
<li>Symbolic links</li>
|
|
<li>Table or database of mappings</li>
|
|
<li>etc.</li>
|
|
</ul>
|
|
</div>
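<p class="cp">The "table of mappings" approach can be sketched as follows. This is only a minimal
illustration in Python, not a description of any particular server: the table contents, the file
paths and the <code>resolve</code> function are all hypothetical.</p>

```python
from typing import Optional

# Hypothetical mapping table: public URI path -> current location on disk.
# Reorganizing the file system only requires editing the right-hand side;
# the URI paths on the left-hand side stay untouched.
URI_TO_FILE = {
    "/foo/bar-bar": "/srv/www/archive/2002/bar.html",
    "/newsletter": "/srv/www/news/latest.html",
}


def resolve(uri_path: str) -> Optional[str]:
    """Return the file backing a URI path, or None if the path is unmapped."""
    return URI_TO_FILE.get(uri_path)
```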
|
|
</li>
|
|
</ol>
|
|
</div>
|
|
|
|
<div class="checkpoint" id="cp2.2">
|
|
<div class="cp-head">
|
|
<span class="cp-number">2.2: </span>
|
|
<span class="cp-title">Standard redirects</span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
</div>
|
|
<ol>
|
|
<li class="cp-prov">
|
|
<p><span class="cp-title">Allow the use of standard redirects</span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
</p>
|
|
<p class="cp">
|
|
Content managers should be able to easily configure the
server to use the various <acronym title="the Hypertext Transfer Protocol">HTTP</acronym>/1.1
redirection schemes (section 10.3 of the
<acronym title="the Hypertext Transfer Protocol">HTTP</acronym>/1.1
specification [<a href="#ref-RFC2616">RFC2616</a>]):
|
|
</p>
|
|
|
|
<ul>
|
|
<li><a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.3.2">
|
|
301 Moved Permanently</a> ([<a href="#ref-RFC2616">RFC2616</a>] section 10.3.2)</li>
|
|
<li><a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.3.3">
|
|
302 Found</a> (undefined redirect scheme, [<a href="#ref-RFC2616">RFC2616</a>] Section 10.3.3)</li>
|
|
<li><a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.3.8">
|
|
307 Temporary Redirect</a> ([<a href="#ref-RFC2616">RFC2616</a>] Section 10.3.8)</li>
|
|
</ul>
|
|
<p class="cp">The content manager should be allowed to use these,
either by modifying the server configuration directly
or through an indirect mechanism (a local configuration
override file, the creation of local "redirect" resources, etc.).
|
|
</p>
|
|
|
|
<p class="cp">Note that even though the current practice is to use the
|
|
<a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.3.3">302 Found</a>
|
|
status code for temporary redirects, it is best kept for "undefined" redirects, and the
|
|
<a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.3.8">307 Temporary
|
|
Redirect</a> status code should be preferred for this purpose.</p>
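<p class="cp">As a rough sketch of what such a configurable redirect mechanism might look like,
the Python fragment below keeps redirects in a table that a content manager could edit.
The table entries and the function names <code>redirect_for</code> and <code>status_line</code>
are assumptions made for this illustration.</p>

```python
from typing import Optional, Tuple

# Hypothetical redirect table: old path -> (status code, new URI).
# 301 marks a permanent move, 307 a temporary one.
REDIRECTS = {
    "/old-page": (301, "http://www.example.com/new-page"),
    "/today": (307, "http://www.example.com/2042/02/12"),
}

REASONS = {301: "Moved Permanently", 302: "Found", 307: "Temporary Redirect"}


def redirect_for(path: str) -> Optional[Tuple[int, str]]:
    """Return (status, Location value) for a redirected path, else None."""
    return REDIRECTS.get(path)


def status_line(status: int) -> str:
    """Build the HTTP/1.1 status line for one of the redirect codes above."""
    return "HTTP/1.1 %d %s" % (status, REASONS[status])
```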
|
|
</li>
|
|
<li class="cp-prov">
|
|
<p>
|
|
<span class="cp-title"> When you change <acronym title="Uniform Resource Identifier">URI</acronym>s,
|
|
use standard redirects</span>
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
</p>
|
|
<p class="cp">If, for any reason, a content manager changes the
<acronym title="Uniform Resource Identifier">URI</acronym> that references a given resource,
standard redirects, as defined above, should be used to avoid link rot.</p>
|
|
|
|
<p class="cp">
|
|
Usually, the <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.3.2">
|
|
HTTP 301 Moved Permanently</a> status code
|
|
([<a href="#ref-RFC2616">RFC2616</a>], section 10.3.2) will be used for this purpose.
|
|
</p>
|
|
</li>
|
|
</ol>
|
|
</div>
|
|
|
|
<h3 class="gl">Guideline 3:
|
|
<a name="gl3" id="gl3">
|
|
Use independent <acronym title="Uniform Resource Identifier">URI</acronym>s
|
|
</a>
|
|
</h3>
|
|
|
|
<p><acronym title="Uniform Resource Identifier">URI</acronym>s
|
|
should be both stable and independent. By independent we mean that
|
|
a <acronym title="Uniform Resource Identifier">URI</acronym>
|
|
should always reference the same resource, regardless of the context
|
|
(time, location, user, user-agent, etc.).</p>
|
|
|
|
|
|
<div class="checkpoint" id="cp3.1">
|
|
<div class="cp-head">
|
|
<span class="cp-number">3.1:</span>
|
|
<span class="cp-title">Technology-independent
|
|
<acronym title="Uniform Resource Identifier">URI</acronym>s</span>
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
</div>
|
|
|
|
<ol>
|
|
<li class="cp-prov">
|
|
<p><span class="cp-title">Serve dynamic content with technology-independent
|
|
<acronym title="Uniform Resource Identifier">URI</acronym>s</span>
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
</p>
|
|
<p class="cp">A <acronym title="Uniform Resource Identifier">URI</acronym>
|
|
should not show the underlying technology (server-side content
|
|
generation engine, script written in such or such language) used to serve
|
|
the resource. </p>
|
|
|
|
<p class="cp">Using <acronym title="Uniform Resource Identifier">URI</acronym>s
|
|
showing the specific underlying technology means one is
|
|
dependent on the technology used, which means that the technology
|
|
cannot be changed without either breaking
|
|
<acronym title="Uniform Resource Identifier">URI</acronym>s
|
|
or going through the hassle of "fixing" them (see Checkpoint
|
|
<a href="#cp2.2">2.2: Standard redirects</a>).
|
|
</p>
|
|
|
|
<p class="cp">Using a scripting language to create dynamic content does
|
|
not mean your <acronym title="Uniform Resource Identifier">URI</acronym>
|
|
should end with the same extension as the script's filename.</p>
|
|
|
|
<p class="cp">Advertising one's development environment to the world also
has security implications. A site may already have been crawled and recorded as
running a specific architecture, making it a known target as soon as a security flaw is
discovered on that architecture. Obscurity is, of course, no
replacement for security, but a good design keeps threats away.
|
|
Read the Web Security FAQ [WSFAQ] for more on web server-side
|
|
security.</p>
|
|
|
|
<p class="cp">For these reasons, technology-specific extensions should
|
|
be hidden, using content-negotiation (see
|
|
<a href="#gl7">Guideline 7: Server-driven content
negotiation</a>), proxying or <acronym title="Uniform Resource Identifier">URI</acronym>
|
|
mapping technologies.</p>
|
|
</li>
|
|
<li class="cp-prov"><p> <span class="cp-title">Serve static content without file extension</span>
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
|
|
</p>
|
|
|
|
<p class="cp">The reason for serving static content without file extensions
is similar to the one stated above:
the content manager may, at some point, want to change the document format used
to serve a resource, while the resource itself remains "equivalent". Examples include
switching from one image file format to an equivalent one, or switching from
plain text to HTML.</p>
|
|
|
|
<p class="cp">File extensions should therefore be hidden for static content,
using content-negotiation (see <a href="#gl7">Guideline 7:
Server-driven content negotiation</a>), proxying or <acronym title="Uniform Resource Identifier">URI</acronym>
mapping technologies.</p>
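<p class="cp">One possible <acronym title="Uniform Resource Identifier">URI</acronym> mapping
of this kind is sketched below in Python. It is a simplified stand-in for real content
negotiation: the preference order and the <code>pick_file</code> function are hypothetical,
and a real server would also take the request's <code>Accept</code> header into account.</p>

```python
from typing import Iterable, Optional

# Hypothetical preference order; a site would choose its own.
PREFERRED_EXTENSIONS = (".html", ".xhtml", ".txt")


def pick_file(uri_path: str, existing_files: Iterable[str]) -> Optional[str]:
    """Map an extensionless URI path to the first matching file name.

    The URI never mentions the extension, so the format behind it can
    change without changing the URI.
    """
    available = set(existing_files)
    for ext in PREFERRED_EXTENSIONS:
        candidate = uri_path.lstrip("/") + ext
        if candidate in available:
            return candidate
    return None
```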
|
|
|
|
</li></ol>
|
|
</div>
|
|
|
|
|
|
<div class="checkpoint" id="cp3.2">
|
|
<div class="cp-head">
|
|
<span class="cp-number">3.2:</span>
|
|
<span class="cp-title">Identification and Session mechanisms</span>
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
</div>
|
|
|
|
<p class="cp"><acronym title="the Hypertext Transfer Protocol">HTTP</acronym>/1.1
|
|
provides a number of mechanisms for identification, authentication
|
|
and session management. Using these mechanisms instead of user-based or
|
|
session-based <acronym title="Uniform Resource Identifier">URI</acronym>s
|
|
guarantees that the <acronym title="Uniform Resource Identifier">URI</acronym>s
|
|
used to serve resources are <strong>truly</strong> universal
|
|
(allowing, for example, people to share, send, or copy them).</p>
|
|
|
|
<ol>
|
|
<li class="cp-prov">
|
|
<p><span class="cp-title">Use standard identification instead of per-user
|
|
<acronym title="Uniform Resource Identifier">URI</acronym>s</span>
|
|
<span class="cp-target"><a href="#target-expl-ss"><abbr title="Server-side Engines">
|
|
SS</abbr></a></span>
|
|
<span class="cp-target"><a href="#target-expl-cm"><abbr title="Content Manager">
|
|
CM</abbr></a></span>
|
|
</p>
|
|
|
|
|
|
<p class="cp">For the reasons stated above, standard identification mechanisms should
|
|
be preferred over user-dependent <acronym title="Uniform Resource Identifier">
|
|
URI</acronym>s. </p>
|
|
|
|
<p class="cp">Standard identification mechanisms for the World Wide Web are described in
|
|
<a href="http://www.ietf.org/rfc/rfc2617.txt">RFC 2617:
|
|
"HTTP Authentication: Basic and Digest Access Authentication"</a>
|
|
[<a href="#ref-RFC2617">RFC2617</a>]. </p>
|
|
|
|
</li>
|
|
<li class="cp-prov">
|
|
<p><span class="cp-title">Use standard session mechanisms instead of session-based
|
|
<acronym title="Uniform Resource Identifier">
|
|
URI</acronym>s.</span>
|
|
<span class="cp-target"><a href="#target-expl-ss"><abbr title="Server-side Engines">
|
|
SS</abbr></a></span>
|
|
<span class="cp-target"><a href="#target-expl-cm"><abbr title="Content Manager">
|
|
CM</abbr></a></span>
|
|
</p>
|
|
|
|
<p class="cp">For the reasons stated above, standard session mechanisms should
|
|
be preferred over session-dependent <acronym title="Uniform Resource Identifier">
|
|
URI</acronym>s. </p>
|
|
|
|
<p class="cp">The latter may only be used in very specific cases, when standard
|
|
mechanisms do not provide the desired features.
|
|
</p>
|
|
|
|
<p class="cp, example">
|
|
<span class="example-good">Example of an acceptable practice</span>:<br />
|
|
A <acronym title="Uniform Resource Identifier">URI</acronym>
|
|
may have some modifiers, like "<code>?</code>" used to pass arguments to CGI scripts,
or "<code>;</code>" to pass other kinds of arguments or context information.
|
|
Used for information tracking, this is a proper use of session information
|
|
in <acronym title="Uniform Resource Identifier">URI</acronym>s.
|
|
</p>
|
|
|
|
<p class="cp, example">
|
|
<span class="example-bad">Example of a bad practice</span>:<br />
|
|
Bob tries to visit <code>http://www.example.com/resource</code>,
|
|
but since it's a rainy Monday morning, he gets redirected to
|
|
<code>http://www.example.com/rainymondaymorning/resource</code>.
|
|
The day after, when Bob tries to access the resource he had bookmarked earlier,
|
|
the server answers that Bob has made a bad request, and serves
|
|
<code>http://www.example.com/error/thisisnotmondayanymore</code>. Had the server served
|
|
back <code>http://www.example.com/resource</code> because the Monday session had expired,
|
|
it would have been, if not acceptable, at least harmless.
|
|
</p>
|
|
<p class="cp">Standard session mechanisms include
|
|
<a href="http://www.ietf.org/rfc/rfc2109.txt">RFC 2109:
|
|
"HTTP State Management Mechanism"</a> [<a href="#ref-RFC2109">RFC2109</a>],
|
|
also known as "cookies".</p>
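<p class="cp">The following Python sketch shows how a session identifier might travel in a cookie
rather than in the <acronym title="Uniform Resource Identifier">URI</acronym>. It relies on the
standard library's <code>http.cookies</code> module; the cookie name <code>session</code> and both
function names are assumptions made for this example.</p>

```python
from http.cookies import SimpleCookie


def session_cookie_header(session_id: str) -> str:
    """Build a Set-Cookie header value carrying a session identifier."""
    cookie = SimpleCookie()
    cookie["session"] = session_id  # "session" is a hypothetical cookie name
    cookie["session"]["path"] = "/"
    return cookie["session"].OutputString()


def session_from_request(cookie_header: str) -> str:
    """Read the session identifier back from a Cookie request header."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    return cookie["session"].value
```

<p class="cp">The resource keeps a single <acronym title="Uniform Resource Identifier">URI</acronym>
for every visitor, and the session state is carried by the header instead.</p>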
|
|
</li>
|
|
</ol>
|
|
|
|
</div>
|
|
|
|
|
|
<h3 class="gl">Guideline 4:
|
|
<a name="gl4" id="gl4">"Cool URIs don't change", but cool content does</a>
|
|
</h3>
|
|
|
|
<p>One misconception about
|
|
<q cite="http://www.w3.org/Provider/Style/URI.html">Cool
|
|
URIs don't change</q> is that it advocates "frozen" documents,
|
|
whose content cannot change because that would "break things".</p>
|
|
|
|
<p>This, again, comes from a misunderstanding of the concept of
|
|
<acronym title="Uniform Resource Identifier">URI</acronym>s.
|
|
If we come back to our warehouse metaphor, used in the beginning of
|
|
this document, things get clearer: we know that the
|
|
<acronym title="Uniform Resource Identifier">URI</acronym> is a fixed
|
|
reference to a resource (a "toothbrush" in our example), and we know that
|
|
the reference should not change. This does not mean, however, that the resource
itself should not change. On the contrary, the World Wide Web has been
|
|
designed with evolution in mind, and if the resource is modified over time,
|
|
this has nothing to do with the fact that
|
|
<q cite="http://www.w3.org/Provider/Style/URI.html">Cool
|
|
<acronym title="Uniform Resource Identifier">URI</acronym>s don't change</q>.
|
|
</p>
|
|
|
|
<div class="checkpoint" id="cp4.1">
|
|
<div class="cp-head">
|
|
<span class="cp-number">4.1:</span>
|
|
<span class="cp-title">Standard redirects for changing content</span>
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
</div>
|
|
|
|
<ol>
|
|
<li class="cp-prov">
|
|
<p><span class="cp-title">Use standard redirects for changing content</span>
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
</p>
|
|
|
|
<p class="cp">A good example of what is meant here by "changing/moving content"
|
|
would be a daily article on a Web site. People want to be able to
|
|
reference either the "latest daily article", or a specific article.</p>
|
|
|
|
<p class="cp">This is made possible and smooth with the use of two different
|
|
<acronym title="Uniform Resource Identifier">URI</acronym>s (or,
|
|
to be precise, one <acronym title="Uniform Resource Identifier">URI</acronym>
|
|
referencing the "latest" issue, and one URI per article),
|
|
as explained in the following example.</p>
|
|
|
|
<p class="cp, example">Let us consider an imaginary newsletter, issued every day. The
|
|
(latest issue of the) newsletter is available at
|
|
<code>http://www.example.org/newsletter</code> and this is the
|
|
<acronym title="Uniform Resource Identifier">URI</acronym> people use to access
|
|
the newsletter every day.</p>
|
|
|
|
<p class="cp, example">The content manager wants every newsletter, and not only the latest
issue, to be available on the server, so every issue is archived and
made accessible on the Web site at a dated
<acronym title="Uniform Resource Identifier">URI</acronym>, e.g.
<code>http://www.example.org/2042/02/12-newsletter</code> for the Feb. 12, 2042 issue.</p>
|
|
|
|
<p class="cp, example">Using a <a href="#cp2.2">standard redirect</a>
(<a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.3.3">HTTP 302 Found</a> or, even better,
<a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.3.8">HTTP 307 Temporary Redirect</a>; see [<a href="#ref-RFC2616">RFC2616</a>] sections 10.3.3 and 10.3.8),
the content manager, when publishing the Feb. 12, 2042 issue, redirects
<code>http://www.example.org/newsletter</code> to the dated
<code>http://www.example.org/2042/02/12-newsletter</code>.</p>
|
|
|
|
<p class="cp, example">Readers are, therefore, able to refer to (and access) "the newsletter"
|
|
for the latest issue, or to any specific issue.</p>
|
|
|
|
|
|
<p class="cp">If the server properly sends the <code>Content-Location:</code>
|
|
<acronym title="Hypertext Transfer Protocol">HTTP</acronym>/1.1 Header,
|
|
there is an alternate technique, described in <a href="#cp5.2">
|
|
Checkpoint 5.2: <code>Content-Location</code></a>.
|
|
</p>
|
|
</li>
|
|
</ol>
|
|
</div>
|
|
|
|
<div class="checkpoint" id="cp4.2">
|
|
<div class="cp-head">
|
|
<span class="cp-number">4.2:</span>
|
|
<span class="cp-title"><acronym title="the Hypertext Transfer Protocol">HTTP</acronym> <code>410 Gone</code></span>
|
|
<span class="cp-target"><a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
</div>
|
|
|
|
<ol>
|
|
<li class="cp-prov">
|
|
<p><span class="cp-title">When removing a resource, use <code>410 Gone</code></span>
|
|
<span class="cp-target"><a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
</p>
|
|
|
|
<p class="cp">
|
|
Most of guidelines 1 to 3 aim at avoiding "link rot": documents that
have been moved or removed, so that agents trying to access a resource once
referred to by a <acronym title="Uniform Resource Identifier">URI</acronym>
receive a <code>404 Not Found</code> status code.</p>
|
|
|
|
<p class="cp">This does not mean that the Web does not allow documents
to be removed or deprecated. Rather than simply deleting resources, content managers
should, when possible, follow the standard procedure,
which is to use the <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.11">
<code>410 Gone</code> status code</a>
([<a href="#ref-RFC2616">RFC2616</a>] section 10.4.11).</p>
|
|
|
|
<p>Whereas the <code>404 Not Found</code> status code only means that the server is
unable to find the resource, the <code>410 Gone</code> status code means that the
resource is intentionally unavailable. <code>410 Gone</code> should therefore be preferred
for removed resources, for the sake of both semantics and caching (a
<code>410 Gone</code> response is cacheable unless indicated otherwise).</p>
|
|
</li>
|
|
|
|
<li class="cp-prov">
|
|
<p><span class="cp-title">Allow the content-manager to use <code>410 Gone</code>
|
|
for removed resources</span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
</p>
|
|
|
|
<p class="cp">
|
|
Content managers should be allowed to use the
<a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.11">
<code>410 Gone</code> status code</a>
([<a href="#ref-RFC2616">RFC2616</a>] section 10.4.11) to remove or deprecate
resources on a server. There should be an easy way to specify that a resource,
or an area, has been removed, using the <code>410 Gone</code> status code.</p>
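<p class="cp">One simple way to offer this is a list of removed resources and areas, as in the
Python sketch below. The sets, the path prefixes and the <code>status_for</code> function are
hypothetical; a real server would read such lists from its own configuration.</p>

```python
# Hypothetical configuration, editable by the content manager.
EXISTING = {"/foo/bar-bar"}
REMOVED_PATHS = {"/old-campaign"}
REMOVED_AREAS = ("/2001/beta/",)  # path prefixes covering whole areas


def status_for(path: str) -> int:
    """Return 200 for live resources, 410 for removed ones, 404 otherwise."""
    if path in EXISTING:
        return 200
    if path in REMOVED_PATHS or path.startswith(REMOVED_AREAS):
        return 410  # Gone: intentionally unavailable
    return 404  # Not Found: the server cannot find the resource
```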
|
|
|
|
</li>
|
|
</ol>
|
|
</div>
|
|
|
|
|
|
<h3 class="gl">Guideline 5:
|
|
<a name="gl5" id="gl5">Provide indexing agents with useful information</a>
|
|
</h3>
|
|
|
|
<p>This section deals with providing meaningful and clear information
|
|
to indexing and crawling user-agents (also often referred to as "robots",
|
|
"spiders", "crawlers"). It has a strong influence on the traffic for
|
|
a Web site (both the traffic created by the indexing agents, and the traffic
|
|
attracted by search results) and should be a primary concern for content
|
|
managers.</p>
|
|
|
|
<p>Discussing the use of metadata and the proper structuring of HTML documents
to help indexing agents in their task is out of scope for this document;
we will instead focus on the inner mechanics of indexing.
|
|
Readers interested in metadata may find interesting bits in these two related
|
|
guidelines: <a href="#gl8">Guideline 8: Provide useful metadata in addition
|
|
to content negotiation</a> and <a href="#gl12">Guideline 12:
|
|
Enrich and enhance</a>.</p>
|
|
|
|
<div class="checkpoint" id="cp5.1">
|
|
<div class="cp-head">
|
|
<span class="cp-number">5.1:</span>
|
|
<span class="cp-title">Indexing policy</span>
|
|
<span class="cp-target"> <a href="#target-expl-cm">
|
|
<abbr title="Content Manager">CM</abbr></a> </span>
|
|
</div>
|
|
<ol>
|
|
<li class="cp-prov"><p><span class="cp-title">Define site-wide indexing policy</span>
|
|
<span class="cp-target"> <a href="#target-expl-cm">
|
|
<abbr title="Content Manager">CM</abbr></a> </span>
|
|
</p>
|
|
|
|
<p class="cp">A site-wide policy specifies what the default behavior of
|
|
indexing or crawling agents should be, and can be refined on a per-document
|
|
basis through local indexing directives (see below for details).</p>
|
|
|
|
<p class="cp">Content managers should define such a policy for their site.
|
|
The most common way of informing indexing agents of this policy is the
|
|
<a href="http://www.robotstxt.org/wc/exclusion.html#robotstxt">Robots
|
|
Exclusion Protocol</a> [<a href="#ref-ROBOTSPROTO">ROBOTSPROTO</a>],
|
|
but one could use other technologies, such as a metadata database
|
|
giving indexing directives on a document basis.</p>
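<p class="cp">To illustrate how an indexing agent might interpret such a policy, the Python sketch
below parses a small <code>robots.txt</code> with the standard library's
<code>urllib.robotparser</code> module. The policy itself, the <code>/drafts/</code> area and the
agent name <code>ExampleBot</code> are assumptions made for this example.</p>

```python
from urllib.robotparser import RobotFileParser

# Hypothetical site-wide policy: keep all robots out of /drafts/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /drafts/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_index(url: str, agent: str = "ExampleBot") -> bool:
    """Tell whether the policy above lets the given agent fetch the URL."""
    return parser.can_fetch(agent, url)
```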
|
|
</li>
|
|
<li class="cp-prov">
|
|
<p> <span class="cp-title">Define local indexing policy</span>
|
|
<span class="cp-target"> <a href="#target-expl-cm">
|
|
<abbr title="Content Manager">CM</abbr></a> </span>
|
|
</p>
|
|
|
|
<p class="cp">
|
|
The site-wide indexing policy may be completed by a local (per document)
|
|
indexing policy, marked up at the document level.</p>
|
|
|
|
<p class="cp example">For example, HTML [<a href="#ref-HTML401">HTML 4.01</a>]
|
|
defines a specific
|
|
<a href="http://www.w3.org/TR/1999/REC-html401-19991224/struct/global.html#edef-META">META</a>
|
|
element <a href="http://www.w3.org/TR/1999/REC-html401-19991224/appendix/notes.html#h-B.4.1">for
|
|
this purpose</a> ([<a href="#ref-HTML401">HTML 4.01</a>] Section B.4.1).</p>
|
|
</li>
|
|
</ol>
|
|
</div>
|
|
|
|
<div class="checkpoint" id="cp5.2">
|
|
<div class="cp-head">
|
|
<span class="cp-number">5.2:</span>
|
|
<span class="cp-title"> <code>Content-Location</code> </span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-cm">
|
|
<abbr title="Content Manager">CM</abbr></a> </span>
|
|
</div>
|
|
<ol>
|
|
<li class="cp-prov">
|
|
<p><span class="cp-title">Send valid <code>Content-Location:</code> </span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
</p>
|
|
<p class="cp">The <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.14">
<code>Content-Location:</code> <acronym title="the Hypertext Transfer Protocol">HTTP</acronym> header</a> ([<a href="#ref-RFC2616">RFC2616</a>] section 14.14) is crucial
|
|
for indexing agents as well as user agents, as it gives agents information
|
|
about the actual (current) location of the resource currently served (as opposed
|
|
to the generic location used to access the resource).
|
|
</p>
|
|
|
|
<p class="cp"><code>Content-Location:</code> should not be mistaken for a redirection. While agents
|
|
and caches may assume that a redirected <acronym title="Uniform Resource Identifier">URI</acronym>
|
|
may be used for later requests, they should not assume that a
|
|
<acronym title="Uniform Resource Identifier">URI</acronym> specified by the
|
|
<code>Content-Location:</code> header
|
|
may be used for later requests, if it differs from the requested
|
|
<acronym title="Uniform Resource Identifier">URI</acronym>.
|
|
However, agents may request a <acronym title="Uniform Resource Identifier">URI</acronym>
|
|
once specified as <code>Content-Location:</code> if they specifically intend to request
|
|
this instance of the resource. </p>
|
|
</li>
|
|
|
|
<li class="cp-prov">
|
|
<p>
|
|
<span class="cp-title">Use <code>Content-Location:</code> for changing content</span>
|
|
<span class="cp-target"> <a href="#target-expl-cm">
|
|
<abbr title="Content Manager">CM</abbr></a> </span>
|
|
</p>
|
|
<p class="cp">As seen previously, the
|
|
<a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.14">
|
|
<code>Content-Location:</code> <acronym title="the Hypertext Transfer Protocol">
|
|
HTTP</acronym> header</a> ([<a href="#ref-RFC2616">RFC2616</a>] section 14.14)
|
|
is used to inform user-agents of the actual (current) location of the requested
|
|
resource. This can be used as an alternative to the temporary redirect scheme
|
|
as explained in <a href="#cp4.1">Checkpoint 4.1: Standard redirects for
|
|
changing content</a>.</p>
|
|
|
|
<p class="cp, example"><span class="example-good">Example of a good practice</span>:<br />
|
|
You may remember the example used in <a href="#cp4.1">Checkpoint 4.1:
|
|
Standard redirects for changing content</a>, where the content manager uses
|
|
standard redirect techniques to serve a newsletter with both a "latest" and a "dated"
|
|
<acronym title="Uniform Resource Identifier">URI</acronym>.</p>
|
|
|
|
<p class="cp, example">
|
|
One could achieve a similar result by using the
<code>Content-Location:</code> <acronym title="the Hypertext Transfer Protocol">HTTP</acronym>
header: serving <code>http://www.example.org/newsletter</code> (the "latest"
|
|
<acronym title="Uniform Resource Identifier">URI</acronym>) with a
|
|
<code>Content-Location:</code> of <code>http://www.example.org/2042/02/12-newsletter</code>
|
|
(the "dated" <acronym title="Uniform Resource Identifier">URI</acronym>).</p>
|
|
|
|
<p class="cp, example">
|
|
User-agents, as explained in <a href="http://www.w3.org/TR/cuap">
|
|
Common User Agent Problems</a> [<a href="#ref-CUAP">CUAP</a>],
|
|
may then bookmark the "latest news" <acronym title="Uniform Resource Identifier">
|
|
URI</acronym>, or the <acronym title="Uniform Resource Identifier">URI</acronym>
|
|
of the actual dated content, and may later request the "dated"
|
|
<acronym title="Uniform Resource Identifier">URI</acronym>.
|
|
</p>
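<p class="cp">The newsletter example could be sketched in Python as follows. The
<acronym title="Uniform Resource Identifier">URI</acronym> pattern is the one used in the example
above; the function names and the <code>text/html</code> content type are assumptions.</p>

```python
import datetime


def dated_uri(day: datetime.date) -> str:
    """Build the "dated" URI of an issue, following the pattern used above."""
    return "http://www.example.org/%04d/%02d/%02d-newsletter" % (
        day.year, day.month, day.day)


def headers_for_latest(day: datetime.date) -> dict:
    """Headers sent with a 200 response for the generic /newsletter URI.

    No redirect takes place: the body of the latest issue is served
    directly, and Content-Location names its dated URI.
    """
    return {
        "Content-Type": "text/html",
        "Content-Location": dated_uri(day),
    }
```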
|
|
</li>
|
|
<li class="cp-prov">
|
|
<p>
|
|
<span class="cp-title">Allow the content-manager to set the <code>Content-Location:</code> header</span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
</p>
|
|
<p class="cp">See above for the rationale. The content manager should be allowed to set
|
|
the <code>Content-Location:</code> header served for a specific resource at a given time.
|
|
</p>
|
|
</li>
|
|
</ol>
|
|
</div>
|
|
|
|
|
|
<div class="checkpoint" id="cp5.3">
|
|
<div class="cp-head">
|
|
<span class="cp-number">5.3:</span>
|
|
<span class="cp-title"><code>Content-MD5</code>
|
|
</span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
</div>
|
|
|
|
<ol>
|
|
<li class="cp-prov"><p>
|
|
<span class="cp-title">Send <code>Content-MD5</code> for integrity check</span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
</p>
|
|
|
|
<p class="cp">
|
|
The <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.15">
|
|
<code>Content-MD5</code> <acronym title="the Hypertext Transfer Protocol">HTTP</acronym>
|
|
header</a> ([<a href="#ref-RFC2616">RFC2616</a>] section 14.15) is used to verify the integrity
|
|
of the transported entity, and may help cache or indexing engines. Even though
|
|
<acronym title="the Hypertext Transfer Protocol">HTTP</acronym> does not make
|
|
it mandatory, it is recommended that servers (or content-generation engines) compute
|
|
and send it.</p>
|
|
|
|
<p class="cp">
|
|
<code>Content-MD5</code> should not be confused with <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.19"><code>ETag</code></a> ([<a href="#ref-RFC2616">RFC2616</a>] section 14.19). The former
is a checksum of the resource served, whereas the latter is a "serial number"
identifying a specific instance of a resource. However, since the MD5 sum of the content
is, in practice, unique to that content, it may be used as the <code>ETag</code> (although computing it may be too
resource-consuming for servers that do not cache the metadata). It is, nevertheless,
better to send both headers.</p>
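<p class="cp">Both values can be computed from the entity body, as in the Python sketch below.
The <code>Content-MD5</code> value is the base64 encoding of the 128-bit MD5 digest; reusing the
hexadecimal digest as a quoted <code>ETag</code> is only one possible choice, and the function
names are hypothetical.</p>

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Content-MD5 value: base64 of the 128-bit MD5 digest of the body."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def etag_from_body(body: bytes) -> str:
    """One possible ETag: the hexadecimal MD5 digest wrapped in quotes."""
    return '"%s"' % hashlib.md5(body).hexdigest()
```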
|
|
|
|
</li>
|
|
</ol>
|
|
</div>
|
|
|
|
|
|
|
|
<h3 class="gl">Guideline 6:
|
|
<a name="gl6" id="gl6">Provide appropriate caching information</a>
|
|
</h3>
|
|
|
|
<p>This guideline relates to the
|
|
<a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html">Caching mechanisms</a>
|
|
defined by the <acronym title="Hypertext Transfer Protocol">HTTP</acronym>/1.1
|
|
specification ([<a href="#ref-RFC2616">RFC2616</a>] section 13).</p>
|
|
|
|
<p>We will try to point out facts often overlooked or misunderstood about
|
|
<acronym title="the Hypertext Transfer Protocol">HTTP</acronym> caching, and
give advice on how to serve easily cacheable content.</p>
|
|
|
|
|
|
<div class="checkpoint" id="cp6.1">
|
|
<div class="cp-head">
|
|
<span class="cp-number">6.1:</span>
|
|
<span class="cp-title">Cache-related
|
|
<acronym title="the Hypertext Transfer Protocol">HTTP</acronym> headers</span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
</div>
|
|
|
|
<ol>
|
|
<li class="cp-prov">
|
|
<p class="cp"><span class="cp-title">Send proper and accurate
|
|
<code>Date</code> header</span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
</p>
|
|
|
|
<p class="cp">
|
|
<acronym title="the Hypertext Transfer Protocol">HTTP</acronym>/1.1
|
|
servers MUST send a
|
|
<a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.18">
|
|
<code>Date:</code> header</a> ([<a href="#ref-RFC2616">RFC2616</a>] section 14.18).
|
|
It is the basis of all caching mechanisms and must be sent both properly and accurately.</p>
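<p class="cp">A correctly formatted value can be produced with the Python standard library, as
sketched below. The <code>http_date</code> function name is hypothetical, and the sketch assumes
that the caller passes a timezone-aware <acronym title="Coordinated Universal Time">UTC</acronym>
datetime taken from a properly synchronized clock.</p>

```python
import datetime
from email.utils import format_datetime


def http_date(moment: datetime.datetime) -> str:
    """Format an aware UTC datetime as an HTTP-date ending in "GMT"."""
    return format_datetime(moment, usegmt=True)


def current_date_header() -> str:
    """Value for the Date header of a response generated right now."""
    return http_date(datetime.datetime.now(datetime.timezone.utc))
```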
|
|
</li>
|
|
<li class="cp-prov">
|
|
<p class="cp"><span class="cp-title">Send <code>Last-Modified</code> whenever possible</span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
</p>
|
|
<p class="cp">
|
|
<acronym title="the Hypertext Transfer Protocol">HTTP</acronym>/1.1
|
|
([<a href="#ref-RFC2616">RFC2616</a>]) states that
|
|
<q cite="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.29">servers
|
|
SHOULD send the <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.29">
|
|
<code>Last-Modified</code> header</a> ([<a href="#ref-RFC2616">RFC2616</a>] section 14.29)
|
|
whenever feasible</q>.
|
|
This header is very important because of its use as a cache validator:
|
|
<q cite="http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.3.1">a cache
|
|
entry is considered to be valid if the entity has not been modified since the
|
|
Last-Modified value</q>.
|
|
</p>
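<p class="cp">The validation rule quoted above can be sketched in Python as a comparison between
the <code>If-Modified-Since</code> value sent by a cache and the resource's modification time.
The <code>not_modified</code> function is hypothetical and assumes a timezone-aware modification
time; it ignores the other conditional headers a real server would also honor.</p>

```python
import datetime
from email.utils import parsedate_to_datetime


def not_modified(if_modified_since: str,
                 last_modified: datetime.datetime) -> bool:
    """True when a 304 Not Modified answer is appropriate."""
    since = parsedate_to_datetime(if_modified_since)
    # HTTP dates have one-second resolution, so drop microseconds first.
    return since >= last_modified.replace(microsecond=0)
```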
|
|
</li>
|
|
<li class="cp-prov">
|
|
<p class="cp"><span class="cp-title">Send <code>Cache-Control</code> directives</span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr>
|
|
</a></span>
|
|
</p>
|
|
|
|
<p class="cp"><a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9">
|
|
The <code>Cache-Control</code> header</a> ([<a href="#ref-RFC2616">RFC2616</a>] section 14.9)
|
|
defines the behavior of cache engines with regards to the resource sent.</p>
|
|
|
|
<p class="cp"><code>Cache-Control</code> should be preferred over
|
|
<a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.21">
|
|
<code>Expires:</code></a> ([<a href="#ref-RFC2616">RFC2616</a>] section 14.21)
|
|
because it is more expressive. Servers may send both, but note that a valid
<code>max-age</code> directive in <code>Cache-Control:</code> takes precedence
over <code>Expires:</code>.
|
|
</p>
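<p class="cp example">The precedence of <code>max-age</code> over <code>Expires:</code> can be
sketched as follows, from the point of view of a cache. This is a simplified illustration only:
the function name <code>freshness_lifetime</code> is hypothetical, header names are assumed to be
given with exactly this capitalization, and other directives (such as <code>no-cache</code> or
<code>s-maxage</code>) are deliberately ignored.</p>
<pre class="cp example">
```python
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def freshness_lifetime(headers: Mapping[str, str]) -> Optional[int]:
    """Return the freshness lifetime in seconds, or None if none is given."""
    # A valid max-age directive wins over Expires.
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        value = value.strip('"')
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    # Otherwise fall back to Expires minus Date.
    if "Expires" in headers and "Date" in headers:
        try:
            expires = parsedate_to_datetime(headers["Expires"])
            date = parsedate_to_datetime(headers["Date"])
            return max(0, int((expires - date).total_seconds()))
        except (TypeError, ValueError):
            return 0  # invalid dates are treated as already expired
    return None
```
</pre>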
|
|
</li>
|
|
</ol>
|
|
|
|
</div>
|
|
|
|
<div class="checkpoint" id="cp6.2">
|
|
<div class="cp-head">
|
|
<span class="cp-number">6.2</span>
|
|
<span class="cp-title">Cache policy</span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
</div>
|
|
<ol>
|
|
<li class="cp-prov">
|
|
<p class="cp"><span class="cp-title">Define a cache policy</span>
|
|
<span class="cp-target"><a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
</p>
|
|
|
|
<p class="cp">A cache / expiration policy is the rationale behind cache control
|
|
for every resource served by <acronym title="the Hypertext Transfer Protocol">HTTP</acronym>/1.1
|
|
servers.
|
|
Content managers should decide, globally and/or locally, what can or cannot be cached,
|
|
how long caches should keep the document before trying to get a new version, etc.
|
|
These decisions may be made depending on the
|
|
frequency at which the documents may be updated.</p>
|
|
</li>
|
|
<li class="cp-prov">
|
|
<p class="cp">
|
|
<span class="cp-title">Allow the Content Manager to set up cache control
|
|
according to a Cache Policy</span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
</p>
|
|
|
|
<p class="cp">The content manager should be able to set the <code>max-age</code> parameter
|
|
for any resource served according to a cache policy.</p>
|
|
</li>
|
|
</ol>
|
|
|
|
</div>
|
|
|
|
|
|
<div class="checkpoint" id="cp6.3">
|
|
<div class="cp-head">
|
|
<span class="cp-number">6.3:</span>
|
|
<span class="cp-title">Caching generated content</span>
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
</div>
|
|
|
|
<ol>
|
|
<li class="cp-prov">
|
|
<p><span class="cp-title">Provide actual caching information for content
|
|
generated dynamically</span>
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
</p>
|
|
|
|
<p class="cp">Most dynamic content generation systems act as if the documents
|
|
they generate and serve were "fresh" (<abbr>i.e.</abbr>, as if the resource had been
last modified at the moment it is served), whether or not the underlying information actually is. </p>
|
|
|
|
<p class="cp">This is a harmful lie for caching engines and
|
|
should be avoided.
|
|
</p>
|
|
|
|
<p class="cp">Regardless of the technology used, it should be possible to
|
|
provide age information by retrieving the actual information
|
|
from whatever source is used to generate the dynamic content:
|
|
file, database, etc.</p>
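<p class="cp example">As a minimal sketch of this idea, a content generation system could derive
<code>Last-Modified</code> from the newest of the sources it used, rather than from the time of the request.
The function name <code>last_modified_for</code> is hypothetical, and the sketch assumes the sources expose
POSIX timestamps (for instance file modification times or a "last updated" database column).</p>
<pre class="cp example">
```python
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Iterable


def last_modified_for(source_timestamps: Iterable[float]) -> str:
    """Return an HTTP-date for the most recently changed source."""
    newest = max(source_timestamps)
    # HTTP dates have one-second resolution, hence the truncation.
    moment = datetime.fromtimestamp(int(newest), tz=timezone.utc)
    return format_datetime(moment, usegmt=True)
```
</pre>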
|
|
</li>
|
|
</ol>
|
|
</div>
|
|
|
|
|
|
<div class="checkpoint" id="cp6.4">
|
|
<div class="cp-head">
|
|
<span class="cp-number">6.4:</span>
|
|
<span class="cp-title"><acronym title="the Hypertext Transfer Protocol">HTTP</acronym> HEAD
|
|
and <acronym title="the Hypertext Transfer Protocol">HTTP</acronym> GET</span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
</div>
|
|
|
|
<ol>
|
|
<li class="cp-prov">
|
|
<p> <span class="cp-title">Send the same answer to
|
|
<acronym title="the Hypertext Transfer Protocol">HTTP</acronym> HEAD and
|
|
<acronym title="the Hypertext Transfer Protocol">HTTP</acronym> GET requests</span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
</p>
|
|
|
|
<p class="cp">Servers MUST send back the same information
|
|
(<acronym title="the Hypertext Transfer Protocol">HTTP</acronym> headers)
|
|
when answering a GET and a HEAD request, as required by the
|
|
<a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html#sec9.4">
|
|
<acronym title="the Hypertext Transfer Protocol">HTTP</acronym> specification</a>
|
|
[<a href="#ref-RFC2616">RFC2616</a>] section 9.4.
|
|
This is critical for many mechanisms, including caching.</p>
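<p class="cp example">One way to guarantee this, sketched below, is to build the status and headers
in a single place and to drop only the body for HEAD. The names <code>DOCUMENTS</code>,
<code>build_response</code> and <code>handle</code> are hypothetical, and the in-memory document
store stands in for whatever the server really serves.</p>
<pre class="cp example">
```python
from typing import Dict, Tuple

# Hypothetical in-memory "site" used only for this illustration.
DOCUMENTS: Dict[str, bytes] = {"/index.html": b"Hello, world"}

Response = Tuple[int, Dict[str, str], bytes]


def build_response(path: str) -> Response:
    """Compute status, headers and body once, independently of the method."""
    body = DOCUMENTS.get(path)
    if body is None:
        return 404, {"Content-Length": "0"}, b""
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        "Content-Length": str(len(body)),
    }
    return 200, headers, body


def handle(method: str, path: str) -> Response:
    """Answer GET and HEAD identically, except that HEAD carries no body."""
    status, headers, body = build_response(path)
    if method == "HEAD":
        body = b""
    return status, headers, body
```
</pre>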
|
|
</li>
|
|
</ol>
|
|
</div>
|
|
|
|
<h2>2. <a name="content" id="content">Serving content appropriately</a></h2>
|
|
|
|
<h3 class="gl">Guideline 7:
|
|
<a name="gl7" id="gl7">Server-driven content negotiation</a>
|
|
</h3>
|
|
<p>This guideline deals with <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec12.html#sec12">
|
|
negotiation in
|
|
<acronym title="the Hypertext Transfer Protocol">HTTP</acronym>/1.1</a>
|
|
(as defined in <acronym title="the Hypertext Transfer Protocol">HTTP</acronym>/1.1
|
|
[<a href="#ref-RFC2616">RFC2616</a>] section 12).</p>
|
|
<p>Content negotiation stands for the server-driven negotiation based on
|
|
user agent capabilities and user preferences, including those specified in the
|
|
<a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.1">
|
|
Accept</a> ([<a href="#ref-RFC2616">RFC2616</a>] section 14.1),
|
|
<a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.2">
|
|
Accept-Charset</a> ([<a href="#ref-RFC2616">RFC2616</a>] section 14.2),
|
|
and <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.4">
|
|
Accept-Language</a> ([<a href="#ref-RFC2616">RFC2616</a>] section 14.4) headers,
|
|
and beyond.</p>
|
|
|
|
|
|
|
|
<div class="checkpoint" id="cp7.1">
|
|
<div class="cp-head">
|
|
<span class="cp-number">7.1:</span>
|
|
<span class="cp-title">Format negotiation</span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
</div>
|
|
|
|
<p class="cp">"Format negotiation" here stands for the server-driven negotiation between
|
|
equivalent instances of a resource in different "formats", either media-type (often
|
|
called "content-negotiation" erroneously) or character encoding.</p>
|
|
|
|
<ol>
|
|
|
|
<li class="cp-prov">
|
|
<p>
|
|
<span class="cp-title">Allow the content manager to use and configure content-type negotiation</span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
</p>
|
|
<p class="cp">Content-managers should be provided with an easy way to specify that
|
|
several documents are different instances of the same resource using various "equivalent"
|
|
media types.</p>
|
|
|
|
<p class="cp">The server should then apply server-driven negotiation algorithms to serve the most
|
|
appropriate variant based at least on the requested
|
|
<a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.1">Accept</a>
|
|
([<a href="#ref-RFC2616">RFC2616</a>] section 14.1) header.
|
|
</p>
|
|
</li>
|
|
|
|
<li class="cp-prov">
|
|
<p>
|
|
<span class="cp-title">Allow the content manager to use and configure
|
|
character encoding negotiation</span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
</p>
|
|
<p class="cp">Content-managers should be provided with an easy way to specify that
|
|
several documents are different instances of the same resource with different
|
|
character encodings.</p>
|
|
|
|
<p class="cp">The server should then apply server-driven negotiation algorithms to serve the most
|
|
appropriate variant based at least on the requested
|
|
<a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.2">
|
|
Accept-Charset</a> ([<a href="#ref-RFC2616">RFC2616</a>] section 14.2) header.</p>
|
|
|
|
|
|
</li>
|
|
|
|
<li class="cp-prov">
|
|
<p>
|
|
<span class="cp-title">During format negotiation, be cautious with agents accepting anything</span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
</p>
|
|
<p class="cp">
|
|
As explained, for example, in "<a href="http://www.w3.org/TR/cuap#protocols">Common user agent problems</a>" ([<a href="#ref-CUAP">CUAP</a>] section "protocols"), some agents are known to misbehave with regard to format negotiation, sending an HTTP header of <code>Accept: */*</code> (thus claiming to
support any and every content type, which they certainly do not).
|
|
</p>
|
|
|
|
<p class="cp">While servers are not required to cope with this problem in user agents, a wise
|
|
practice toward agents sending broken <code>Accept:</code> headers or not expressing
|
|
specific preference on the content type is to send them a version of the resource in a widely
|
|
supported document format.</p>
|
|
|
|
<p class="cp">This can be done at the server level using the quality factors used in the
|
|
<a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec12.html#sec12">negotiation process</a>
|
|
([<a href="#ref-RFC2616">RFC2616</a>] section 12).
|
|
</p>
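<p class="cp example">A minimal sketch of how server-side quality factors can settle the case of
<code>Accept: */*</code> is given below. It is an illustration under simplifying assumptions, not the
algorithm of any particular server: the function names are hypothetical, media type parameters other
than <code>q</code> are ignored, and the per-variant "source quality" values are assumed to be set by
the content manager.</p>
<pre class="cp example">
```python
from typing import Dict, List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header into (media range, q) pairs; q defaults to 1.0."""
    entries = []
    for item in header.split(","):
        parts = [part.strip() for part in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0
        for param in parts[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        entries.append((parts[0].lower(), q))
    return entries


def client_quality(media_type: str, accepted: List[Tuple[str, float]]) -> float:
    """Quality given by the client; the most specific matching range wins."""
    main_type = media_type.split("/")[0]
    for candidate in (media_type, main_type + "/*", "*/*"):
        for media_range, q in accepted:
            if media_range == candidate:
                return q
    return 0.0


def choose_variant(accept: Optional[str], variants: Dict[str, float]) -> Optional[str]:
    """Pick the variant maximising client quality times source quality."""
    accepted = parse_accept(accept or "*/*")  # no Accept header: anything goes
    best, best_score = None, 0.0
    for media_type, source_quality in variants.items():
        score = client_quality(media_type, accepted) * source_quality
        if score > best_score:
            best, best_score = media_type, score
    return best
```
</pre>
<p class="cp example">With source qualities such as <code>text/html</code> at 1.0 and
<code>application/pdf</code> at 0.5, an agent sending only <code>Accept: */*</code> receives the HTML variant,
while an agent that explicitly prefers PDF still receives PDF.</p>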
|
|
|
|
<p class="cp">See also the related guideline: <a href="#gl11">Guideline 11: Use flexible technology instead of client sniffing/blocking</a>.</p>
|
|
</li>
|
|
|
|
<li class="cp-prov">
|
|
<p>
|
|
<span class="cp-title">Allow the content manager to set the quality factors used during negotiation</span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
</p>
|
|
|
|
<p class="cp">
|
|
Content-managers should be provided with an easy way to specify which version
|
|
(either format or language) of the resource they would rather see served,
|
|
in case the headers sent by the agent do not leave one clear choice.
|
|
</p>
|
|
|
|
<p class="cp">See related checkpoint
|
|
9.1: <a href="#cp9.1">When negotiation fails</a>.</p>
|
|
</li>
|
|
|
|
</ol>
|
|
</div>
|
|
|
|
<div class="checkpoint" id="cp7.2">
|
|
<div class="cp-head">
|
|
<span class="cp-number">7.2:</span>
|
|
<span class="cp-title">Language negotiation</span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
</div>
|
|
|
|
<ol>
|
|
<li class="cp-prov">
|
|
<p>
|
|
<span class="cp-title">Allow the content manager to use and configure language negotiation</span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
</p>
|
|
<p class="cp">Content-managers should be provided with an easy way to specify that
|
|
several documents are different instances of the same resource translated into different
|
|
languages.</p>
|
|
|
|
<p class="cp">The server should then apply server-driven negotiation algorithms to serve the most
|
|
appropriate variant based at least on the requested
|
|
<a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.4">
|
|
Accept-Language</a> ([<a href="#ref-RFC2616">RFC2616</a>] section 14.4) header.
|
|
</p>
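<p class="cp example">The matching of <code>Accept-Language</code> ranges against the available
translations can be sketched as follows. In this minimal illustration, a language range matches a tag it
equals or prefixes (so <code>en</code> matches <code>en-gb</code>), and the longest matching range
determines the quality. The function names are hypothetical, and the list of available languages is
assumed to come from the content manager's configuration.</p>
<pre class="cp example">
```python
from typing import Iterable, List, Optional, Tuple


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Parse Accept-Language into (language range, q) pairs; q defaults to 1.0."""
    ranges = []
    for item in header.split(","):
        tag, _, params = item.strip().partition(";")
        tag = tag.strip().lower()
        if not tag:
            continue
        q = 1.0
        key, _, value = params.strip().partition("=")
        if key.strip().lower() == "q":
            try:
                q = float(value)
            except ValueError:
                q = 0.0
        ranges.append((tag, q))
    return ranges


def language_quality(language: str, ranges: List[Tuple[str, float]]) -> float:
    """Quality of the longest range that equals or prefixes the language tag."""
    language = language.lower()
    best_length, best_q = -1, 0.0
    for tag, q in ranges:
        if tag == "*":
            length = 0
        elif language == tag or language.startswith(tag + "-"):
            length = len(tag)
        else:
            continue
        if length > best_length:
            best_length, best_q = length, q
    return best_q


def choose_language(header: str, available: Iterable[str]) -> Optional[str]:
    """Return the best available language, or None when negotiation fails."""
    ranges = parse_accept_language(header)
    scored = [(language_quality(lang, ranges), lang) for lang in available]
    quality, language = max(scored, key=lambda pair: pair[0], default=(0.0, None))
    return language if quality > 0 else None
```
</pre>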
|
|
|
|
</li>
|
|
|
|
<li class="cp-prov">
|
|
<p>
|
|
<span class="cp-title">Allow the content manager to set the quality factors used during negotiation</span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
</p>
|
|
|
|
<p class="cp">
|
|
Content-managers should be provided with an easy way to specify which version
|
|
(either format or language) of the resource they would rather see served,
|
|
in case the headers sent by the agent do not leave one clear choice.
|
|
</p>
|
|
|
|
<p class="cp">See related checkpoint
|
|
9.1: <a href="#cp9.1">When negotiation fails</a>.</p>
|
|
</li>
|
|
|
|
|
|
<li class="cp-prov">
|
|
<p> <span class="cp-title">Use the <code>Content-Language:</code>
|
|
<acronym title="the Hypertext Transfer Protocol">HTTP</acronym> header</span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
</p>
|
|
|
|
<p class="cp">If the resource is served using language-negotiation (actually, even if it is not),
|
|
servers MAY send a <code>Content-Language:</code>
|
|
<acronym title="the Hypertext Transfer Protocol">HTTP</acronym> header specifying the
|
|
language of the instance of the resource served. This is useful
|
|
information that agents may use to evaluate the result of server-driven negotiation, exactly
|
|
as they would with the <code>Content-Type</code> header in the case of format negotiation.
|
|
</p>
|
|
|
|
<p class="cp example"><span class="example-good">Example of HTTP/1.1 transaction using
|
|
<code>Content-Language:</code></span></p>
|
|
<pre class="cp example">
|
|
GET /foo/resource HTTP/1.1
|
|
Host: www.example.org
|
|
Accept-Language: fr, en-gb;q=0.8, de;q=0.1
|
|
|
|
HTTP/1.1 200 OK
|
|
[...]
|
|
Content-Location: http://www.example.org/foo/resource.html.fr
|
|
Content-Language: fr
|
|
[...]
|
|
</pre>
|
|
</li>
|
|
|
|
</ol>
|
|
|
|
</div>
|
|
|
|
<p>If server-driven negotiation fails, servers should either proceed to agent-driven negotiation
|
|
or try fall-back solutions, as explained in <a href="#gl9">Guideline 9: Provide default and fall-back solutions</a>.</p>
|
|
|
|
<h3 class="gl">Guideline 8:
|
|
<a name="gl8" id="gl8">Provide useful metadata in addition to content negotiation</a>
|
|
</h3>
|
|
|
|
<p>Server-driven negotiation is used to serve the best content available,
|
|
based on the accept headers received.
|
|
This mechanism does not, however, specify variants beyond the generic <code>Vary:</code>
|
|
<acronym title="the Hypertext Transfer Protocol">HTTP</acronym> header. </p>
|
|
|
|
<p>This guideline gives hints on going a little further, to ease
navigation through, and indexing of, multiple HTML documents (variants or collections).
|
|
</p>
|
|
|
|
<div class="checkpoint" id="cp8.1">
|
|
<div class="cp-head">
|
|
<span class="cp-number">8.1:</span>
|
|
<span class="cp-title">Variants of (X)HTML documents</span>
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
|
|
</div>
|
|
<ol>
|
|
<li class="cp-prov">
|
|
<p> <span class="cp-title">Specify variants of HTML documents</span>
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
|
|
</p>
|
|
|
|
|
|
<p class="cp">The <acronym title="Hypertext Markup Language">HTML</acronym>
|
|
specification [<a href="#ref-HTML">HTML 4.01</a>], provides
|
|
<a href="http://www.w3.org/TR/REC-html40/appendix/notes.html#h-B.4">mechanisms
|
|
to specify (language) variants for a given document</a>
|
|
([<a href="#ref-HTML">HTML 4.01</a>] appendix B.4) using the
|
|
<a href="http://www.w3.org/TR/REC-html40/struct/links.html#h-12.3"><code>link</code>
|
|
element</a> ([<a href="#ref-HTML">HTML 4.01</a>] section 12.3).
|
|
</p>
|
|
|
|
<p class="cp">When used with the <code>alternate</code> type, the <code>link</code>
|
|
element can specify variants of a given resource, either language variants
|
|
(translations) with the <code>lang</code> attribute or media variants with the
|
|
<code>media</code> attribute.</p>
|
|
|
|
|
|
<p class="cp example"><span class="example-good">Example of
|
|
HTML markup for language variants</span>:</p>
|
|
<pre class="cp example">
|
|
&lt;LINK rel="alternate"
      type="text/html"
      href="mydoc-fr.html" hreflang="fr"
      lang="fr" title="La vie souterraine"&gt;
&lt;LINK rel="alternate"
      type="text/html"
      href="mydoc-de.html" hreflang="de"
      lang="de" title="Das Leben im Untergrund"&gt;
|
|
</pre>
|
|
</li>
|
|
<li class="cp-prov">
|
|
<p><span class="cp-title">Specify variants of <acronym title="eXtensible Hypertext Markup Language">XHTML</acronym> documents</span>
|
|
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
</p>
|
|
|
|
<p class="cp">
|
|
Note that this technique is also applicable to
|
|
<acronym title="eXtensible Hypertext Markup Language">XHTML</acronym> documents.</p>
|
|
|
|
<p class="cp example"><span class="example-good">Example of
|
|
XHTML 1.0 markup for language variants</span> (same as above,
|
|
but with lower-case, closed elements...):</p>
|
|
<pre class="cp example">
|
|
&lt;link rel="alternate"
      type="text/html"
      href="mydoc-fr.html" hreflang="fr"
      lang="fr" title="La vie souterraine" /&gt;
&lt;link rel="alternate"
      type="text/html"
      href="mydoc-de.html" hreflang="de"
      lang="de" title="Das Leben im Untergrund" /&gt;
|
|
</pre>
|
|
</li>
|
|
</ol>
|
|
</div>
|
|
|
|
<div class="checkpoint" id="cp8.2">
|
|
<div class="cp-head">
|
|
<span class="cp-number">8.2:</span>
|
|
<span class="cp-title">Navigation among (X)HTML documents</span>
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
</div>
|
|
|
|
<ol>
|
|
<li class="cp-prov">
|
|
<p>
|
|
<span class="cp-title">Facilitate navigation among collections of HTML documents</span>
|
|
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
</p>
|
|
<p class="cp">Again, using the
|
|
<a href="http://www.w3.org/TR/REC-html40/struct/links.html#h-12.3">
|
|
<code>link</code></a> element ([<a href="#ref-HTML">HTML 4.01</a>] section 12.3)
|
|
one can specify relations between documents in a collection.</p>
|
|
|
|
<p class="cp">The link types which can be used for this purpose,
|
|
as described in the
|
|
<a href="http://www.w3.org/TR/REC-html40/types.html#type-links">
|
|
Data types section of the HTML 4.01 specification</a>
|
|
[<a href="#ref-HTML">HTML 4.01</a>]
|
|
are:</p>
|
|
<ul class="cp">
|
|
<li>Start</li>
|
|
<li>Next</li>
|
|
<li>Prev</li>
|
|
<li>Contents</li>
|
|
<li>Index</li>
|
|
<li>etc.</li>
|
|
</ul>
|
|
|
|
<p class="cp example"><span class="example-good">examples of use</span>:</p>
|
|
<ul class="cp example">
|
|
<li>in a photo gallery (using <code>Next</code>, <code>Prev</code>,
|
|
<code>Index</code>, etc.)</li>
|
|
<li>for a periodical newsletter (using <code>Next</code>, <code>Prev</code>,
|
|
<code>Copyright</code>, etc.)</li>
|
|
<li>in a compound document (using <code>Contents</code>, <code>Chapter</code>,
|
|
<code>Section</code>, <code>Subsection</code>, <code>Appendix</code>,
|
|
<code>Glossary</code>, etc.)</li>
|
|
</ul>
|
|
|
|
</li>
|
|
</ol>
|
|
</div>
|
|
|
|
<h3 class="gl">Guideline 9:
|
|
<a name="gl9" id="gl9">Provide default and fall-back solutions</a>
|
|
</h3>
|
|
|
|
<p><acronym title="the Hypertext Transfer Protocol">HTTP</acronym>
|
|
[<a href="#ref-RFC2616">RFC2616</a>] is about serving content in the most
|
|
appropriate way, and, as we have seen in previous guidelines (<a href="#gl7">
|
|
Guideline 7: Server-driven content negotiation</a> and
|
|
<a href="#gl8">Guideline 8: Provide useful metadata in addition to content
|
|
negotiation</a>), server-driven negotiation may be used to serve
|
|
the best available content. It may happen that these mechanisms fail, and in this
|
|
case, <acronym title="the Hypertext Transfer Protocol">HTTP</acronym> implementations
|
|
should try, when possible, to give the requested content to the client. This may be
|
|
achieved through default and fall-back mechanisms.</p>
|
|
|
|
<div class="checkpoint" id="cp9.1">
|
|
<div class="cp-head">
|
|
<span class="cp-number">9.1:</span>
|
|
<span class="cp-title">When negotiation fails </span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
</div>
|
|
|
|
<ol>
|
|
<li class="cp-prov">
|
|
<p><span class="cp-title">Provide multiple or default choice(s) when content/language
|
|
negotiation fails to give only one result</span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
</p>
|
|
|
|
<p class="cp">In the terminology of the <acronym title="Hypertext Transfer Protocol">HTTP</acronym>
specification, this checkpoint can be paraphrased as "use agent-driven negotiation when the server is
|
|
unable to provide a varying response using server-driven negotiation".</p>
|
|
|
|
<p class="cp">Section 12 of <acronym title="the Hypertext Transfer Protocol">HTTP</acronym>
|
|
[<a href="#ref-RFC2616">RFC2616</a>] provides mechanisms to leave the final decision to the user-agent
|
|
(or its user) for cases when the content or language negotiation does not come up with a unique result
|
|
but with multiple ones.</p>
|
|
|
|
<p class="cp">In such a case, a server can use the
|
|
<a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.3.1">300 (Multiple Choices)</a>
|
|
status code, or be configured to send, by default, one of the resources among the possible choices.</p>
|
|
|
|
</li>
|
|
<li class="cp-prov">
|
|
<p>
|
|
<span class="cp-title">Provide default or fall-back choice(s) when
|
|
content/language negotiation fails</span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
</p>
|
|
|
|
<p class="cp">Section 12 of <acronym title="the Hypertext Transfer Protocol">HTTP</acronym>/1.1
|
|
[<a href="#ref-RFC2616">RFC2616</a>] suggests the use of the
|
|
<a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.7">406 (Not Acceptable)</a>
|
|
status code when content or language negotiation fails to find any appropriate negotiated resource.</p>
|
|
|
|
<p class="cp">
|
|
However the <acronym title="Hypertext Transfer Protocol">HTTP</acronym>/1.1 specification
|
|
[<a href="#ref-RFC2616">RFC2616</a>] also states that
|
|
<q>the server should make the best efforts to give the requested content to the client</q>.
|
|
</p>
|
|
|
|
<p class="cp">
|
|
One possible interpretation of this is that the server may provide fall-back choice(s):
|
|
the message body for "HTTP 406 not acceptable" can give a list of available resources
|
|
and let the user choose, or the server can be configured to serve, arbitrarily, a
|
|
specific variant of the resource in case negotiation fails.
|
|
</p>
|
|
|
|
<p class="cp">
|
|
Note that this is perfectly acceptable with regard to
|
|
<a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.7">
|
|
Section 10.4.7 of <acronym title="the Hypertext Transfer Protocol">HTTP</acronym>/1.1</a>
|
|
[<a href="#ref-RFC2616">RFC2616</a>]:
|
|
<q cite="http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.7">
|
|
HTTP/1.1 servers are allowed to return responses which are
|
|
not acceptable according to the accept headers sent in the
|
|
request. In some cases, this may even be preferable to sending a
|
|
406 response. User agents are encouraged to inspect the headers of
|
|
an incoming response to determine if it is acceptable.
|
|
</q></p>
|
|
</li>
|
|
|
|
<li class="cp-prov">
|
|
<p>
|
|
<span class="cp-title">Allow the content manager to set up a fall-back
behavior for content/language negotiation failures</span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
</p>
|
|
|
|
<p class="cp">This is the practical implementation of the provision above.
|
|
The server should allow the content manager to decide whether, in case negotiation
|
|
fails, the server should:
|
|
</p>
|
|
<div class="cp">
|
|
<ul>
|
|
<li>send a 406 (Not Acceptable) status code with a list of available choices,</li>
<li>or arbitrarily serve a variant of the resource.
(The content manager should, of course, be allowed to choose which variant is served,
or how it is selected.)</li>
|
|
</ul>
|
|
</div>
|
|
|
|
<p class="cp example">
|
|
<span class="example-good">Example</span>:<br />
|
|
Through the <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.4">
|
|
Accept-Language</a> headers, a client specifies that it prefers Japanese or English
|
|
versions of the resource, whereas the content is only available in French and Spanish.
|
|
The content manager may be allowed to choose that the French version will be served
|
|
as a default version, or let the server send a 406 status code, giving the user-agent
|
|
a choice between the French and Spanish versions.</p>
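<p class="cp example">The two behaviors between which the content manager chooses can be sketched as
follows. This is only an illustration: the function name <code>fallback_response</code>, the policy names
<code>serve-default</code> and <code>list-choices</code>, and the example URIs are all hypothetical, and the
entity body of the default variant is left out.</p>
<pre class="cp example">
```python
from typing import Dict, Optional, Tuple


def fallback_response(
    variants: Dict[str, str],
    policy: str,
    default_language: Optional[str] = None,
) -> Tuple[int, Dict[str, str], str]:
    """Return (status, headers, body) once language negotiation has failed.

    variants maps a language tag to the URI of the matching variant.
    """
    if policy == "serve-default" and default_language in variants:
        headers = {
            "Content-Language": default_language,
            "Content-Location": variants[default_language],
            "Vary": "Accept-Language",
        }
        return 200, headers, ""  # body of the chosen variant omitted here
    # Otherwise answer 406 and list the available choices for the user.
    lines = ["{}: {}".format(lang, uri) for lang, uri in sorted(variants.items())]
    return 406, {"Content-Type": "text/plain; charset=utf-8"}, "\n".join(lines)
```
</pre>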
|
|
|
|
</li>
|
|
</ol>
|
|
|
|
</div>
|
|
|
|
|
|
<div class="checkpoint" id="cp9.2">
|
|
<div class="cp-head">
|
|
<span class="cp-number">9.2:</span>
|
|
<span class="cp-title"><acronym title="Hypertext Transfer Protocol">HTTP</acronym> error messages body</span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
</div>
|
|
|
|
<p class="cp">As a general rule, the content manager should be allowed to change and customize
|
|
the body of <acronym title="Hypertext Transfer Protocol">HTTP</acronym> error messages.</p>
|
|
</div>
|
|
|
|
|
|
<h3 class="gl" id="gl10">Guideline 10: Serve resources with correct content-type and character
|
|
encoding information</h3>
|
|
<div class="checkpoint" id="cp10.1">
|
|
<div class="cp-head">
|
|
<span class="cp-number">10.1:</span>
|
|
<span class="cp-title"><code>Content-type</code></span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
</div>
|
|
|
|
<ol>
|
|
<li class="cp-prov">
|
|
<p><span class="cp-title">Send proper <code>Content-type</code>
|
|
<acronym title="the Hypertext Transfer Protocol">HTTP</acronym> header</span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
</p>
|
|
|
|
<p class="cp">
|
|
Resources should be served with a proper
|
|
<a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.17"><code>Content-type</code>
|
|
Header</a> ([<a href="#ref-RFC2616">RFC2616</a>] section 14.17). Documents not served
|
|
with a proper media type may not be interpreted correctly by user agents.</p>
|
|
|
|
<p class="cp example">
|
|
<span class="example-bad">Example of a wrong practice</span>:<br />
|
|
CSS style sheets are sometimes served as plain text
|
|
(<code>text/plain</code> media type), causing user agents to ignore
the style sheet and render the document in an unexpected manner.</p>
|
|
<p class="cp example">
|
|
<span class="example-good">Example of a proper practice</span>:<br />
|
|
CSS style sheets should be served with the <code>text/css</code> media type.</p>
|
|
|
|
|
|
</li>
|
|
|
|
<li class="cp-prov">
|
|
<p><span class="cp-title">Allow the content manager to override content-type settings</span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
</p>
|
|
|
|
<p class="cp">In addition to proper default mapping of media types to file extensions,
|
|
since there is no obligation to use "well-known" file extensions in
|
|
<acronym title="Uniform Resource Identifier">URI</acronym>s, servers should allow the content
|
|
manager to set the appropriate media type sent in the <code>Content-type</code> header for
|
|
resources without such a file extension, and to override the default setting at will.
|
|
</p>
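<p class="cp example">A minimal sketch of such an override mechanism, using the extension-based guess of
Python's standard <code>mimetypes</code> module as the default mapping, could look like this. The function name
<code>content_type_for</code> and the per-path <code>overrides</code> table are hypothetical; a real
server would more likely expose this through its configuration files.</p>
<pre class="cp example">
```python
import mimetypes
from typing import Mapping


def content_type_for(
    path: str,
    overrides: Mapping[str, str],
    default: str = "application/octet-stream",
) -> str:
    """Media type for a resource: explicit override first, then extension."""
    if path in overrides:
        return overrides[path]
    guessed, _encoding = mimetypes.guess_type(path)
    return guessed or default
```
</pre>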
|
|
</li>
|
|
</ol></div>
|
|
|
|
<div class="checkpoint" id="cp10.2">
|
|
<div class="cp-head">
|
|
<span class="cp-number">10.2:</span>
|
|
<span class="cp-title">Character Encoding</span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
</div>
|
|
<ol>
|
|
<li class="cp-prov">
|
|
<p><span class="cp-title">Send proper character encoding information</span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
</p>
|
|
<p class="cp">For some document types, the media type sent by the
|
|
<a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.17"><code>Content-type</code>
|
|
Header</a> ([<a href="#ref-RFC2616">RFC2616</a>] section 14.17) may be sent with some information
|
|
about the character encoding of the document. In some cases, this is mandatory (see the provision below
|
|
for HTML and XHTML).</p>
|
|
</li>
|
|
|
|
<li class="cp-prov">
|
|
<p><span class="cp-title">Send proper character encoding information for HTML documents</span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
</p>
|
|
<p class="cp">The
|
|
<a href="http://www.w3.org/TR/1999/REC-html401-19991224/charset.html#h-5.2.2">HTML 4.01 Recommendation</a>
|
|
([<a href="#ref-HTML401">HTML 4.01</a>] section 5.2.2) states that
|
|
the server should provide this information (the character encoding of the HTML document served), e.g.:
|
|
</p>
|
|
|
|
<p class="cp example">Content-Type: text/html; charset=EUC-JP</p>
|
|
|
|
<p class="cp">Conforming user agents MUST observe the following priorities
|
|
when determining an HTML document's character encoding (from highest priority to lowest):</p>
|
|
<ol class="cp">
|
|
<li> An HTTP "charset" parameter in a "Content-Type" field</li>
|
|
<li>A META declaration with "http-equiv" set to "Content-Type" and a value set for "charset"</li>
|
|
<li>The charset attribute set on an element that designates an external resource.</li>
|
|
</ol>
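<p class="cp example">The priority order above can be sketched as a simple lookup. This is an
illustration only: the function names are hypothetical, the three inputs are assumed to have been extracted
already (the HTTP header value, the <code>content</code> value of the META declaration, and the
<code>charset</code> attribute of the referring element), and the <code>charset</code> parameter is read
with the standard <code>email.message</code> module.</p>
<pre class="cp example">
```python
from email.message import Message
from typing import Optional


def charset_param(content_type: Optional[str]) -> Optional[str]:
    """Extract the charset parameter from a Content-Type value, if any."""
    if not content_type:
        return None
    message = Message()
    message["Content-Type"] = content_type
    return message.get_content_charset()


def html_encoding(
    http_content_type: Optional[str],
    meta_content_type: Optional[str],
    element_charset: Optional[str],
) -> Optional[str]:
    """Apply the priorities: HTTP header, then META, then charset attribute."""
    candidates = (
        charset_param(http_content_type),
        charset_param(meta_content_type),
        element_charset,
    )
    for candidate in candidates:
        if candidate:
            return candidate.lower()
    return None
```
</pre>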
|
|
|
|
<p class="cp">Note that the HTTP/1.1 protocol ([<a href="#ref-RFC2616">RFC2616</a>], section 3.7.1)
mentions ISO-8859-1 as a default character encoding when the "charset" parameter
is absent from the "Content-Type" header field, but relying on this default is no longer
recommended.
|
|
</p>
|
|
|
|
<p>The recommended practice is that the character encoding be specified <strong>both</strong>
in the META declaration <strong>and</strong>
in the "Content-Type" header field.</p>
|
|
|
|
<p class="cp example">
|
|
<span class="example-good">Example of an HTML 4.01 document written in French with
|
|
a UTF-8 encoding</span>:</p>
|
|
|
|
 <pre class="cp example">
&lt;!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd"&gt;
&lt;html lang="fr"&gt;

&lt;head&gt;
&lt;meta http-equiv="content-type" content="text/html; charset=UTF-8"&gt;
&lt;title&gt;Exemple de document HTML 4.01&lt;/title&gt;
&lt;/head&gt;

&lt;body&gt;
&lt;h1&gt;Portrait Intérieur&lt;/h1&gt;
&lt;h2&gt;Rainer-Maria Rilke&lt;/h2&gt;

&lt;p&gt;Ce ne sont pas des souvenirs&lt;br&gt;
qui, en moi, t'entretiennent ;&lt;br&gt;
tu n'es pas non plus mienne&lt;br&gt;
par la force d'un beau désir.&lt;/p&gt;
&lt;/body&gt;
&lt;/html&gt;
 </pre>
|
|
|
|
</li>
|
|
<li class="cp-prov">
|
|
|
|
<p><span class="cp-title">Send proper character encoding information
|
|
for XHTML 1.0 documents</span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
</p>
|
|
 <p class="cp">The case of XHTML documents is similar to that of HTML,
 except that, since XHTML is also XML, an XHTML document can provide the
 character encoding via the XML declaration (although if the XHTML document
 uses one of the default encodings, UTF-8 or UTF-16, no declaration is needed).</p>
|
|
|
|
<p class="cp">The recommended practice for XHTML documents is to properly specify the
|
|
character encoding in both the XML declaration
|
|
 and the "Content-Type" header field.</p>
|
|
|
|
<p class="cp example">
|
|
<span class="example-good">Example of an XHTML 1.0 document written in French
|
|
with an ISO-8859-1 encoding</span>:</p>
|
|
|
|
 <pre class="cp example">
&lt;?xml version="1.0" encoding="ISO-8859-1"?&gt;
&lt;!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"&gt;
&lt;html xmlns="http://www.w3.org/1999/xhtml" xml:lang="fr" lang="fr"&gt;

&lt;head&gt;
&lt;title&gt;Exemple de document XHTML 1.0&lt;/title&gt;
&lt;/head&gt;

&lt;body&gt;
&lt;h1&gt;Portrait Intérieur&lt;/h1&gt;
&lt;h2&gt;Rainer-Maria Rilke&lt;/h2&gt;
&lt;p&gt;Ce ne sont pas des souvenirs&lt;br /&gt;
qui, en moi, t'entretiennent ;&lt;br /&gt;
tu n'es pas non plus mienne&lt;br /&gt;
par la force d'un beau désir.&lt;/p&gt;
&lt;/body&gt;
&lt;/html&gt;
 </pre>
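 <p class="cp">One way a server-side engine might keep the two declarations in agreement is to
 read the encoding from the XML declaration and reuse it in the header. The following Python
 sketch is an illustration under stated assumptions: the function names are hypothetical, the
 scan only works for ASCII-compatible encodings (it does not handle UTF-16 or a byte order mark),
 and falling back to UTF-8 when no encoding is declared is a simplification.</p>

```python
import re

# Matches an XML declaration at the very start of the document and captures
# the optional encoding name.
XML_DECL_RE = re.compile(
    rb"""^<\?xml\s+version\s*=\s*["'][^"']+["']"""
    rb"""(?:\s+encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["'])?"""
)


def declared_xml_encoding(document):
    """Return the encoding named in the XML declaration, or None."""
    match = XML_DECL_RE.match(document)
    if match and match.group(1):
        return match.group(1).decode("ascii")
    return None


def content_type_for_xhtml(document, media_type="text/html"):
    """Build a Content-Type header value that agrees with the XML declaration.

    Assumption: documents without an encoding declaration are treated as UTF-8.
    """
    encoding = declared_xml_encoding(document) or "UTF-8"
    return f"{media_type}; charset={encoding}"
```

 <p class="cp">For the example document above, this sketch would produce
 <code>text/html; charset=ISO-8859-1</code>.</p>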
|
|
|
|
</li>
|
|
|
|
<li class="cp-prov">
|
|
<p> <span class="cp-title">Allow the content manager to override
|
|
character encoding settings</span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
</p>
|
|
<p class="cp">
|
|
The content manager should be allowed to set the character encoding information.
|
|
</p>
|
|
 <p class="cp">If the server implementor does not want the content manager, or the
 content manager does not want the users, to change the charset information sent
 by the HTTP server, then the server should send none, and the character encoding may be specified
 at the document level.
|
|
</p>
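 <p class="cp">A server could expose such an override in many ways. The Python sketch below
 assumes a hypothetical per-path table maintained by the content manager, in which the longest
 matching path prefix wins and a value of <code>None</code> means "send no charset, let the
 document declare it". The function name, the table format and the paths are illustrative only.</p>

```python
def content_type_header(media_type, path, server_default, overrides):
    """Return the Content-Type value to send for the resource at `path`.

    server_default: charset chosen by the server implementor (or None)
    overrides:      mapping of path prefix -> charset set by the content
                    manager; None suppresses the charset parameter
    """
    charset = server_default
    best_length = -1
    for prefix, value in overrides.items():
        # The most specific (longest) matching prefix takes precedence.
        if path.startswith(prefix) and len(prefix) > best_length:
            best_length, charset = len(prefix), value
    if charset is None:
        return media_type
    return f"{media_type}; charset={charset}"
```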
|
|
</li>
|
|
</ol>
|
|
</div>
|
|
<h3 class="gl">Guideline 11:
|
|
<a name="gl11" id="gl11">Use flexible technology instead of
|
|
client sniffing/blocking</a></h3>
|
|
<div class="checkpoint" id="cp11.1">
|
|
<div class="cp-head">
|
|
<span class="cp-number">11.1:</span>
|
|
<span class="cp-title">Avoid agent sniffing</span>
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
</div>
|
|
<ol>
|
|
<li class="cp-prov">
|
|
<p><span class="cp-title">Use content-negotiated resources instead of Agent sniffing</span>
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
</p>
|
|
 <p class="cp">Server-driven negotiation, based on the agent's capabilities (given through the
 <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.1"><code>Accept:</code>
 header</a>, [<a href="#ref-RFC2616">RFC2616</a>] section 14.1), is an efficient way of providing
 agents with content they can display or process, without guessing at their capabilities. It is
 also a cost-efficient technique: the negotiation is handled by the server based on what
 agents declare they can handle, whereas agent sniffing requires knowledge of (potentially all)
 agents and their capabilities in order to serve (only) content the agents can handle.</p>
|
|
|
|
<p class="cp">Providing (with negotiation) equivalent versions of a resource in
|
|
flexible technologies should therefore be preferred to agent-sniffing.</p>
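 <p class="cp">To make the contrast with sniffing concrete, here is a minimal Python sketch of
 server-driven selection based on the <code>Accept</code> header. It is a simplified illustration,
 not a complete implementation of the HTTP/1.1 rules: the function names are hypothetical, media
 type parameters other than <code>q</code> are ignored, and ties are broken by the order in which
 the server lists its variants.</p>

```python
def parse_accept(header):
    """Return a list of (media_range, q) pairs from an Accept header value."""
    ranges = []
    for item in header.split(","):
        parts = [part.strip() for part in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((parts[0].lower(), q))
    return ranges


def _specificity(media_range, media_type):
    """Return 3, 2 or 1 for an exact, type/* or */* match, and 0 for no match."""
    if media_range == media_type:
        return 3
    main, _, sub = media_range.partition("/")
    if sub == "*" and main == media_type.split("/")[0]:
        return 2
    if media_range == "*/*":
        return 1
    return 0


def negotiate(accept_header, available):
    """Pick the variant the agent prefers, or None if none is acceptable.

    available: media types of the variants, in server preference order.
    """
    ranges = parse_accept(accept_header) if accept_header else [("*/*", 1.0)]
    best_type, best_q = None, 0.0
    for media_type in available:
        matches = [(_specificity(r, media_type), q) for r, q in ranges]
        matches = [m for m in matches if m[0] > 0]
        if not matches:
            continue
        q = max(matches)[1]  # the most specific matching range decides q
        if q > best_q:
            best_type, best_q = media_type, q
    return best_type
```

 <p class="cp">Nothing in this sketch depends on knowing which agent sent the request; it relies
 only on what the agent declares it can handle.</p>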
|
|
|
|
</li>
|
|
<li class="cp-prov">
|
|
<p><span class="cp-title">Use flexible document technologies instead of Agent sniffing</span>
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
</p>
|
|
<p class="cp">
|
|
 Content managers often think they have to serve different content depending on the agent,
 whether by generating different content on the fly using server-side technologies, filtering,
 negotiating, or <a href="#cp11.2">blocking</a>.</p>
|
|
|
|
 <p class="cp">However well done (negotiating being the most appropriate way),
 this practice rarely suits every possible agent, and implies a lot of extra work.</p>
|
|
|
|
 <p class="cp">Content managers should therefore consider the use of standard (i.e. widely
 implemented), flexible (scalable, multi-platform, device-independent, etc.) document
 technologies whenever possible, either as a primary choice or, at least, as a negotiated
 alternative.</p>
|
|
|
|
<p class="cp example"><span class="example-good">Example of an acceptable practice</span>:<br />
|
|
 The content manager decides to serve a text resource using a proprietary technology that is
 not widely implemented, but adds a negotiated plain-text alternative for agents which cannot handle
 the proprietary document format.</p>
|
|
</li>
|
|
</ol>
|
|
|
|
</div>
|
|
|
|
|
|
<div class="checkpoint" id="cp11.2">
|
|
<div class="cp-head">
|
|
<span class="cp-number">11.2:</span>
|
|
<span class="cp-title">Avoid agent blocking</span>
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
</div>
|
|
<ol>
|
|
<li class="cp-prov">
|
|
<p><span class="cp-title">Avoid agent blocking</span>
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
</p>
|
|
|
|
<p class="cp">
|
|
 Even though some agents may be badly broken, refusing to serve content to users of such agents
 means lost business (traffic). <a href="#cp11.1">Flexible technologies</a>, which ensure that
 the content may be handled by any agent, should be preferred to this practice.</p>
|
|
 <p class="cp">Even worse is to choose which agents are "suitable" and block all the other agents.
 This is a bad practice, for at least the following reasons:</p>
|
|
<ul class="cp">
|
|
 <li>Some of the agents one may block are actually indexing agents for search engines,
 and may bring back traffic</li>
|
|
 <li>Agents evolve rapidly: a specific version of a specific agent may
 appear better at some point in time, but another version, or another agent,
 may well be more appropriate later, making the blocking rules obsolete</li>
|
|
<li>Blocking agents means refusing to serve, and ultimately means lost business (traffic).</li>
|
|
</ul>
|
|
|
|
<p class="cp">Agent blocking should therefore be avoided as much as possible, and instead
|
|
flexible negotiation and document technologies, as described in <a href="#cp11.1">Checkpoint 11.1</a>,
|
|
should be used.</p>
|
|
|
|
</li>
|
|
</ol>
|
|
|
|
</div>
|
|
|
|
<h3 class="gl">
|
|
<a name="gl12" id="gl12">Guideline 12: Enrich and enhance</a></h3>
|
|
<p>The previous guidelines showed good practices for the implementation and use of Web server
|
|
 technologies. This document closes with a few pointers to practices which, although
 not crucial, can enrich or enhance <acronym title="Hypertext Transfer Protocol">HTTP</acronym> services.</p>
|
|
|
|
<div class="checkpoint" id="cp12.1">
|
|
<div class="cp-head">
|
|
<span class="cp-number">12.1:</span>
|
|
<span class="cp-title">Transfer encoding</span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
</div>
|
|
<ol>
|
|
<li class="cp-prov">
|
|
<span class="cp-title">Use transfer encoding for bandwidth-constrained devices</span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
|
|
 <p class="cp">Serving content to bandwidth-constrained devices (which include,
 among many others, mobile devices) can be improved via on-the-fly compression, using the
|
|
<a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.41">Transfer-Encoding
|
|
<acronym title="Hypertext Transfer Protocol">HTTP</acronym> header</a>
|
|
([<a href="#ref-RFC2616">RFC2616</a>] section 14.41).
|
|
</p>
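 <p class="cp">As a rough illustration, the Python sketch below compresses a response body with
 gzip when the request's <code>TE</code> header ([<a href="#ref-RFC2616">RFC2616</a>] section 14.39)
 lists that transfer coding. The function names are hypothetical, and the sketch stops short of a
 full implementation: it only returns the coding to advertise in <code>Transfer-Encoding</code>
 and the compressed payload, leaving chunked framing, which HTTP/1.1 requires as the last coding
 applied, to the server.</p>

```python
import gzip


def accepts_gzip_transfer_coding(te_header):
    """Return True if the TE request header lists gzip with a non-zero q-value."""
    for item in (te_header or "").split(","):
        parts = [part.strip().lower() for part in item.split(";")]
        if parts[0] != "gzip":
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        return q > 0
    return False


def encode_body(te_header, body):
    """Return (transfer_codings, payload) for a response body given as bytes.

    transfer_codings lists the codings applied here, to be named in the
    Transfer-Encoding response header; it is empty when the body is unchanged.
    """
    if accepts_gzip_transfer_coding(te_header):
        return ["gzip"], gzip.compress(body)
    return [], body
```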
|
|
</li>
|
|
</ol>
|
|
</div>
|
|
|
|
<div class="checkpoint" id="cp12.2">
|
|
<div class="cp-head">
|
|
<span class="cp-number">12.2:</span>
|
|
<span class="cp-title">From (meta)data to server information</span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
</div>
|
|
|
|
 <p class="cp">This checkpoint sits at the boundary of the server side. It is included here
 as a proof of concept that the content itself can be used to enhance the configuration held by,
 and the information sent by, the <acronym title="Hypertext Transfer Protocol">HTTP</acronym>
 server.
|
|
</p>
|
|
|
|
<ol>
|
|
<li class="cp-prov">
|
|
<p><span class="cp-title">Convert (meta)data into
|
|
<acronym title="Hypertext Transfer Protocol">HTTP</acronym> information</span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
</p>
|
|
|
|
|
|
<p class="cp">Information in or about a resource (data or metadata) may be used by a web server,
|
|
either as a way to adapt its configuration, as extra information that can be sent in the
|
|
<acronym title="Hypertext Transfer Protocol">HTTP</acronym> headers (standard, or custom),
|
|
 or as an alternate machine-readable (metadata) version of the resource.</p>
|
|
|
|
<p class="cp example"><span class="example-good">A few examples</span>:</p>
|
|
<ul class="cp example">
|
|
<li>Extracting
|
|
<ul>
|
|
<li> meta information (<abbr title="example given:">e.g.</abbr> language, author,
|
|
the Dublin Core set of information) from HTML documents</li>
|
|
<li>the content type from the HTML meta tag</li>
|
|
<li>metadata embedded in images</li>
|
|
</ul>
|
|
</li>
|
|
<li>and
|
|
 <ul><li>Using it to serve the resource (<abbr title="example given:">e.g.</abbr>
 the language or the content type can be sent in the standard HTTP headers
 <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.17"><code>Content-Type</code></a>
 and <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.12"><code>Content-Language</code></a>;
 see [<a href="#ref-RFC2616">RFC2616</a>] sections 14.17 and 14.12)</li>
|
|
<li>Using it to build a metadata database used by the server (<abbr title="example given:">e.g.</abbr> for the indexing policy, for negotiation)</li>
|
|
 <li>Generating, on the fly, an alternate machine-readable (metadata) version of the resource
 (<abbr title="example given:">e.g.</abbr> in <acronym title="Resource Description Framework">RDF</acronym>)</li>
|
|
</ul>
|
|
</li>
|
|
</ul>
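 <p class="cp">The first of these uses, turning document metadata into HTTP headers, might look
 like the following Python sketch. It is a proof of concept only: the class and function names
 are hypothetical, and it reads just the <code>lang</code> attribute of the root element and the
 first META <code>http-equiv="Content-Type"</code> declaration.</p>

```python
from html.parser import HTMLParser


class MetadataExtractor(HTMLParser):
    """Collect the document language and the META content type, if present."""

    def __init__(self):
        super().__init__()
        self.language = None
        self.content_type = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "html" and self.language is None:
            self.language = attributes.get("lang") or attributes.get("xml:lang")
        elif tag == "meta" and self.content_type is None:
            if (attributes.get("http-equiv") or "").lower() == "content-type":
                self.content_type = attributes.get("content")


def headers_from_html(markup):
    """Return a dict of HTTP headers derived from the document's own metadata."""
    extractor = MetadataExtractor()
    extractor.feed(markup)
    headers = {}
    if extractor.content_type:
        headers["Content-Type"] = extractor.content_type
    if extractor.language:
        headers["Content-Language"] = extractor.language
    return headers
```

 <p class="cp">Headers missing from the result are simply not derived from the document; the
 server would fall back on its own configuration for those.</p>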
|
|
</li>
|
|
</ol>
|
|
|
|
|
|
<p class="cp">See also the related <a href="#gl8">
|
|
Guideline 8: Provide useful metadata in addition to content negotiation</a>.</p>
|
|
</div>
|
|
|
|
|
|
<hr />
|
|
<h2><a name="checklists" id="checklists">Tabular checklist of guidelines and checkpoints</a></h2>
|
|
|
|
<p>You may use this table as a quick and convenient tool to assess your
|
|
progress in following the guidelines given in this document.</p>
|
|
|
|
|
|
<table frame="box" rules="all" border="3" class="checklist">
|
|
<tr>
|
|
<th>Number</th>
|
|
<th>Title</th>
|
|
<th><a href="#cp-target">target</a></th>
|
|
<th>yes</th>
|
|
<th>no</th>
|
|
<th>N/A</th>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td colspan="6" class="gl">
|
|
Guideline 1: <a href="#gl1">Choose <acronym title="Uniform Resource Identifier">URI</acronym>s wisely</a>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td><span class="cp-number"><a href="#cp1.1">1.1</a></span></td>
|
|
<td><span class="cp-title">Short <acronym title="Uniform Resource Identifier">URI</acronym>s </span></td>
|
|
<td><span class="cp-target"><a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a></span>
|
|
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span></td>
|
|
<td></td><td></td><td></td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td><span class="cp-number"><a href="#cp1.2">1.2</a></span></td>
|
|
<td><span class="cp-title"><acronym title="Uniform Resource Identifier">URI</acronym> case policy</span></td>
|
|
<td><span class="cp-target"><a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a></span>
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
</td>
|
|
<td></td><td></td><td></td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td colspan="6" class="gl">
|
|
Guideline 2: <a href="#gl2">Allow <acronym title="Uniform Resource Identifier">URI</acronym> management</a>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td><span class="cp-number"><a href="#cp2.1">2.1</a></span></td>
|
|
<td><span class="cp-title"><acronym title="Uniform Resource Identifier">URI</acronym> mapping</span></td>
|
|
<td><span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span></td>
|
|
<td></td><td></td><td></td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td><span class="cp-number"><a href="#cp2.2">2.2</a></span></td>
|
|
<td><span class="cp-title">Standard redirects</span></td>
|
|
<td><span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span></td>
|
|
<td></td><td></td><td></td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td colspan="6" class="gl">
|
|
Guideline 3: <a href="#gl3">Use independent <acronym title="Uniform Resource Identifier">URI</acronym>s</a></td>
|
|
</tr>
|
|
|
|
|
|
<tr>
|
|
<td><span class="cp-number"><a href="#cp3.1">3.1</a></span> </td>
|
|
<td><span class="cp-title">Technology-independent
|
|
<acronym title="Uniform Resource Identifier">URI</acronym>s</span></td>
|
|
<td> <span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
</td>
|
|
|
|
<td></td><td></td><td></td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td><span class="cp-number"><a href="#cp3.2">3.2</a></span> </td>
|
|
<td><span class="cp-title">Identification and Session mechanisms</span></td>
|
|
<td>
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
</td>
|
|
<td></td><td></td><td></td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td colspan="6" class="gl">
|
|
Guideline 4: <a href="#gl4">Use standard redirects
|
|
for content that changes</a>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td> <span class="cp-number"><a href="#cp4.1">4.1</a></span></td>
|
|
<td><span class="cp-title">Standard redirects for changing content</span></td>
|
|
<td>
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span></td>
|
|
<td></td><td></td><td></td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td><span class="cp-number"><a href="#cp4.2">4.2</a></span></td>
|
|
<td> <span class="cp-title"><acronym title="the Hypertext Transfer Protocol">HTTP</acronym> <code>410 Gone</code></span></td>
|
|
<td> <span class="cp-target"><a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span></td>
|
|
<td></td><td></td><td></td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td colspan="6" class="gl">
|
|
Guideline 5: <a href="#gl5">Provide indexing agents
|
|
with useful information</a></td>
|
|
</tr>
|
|
|
|
|
|
<tr>
|
|
<td><span class="cp-number"><a href="#cp5.1">5.1</a></span></td>
|
|
<td><span class="cp-title">Indexing policy</span></td>
|
|
<td><span class="cp-target"> <a href="#target-expl-cm">
|
|
<abbr title="Content Manager">CM</abbr></a> </span>
|
|
</td>
|
|
<td></td><td></td><td></td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td><span class="cp-number"><a href="#cp5.2">5.2</a></span></td>
|
|
<td><span class="cp-title"> <code>Content-Location</code> </span></td>
|
|
<td> <span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-cm">
|
|
<abbr title="Content Manager">CM</abbr></a> </span>
|
|
</td>
|
|
<td></td><td></td><td></td>
|
|
</tr>
|
|
|
|
|
|
<tr>
|
|
<td><span class="cp-number"><a href="#cp5.3">5.3</a></span>
|
|
</td>
|
|
<td> <span class="cp-title"><code>Content-Md5</code>
|
|
</span></td>
|
|
<td><span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span></td>
|
|
<td></td><td></td><td></td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td colspan="6" class="gl">
|
|
Guideline 6: <a href="#gl6">Provide appropriate
|
|
caching information</a>
|
|
</td>
|
|
</tr>
|
|
|
|
|
|
<tr>
|
|
<td><span class="cp-number"><a href="#cp6.1">6.1</a></span></td>
|
|
<td><span class="cp-title">Cache-related
|
|
<acronym title="the Hypertext Transfer Protocol">HTTP</acronym> headers</span></td>
|
|
<td>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span></td>
|
|
<td></td><td></td><td></td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td><span class="cp-number"><a href="#cp6.2">6.2</a></span></td>
|
|
<td><span class="cp-title">Cache policy</span>
|
|
</td>
|
|
<td>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
</td>
|
|
|
|
<td></td><td></td><td></td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td><span class="cp-number"><a href="#cp6.3">6.3</a></span></td>
|
|
<td> <span class="cp-title">Caching generated content</span></td>
|
|
<td> <span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span></td>
|
|
<td></td><td></td><td></td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td><span class="cp-number"><a href="#cp6.4">6.4</a></span></td>
|
|
<td><span class="cp-title"><acronym title="the Hypertext Transfer Protocol">HTTP</acronym> HEAD
|
|
and <acronym title="the Hypertext Transfer Protocol">HTTP</acronym> GET</span></td>
|
|
<td><span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
</td>
|
|
<td></td><td></td><td></td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td colspan="6" class="gl">
|
|
Guideline 7: <a href="#gl7">Server-driven content
|
|
negotiation</a>
|
|
</td>
|
|
</tr>
|
|
|
|
|
|
<tr>
|
|
<td><span class="cp-number"><a href="#cp7.1">7.1</a></span></td>
|
|
<td> <span class="cp-title">Format negotiation</span></td>
|
|
<td>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
</td>
|
|
<td></td><td></td><td></td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td><span class="cp-number"><a href="#cp7.2">7.2</a></span></td>
|
|
<td> <span class="cp-title">Language negotiation</span></td>
|
|
<td> <span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
</td>
|
|
<td></td><td></td><td></td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td colspan="6" class="gl">
|
|
Guideline 8: <a href="#gl8">Provide useful metadata
|
|
in addition to content negotiation</a>
|
|
</td>
|
|
</tr>
|
|
|
|
|
|
<tr>
|
|
<td><span class="cp-number"><a href="#cp8.1">8.1</a></span></td>
|
|
<td><span class="cp-title">Variants of (X)HTML documents</span></td>
|
|
<td>
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
</td>
|
|
|
|
<td></td><td></td><td></td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td><span class="cp-number"><a href="#cp8.2">8.2</a></span></td>
|
|
<td><span class="cp-title">Navigation among (X)HTML documents</span></td>
|
|
<td><span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
</td>
|
|
<td></td><td></td><td></td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td colspan="6" class="gl">
|
|
Guideline 9: <a href="#gl9">Provide default and
|
|
fall-back solutions</a>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td><span class="cp-number"><a href="#cp9.1">9.1</a></span></td>
|
|
<td> <span class="cp-title">When negotiation fails </span></td>
|
|
<td> <span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span></td>
|
|
<td></td><td></td><td></td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td><span class="cp-number"><a href="#cp9.2">9.2</a></span></td>
|
|
<td><span class="cp-title"><acronym title="Hypertext Transfer Protocol">HTTP</acronym> error messages body</span>
|
|
</td>
|
|
<td> <span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span></td>
|
|
<td></td><td></td><td></td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td colspan="6" class="gl">
|
|
Guideline 10: <a href="#gl10">Serve resources with correct
|
|
content-type and character encoding information</a>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td><span class="cp-number"><a href="#cp10.1">10.1</a></span></td>
|
|
<td><span class="cp-title"><code>Content-type</code></span></td>
|
|
<td>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
</td>
|
|
<td></td><td></td><td></td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td><span class="cp-number"><a href="#cp10.2">10.2</a></span></td>
|
|
<td> <span class="cp-title">Character Encoding</span></td>
|
|
<td>
|
|
<span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
</td>
|
|
|
|
<td></td><td></td><td></td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td colspan="6" class="gl">
|
|
Guideline 11: <a href="#gl11">Use flexible technology instead of
|
|
client sniffing/blocking</a>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td><span class="cp-number"><a href="#cp11.1">11.1</a></span></td>
|
|
<td><span class="cp-title">Avoid agent sniffing</span></td>
|
|
<td><span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span></td>
|
|
|
|
<td></td><td></td><td></td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td><span class="cp-number"><a href="#cp11.2">11.2</a></span></td>
|
|
<td><span class="cp-title">Avoid agent blocking</span></td>
|
|
<td><span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span>
|
|
<span class="cp-target"> <a href="#target-expl-cm"><abbr title="Content Manager">CM</abbr></a> </span>
|
|
</td>
|
|
<td></td><td></td><td></td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td colspan="6" class="gl">
|
|
 Guideline 12: <a href="#gl12">Enrich and enhance</a>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td><span class="cp-number"><a href="#cp12.1">12.1</a></span></td>
|
|
<td><span class="cp-title">Transfer encoding</span></td>
|
|
<td><span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span></td>
|
|
<td></td><td></td><td></td>
|
|
</tr>
|
|
|
|
<tr>
|
|
 <td><span class="cp-number"><a href="#cp12.2">12.2</a></span></td>
|
|
 <td><span class="cp-title">From (meta)data to server information</span></td>
|
|
<td> <span class="cp-target"> <a href="#target-expl-si"><abbr title="Web Server Implementor">SI</abbr></a> </span>
|
|
|
|
<span class="cp-target"> <a href="#target-expl-ss"><abbr title="Server-side Engines">SS</abbr></a> </span></td>
|
|
<td></td><td></td><td></td>
|
|
</tr>
|
|
|
|
|
|
</table>
|
|
|
|
|
|
|
|
<hr />
|
|
|
|
<h2><a name="acknowledgments" id="acknowledgments">Acknowledgments</a></h2>
|
|
|
|
 <p>The editor would like to thank the following W3C Team members for their initial input
 and their collaboration in writing this document:</p>
|
|
<ul>
|
|
<li><a href="http://www.w3.org/People/carine/">Carine Bournez</a>, W3C</li>
|
|
<li><a href="http://www.w3.org/People/karl/">Karl Dubost</a>, W3C</li>
|
|
<li>Ted Guild, W3C</li>
|
|
<li><a href="http://www.w3.org/People/Lafon/">Yves Lafon</a>, W3C</li>
|
|
</ul>
|
|
|
|
<p>The editor would also like to thank the following people for their early review of
|
|
the document:</p>
|
|
<ul>
|
|
<li>Henri Fallon, W3C</li>
|
|
<li><a href="http://www.w3.org/People/Dom/">Dominique Hazael-Massieux</a>, W3C</li>
|
|
<li><a href="http://www.w3.org/People/Jacobs/">Ian Jacobs</a>, W3C</li>
|
|
</ul>
|
|
<h2><a name="references" id="references">References</a></h2>
|
|
|
|
<dl>
|
|
<dt id="ref-RFC1630">RFC1630</dt>
|
|
<dd><cite><a href="http://www.ietf.org/rfc/rfc1630.txt">"Universal
|
|
Resource Identifiers in WWW"</a></cite>, T. Berners-Lee, June 1994.
|
|
Available at http://www.ietf.org/rfc/rfc1630.txt.</dd>
|
|
|
|
<dt id="ref-RFC2396">RFC2396</dt>
|
|
|
|
<dd><cite><a href="http://www.ietf.org/rfc/rfc2396.txt">"Uniform
|
|
Resource Identifiers (URI): Generic Syntax"</a></cite>,
|
|
T. Berners-Lee et al., August 1998. Available at
|
|
http://www.ietf.org/rfc/rfc2396.txt.</dd>
|
|
|
|
<dt id="ref-RFC2616">RFC2616</dt>
|
|
|
|
<dd><cite><a href="http://www.ietf.org/rfc/rfc2616.txt">"Hypertext
|
|
Transfer Protocol -- HTTP/1.1"</a></cite>, R. Fielding et al., June 1999.
|
|
Available at http://www.ietf.org/rfc/rfc2616.txt.</dd>
|
|
|
|
<dt id="ref-RFC2617">RFC2617</dt>
|
|
|
|
<dd><cite><a href="http://www.ietf.org/rfc/rfc2617.txt">"HTTP
|
|
Authentication: Basic and Digest Access Authentication"</a></cite>,
|
|
J. Franks et al., June 1999. Available at
|
|
http://www.ietf.org/rfc/rfc2617.txt.</dd>
|
|
|
|
<dt id="ref-RFC2119">RFC2119</dt>
|
|
 <dd><cite><a href="http://www.ietf.org/rfc/rfc2119.txt">"Key
 words for use in RFCs to Indicate Requirement Levels"</a></cite>,
 S. Bradner, March 1997. Available at
 http://www.ietf.org/rfc/rfc2119.txt.
|
|
</dd>
|
|
|
|
<dt id="ref-RFC2109">RFC2109</dt>
|
|
<dd><cite><a href="http://www.ietf.org/rfc/rfc2109.txt">
|
|
"HTTP State Management Mechanism"</a></cite>,
|
|
D. Kristol, L. Montulli, February 1997. Available at
|
|
http://www.ietf.org/rfc/rfc2109.txt.</dd>
|
|
|
|
<dt id="ref-HTML401"><a id="ref-HTML">HTML 4.01</a></dt>
|
|
|
|
<dd><cite><a href="http://www.w3.org/TR/1999/REC-html401-19991224/">"HTML
|
|
4.01 Specification"</a></cite>, Dave Raggett, Arnaud Le Hors, Ian Jacobs,
|
|
24 December 1999. Available at
|
|
http://www.w3.org/TR/1999/REC-html401-19991224/.</dd>
|
|
|
|
<dt id="ref-COOLURIs">COOLURIs</dt>
|
|
<dd><cite><a href="http://www.w3.org/Provider/Style/URI.html">"Cool URIs don't change"</a></cite>,
|
|
Tim Berners-Lee, 1998. Available at http://www.w3.org/Provider/Style/URI.html.</dd>
|
|
|
|
<dt id="ref-CUAP">CUAP</dt>
|
|
<dd><cite><a href="http://www.w3.org/TR/cuap">"Common User Agent Problems"</a></cite>,
|
|
Karl Dubost, 28 January 2003. Available at
|
|
http://www.w3.org/TR/2003/NOTE-cuap-20030128. Latest version at
|
|
http://www.w3.org/TR/cuap.</dd>
|
|
|
|
<dt id="ref-WSFAQ">WSFAQ</dt>
|
|
|
|
<dd><cite><a href="http://www.w3.org/Security/Faq/www-security-faq">"The
|
|
World Wide Web Security <abbr title="Frequently
|
|
Asked Questions">FAQ</abbr>"</a></cite>, Lincoln
|
|
 D. Stein &amp; John N. Stewart. Available at
|
|
http://www.w3.org/Security/Faq/www-security-faq.</dd>
|
|
|
|
<dt id="ref-ROBOTSPROTO">ROBOTSPROTO</dt>
|
|
 <dd><cite><a href="http://www.robotstxt.org/wc/norobots.html">"A
 Standard for Robot Exclusion"</a></cite>, Martijn Koster et al.,
|
|
30 June 1994. Available at http://www.robotstxt.org/wc/norobots.html.</dd>
|
|
|
|
</dl>
|
|
|
|
|
|
<hr />
|
|
|
|
<div id="disclaimer">
|
|
<address class="author">
|
|
Created by <a href="http://www.w3.org/People/olivier/">Olivier Thereaux</a>,
|
|
 <a href="mailto:ot@w3.org">&lt;ot@w3.org&gt;</a>.
|
|
</address>
|
|
</div>
|
|
</body>
|
|
</html>
|