<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=iso-8859-1" />
<title>Making your website valid: a step-by-step guide - W3C QA</title>
<meta name="Keywords" content="qa, quality assurance, conformance, validity, test suite, validator, html, css" />
<meta name="Description" content="W3C QA - An article to help web authors and webmasters improve the quality of an existing website" />

<link rel="schema.DC" href="http://purl.org/dc" />
<meta name="DC.Subject" lang="en" content="validator, html, css" />

<meta name="DC.Title" lang="en" content="Making your Web site valid: a step by step guide" />
<meta name="DC.Description.Abstract" lang="en" content="An article to help web authors and webmasters improve the quality of an existing website" />
<meta name="DC.Date.Created" content="2002-06-24" />
<meta name="DC.Language" scheme="RFC1766" content="en" />
<meta name="DC.Creator" content="Olivier Thereaux" />
<meta name="DC.Publisher" content="W3C - World Wide Web Consortium - http://www.w3.org" />
<meta name="DC.Rights" content="http://www.w3.org/Consortium/Legal/copyright-documents-19990405" />

<link rel="Stylesheet" href="/QA/2002/12/qa4.css" />
</head>
<body>

<!-- Header -->
<div id="Logo">
<a href="http://www.w3.org/"><img alt="W3C" src="/Icons/WWW/w3c_home" /></a>
<a href="http://www.w3.org/QA/"><img alt="QA" src="/QA/images/qa" width="161" height="48" /></a>

<!-- <div id="Header">Be strict to be cool</div> -->
<div><map name="introLinks" id="introLinks" title="Introductory Links">
<div class="banner"> <a
class="bannerLink" title="W3C Activities" accesskey="A"
href="/Consortium/Activities">Activities</a> | <a class="bannerLink"
title="Technical Reports and Recommendations" accesskey="T"
href="/TR/">Technical Reports</a> | <a class="bannerLink"
title="Alphabetical Site Index" accesskey="S"
href="/Help/siteindex">Site Index</a> | <a class="bannerLink"
title="Help for new visitors" accesskey="N"
href="/2002/03/new-to-w3c">New Visitors</a> | <a
class="bannerLink" title="About W3C" accesskey="B"
href="/Consortium/">About W3C</a> | <a class="bannerLink"
title="Join W3C" accesskey="J"
href="/Consortium/Prospectus/Joining">Join W3C</a></div>
</map></div>
</div>

<!-- menuRight -->
<div id="Menu">
<p><a href="#abstract">Abstract</a><span class="dot">·</span>
<a href="#problem">Difficult Decision</a><span class="dot">·</span>
<a href="#wrongway">Bad Approach</a><span class="dot">·</span>
<a href="#hardway">Hard Approach</a><span class="dot">·</span>
<a href="#suggestedway">Suggested Approach</a><span class="dot">·</span>
<a href="#logvaltut">Practical Case</a><span class="dot">·</span>
</p>
<hr />
<p class="navhead">Nearby:</p>
<p><a href="/QA/Tools/Logvalidator">LogValidator Home</a><span class="dot">·</span>
<a href="/QA/"><abbr title="Quality Assurance">QA</abbr> Homepage</a><span class="dot">·</span>
<a href="/QA/#resources">QA Resources</a><span class="dot">·</span>
<a href="/QA/IG/">QA <abbr title="Interest Group">IG</abbr></a><span class="dot">·</span>
</p></div>

<!-- content -->
<div id="Content">
<h1>Making your website valid: a step-by-step guide</h1>

<h2 id="abstract">Abstract</h2>

<p>In this article we imagine a situation in which a webmaster wishes to make a
whole website comply with web standards (valid (X)HTML, valid CSS, etc.).
The article describes the usual ways to approach this problem, and
suggests a painless approach using a new tool developed by W3C's
QA activity.</p>

<h2 id="status">Status</h2>

<p>This article has been produced as part of the <acronym
title="World Wide Web Consortium">W3C</acronym> <a href="../../IG/">Quality
Assurance Interest Group</a> work. Please send public feedback on it to
the <a href="http://lists.w3.org/Archives/Public/public-evangelist/"><strong>publicly
archived</strong></a> mailing list <a
href="mailto:public-evangelist@w3.org">public-evangelist@w3.org</a>, or send private feedback to <a
href="mailto:ot@w3.org">ot@w3.org</a>.</p>

<p>This document has been <a href="http://www.w3.org/2003/03/Translations/byTechnology?technology=Step-by-step">translated into other languages</a>.</p>

<h2 id="problem">Improving an existing site: a difficult decision</h2>
<p>
Creating a Web site that complies with standards such as HTML,
CSS, or the Web Accessibility Guidelines is the right thing to
do, and is also a profitable choice.
</p>

<p>
Guidelines and tools are readily available to help you create a
Web site that conforms to Web standards, ensuring a broad audience,
cost-effective development, and easier maintenance.</p>

<p>But deciding how to convert an existing site to a standards-compliant format
is difficult. Your site may have legacy, unmaintained documents
in multiple formats, or may serve a large number of documents, making it difficult
to update. Your site may be backed by good design and flexible technologies,
which will simplify the task, yet in any case updating the site will
require a resource commitment.</p>

<p>However, the method you choose determines how many resources
you will need to dedicate, and how you will dedicate them.</p>

<p>There are two typical ways to make an existing Web site
standards-compliant: start over completely (the wrong way), or manually validate
each page (the hard way). For IT managers, neither is very attractive,
which makes the decision to switch to a standards-compliant site
difficult: it simply does not seem worth the amount of work needed.</p>

<p>After looking in detail at these two approaches and their drawbacks,
we will see a third, better one: systematically updating one section at a time.</p>

<h2 id="wrongway">The wrong way: Restarting from scratch</h2>

<p>The wrong way to improve the quality of an existing
site is to delete everything and restart the site
from scratch.</p>

<p>This approach may be tempting for the freedom it allows and the opportunity
to start from a clean framework. However, in addition to the cost
of fully redesigning, rewriting and debugging the site,
trying to fix things by starting over may create more problems,
starting with <a href="http://www.w3.org/Provider/Style/URI.html">broken links</a>.
</p>

<h2 id="hardway">The hard way: The whole works</h2>

<p>The usual way is also the hard way: the site administrator lists
all available resources (provided the technologies used make this feasible),
and runs them, either one by one or in batch, through "validating" technologies
such as HTML validation, CSS validation and spell checking, or through corrective
filters (such as HTML Tidy).</p>

<p>This approach has many advantages, and does not carry the specific risks
of the previous method. However, especially for sites with thousands of documents,
it requires an enormous amount of work and can be achieved, if at all, only with
excellent organization. Just figuring out where to start is itself
a tricky question when it comes to checking a full site.</p>

<h2 id="suggestedway">A suggested alternative</h2>
<p>There may be no perfect way to fix a whole site, but some ways are better,
or easier, than others.
Using the tools introduced below, we will explain a relatively easy method
that we believe is good.
This method unfortunately has its limits: it is best used
with static content, or with dynamic/generated content if you have control
over the templates. <br />
If you do not have control over the templates and they produce invalid markup,
we encourage you to send a bug report to the software vendor,
or to the service provider managing your content.</p>

<h3>Step-by-step approach</h3>

<p>"The hard way" would certainly be the best method to
fix an existing site for someone with unlimited resources dedicated to this task.
In the "real world", unless the site is very small, this approach is realistic
only if you make the process gradual and ordered.
</p>

<p>
With careful planning and an extended timeline, you can eventually clean up the site.
However, this process requires careful management, so that a given number of files
is cleaned up at regular intervals and all new resources are valid.</p>

<h4>Do the math</h4>
<p>The number of resources you will clean up during each period depends on
the volume of content (and the ratio of invalid documents).
Ask yourself the following questions when allocating resources:</p>

<ul>
<li>How much time can you dedicate to cleaning up invalid content?</li>
<li>How long does it take you (or the people assigned to this
task) to fix one invalid document?</li>
</ul>
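<p>The two questions above translate into simple arithmetic. The following
Python sketch is only an illustration: the function names and the figures
(4 hours a week, half an hour per document, 120 invalid documents) are
assumptions made for the example, not values taken from a real site.</p>

```python
import math


def documents_per_period(hours_available: float, hours_per_document: float) -> int:
    """Whole number of invalid documents that can be fixed in one period."""
    if not hours_per_document > 0:
        raise ValueError("hours_per_document must be positive")
    return math.floor(hours_available / hours_per_document)


def periods_needed(invalid_documents: int, fixed_per_period: int) -> int:
    """Number of periods needed to fix every invalid document."""
    if not fixed_per_period > 0:
        raise ValueError("fixed_per_period must be positive")
    return math.ceil(invalid_documents / fixed_per_period)


# Hypothetical figures: 4 hours a week, 30 minutes per document.
per_week = documents_per_period(4, 0.5)
print(per_week)                       # 8
print(periods_needed(120, per_week))  # 15
```

<p>Rounding the per-period count down and the number of periods up keeps the
estimate on the conservative side.</p>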
<h4>No deadline?</h4>
<p>
We have not mentioned any deadlines for this cleanup work.
In most cases you probably have no idea what ratio of your content is
invalid at the start, and you may not even know how many documents you have.
Without this information, how can you estimate how long it will take?</p>

<p>Of course, like every project, this cleanup project needs limits and
deadlines. One limit you can set before starting the project is:
"what is the acceptable invalid ratio for my site?"
If you have a small or moderate-sized site, "zero" may be your answer.
If you have a big site, however, we suggest you choose a more modest figure,
10% for example.</p>

<p>Once you have set the limit and dedicated resources to the cleanup
project, the first few rounds of the "step by step method" will give you an
idea of how long it will take to reach the limit. You will then be able to
reconsider the amount of time dedicated, or your targeted
"quality ratio", if necessary.</p>

<h3>Traffic-based approach</h3>

<p>Here is a simple example to explain the traffic-based approach.</p>

<p>Imagine you have 4 documents on a site
(we'll call them 1, 2, 3 and 4), accounting for 40%, 30%,
20% and 10% of the traffic for this site, respectively.</p>

<p>Now imagine that documents 1 and 4 are invalid. That's 50% of the
documents, and 50% of the traffic, and that's bad. If you have time to
fix both documents, fine, but what if you have time to fix only one?</p>

<p>
The usual approach would be to fix either one, so that only 25% of the documents are invalid.
The traffic approach tells you to choose document 1 and fix it,
which brings the valid share of the traffic up to 90%. Fixing document 4 instead
would bring it up to only 60%.</p>

<p>This is a cost-efficient approach to the problem:
given a limited amount of resources, you want to focus on the
improvements that will have the best results.</p>
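<p>The four-document example can be checked with a few lines of Python. This is
a sketch for illustration only; the names <code>valid_traffic_share</code> and
<code>best_document_to_fix</code> are made up for the example.</p>

```python
def valid_traffic_share(traffic: dict[int, int], invalid: set[int]) -> int:
    """Percentage of the traffic served by valid documents."""
    return sum(share for doc, share in traffic.items() if doc not in invalid)


def best_document_to_fix(traffic: dict[int, int], invalid: set[int]) -> int:
    """Invalid document that carries the largest share of the traffic."""
    return max(invalid, key=lambda doc: traffic[doc])


# Traffic share (in percent) of documents 1 to 4; documents 1 and 4 are invalid.
traffic = {1: 40, 2: 30, 3: 20, 4: 10}
invalid = {1, 4}

print(valid_traffic_share(traffic, invalid))        # 50
choice = best_document_to_fix(traffic, invalid)
print(choice)                                       # 1
print(valid_traffic_share(traffic, invalid - {1}))  # 90, after fixing document 1
print(valid_traffic_share(traffic, invalid - {4}))  # 60, after fixing document 4
```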
<h4>Estimating the quality of a site using the traffic approach</h4>
<p>The traffic-based approach is also a more accurate tool to estimate the
quality of a website. As we will see in the following section, given a site
(with an unknown number of documents served, but known logs for a given period
of time), the LogValidator sorts the documents served during this time by
popularity (traffic), then tries to find X invalid documents among the most
popular ones.</p>
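<p>The selection procedure just described can be sketched as follows. This is
not the LogValidator's own code, only a Python illustration of the idea: the
function name, the sample log and the <code>is_valid</code> callback (which
stands in for a real validity check) are all assumptions.</p>

```python
from collections import Counter
from typing import Callable, Iterable


def most_popular_invalid(
    requested_urls: Iterable[str],
    is_valid: Callable[[str], bool],
    limit: int,
) -> tuple[list[str], int, float]:
    """Walk documents from most to least requested until `limit` invalid ones are found.

    Returns the invalid documents found, how many documents were checked,
    and the percentage of all traffic those checked documents account for.
    """
    hits = Counter(requested_urls)          # one entry per document, counting requests
    total_requests = sum(hits.values())
    invalid_found: list[str] = []
    checked = 0
    requests_checked = 0
    for url, count in hits.most_common():   # most requested documents first
        if len(invalid_found) == limit:
            break
        checked += 1
        requests_checked += count
        if not is_valid(url):
            invalid_found.append(url)
    share = 100 * requests_checked / total_requests if total_requests else 0.0
    return invalid_found, checked, share


# Hypothetical log (one requested path per hit) and a stand-in validity check.
log = ["/a"] * 4 + ["/b"] * 3 + ["/c"] * 2 + ["/d"]
known_invalid = {"/a", "/d"}
print(most_popular_invalid(log, lambda url: url not in known_invalid, 1))
# (['/a'], 1, 40.0)
```

<p>The number of documents checked and the share of traffic they represent are
returned alongside the list because both are needed to estimate the quality of
the site.</p>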
<p>Now, let's imagine a case where 100 documents have been served. The tool
needs to go through the 20 most popular documents to find 2 (we are setting X=2 for the example)
that are invalid HTML documents. These 20 documents account for 45% of the
traffic. The table below gives estimates of the proportion of the site that is valid HTML,
using a "file approach" and a "traffic approach".
</p>

<table style="border-style:solid;border-color:black;border-width:1px">
<tr>
<td></td><th colspan="2">Using the file approach</th> <th colspan="2">Using the traffic approach</th>
</tr>
<tr>
<td></td><th>Lower estimate</th><th>Upper estimate</th><th>Lower estimate</th><th>Upper estimate</th>
</tr>
<tr>
<th>Before validating the 2 documents</th>
<td>18%</td><td>98%</td>
<td>40.5%<br />(45*18/20)%</td><td>95.5%<br />((45*18/20)% + 55%)</td>
</tr>
<tr>
<th>After validating the 2 documents</th>
<td>20%</td><td>100%</td>
<td>45%</td><td>100%</td>
</tr>
</table>

<p>The "file-based" estimates span a wide range (18% to 98%), whereas the traffic-based estimates
are tighter (40.5% to 95.5%). Once you have fixed the 2 documents and restart this process, the
traffic-based estimates get more accurate (and higher, since more and more of the traffic is valid!).
</p>
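<p>The figures in the table can be reproduced with the short Python sketch
below. The function names are invented for the illustration. Note the
assumption built into the table's own formula (45*18/20): the checked traffic
is treated as evenly spread over the 20 checked documents.</p>

```python
def file_estimates(total_docs: int, checked_docs: int, invalid_found: int) -> tuple[float, float]:
    """Lower and upper estimate (in percent) of valid documents."""
    lower = 100 * (checked_docs - invalid_found) / total_docs  # only checked documents are known valid
    upper = 100 * (total_docs - invalid_found) / total_docs    # every unchecked document is valid
    return lower, upper


def traffic_estimates(checked_docs: int, invalid_found: int, checked_traffic: float) -> tuple[float, float]:
    """Lower and upper estimate (in percent) of traffic served by valid documents.

    Assumes the checked traffic is spread evenly over the checked documents.
    """
    lower = checked_traffic * (checked_docs - invalid_found) / checked_docs
    upper = lower + (100 - checked_traffic)                    # all unchecked traffic is valid
    return lower, upper


# 100 documents served, 20 checked (45% of the traffic), 2 of them invalid.
print(file_estimates(100, 20, 2))    # (18.0, 98.0)
print(traffic_estimates(20, 2, 45))  # (40.5, 95.5)

# After the 2 invalid documents have been fixed.
print(file_estimates(100, 20, 0))    # (20.0, 100.0)
print(traffic_estimates(20, 0, 45))  # (45.0, 100.0)
```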
<h2 id="logvaltut">Practical Case: Using the LogValidator and other tools
to clean up your site's markup</h2>
<p>Here we describe a practical example of this "cleanup strategy"
using a limited set of (free) tools to validate a Web site's HTML.
HTML is just an example: you can use
the techniques described (and some of the tools) for many other cases.</p>

<h3>Get the tools</h3>
<p>The <a href="http://www.w3.org/QA/Tools/LogValidator/">LogValidator</a>
will be the primary (if not the only) tool you will need. You can
<a href="http://www.w3.org/QA/Tools/LogValidator/#download">download it</a>
freely, and install it on any system running Perl (your Web server
almost certainly does).</p>

<p>You will also need a few <a href="http://www.w3.org/QA/Tools/LogValidator/Manual#id01">other
components</a> that the LogValidator depends on to run smoothly. They
can all be downloaded and installed free of charge.</p>

<p>If you are not an HTML expert, and cleaning up code is not your hobby, you
can use <a href="http://tidy.sourceforge.net/">tidy</a> to do it for you. It is
a (semi-)automatic markup cleanup tool, and is available for many platforms.
</p>

<p>The LogValidator will check your documents through the online
<a href="http://validator.w3.org">Markup Validator</a> at W3C. If you have a big site,
or want to save bandwidth, you can install the Markup Validator locally, too.</p>

<h3>Running the LogValidator</h3>
<p>We will assume you have installed at least the LogValidator, and have read the
<a href="http://www.w3.org/QA/Tools/LogValidator/Manual">Manual</a> carefully.</p>

<p>You first need to set up a configuration file to match your server configuration.
To do so you (mainly) need access to a log file for your Web server (this will be used
to compute traffic statistics). You can easily create the configuration file by copying the
sample configuration file distributed with the tool, and editing it as explained in the
<a href="http://www.w3.org/QA/Tools/LogValidator/Manual">Manual</a>.</p>

<p>Once this is done, you can run the LogValidator. Don't set the number of results
too high: 10 should be enough to begin with.</p>

<p>You should get back a list of your 10 "most popular" invalid documents. Take some
time to analyze them. You can run them through the
<a href="http://validator.w3.org">Markup Validator</a> to see where the bad HTML is.
If you are using templates, does it seem like there is something wrong with them?
Can you check the template with the validator?</p>

<p>Next, fix the first documents on the list. Remember, those are the
most popular documents on your site that are not valid, so it is an important step!
This first step may be difficult, especially if "big" documents are in the list.
<a href="http://tidy.sourceforge.net/">tidy</a> can help clean up your code.
You can also search the Web for guidelines for fixing Web pages and find people to assist you.</p>

<p>For example, if you don't understand the output of the validator,
check out its <a href="http://validator.w3.org/docs/">documentation</a>
or contact the public list
<a href="http://lists.w3.org/Archives/Public/www-validator/">www-validator@w3.org</a>.
</p>

<p>Done? Congratulations! You can now set up the LogValidator to run every day,
week, or month (see the <a href="http://www.w3.org/QA/Tools/LogValidator/Manual#tips">tip</a>
on how to do this), and start again with other documents...</p>

<p>Keep up the good work. If you have a really big site made of static documents,
chances are you won't reach 100% valid pages, but that's OK. After some
time, the invalid pages that are left will account for only a tiny portion of your site's traffic.</p>

<h2>Credits</h2>
<p>Thanks a lot to Kim Nylander for a thorough review of this document
and many invaluable suggestions.<br />
Thanks to Karl Dubost and Dominique Hazael-Massieux, W3C, for their
comments and suggestions.</p>

<h2>Contact</h2>
<address>
Olivier Thereaux, <a href="http://www.w3.org">W3C</a> :
&lt;<a href="http://www.w3.org/People/olivier/">ot@w3.org</a>&gt;
</address>

</div>
<!-- Footer -->

<hr />

<div class="disclaimer">
<a href="http://validator.w3.org/check/referer"><img
src="http://validator.w3.org/images/vxhtml10" alt="Valid XHTML 1.0!"
height="31" width="88" /></a>

<p class="author">
Created Date: 2002-06-24 by <a href="mailto:ot@w3.org">Olivier Thereaux</a><br />
Last modified $Date: 2011/12/16 02:57:04 $ by $Author: gerald $</p>

<p class="policyfooter"><a rel="Copyright"
href="/Consortium/Legal/ipr-notice#Copyright">Copyright</a> © 2000-2003
<a href="/"><acronym
title="World Wide Web Consortium">W3C</acronym></a><sup>®</sup> (<a
href="http://www.lcs.mit.edu/"><acronym
title="Massachusetts Institute of Technology">MIT</acronym></a>, <a
href="http://www.ercim.org/"><acronym
title="European Research Consortium for Informatics and Mathematics">ERCIM</acronym></a>, <a
href="http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. W3C <a
href="/Consortium/Legal/ipr-notice#Legal_Disclaimer">liability</a>, <a
href="/Consortium/Legal/ipr-notice#W3C_Trademarks">trademark</a>, <a
rel="Copyright" href="/Consortium/Legal/copyright-documents">document use</a>
and <a rel="Copyright" href="/Consortium/Legal/copyright-software">software
licensing</a> rules apply. Your interactions with this site are in accordance
with our <a href="/Consortium/Legal/privacy-statement#Public">public</a> and
<a href="/Consortium/Legal/privacy-statement#Members">Member</a> privacy
statements.</p>

</div>
</body>
</html>