<?xml version="1.0" encoding="utf-8"?>
|
|
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
|
|
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
|
|
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
|
|
<head>
|
|
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
|
|
<title>Ampersands, PHP Sessions and Valid HTML - QA @ W3C</title>
|
|
<meta name="Keywords" content="qa, quality assurance, conformance, validity, test suite, @@Meta_Keywords@@" />
|
|
<meta name="Description" content="W3C QA - Why using PHP sessions causes invalid HTML and XHTML to be generated, and how to fix it." />
|
|
|
|
<link rel="schema.DC" href="http://purl.org/dc" />
|
|
<meta name="DC.Subject" lang="en" content="@@Meta_Keywords@@" />
|
|
<meta name="DC.Title" lang="en" content="Ampersands, PHP Sessions and Valid HTML" />
|
|
<meta name="DC.Description.Abstract" lang="en" content="Why using PHP sessions causes invalid HTML and XHTML to be generated, and how to fix it." />
|
|
<meta name="DC.Date.Created" content="2005-04-15" />
|
|
<meta name="DC.Language" scheme="RFC1766" content ="en" />
|
|
<meta name="DC.Creator" content="David Dorward" />
|
|
<meta name="DC.Publisher" content="W3C - World Wide Web Consortium - http://www.w3.org" />
|
|
<meta name="DC.Rights" content="http://www.w3.org/Consortium/Legal/copyright-documents-19990405" />
|
|
|
|
<link rel="Stylesheet" href="/QA/2002/12/qa4.css" /></head>
|
|
<body>
|
|
|
|
<!-- Header -->
|
|
<div id="Logo">
|
|
<a href="http://www.w3.org/"><img alt="W3C" src="/Icons/WWW/w3c_home" /></a>
|
|
<a href="http://www.w3.org/QA/"><img alt="QA" src="/QA/images/qa" width="161" height="48" /></a>
|
|
|
|
<!-- <div id="Header">Be strict to be cool</div> -->
|
|
<div><map name="introLinks" id="introLinks" title="Introductory Links">
|
|
<div class="banner"> <a
|
|
class="bannerLink" title="W3C Activities" accesskey="A"
|
|
href="/Consortium/Activities">Activities</a> | <a class="bannerLink"
|
|
title="Technical Reports and Recommendations" accesskey="T"
|
|
href="/TR/">Technical Reports</a> | <a class="bannerLink"
|
|
title="Alphabetical Site Index" accesskey="S"
|
|
href="/Help/siteindex">Site Index</a> | <a class="bannerLink"
|
|
title="Help for new visitors" accesskey="N"
|
|
href="/2002/03/new-to-w3c">New Visitors</a> | <a
|
|
class="bannerLink" title="About W3C" accesskey="B"
|
|
href="/Consortium/">About W3C</a> | <a class="bannerLink"
|
|
title="Join W3C" accesskey="J"
|
|
href="/Consortium/Prospectus/Joining">Join W3C</a></div>
|
|
</map></div>
|
|
</div>
|
|
|
|
|
|
<!-- menuRight -->
|
|
<div id="Menu">
|
|
<p><a href="#status">Status</a><span class="dot">·</span>
|
|
<a href="#background">Background</a><span class="dot">·</span>
|
|
<a href="#problem">Problem</a><span class="dot">·</span>
|
|
<a href="#solutions">Solutions</a><span class="dot">·</span>
|
|
</p>
|
|
<hr />
|
|
<p class="navhead">Nearby:</p>
|
|
<p><a href="/QA/"><abbr title="Quality Assurance">QA</abbr> Homepage</a><span class="dot">·</span>
|
|
<a href="/QA/#latest">Latest News</a><span class="dot">·</span>
|
|
<a href="/QA/#resources">QA Resources</a><span class="dot">·</span>
|
|
<a href="/QA/IG/">QA <abbr title="Interest Group">IG</abbr></a><span class="dot">·</span>
|
|
<a href="/QA/WG/">QA <abbr title="Working Group">WG</abbr></a><span class="dot">·</span>
|
|
<a href="/QA/Agenda/">QA Calendar</a><span class="dot">·</span>
|
|
</p></div>
|
|
|
|
<!-- content -->
|
|
<div id="Content">
|
|
|
|
<!-- Your content is starting after this -->
|
|
|
|
<h1>Ampersands, PHP Sessions and Valid HTML</h1>
|
|
|
|
<p>Why using PHP sessions causes invalid HTML and XHTML to be generated, and how to fix it.</p>
|
|
|
|
<h2 id="status">Status of this document</h2>
|
|
<p>This document is an article contributed to the <a href="/QA/IG/">QA
|
|
Interest Group</a>. Feedback, suggestions and corrections are welcome,
|
|
and should be sent to the publicly archived mailing-list
|
|
<a href="http://lists.w3.org/Archives/Public/www-qa/">www-qa</a>.</p>
|
|
|
|
<h3>Credits</h3>
|
|
<dl>
|
|
<dt>Author(s)</dt>
|
|
<dd><a href="http://dorward.me.uk/">David Dorward</a></dd>
|
|
</dl>
|
|
|
|
<h2 id="toc">Table of Contents</h2>
|
|
|
|
<ul>
|
|
<li><a href="#background">Background</a></li>
|
|
<li><a href="#problem">Problem</a></li>
|
|
<li><a href="#solutions">Solutions</a>
|
|
<ul>
|
|
<li><a href="#reference">Outputting a character reference</a></li>
|
|
<li><a href="#separator">Using a different argument separator</a></li>
|
|
<li><a href="#disable">Disable sessions for non-cookie users</a></li>
|
|
</ul></li>
|
|
</ul>
|
|
|
|
<h2 id="background">Background</h2>
|
|
|
|
<p>In HTML (and XHTML, along with other SGML and XML applications)
certain characters have special meaning, a prime example being &lt;,
which indicates the beginning of a tag. Such characters cannot be
simply typed into a document if you wish them to display - otherwise
how could the user agent tell the difference between
<code>b&lt;a</code> (meaning <em>b is less than a</em>) and
<code>b&lt;a</code> (meaning <em>b followed by the start of an
anchor</em>)?</p>
|
|
|
|
<p>In order to display reserved characters HTML and XHTML provide a
|
|
mechanism called <a
|
|
href="http://www.w3.org/TR/html4/charset.html#h-5.3">character
|
|
references</a>. The syntax of these is:</p>
|
|
|
|
<ol>
|
|
<li>an ampersand</li>
|
|
<li>a "code" for the referenced character</li>
|
|
<li>a semicolon</li>
|
|
</ol>
|
|
|
|
<p>For example, the "less than" character is represented as
<code>&amp;lt;</code>.</p>

<p>Giving the ampersand special meaning makes it, like &lt;, a
reserved character, so it also needs to be represented by an entity
for it to be used in a document: <code>&amp;amp;</code>.</p>
|
|
|
|
<p>Now for a small confession - there are exceptions to these rules,
|
|
although they are not relevant when dealing with the issues caused by
|
|
PHP sessions.</p>
|
|
|
|
<p>HTML and XHTML include blocks of what is called CDATA, where HTML
special characters no longer have special meaning. Inside such blocks
character references are no longer processed, so an ampersand must be
typed as an ampersand, and not as its character reference. In HTML,
the content of <code>&lt;script&gt;</code> and
<code>&lt;style&gt;</code> elements is CDATA, while in XHTML <a
href="http://www.w3.org/TR/2002/REC-xhtml1-20020801/#h-4.8">they are
marked explicitly</a>. You can avoid the problem by placing scripts
and style sheets in separate files and using <code>&lt;link&gt;</code>
and <code>&lt;script src="…"&gt;</code>.</p>
|
|
|
|
<p>The other exceptions are that sometimes the semicolon is optional,
|
|
and sometimes ampersands can be represented without being encoded as
|
|
entities. In these situations it is never wrong to represent the
|
|
character as a character reference terminated by a semicolon, so I
|
|
won't go into more detail.</p>
|
|
|
|
<h2 id="problem">Problem</h2>
|
|
|
|
<p>PHP has session handling code built in, which enables data to be
stored on the server but be associated with a specific user (for,
roughly, a single visit to the site).</p>
|
|
|
|
<p> To link the data with a user, the website has to hand the user
|
|
agent a token which identifies it. This token is stored in a <a
|
|
href="http://www.cookiecentral.com/faq/#1.1">cookie</a>, but not all
|
|
user agents support cookies, and most of those which do allow them to
|
|
be turned off.</p>
|
|
|
|
<p>PHP provides a fallback mechanism. If it discovers that cookies are
not accepted by the client, it rewrites every link on the page to
include that token in a query string. I believe this used to be
enabled by default, but testing shows that, at least for the Fedora
package of PHP 4.3.11 (Fedora release 2.4 of that package), it
isn't. It can be turned on by setting the <a
href="http://www.php.net/manual/en/ref.session.php#ini.session.use-trans-sid"><em>session.use_trans_sid</em></a>
directive.</p>
|
|
|
|
<p>This is, in theory, a pretty elegant solution to the problem
(discounting the issue of the token hanging around for third parties
to pick up from public computers, bookmarks, shared links and so on),
but the implementation is flawed.</p>
|
|
|
|
<p>For links with no query string, there isn't a problem. PHP appends
<code>?PHPSESSID=</code> followed by a random hexadecimal number. For
links that do have a query string PHP appends
<code>&amp;PHPSESSID=</code>.</p>
|
|
|
|
<p>Ampersand characters used as argument separators pose no problem in
plain old URLs; however, in URLs embedded in HTML they still mean
<em>start of character reference</em> (subject to the aforementioned
exceptions, which the above example does not qualify for).</p>
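
<p>As an illustration, here is roughly what a rewritten link looks like in
the generated markup, followed by the form a validator expects. The page
name <code>page.php</code>, the <code>id</code> parameter and the shortened
token are hypothetical values chosen for this sketch.</p>

<pre class="code"><code>&lt;!-- as rewritten by PHP: the bare ampersand starts a character reference --&gt;
&lt;a href="page.php?id=1&amp;PHPSESSID=0123abcd"&gt;Next&lt;/a&gt;

&lt;!-- valid: the ampersand is written as a character reference --&gt;
&lt;a href="page.php?id=1&amp;amp;PHPSESSID=0123abcd"&gt;Next&lt;/a&gt;</code></pre>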
|
|
|
|
<p>Most users won't notice a problem, since the majority of user agents are
rather good at working around mistakes by authors. However, that
does not mean authors should ignore the problem.</p>
|
|
|
|
<ul>
|
|
|
|
<li>You cannot know that every user agent to visit the page will be
able to cope with the error.</li>

<li>If a <a href="http://validator.w3.org/">markup validator</a> flags
an error on every link, it is going to be rather more difficult to find
errors that could cause you serious problems.</li>

<li>If you ever plan on writing XHTML and <a
href="http://www.w3.org/TR/xhtml-media-types/" title="XHTML media
types">serving your markup as such</a> then rogue ampersands will
cause the XML parser to give up attempting to handle the code (this is
a requirement of the XML specification).</li>
|
|
|
|
</ul>
|
|
|
|
<h2 id="solutions">Solutions</h2>
|
|
|
|
<h3 id="reference">Outputting a character reference</h3>
|
|
|
|
<p>The character that PHP uses to separate arguments is configurable
|
|
with the <em>arg_separator.output</em> directive. This can be set in a
|
|
number of ways and is the solution suggested in the PHP manual.</p>
|
|
|
|
<h4>Editing php.ini</h4>
|
|
|
|
<p>The php.ini file contains the central configuration data for an
|
|
install of PHP on a computer. You can specify a character reference to
|
|
use there.</p>
|
|
|
|
<pre class="code"><code>arg_separator.output = "&amp;amp;"</code></pre>
|
|
|
|
<h4>Apache directives</h4>
|
|
|
|
<p>The <a href="http://httpd.apache.org/">Apache</a> web server can
set PHP directives in all the usual places. This allows different
directives to be set on a per site or per directory basis (in, for
example, a &lt;Location&gt; block or .htaccess file).</p>
|
|
|
|
<pre class="code"><code>php_value arg_separator.output &amp;amp;</code></pre>
|
|
|
|
<h4>Per script basis</h4>
|
|
|
|
<p>PHP configuration directives can be set on a per script basis with
|
|
<a href="http://php.net/ini_set">the ini_set function</a>. Put the
|
|
code to set the directives at the top of your script.</p>
|
|
|
|
<pre class="code"><code>&lt;?php ini_set('arg_separator.output','&amp;amp;'); ?&gt;</code></pre>
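
<p>A minimal sketch of where such a call might sit, assuming the script
starts its own session with <code>session_start()</code>. Setting the
directive first means it is already in effect when PHP begins rewriting
links. The surrounding script is hypothetical.</p>

<pre class="code"><code>&lt;?php
// Set the separator before any session handling or output takes place.
ini_set('arg_separator.output', '&amp;amp;');

// Assumption: this script starts the session itself.
session_start();
?&gt;</code></pre>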
|
|
|
|
<h3 id="separator">Using a different argument separator</h3>
|
|
|
|
<p>Since the ampersand character has special meaning in HTML, the
specification suggests that query string parsers allow the <a
href="http://www.w3.org/TR/html4/appendix/notes.html#h-B.2.2">use of a
semicolon as an argument separator</a>. PHP can accept this (the
<em>arg_separator.input</em> directive lists the separators it parses),
so you can alter the output code to use a semicolon
instead of an ampersand using the same techniques.</p>
|
|
|
|
<h4>Editing php.ini</h4>
|
|
|
|
<pre class="code"><code>arg_separator.output = ";"</code></pre>
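
<p>Incoming query strings are split according to a separate directive,
<em>arg_separator.input</em>. As a hedged sketch, and assuming your
installation's value does not already include a semicolon, a php.ini line
such as the following would make PHP parse both separators. Check your own
configuration before relying on it.</p>

<pre class="code"><code>arg_separator.input = ";&amp;"</code></pre>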
|
|
|
|
<h4>Apache directives</h4>
|
|
|
|
<pre class="code"><code>php_value arg_separator.output ;</code></pre>
|
|
|
|
<h4>Per script basis</h4>
|
|
|
|
<pre class="code"><code>&lt;?php ini_set('arg_separator.output',';'); ?&gt;</code></pre>
|
|
|
|
<h3 id="disable">Disable sessions for non-cookie users</h3>
|
|
|
|
<p>This option has a number of advantages from a security point of
|
|
view as it reduces the chance of the session token leaking to third
|
|
parties. As a side effect it will render your session code useless for
|
|
visitors who disable, block or otherwise do not support cookies (this
|
|
has accessibility implications).</p>
|
|
|
|
<h4>Editing php.ini</h4>
|
|
|
|
<pre class="code"><code>session.use_trans_sid = 0</code></pre>
|
|
|
|
<h4>Apache directives</h4>
|
|
|
|
<pre class="code"><code>php_value session.use_trans_sid 0</code></pre>
|
|
|
|
<h4>Per script basis</h4>
|
|
|
|
<p>Whether this directive can be set on a per script basis depends on
which version of PHP you are using. If it can be set, the syntax is as
follows:</p>
|
|
|
|
<pre class="code"><code>&lt;?php ini_set('session.use_trans_sid','0'); ?&gt;</code></pre>
|
|
|
|
|
|
|
|
|
|
<!-- Your content is finishing before this -->
|
|
</div>
|
|
<!-- Footer -->
|
|
|
|
<hr />
|
|
|
|
<div class="disclaimer">
|
|
<a href="http://validator.w3.org/check/referer"><img
|
|
src="http://validator.w3.org/images/vxhtml10" alt="Valid XHTML 1.0!"
|
|
height="31" width="88" /></a>
|
|
|
|
<address class="author">
|
|
Created Date: 2005-04-15 <br />
|
|
Last modified $Date: 2011/12/16 02:59:19 $ by $Author: gerald $</address>
|
|
<p class="policyfooter"><a rel="Copyright"
|
|
href="/Consortium/Legal/ipr-notice#Copyright">Copyright</a> © 2000-2003
|
|
<a href="/"><acronym
|
|
title="World Wide Web Consortium">W3C</acronym></a><sup>®</sup> (<a
|
|
href="http://www.csail.mit.edu/"><acronym
|
|
title="Massachusetts Institute of Technology">MIT</acronym></a>, <a
|
|
href="http://www.ercim.org/"><acronym
|
|
title="European Research Consortium for Informatics and Mathematics">ERCIM</acronym></a>, <a
|
|
href="http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. W3C <a
|
|
href="/Consortium/Legal/ipr-notice#Legal_Disclaimer">liability</a>, <a
|
|
href="/Consortium/Legal/ipr-notice#W3C_Trademarks">trademark</a>, <a
|
|
rel="Copyright" href="/Consortium/Legal/copyright-documents">document use</a>
|
|
and <a rel="Copyright" href="/Consortium/Legal/copyright-software">software
|
|
licensing</a> rules apply. Your interactions with this site are in accordance
|
|
with our <a href="/Consortium/Legal/privacy-statement#Public">public</a> and
|
|
<a href="/Consortium/Legal/privacy-statement#Members">Member</a> privacy
|
|
statements.</p>
|
|
|
|
</div>
|
|
</body>
|
|
</html>
|