<html xmlns="http://www.w3.org/1999/xhtml"> <!--*- nxml -*-->
|
|
<head>
|
|
<title>Transforming XHTML to LaTeX and BibTeX</title>
|
|
<link rel="stylesheet" href="article.css"/>
|
|
|
|
<link rel="documentclass" title="llncs"/><!-- href? where does that come from? -->
|
|
<link rel="bibliographystyle" title="splncs" /> <!-- href? -->
|
|
<link rel="usepackage" title="graphicx" /><!-- href? -->
|
|
<link rel="usepackage" title="url" href="ftp://cam.ctan.org/tex-archive/macros/latex/contrib/misc/url.sty" />
|
|
</head>
|
|
<body>
|
|
<div class="online"><a href="/">W3C</a></div>
|
|
|
|
<div class="maketitle">
|
|
<h1>Transforming XHTML to LaTeX and BibTeX</h1>
|
|
|
|
|
|
<address><a rel="author" href="http://www.w3.org/People/Connolly/">Dan Connolly</a><br />
|
|
<small class="online">$Revision: 1.23 $ of $Date: 2008/04/24 21:28:36 $</small>
|
|
</address>
|
|
|
|
</div>
|
|
<div class="abstract"><h4>Abstract</h4>
|
|
|
|
<p>We transform XHTML to LaTeX and BibTeX to allow technical articles
|
|
to be developed using familiar XHTML authoring tools and
|
|
techniques.</p>
|
|
</div>
|
|
|
|
|
|
<div>
|
|
<h2>Introduction</h2>
|
|
|
|
<p>Occasionally a web page turns the corner from a casually drafted
|
|
idea to an article worthy of publication. Computer science conferences
|
|
often require submissions using specific LaTeX styles; for example,
|
|
the <a
|
|
href="http://iswc2004.semanticweb.org/submission/authors_instruction.php">ISWC2004
|
|
submission instructions</a> require that submitted papers be formatted
|
|
in the style of the Springer publications format for <a
|
|
href="http://www.springeronline.com/sgw/cda/frontpage/0,10735,5-164-2-72376-0,00.html">Lecture
|
|
Notes in Computer Science (LNCS)</a>.
|
|
<a href="http://www.w3.org/Style/XSL/">XSLT</a> is
|
|
a convenient notation to express a transformation from
|
|
XHTML to LaTeX.</p>
|
|
|
|
<p>Tools to transform from LaTeX to HTML are commonplace, but there
|
|
are far fewer to go the other way. A little bit of searching yielded
|
|
some work<a href="#Gur00">[Gur00]</a> that was designed to undo a
|
|
transformation to XHTML. It used an odd XHTML namespace and exhibited
|
|
various other quirks specific to reversing that transformation, but it
|
|
provided quite a boost up the LaTeX learning curve<a
|
|
href="#Mann94">[Mann94]</a>.</p>
|
|
|
|
<p>That code did not integrate with BibTeX. To take advantage of the
automatic bibliography formatting traditionally provided by LaTeX
styles, we studied the <a
href="http://www.cc.gatech.edu/classes/RWL/Projects/citation/Docs/UserManuals/Reference_Pages/bibtex_doc.html">BibTeX
format</a><a href="#Spen98">[Spen98]</a> for a bit, and <tt><a
href="xh2bib.xsl">xh2bib.xsl</a></tt> was born.</p>
|
|
|
|
<p>Together with traditional <tt>pdflatex</tt> and <tt>bibtex</tt>
tools<a href="#tetex">[tetex]</a> and an XSLT processor such as
|
|
xsltproc<a href="#XSLTPROC">[XSLTPROC]</a>, this transformation can
|
|
turn ordinary web pages with just a bit of special markup into
|
|
camera-ready PDF in specialized LaTeX styles.</p>
|
|
</div>
|
|
|
|
<div><h3>A Quick Example</h3>
|
|
|
|
<p>This article demonstrates the basic features. See:</p>
|
|
|
|
<ul>
|
|
<li><tt><a href="Overview.pdf">Overview.pdf</a></tt></li>
|
|
<li><tt><a href="Overview.tex">Overview.tex</a></tt></li>
|
|
<li><tt><a href="Overview.bib">Overview.bib</a></tt></li>
|
|
</ul>
|
|
|
|
<p>They are produced as follows:</p>
|
|
|
|
<pre>
|
|
$ make Overview.pdf
|
|
xsltproc --novalid --stringparam DocClass llncs \
|
|
--stringparam Bib Overview --stringparam BibStyle splncs \
|
|
--stringparam Status prepub \
|
|
-o Overview.tex xh2latex.xsl Overview.html
|
|
TEXINPUTS=.:../../../2004/LLCS: pdflatex Overview.tex
|
|
This is pdfTeX, Version 3.14159-1.10b (Web2C 7.4.5)
|
|
<em>...</em>
|
|
Output written on Overview.pdf (3 pages, 62474 bytes).
|
|
Transcript written on Overview.log.
|
|
xsltproc --novalid -o Overview.bib xh2bib.xsl Overview.html
|
|
BSTINPUTS=.:../../../2004/LLCS: bibtex Overview
|
|
This is BibTeX, Version 0.99c (Web2C 7.4.5)
|
|
The top-level auxiliary file: Overview.aux
|
|
The style file: splncs.bst
|
|
Database file #1: Overview.bib
|
|
TEXINPUTS=.:../../../2004/LLCS: pdflatex Overview
|
|
This is pdfTeX, Version 3.14159-1.10b (Web2C 7.4.5)
|
|
<em>...</em>
|
|
Output written on Overview.pdf (3 pages, 67583 bytes).
|
|
Transcript written on Overview.log.
|
|
TEXINPUTS=.:../../../2004/LLCS: pdflatex Overview
|
|
This is pdfTeX, Version 3.14159-1.10b (Web2C 7.4.5)
|
|
<em>...</em>
|
|
Output written on Overview.pdf (3 pages, 67167 bytes).
|
|
Transcript written on Overview.log.
|
|
</pre>
|
|
|
|
</div>
|
|
|
|
|
|
<div>
|
|
<h2>Features</h2>
|
|
|
|
<p>The transformation <tt><a href="xh2latex.xsl">xh2latex.xsl</a></tt>
|
|
works in the obvious way for many idioms:</p>
|
|
|
|
<ul>
|
|
<li>section headings: <tt>h2</tt>, <tt>h3</tt>, <tt>h4</tt></li>
|
|
<li>paragraphs: <tt>p</tt></li>
|
|
<li>itemized lists: <tt>ul</tt>, <tt>dl</tt></li>
|
|
<li>enumerated (numbered) lists: <tt>ol</tt></li>
|
|
<li>tables: <tt>table border="1"</tt>, <tt>tr</tt>, <tt>td</tt></li>
|
|
<li>verbatim: <tt>pre</tt></li>
|
|
<li>phrase markup: <tt>em</tt>, <tt>code</tt>, <tt>tt</tt>,
|
|
<tt>i</tt>, <tt>b</tt></li>
|
|
</ul>
|
|
|
|
<p>Table support is limited to tables with <tt>border="1"</tt>
|
|
and where all rows have the same number of cells. For example:</p>
|
|
<table border="1">
|
|
<tr><th>Name</th><th>Address</th><th>Phone</th></tr>
|
|
<tr><td>John Doe</td><td>123 High St.</td><td>555-1212</td></tr>
|
|
<tr><td>Jane Smith</td><td>456 Low St.</td><td>555-1234</td></tr>
|
|
</table>
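<p>The two restrictions above can be illustrated with a short Python
sketch. It is not the XSLT in <tt>xh2latex.xsl</tt>; the function name
<tt>table_to_tabular</tt>, the column specification and the use of
<tt>\hline</tt> are assumptions made for illustration only.</p>

```python
import xml.etree.ElementTree as ET


def table_to_tabular(table_xml):
    """Turn a simple XHTML table into a LaTeX tabular (illustrative only).

    Mirrors the stated limits: border="1" is required and every row
    must have the same number of cells. LaTeX special characters in
    cell text are NOT escaped in this sketch.
    """
    table = ET.fromstring(table_xml)
    if table.get("border") != "1":
        raise ValueError('only tables with border="1" are handled')
    rows = []
    for tr in table.iter("tr"):
        cells = [c for c in tr if c.tag in ("th", "td")]
        # Collapse runs of whitespace inside each cell.
        rows.append([" ".join("".join(c.itertext()).split()) for c in cells])
    widths = {len(r) for r in rows}
    if len(widths) != 1:
        raise ValueError("all rows must have the same number of cells")
    ncols = widths.pop()
    lines = ["\\begin{tabular}{|%s}" % ("l|" * ncols), "\\hline"]
    for r in rows:
        lines.append(" & ".join(r) + " \\\\ \\hline")
    lines.append("\\end{tabular}")
    return "\n".join(lines)


SAMPLE_TABLE = """<table border="1">
<tr><th>Name</th><th>Address</th><th>Phone</th></tr>
<tr><td>John Doe</td><td>123 High St.</td><td>555-1212</td></tr>
<tr><td>Jane Smith</td><td>456 Low St.</td><td>555-1234</td></tr>
</table>"""

if __name__ == "__main__":
    print(table_to_tabular(SAMPLE_TABLE))
```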
|
|
|
|
<p>Specialized markup is required for other idioms. An <a
|
|
href="article.css">article.css</a> stylesheet provides
|
|
visual feedback for this special markup.</p>
|
|
|
|
<p>To use a LaTeX package, add a link to the head of your document a la:</p>
|
|
<pre>
|
|
&lt;link rel="usepackage" title="url"
  href="ftp://cam.ctan.org/tex-archive/macros/latex/contrib/misc/url.sty" /&gt;
|
|
</pre>
|
|
|
|
<p>The package name is taken from the <tt>title</tt> attribute. The <tt>href</tt> attribute is not used in the LaTeX conversion.</p>
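<p>As a rough illustration of that rule, the following Python sketch
collects <tt>usepackage</tt> links and emits one <tt>\usepackage</tt>
line per link. The function name <tt>usepackage_lines</tt> is
hypothetical, and the sketch only approximates what the stylesheet
does; it reads <tt>title</tt> and ignores <tt>href</tt>, as described
above.</p>

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def usepackage_lines(xhtml_text):
    """Return one \\usepackage line per <link rel="usepackage"> element."""
    root = ET.fromstring(xhtml_text)
    lines = []
    for link in root.iter("{%s}link" % XHTML_NS):
        title = link.get("title")
        if link.get("rel") == "usepackage" and title:
            # Only the title attribute is consulted; href is ignored.
            lines.append("\\usepackage{%s}" % title)
    return lines


SAMPLE_HEAD = """<html xmlns="http://www.w3.org/1999/xhtml"><head>
<title>Example</title>
<link rel="stylesheet" href="article.css"/>
<link rel="usepackage" title="graphicx"/>
<link rel="usepackage" title="url"
  href="ftp://cam.ctan.org/tex-archive/macros/latex/contrib/misc/url.sty"/>
</head><body/></html>"""

if __name__ == "__main__":
    print("\n".join(usepackage_lines(SAMPLE_HEAD)))
```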
|
|
|
|
<p>We recommend the <a
|
|
href="ftp://cam.ctan.org/tex-archive/macros/latex/contrib/misc/url.sty">url.sty</a>
|
|
package, per <a
|
|
href="http://www.tex.ac.uk/cgi-bin/texfaq2html?label=setURL">a TeX
|
|
FAQ</a>. For example: <tt
|
|
class="url">http://www.w3.org/People/Connolly/</tt>.</p>
|
|
|
|
<div><h3>Front Matter</h3>
|
|
|
|
<p>The following patterns are used to extract the
|
|
title page material:</p>
|
|
|
|
<ul>
|
|
<li><tt>div/@class="maketitle"</tt>
|
|
<ul>
|
|
<li>title: <tt>h1</tt></li>
|
|
<li>abstract: <tt>div/@class="abstract"</tt></li>
|
|
<li>author: <tt>address/a[@rel="author"]</tt></li>
|
|
</ul>
|
|
</li>
|
|
<li>keywords: <tt>div[@class="keywords"]</tt></li>
|
|
<li>terms: <tt>div[@class="terms"]</tt></li>
|
|
</ul>
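<p>The patterns above can be tried out with a small Python sketch. It
assumes a namespace-free fragment for brevity, looks for the abstract
anywhere in the body, and uses a hypothetical function name
<tt>front_matter</tt>; the stylesheet itself may match these patterns
differently.</p>

```python
import xml.etree.ElementTree as ET


def _text(elem):
    """Text content of an element with whitespace collapsed."""
    return " ".join("".join(elem.itertext()).split())


def front_matter(body_xml):
    """Extract title, authors and abstract using the patterns listed above."""
    body = ET.fromstring(body_xml)
    info = {"title": None, "authors": [], "abstract": None}
    maketitle = body.find(".//div[@class='maketitle']")
    if maketitle is not None:
        h1 = maketitle.find("h1")
        if h1 is not None:
            info["title"] = _text(h1)
        info["authors"] = [
            _text(a) for a in maketitle.findall("address/a[@rel='author']")
        ]
    # Assumption: the abstract text is the first p inside div.abstract.
    abstract = body.find(".//div[@class='abstract']/p")
    if abstract is not None:
        info["abstract"] = _text(abstract)
    return info


SAMPLE_BODY = """<body>
<div class="maketitle">
<h1>Transforming XHTML to LaTeX and BibTeX</h1>
<address><a rel="author"
  href="http://www.w3.org/People/Connolly/">Dan Connolly</a></address>
</div>
<div class="abstract"><h4>Abstract</h4>
<p>We transform XHTML to LaTeX and BibTeX.</p>
</div>
</body>"""

if __name__ == "__main__":
    print(front_matter(SAMPLE_BODY))
```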
|
|
|
|
<p><em>Support for WWW2006 style authors, following
|
|
<a href="http://www.acm.org/sigs/pubs/proceed/sigfaq.htm">ACM style</a>,
|
|
is in progress.</em></p>
|
|
|
|
</div>
|
|
|
|
<div><h3>Cross references and footnotes</h3>
|
|
|
|
<p>The <tt>a[@rel="ref"]</tt> pattern is transformed to the LaTeX
|
|
<tt>\ref{<var>label</var>}</tt> idiom, assuming the reference takes
|
|
the form <tt>href="#<var>label</var>"</tt>. <em>@@needs testing</em></p>
|
|
|
|
<p>The footnote pattern is <tt>*[@class="footnote"]</tt>.</p>
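<p>A minimal Python sketch of the cross-reference rule, assuming a
namespace-free <tt>a</tt> element and a hypothetical helper named
<tt>ref_to_latex</tt>; it illustrates the mapping and is not the
stylesheet's code:</p>

```python
import xml.etree.ElementTree as ET


def ref_to_latex(anchor_xml):
    """Map <a rel="ref" href="#label"> to \\ref{label}; otherwise None."""
    a = ET.fromstring(anchor_xml)
    href = a.get("href", "")
    if a.get("rel") == "ref" and href.startswith("#"):
        # Drop the leading '#' to obtain the LaTeX label.
        return "\\ref{%s}" % href[1:]
    return None


if __name__ == "__main__":
    print(ref_to_latex('<a rel="ref" href="#fig1">Figure 1</a>'))
```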
|
|
</div>
|
|
|
|
<div><h3>Figures</h3>
|
|
|
|
<p>The <tt>div[@class="figure"]</tt> pattern is transformed to a
|
|
figure environment; any <tt>div/@id</tt> is used as a figure
|
|
label. The file pattern is <tt>object/@data</tt>. <em>Figures are
|
|
currently assumed to be PDF; the <tt>object/@height</tt> attribute is
|
|
copied over.</em> The caption pattern is <tt>p[@class="caption"]</tt>.
|
|
<em>@@need to test this.</em>
|
|
Be sure to include the <tt>epsfig</tt> package a la:
|
|
</p>
|
|
<pre>
|
|
&lt;link rel="usepackage" title="epsfig" /&gt;
|
|
</pre>
|
|
</div>
|
|
|
|
<div><h3>Citations and Bibliography</h3>
|
|
|
|
<p>An <tt>a</tt> element starting with an open square bracket
|
|
<tt>[</tt> is interpreted as a citation reference. The <tt>href</tt>
|
|
is assumed to be a local link a la <tt>#<var>tag</var></tt>.</p>
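<p>The citation test can be sketched in Python as follows. The helper
name <tt>citation_to_latex</tt> is hypothetical, and emitting
<tt>\cite{<var>tag</var>}</tt> is an assumption based on the usual
LaTeX idiom rather than something stated above.</p>

```python
import xml.etree.ElementTree as ET


def citation_to_latex(anchor_xml):
    """Return \\cite{tag} for an anchor whose text starts with '['."""
    a = ET.fromstring(anchor_xml)
    text = a.text or ""
    href = a.get("href", "")
    if text.startswith("[") and href.startswith("#"):
        # The tag is the fragment identifier without the leading '#'.
        return "\\cite{%s}" % href[1:]
    return None


if __name__ == "__main__":
    print(citation_to_latex('<a href="#Gur00">[Gur00]</a>'))
```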
|
|
|
|
<p>The pattern <tt>dl/@class="bib"</tt> is used to find the
|
|
bibliography.
|
|
Each item is marked up a la:</p>
|
|
<pre>
|
|
&lt;dt class="misc"&gt;[&lt;a name="tetex"&gt;tetex&lt;/a&gt;]&lt;/dt&gt;
&lt;dd&gt;
&lt;span class="author"&gt;Thomas Esser&lt;/span&gt;
&lt;cite&gt;&lt;a
href="http://www.tug.org/tex-archive/help/Catalogue/entries/tetex.html"
&gt;The TeX distribution for Unix/Linux&lt;/a&gt;&lt;/cite&gt;
February &lt;span class="year"&gt;2003&lt;/span&gt;
&lt;/dd&gt;
|
|
</pre>
|
|
|
|
<p>or</p>
|
|
|
|
<pre>
|
|
&lt;dt class="misc" id="tetex"&gt;[tetex]&lt;/dt&gt;
|
|
...
|
|
</pre>
|
|
|
|
<p>Note the placement of the bibtex item type <tt>misc</tt> and the
|
|
tag <tt>tetex</tt> and keep in mind that <tt>bibtex</tt> ignores
|
|
works in the bibliography that are not cited from the body.</p>
|
|
|
|
<p>The <tt><a href="xh2bib.xsl">xh2bib.xsl</a></tt> transformation
|
|
turns this markup into BibTeX format. <tt>xh2latex.xsl</tt> transforms
|
|
the entire bibliography <tt>dl</tt> to a <tt>\bibliography{...}</tt>
|
|
reference.</p>
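<p>To make the mapping concrete, here is a Python sketch that turns
one bibliography item into a BibTeX entry. It is an approximation
under stated assumptions: <tt>dt</tt> and <tt>dd</tt> strictly
alternate, the fragment is namespace-free, only <tt>author</tt>,
<tt>title</tt> (taken from <tt>cite</tt>) and <tt>year</tt> are
emitted, and the function name <tt>bib_entries</tt> is hypothetical.
The exact output of <tt>xh2bib.xsl</tt> may differ.</p>

```python
import xml.etree.ElementTree as ET


def _text(elem):
    """Text content of an element with whitespace collapsed."""
    return " ".join("".join(elem.itertext()).split())


def bib_entries(dl_xml):
    """Convert a <dl class="bib"> fragment into BibTeX entry strings."""
    dl = ET.fromstring(dl_xml)
    items = list(dl)
    entries = []
    # Assumes dt/dd pairs in strict alternation.
    for dt, dd in zip(items[0::2], items[1::2]):
        anchor = dt.find("a")
        # The tag comes from dt/@id or from the a/@name inside the dt.
        key = dt.get("id") or (anchor.get("name") if anchor is not None else "")
        fields = []
        for name, path in (
            ("author", "span[@class='author']"),
            ("title", "cite"),
            ("year", "span[@class='year']"),
        ):
            found = dd.find(path)
            if found is not None:
                fields.append("  %s = {%s}" % (name, _text(found)))
        # The BibTeX item type is taken from dt/@class, e.g. "misc".
        entries.append("@%s{%s,\n%s\n}" % (dt.get("class"), key, ",\n".join(fields)))
    return entries


SAMPLE_BIB = """<dl class="bib">
<dt class="misc">[<a name="tetex">tetex</a>]</dt>
<dd>
<span class="author">Thomas Esser</span>
<cite><a
href="http://www.tug.org/tex-archive/help/Catalogue/entries/tetex.html"
>The TeX distribution for Unix/Linux</a></cite>
February <span class="year">2003</span>
</dd>
</dl>"""

if __name__ == "__main__":
    print("\n\n".join(bib_entries(SAMPLE_BIB)))
```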
|
|
|
|
<p><em>Capitalization of titles seems to get mangled. I'm not sure if
|
|
that's a feature of certain bibliography styles or what.</em></p>
|
|
|
|
</div>
|
|
|
|
<div><h3>Bugs/Caveats/Misfeatures</h3>
|
|
|
|
<ul>
|
|
<li>Composed characters and such in the bibliography are
|
|
handled with a sort of kludge, e.g.
|
|
<tt>K&lt;span title='\"o'&gt;ö&lt;/span&gt;bler</tt>
|
|
</li>
|
|
<li>The <tt>samp</tt> element is used to pass LaTeX
|
|
math markup through, e.g.
|
|
<tt>&lt;samp&gt;\Delta&lt;/samp&gt;</tt>
|
|
</li>
|
|
</ul>
|
|
</div>
|
|
|
|
</div>
|
|
|
|
<div><h2>Makefile support</h2>
|
|
|
|
<p>Formatting a LaTeX document is done in several passes. One <a
|
|
href=
|
|
"http://amath.colorado.edu/documentation/LaTeX/basics/steps/help_latex.html"
|
|
>typical manual</a> shows:</p>
|
|
|
|
<pre>
|
|
ucsub> latex MyDoc.tex
|
|
ucsub> bibtex MyDoc
|
|
ucsub> latex MyDoc.tex
|
|
ucsub> latex MyDoc.tex
|
|
</pre>
|
|
|
|
<p>The following excerpt from <tt><a
|
|
href="html2latex.mak">html2latex.mak</a></tt> shows
|
|
some rules to accomplish this using make:</p>
|
|
|
|
<pre>
|
|
.html.tex:
|
|
$(XSLTPROC) --novalid $(HLPARAMS) \
|
|
-o $@ xh2latex.xsl $<
|
|
|
|
.html.bib:
|
|
$(XSLTPROC) --novalid -o $@ xh2bib.xsl $<
|
|
|
|
.tex.aux:
|
|
TEXINPUTS=$(TEXINPUTS) $(PDFLATEX) $<
|
|
|
|
.tex.bbl:
|
|
BSTINPUTS=$(BSTINPUTS) $(BIBTEX) $*
|
|
|
|
|
|
.aux.pdf:
|
|
TEXINPUTS=$(TEXINPUTS) $(PDFLATEX) $*
|
|
TEXINPUTS=$(TEXINPUTS) $(PDFLATEX) $*
|
|
</pre>
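<p>For readers who prefer a script to suffix rules, the same sequence
of passes can be written out as a list of commands. This Python
sketch only builds the command lines in the order shown in the quick
example; the function name <tt>build_commands</tt> is hypothetical,
and the <tt>--stringparam</tt> options and the <tt>TEXINPUTS</tt> and
<tt>BSTINPUTS</tt> settings are left out for brevity.</p>

```python
def build_commands(stem, xsltproc="xsltproc", pdflatex="pdflatex", bibtex="bibtex"):
    """Return the command lines that turn stem.html into stem.pdf.

    Order: XHTML to LaTeX, a first pdflatex pass (writes the .aux
    file), XHTML to BibTeX, bibtex, then two more pdflatex passes to
    resolve citations and references.
    """
    html = stem + ".html"
    return [
        [xsltproc, "--novalid", "-o", stem + ".tex", "xh2latex.xsl", html],
        [pdflatex, stem + ".tex"],
        [xsltproc, "--novalid", "-o", stem + ".bib", "xh2bib.xsl", html],
        [bibtex, stem],
        [pdflatex, stem],
        [pdflatex, stem],
    ]


if __name__ == "__main__":
    # Print the commands; pass each list to subprocess.run to execute them.
    for cmd in build_commands("Overview"):
        print(" ".join(cmd))
```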
|
|
|
|
<p>Sources:</p>
|
|
<ul>
|
|
<li><tt><a href="xh2latex.xsl">xh2latex.xsl</a></tt></li>
|
|
<li><tt><a href="xh2bib.xsl">xh2bib.xsl</a></tt></li>
|
|
<li><tt><a href="article.css">article.css</a></tt></li>
|
|
</ul>
|
|
|
|
</div>
|
|
|
|
<div>
|
|
<h2>References</h2>
|
|
<dl class="bib">
|
|
|
|
<dt class="misc">[<a name="tetex">tetex</a>]</dt>
|
|
<dd>
|
|
<span class="author">Thomas Esser</span>
|
|
<cite><a href="http://www.tug.org/tex-archive/help/Catalogue/entries/tetex.html">The TeX distribution for Unix/Linux</a></cite>
|
|
February <span class="year">2003</span>
|
|
</dd>
|
|
|
|
<dt class="misc">[<a name="Mann94">Mann94</a>]</dt>
|
|
<dd><span class="author">Shannon Mann</span>
|
|
<cite><a href="http://www.csclub.uwaterloo.ca/u/sjbmann/tutorial.html">Beginner's LaTeX Tutorial</a></cite>
|
|
<span class="year">1994</span>-06-16T15:32:27
|
|
</dd>
|
|
|
|
<dt class="misc">[<a name="Spen98">Spen98</a>]</dt>
|
|
|
|
<dd><span class="author">Spencer Rugaber</span>
|
|
<cite>
|
|
<a href="http://www.cc.gatech.edu/classes/RWL/Projects/citation/">The Citation project</a>
|
|
</cite>
|
|
Summer <span class="year">1998</span>.
|
|
</dd>
|
|
|
|
<dt class="misc">[<a name="Gur00" id="Gur00">Gur00</a>]</dt>
|
|
<dd><span class="author">Eitan M. Gurari</span>
|
|
<cite><a href="http://www.cse.ohio-state.edu/~gurari/docs/mml-00/xhm2latex.html">XSLT from XHTML+MathML to LATEX</a></cite>
|
|
<span class="month">July</span> 19, <span class="year">2000</span>
|
|
</dd>
|
|
|
|
<dt class="misc">[<a name="XSLTPROC" id="XSLTPROC">XSLTPROC</a>]</dt>
|
|
<dd><span class="author">Daniel Veillard</span>
|
|
<cite><a href="http://xmlsoft.org/XSLT/xsltproc2.html">The xsltproc tool</a></cite>
|
|
in <a href="http://xmlsoft.org/XSLT/">libxslt: The XSLT C library for Gnome</a>
|
|
1.1.2 <span class="month">Dec</span> 24 <span class="year">2003</span>
|
|
</dd>
|
|
</dl>
|
|
</div>
|
|
|
|
</body>
|
|
</html>
|