<html xmlns="http://www.w3.org/1999/xhtml"> <!--*- nxml -*-->
<head>
<title>Transforming XHTML to LaTeX and BibTeX</title>
<link rel="stylesheet" href="article.css"/>
<link rel="documentclass" title="llncs"/><!-- href? where does that come from? -->
<link rel="bibliographystyle" title="splncs" /> <!-- href? -->
<link rel="usepackage" title="graphicx" /><!-- href? -->
<link rel="usepackage" title="url" href="ftp://cam.ctan.org/tex-archive/macros/latex/contrib/misc/url.sty" />
</head>
<body>
<div class="online"><a href="/">W3C</a></div>
<div class="maketitle">
<h1>Transforming XHTML to LaTeX and BibTeX</h1>
<address><a rel="author" href="http://www.w3.org/People/Connolly/">Dan Connolly</a><br />
<small class="online">$Revision: 1.23 $ of $Date: 2008/04/24 21:28:36 $</small>
</address>
</div>
<div class="abstract"><h4>Abstract</h4>
<p>We transform XHTML to LaTeX and BibTeX to allow technical articles
to be developed using familiar XHTML authoring tools and
techniques.</p>
</div>
<div>
<h2>Introduction</h2>
<p>Occasionally a web page turns the corner from a casually drafted
idea to an article worthy of publication. Computer science conferences
often require submissions using specific LaTeX styles; for example,
the <a
href="http://iswc2004.semanticweb.org/submission/authors_instruction.php">ISWC2004
submission instructions</a> require that submitted papers follow the
Springer format for <a
href="http://www.springeronline.com/sgw/cda/frontpage/0,10735,5-164-2-72376-0,00.html">Lecture
Notes in Computer Science (LNCS)</a>.
<a href="http://www.w3.org/Style/XSL/">XSLT</a> is
a convenient notation to express a transformation from
XHTML to LaTeX.</p>
<p>Tools to transform from LaTeX to HTML are commonplace, but there
are far fewer to go the other way. A little bit of searching yielded
some work<a href="#Gur00">[Gur00]</a> that was designed to undo a
transformation to XHTML. It used an odd XHTML namespace and exhibited
various other quirks specific to reversing that transformation, but it
provided quite a boost up the LaTeX learning curve<a
href="#Mann94">[Mann94]</a>.</p>
<p>That code did not integrate with BibTeX. To take advantage of the
automatic bibliography formatting traditionally provided by LaTeX
styles, we studied the <a
href="http://www.cc.gatech.edu/classes/RWL/Projects/citation/Docs/UserManuals/Reference_Pages/bibtex_doc.html">BibTeX
format</a><a href="#Spen98">[Spen98]</a> for a bit and wrote <tt><a
href="xh2bib.xsl">xh2bib.xsl</a></tt>.</p>
<p>Together with the traditional <tt>pdflatex</tt> and <tt>bibtex</tt>
tools<a href="#tetex">[tetex]</a> and an XSLT processor such as
xsltproc<a href="#XSLTPROC">[XSLTPROC]</a>, this transformation can
turn ordinary web pages with just a bit of special markup into
camera-ready PDF in specialized LaTeX styles.</p>
</div>
<div><h3>A Quick Example</h3>
<p>This article demonstrates the basic features. See:</p>
<ul>
<li><tt><a href="Overview.pdf">Overview.pdf</a></tt></li>
<li><tt><a href="Overview.tex">Overview.tex</a></tt></li>
<li><tt><a href="Overview.bib">Overview.bib</a></tt></li>
</ul>
<p>They are produced a la:</p>
<pre>
$ make Overview.pdf
xsltproc --novalid --stringparam DocClass llncs \
--stringparam Bib Overview --stringparam BibStyle splncs \
--stringparam Status prepub \
-o Overview.tex xh2latex.xsl Overview.html
TEXINPUTS=.:../../../2004/LLCS: pdflatex Overview.tex
This is pdfTeX, Version 3.14159-1.10b (Web2C 7.4.5)
<em>...</em>
Output written on Overview.pdf (3 pages, 62474 bytes).
Transcript written on Overview.log.
xsltproc --novalid -o Overview.bib xh2bib.xsl Overview.html
BSTINPUTS=.:../../../2004/LLCS: bibtex Overview
This is BibTeX, Version 0.99c (Web2C 7.4.5)
The top-level auxiliary file: Overview.aux
The style file: splncs.bst
Database file #1: Overview.bib
TEXINPUTS=.:../../../2004/LLCS: pdflatex Overview
This is pdfTeX, Version 3.14159-1.10b (Web2C 7.4.5)
<em>...</em>
Output written on Overview.pdf (3 pages, 67583 bytes).
Transcript written on Overview.log.
TEXINPUTS=.:../../../2004/LLCS: pdflatex Overview
This is pdfTeX, Version 3.14159-1.10b (Web2C 7.4.5)
<em>...</em>
Output written on Overview.pdf (3 pages, 67167 bytes).
Transcript written on Overview.log.
</pre>
</div>
<div>
<h2>Features</h2>
<p>The transformation <tt><a href="xh2latex.xsl">xh2latex.xsl</a></tt>
works in the obvious way for many idioms:</p>
<ul>
<li>section headings: <tt>h2</tt>, <tt>h3</tt>, <tt>h4</tt></li>
<li>paragraphs: <tt>p</tt></li>
<li>itemized lists: <tt>ul</tt>, <tt>dl</tt></li>
<li>enumerated (numbered) lists: <tt>ol</tt></li>
<li>tables: <tt>table border="1"</tt>, <tt>tr</tt>, <tt>td</tt></li>
<li>verbatim: <tt>pre</tt></li>
<li>phrase markup: <tt>em</tt>, <tt>code</tt>, <tt>tt</tt>,
<tt>i</tt>, <tt>b</tt></li>
</ul>
<p>Table support is limited to tables with <tt>border="1"</tt>
and where all rows have the same number of cells. For example:</p>
<table border="1">
<tr><th>Name</th><th>Address</th><th>Phone</th></tr>
<tr><td>John Doe</td><td>123 High St.</td><td>555-1212</td></tr>
<tr><td>Jane Smith</td><td>456 Low St.</td><td>555-1234</td></tr>
</table>
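<p>The table restriction can be checked before running the
transformation. The following Python sketch is not part of
<tt>xh2latex.xsl</tt>; the helper names <tt>make_table</tt> and
<tt>is_supported_table</tt> are hypothetical, and the sample tables are
built through the ElementTree API without a namespace to keep the
example short.</p>
<pre>
```python
import xml.etree.ElementTree as ET


def make_table(rows, border="1"):
    """Build a table element; the first row uses th cells (illustrative helper)."""
    table = ET.Element("table", {"border": border})
    for index, cells in enumerate(rows):
        tr = ET.SubElement(table, "tr")
        for cell in cells:
            ET.SubElement(tr, "th" if index == 0 else "td").text = cell
    return table


def is_supported_table(table):
    """True when border is "1" and every row has the same number of cells."""
    if table.get("border") != "1":
        return False
    # Count th and td children of each row and collect the distinct widths.
    widths = {len(row.findall("th")) + len(row.findall("td"))
              for row in table.iter("tr")}
    return len(widths) == 1


if __name__ == "__main__":
    example = make_table([
        ["Name", "Address", "Phone"],
        ["John Doe", "123 High St.", "555-1212"],
        ["Jane Smith", "456 Low St.", "555-1234"],
    ])
    print(is_supported_table(example))
```
</pre>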
<p>Specialized markup is required for other idioms. An <a
href="article.css">article.css</a> stylesheet provides
visual feedback for this special markup.</p>
<p>To use a LaTeX package, add a link to the head of your document a la:</p>
<pre>
&lt;link rel="usepackage" title="url"
href="ftp://cam.ctan.org/tex-archive/macros/latex/contrib/misc/url.sty" />
</pre>
<p>The package name is taken from the title attribute. The href attribute is not used in the LaTeX conversion.</p>
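<p>As a rough illustration of that rule (a Python sketch, not the XSLT
in <tt>xh2latex.xsl</tt>), the following collects the <tt>title</tt> of
each <tt>link</tt> with <tt>rel="usepackage"</tt> and emits one
<tt>\usepackage</tt> line per package. The helper names
<tt>sample_head</tt> and <tt>usepackage_lines</tt> are hypothetical, and
the sample head is built through the ElementTree API rather than parsed
from a file.</p>
<pre>
```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"  # Clark-notation prefix for XHTML names


def sample_head():
    """Build a small head element with two usepackage links (illustrative data)."""
    head = ET.Element(XHTML + "head")
    ET.SubElement(head, XHTML + "link", {"rel": "stylesheet", "href": "article.css"})
    ET.SubElement(head, XHTML + "link", {"rel": "usepackage", "title": "graphicx"})
    ET.SubElement(head, XHTML + "link", {
        "rel": "usepackage",
        "title": "url",
        "href": "ftp://cam.ctan.org/tex-archive/macros/latex/contrib/misc/url.sty",
    })
    return head


def usepackage_lines(head):
    """Return one \\usepackage line per usepackage link; href is ignored."""
    lines = []
    for link in head.iter(XHTML + "link"):
        if link.get("rel") == "usepackage" and link.get("title"):
            lines.append("\\usepackage{" + link.get("title") + "}")
    return lines


if __name__ == "__main__":
    print("\n".join(usepackage_lines(sample_head())))
```
</pre>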
<p>We recommend the <a
href="ftp://cam.ctan.org/tex-archive/macros/latex/contrib/misc/url.sty">url.sty</a>
package, per <a
href="http://www.tex.ac.uk/cgi-bin/texfaq2html?label=setURL">a TeX
FAQ</a>. For example: <tt
class="url">http://www.w3.org/People/Connolly/</tt>.</p>
<div><h3>Front Matter</h3>
<p>The following patterns are used to extract the
title page material:</p>
<ul>
<li><tt>div/@class="maketitle"</tt>
<ul>
<li>title: <tt>h1</tt></li>
<li>abstract: <tt>div/@class="abstract"</tt></li>
<li>author: <tt>address/a[@rel="author"]</tt></li>
</ul>
</li>
<li>keywords: <tt>div[@class="keywords"]</tt></li>
<li>terms: <tt>div[@class="terms"]</tt></li>
</ul>
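<p>The patterns above can be exercised with a small Python sketch. It
is only an approximation of what the stylesheet does: the function names
are hypothetical, the abstract is looked up anywhere in the body (as it
is placed in this article), and the sample data is built through the
ElementTree API.</p>
<pre>
```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
NS = {"h": "http://www.w3.org/1999/xhtml"}  # prefix map for ElementTree paths


def sample_body():
    """Build a tiny body mirroring this article's title block (illustrative data)."""
    body = ET.Element(XHTML + "body")
    title_div = ET.SubElement(body, XHTML + "div", {"class": "maketitle"})
    ET.SubElement(title_div, XHTML + "h1").text = "Transforming XHTML to LaTeX and BibTeX"
    address = ET.SubElement(title_div, XHTML + "address")
    author = ET.SubElement(address, XHTML + "a", {
        "rel": "author", "href": "http://www.w3.org/People/Connolly/"})
    author.text = "Dan Connolly"
    abstract = ET.SubElement(body, XHTML + "div", {"class": "abstract"})
    ET.SubElement(abstract, XHTML + "h4").text = "Abstract"
    ET.SubElement(abstract, XHTML + "p").text = "We transform XHTML to LaTeX and BibTeX."
    return body


def text_of(element):
    """Flatten an element's text and collapse runs of whitespace."""
    return " ".join("".join(element.itertext()).split())


def front_matter(body):
    """Collect title, authors and abstract using the patterns listed above."""
    title_div = body.find(".//h:div[@class='maketitle']", NS)
    abstract = body.find(".//h:div[@class='abstract']", NS)
    return {
        "title": text_of(title_div.find("h:h1", NS)),
        "authors": [text_of(a) for a in
                    title_div.findall("h:address/h:a[@rel='author']", NS)],
        # Only the paragraphs are kept; the h4 heading of the abstract is skipped.
        "abstract": " ".join(text_of(p) for p in abstract.findall("h:p", NS)),
    }


if __name__ == "__main__":
    print(front_matter(sample_body()))
```
</pre>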
<p><em>Support for WWW2006-style authors, following
<a href="http://www.acm.org/sigs/pubs/proceed/sigfaq.htm">ACM style</a>,
is in progress.</em></p>
</div>
<div><h3>Cross references and footnotes</h3>
<p>The <tt>a[@rel="ref"]</tt> pattern is transformed to the LaTeX
<tt>\ref{<var>label</var>}</tt> idiom, assuming the reference takes
the form <tt>href="#<var>label</var>"</tt>. <em>@@needs testing</em></p>
<p>The footnote pattern is <tt>*[@class="footnote"]</tt>.</p>
</div>
<div><h3>Figures</h3>
<p>The <tt>div[@class="figure"]</tt> pattern is transformed to a
figure environment; any <tt>div/@id</tt> is used as a figure
label. The file pattern is <tt>object/@data</tt>. <em>Figures are
currently assumed to be PDF; the <tt>object/@height</tt> attribute is
copied over.</em> The caption pattern is <tt>p[@class="caption"]</tt>.
<em>@@need to test this.</em>
Be sure to include the <tt>epsfig</tt> package a la:
</p>
<pre>
&lt;link rel="usepackage" title="epsfig" />
</pre>
</div>
<div><h3>Citations and Bibliography</h3>
<p>An <tt>a</tt> element starting with an open square bracket
<tt>[</tt> is interpreted as a citation reference. The <tt>href</tt>
is assumed to be a local link a la <tt>#<var>tag</var></tt>.</p>
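<p>A minimal Python sketch of how local links might be classified
follows. It covers the <tt>rel="ref"</tt> rule from the cross-reference
section and the square-bracket rule for citations. The use of
<tt>\cite</tt> for citations is an assumption (this article does not
name the command), and <tt>make_anchor</tt> and
<tt>anchor_to_latex</tt> are hypothetical helpers.</p>
<pre>
```python
import xml.etree.ElementTree as ET


def make_anchor(text, href, rel=None):
    """Build an a element without a namespace (illustrative helper)."""
    attrs = {"href": href}
    if rel is not None:
        attrs["rel"] = rel
    anchor = ET.Element("a", attrs)
    anchor.text = text
    return anchor


def anchor_to_latex(anchor):
    """Map a local link to \\ref or \\cite, or return None for ordinary links."""
    href = anchor.get("href", "")
    if not href.startswith("#"):
        return None  # only local links of the form #tag are handled
    target = href[1:]
    if anchor.get("rel") == "ref":
        return "\\ref{" + target + "}"
    text = "".join(anchor.itertext()).lstrip()
    if text.startswith("["):
        # Assumption: a citation becomes \cite{tag}; the article does not say so.
        return "\\cite{" + target + "}"
    return None


if __name__ == "__main__":
    print(anchor_to_latex(make_anchor("[Gur00]", "#Gur00")))
```
</pre>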
<p>The pattern <tt>dl/@class="bib"</tt> is used to find the
bibliography.
Each item is marked up a la:</p>
<pre>
&lt;dt class="misc">[&lt;a name="tetex">tetex&lt;/a>]&lt;/dt>
&lt;dd>
&lt;span class="author">Thomas Esser&lt;/span>
&lt;cite>&lt;a
href="http://www.tug.org/tex-archive/help/Catalogue/entries/tetex.html"
>The TeX distribution for Unix/Linux&lt;/a>&lt;/cite>
February &lt;span class="year">2003&lt;/span>
&lt;/dd>
</pre>
<p>or</p>
<pre>
&lt;dt class="misc" id="tetex">[tetex]&lt;/dt>
...
</pre>
<p>Note the placement of the bibtex item type <tt>misc</tt> and the
tag <tt>tetex</tt> and keep in mind that <tt>bibtex</tt> ignores
works in the bibliography that are not cited from the body.</p>
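<p>To make the mapping concrete, here is a Python sketch that formats
one such item as a BibTeX entry. It is not the logic of
<tt>xh2bib.xsl</tt>: the choice of fields is an assumption (each classed
<tt>span</tt> becomes a field named after its class, and the
<tt>cite</tt> text is taken as the title), and the helper names are
hypothetical.</p>
<pre>
```python
import xml.etree.ElementTree as ET


def sample_item():
    """Build the dt/dd pair for the tetex entry shown above."""
    dt = ET.Element("dt", {"class": "misc"})
    dt.text = "["
    name = ET.SubElement(dt, "a", {"name": "tetex"})
    name.text = "tetex"
    name.tail = "]"
    dd = ET.Element("dd")
    ET.SubElement(dd, "span", {"class": "author"}).text = "Thomas Esser"
    cite = ET.SubElement(dd, "cite")
    link = ET.SubElement(cite, "a", {
        "href": "http://www.tug.org/tex-archive/help/Catalogue/entries/tetex.html"})
    link.text = "The TeX distribution for Unix/Linux"
    cite.tail = " February "
    ET.SubElement(dd, "span", {"class": "year"}).text = "2003"
    return dt, dd


def text_of(element):
    """Flatten an element's text and collapse runs of whitespace."""
    return " ".join("".join(element.itertext()).split())


def bibtex_entry(dt, dd):
    """Format one bibliography item as a BibTeX entry (field choice is assumed)."""
    entry_type = dt.get("class")
    anchor = dt.find("a[@name]")
    # The key comes from a/@name when present, otherwise from dt/@id.
    key = anchor.get("name") if anchor is not None else dt.get("id")
    fields = [(span.get("class"), text_of(span))
              for span in dd.findall("span[@class]")]
    cite = dd.find("cite")
    if cite is not None:
        fields.insert(1, ("title", text_of(cite)))  # assumed: cite holds the title
    body = ",\n".join("  %s = {%s}" % pair for pair in fields)
    return "@%s{%s,\n%s\n}" % (entry_type, key, body)


if __name__ == "__main__":
    print(bibtex_entry(*sample_item()))
```
</pre>
<p>For the <tt>tetex</tt> item the sketch prints:</p>
<pre>
@misc{tetex,
  author = {Thomas Esser},
  title = {The TeX distribution for Unix/Linux},
  year = {2003}
}
</pre>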
<p>The <tt><a href="xh2bib.xsl">xh2bib.xsl</a></tt> transformation
turns this markup into BibTeX format. <tt>xh2latex.xsl</tt> transforms
the entire bibliography <tt>dl</tt> to a <tt>\bibliography{...}</tt>
reference.</p>
<p><em>Capitalization of titles seems to get mangled. I'm not sure if
that's a feature of certain bibliography styles or what.</em></p>
</div>
<div><h3>Bugs/Caveats/Misfeatures</h3>
<ul>
<li>Composed characters and such in the bibliography are
handled with a sort of kludge, e.g.
<tt>K&lt;span title='\"o'>&#246;&lt;/span>bler</tt>
</li>
<li>The <tt>samp</tt> element is used to pass LaTeX
math markup through, e.g.
<tt>&lt;samp>\Delta&lt;/samp></tt>
</li>
</ul>
</div>
</div>
<div><h2>Makefile support</h2>
<p>Formatting a LaTeX document is done in several passes. One <a
href=
"http://amath.colorado.edu/documentation/LaTeX/basics/steps/help_latex.html"
>typical manual</a> shows:</p>
<pre>
ucsub> latex MyDoc.tex
ucsub> bibtex MyDoc
ucsub> latex MyDoc.tex
ucsub> latex MyDoc.tex
</pre>
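<p>Combined with the two XSLT steps, the full sequence for an XHTML
source can be sketched in Python as a list of command lines. The
function name <tt>build_commands</tt> is hypothetical; the options
mirror the <tt>make Overview.pdf</tt> log in the quick example, minus
the <tt>Status</tt> parameter and the <tt>TEXINPUTS</tt> and
<tt>BSTINPUTS</tt> settings. The sketch only builds the commands and
does not run them.</p>
<pre>
```python
def build_commands(stem, doc_class="llncs", bib_style="splncs"):
    """Return the command lines, in order, for one full formatting run."""
    html, tex, bib = stem + ".html", stem + ".tex", stem + ".bib"
    to_tex = ["xsltproc", "--novalid",
              "--stringparam", "DocClass", doc_class,
              "--stringparam", "Bib", stem,
              "--stringparam", "BibStyle", bib_style,
              "-o", tex, "xh2latex.xsl", html]
    to_bib = ["xsltproc", "--novalid", "-o", bib, "xh2bib.xsl", html]
    return [
        to_tex,
        ["pdflatex", tex],   # first pass writes the .aux file with citation keys
        to_bib,
        ["bibtex", stem],    # reads the .aux and .bib files, writes the .bbl file
        ["pdflatex", stem],  # second pass pulls in the formatted bibliography
        ["pdflatex", stem],  # third pass resolves the remaining references
    ]


if __name__ == "__main__":
    for argv in build_commands("Overview"):
        print(" ".join(argv))
```
</pre>
<p>Each entry is an argument list, so it could be handed to
<tt>subprocess.run</tt> as is if the tools are installed.</p>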
<p>The following excerpt from <tt><a
href="html2latex.mak">html2latex.mak</a></tt> shows
some rules to accomplish this using make:</p>
<pre>
.html.tex:
	$(XSLTPROC) --novalid $(HLPARAMS) \
		-o $@ xh2latex.xsl $&lt;
.html.bib:
	$(XSLTPROC) --novalid -o $@ xh2bib.xsl $&lt;
.tex.aux:
	TEXINPUTS=$(TEXINPUTS) $(PDFLATEX) $&lt;
.tex.bbl:
	BSTINPUTS=$(BSTINPUTS) $(BIBTEX) $*
.aux.pdf:
	TEXINPUTS=$(TEXINPUTS) $(PDFLATEX) $*
	TEXINPUTS=$(TEXINPUTS) $(PDFLATEX) $*
</pre>
<p>Sources:</p>
<ul>
<li><tt><a href="xh2latex.xsl">xh2latex.xsl</a></tt></li>
<li><tt><a href="xh2bib.xsl">xh2bib.xsl</a></tt></li>
<li><tt><a href="article.css">article.css</a></tt></li>
</ul>
</div>
<div>
<h2>References</h2>
<dl class="bib">
<dt class="misc">[<a name="tetex">tetex</a>]</dt>
<dd>
<span class="author">Thomas Esser</span>
<cite><a href="http://www.tug.org/tex-archive/help/Catalogue/entries/tetex.html">The TeX distribution for Unix/Linux</a></cite>
February <span class="year">2003</span>
</dd>
<dt class="misc">[<a name="Mann94">Mann94</a>]</dt>
<dd><span class="author">Shannon Mann</span>
<cite><a href="http://www.csclub.uwaterloo.ca/u/sjbmann/tutorial.html">Beginner's LaTeX Tutorial</a></cite>
<span class="year">1994</span>-06-16T15:32:27
</dd>
<dt class="misc">[<a name="Spen98">Spen98</a>]</dt>
<dd><span class="author">Spencer Rugaber</span>
<cite>
<a href="http://www.cc.gatech.edu/classes/RWL/Projects/citation/">The Citation project</a>
</cite>
Summer <span class="year">1998</span>.
</dd>
<dt class="misc">[<a name="Gur00" id="Gur00">Gur00</a>]</dt>
<dd><span class="author">Eitan M. Gurari</span>
<cite><a href="http://www.cse.ohio-state.edu/~gurari/docs/mml-00/xhm2latex.html">XSLT from XHTML+MathML to LATEX</a></cite>
<span class="month">July</span> 19, <span class="year">2000</span>
</dd>
<dt class="misc">[<a name="XSLTPROC" id="XSLTPROC">XSLTPROC</a>]</dt>
<dd><span class="author">Daniel Veillard</span>
<cite><a href="http://xmlsoft.org/XSLT/xsltproc2.html">The xsltproc tool</a></cite>
in <a href="http://xmlsoft.org/XSLT/">libxslt: The XSLT C library for Gnome</a>
1.1.2 <span class="month">Dec</span> 24 <span class="year">2003</span>
</dd>
</dl>
</div>
</body>
</html>