<!doctype html public "-//W3C//DTD HTML 1997-05-18//EN"
"html.dtd">
<HTML>
<HEAD>
<TITLE>XML Hacking is Fun!</TITLE>
</HEAD>
<BODY>
<P>
<A href="../../"><IMG src="../../Icons/WWW/w3c_home" ALT="W3C"></A> |
<A HREF="../../Architecture/">Architecture</A> |
<A HREF="../../MarkUp/SGML/">XML</A>
<H1>
XML Hacking is Fun!
</H1>
<ADDRESS>
<A HREF="../../People/Connolly/">Dan Connolly</A><BR>
Created: Mon May 12 16:06:27 CDT 1997<BR>
$Id: hacking.html,v 1.5 1998/04/29 03:20:20 connolly Exp $
</ADDRESS>
<P>
For me, XML puts the fun back into web hacking. I wrote three XML parsers
last weekend. Great stress relief!
<P>
See also: <A href="../notes.html">some more notes on XML implementation
experience</A>, mostly by Bert Bos.
<HR>
<DL>
<DT>
<A href="xml.py">xml.py</A>
<DD>
<A href="http://www.python.org">Python</A> module for XML.</></>
<DT>
<A href="xml-check.pl">xml-check.pl</A>
<DD>
Quick and dirty XML well-formedness checker in Perl. Got bored with this
and moved on to Python after a bit.</></>
</DL>
<H2>
Converting XML to Lout
</H2>
<DL>
<DT>
<A href="loutwr.py">loutwr</A>
<DD>
lexical details of writing Lout format
<DT>
<A href="xml2lout.py">xml2lout</A>
<DD>
rules/stack-based conversion to Lout
<DT>
<A href="html2lout.py">html2lout</A>
<DD>
add some rules for HTML
<DT>
<A href="report2lout.py">report2lout</A>
<DD>
add some rules for a LaTeX/Lout-like
<A href="../../MarkUp/9705/report.dtd">report DTD</A> on top of HTML
</DL>
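<P>
The rules/stack-based approach can be sketched in Python with the standard xml.sax module. This is not the code in loutwr.py or xml2lout.py. The rule table and the @PP, @B and @I output are assumptions made for the example. So is the set of characters treated as special in Lout: a word containing one of them is wrapped in double quotes, with \ and " escaped inside. Each start tag emits its rule's opening text and pushes the closing text on a stack. Each end tag pops it.

```python
import xml.sax

# Characters assumed to be special in Lout source (braces, @ for symbols,
# # for comments, operators); a word containing any of them is quoted.
LOUT_SPECIAL = set('/|&{}#@^~\\"')


def lout_word(word):
    """Return word as Lout source, double-quoting it if necessary."""
    if not any(c in LOUT_SPECIAL for c in word):
        return word
    # Inside a quoted word, backslash and double quote are escaped.
    return '"' + word.replace("\\", "\\\\").replace('"', '\\"') + '"'


def lout_text(text):
    """Quote each word of text, keeping a single space at either edge."""
    words = [lout_word(w) for w in text.split()]
    out = " ".join(words)
    if text[:1].isspace():
        out = " " + out
    if words and text[-1:].isspace():
        out += " "
    return out


# Hypothetical rules: element name -> (Lout text before, Lout text after).
RULES = {
    "p": ("@PP ", ""),
    "b": ("@B { ", " }"),
    "i": ("@I { ", " }"),
}


class LoutWriter(xml.sax.ContentHandler):
    def __init__(self, rules):
        super().__init__()
        self.rules = rules
        self.closers = []   # stack of "after" strings for open elements
        self.buf = []       # character data not yet written
        self.out = []

    def flush(self):
        self.out.append(lout_text("".join(self.buf)))
        self.buf = []

    def startElement(self, name, attrs):
        self.flush()
        before, after = self.rules.get(name, ("", ""))
        self.out.append(before)
        self.closers.append(after)   # push the text that will close it

    def endElement(self, name):
        self.flush()
        self.out.append(self.closers.pop())

    def characters(self, content):
        self.buf.append(content)     # SAX may deliver text in pieces


def xml_to_lout(doc, rules=RULES):
    handler = LoutWriter(rules)
    xml.sax.parseString(doc.encode("utf-8"), handler)
    return "".join(handler.out)


print(xml_to_lout("<doc><p>Mail me@example.org {now} <b>bold</b> text</p></doc>"))
```

In this sketch, supporting HTML or the report DTD would mean adding entries to RULES.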
<H2>
XML Typing Notes
</H2>
<P>
XML document types should evolve gracefully. Technically, format negotiation
is a solution to deployment of revised data formats, but it did not meet
the market constraints (i.e. it wasn't cost-effective for the involved parties)
in the case of HTML forms, tables and foreign payload (scripts and stylesheets).
<P>
I'm investigating ways to express the MIME multipart/alternative concept
at the element level in XML. This allows new features in XML documents to
be deployed the way color was added to the black-and-white TV signal. It allows the new and the
old semantics to be expressed in the same file, which cuts down the cost
of managing the data (copy, rename, verify, datestamp, inodes, ...) and caching
it.
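<P>
Here is a minimal sketch of element-level alternatives, using Python's xml.etree.ElementTree. The alt element and the resolve function are hypothetical. They are not a proposal from this page. The sketch follows the MIME multipart/alternative convention: the children of alt are assumed to be listed in order of increasing richness, and a client keeps the last one whose element type it understands.

```python
import xml.etree.ElementTree as ET


def resolve(elem, understood):
    """Replace each (hypothetical) <alt> child of elem by one alternative.

    As in MIME multipart/alternative, the alternatives are assumed to be
    listed in order of increasing richness, so the client keeps the LAST
    one whose element type it understands.
    """
    for i, child in enumerate(list(elem)):
        if child.tag == "alt":
            usable = [c for c in child if c.tag in understood]
            if not usable:
                raise ValueError("no alternative is understood")
            chosen = usable[-1]
            chosen.tail = child.tail        # keep text that followed <alt>
            elem.remove(child)
            elem.insert(i, chosen)
            resolve(chosen, understood)
        else:
            resolve(child, understood)
    return elem


DOC = ("<doc><alt>"
       "<p>Sales: 10, 20</p>"
       "<table><tr><td>10</td><td>20</td></tr></table>"
       "</alt></doc>")

old_client = resolve(ET.fromstring(DOC), {"doc", "p"})
new_client = resolve(ET.fromstring(DOC), {"doc", "p", "table", "tr", "td"})
print(ET.tostring(old_client, encoding="unicode"))
print(ET.tostring(new_client, encoding="unicode"))
```

A client that knows only p gets the paragraph. A client that understands tables gets the table. Both come from the same file.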
<P>
My intuition says that we can borrow the inheritance and subtyping ideas
from <A href="../../OOP/">OOP</A> to model a form of type negotiation for
XML.
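<P>
One reading of that intuition: a client that meets an element type it does not know treats it as an instance of the nearest supertype it does know. Code written against a superclass accepts a subclass in the same way. This is a minimal sketch that assumes each new type declares exactly one supertype. The type names and the SUPERTYPES table are hypothetical.

```python
# Hypothetical declarations: each new element type names the older type
# it refines, the way a subclass names its superclass.
SUPERTYPES = {
    "sortable-table": "table",
    "table": "pre",
    "pre": "p",
}


def negotiate(tag, understood, supertypes=SUPERTYPES):
    """Return the most specific type on tag's supertype chain that the
    client understands, or None if it understands none of them."""
    seen = set()
    while tag is not None and tag not in seen:   # seen guards against cycles
        if tag in understood:
            return tag
        seen.add(tag)
        tag = supertypes.get(tag)
    return None


print(negotiate("sortable-table", {"p", "pre", "table"}))
print(negotiate("sortable-table", {"p"}))
```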
<P>
<DL>
<DT>
Akpotsui, Extase K. A; Quint, Vincent; Roisin, Cécile.
<A HREF="ftp://ftp.inrialpes.fr/pub/opera/publications/MCM97.ps.gz"><CITE>Type
Modelling for Document Transformation in Structured Editing
Systems</CITE></A>. Mathematical and Computer Modelling 25/4 (February 1997)
1-19 (with 26 references). Authors' affiliation: INRIA/Project Opéra.
<DD>
Abstract:
<BLOCKQUOTE>
This paper addresses the problem of type transformation in structured editing
systems and proposes a type description model convenient for type comparison
and document conversion. Two kinds of transformations are considered: dynamic
transformations allow a structured editor to change the structure of a part
of a document when the part is copied or moved, and static transformations
allow specific tools to restructure documents when their generic structure
is modified. We present in this paper the current state of our research on
formal analysis for these transformations.
</BLOCKQUOTE>
</DL>
<P>
Cut/paste issues. Shows that DTDs are not just regexps: & and ? are novel.
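<P>
The point about & can be made concrete. An SGML and-group such as (a & b & c) requires every member exactly once, in any order. It can be rewritten as an ordinary regular expression only by listing every permutation, so the expression grows factorially with the number of members. The Python sketch below encodes the children as a string of &lt;name&gt; tokens, an encoding chosen just for this example.

```python
import itertools
import re


def and_group_regex(names):
    """Rewrite an SGML and-group (a & b & c), meaning every member exactly
    once in any order, as an ordinary regex over <name> tokens."""
    orders = itertools.permutations(names)
    alternatives = ["".join(f"<{re.escape(n)}>" for n in order) for order in orders]
    return "(?:" + "|".join(alternatives) + ")"


pattern = and_group_regex(["title", "author", "date"])
print(bool(re.fullmatch(pattern, "<date><title><author>")))   # True: any order
print(bool(re.fullmatch(pattern, "<title><date>")))           # False: author missing
print(pattern.count("|") + 1)                                 # 6 alternatives = 3!
```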
<P>
Also shows that separating element names from element types is essential
for some kinds of modelling. I suspect DTDs should be extended to allow
this (well... replaced with something that expresses this). For example,
allow XPTR-style selectors rather than just namegroups in element declarations:
<PRE>
<!element (parent1 child) ANY>
<!element (parent2 child) (x|y|z)>
</PRE>
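<P>
Here is a sketch of how a checker might pick the declaration by context. It assumes declarations can be keyed either by element name or by a (parent, element) pair, and that the pair wins when both exist. The table, the function names, and the encoding of children as &lt;name&gt; tokens are hypothetical. Undeclared elements count as ANY to keep the sketch short.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical declaration table. A key is an element name or a
# (parent, element) pair; values are regexes over the children written
# as <name> tokens, and None stands for ANY.
DECLS = {
    ("parent1", "child"): None,                  # <!element (parent1 child) ANY>
    ("parent2", "child"): r"(?:<x>|<y>|<z>)",    # <!element (parent2 child) (x|y|z)>
}


def content_model(parent, name, decls=DECLS):
    """The declaration selected by context wins over one selected by name.
    Undeclared elements are treated as ANY."""
    if (parent, name) in decls:
        return decls[(parent, name)]
    return decls.get(name)


def invalid_elements(elem, parent=None, decls=DECLS):
    """Return (parent, name) for every element whose children do not match."""
    errors = []
    model = content_model(parent, elem.tag, decls)
    children = "".join(f"<{c.tag}>" for c in elem)
    if model is not None and not re.fullmatch(model, children):
        errors.append((parent, elem.tag))
    for c in elem:
        errors.extend(invalid_elements(c, elem.tag, decls))
    return errors


SAMPLE = ("<root><parent1><child><w/><w/></child></parent1>"
          "<parent2><child><y/></child><child><x/><y/></child></parent2></root>")
print(invalid_elements(ET.fromstring(SAMPLE)))
```

The same name, child, gets ANY under parent1 and (x|y|z) under parent2. Only the second child under parent2 is reported.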
<P>
@@don't use class, just make up new elements and use containment!
<H2>
XML Modules
</H2>
<P>
About namespaces in DTDs... how about:
<PRE>
<![ module-name [
<!entity module-name "IGNORE">
... module contents ...
]]>
</PRE>
<P>
which is just like:
<PRE>
#ifndef _module_h
#define _module_h
... module contents ...
#endif /* _module_h */
</PRE>
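<P>
The include-guard behaviour can be simulated with a small preprocessor for this syntax. The preprocessor assumes what the proposal implies rather than what SGML'86 specifies. An undeclared module name counts as INCLUDE. A module switches itself off by declaring its own name IGNORE. The first declaration of a name wins. A regular expression is enough here only because marked sections cannot contain ]]>. The function and pattern names are illustrative.

```python
import re

# The proposed module syntax:  <![ name [ ... ]]>
# Marked sections cannot contain ]]>, so the first ]]> ends the module
# (nested marked sections are therefore not handled).
MODULE = re.compile(r"<!\[\s*([\w.-]+)\s*\[(.*?)\]\]>", re.DOTALL)
# An entity declaration whose value is INCLUDE or IGNORE.
SWITCH = re.compile(r'<!entity\s+([\w.-]+)\s+"(INCLUDE|IGNORE)"\s*>',
                    re.IGNORECASE)


def expand_modules(text, entities=None):
    """Keep each module unless its name is already declared IGNORE.

    Assumption: an undeclared name counts as INCLUDE, which is what makes
    the idiom behave like #ifndef/#define.
    """
    entities = {} if entities is None else entities

    def expand(match):
        name, body = match.group(1), match.group(2)
        if entities.get(name, "INCLUDE") == "IGNORE":
            return ""                                  # already seen: drop it
        for ent, value in SWITCH.findall(body):
            entities.setdefault(ent, value.upper())    # first declaration wins
        return body

    return MODULE.sub(expand, text)


LISTS = '<![ lists [\n<!entity lists "IGNORE">\n<!element ul - - (li)+>\n]]>'
print(expand_modules(LISTS + "\n" + LISTS))   # the ul declaration appears once
```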
<P>
I made a <A href="fix-sgml.el">patch to psgml mode</A> to allow me to use
this syntax.
<P>
You still have to have a partial order on your modules. And it's still just
one big namespace. So it's just like C -- which is good enough for lots of
things, but not for truly independent development.
<H2>
Marked Sections, and Here Documents, and Archives
</H2>
<P>
Is an unescaped > allowed in XML content? (9711 spec says yes.)
<P>
HTML 2.0 spec discouraged it in order to avoid ]]> showing up in documents,
which is an error in SGML'86.
<P>
XML of 9711 has the same misfeature, but it's marked "for compatibility".
<P>
Marked sections can't contain ]]>.
<P>
What's the purpose of a marked section, anyway? If it's just to be able to
put XML inside XML without lots of tedious escaping, then the above limitation
isn't a showstopper.
<P>
But it seems to me that the purpose is to be able to include foreign data
like SCRIPT and STYLE, in which case this limitation is really painful.
<P>
Based on shell/Perl HERE documents and MIME multipart syntax, I suggest the
following:
<PRE>
<![myStringHere[ ... ]myStringHere]>
</PRE>
<P>
which allows ... to contain ANY sequence of characters except the closing
]myStringHere]>. The author picks that string, just like a MIME boundary, so
in practice this is no restriction. Any sequence of bytes,
actually! This solves the script/style problem, plus gives XML the potential
to replace tar, zip, etc. in the same way that HERE documents facilitate
shar archives. (But Just Say No to Turing-complete archive formats.)
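<P>
Here is a sketch of a writer and reader for the proposed syntax, assuming the delimiter must start with a letter. The writer picks a delimiter whose terminator does not occur in the data, the way a MIME multipart boundary is chosen. The reader scans for that exact terminator, so the content may contain ]]> freely. The function names are hypothetical.

```python
import re
import secrets

OPEN = re.compile(r"<!\[([A-Za-z][\w.-]*)\[")


def here_section(data, tag=None):
    """Wrap data as <![tag[data]tag]>. If no tag is given, or the given tag's
    terminator occurs in data, pick a random one that does not (the way a
    MIME multipart boundary is chosen)."""
    while tag is None or f"]{tag}]>" in data:
        tag = "here" + secrets.token_hex(4)
    return f"<![{tag}[{data}]{tag}]>"


def read_here_section(text, start=0):
    """Parse one here section at text[start:]; return (data, end index)."""
    m = OPEN.match(text, start)
    if m is None:
        raise ValueError("not a here section")
    terminator = f"]{m.group(1)}]>"
    end = text.find(terminator, m.end())
    if end < 0:
        raise ValueError("missing " + terminator)
    return text[m.end():end], end + len(terminator)


script = 'if (a[i]]>b) s = "]]>";'      # contains ]]> twice
wrapped = here_section(script, "myStringHere")
print(wrapped)
print(read_here_section(wrapped)[0] == script)
```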
<H2>
Empty End Tags
</H2>
<P>
I've implemented support for:
<PRE>
<foo> ... </>
</PRE>
<P>
The implementation cost is trivial. The deployment cost is the risk that
folks will expect legacy HTML elements to work this way:
<PRE>
<blockquote> ... </>
</PRE>
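<P>
The implementation cost really is small. In a stack-based well-formedness checker, an end tag with no name simply closes whatever element is on top of the stack. The rough Python sketch below shows this; it is not taken from xml-check.pl or xml.py. Its tag regex is deliberately quick and dirty. It does not handle attribute values containing &lt; or &gt;, comments, processing instructions, or marked sections.

```python
import re

# Quick and dirty: matches <name ...>, </name>, </> and <name .../>.
TAG = re.compile(r"<(/?)([A-Za-z][\w.:-]*)?([^<>]*)>")


def check_tags(text):
    """Check that tags nest properly, accepting </> as an empty end tag
    that closes whatever element is currently open."""
    stack = []
    for m in TAG.finditer(text):
        slash, name, rest = m.groups()
        if slash:                                  # </name> or </>
            if not stack:
                raise ValueError(f"end tag with nothing open at {m.start()}")
            expected = stack.pop()
            if name is not None and name != expected:
                raise ValueError(f"</{name}> closes <{expected}> at {m.start()}")
        elif name is None:
            raise ValueError(f"malformed tag at {m.start()}")
        elif not rest.endswith("/"):               # <name/> opens nothing
            stack.append(name)
    if stack:
        raise ValueError(f"unclosed elements: {stack}")
    return True


print(check_tags("<foo>text <b>bold</> more</>"))
```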
<H2>
Attribute Value Syntax
</H2>
<P>
???
<H2>
Character Entities
</H2>
<P>
Bad idea. General entities are very powerful, and all we need is a way to
escape three characters (maybe two).
<P>
Other characters should be done with "replaced elements" with fallback inside,
e.g.:
<PRE>
<emdash>---</>
</PRE>
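<P>
Here is a sketch of both halves in Python. It assumes the three characters are &amp;, &lt; and &gt;, with &gt; the optional one (see the discussion of unescaped &gt; above). The GLYPHS table is hypothetical. The example uses a named end tag because Python's XML parser does not accept &lt;/&gt;.

```python
import xml.etree.ElementTree as ET


def escape(text, escape_gt=True):
    """Escape the markup characters: & first (so it is not escaped twice),
    then <, then optionally >."""
    text = text.replace("&", "&amp;").replace("<", "&lt;")
    return text.replace(">", "&gt;") if escape_gt else text


# Hypothetical table of replaced elements this client can render.
GLYPHS = {"emdash": "\u2014"}


def render_text(elem, glyphs=GLYPHS):
    """Flatten elem to text: a known replaced element becomes its glyph,
    an unknown one falls back to its own content (e.g. '---')."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag in glyphs:
            parts.append(glyphs[child.tag])
        else:
            parts.append(render_text(child, glyphs))
        parts.append(child.tail or "")
    return "".join(parts)


para = ET.fromstring("<p>XML<emdash>---</emdash>fun</p>")
print(render_text(para))               # client that knows emdash
print(render_text(para, glyphs={}))    # downlevel client: the fallback
```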
<P>
Going to Unicode is probably cost-effective in the long term, but the documents
don't degrade gracefully.
<H2>
Convenience Entities: macros and includes
</H2>
<P>
These are obviated by linking. The idiom:
<PRE>
<!doctype html public "-//IETF//DTD HTML//EN" [
<!entity product-name "Gee Whiz&tm;">
<!entity legal system "legal.html">
]>
... &product-name;
...
&legal;
</PRE>
<P>
can be done à la:
<PRE>
<!doctype html system "http://www.w3.org/9705/html.dtd">
<div style="display: none">
<span id=product-name>Gee Whiz&tm;</span>
</div>

... <a href="#product-name" xml-link=replace>Gee Whiz&tm;</>
<a href="legal.html" xml-link=replace>Copyright (c) 1997 by US</a>
</PRE>
<P>
The a's could be left empty. But for the benefit of downlevel clients, you
can (by machine) propagate the destination of the link (or a part of it)
to the source.
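<P>
For same-document links, the machine propagation step might look like this. The sketch uses Python's xml.etree.ElementTree and assumes:
<UL>
<LI>the document is well-formed;
<LI>#name hrefs refer to id attributes;
<LI>xml-link="replace" marks the links to fill.
</UL>
<P>
The function name is hypothetical. Links to other resources, such as legal.html, would need a fetch step, which is left out.

```python
import copy
import xml.etree.ElementTree as ET


def propagate_replace_links(root):
    """Fill each <a href="#id" xml-link="replace"> with a copy of the
    content of the element whose id it names, for clients that ignore
    the link. Links to other resources would need a fetch; skipped here."""
    targets = {e.get("id"): e for e in root.iter() if e.get("id") is not None}
    for a in list(root.iter("a")):              # snapshot: we edit as we go
        href = a.get("href", "")
        if a.get("xml-link") != "replace" or not href.startswith("#"):
            continue
        target = targets.get(href[1:])
        if target is None:
            continue
        a.text = target.text
        a[:] = [copy.deepcopy(child) for child in target]
    return root


SAMPLE = ('<html><div style="display: none">'
          '<span id="product-name">Gee <b>Whiz</b></span></div>'
          '<p>Buy <a href="#product-name" xml-link="replace"/> now. '
          '<a href="legal.html" xml-link="replace"/></p></html>')
root = propagate_replace_links(ET.fromstring(SAMPLE))
print(ET.tostring(root.find("p"), encoding="unicode"))
```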
<H2>
Parameter Entities
</H2>
<DL>
<DT>
.cm
<DD>
content model. Fully parenthesized. Can be used anywhere a GI can be used.
<DT>
.orList
<DD>
union expression. orLists can be concatenated. @@hmmm... namegroup?
<DT>
.valType
<DD>
attribute value type, e.g. CDATA with overloaded semantics
<DT>
.tagType
<DD>
list of attribute declarations, à la a list of methods, i.e. an object type
<DT>
.dtd
<DD>
link to another entity in DTD syntax
</DL>
<H2>
DT and DD
</H2>
<P>
I want DT/DD to be able to format à la:
<PRE>
term definition
definition def d
efiintion
</PRE>
<P>
so I changed the content models of dt and dd so that dd is contained within
dt.
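<P>
With dd inside dt, a formatter can treat the pair as one run of text and wrap it as a single paragraph, which gives the layout above. Here is a minimal Python sketch using the standard textwrap module. The function name and the 20-column width are arbitrary choices.

```python
import textwrap
import xml.etree.ElementTree as ET


def format_dt(dt, width=20):
    """Lay out a <dt> that contains its <dd> as one run-in paragraph:
    the definition starts on the term's line and wraps to the margin."""
    text = " ".join("".join(dt.itertext()).split())   # collapse whitespace
    return textwrap.fill(text, width=width)


entry = ET.fromstring("<dt>term <dd>definition definition definition</dd></dt>")
print(format_dt(entry))
```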
<H2>
Testing Notes
</H2>
<P>
@@link to MIX.
<PRE>
ok3: uses internal declaration subset. Boo.
note that this is a perfect example of how
entities are redundant with respect to linking

ok3a: @@ WF client should check for data outside root element

torture:
whacked internal declaration out
removed references to other entities

#@@ is an unescaped > allowed in xml? what about ]]>?
is ]]> a reportable error? well-formedness error? validity error?

This doesn't match:
<p>PI with markup: <?Myparser &lt;p> or <p> --

which?></p>
</PRE>
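<P>
Two of these cases can be tried against an existing parser. The sketch below uses Python's xml.dom.minidom, which is built on expat. It stands in for a test harness and is not part of MIX. Data after the root element is rejected. A PI's data is passed through unparsed, so &amp;lt; stays as written and &lt;p&gt; is not a tag. The ? in which?&gt; is part of the ?&gt; terminator.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError


def well_formed(text):
    """True if text parses as a well-formed XML document."""
    try:
        xml.dom.minidom.parseString(text)
    except ExpatError:
        return False
    return True


# ok3a: character data after the root element is not well-formed.
print(well_formed("<doc/>"), well_formed("<doc/>stray text"))

# PI case: the PI's data is not parsed as markup.
doc = xml.dom.minidom.parseString(
    "<p>PI with markup: <?Myparser &lt;p> or <p> -- which?></p>")
pi = doc.documentElement.childNodes[1]
print(pi.target, repr(pi.data))
```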
</BODY></HTML>