<HTML>
<HEAD>
<TITLE>Some comments on SGML syntax</TITLE>
<NEXTID N="z1">
</HEAD>
<BODY>
<H1>Reform of SGML</H1>It is true that SGML was designed from the
standpoint of markup, that is, annotations on text as to how it
should be formatted, rather than as a language. Here is my $.02
worth. I don't for a moment imagine that ISO would really clean it
up at this stage (March 93). We consider an incremental cleaning up
of just a few points of SGML syntax.
<H2>Clean up those brackets</H2>The problems of interpreting the
space between two tags would be removed if one had a delimiter (say
a semicolon) which meant "end of tag, begin new tag". For some
reason an empty piece of text is used for this in SGML! This is like
using a null string, or often a newline string, as a statement
separator.<P>
Suppose one could then write
<PRE>    <TAG1 ATTR ATTR2 ;
    TAG2 ATTRSF SDF SDF>
</PRE>instead of
<PRE>    <TAG1 ATTR ATTR2>
    <TAG2 ATTRSF SDF SDF>
</PRE>Try this with your average DTD and see how clean it looks! The
result looks like (what it should be) a computing language with text
as parameters.
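<P>
As a rough illustration, the rewrite above can be done mechanically. The Python sketch below is only that: the function name join_adjacent_tags is made up for this example, and it assumes that two tags are separated by nothing but white space.

```python
import re

def join_adjacent_tags(source):
    """Replace each '>' + white space + '<' between two tags with ' ;'.

    Assumption: only white space lies between the two tags. Where real
    text sits between them, the brackets are left alone.
    """
    return re.sub(r">(\s*)<", r" ;\1", source)

before = "<TAG1 ATTR ATTR2>\n<TAG2 ATTRSF SDF SDF>"
print(join_adjacent_tags(before))
```

Run on the two-tag example, this should print the semicolon form shown first, with the line break between the tags preserved.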
<H2>Free format</H2>Suppose white space were allowed between the <
and the first character. This is unthinkable to the markup-minded
person who wants a < by itself to be an error, but it looks SO much
nicer to a language-minded person:
<PRE>    <SECTION LEVEL=2>
    <STITLE ID=ABC>What Next?
    </STITLE>
    <IDX>
    <FIG X=7 y=67 CAP="The solution">
    Hello
    </FIG>
    </IDX>
</PRE>would come out like
<PRE>    SECTION LEVEL=2;
    STITLE ID=ABC >What Next?<
    /STITLE;
    IDX;
    FIG X=7 y=67 CAP="The solution"
    >Hello<
    /FIG;
    /IDX;
</PRE>It makes so much more sense to quote the text instead of the
markup when there is much more markup than text. This way it can
look like language with embedded text or text with embedded markup,
depending on which is predominant.
<H2>Unifying the quoting</H2>Now, the astute would realize that the
double quotes in the attribute value CAP="The solution" are playing
basically the same role as the angle brackets which are left around
text, and would suggest that the two be made equivalent.
<PRE>    SECTION LEVEL=2;
    STITLE ID=ABC >What Next?<
    /STITLE;
    IDX;
    FIG X=7 y=67 CAP=>The solution<
    >Hello<
    /FIG;
    /IDX;
</PRE>Now we have only one form of quoting, and we can easily
distinguish between markup and text because text is inside the
quotes and markup is outside them.
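<P>
To show how little machinery the unified quoting needs, here is a minimal Python sketch of a tokenizer for it. It is an illustration under assumptions, not a real SGML tool: the name tokenize is invented here, text is taken to be whatever lies between a > and the next <, and a semicolon is taken to end one tag and begin the next.

```python
import re

def tokenize(source):
    """Split the proposed syntax into ("markup", ...) and ("text", ...) pairs."""
    tokens = []
    # Splitting on >...< with a capturing group leaves the quoted text
    # at the odd positions and the markup at the even positions.
    parts = re.split(r">(.*?)<", source, flags=re.DOTALL)
    for index, part in enumerate(parts):
        if index % 2 == 1:
            tokens.append(("text", part))
            continue
        for tag in part.split(";"):
            tag = " ".join(tag.split())  # collapse white space
            if tag:
                tokens.append(("markup", tag))
    return tokens

example = """
SECTION LEVEL=2;
STITLE ID=ABC >What Next?<
/STITLE;
IDX;
FIG X=7 y=67 CAP=>The solution<
>Hello<
/FIG;
/IDX;
"""

for kind, value in tokenize(example):
    print(kind, repr(value))
```

A markup token that ends in = (here FIG X=7 y=67 CAP=) is followed by a text token which is its attribute value. Any other text token is element content.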
<H2>Mark the structure</H2>An independent point is a fundamental bug
in the language design: it is impossible to tell which elements are
empty without the DTD. In other words, the structure is not apparent
from the syntax. For a "structured markup language", that's pretty
bad.<P>
If you run Dynatext, for example, all you have to do is tell it which
elements are empty and it can do a good job without any DTD. It
should really be possible to see the structure at a low level. So I
would suggest some kind of opening symbol which is mandatory on all
element opening tags, maybe a trailing / for symmetry with the
leading / of the closing tag. For example:
<PRE>    SECTION/ LEVEL=2;
    STITLE/ ID=ABC >What Next?<
    /STITLE;
    IDX/;
    FIG/ X=7 y=67 CAP=>The solution<
    >Hello<
    /FIG;
    /IDX;
</PRE>Now I can parse that and see that I am missing a section end.<P>
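The check can be sketched in a few lines of Python, with no DTD involved. This is only an illustration: the name unclosed_elements is invented here, and the sketch assumes that a name with a trailing / opens an element, a name with a leading / closes one, and a name with neither is an empty element.

```python
import re

def unclosed_elements(source):
    """Return the elements opened with NAME/ that are never closed by /NAME."""
    # An attribute value quoted as text (CAP=>...<) belongs to its tag.
    markup = re.sub(r"=\s*>.*?<", "=VALUE", source, flags=re.DOTALL)
    # Any other quoted text separates two tags, just as ';' does.
    markup = re.sub(r">.*?<", ";", markup, flags=re.DOTALL)
    open_elements = []
    for tag in markup.split(";"):
        words = tag.split()
        if not words:
            continue
        name = words[0]
        if name.startswith("/"):
            if not open_elements or open_elements[-1] != name[1:]:
                raise ValueError("unexpected closing tag " + name)
            open_elements.pop()
        elif name.endswith("/"):
            open_elements.append(name[:-1])
        # A name with no slash at all is an empty element: nothing to track.
    return open_elements

example = """
SECTION/ LEVEL=2;
STITLE/ ID=ABC >What Next?<
/STITLE;
IDX/;
FIG/ X=7 y=67 CAP=>The solution<
>Hello<
/FIG;
/IDX;
"""

print(unclosed_elements(example))  # prints ['SECTION']
```

The one element left on the stack is the section which was never ended.<P>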
Of course real language people might want to use a different concrete
syntax:
<PRE>    { section(level=2)
    { stitle(id=abc)
    "What Next?"
    } stitle
    { idx
    { fig (x=7, y=67, cap="The solution")
    "Hello"
    } fig
    } idx
</PRE>but we wouldn't like SGML not to look like SGML, would we? :-)
<PRE>
</PRE>
<ADDRESS><A
NAME="0" HREF="http://www.w3.org./hypertext/TBL_Disclaimer.html">Tim BL</A>
</ADDRESS></BODY>
</HTML>