From: jfg@dxcern.cern.ch (Jean Francois Groff)
|
|
To: www@nxoc01.cern.ch, wei@xcf.berkeley.edu, connolly@pixel.convex.com
|
|
Subject: HTML is not HTML (Was: Update Queries)
|
|
Date: Fri, 5 Jun 92 04:16:54 +0200
|
|
|
|
Tim,
|
|
|
|
Here are my latest thoughts on the SGML/HTML/HTML2 issue. I thought
|
|
Dan and Pei might want to comment as well...
|
|
|
|
-- begin quoted message from timbl -----------------------------------
|
|
|
|
> It's important that this time we make the SGML proper SGML.
|
|
|
|
Absolutely.
|
|
|
|
> The only way to include other formats is to use a NOTATION= attribute.
|
|
|
|
Precisely. And following a discussion with the local expert at CERN,
|
|
it appears that this embedding mechanism is not powerful enough for
|
|
our purposes. Our ponderings on how to make the SGML parser ignore the
|
|
arbitrary junk between <BODY> and </BODY> are pointless: this content
|
|
has to abide by the rules set forth in the SGML declaration, notably
|
|
be composed of SGML characters. As Dan said in his latest message,
|
|
|
|
Dan> An SGML document consists of 3 parts: the declaration, the
|
|
Dan> prologue, and the instance. The declaration lays the groundwork
|
|
Dan> -- defines the encoding and interpretation of the character
|
|
Dan> set(s), sets processing limits and bounds, and other lexical
|
|
Dan> stuff. Applications generally use the default SGML declaration
|
|
Dan> given in the standard. Each SGML parser has a declaration that
|
|
Dan> declares its feature list and limits. If HTML cannot be described
|
|
Dan> with the default SGML declaration, this will severely limit the
|
|
Dan> usable parsers.
|
|
|
|
I definitely believe we should stick to the default SGML declaration.
|
|
We don't want to reinvent a lexical level. Besides, it would probably
|
|
be impossible to do so in full genericity, since we don't even know in
|
|
advance which weird formats we'll want to handle in the future.
|
|
|
|
There's a kludge to embed arbitrary things in SGML, using NDATA, but
|
|
this always involves referring to an external file, so it's not real
|
|
encapsulation as we would need (Dan, if you have a counter-example,
|
|
tell us about it !) So we can abandon the idea of encapsulating the
|
|
returned data format in an <HTDOC>. More on this later.
|
|
|
|
Back to Tim's message:
|
|
> Also, an SGML document must be one element. You can't start in the
> middle like we do. Our parsers assume <HTDOC> but a generic SGML
> parser won't and will assume that the <TITLE> for example is the whole
> document.
|
|
|
|
Agreed, except for the semantics of <HTDOC> ; see below.
|
|
|
|
[ Discussion of TOEOF and byte count ideas deleted ]
|
|
|
|
> You say you think HTDOC, HTERR, HTFWD should not be part of the HTML
|
|
> but be a separate language. What language? Another arbitrary one?
|
|
> Something binary? Why not use SGML for that too? (Would you prefer
|
|
> ASN/1 representation?)
|
|
|
|
We must be careful not to mix levels here. We want to use SGML
|
|
markup at two levels: describing hypertext, and describing possible
|
|
replies from an HTTP server. These are currently mixed, for historical
|
|
implementation reasons, into what we call HTML. IMnsHO, the term HTML
|
|
should be reserved to describe its expansion: HyperText Markup
|
|
Language, considered as a data format. Therefore, I'm in favor of
|
|
clearly separating the `protocol' part. Remember that we can retrieve
|
|
HTML data from other sources than HTTP servers, and conversely a .html
|
|
file containing <HTERR> would be nonsense...
|
|
|
|
Moreover, this fits our basic designs better ; remember the Browser
|
|
Architecture graph ? The format manager was intended to be a separate
|
|
object taking data from the various supported transfer protocols
|
|
(HTTP, File, News, ...), and passing it to the appropriate parser,
|
|
which can be HTML, LaTeX, ASCII, PostScript, WordPerfect, whatever...
|
|
Either it's a directly supported format and the appropriate parser
|
|
builds an HText for it through the HText object interface, or we fork
|
|
(or message) another application to deal with the data (like piloting
|
|
synths :-) or viewing PAW graphs :-/ let's not forget that Rene pays!).
|
|
|
|
Of course, both parts must be correct SGML now, i.e. we have two
|
|
document types (hence 2 formal DTDs), which I suggest calling HTML
|
|
and HTTP_REPLY.
|
|
|
|
The HTML DTD essentially comprises the current hypertext markup,
|
|
with all necessary amendments (quoting, minimization, etc.), and the
|
|
instance is surrounded by its document type identifier (syntax: is
|
|
<HTML> ... </HTML> OK ???). Thus we don't "start in the middle".
|
|
|
|
An HTTP_REPLY instance can be one of the suggested <HTTP_DOC>,
|
|
<HTTP_ERR> or <HTTP_FWD>, surrounded by <HTTP_REPLY>...</HTTP_REPLY>.
|
|
In the case of HTTP_DOC, the client should expect to receive the data
|
|
immediately after </HTTP_REPLY>, and pass it along to a parser or an
|
|
external application depending on the format(s) specified by HTTP_DOC
|
|
attributes, and on its local format-to-application mapping tables.
|
|
EOF indicates the end of the data (logical, ain't it?). <HTTP_ERR> can
|
|
have the suggested attributes, and can be followed by some explanatory
|
|
text which will be displayed according to the client's user-interface
|
|
natural style (e.g. an alert panel), and then </HTTP_ERR></HTTP_REPLY>.
|
|
Given <HTTP_FWD>, the client should immediately* fetch the UDI found in
|
|
the attributes. Some explanation can be displayed as well, perhaps
|
|
depending on a user-settable verbosity level.
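The three reply kinds described above could be handled along the following lines. This is a minimal Python sketch, not code from the W3 library: the function name handle_reply, the returned action labels, and the attribute name UDI on HTTP_FWD are assumptions made for illustration (the message only says that the UDI is found in the attributes).

```python
def handle_reply(kind, attrs, text=""):
    """Map one child of <HTTP_REPLY> to the action a client would take.

    `kind` is the element name, `attrs` its attributes as a dict, and
    `text` any explanatory text found inside the element.
    """
    if kind == "HTTP_DOC":
        # The data itself follows </HTTP_REPLY>; NOTATION names its format.
        return ("read-data-until-eof", attrs.get("NOTATION"))
    if kind == "HTTP_ERR":
        # Show the explanation in the client's own style (e.g. an alert panel).
        return ("show-error", text)
    if kind == "HTTP_FWD":
        # Fetch the forwarding address once the whole reply has been read.
        # "UDI" is a hypothetical attribute name.
        return ("fetch", attrs.get("UDI"))
    raise ValueError(f"unknown reply element: {kind}")


print(handle_reply("HTTP_DOC", {"NOTATION": "PLAINTEXT"}))
print(handle_reply("HTTP_ERR", {}, "No such document"))
```

Returning a small action tuple keeps the protocol level separate from the format manager, which is the separation argued for in this message.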
|
|
|
|
*immediately: I mean immediately after having read the whole HTTP_REPLY
|
|
|
|
For backward compatibility with level-1 servers, level-2 clients
|
|
should treat a leading <PLAINTEXT> as:
|
|
|
|
<HTTP_REPLY>
|
|
<HTTP_DOC NOTATION="PLAINTEXT"> </HTTP_DOC>
|
|
</HTTP_REPLY>
|
|
|
|
(I don't know whether such a substitution would be heretic to the
|
|
Holy SGML Bible -- i.e. can we formalize it ? I wouldn't bother.)
|
|
|
|
And finally, if the received data begins with neither <HTTP_REPLY>
|
|
nor <PLAINTEXT>, then it can only be HTML-1, and we can either use the
|
|
old heretic parser, or send the user to Purgatory... Just prepend:
|
|
|
|
<HTTP_REPLY>
|
|
<HTTP_DOC NOTATION="HTML-1"> </HTTP_DOC>
|
|
</HTTP_REPLY>
|
|
|
|
and let the format manager decide what to do with this HTML-1. Perhaps
|
|
a clever student will write an HTML-1 to HTML-2 on-the-fly converter.
|
|
But as long as there are old servers around, we can leave the current
|
|
HTML-1 parser in the library besides the shiny new HTML-2. I reckon
|
|
that by the time anyone will use an industry-strength SGML engine on
|
|
HTML-2, HTML-1 servers will be extinct.
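The backward-compatibility rules above amount to a small normalisation step in front of the format manager. The following Python sketch is one possible reading, under stated assumptions: the function name wrap_legacy_reply is hypothetical, tag matching is assumed to be case-insensitive, and leading whitespace before the first tag is assumed to be ignorable.

```python
def wrap_legacy_reply(data: str) -> str:
    """Return `data` in level-2 form, synthesising an HTTP_REPLY if needed."""
    stripped = data.lstrip()
    upper = stripped.upper()
    if upper.startswith("<HTTP_REPLY"):
        # Already a level-2 reply: pass it through untouched.
        return data
    if upper.startswith("<PLAINTEXT>"):
        # Level-1 plain text: the leading tag is replaced by the reply header.
        notation = "PLAINTEXT"
        body = stripped[len("<PLAINTEXT>"):]
    else:
        # Anything else can only be HTML-1: prepend the reply header.
        notation = "HTML-1"
        body = data
    header = (
        "<HTTP_REPLY>\n"
        f'<HTTP_DOC NOTATION="{notation}"> </HTTP_DOC>\n'
        "</HTTP_REPLY>\n"
    )
    return header + body


print(wrap_legacy_reply("<PLAINTEXT>hello"))
```

After this step every reply starts with an HTTP_REPLY, so the rest of the client only needs one code path for choosing a parser.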
|
|
|
|
Pop-up note: would it be politically correct to add a VERSION="2.0"
|
|
attribute to the suggested <HTML> and <HTTP_REPLY> tags ? Or should
|
|
that be the job of a further tag ? (I'm inclined towards the first
|
|
solution.)
|
|
|
|
----- </RAMBLINGS> --------------------------------------------------
|
|
|
|
<AUTHOR EMAIL="jfg@info.cern.ch" STATUS="TIRED">
|
|
<A HREF="http://info.cern.ch/hypertext/WWW/People.html#Groff">Jean-Francois</A>
|
|
</AUTHOR>
|
|
|
|
======================================================================
|
|
|
|
From: Dan Connolly <connolly@pixel.convex.com>
|
|
To: jfg@dxcern.cern.ch (Jean Francois Groff)
|
|
Cc: www@nxoc01.cern.ch, wei@xcf.berkeley.edu
|
|
Subject: Re: HTML is not HTML (Was: Update Queries)
|
|
Date: Fri, 05 Jun 92 12:03:54 CDT
|
|
|
|
|
|
About encapsulation mechanisms...
|
|
|
|
>-- begin quoted message from timbl -----------------------------------
|
|
>
|
|
>> It's important that this time we make the SGML proper SGML.
|
|
>
|
|
> Absolutely.
|
|
>
|
|
>> The only way to include other formats is to use a NOTATION= attribute.
|
|
>
|
|
Since it appears to be foolhardy to try to fit everything _inside_
|
|
SGML, I move that we use a mechanism that was designed to do exactly
|
|
what we're up to: MIME.
|
|
|
|
First of all, it's easier to implement. The RFC for MIME is the
|
|
kind of thing one person can reasonably read, comprehend, and
|
|
implement -- especially given the headstart of available code.
|
|
Not so for the SGML standard.
|
|
|
|
I need to look over my MIME info again to see how to fit this application
|
|
into that architecture, but it seems like a pretty natural match.
|
|
|
|
Let's see... what are the features of HTML.
|
|
1. Describe formatted text. To implement this inside of MIME, we
|
|
simply define a subtype X-HTML (soon to be just HTML) and make sure
|
|
it fits lexically within the MIME text datatype.
|
|
|
|
2. Embed links to other documents or elements of other documents.
The current mechanism is the UDI. I suggest that this is not really
catching on and it has some major limitations. Why not make links
first class SGML external entities? They would be SYSTEM entities
and the SGML application, that is the WWW client, would resolve the
entity by consulting the corresponding MIME external reference.
|
|
|
|
3. Allow for multimedia. We're running into trouble with the current
|
|
architecture here. But this is a snap with MIME. And again, to
|
|
reference multimedia objects, we just make an SGML external element
|
|
that points to a MIME object.
|
|
|
|
Here's a prototype example (forgive me for not consulting documentation for
|
|
proper syntax):
|
|
|
|
Subject: like the WAIS headline
|
|
Message-ID: <boy I'd sure like to have message ID's for these things.>
|
|
Content-Type: multipart/mixed
|
|
|
|
dummy body explaining this format to non-MIME readers
|
|
|
|
----
|
|
Content-Type: text/X-HTML
|
|
|
|
<HTML>
|
|
<!ENTITY part2 SYSTEM
|
|
-- I wonder if there's a way to implicitly declare these-->
|
|
<!ENTITY part3 SYSTEM>
|
|
<!ENTITY part4 SYSTEM>
|
|
|
|
<TITLE>prototype document</TITLE>
|
|
<H1>Internal links</H1>
|
|
<A IDREF=link1>pointer to XYZ paragraph</A>
|
|
Here's a picture of a monkey: <RASTER MIME=part2>
|
|
<H1>External links</H1>
|
|
See <A MIME=part3>section 3 of [Berners-Lee 92]</A> for more info
|
|
See <A MIME=part4>the comp.text.sgml newsgroup</A> for SGML info.
|
|
<H1><A ID=link1>XYZ</A></H1>
|
|
</HTML>
|
|
|
|
----
|
|
Content-ID: part2
|
|
Content-Type: raster/GIF
|
|
Content-Encoding: 8bit (This is allowed, but I'm not sure how it works)
|
|
(or, we could encode it and use)
|
|
Content-Encoding: MIME's-uuencode-workalike (if need be)
|
|
|
|
@#$@$#@#$ raw GIF data @#%$@#$@
|
|
|
|
----
|
|
Content-ID: part3
|
|
Content-Type: external/HTTP
|
|
HOST=info.cern.ch
|
|
PATH=hypertext/papers/report92
|
|
IDREF=section3
|
|
|
|
----
|
|
Content-ID: part4
|
|
Content-Type: external/NNTP
|
|
GROUP=comp.text.sgml
|
|
|
|
----
|
|
|
|
======================================================================
|
|
|
|
From: Dan Connolly <connolly@pixel.convex.com>
|
|
To: www-interest@nxoc01.cern.ch
|
|
Subject: revised MIME architecture
|
|
Date: Sat, 06 Jun 92 17:31:58 CDT
|
|
|
|
|
|
--cut-here
|
|
|
|
In an earlier message, I proposed we make the W3 project
|
|
interoperate with MIME systems. I made the mistake
|
|
of using existing names for formats and types that
|
|
don't yet exist.
|
|
|
|
I'd like to make a more organized transition to MIME
|
|
interoperability.
|
|
|
|
First, we define some types for existing web servers
|
|
and documents.
|
|
|
|
X-HTTP is an access-type for message/external-body body
|
|
parts to access existing W3 servers.
|
|
Additional parameters include host, port, path, and anchor.
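To make the mapping concrete, here is a rough Python sketch that splits a UDI into the X-HTTP parameters just listed. It is only an illustration: the function names are hypothetical, the "#intro" anchor in the example is made up, and the standard urllib.parse module stands in for whatever UDI parser a real client would use.

```python
from urllib.parse import urlsplit


def udi_to_params(udi):
    """Split a UDI into (name, value) pairs for a message/external-body part."""
    parts = urlsplit(udi)
    params = [
        ("access-type", "X-" + parts.scheme.upper()),
        ("host", parts.hostname),
    ]
    if parts.port is not None:
        params.append(("port", str(parts.port)))
    params.append(("path", parts.path))
    if parts.fragment:
        params.append(("anchor", parts.fragment))
    return params


def format_header(params):
    """Render the pairs as a folded Content-type header, one parameter per line."""
    lines = ["Content-type: message/external-body"]
    lines += [f'\t;{name}="{value}"' for name, value in params]
    return "\n".join(lines)


example = "http://info.cern.ch:2784/hypertext/WWW/TheProject.html#intro"
print(format_header(udi_to_params(example)))
```

Because each parameter sits on its own folded line, no single value has to carry the whole document address.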
|
|
|
|
X-HTML is a subtype of text for existing W3 documents.
|
|
|
|
So the next part of this message is an HTML document expressed
|
|
as a MIME external-body message.
|
|
|
|
--cut-here
|
|
Content-type: message/external-body;
|
|
access-type=X-HTTP;
|
|
host=info.cern.ch;
|
|
port=2784;
|
|
path=/hypertext/WWW/TheProject.html
|
|
|
|
Content-type: text/X-HTML
|
|
|
|
|
|
--cut-here
|
|
|
|
Then we address limitations in the existing format with two
|
|
new types:
|
|
|
|
In order to encapsulate multimedia objects in web nodes,
|
|
we define X-HYPERTEXT to be a subtype of the multipart body type.
|
|
The first part of a multipart/X-HYPERTEXT is the content of the hypertext.
|
|
The other parts are multimedia attachments and links to other documents.
|
|
|
|
The user agent (WWW client) displays the first part and allows the
|
|
user to choose attachments and/or links. The attachments and links
|
|
will be displayed in addition to or in place of the original content.
|
|
|
|
Then, in order to formalize the structure of hypertext parts,
|
|
we define X-SGML to be a subtype of text. The body of an X-SGML part must
|
|
be a complete SGML document. The user agent (WWW client) will resolve
|
|
external entities (such as the DTD and the multimedia attachments).
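As a rough illustration of the two proposed types, the sketch below assembles a multipart/X-HYPERTEXT body whose first part is text/X-SGML and whose second part is an attached image. It uses Python's standard email package purely for demonstration; the function name build_node and the placeholder image bytes are assumptions, and nothing here is part of the proposal itself.

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_node(sgml_text, image_bytes, content_id):
    """Build a two-part hypertext node: SGML content first, then one attachment."""
    node = MIMEMultipart("X-HYPERTEXT", boundary="attachment")
    # First part: the hypertext content itself.
    node.attach(MIMEText(sgml_text, "X-SGML"))
    # Remaining parts: attachments, addressed from the content by Content-ID.
    image = MIMEImage(image_bytes, _subtype="gif")
    image["Content-ID"] = content_id
    node.attach(image)
    return node


# b"GIF87a" is placeholder data, not a complete image.
node = build_node("<TITLE>Sample</TITLE>", b"GIF87a", "part3")
print(node.as_string())
```

The library picks base64 as the transfer encoding for the image part, so the result can travel through mail unchanged.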
|
|
|
|
So here's a multimedia web node expressed as MIME body part:
|
|
|
|
--cut-here
|
|
Content-Type: multipart/X-HYPERTEXT;boundary=attachment
|
|
|
|
--attachment
|
|
Content-Type: text/SGML
|
|
|
|
<!DOCTYPE WEB-NODE SYSTEM
|
|
[
|
|
<!ENTITY UDI001 SDATA "HTTP://info.cern.ch/hypertext/WWW/TheProject.html">
|
|
<!ENTITY part3 SDATA "part3">
|
|
]>
|
|
<TITLE>Sample multimedia web node</TITLE>
|
|
<SECTION><H1>Old features</H1>
|
|
Here's a link to some info at cern:
|
|
<A HREF=UDI001>cern stuff</A>
|
|
<SECTION><H2>New features</H2>
|
|
Here's a picture: <IMAGE ATTACHMENT=part3>
|
|
|
|
|
|
--attachment
|
|
Content-id: HTTP://info.cern.ch/hypertext/WWW/TheProject.html
|
|
Content-type: message/external-body
|
|
;access-type=X-HTTP
|
|
;host="info.cern.ch"
|
|
;name="/hypertext/WWW/TheProject.html"
|
|
|
|
Content-Type: text/X-HTML
|
|
|
|
--attachment
|
|
Content-id: part3
|
|
Content-type: image/gif
|
|
Content-transfer-encoding: base64
|
|
|
|
R0lGODdhdQAvAIQAAL9/v3+ff39/f/+/f/+ff/9/f3+fv///v39/v//fv/+/v/+fv/9/v7+/
|
|
f7+ff79/f//////f/7/fv7+/v7+fvwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
|
|
ACwAAAAAdQAvAAAF/uARHQkkiuMoQmZUnixUvkfNHjKbivM75riEMIa7xYiQXYvH/CVIMuhz
|
|
5gK6nsuabUYawqamcCQJvOGKQZnz9oues9rge/wOy4asWdQpj6r1EWNVRSVKT3RLYXWEbIot
|
|
jjw6J21jZEdaOycleVCTQYc9PXQ0nSMvXC4qg46QPzA4bWdKNXpvtFpPK4g+d4W9vUIKXHk7
|
|
p0m2KSYJV0JrejmNrJp7dkLNwtbLPKmXLc3DUd8JCuTl2MzZh37ariuHnmVAUCZFpndXRi48
|
|
2MvLKdauikWyBsyauXIUFEwgN65cQyFSfHhyZ+RFHlaPSFCEJgtWM45hapUhEWjIpn4N/oUp
|
|
VEBhAAUCAwgsaKCA5sIJFG4qGLOpkx0Vyqz4PHkCVtBOIm040iNxEDeSmnz463dlwoKFORU4
|
|
WKlw5oKtE6Z4wYelip0iZ6GdFIO27TGeOVyFzNiOjj01Upb0c5gym06rDVbhQRrU0MW5i5Ag
|
|
MiJtLq83hfLeeZTr3x9xKJlhU9il54rPSNLY8JkmIxpCtogo4ZiGhpF5RSMXY4oys8FxFFIB
|
|
HC3nSzXVae1AE8721cUaldrClT0pRw/nk9/ppjrB6gCNASNLYmURdOq0trQfOU4GeVBIqV85
|
|
+SeqC3aDCgjEPzbxVEd7HefA+u72LO/XZ40CHVyGrOFJMTxd/rENVgoM8E5ZjyTiVizHAFiG
|
|
RcGJIdx+QEGiiEVS4SWRP51dRoICCwxwlR9u5GJcLvRU5OGGMrYSIy0WMreDXVTkJ6IK7EBE
|
|
RUMyOcCDdG8hh9YmErIhmlJoIMaYDq/JlVhHWeiD1xc7bpPNSgkJ0weM2mmkDGmJ/FQljegh
|
|
AxlbUgKyhWXNHPlgkAmkqNUC0WmixYXdGKfknyKxiKFbUe6SRkC2UHRGT2Bcho5VFDjQkgPt
|
|
VTFIJLck8ligH+boKYBbvIYlPCGGs9wU+qRCQaUvvYqpfVxwOlx3WWLnnRngNdYTIrKMtFQW
|
|
JgVV7CEKTUCApcwye4U+AGWjV16H/oYWbJSjMZZtRo29WQsWRFVhEjsrESArrLAaKaQJKFpa
|
|
XVilZYtKKRk1VaFo5GXbh3429lBUbMoksJC7lprrwAMUPGBpOuM4gJO7K42DGT1AFdWhLONI
|
|
AA8MHu7nJKOZKOkGxaxilVPBsCr8KsJ9yVRpdczilFN1yQ4k7pH2CPHuTVigGa9cuHbHcRyP
|
|
WuTCw+mmvLLCCDfoNLqWNoA0sy/Di9SJEryrCtIUNEABROAx2jHQzrkYEooviSnwAklbirAD
|
|
cK+csDkGc02wrEizw0N1FFw1E82VKuQ13z2j5uG8cqiyhdksQV3pV0knXCnCABwMt8IDKDv1
|
|
yefGTfXL/sHgxrnMr5KeU+kJ/Qfjf1Iq2U9nOLBkaeSOU90sAJI33XnblRpAQQAUCHDu6TjJ
|
|
7O7LXb/ssAMzvUwOLl5gzJAxKOnE8OxUK9xsyg4AgPADlTsgQPcBPDCAVo6nW3nllTpggO1J
|
|
TzA7uvQTD6uYLqCoQOzk8L3QMhErHur6Foy2+Q5q7Gtf+QLgAAR072DB413cHvC+CiLMgQAA
|
|
nqXeF7m4tc948nscTqRmDpqhDXtVW96r+gY/k1EtAOH7HaxgKIAHIIB93useBR8APqXNTng6
|
|
lJzjfGcp4B1wfkjcnfzmF0JYTcBrJxPg/CQoK/g9MGEG+F7CPNe98TXwAeUD/iMYPRc8ZgmP
|
|
gb9DIdyKWEXgyfBzn6Pf5wY3Oyh2MH27U58G35i0ytkQbgCoIQQF4L1Aeq+MwXsb7oRnQAo4
|
|
EIu4S5oR4WjF4U0RfrWrIhwrlcGEZfCQ32ug+PwYSB0igIfiCyMiEfa7DKrsbZy8ovvWeMZ0
|
|
MYuIllwhCtu3Sypu8FwMrBwG3Za9B76NgoAU4xdPaUOVCW98fsQiGWdJQ08iIJadFGbc0IjG
|
|
9qXxm7ubpi7f98AsUqBy5oTbI2uITMkdLIuBBKP3Tmk5dhLScq7M4vuaCUMdhg+Z6mQjGGG1
|
|
PsmFz3dunJ3ChIcw4T2SlQzEYOX6Wal+ds+Bf6TA/vu8Vz6GRpOH5fsiIeUpgGdCkGl/dFv5
|
|
3nfKLNqQoeVjX0oxSk7fKSyHNJVVJyf4PoqykX3Q/N4+J+c9kK7vAeMjZPlOGc8dEpKdBzvl
|
|
+FoaVXkmbHyT46A8F7rTg17wpuaUXBYP9smLihJ3DwRl+XL3O6YlMosRjar4PGlP8fEwizW0
|
|
5x+dejBBjlQAwEPqU+PJUbSCT56n3KKlxoewiKqskwpzIAQP1lD33bStZbzgR78HPsZWTpB7
|
|
DWQikeq9GkoVqZ1N7D1hyEOGXm6WMFWqDi8HQ3MeM4ZbJSbu5DlUhY21sgAY61zHh9cyltK0
|
|
zcwrakva2uX+0ZVVHaU8/jsaSGYmN4el/aIB/OhMZDbznAEFIjQvV6lhljV86wviTQfpSaYx
|
|
NpFF7SxqmancZlZVnmSdnDq3ODkHCg+90y2lJzXoRQwmDIMMdSA92SdLoJLPrPoEpPscqdLK
|
|
PhW1RWXnXW+KRmO6koKHPKkiiXlK1qZ1h/784hbXy1/wVVim+nzlA8mrxdemNWHr5eFV7dpM
|
|
t57UdkB2p051O1nyNfam2luwVcUYQe2prFI1HKtMNRpZAPzzqmjl6EKv+sca4ni0hy3yiitV
|
|
ADIr9sprbB8pYfjlTt7ze/1cMgTZ7FJWTtbFUrZcA10JZaTqcKzDPCxyUVliPx/WXMtiAHkn
|
|
KchJNbO1sf/87YjLqsXLehmVS0urA0yMSmEyjZQPZOxhF2rovuZWACEAADs=
|
|
|
|
--attachment--
|
|
|
|
--cut-here
|
|
|
|
And here's the DTD for WEB-NODE documents:
|
|
|
|
--cut-here
|
|
|
|
<!-- This DTD was produced by DeveGram on Tue Jun 2 18:58:16 1992 -->
|
|
<!-- and hand-edited by connolly@convex.com -->
|
|
|
|
<!-- Parameter Entities -->
|
|
|
|
<!-- Terminal symbols -->
|
|
|
|
<!ENTITY % words "#PCDATA" >
|
|
|
|
<!-- Non-ELEMENT symbols -->
|
|
|
|
<!ENTITY % inline "%words | A" >
|
|
<!ENTITY % text "%inline | P | IMAGE" >
|
|
<!ENTITY % heading "H1|H2|H3|H4|H5|H6" >
|
|
|
|
<!ENTITY lt "<">
|
|
<!ENTITY gt ">">
|
|
<!ENTITY amp "&">
|
|
|
|
<!ENTITY lt. "<">
|
|
<!ENTITY gt. ">">
|
|
<!ENTITY amp. "&">
|
|
|
|
<!-- Document structure -->
|
|
|
|
<!ELEMENT WEB-NODE O O (TITLE, NEXTID?, ISINDEX?, section+, ADDRESS?)>
|
|
|
|
<!ELEMENT TITLE - - (%inline)+>
|
|
<!ELEMENT ADDRESS - - (%text)+>
|
|
|
|
<!ELEMENT NEXTID - O EMPTY >
|
|
<!ATTLIST NEXTID N NUMBER #IMPLIED>
|
|
|
|
<!ELEMENT ISINDEX - O EMPTY >
|
|
|
|
|
|
<!ELEMENT section O O ((%heading)?,
|
|
(
|
|
%text |
|
|
section |
|
|
MENU |
|
|
UL |
|
|
OL |
|
|
DIR |
|
|
DL)+)>
|
|
|
|
<!ELEMENT (H1|H2|H3|H4|H5|H6) - - (%inline) >
|
|
|
|
<!ELEMENT P - O EMPTY -- paragraph SEPARATOR -->
|
|
|
|
<!ELEMENT IMAGE - O EMPTY>
|
|
<!ATTLIST IMAGE ATTACHMENT ENTITY #REQUIRED>
|
|
|
|
<!ELEMENT A - - (%inline)+>
|
|
<!ATTLIST A
|
|
NAME CDATA #IMPLIED
|
|
HREF ENTITY #IMPLIED
|
|
TYPE CDATA #IMPLIED --@@-- >
|
|
|
|
<!ELEMENT MENU - - (LI+)>
|
|
|
|
<!ELEMENT UL - - (LI+)>
|
|
|
|
<!ELEMENT OL - - (LI+)>
|
|
|
|
<!ELEMENT DIR - - (LI+)>
|
|
|
|
<!ELEMENT LI - O (%text)+>
|
|
|
|
<!ELEMENT DL - - ((DT, DD)+)>
|
|
|
|
<!ELEMENT DT - O (%inline)+>
|
|
|
|
<!ELEMENT DD - O (%text)+>
|
|
|
|
--cut-here
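The top-level content model in the DTD above, (TITLE, NEXTID?, ISINDEX?, section+, ADDRESS?), is a regular expression over element names, so it can be checked with an ordinary regex. The Python sketch below does only that. It assumes the child elements have already been identified (a real SGML parser would first infer the omitted section tags) and it ignores SGML's case-insensitive names; the function name is hypothetical.

```python
import re

# Each element name is followed by one space so that names cannot run together.
WEB_NODE_MODEL = re.compile(r"TITLE (NEXTID )?(ISINDEX )?(section )+(ADDRESS )?")


def matches_web_node(children):
    """Return True if the sequence of child element names fits WEB-NODE."""
    sequence = "".join(name + " " for name in children)
    return WEB_NODE_MODEL.fullmatch(sequence) is not None


print(matches_web_node(["TITLE", "section"]))
print(matches_web_node(["TITLE", "ADDRESS"]))
```

The second call fails the model because at least one section is required between the title and the address.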
|
|
|
|
And here's a perl script to convert an HTML document
|
|
into a multipart/X-HYPERTEXT MIME body part:
|
|
|
|
--cut-here
|
|
|
|
#!/usr/local/bin/perl

$boundary = "attachment";
print "Content-Type: multipart/X-HYPERTEXT; boundary=$boundary\n\n";

print "--$boundary\n";
print "Content-Type: text/SGML\n\n";

print "<!DOCTYPE WEB-NODE SYSTEM \n[\n";

@html = <>; # read whole file
$_ = join('', @html);
$out = '';

sub fix_anchor{
    local($name, $href, $type);

    # What exactly is the syntax of an SGML attribute value?
    while(s/^(\w+)\s*=\s*((\"[^\"]*\")|([^\s>]+))\s*//){
        local($v) = ($3 || $4);
        local($a) = $1;
        $href = $v if $a =~ /^href$/i;
        $name = $v if $a =~ /^name$/i;
        $type = $v if $a =~ /^type$/i;
    }
    s/[^>]*>//;

    $out .= "<A";
    $out .= " NAME=\"$name\"" if $name ne '';
    $out .= " TYPE=\"$type\"" if $type ne '';
    if($href ne ''){
        if(!defined($anchor{$href})){
            $anchor{$href} = ++$anchor;
        }
        $out .= " HREF=" . $anchor{$href};
    }
    $out .= ">";
}

$header = 0;
$anchor = "UDI000";
while(/</){
    $out .= $`;
    $_ = $';
    if(s/^A\s+//i){
        &fix_anchor;
    }elsif(s/^NEXTID\s+(\d+)\s*>//){
        $out .= "<NEXTID N=$1>";
    }elsif(s/^H(\d)>//){
        local($n) = $1;
        while($n<=$header){ $out .= "</SECTION>"; $header--; }
        while($n>$header){ $out .= "<SECTION>"; $header++; }
        $out .= "<H$n>";
    }else{
        $out .= '<';
    }
}

$out .= $_;

foreach(keys %anchor){
    local($ent) = $anchor{$_};

    print "<!ENTITY $ent SDATA \"$_\">\n";
}

print "]>\n", $out;

foreach(keys %anchor){
    local($access_type);

    print "\n\n--$boundary\n";
    print "Content-id: $_\n";
    print "Content-type: message/external-body\n";

    $access_type = $1 if s/^(\w+)://;

    if(s/#([^#]+)$//){
        print "\t;x-anchor=\"$1\"\n";
    }

    if($access_type =~ /file/i){
        if(&hostport){
            &param('access-type', "ANON-FTP");
        }else{
            &param('access-type', 'LOCAL-FILE');
        }
        &param('name', $_);

        print "\nContent-Type: application/octet-stream\n\n";
    }elsif($access_type =~ /http/i){
        &param('access-type', 'X-HTTP');
        &hostport;
        &unescape;
        &param('name', $_);

        print "\nContent-Type: text/X-HTML\n\n";
    }elsif($access_type =~ /news/i){
        &param('access-type', 'X-NEWS');
        &unescape;
        if(/@/){
            &param('message-id', $_);
        }else{
            &param('group', $_);
        }

        print "\nContent-Type: message\n\n";

    }elsif($access_type =~ /telnet/i){
        &param('access-type', 'x-telnet');
        &unescape;
        &param('user', $1) if s/^(.*)@//;
        &param('port', $1) if s/:(.*)$//;
        &param('site', $_);

        print "\nContent-Type: X-TELNET\n\n";

    }elsif($access_type =~ /gopher/i){
        &param('access-type', 'x-gopher');
        &hostport;
        &param('type', $1) if s-^/(\d+)/--;
        &unescape;
        &param('selector', $_);

        print "\nContent-Type: @@@@\n\n";

    }elsif($access_type =~ /wais/i){
        &param('access-type', 'x-wais');
        &hostport;
        if(m-^/-){
            &param('type', $1) if s-^/(\w+)--;
            &param('size', $1) if s-^/(\d+)--;
            &unescape;
            &param('path', $_);
        }else{
            &unescape;
            &param('words', $1) if /\?(.*)/;
        }

        $type = "image/$type" if $type =~ /gif|tiff/i;
        $type = "application/postscript" if $type =~ /PS/i;

        print "\nContent-Type: $type\n\n";

    }elsif($access_type eq ""){
        &param('access-type', 'x-relative');
        &unescape;
        &param('name', $_);

        print "\nContent-Type: message\n\n";
    }else{
        warn "unknown access type: $access_type in $_";
    }
}

print "--$boundary--\n";

sub unescape{
    s/%(\w\w)/sprintf("%c",hex($1))/ge;
}

sub param{
    local($p, $v) = @_;
    # quote tspecials in parameter values
    $v = '"'.$v.'"' if $v =~ m-[\s()<>@,;:\\\"\/\[\]?\.=]-;
    print "\t;$p=$v\n";
}

sub hostport{
    if(s-//([^:/]+)--){
        &param('host', $1);
        &param('port', $1) if s/:(\d+)//;
        1;
    }else{
        0;
    }
}
|
|
|
|
--cut-here--
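The least obvious step in the script above is how flat heading tags become nested sections: two loops close and open SECTION elements until the nesting depth matches the heading level. The following Python sketch isolates that logic for illustration only. The function name nest_sections is hypothetical, and the input is reduced to a list of heading levels.

```python
def nest_sections(levels):
    """Emit SECTION open/close tags so that nesting depth tracks heading level."""
    out = []
    depth = 0
    for n in levels:
        # A heading at the same or a shallower level closes the open sections.
        while n <= depth:
            out.append("</SECTION>")
            depth -= 1
        # Then open sections until the depth equals the heading level.
        while n > depth:
            out.append("<SECTION>")
            depth += 1
        out.append(f"<H{n}>")
    return out


print(" ".join(nest_sections([1, 2, 1])))
```

Like the script, the sketch never closes the sections still open at the end of the input; the DTD declares the section end tag omissible, so the parser is expected to infer it.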
|
|
|
|
======================================================================
|
|
|
|
From: Dan Connolly <connolly@pixel.convex.com>
|
|
To: www-talk@nxoc01.cern.ch
|
|
Subject: HTML is not SMGL
|
|
Date: Sun, 07 Jun 92 00:12:55 CDT
|
|
|
|
My grandiose scheme to convert HTML to MIME and SGML
|
|
works fine.
|
|
|
|
Now I'm going back to the idea of writing a DTD for
|
|
the existing HTML format. I can't seem to do it.
|
|
HTML has so little rigid structure that I'm running
|
|
into mixed content problems (I have to allow #PCDATA
|
|
almost anywhere, hence mixed content, which screws
|
|
up everything).
|
|
|
|
How much extant HTML is really out there? And how
|
|
much of it is generated on the fly by gateways
|
|
and servers?
|
|
|
|
This MIME/SGML stuff sure seems like the way to go.
|
|
|
|
Now if I make it possible to create such documents
|
|
with FrameMaker and a perl script, I bet it will
|
|
catch on. I suspect I'll get some resistance against
|
|
abandoning UDI's, but I don't think they work.
|
|
|
|
Dan
|
|
|
|
======================================================================
|
|
|
|
From: jfg@dxcern.cern.ch (Jean Francois Groff)
|
|
To: www-talk@nxoc01.cern.ch
|
|
Subject: Re: HTML is not SMGL
|
|
Date: Mon, 8 Jun 92 01:01:02 +0200
|
|
|
|
Dan asked:
|
|
> How much extant HTML is really out there? And how much of it is
|
|
> generated on the fly by gateways and servers?
|
|
|
|
Our hypertext documentation is certainly the largest quantity of
|
|
HTML you can find in the world. Besides, we know all the people who
|
|
have produced their own, so making the Big Change would be relatively
|
|
simple for them (esp. given your impressive perl script). Gateways can
|
|
be changed easily too. But all the browsers must be updated before,
|
|
and that will take more time !!! (There are thousands of copies
|
|
installed...)
|
|
|
|
> I suspect I'll get some resistance against abandoning UDI's, but I
|
|
> don't think they work.
|
|
|
|
Well, you still use them internally, don't you ? ;^)
|
|
|
|
Jean-Francois
|
|
|
|
======================================================================
|
|
|
|
From: Edward Vielmetti <emv@msen.com>
|
|
To: jfg@dxcern.cern.ch (Jean Francois Groff)
|
|
Cc: www-talk@nxoc01.cern.ch
|
|
Subject: Re: HTML is not SMGL
|
|
Date: Sun, 07 Jun 92 20:26:48 EDT
|
|
|
|
The UDI vs. MIME argument is a non-argument. MIME is sufficiently
|
|
flexible that if you construct an appropriate Content-type and define
|
|
its semantics appropriately it will accept UDI's and work accordingly.
|
|
"Simple matter of programming" :).
|
|
|
|
Explicit "attribute=value" tags are more flexible than the W3 approach
|
|
to turn the entire document ID into a big long string. I guess it
|
|
depends on whether you believe you are dealing with a big database
|
|
or a big file system. Both approaches have their place. Again as
|
|
a simplified case you have "udi=//host:port/path" as a MIME identifier
|
|
and all is well.
|
|
|
|
I expect that MIME will be available in many e-mail products over the next
|
|
3-5 years. Since the only application that has anywhere near universal
|
|
appeal on the net is e-mail, it strikes me as only appropriate that
|
|
hypertext systems try to get as much leverage from mail as they possibly
|
|
can.
|
|
|
|
--Ed
|
|
|
|
======================================================================
|
|
|
|
From: Dan Connolly <connolly@pixel.convex.com>
|
|
To: Edward Vielmetti <emv@msen.com>
|
|
Cc: jfg@dxcern.cern.ch (Jean Francois Groff), www-talk@nxoc01.cern.ch
|
|
Subject: Re: HTML is not SMGL
|
|
Date: Sun, 07 Jun 92 22:29:44 CDT
|
|
|
|
|
|
>The UDI vs. MIME argument is a non-argument. MIME is sufficiently
|
|
>flexible that if you construct an appropriate Content-type and define
|
|
>its semantics appropriately it will accept UDI's and work accordingly.
|
|
>"Simple matter of programming" :).
|
|
>
|
|
>Explicit "attribute=value" tags are more flexible than the W3 approach
|
|
>to turn the entire document ID into a big long string. I guess it
|
|
>depends on whether you believe you are dealing with a big database
|
|
>or a big file system. Both approaches have their place. Again as
|
|
>a simplified case you have "udi=//host:port/path" as a MIME identifier
|
|
>and all is well.
|
|
>
|
|
The problem is that the syntax of a UDI doesn't fit into the syntax
|
|
of a MIME parameter (or an SGML attribute value...) because a UDI
|
|
might be arbitrarily long, and it cannot contain any whitespace (so
|
|
it can't be split across lines).
|
|
|
|
So these 200+ character UDI's for WAIS documents can't be
|
|
mailed around safely (even SGML has limits on the length of an
|
|
attribute value).
|
|
|
|
Heck, my WWW client (perhaps it's not the latest version, but still...)
|
|
can't even retrieve wais documents due to these problems.
|
|
|
|
Dan
|
|
|
|
======================================================================
|
|
|
|
From: Dan Connolly <connolly@pixel.convex.com>
|
|
To: www-talk@nxoc01.cern.ch, wais-talk@think.com
|
|
Subject: MIME for global hypertext
|
|
Date: Sun, 07 Jun 92 22:49:51 CDT
|
|
|
|
|
|
[This was posted to several newsgroups, but someone from wais-talk
|
|
suggested I forward it there also.]
|
|
|
|
|
|
The WAIS, gopher, and world-wide-web projects are all client/server
|
|
information retrieval systems. All three deliver plain text information
|
|
quite well, and they each have evolving mechanisms for delivering
|
|
other forms of information.
|
|
|
|
The MIME RFC defines a system for processing multi-part, multimedia
|
|
messages on the internet. I would like to see these systems, along
|
|
with USENET news and internet mail, interoperate with MIME as the substrate.
|
|
|
|
The clients for these systems go something like this:
|
|
0 user invokes client (and chooses a starting point)
|
|
1 client displays user's request
|
|
2 user reads page, chooses a reference to more info
|
|
3 user informs client of choice
|
|
(e.g. "show me item #1," or "search for googoo")
|
|
4 go to step 1
|
|
|
|
These systems often consist of a hierarchy of menus with text files at
|
|
the leaf nodes. The system allows the user to interactively navigate
|
|
the menus and browse leaf nodes. But 1) the format of the menus is
|
|
particular to the system (USENET newsgroups/articles, unix
|
|
directories/files, WAIS source/database/document). And 2) once a user
|
|
is at a leaf node, the system can no longer interactively follow
|
|
references.
|
|
|
|
The novel aspect of hypertext is that the distinction between the
|
|
menu pages and the text pages disappears. In the world-wide-web,
|
|
text documents have machine-readable links inside them, and all
|
|
menus are represented as hypertext documents.
|
|
|
|
The WWW format works well, but it would benefit from use of MIME's
|
|
features.
|
|
|
|
For a common hypertext document format, I propose we define a
|
|
subtype of the MIME multipart message: X-HYPERTEXT. The first
|
|
part of a multipart/X-HYPERTEXT message is the content of
|
|
the document, and the remaining parts are multimedia attachments
|
|
and links to other documents.
|
|
|
|
The content part contains references (by Content-ID) to the
|
|
attachments and links. The client software allows the user
|
|
to interactively choose references to display/follow.
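A client following this scheme needs a lookup from Content-ID to body part. The Python sketch below shows one way to build it with the standard email package. The sample message, the function name index_by_content_id, and the text/plain stand-in for an image part are all made up for illustration.

```python
from email import message_from_string

RAW = """\
MIME-Version: 1.0
Content-Type: multipart/X-HYPERTEXT; boundary=attachment

--attachment
Content-Type: text/plain

See the picture (part3).

--attachment
Content-ID: part3
Content-Type: text/plain

stand-in for image data
--attachment--
"""


def index_by_content_id(msg):
    """Map Content-ID to body part for every part after the content part."""
    parts = msg.get_payload()
    return {p["Content-ID"]: p for p in parts[1:] if p["Content-ID"]}


msg = message_from_string(RAW)
index = index_by_content_id(msg)
print(sorted(index))
```

When the user picks a reference, the client looks its identifier up in this table and then either displays the part or follows the external reference it describes.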
|
|
|
|
The remaining parts may be attached image/audio/video using
|
|
MIME's various types and transfer encodings (text attachments
|
|
would work too) or they may be references to information
|
|
accessible elsewhere using MIME's message/external-body type.
|
|
The parameters to the external-body content-type provide the
|
|
same information as WWW's Universal Document Identifier.
|
|
(MIME only defines ANON-FTP, FTP, TFTP, LOCAL-FILE and AFS.
|
|
The remaining access-types (WAIS, gopher, etc) would be
|
|
experimental (X-WAIS, X-GOPHER) until standardized.)
|
|
|
|
The emerging standard for structured, platform-independent text
|
|
is SGML. The WWW project defines an SGML document type with
|
|
traditional elements (title, heading, paragraph, list) and
|
|
new hypertext elements (anchor). Soon it will have multimedia
|
|
elements (image, audio).
|
|
|
|
The current design places external document references (to files,
|
|
WWW servers, WAIS documents, gophers, etc.) inside the SGML as
|
|
attributes. There are lexical incompatibilities, and the design
|
|
is under strain. I suggest that we implement references as
|
|
SGML entities that identify message/external-body parts
|
|
by content-id.
|
|
|
|
Representing document content in SGML allows the same information
|
|
to be accessed using different user interface paradigms (e.g. dumb
|
|
terminals vs. curses style vs. x windows point-and-click).
|
|
|
|
Short of full SGML parsing, we could adopt the MIME text/richtext
|
|
format, with the addition of a <REF ID="xxx">...</REF> tag.
|
|
In fact, any representation that allows the user to interactively indicate
|
|
one of the attached body parts by content-id will do. For example,
|
|
plain text with one-line descriptions would do. The Andrew ez
|
|
data stream would also work, but only Andrew sites could parse it.
|
|
|
|
This brings up the issue of format negotiation. No one format is
|
|
optimal for all information. Clients are likely to be able to process
|
|
information in several formats, and servers are likely to be able
|
|
to provide different representations.
|
|
|
|
The various formats can be enclosed in a MIME multipart/alternative
|
|
message. And rather than including the data for all formats in
|
|
the message, the data could be in message/external-body parts. The
|
|
client chooses the type of data it likes and retrieves the corresponding
|
|
external-body. This (modified) example from the MIME RFC may help explain:
|
|
|
|
MIME-Version: 1.0
|
|
Content-Type: multipart/alternative; boundary=42
|
|
|
|
--42
|
|
Content-Type: message/external-body;
|
|
name="BodyFormats.ps";
|
|
site="thumper.bellcore.com";
|
|
access-type=ANON-FTP;
|
|
directory="pub";
|
|
mode="image";
|
|
|
|
Content-type: application/postscript
|
|
|
|
--42
|
|
Content-Type: message/external-body;
|
|
name="/u/nsb/writing/rfcs/RFC-XXXX.ez";
|
|
site="thumper.bellcore.com";
|
|
access-type=AFS;
|
|
|
|
Content-type: application/x-ez
|
|
|
|
--42
|
|
Content-Type: message/external-body;
|
|
name="BodyFormats.txt";
|
|
site="thumper.bellcore.com";
|
|
access-type=ANON-FTP;
|
|
directory="pub";
|
|
|
|
Content-type: text/plain
|
|
|
|
--42--
|
|
|
|
The client can choose between postscript, ez, and plain text, and
|
|
retrieve the corresponding message body.
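
A minimal sketch of that client-side choice, in Python with only the standard `email` package. The `choose` helper and the preference list are illustrative assumptions, not part of any W3 client; the message is the example above, trimmed to two alternatives.

```python
from email import message_from_string

RAW = """\
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=42

--42
Content-Type: message/external-body;
 name="BodyFormats.ps";
 site="thumper.bellcore.com";
 access-type=ANON-FTP;
 directory="pub";
 mode="image"

Content-type: application/postscript

--42
Content-Type: message/external-body;
 name="BodyFormats.txt";
 site="thumper.bellcore.com";
 access-type=ANON-FTP;
 directory="pub"

Content-type: text/plain

--42--
"""


def choose(raw, preferred):
    """Return (content type, access parameters) for the first preferred format on offer."""
    message = message_from_string(raw)
    offers = {}
    for part in message.get_payload():
        if part.get_content_type() != "message/external-body":
            continue
        # The body of an external-body part is itself a small message whose
        # headers describe the data that lives elsewhere.
        described = part.get_payload(0)
        offers[described.get_content_type()] = {
            "access-type": part.get_param("access-type"),
            "site": part.get_param("site"),
            "directory": part.get_param("directory"),
            "name": part.get_param("name"),
        }
    for content_type in preferred:
        if content_type in offers:
            return content_type, offers[content_type]
    return None


# A dumb-terminal client might rank plain text ahead of PostScript.
chosen_type, access = choose(RAW, ["text/plain", "application/postscript"])
```

Only the headers are inspected here; fetching the chosen body by FTP is left to the client.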
|
|
|
|
|
|
The question then becomes: how do these systems interoperate?
|
|
By making information available as multipart/X-HYPERTEXT MIME
|
|
messages.
|
|
|
|
The WWW client interfaces to the other systems by defining
|
|
"addressing schemes" and implementing the various protocols
|
|
and translating the data into HTML. Gopher has a similar
|
|
typing scheme -- one character is reserved to indicate
|
|
the access type and the data type. WAIS clients have yet
|
|
another method of resolving types, though they only support
|
|
one protocol. The NewsGrazer application has its own
|
|
encapsulation mechanism. This is becoming a mess.
|
|
|
|
In the short term, global hypertext viewers will have to support
|
|
the access-type and content-type of each system with which they
|
|
interoperate (so we have X-WAIS, X-HTTP, X-GOPHER, X-NNTP, as well as
|
|
X-WAIS-SRC, X-HTML, X-GOPHER-1 through X-GOPHER-9).
|
|
|
|
Some of the access types will become standard, and some will die out.
|
|
But all the data types should be encapsulated in MIME messages. Any
|
|
data that has machine-readable pointers to other data should be made
|
|
into a multipart/X-HYPERTEXT message. For example, a WAIS question
|
|
should have attachments for each of the result documents (the content
|
|
part can stay application/x-wais-question, or it could be converted to
|
|
a text type, or both), at least in the case where those documents are
|
|
available by some standard access method. [I wrote a perl script that
|
|
will change an HTML document into a MIME message with attachments.]
|
|
|
|
Leaf documents, i.e. documents with no external links, can stay in
|
|
single part types. For example, plain text files become MIME messages by simply
|
|
adding a blank line at the beginning (to separate the headers (none)
|
|
from the body).
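
As a rough illustration of the blank-line trick (Python's standard `email` parser stands in here for any MIME reader, and the helper name is made up for the example):

```python
from email import message_from_string


def plain_text_as_mime(text):
    """Treat a bare text file as a MIME message by prefixing an empty header block."""
    return message_from_string("\n" + text)


leaf = plain_text_as_mime("Hello, web.\n")
# With no headers at all, the parser falls back to the default type, text/plain.
leaf_type = leaf.get_content_type()
leaf_body = leaf.get_payload()
```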
|
|
|
|
Under this model, a mail message can point to a news article
|
|
which references a WAIS document which contains several drawings
|
|
and pointers to several more available by FTP, and a user could
|
|
just point-and-click between them. The only need for
|
|
protocols like gopher and HTTP is to encapsulate data that's not
|
|
already MIME compliant.
|
|
|
|
This is clearly a pipe dream, but it's the kind of thing we can work
|
|
towards today.
|
|
|
|
Dan
|
|
|
|
|
|
======================================================================
|
|
|
|
From: mitra@pandora.sf.ca.us ()
|
|
To: connolly@pixel.convex.com, www-talk@nxoc01.cern.ch, wais-talk@think.com
|
|
Subject: MIME for global hypertext
|
|
Date: Mon, 8 Jun 92 13:11:15 PDT
|
|
|
|
Dan,
|
|
|
|
Thanks for that proposal. I must admit to not having read the MIME RFC,
|
|
being mostly concerned with text rather than multimedia, so I wasn't
|
|
aware of the hypertext implications of it.
|
|
|
|
My question is on a fairly minor point of your document: you mention that
|
|
a MIME document typically consists of a content and then the pointers,
|
|
with the hypertext links being references to the pointers. In WAIS, it
|
|
is quite possible to return part of a document (by byte position), and
|
|
if the pointers are part of the document itself then they may not be
|
|
returned at the time the user chooses to try and follow a link?
|
|
|
|
My concerns are around doing these things for users on low-speed (2400 baud)
|
|
modems. For them, protocols need to be easy to handle at slow speed, and
|
|
need to be meaningful BEFORE the whole document has been received. As the
|
|
Internet extends out to more and more users beyond the high-speed links
|
|
currently assumed, the need for protocol designers to consider those users
|
|
becomes more important.
|
|
|
|
- Mitra
|
|
------------------------------------------------------------------
|
|
Mitra - technical director, Pandora Systems
|
|
mitra@pandora.sf.ca.us
|
|
|
|
|
|
======================================================================
|
|
|
|
From: Dan Connolly <connolly@pixel.convex.com>
|
|
To: mitra@pandora.sf.ca.us ()
|
|
Cc: wais-talk@think.com, www-talk@nxoc01.cern.ch
|
|
Subject: Re: MIME for global hypertext
|
|
Date: Mon, 08 Jun 92 15:50:17 CDT
|
|
|
|
|
|
|
|
>My question is on a fairly minor point of your document, you mention that
|
|
>a MIME document typically consists of a content and then the pointers,
|
|
>with the hypertext links being references to the pointers.
|
|
|
|
Well, this is not typical, but it's the model I'm proposing for
|
|
hypertext. Typically MIME message bodies are either single part
|
|
text/image/audio, or multipart. The standard multipart types
|
|
are mixed, meaning "show these one after the other," parallel,
|
|
meaning "show these at the same time," or alternative, meaning
|
|
"these all represent the same info. Take your pick."
|
|
|
|
The "content and then list of pointers [or attachments]" model
|
|
is my own proposed format for hypertext.
|
|
|
|
> In Wais, it
|
|
>is quite possible to return part of a document (by byte position), and
|
|
>if the pointers are part of the document itself then they may not be
|
|
>returned at the time the user chooses to try and follow a link?
|
|
>
|
|
I would suggest that the WAIS server interpret the byte positions
|
|
as offsets into the content part of the hypertext. So the structure
|
|
remains intact. Byte offsets into a MIME multipart message
|
|
don't mean much. Transport systems may mess with the headers and
|
|
trailing whitespace on body lines. Line offsets may be meaningful
|
|
inside text body parts, as long as none of the lines have to be
|
|
split due to line length constraints.
|
|
|
|
Keep in mind that this multipart structure is only necessary for
|
|
hypertext (i.e. contains links) and hypermedia (i.e. contains
|
|
multimedia attachments) documents.
|
|
|
|
Traditional documents can be simple single part bodies. For example,
|
|
a plain text file starting with a new-line will be interpreted as
|
|
a body part with no headers, which defaults to the type
|
|
"text/plain; charset=US-ASCII", i.e. plain old text.
|
|
|
|
>My concerns are around doing these things for users on low-speed (2400 baud)
|
|
>modems....
|
|
|
|
======================================================================
|
|
|
|
From: connolly@pixel.convex.com (Dan Connolly)
|
|
To: www-talk@nxoc01.cern.ch
|
|
Cc: enag@ifi.uio.no
|
|
Cc:
|
|
Subject: Re: using NOTATIONs inline
|
|
Date: Mon, 8 Jun 92 00:17:48 -0500
|
|
|
|
In article <23177A@erik.naggum.no> you write:
|
|
>Dan Connolly <connolly@convex.com> writes:
|
|
>|
|
|
>| The WWW group is attempting to define a multimedia interchange
|
|
>| format called HTML. . . .
|
|
>
|
|
>Why not use HyTime?
|
|
>
|
|
Erik:
|
|
Partly because of ignorance (we've heard of HyTime, but we don't
|
|
know the details). I'd expect a HyTime engine to be quite a bit
|
|
of work to implement. And partly because, as I understand it, HyTime
|
|
doesn't go as far as to prescribe a DTD. The WWW project needs
|
|
one particular language, not a whole architecture.
|
|
|
|
I'd certainly like to know more about HyTime's techniques for addressing
|
|
documents, esp. elements of documents.
|
|
|
|
Now for the WWW gang:
|
|
>:
|
|
>| That is, is it possible to put an arbitrary 8 bit binary stream
|
|
>| _inside_ an SGML document? My guess is: no. But if we use
|
|
>| CDATA, can we include anything that doesn't contain the closing
|
|
>| tag in full?
|
|
>
|
|
>If you by "the closing tag in full" mean the entire end-tag, complete
|
|
>with etago, generic identifier, and tagc, as in "</image>", this is not
|
|
>the way SGML does it. CDATA and SDATA are terminated by an etago
|
|
>"delimiter-in-context", which is an etago (end-tag open, "</") delimiter
|
|
>followed by a name start character, or a grpo (group open, "(")
|
|
>delimiter if concurrent document types are allowed. In the reference
|
|
>concrete syntax, this means that the regular expression "</[(a-z]"
|
|
>matches the end of CDATA and SDATA elements.
|
|
>
|
|
>You can also use marked sections, with a CDATA status keyword, in which
|
|
>case the CDATA is terminated by the mse delimiter (marked section end,
|
|
>"]]>").
|
|
>
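
The termination rule Erik describes can be tried out in a few lines of Python. This is only a sketch: it assumes the reference concrete syntax, treats name start characters as the letters a-z in either case (the case-insensitive flag is my assumption), and the function name is illustrative.

```python
import re

# etago ("</") followed by a name start character, or by grpo ("(")
# when concurrent document types are allowed.
ETAGO_IN_CONTEXT = re.compile(r"</[(a-z]", re.IGNORECASE)


def cdata_end(text):
    """Return the offset at which CDATA content would be terminated, or -1."""
    match = ETAGO_IN_CONTEXT.search(text)
    return match.start() if match else -1


# "</ " is not a delimiter in context, so the data runs on until "</em".
sample = "1 </ 2 </em> rest"
end = cdata_end(sample)
```

Note that the match does not care which element is being closed, which is why arbitrary binary data cannot safely sit inside a CDATA element.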
|
|
>:
|
|
>| Someone made the point that an SGML document is only allowed to
|
|
>| include SGML characters as specified by the SGML declaration, and if
|
|
>| we're going to use the default SGML declaration, we have to stick to
|
|
>| the characters blessed by it.
|
|
>
|
|
>Blessed and blessed. The SGML declaration is supposed to reflect the
|
|
>reality of the document, not enforce arbitrary limits on them. So you
|
|
>write an SGML declaration which fits the document.
|
|
>
|
|
>| That's not my understanding. I thought that inside CDATA (or SDATA,
|
|
>| I think) you could put _anything_ but the closing tag in full.
|
|
>
|
|
>As said above, the etago delimiter-in-context terminates the data,
|
|
>regardless of whether it's a legal end-tag in that context.
|
|
>
|
|
>You should be aware that the SGML parser will parse the contents of the
|
|
>"binary" content, and ignore record start, and treat record ends
|
|
>differently from other characters. In addition, it's an error for an SGML
|
|
>entity to contain characters with any of the numbers listed in the
|
|
>SHUNCHAR part of the SYNTAX declaration. This is _not_ what you want
|
|
>with binary data.
|
|
>
|
|
>| What's the scoop? Do we have to use external entities for raw data?
|
|
>
|
|
>Yes. An external entity that is not an SGML text entity requires a
|
|
>notation identifier, so you only need to list the entities in the DTD,
|
|
>with notation, and refer to them by name in the document instance.
|
|
>
|
|
>If this is not satisfactory, you should declare the objects to be CDATA,
|
|
>and use a binary to text-only transformation scheme. There are several
|
|
>such schemes. Among them, base64 is the preferred encoding in my view,
|
|
>since it's available as part of the new Multipurpose Internet Mail
|
|
>Extensions (MIME) RFC-to-be. (The latest draft is available for
|
|
>anonymous FTP as ftp.ifi.uio.no:/pub/SGML/MIME.6.ps and MIME.6.txt for
|
|
>two weeks from today. Section 5.2 which concerns the base64 encoding is
|
|
>also available as ftp.ifi.uio.no:/pub/SGML/base64.txt.) Transformation
|
|
>back to the binary form from the text-only form may be done on the fly
|
|
>by the application before sending the data to the notation interpreter.
|
|
>
|
|
My idea is to use MIME encodings, but put these attachments _outside_
|
|
the SGML text, in an attached (or external) body part.
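
A minimal sketch of that arrangement, using Python's standard `email` package. The `wrap` helper, the `text/x-html` subtype, the `multipart/mixed` wrapper and the Content-ID value are all placeholders chosen for illustration, not anything the MIME draft or the W3 code defines.

```python
from email import message_from_string
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def wrap(sgml_text, blob, content_id):
    """Carry the SGML text and one binary attachment as separate body parts."""
    outer = MIMEMultipart("mixed")
    # The SGML stays plain 7-bit text; text/x-html is an unregistered placeholder.
    outer.attach(MIMEText(sgml_text, "x-html", "us-ascii"))
    # The raw bytes travel outside the SGML, base64-encoded by default.
    attachment = MIMEApplication(blob)
    attachment["Content-ID"] = "<%s>" % content_id
    outer.attach(attachment)
    return outer


binary = bytes(range(256))
message = wrap("<P>See figure one.</P>\n", binary, "fig1@example.org")

# Round trip through the textual form, as a mail transport would.
parsed = message_from_string(message.as_string())
text_part, binary_part = parsed.get_payload()
recovered = binary_part.get_payload(decode=True)
```

The SGML parser never sees the 8-bit data; it only needs a way to name the attachment, here the Content-ID.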
|
|
|
|
>In addition to being much easier to deal with in SGML, this also makes
|
|
>SGML documents containing such content robust with respect to file
|
|
>transfer, etc.
|
|
>
|
|
>Hope this helps,
|
|
></Erik>
|
|
|
|
Thanks. Mostly it confirms my suspicions, but it should also provide
|
|
a somewhat authoritative answer (no references to ISO 8879 here :-)
|
|
to the WWW project.
|
|
|
|
>--
|
|
>Erik Naggum | +47-295-0313 | ISO 8879 SGML | Memento,
|
|
>Naggum Software | "fuzzface" | ISO 10744 HyTime | terrigena.
|
|
>Boks 1570, Vika | <erik@naggum.no> | JTC 1/SC 18/WG 8 | Memento,
|
|
>0118 OSLO, NORWAY | <enag@ifi.uio.no> | SGML UG SIGhyper | vita brevis.
|
|
|
|
|
|
|
|
======================================================================
|
|
|
|
From: davis@willow.tc.cornell.edu (Jim Davis)
|
|
To: www-talk@nxoc01.cern.ch
|
|
Subject: HTML terseness/verbosity
|
|
Date: Mon, 8 Jun 92 09:28:20 EDT
|
|
|
|
Re the recent comments on terseness of UDIs and the
|
|
extra verbosity in Dan Connolly's proposal to
|
|
use MIME for WWW documents:
|
|
|
|
My understanding is that nobody should have to type
|
|
"naked" SGML (or HTML or Mime-language) anyway.
|
|
We should have programs like WYSIWYG editors
|
|
manipulating the markup for us. (Now of course
|
|
at present we do have to type HTML, at least I do
|
|
here, but hopefully this will not persist). If
|
|
that's right, then the more explicit and simple
|
|
the document structure is, the easier to parse
|
|
and manipulate by programs, the better we are.
|
|
|
|
One thing I like about Dan's proposal - it makes
|
|
it possible to collect a hyperdocument into a single
|
|
file (by embedding the docs within one MIME file)
|
|
which will make transporting easier.
|
|
|
|
======================================================================
|
|
|
|
From: timbl@zippy.lcs.mit.edu (Tim Berners-Lee)
|
|
To: connolly@pixel.convex.com, enag@ifi.uio.no, www-talk@nxoc01.cern.ch
|
|
Cc: timbl@zippy.lcs.mit.edu
|
|
Subject: MIME, SGML, UDIs, HTML and W3
|
|
Date: Thu, 11 Jun 92 12:22:56 -0400
|
|
|
|
I have printed off the recent discussion on the new
|
|
HTTP, HTML and MIME and UDIs and done what I can
|
|
to disentangle it all in my mind. I will reply
|
|
in one message, because many of the points are linked.
|
|
I know this should be hypertext, with references but
|
|
(a) I am away from home and (b) we don't yet have a
|
|
universal mail/news archive server running to link to.
|
|
|
|
HTTP and HTML
|
|
|
|
First of all, Jean-Francois <jfg@dxcern.cern.ch>
|
|
points out very properly that the enhanced HTTP
|
|
protocol and the enhanced HTML spec are quite
|
|
separate things, and should be specified separately.
|
|
I agree wholeheartedly about all this, and
|
|
I apologize for muddling the levels up till now.
|
|
|
|
(As a small aside, I would point out that whereas a
|
|
HTERR file is not very useful, a HTFWD file IS.
|
|
It is like a hypertext soft link. But I am happy to
|
|
leave that as a separate type of file. It should
|
|
certainly get a different extension so that it gets a
|
|
different icon)
|
|
|
|
HTTP: SGML vs ASN/1
|
|
|
|
Let's look at the HTTP protocol first. Carl <barker@cernnext.cern.ch>
|
|
is mapping out the requirements for this, and assuming that SGML
|
|
would be a reasonable representation for it in practice.
|
|
And so it is. When the requirements are clear,
|
|
it would certainly be interesting to look at mapping them
|
|
onto a z39.50 - style ASN/1 implementation. This would
|
|
be useful for two reasons. First, the comparison would
|
|
point out to us things in z39.50 which we might not have thought of
|
|
which would be useful for HTTP. Second, the comparison might give
|
|
a nice short or at least well-defined list of things which the WAIS
|
|
guys might like to take into account in the next version
|
|
of their protocol. (I demod W3 to Brewster who hadn't
|
|
seen it before live, and was very keen that WAIS and W3
|
|
should merge, changing the WAIS protocol if necessary.)
|
|
|
|
There is no reason why we shouldn't try both protocols.
|
|
If they map well onto each other, it's just a question
|
|
of having two separate parsers at the low level, building
|
|
the same internal structures.
|
|
|
|
When we're talking about an SGML representation,
|
|
and describe a file to come later down the link,
|
|
I don't think we have to use the NOTATION= attribute with a notation
|
|
type, because we won't in fact be talking about
|
|
the notation of an SGML element.
|
|
The format in this case is not something which the SGML
|
|
parser is aware of.
|
|
|
|
I must admit I was disappointed to learn that SGML
|
|
didn't allow for any way of including 8 bit data. Thanks, Erik
|
|
<enag@ifi.uio.no> for your explanations.
|
|
|
|
|
|
MIME and SGML
|
|
|
|
Dan <connolly@pixel.convex.com> rightly points out
|
|
the relevance of the coming MIME standards. There
|
|
are several things which we must separate here, though:
|
|
|
|
1. The MIME classification of data formats
|
|
2. The MIME format for multi-part messages
|
|
3. The MIME format for rich text.
|
|
4. The MIME format for external document addresses (MIME UDIs)
|
|
|
|
1. MIME classification of data formats
|
|
|
|
We must do the same disentangling job which JF did
|
|
on HTML to MIME.
|
|
|
|
First of all, the MIME job of classifying data formats
|
|
is a useful job which is ideally done by just one
|
|
bunch of people. There has been some suggestion that
|
|
the MIME classifications are not well enough defined,
|
|
but they seem to be the best effort yet and one can only
|
|
assume they will evolve in the right direction. So I'd
|
|
back the use of these for W3.
|
|
|
|
|
|
2. The MIME format for multi-part messages
|
|
|
|
This is necessary for sending a multi-part
|
|
document over a mail link. We have to ask ourselves
|
|
whether it is reasonable to use over a binary link.
|
|
Personally, my initial impression is that the MIME
|
|
stuff, using as it does terminators such as
|
|
--xxx-- separated by blank lines, looks more horrible
|
|
to work with in this respect than SGML! Still we have
|
|
the problem of restrictions on the content:
|
|
Must not contain delimiters, limited 7 bit character set,
|
|
line orientation, in fact all the things which email
|
|
carries as a restriction. This is really taking on board
|
|
a legacy of all the mail which has evolved over the years.
|
|
Do we need that for our new ultra-fast hypertext access
|
|
protocol?
|
|
|
|
[Compare the MIME format with the rather cleaner NeXT
|
|
Mail format which is as far as I understand simply
|
|
a uuencoded compressed tar file of all the bits, where
|
|
uuencoding is designed as an optimal way of getting over
|
|
mail transport restrictions, compress does what it says
|
|
and tar is a multipart wrapper designed for that only. Not
|
|
standard outside unix, perhaps, but cleaner in that the
|
|
mail formatting is done at the last minute and doesn't
|
|
affect the other operations]
|
|
|
|
Of course, with HTTP2, multipart/alternative shouldn't
|
|
be needed.
|
|
|
|
Multipart for hypertext?
|
|
|
|
Now, Dan not only suggests the use of this for
|
|
multipart messages, but also suggests that a hypertext
|
|
document should necessarily contain many parts,
|
|
one in SGML and one for each link as a MIME external document.
|
|
This means that an SGML hypertext document can never stand
|
|
on its own! An SGML parser will always need to have
|
|
a MIME parser sitting just outside. I don't like
|
|
this: I feel we have to separate these two things.
|
|
|
|
Suppose that an SGML document does want to
|
|
be sent in a MIME message and does want to
|
|
refer to other parts of that MIME message. In that case,
|
|
it seems reasonable to have a format for that.
|
|
However, when an SGML document is seen by itself, and
|
|
refers to a news message for example, then there is
|
|
no reason for it not to be able to contain a
|
|
complete reference within itself.
|
|
|
|
When SGML documents include other files, then
|
|
the SYSTEM value is typically a file name.
|
|
It is a reference to something outside. The
|
|
precedent is set that SGML documents are allowed
|
|
to refer to things outside.
|
|
|
|
I think part of your objection, Dan, is based on
|
|
a dislike of the UDI syntax -- which I'll come to later.
|
|
|
|
3. The MIME format for rich text.
|
|
|
|
Here, I am not so impressed. Basically, the MIME
|
|
people are at the same level that we were before we started
|
|
this cleanup, that they have SGML-LIKE stuff which isn't SGML.
|
|
As it's not difficult to make it SGML, they should do that.
|
|
Comparing MIME's rich text and HTML, I see that
|
|
we lack the character formatting attributes BOLD and ITALIC
|
|
but on the other hand I feel that our treatment of
|
|
logical heading levels and other structures is much more powerful
|
|
and has turned out to provide more flexible formatting
|
|
on different platforms than explicit semi-references
|
|
to font sizes. This is borne out by all the systems which
|
|
use named styles in preference to explicit formatting,
|
|
LaTeX or other macros instead of TeX, etc etc.
|
|
|
|
So technically, HTML has some things to give MIME's rich
|
|
text. Are the MIME people still open to additions?
|
|
If not, I would suggest we add BOLD and ITALIC (or
|
|
two emphasis styles for characters), and keep HTML
|
|
separate from MIME's rich text, proposing it as a
|
|
MIME text standard.
|
|
(HP0 and HP1 were in the HTML spec but as unimplemented)
|
|
|
|
4. The MIME format for external document addresses (MIME UDIs)
|
|
|
|
As Ed <emv@msen.com> says, this is a bit of a non-issue,
|
|
as MIME addresses and current style UDIs map onto
|
|
each other. However, we have to agree on a "concrete
|
|
syntax" (or two... :-) in the end.
|
|
|
|
It's like the difference between an x400 style mail address
|
|
generated from an internet address, and that internet address.
|
|
Which do you prefer
|
|
|
|
timbl@zippy.lcs.mit.edu
|
|
|
|
where the sections of the domain name are defined
|
|
to have no semantics at all, or
|
|
|
|
S=timbl; HO=zippy; OU=lcs; O=MIT; SECTOR=edu
|
|
|
|
(this is not real x400 - don't use it!) or
|
|
|
|
user=timbl
|
|
host=zippy
|
|
group=lcs
|
|
organization=mit
|
|
sector=education
|
|
|
|
You say, Dan, that you "don't think [UDIs] work".
|
|
Do you mean people don't use them in all correspondence?
|
|
Well, what DO they use? They use ange-ftp addresses
|
|
for FTP (like info.cern.ch:/pub/www/doc/*.ps),
|
|
which are even more terse than UDIs! They use news
|
|
message-ids which are UDIs.
|
|
|
|
Let me say that I personally don't much care about the
|
|
arbitrary punctuation. There are a few things, though,
|
|
which are important:
|
|
|
|
- The thing should be printable 7-bit ASCII.
|
|
|
|
Unlike arbitrary document formats,
|
|
UDIs must be sendable in the mail
|
|
|
|
- White space should not be significant. I would
|
|
accept the presence of some arbitrary white space
|
|
as a delimiter, but one cannot distinguish between
|
|
different forms and quantities of white space.
|
|
This is because things get wrapped and unwrapped.
|
|
|
|
Dan, you object to UDIs because they don't
|
|
contain white space. But that is purely so that
|
|
you CAN wrap them onto several lines and still
|
|
recuperate them. You can put white space
|
|
in but it shouldn't mean anything. (This is not possible
|
|
in W3 as is but it is in the UDI document)
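
To make the wrapping rule concrete, here is a small Python sketch. It assumes the rule is simply "discard all white space before interpreting the UDI"; the helper name and the sample address are only for illustration.

```python
def unwrap_udi(wrapped):
    """Remove every run of white space so a folded UDI equals the original."""
    return "".join(wrapped.split())


# The same address, folded by a mailer or a typesetter onto two lines.
folded = "http://info.cern.ch/hypertext/\n    WWW/TheProject.html"
restored = unwrap_udi(folded)
```

Because white space carries no meaning, any amount of re-wrapping along the way is harmless.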
|
|
|
|
I don't see why you say they
|
|
can't be put as an SGML attribute. They are just
|
|
text strings. They will be quoted of course
|
|
(Yes, I know the old NeXT browser doesn't quote them)
|
|
Is that not allowed? What are the problem characters?
|
|
If there are SGML problem characters in the UDI spec, they
|
|
probably are ruled out of SGML for a reason.
|
|
|
|
(I recently saw a galley proof of an article in which
|
|
our mail address had been hyphenated! UDIs must be
|
|
squeezable into 2 inch columns.)
|
|
|
|
There is a semantic difference between a tagged
|
|
list and a punctuation-divided set, and that is that
|
|
the former has defined semantics but the latter doesn't and
|
|
can therefore be extended more easily. I suggest that tagging
|
|
could be used for the four bits of an address
|
|
that must be separable by all sides, which are
|
|
limited in number (4). Within those bits, the string should
|
|
be transparent as the protocol does not require
|
|
every party to understand the innards.
|
|
|
|
The bits are
|
|
                   MIME         Used by

name space:        ACCESS       used by client
server details:    HOST, PORT   used by client, protocol-dependent
local doc id:      PATH         used by server only
anchor id:         (none)       used by presentation application only
|
|
|
|
It seems useful to maintain the ability to work out which
|
|
bits are seen by whom.
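
The four-way split can be sketched with Python's standard `urllib.parse`. This is an illustration only: it uses present-day URL syntax as a stand-in for the W3 UDI punctuation, and the function name and sample address are made up.

```python
from urllib.parse import urlsplit


def address_bits(udi):
    """Split an address into the four parts that different parties look at."""
    parts = urlsplit(udi)
    return {
        "name space": parts.scheme,                      # client picks the protocol
        "server details": (parts.hostname, parts.port),  # client, protocol-dependent
        "local doc id": parts.path,                      # opaque to all but the server
        "anchor id": parts.fragment,                     # presentation application only
    }


bits = address_bits("http://info.cern.ch:80/hypertext/WWW/TheProject.html#intro")
```

Nothing inside the path or the anchor needs to be understood by the client's transport code, which is the transparency argued for above.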
|
|
|
|
I only used punctuation to separate these parts in the W3 UDI
|
|
because people like internet addresses and mail addresses
|
|
and filenames and telephone numbers and message-ids and
|
|
room numbers and zip codes which don't have tags and
|
|
do make do with punctuation. If the groundswell of
|
|
opinion on this list is that tags are better, then
|
|
let's use tags!
|
|
|
|
Whatever we use, it should be as quotable in an SGML
|
|
attribute as in a MIME external reference as in a
|
|
scribbled note or a link-pasteboard or whatever.
|
|
(The U is for Universal, NOT Unique!)
|
|
|
|
PHILOSOPHY
|
|
|
|
In the W3 world, the model is of a dynamic world of
|
|
documents which generally have some "home"
|
|
(or several), which can be found using sufficient
|
|
intelligence and the help of one's friends given the UDI.
|
|
|
|
A mail message has no home, and so in principle the parts
|
|
of it have no home. When a hypertext multipart message
|
|
(really consisting of multiple hypertext documents)
|
|
has links between its parts they refer to each other
|
|
within a completely isolated context.
|
|
|
|
There are now two possibilities when the message is in fact
|
|
archived and made readable. One is we say that the parts
|
|
are then addressed as parts of the message, wherever it
|
|
may be. The other is to say that the parts of the message
|
|
are very likely things which had some original home.
|
|
In that case, the message is just giving the receiver
|
|
a copy to save him the (perhaps insurmountable) trouble
|
|
of retrieving it. In this case the parts should be
|
|
identified with their original UDIs so that the
|
|
receiver is not confused with multiple documents which
|
|
are in fact the same thing.
|
|
|
|
|
|
I think that's all the comments I have on what I've read so far..
|
|
|
|
Tim
|
|
________________________________________________________________
|
|
Tim Berners-Lee
|
|
World-Wide Web initiative
|
|
CERN, 1211 Geneva 23, Switzerland timbl@info.cern.ch
|
|
Visiting MIT: NE43-513, (617)234 6016 timbl@zippy.lcs.mit.edu
|
|
|
|
======================================================================
|
|
|
|
From: Dan Connolly <connolly@pixel.convex.com>
|
|
To: timbl@zippy.lcs.mit.edu (Tim Berners-Lee)
|
|
Cc: enag@ifi.uio.no, www-talk@nxoc01.cern.ch
|
|
Subject: Re: MIME, SGML, UDIs, HTML and W3
|
|
Date: Thu, 11 Jun 92 20:31:08 CDT
|
|
|
|
|
|
Now my comments on your comments:
|
|
|
|
>There is no reason why we shouldn't try both protocols.
|
|
>If they map well onto each other, it's just a question
|
|
>of having two separate parsers at the low level, building
|
|
>the same internal structures.
|
|
>
|
|
On the other hand, I'd like to keep a telnet based protocol
|
|
around -- maybe gopher is good enough.
|
|
|
|
>When we're talking about an SGML representation,
|
|
>and describe a file to come later down the link,
|
|
>I don't think we have to use the NOTATION= attribute with a notation
|
|
>type, because we won't in fact be talking about
|
|
>the notation of an SGML element.
|
|
>The format in this case is not something which the SGML
|
|
>parser is aware of.
|
|
>
|
|
I don't believe this is true. From the horse's mouth (Erik Naggum, that is):
|
|
----
|
|
| What's the scoop? Do we have to use external entities for raw data?
|
|
|
|
Yes. An external entity that is not an SGML text entity requires a
|
|
notation identifier, so you only need to list the entities in the DTD,
|
|
with notation, and refer to them by name in the document instance.
|
|
|
|
----
|
|
|
|
>1. MIME classification of data formats
|
|
>
|
|
> So I'd
|
|
> back the use of these for W3.
|
|
>
|
|
Yeah!!
|
|
|
|
>
|
|
>2. The MIME format for multi-part messages
|
|
>
|
|
> This is necessary for sending a multi-part
|
|
> document over a mail link. We have to ask ourselves
|
|
> whether it is reasonable to use over a binary link.
|
|
> Personally, my initial impression is that the MIME
|
|
> stuff, using as it does terminators such as
|
|
> --xxx-- separated by blank lines, looks more horrible
|
|
> to work with in this respect than SGML!
|
|
|
|
The algorithm to separate a MIME multipart message into its
|
|
parts is simply: search the data stream for CRLF--boundary CRLF (the last one is CRLF--boundary--CRLF).
|
|
It can be done by a finite state machine. Even the simplest
|
|
SGML documents require a pushdown automaton to parse.
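
A sketch of such a finite state machine in Python, with three states and no stack. It works on lines rather than raw CRLF-terminated bytes and ignores header parsing inside each part; both are simplifications of mine, and the function name is illustrative.

```python
def split_multipart(body, boundary):
    """Split a multipart body into its parts by scanning for delimiter lines."""
    delimiter = "--" + boundary
    closing = delimiter + "--"
    parts = []
    current = []
    state = "preamble"  # preamble -> part -> done
    for line in body.splitlines():
        if state == "done":
            break  # everything after the closing delimiter is epilogue
        marker = line.rstrip()  # transports may add trailing white space
        if marker == closing or marker == delimiter:
            if state == "part":
                parts.append("\n".join(current))
            current = []
            state = "done" if marker == closing else "part"
        elif state == "part":
            current.append(line)
    return parts


body = "preamble\n--42\nfirst\n--42\nsecond line 1\nsecond line 2\n--42--\nepilogue\n"
pieces = split_multipart(body, "42")
```

The state never depends on nesting depth, which is the contrast with SGML drawn above; a nested multipart is handled by running the same scan again on a part with its own boundary.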
|
|
|
|
> Still we have
|
|
> the problem of restrictions on the content:
|
|
> Must not contain delimiters, limited 7 bit character set,
|
|
> line orientation, in fact all the things which email
|
|
> carries as a restriction. This is really taking on board
|
|
> a legacy of all the mail which has evolved over the years.
|
|
> Do we need that for our new ultra-fast hypertext access
|
|
> protocol?
|
|
>
|
|
|
|
No, we don't. MIME _allows_ transfer of data over 7 bit ASCII
|
|
channels, but it hardly requires it. The Content-transfer-encoding
|
|
can be:
|
|
7 bit (default): line oriented 7 bit data
|
|
8 bit : line oriented 8 bit data
|
|
binary : raw 8 bit data, no CRLF's required
|
|
base64: uuencode standardized
|
|
quoted-printable: text with escape sequences
|
|
|
|
The MIME standard explicitly supports expansion to 8 bit transport
|
|
mechanisms.
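
For the two encodings that squeeze 8 bit data through a 7 bit channel, a short Python sketch using the standard `base64` and `quopri` modules (the sample bytes are arbitrary):

```python
import base64
import quopri

# Arbitrary binary data: base64 turns every 3 bytes into 4 ASCII characters.
raw = bytes([0, 255, 128, 7])
as_base64 = base64.b64encode(raw)

# Mostly-ASCII text: quoted-printable escapes only the awkward bytes as =XX.
latin1_text = "café\n".encode("latin-1")
as_quoted_printable = quopri.encodestring(latin1_text)

# Both are reversible, so the receiver recovers the original 8 bit data.
raw_again = base64.b64decode(as_base64)
text_again = quopri.decodestring(as_quoted_printable)
```

With the binary or 8 bit encodings none of this is needed; the data is sent as is.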
|
|
|
|
> [Compare the MIME format with the rather cleaner NeXT
|
|
> Mail format which is as far as I understand simply
|
|
> a uuencoded compressed tar file of all the bits, where
|
|
> uuencoding is designed as an optimal way of getting over
|
|
> mail transport restrictions, compress does what it says
|
|
> and tar is a multipart wrapper designed for that only. Not
|
|
> standard outside unix, perhaps, but cleaner in that the
|
|
> mail formatting is done at the last minute and doesn't
|
|
> affect the other operations]
|
|
>
|
|
It was a requirement of MIME that the structure of the document
|
|
be accessible without decoding or uncompressing data, especially
|
|
since MIME messages are recursive and complex messages might
|
|
otherwise go through more than one encoding.
|
|
|
|
Compression was not addressed by the MIME standard, and uuencode
|
|
doesn't make it through some gateways.
|
|
|
|
> Of course, with HTTP2, multipart/alternative shouldn't
|
|
> be needed.
|
|
>
|
|
What does HTTP2 define that obviates the multipart/alternative
|
|
type?
|
|
|
|
|
|
> Multipart for hypertext?
|
|
>
|
|
> Now, Dan not only suggests the use of this for
|
|
> multipart messages, but also suggests that a hypertext
|
|
> document should necessarily contain many parts,
|
|
> one on SGML and one for each link as a MIME external document.
|
|
> This means that an SGML hypertext document can never stand
|
|
> on its own!
|
|
|
|
That's exactly the point. Anything besides text should be handled
|
|
as an external entity to be resolved by the parsing system. I just
|
|
suggested that a portable way to resolve SGML external entities
|
|
is to refer to MIME attachments.
|
|
|
|
> An SGML parser will always need to have
|
|
> a MIME parser sitting just outside. I don't like
|
|
> this: I feel we have to separate these two things.
|
|
>
|
|
Well, it has to have something sitting outside. The SGML parsers
|
|
I've seen resolve system entities using the file system. I proposed
|
|
we use a MIME message like a mini file system, with links to
|
|
other file systems.
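
A minimal sketch of the "mini file system" idea, in Python. It assumes
that each MIME part is named by a Content-ID header and that an SGML
system identifier is simply that name; both choices, the sample
message, and the entity_table() and resolve() helpers are hypothetical
illustrations, not anything settled in this discussion. Links out to
other file systems are not covered.

```python
from email import message_from_string

# Hypothetical message: one text part and one base64-encoded attachment.
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="b"

--b
Content-Type: text/plain
Content-ID: <doc>

See the figure.
--b
Content-Type: application/octet-stream
Content-ID: <fig1>
Content-Transfer-Encoding: base64

R0lGODlh
--b--
"""


def entity_table(msg):
    """Map each part's Content-ID (angle brackets removed) to the part."""
    table = {}
    for part in msg.walk():
        cid = part.get("Content-ID")
        if cid:
            table[cid.strip().strip("<>")] = part
    return table


def resolve(table, system_id):
    """Return the decoded bytes of the attachment named by system_id."""
    part = table.get(system_id)
    if part is None:
        raise KeyError("no attachment named %r" % system_id)
    # decode=True undoes the Content-Transfer-Encoding (base64 here).
    return part.get_payload(decode=True)


table = entity_table(message_from_string(RAW))
figure = resolve(table, "fig1")
```

An entity manager built this way plays the role the file system plays
for the parsers mentioned above: the parser asks for a name and gets
bytes back.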
|
|
|
|
> Suppose that an SGML document does want to
|
|
> be sent in a MIME message and does want to
|
|
> refer to other parts of that MIME message. In that case,
|
|
> it seems reasonable to have a format for that.
|
|
> However, when an SGML document is seen by itself, and
|
|
> refers to a news message for example, then there is
|
|
> no reason for it not to be able to contain a
|
|
> complete reference within itself.
|
|
>
|
|
OK, I can see that we should be able to resolve the lexical
|
|
issues and put the whole UDI/MIME access specification inside
|
|
the SGML document.
|
|
|
|
But what about multimedia web nodes?
|
|
|
|
SGML describes text and references to other texts just fine.
|
|
But if we want a format that can include more than just
|
|
text, I don't think we should try to fit it _inside_ SGML.
|
|
|
|
I think SGML should be used to convey text and document
|
|
structure. But I still like the idea of wrapping it in
|
|
a MIME message for multimedia interoperability.
|
|
|
|
|
|
>3. The MIME format for rich text.
|
|
>
|
|
> Here, I am not so impressed.
|
|
Nor am I.
|
|
|
|
|
|
>4. The MIME format for external document addresses (MIME UDIs)
|
|
>
|
|
> As Ed <emv@msen.com> says, this is a bit of a non-issue,
|
|
> as MIME addresses and current style UDIs map onto
|
|
> each other. However, we have to agree on a "concrete
|
|
> syntax" (or two... :-) in the end.
|
|
>
|
|
Exactly. And why not the MIME concrete syntax?
|
|
|
|
> Let me say that I personally don't much care about the
|
|
> arbitrary punctuation. There are a few things, though,
|
|
> which are important:
|
|
>
|
|
> - The thing should be printable 7-bit ASCII.
|
|
>
|
|
MIME: check.
|
|
|
|
> Unlike arbitrary document formats,
|
|
> UDIs must be sendable in the mail
|
|
>
|
|
MIME: check.
|
|
|
|
> - White space should not be significant. I would
|
|
> accept the presence of some arbitrary white space
|
|
> as a delimiter, but one cannot distinguish between
|
|
> different forms and quantities of white space.
|
|
> This is because things get wrapped and unwrapped.
|
|
>
|
|
MIME: check.
|
|
|
|
> Dan, you object to UDIs because they don't
|
|
> contain white space. But that is purely so that
|
|
> you CAN wrap them onto several lines and still
|
|
> recuperate them. You can put white space
|
|
> in but it shouldn't mean anything. (This is not possible
|
|
> in W3 as is but it is in the UDI document)
|
|
>
|
|
I must not have read the UDI document closely. I certainly
|
|
got the impression that a UDI should look like one word
|
|
when "written on the back of an envelope."
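
The wrap-and-unwrap behaviour Tim describes can be sketched in a few
lines of Python. The helper names, the sample address and the default
width of 72 are assumptions made for the example; the sketch also
assumes a UDI carries no meaningful white space of its own, which is
the property under discussion.

```python
def wrap_udi(udi, width=72):
    """Fold a UDI onto lines of at most `width` characters."""
    return "\n".join(udi[i:i + width] for i in range(0, len(udi), width))


def unwrap_udi(wrapped):
    """Drop every run of white space, whatever its form or quantity."""
    return "".join(wrapped.split())


# Hypothetical long address, used only to exercise the two helpers.
udi = "http://example.org/hypertext/" + "segment/" * 20 + "page.html"
folded = wrap_udi(udi)
restored = unwrap_udi(folded)
```

Because unwrap_udi() treats spaces, tabs and newlines alike, a mailer
that re-wraps or indents the folded lines does not change the result.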
|
|
|
|
> I don't see why you say they
|
|
> can't be put as an SGML attribute. They are just
|
|
> text strings.
|
|
|
|
The WAIS UDIs are huge. An SGML declaration defines a maximum
|
|
for the length of an attribute value. The default value is ...
|
|
oh. ahem. it's 960. I think the MIME 72 character line length
|
|
is a little more restrictive than that :-)
|
|
|
|
> They will be quoted of course
|
|
> (Yes, I know the old NeXT browser doesn't quote them)
|
|
> Is that not allowed? What are the problem characters?
|
|
> If there are SGML problem characters in the UDI spec, they
|
|
> probably are ruled out of SGML for a reason.
|
|
>
|
|
Good question. These are the things we should research before
|
|
we go _any_ further implementing this stuff.
|
|
|
|
> Whatever we use, it should be as quotable in an SGML
|
|
> attribute as in a MIME external reference as in a
|
|
> scribbled note or a link-pasteboard or whatever.
|
|
> (The U is for Universal, NOT Unique!)
|
|
>
|
|
Here's an idea for a quoting strategy for the four parts: Either
|
|
a) it's a quoted string delimited by "" with \" allowed
|
|
in the middle, or
|
|
b) it's a base-64 representation of an arbitrary
|
|
binary stream.
|
|
Just an idea.
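
A minimal Python sketch of that two-form strategy. Several details are
assumptions added for the example and are not in the idea as stated: a
backslash is itself escaped as \\, printable ASCII values take form
(a) while everything else takes form (b), and a reader tells the two
forms apart by the leading double quote (which never occurs in
base-64 text).

```python
import base64


def encode_part(data):
    """Encode bytes as a quoted string if printable ASCII, else base-64."""
    if all(0x20 <= b < 0x7F for b in data):
        text = data.decode("ascii")
        # Escape backslashes first so the escapes added for '"' survive.
        text = text.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + text + '"'
    return base64.b64encode(data).decode("ascii")


def decode_part(text):
    """Invert encode_part(): a leading '"' selects the quoted form."""
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        out = []
        chars = iter(text[1:-1])
        for ch in chars:
            if ch == "\\":
                # Take the escaped character literally.
                ch = next(chars, "\\")
            out.append(ch)
        return "".join(out).encode("ascii")
    return base64.b64decode(text)


quoted = encode_part(b'say "hi"')
binary = encode_part(b"\x00\xff")
```

Both forms stay within printable 7-bit ASCII, which is the first
property asked for earlier in the message.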
|
|
|
|
I'm late for an appointment. Gotta go.
|
|
|
|
Dan
|
|
|
|
======================================================================
|
|
|