<HTML>
<HEAD>
<META name="RCS-Id" content="$Id: 9810xn.html,v 1.19 1998/09/29 15:30:39 connolly Exp $">
<TITLE>The XML Revolution (draft for Nature's Web Site)</TITLE>
</HEAD>
<BODY>
<H1>
The XML Revolution
</H1>
<ADDRESS>
by <A HREF="#Dan">Dan Connolly</A><BR>
draft of $Date: 1998/09/29 15:30:39 $
</ADDRESS>
<P>
If you have ever peeked with the "view source" option on your Web browser,
then you're familiar with Hypertext Markup Language
(<A HREF="../../MarkUp/">HTML</A>).
<P>
HTML was an overwhelming success because it fulfilled a dream that word
processors, despite their myriad features, don't <A HREF="#WWW92">[WWW92]</A>:
<BLOCKQUOTE>
Pick up your pen, mouse or favorite pointing device and press it on a reference
in this document - perhaps to the author's name, or organization, or some
related work. Suppose you are directly presented with the background material
- other papers, the author's coordinates, the organization's address and
its entire telephone directory. Suppose each of these documents has the same
property of being linked to other original documents all over the world.
You would have at your fingertips all you need to know about electronic
publishing, high-energy physics or for that matter Asian culture. If you
are reading this article on paper, you can only dream, but read on.
</BLOCKQUOTE>
<P>
Now that dream is a reality, and human communication is augmented by the
Web; that is, as long as the communication consists of a title, headings,
paragraphs, lists, tables, and forms.
<P>
What about all the other communications idioms and document types that we
routinely use to get our work, business, and play done?
<UL>
<LI>
Restaurant menus
<LI>
Theatre programs
<LI>
Meeting minutes with agenda items and actions
<LI>
Cheques, invoices, and purchase orders
<LI>
Calendars and project schedules
</UL>
<P>
Extensible Markup Language (<A HREF="../../XML/">XML</A>) is the evolutionary
successor to HTML, in "less is more" fashion. If you're thinking that XML
is all the stuff from HTML plus a few more things, think again. It's the
same pointy brackets, tags, and attributes; but when it comes to tag names,
the slate is wiped clean. XML is like HTML with the training wheels off.
<P>
Of course, you can imitate menus, programs and schedules with HTML, or you
can put pictures or facsimiles of their traditional printed form on the Web.
That's great because it allows you to share them with people all over the
planet instantly. But it doesn't invite the computer to help you manage them.
<P>
The bane of my existence is doing things that I know the computer could do
for me.
<P>
If the Web page with your personal calendar says you'll be in New York next
Thursday, and the page with your workgroup calendar says you'll be in London
all week, shouldn't the computer be able to warn you about the conflict?
And shouldn't it go ahead and ask you if it's OK to cancel your flight to
London and purchase this other ticket to New York?
<P>
As a medium for human communication, the Web has reached critical mass (I
won't go so far as to say it's mature; there's plenty of work to be done!),
but as a mechanism to exploit the power of computing in our everyday life,
the Web is in its infancy. The Web now allows us to communicate our problems
to one another faster than ever before, but does it really help us solve
them?
<P>
XML is so simple that it just might work: it just might revolutionize the
ability of people to conduct commerce, express themselves, and generally
get work done with computers and networks.
<P>
Web site designers are doing some amazing things, but they often re-invent
the wheel for any number of reasons. Order processing systems make a good
example: some web design shop, say <TT>mall.com</TT>, built one shopping-cart
system, but <TT>mousetraps.com</TT> can't use it, because
<UL>
<LI>
their infrastructure is Windows NT, and the <TT>mall.com</TT> system is based
on Unix, or
<LI>
one system is written in Perl and the other in Java, or perhaps
<LI>
the <TT>mousetraps.com</TT> folks were just too busy to discover that
<TT>mall.com</TT> had solved the problem, or
<LI>
the <TT>mall.com</TT> system is aimed at a million transactions per day and
requires thousands of dollars' worth of hardware and software, while the
<TT>mousetraps.com</TT> folks only expect a few orders a week and can only
afford a few hundred dollars, or
<LI>
<TT>mall.com</TT> doesn't care to share its technology with the community,
either because
<UL>
<LI>
they don't want to lose a competitive advantage, or
<LI>
they don't want to take on a support burden.
</UL>
</UL>
<P>
For all these reasons, it takes longer to develop effective web sites than
it should, and the community is looking for opportunities to share technologies
and resources.
<P>
At the lowest level, organizations like the World Wide Web Consortium
(<A HREF="http://www.w3.org/">W3C</A>), the Internet Engineering Task Force
(<A HREF="http://www.ietf.org/">IETF</A>) and the Object Management Group
(<A HREF="http://www.omg.org/">OMG</A>) are engaged in updating the transport
infrastructure, <A HREF="../../Protocols/">HTTP</A>, firstly to address some
of the design shortcomings that five years of experience have exposed, and secondly
to better integrate with modern software development. At the next level,
the software development community is pushing the Web down into the
infrastructure of operating systems and languages like Microsoft Windows, Perl,
and Java. The goal of all this low-level stuff is that it "just works," like
a light switch or a telephone.
<P>
But there's a twist: along with shipping your pages around, the computing
infrastructure should take every opportunity to read, understand, and act
on them. There's no reason to live with the status
quo <A HREF="#bosak97">[Bosak97]</A>:
<BLOCKQUOTE>
Hospitals have begun to offer the [home health care] agencies a solution
that goes something like this:
<OL>
<LI>
Log into the hospital's Web site.
<LI>
Become an authorized user.
<LI>
Access the patient's medical records using a Web browser.
<LI>
Print out the records from the browser.
<LI>
Manually key in the data from the printouts.
</OL>
<P>
The knowledgeable reader may smile at this "solution," but in fact this is
not a joke; this is an actual proposal from a large American hospital known
for its early adoption of advanced medical information systems.
</BLOCKQUOTE>
<P>
<EM>Manually key in the data</EM>? Can't the two systems be made to talk
to each other? Never mind the multibillion-dollar medical industry; how often
do you get a computer-generated bill, invoice, or airline ticket, and then
manually key the information into your computer to manage your schedule or
finances? Is this the best we can do? Not if the XML revolution succeeds.
<P>
Today, several major Web search services build big indexes. These are incredibly
useful, but they're also limited: they don't know the difference between
a book <EM>by</EM> Ben Franklin and a book <EM>about</EM> Ben Franklin, let
alone the difference between an African beetle and a Volkswagen Beetle.
<P>
The search services <EM>do</EM> know which part of your page is the title,
because the <TT>&lt;title&gt;</TT> tag in the HTML markup tells them. Why
not just add <TT>&lt;by&gt;</TT> and <TT>&lt;about&gt;</TT> and
<TT>&lt;genus&gt;</TT> and such tags to HTML? Because...
<UL>
<LI>
technically, it would produce a mess: HTML is hard enough to process now,
and if we make it harder, we reduce the chance that new tools will come along
and make the Web smarter.
<LI>
socially, it wouldn't work: the HTML specification is maintained by a small
group of experts who are trusted to Do The Right Thing on behalf of the
community; that small group doesn't have expertise in all subjects that may
be covered by Web pages, and if we added that expertise to the group, it
would be too large to function. It is much better to give everyone a tool
that they can easily adapt for their own particular needs.
</UL>
<P>
HTML was a critical first step, but it is, by design, a one-size-fits-all
solution; it works well when applied to its original domain of simple structured
documents with links, but doesn't work so well in all the other domains where
people want the Web to apply.
<P>
XML, like the Internet and the Web, is designed to facilitate a marketplace
of competing companies, innovative individuals, and organizations of all
sizes in between. <A HREF="http://www.w3.org/">W3C</A> is a consortium of
270+ member organizations committed to the growth of this marketplace, ensuring
interoperability and smooth evolution.
<P>
This decentralized marketplace is already at work: to automate exchange of
bills, statements, and payments, the banking and software heavyweights are
working on Open Financial Exchange
(<A HREF="http://www.oasis-open.org/cover/gen-apps.html#ofe#xml-ofe">OFX</A>);
meanwhile, to automate exchange of information about chemicals, their properties,
uses and suppliers, one researcher in Nottingham, Peter Murray-Rust, rolled
up his sleeves, and Chemical Markup Language
(<A HREF="http://www.oasis-open.org/cover/gen-apps.html#cml">CML</A>) was
born.
<P>
XML is intended to span this wide spectrum of application, and it has become
a strategic technology in W3C, where members are sharing resources to complement
HTML with XML-based technologies:
<UL>
<LI>
<A HREF="../../Math/">MathML</A>, for describing mathematics as a basis for
machine-to-machine communication.
<LI>
<A HREF="../../AudioVideo/#SMIL">SMIL</A>, for expressing media synchronization.
<LI>
<A HREF="../../RDF/">RDF</A>, for resource description, such as library-style
cataloging.
<LI>
<A HREF="../../P3P/">P3P</A>, to use XML and RDF so users can be informed,
in control, and make decisions based on their individual privacy preferences.
</UL>
<P>
XML by itself is just a simple text format; but together with all the ways
it's being used to share structured information, it's a revolution that promises
to make the Web a whole lot smarter.
<P>
<HR>
<H2>
<A name="r234lk">References</A>
</H2>
<DL>
<DT>
<A NAME="WWW92">[WWW92]</A>
<DD>
<A HREF="http://www.w3.org/History/1992/ENRAP/Article_9202.ps"><CITE>World-Wide
Web: The Information Universe</CITE></A><BR>
Berners-Lee, T., et al., (1992), Electronic Networking: Research, Applications
and Policy, Vol 1 No 2, Meckler, Westport CT, Spring 1992
<DT>
<A NAME="bosak97">[Bosak97]</A>
<DD>
<A HREF="http://sunsite.unc.edu/pub/sun-info/standards/xml/why/xmlapps.htm"><CITE>XML,
Java, and the future of the Web</CITE></A>
<DD>
Jon Bosak, Sun Microsystems
</DL>
<P>
<HR>
<P>
<EM><A HREF="./" NAME="Dan">Dan Connolly</A> is the leader of the
<A HREF="../../Architecture">W3C Architecture Domain</A>. He began contributing
to the World Wide Web project, and in particular, the HTML specification,
while developing hypertext production and delivery software in 1992.</EM>
<P>
<EM>He presented a draft of <A HREF="../../MarkUp/html-spec">HTML 2.0</A>
at the <A HREF="http://www.cern.ch/WWW94/">first Web Conference</A> in 1994
in Geneva, and served as editor until it became a Proposed Standard RFC in
November 1995.</EM>
<P>
<EM>He was the chair of the W3C Working Group that produced HTML 3.2 and
HTML 4.0, and collaborated with Jon Bosak to form the W3C
<A HREF="../../XML/">XML</A> Working Group and produce the W3C XML 1.0
Recommendation.</EM>
<P>
<EM>Dan received a B.S. in Computer Science from the
<A HREF="http://www.utexas.edu/">University of Texas at Austin</A> in 1990.
His research interest is investigating the value of formal descriptions of
chaotic systems like the Web, especially in the consensus-building
process.</EM>
<P>
</BODY></HTML>