<!doctype html system "html.dtd">
<HTML>
<HEAD>
<META name="Author" content="Paul Everitt, Digital Creations">
<TITLE>The ILU Requestor for HTTP servers</TITLE>
</HEAD>
<BODY>
<DIV class=titlepage>
<p><A HREF="http://www.w3.org/"><IMG BORDER="0" align=left
ALT="W3C:" SRC="http://www.w3.org/pub/WWW/Icons/WWW/w3c_home.gif"></A>
WD-ilu-requestor-960307
<H1 class=doctitle align=center>
The ILU Requester:<BR>Object Services in HTTP Servers
</H1>
<H3 align=center>
W3C Informational Draft 07-Mar-96
</H3>
<DL>
<DT>
This version:
<DD>
http://www.w3.org/pub/WWW/TR/WD-ilu-requestor-960307
<DT>
Latest version:
<DD>
http://www.w3.org/pub/WWW/TR/WD-ilu-requestor
<DT>
Authors:
<DD>
Paul Everitt, <A HREF="http://www.digicool.com/">Digital Creations</A>
&lt;paul@digicool.com&gt;
</DL>
<P>
<HR>
<DIV class=status>
<H2>
Status of this document
</H2>
<P>
This document provides information for W3C members and other
interested parties. This document does not specify a W3C standard of any kind.
<P>
Feedback should be directed to the author.
<P>
A list of current W3C documents can be found at:
<A
href="http://www.w3.org/pub/WWW/TR">http://www.w3.org/pub/WWW/TR</A>
<P>
<HR>
</DIV>
<DIV class=abstract>
</DIV>
<H2>
Abstract
</H2>
<P>
The
<A HREF="http://www.w3.org/pub/WWW/CGI/">Common Gateway Interface</A>
(CGI) is not scaling to meet the requirements
of today's dynamic, interactive webs. For this reason, multiple vendors have
proposed C callable APIs. These APIs allow authors to alleviate the performance
penalty of CGI, and allow tighter integration of add-in modules. Unfortunately,
this comes at the price of complexity and portability.
<P>
This document describes a new model for extending WWW servers. First, HTTP
is captured using an
<A HREF="http://www.w3.org/pub/WWW/Protocols/OORPC/">interface
specification</A>,
which eliminates the ambiguities of interpreting a standards-track document.
This interface is then implemented atop a particular httpd's API. Finally,
all
of this is done using a standard distributed object model called
<A HREF="ftp://ftp.parc.xerox.com/pub/ilu/ilu.html">ILU</A>.
<P>
Digital Creations' work on our
<A HREF="http://www.digicool.com/releases/ilurequester"><EM>ILU
Requester</EM></A> reflects
this design and shows its advantages. This paper describes the ILU Requester.
<H2>
Table of Contents
</H2>
<P>
<OL>
<LI>
<A HREF="#introduction">Introduction</A>
<LI>
<A HREF="#requirements">Requirements</A> for a Requester architecture
<LI>
<A HREF="#description">Detailed Description</A>
<LI>
<A HREF="#status">Current Status</A> of Implementation
<LI>
Examples of <A HREF="#interfaces">Interfaces</A>
<LI>
<A HREF="#performance">Performance Analysis</A>
<LI>
Outstanding <A HREF="#issues">Issues</A>
<LI>
<A HREF="#future">Future Plans</A>
<LI>
<A HREF="#alternatives">Alternatives</A>
<LI>
<A HREF="#references">References</A>
<LI>
<A HREF="http://www.digicool.com/releases/appendices">Appendices</A>
<LI>
<A HREF="#author">Author's Info</A>
</OL>
<H2>
<A NAME="introduction">Introduction</A>
</H2>
<P>
Applications deployed over the World-Wide Web often involve an HTTP
server integrated with a legacy information system, or a custom information
system. The Common Gateway Interface, or CGI, is the most widely deployed
mechanism for integrating HTTP servers with other information systems, but
<A HREF="http://www.netscape.com/comprod/server_central/performance_benchmarks.html">
studies have shown</A> that
its design does not scale to the performance demands of contemporary
applications.
Microsoft states that applications for their API are
<A HREF="http://www.microsoft.com/intdev/server/IIS.HTM">five times faster</A>
than
CGI applications.
<P>
Moreover, CGI applications do not run in the httpd process. In addition
to
the performance penalty, this means that CGI applications cannot modify
the
behavior of the httpd's internal operations, such as logging and authorization.
Finally, CGI is viewed as a
<A HREF="http://hoohoo.ncsa.uiuc.edu/cgi/security.html">security issue</A>
by
some server operators, due to its connection to a user-level shell.
<P>
A current solution is to use an httpd with an API, such as
<A HREF="http://www.apache.org/">Apache 1.x</A> or
<A HREF="http://www.netscape.com/comprod/server_central/index.html">Netscape</A>.
Using the API improves performance and reduces load, because your application
runs in the httpd process rather than in a new process started for every
request. Also,
the
API exposes some of the httpd's own behavior, allowing you to modify its
operation. In fact, servers like Apache implement large portions of their
functionality, such as ISMAP handling and logging, as
<A HREF="http://www.apache.org/docs/modules.html">modules</A>.
<P>
Unfortunately, the API benefits come at a price. Running a user-written
module
inside the httpd process leads to possible reliability concerns. For instance,
when developing our requesters, early code would regularly lead to core dumps
from
unhandled errors, as well as memory leaks. Also, most current servers use
either
multiple pre-forked subprocesses or separate threads for each new request.
Thus,
applications which change state, such as a simple counter script, have data
concurrency issues that are the burden of the programmer to solve.
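<P>
The counter case can be made concrete with a minimal sketch. It is written in
modern Python 3 rather than the Python 1.3 used elsewhere in this paper, and
the names <TT>HitCounter</TT> and <TT>simulate</TT> are illustrative only;
they are not part of any httpd API. In a threaded server the read-modify-write
on the count has to be guarded by a lock. In a pre-forked server the state
would additionally have to live outside the individual processes.

```python
import threading


class HitCounter:
    """A hit counter that is safe to update from many threads."""

    def __init__(self):
        self._count = 0
        self._lock = threading.Lock()

    def hit(self):
        # Guard the read-modify-write so two threads cannot both read the
        # same old value and lose an increment.
        with self._lock:
            self._count += 1
            return self._count

    def value(self):
        with self._lock:
            return self._count


def simulate(counter, threads=8, hits_per_thread=1000):
    """Increment the counter from several threads and return the total."""
    def worker():
        for _ in range(hits_per_thread):
            counter.hit()

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for thread in pool:
        thread.start()
    for thread in pool:
        thread.join()
    return counter.value()
```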
<P>
Most importantly, the API route eliminates the casual CGI programmer. In
a <A HREF="http://www.cc.gatech.edu/gvu/user_surveys/survey-10-1995/">recent
survey</A>, Perl beat C 4 to 1 with 46% of the total votes. It appears
that the possibilities for language-choice in a C-based API mechanism are
restrictive.
<P>
Finally, the portability of CGI applications from one httpd implementation
to another would be lost with an API strategy. Since each API has a different
syntax, authors would be forced to learn each API separately. Thus, APIs
could
become instruments used by vendors to ensure market retention.
<P>
The elimination of scripting by an API strategy is a serious issue. Web
services are usually built using scripting languages such as Perl, Python,
Tcl, Visual Basic, Rexx, etc. This seems to be the case because web apps
are
frequently:
<OL>
<LI>
quick and dirty
<LI>
complex in their data relationships
<LI>
short-lived
<LI>
written by casual programmers
</OL>
<P>
In essence, CGI applications are usually complex enough to call for
rapid-prototyping tools, yet they rarely get past the prototype stage
and
into C.
<P>
To address this next generation of server-extending, we developed a mechanism
based on a uniform interface specification for HTTP. This is the
<A HREF="#http.isl"><EM>HTTP.isl</EM></A>. By basing our extension mechanism
on
a distributed object protocol like ILU, we get the performance and features
of
an API strategy (as shown below), with the portability and simplicity of
CGI.
Moreover, it permits the httpd to be extended not only out of its address
space,
but off its machine, and thus into capabilities available only on a remote
node.
This is in the true client-server fashion.
<P>
We call this extension mechanism the <EM>ILU Requester</EM>.
<H4 ALIGN="CENTER">
ILU Requester in a Nutshell
</H4>
<P ALIGN="CENTER">
<OL>
<LI>
<B>Performance of API</B>
<LI>
<B>Features of API</B>
<LI>
<B>Portability of CGI</B>
<LI>
<B>Simplicity of CGI</B>
<LI>
<B>Bridge into distributed objects</B>
</OL>
<H2>
<A NAME="requirements">Requirements for a Requester strategy</A>
</H2>
<P>
We have listed the problems with the current CGI/API situation. And
we have described an ILU Requester architecture. What are its requirements,
and what are some preferred possibilities?
<H3>
Requirements
</H3>
<UL>
<LI>
portable across platforms and vendors
<LI>
based on well-understood industry standards
<LI>
infrastructure uses freely-available, high-quality code base
<LI>
active and sustainable development
<LI>
wide choice of language (i.e. not language-based)
<LI>
significant performance win
<LI>
scalable to N clients and N servers
<LI>
non-blocking on threaded servers
</UL>
<H3>
Preferences
</H3>
<UL>
<LI>
configurable servers (e.g. adding methods to HTTP, and implementing
them yourself to erase bugs)
<LI>
designed for an eventual absorption into the server's code base as
a common encapsulation
<LI>
designed also for an eventually-encapsulated browser which has an object
runtime available for messaging
<LI>
some standard interfaces, such as a site catalogue, authorizer, logger,
gatherer, broker
</UL>
<H2>
<A NAME="description">Detailed Description</A>
</H2>
<P>
We have implemented the ILU Requester for several platforms, and have extended
development to include other interested parties. First, we will give some
background, and then a description.
<H3>
Background
</H3>
<P>
In December of 1994, we were tasked with developing a complex WWW service.
This service necessitated a dynamic language, and had state. Yet, we were
forced to use CGI. Thus, we made a first implementation using a long-running
process that managed the state using a dynamic language
(<A HREF="http://www.python.org/">Python</A>), and a
small "controller" script that would message it on each hit.
<P>
Over time, we found that we were inventing our own client/server protocol.
For this and other reasons, we started looking at using ILU to manage
interactions
between processes. Thus, the CGI script got a surrogate reference to an
encapsulation of the stateful system.
<P>
Still, we had the performance penalty of CGI. In April of 1995, we wrote
a patch for Apache 0.6.5 that embedded the ILU runtime. With this,
we had access to objects via registered URL constructs. This served several
production systems into the fall. At this point, we started to refer to
this embedded ILU module as the <DFN>requester</DFN>.
<P>
In August, a version of Apache was released that had an API, so we started
reworking the requester to use it. By October we had a related requester
for Netsite working on Unix and partially on NT. In December, based on a
new draft
of the HTTP spec, we consolidated the two feature sets, and wrote an HTTP
ISL that was comprehensive with respect to the new specification. Also,
we started work with ILU 2.0.
<P>
In January of this year, we started standardizing on a Python "framework"
module
for creating our online services. For this, we developed an ISL for installing
object-based Authorizers and Loggers "into" the httpd.
<H3>
Case Study: Broadcast
</H3>
<P>
Also in January, we released our first major product based on this architecture
called <A HREF="http://www.digicool.com/products/broadcast/">Broadcast</A>.
This is
a Web-based chat application that had one primary goal: it should be very
fast
under very highly-loaded conditions. Some design choices were:
<DL>
<DT>
Perl-based CGI
<DD>
The product that ours was replacing started life as a Perl-based chat.
It became popular enough that many simultaneous users would overload
the system. It suffered from the startup cost of an interpreter,
the
cost of reading the state in from disk, and from design issues when multiple
processes
change the state.
<DT>
C-based CGI
<DD>
Same as the above, but moved to C. Still faced with the problems of state
and concurrency.
<DT>
CGI-based requester-daemon service
<DD>
One choice for solving the problem of state would be to have a long-running
server process that managed the state of the chat, and have skinny requesters
that
message the chat server from CGI over a socket. This design solves the problem
of
reincarnating state for each request. Also, it provides a DBMS-like function
for
modifying the state, since everything goes through one process.<BR>
However, there is still the cost of starting up a CGI requester on each hit,
and
the socket create/teardown issue. Also, you have invented a nice little
client-server
system that speaks your protocol, but no other. Plus, this protocol has
to be interpreted
on the wire, using your custom parser. Finally, the chat daemon must be
equipped with concurrency, or else it becomes a bottleneck.
<DT>
RPC service
<DD>
A more elegant version of the chat daemon strategy might be to use RPC to
the
chat server, either from a CGI requester or an API-based requester. This
would replace
your custom protocol, and would allow an API-based requester to keep connections
open.<BR>
On the other hand, you have produced a system that is procedure-oriented,
rather than
<A HREF="http://www-db.stanford.edu/~testbed/ilu/ilu20doc/manual_1.html#SEC4">object-oriented</A>.
</DL>
<P>
We chose to use an ILU Requester that would make generic calls on published
objects
that represented the chat site's components. This allowed us to have very
low latency
(by avoiding startup costs), and expose the OO design of the chat implementation.
<P>
It appears that the design choice was valid. Performance is fantastic, and
load is
low. Also, using the requester strategy, we now have many new possibilities
for
application partitioning. Finally, using our <A HREF="#api_scripting">API
Scripting</A>
infrastructure, we are able to add new features in a very coherent fashion.
<H3>
Description
</H3>
<P>
As alluded to above, the entire system is based around ILU. From this,
we get language-independence, cross-process communication, and
platform-independence.
<P>
The goal is to add an abstract object interface to an httpd in a uniform
way. For this, we wrote an interface for HTTP that encapsulates the behavior
of the HTTP transaction. We then implement this interface in C by mapping
it to the semantics of a particular httpd's API. This implementation is
called the <DFN>requester</DFN>, and gives an httpd a mechanism for passing
certain incoming requests to an ILU published object.
<P>
This architecture mimics the interaction between the browser and the
httpd using the same concepts as HTTP. For instance, the information
contained in the request is mapped into a Request type in the interface
specification. The requested object is a Resource, and the result of
the operation on the Resource is a Response. Both of the Request and
Response are types defined in the interface.
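<P>
The mapping can be pictured with a small sketch. This is modern Python 3, not
the actual HTTP.isl or the wwworb toolset: the names Request, Response, and
Resource come from the interface described above, but the field names, the
<TT>handle</TT> method, and the <TT>Hello</TT> class are assumptions made for
illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    """What the browser sent (field names are illustrative)."""
    method: str
    uri: str
    headers: dict = field(default_factory=dict)
    body: bytes = b""


@dataclass
class Response:
    """The result of an operation on a Resource."""
    status: int
    headers: dict = field(default_factory=dict)
    body: bytes = b""


class Resource:
    """The requested object: one Python method per HTTP method."""

    SUPPORTED = ("GET", "POST")

    def GET(self, request):
        return Response(status=405)  # method not allowed by default

    def POST(self, request):
        return Response(status=405)

    def handle(self, request):
        # Only dispatch to known HTTP methods, never to arbitrary attributes.
        if request.method not in self.SUPPORTED:
            return Response(status=501)
        return getattr(self, request.method)(request)


class Hello(Resource):
    def GET(self, request):
        return Response(200, {"Content-Type": "text/html"}, b"Hello.")
```

<P>
A service author subclasses Resource and fills in only the methods that
matter, which is the same pattern the Python example later in this paper
follows.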
<P>
Fortunately, because of the uniform, abstract ISL, the services you write
do not have to know anything about the semantics of the server or its API.
In fact, it would be possible to skip the httpd altogether, and communicate
directly with the published object. The objects could do this: they can
have multiple representations, and can communicate via HTTP requests, ILU
requests, or some other request structure.
<P>
When writing a service, therefore, all you have to do is publish an
object that is based on the skeleton code generated from the interface.
Pretty standard stuff here. Then, if the published object is listed in
the httpd's configuration file, incoming requests matching a certain URI
form will be sent to the requester, which will make an ILU call to
the published object.
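<P>
The dispatch step might look roughly like the following sketch in modern
Python 3. The exact URI syntax accepted by the requester is not specified
here, so the <TT>/object@server</TT> form is inferred from the test URIs used
later in this paper. The function <TT>parse_object_uri</TT> and the class
<TT>Requester</TT> are hypothetical, and a plain callable stands in for the
ILU call to the published object.

```python
def parse_object_uri(path):
    """Split '/echo@paul.demos?x=1' into ('echo', 'paul.demos', 'x=1').

    Returns None when the path does not look like an object reference,
    so that the httpd can handle it as an ordinary document.
    """
    path, _, query = path.partition("?")
    if not path.startswith("/"):
        return None
    name, sep, server = path[1:].partition("@")
    if not sep or not name or not server:
        return None
    if "/" in name or "/" in server:
        return None
    return name, server, query


class Requester:
    """Maps matching URIs to published objects listed in a configuration."""

    def __init__(self, published):
        # published: dict mapping (object name, server id) to a callable
        # that stands in for the remote call.
        self._published = dict(published)

    def dispatch(self, path):
        ref = parse_object_uri(path)
        if ref is None:
            return None  # not an object URI; leave it to the httpd
        name, server, query = ref
        target = self._published.get((name, server))
        if target is None:
            return None  # not listed in the configuration
        return target(query)
```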
<P>
Also, it is possible to map the requester to remotely-published objects
using ILU's String Binding Handle mechanism. This makes it possible to
bridge the httpd into services available on other platforms. Future ILU
mechanisms will make this process easier.
<H2>
<A NAME="status">Current Status of Implementation</A>
</H2>
<P>
As of this writing, we have solid requesters based on ILU 1.8 and 2.0 for
Netsite (Unix) and Apache. They have been tested by others, reviewed for
optimizations, passed through simple memory leak testers, and documented.
We are making distributions freely available in source, and some in binary
form.
Currently, the requesters are known to work fine on Solaris, Digital Unix,
Linux, AIX, and BSDI. Additionally, we have preliminary support for NT.
Full support is waiting for us to finish up our work on threading with
ILU. Finally, ILU has been reported to work on OS/2, and there is work on
an Apache implementation for that platform.
<P>
The threading issue will become increasingly important as we build more
sophisticated systems, especially when we might want to have a common
ORB. However, for systems such as Netsite and Microsoft's IIS on NT,
as well as Spyglass' server, it is <EM>required</EM> in the requester.
This is because these platforms service each incoming request as a thread,
rather than passing the request to an isolated process. Making the
requester thread-safe is thus becoming a requirement.
<P>
We have just added support for aliasing multiply-published objects
inside the requester. For instance, you could make a request to info@system,
and have "system" map to one of several published objects. This is mainly
for
a performance increase in read-only situations. Note that this may be
subsumed by the ILU work on multicast.
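<P>
As a sketch of the aliasing idea in modern Python 3: the policy for choosing
among the published objects is not stated above, so round-robin selection is
an assumption, and <TT>AliasTable</TT> and the server names are hypothetical.

```python
import itertools


class AliasTable:
    """Maps an alias such as 'system' onto several published servers."""

    def __init__(self):
        self._cycles = {}

    def add_alias(self, alias, targets):
        targets = list(targets)
        if not targets:
            raise ValueError("an alias needs at least one target")
        # cycle() hands out the targets in turn, forever.
        self._cycles[alias] = itertools.cycle(targets)

    def resolve(self, server_id):
        # Names that are not aliases resolve to themselves.
        cycle = self._cycles.get(server_id)
        return next(cycle) if cycle is not None else server_id
```

<P>
Because every replica can answer a read-only request equally well, spreading
requests this way raises throughput without any coordination between the
replicas.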
<P>
Another area we are working on is making the object systems easier
to use. We have just added an HTTP header that gets returned, stating the
ILU
version and requester version. We are adding support for a discoverable
interface,
using a standard 'info@root' that is built in to every requester. This object
will
return catalogue information from config file directives, and will attempt
to
contact the ILU servers listed in the config file and get their 'info@root'
information, if implemented.
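<P>
The behavior described for 'info@root' could be sketched as follows in modern
Python 3. The class <TT>InfoRoot</TT>, the shape of the returned catalogue,
and the use of plain callables in place of ILU calls to remote servers are all
assumptions made for illustration.

```python
class InfoRoot:
    """Builds a site catalogue from local entries and remote servers."""

    def __init__(self, local_entries, remote_servers=None):
        # local_entries: catalogue data taken from config file directives.
        self.local_entries = list(local_entries)
        # remote_servers: dict of server id -> zero-argument callable that
        # returns that server's own catalogue entries.
        self.remote_servers = dict(remote_servers or {})

    def catalogue(self):
        result = {"local": list(self.local_entries), "remote": {}}
        for server_id, fetch in self.remote_servers.items():
            try:
                result["remote"][server_id] = list(fetch())
            except Exception:
                # The server is unreachable or does not implement
                # 'info@root'; record that nothing is known about it.
                result["remote"][server_id] = None
        return result
```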
<H3>
<A NAME="api_scripting">API Scripting</A>
</H3>
<P>
Making an httpd able to call distributed objects is only half of the
system: you must have objects that can be called.
<P>
We have intended for this system to replace CGI as a server-extension
mechanism. To do this, it must be nearly as easy to create services as CGI
is currently. For this, we have been working on an infrastructure for
publishing requester-capable objects called <EM>API Scripting</EM>.
<P>
For creating services, we are focusing on Python, and building up a
toolset of components. We have made parts of this toolset available,
and have released our demonstration programs and load testing modules.
Based on this Python toolset and the requester, we are fielding high-performance
Internet services for commercial use.
<P>
For instance, here is a very simple script in Python that publishes an
object which echoes the contents of a request:
<HR>
<PRE>
#!/usr/local/bin/python
"""Every good module deserves docstrings.
This is a very simple script that subclasses a Resource, fills in the
blanks, and echoes incoming Requests. It then publishes the object and
goes into a main loop.
"""
import ilu
import wwworb # our toolset

class ILUforDummies(wwworb.Resource):
    def GET(self,request,connect):
        request = wwworb.Request(request)
        response = wwworb.Response(`request`)
        return response
    POST = GET

# Create an ILU server
ilu.CreateServer('paul.demos')

# Now, create an instance of your class, passing it a parameter
# for the name of the published object.
nitwit = ILUforDummies('dumb')
print nitwit.IluSBH()
ilu.RunMainLoop()
</PRE><P><HR>
<P>Note that there are really only nine necessary lines in the above. This
should put it into the realm of CGI for ease of use.
<P>Our next step is to make highly-concurrent systems available in Python. To do
this, we are working with the ILU team to thread the iluPrmodule. This work
is related to the work on threading the ILU kernel.
<P>For all of these, we have an emerging development group, and an infrastructure
for documentation, tutorials, bug reports, etc.
<H2><A NAME="interfaces">Examples of Interfaces</A>
</H2>
<P>
Currently, we have stabilized our
<A HREF="#http.isl">HTTP interface</A>, and feel that it accurately
represents the interaction between a browser and a server in a way useful
for published objects behind an httpd. Therefore, we are now focusing on
problem-specific interfaces.
<P>
First, we would like to have a discoverable interface for online services.
For instance, one should be able to go to any requester-enabled site, send
a
request to an <TT>info@root</TT> published object, and get an inventory of
that
site. The contents of this inventory might vary, might support a set of minimum
operations, might extend, and might change. All the things that an
<DFN>interface</DFN>
allows you to do over time.
<P>
This discoverable interface is being worked on. There are other interfaces
that have already rolled out.
<H3>
<A NAME="authorizer.isl">The Authorizer ISL</A>
</H3>
<P>
Most of our "API scripting" services involve persistent Python objects
that receive requests from the ILU main loop. Some of these services need
some type of Access Control List (ACL) mechanism on them. However, we
really don't want to interface into some external, httpd-controlled,
single-filesystem-based password file.
<P>
The API-based servers already have modules that allow you to store
user authentication information in a SQL table. Yet, we already have
users defined in our object system. Moreover, we might want to have some
instance-based authorization mechanism.
<P>
To extend the ACL-capabilities of the httpd, we wrote an Authorizer
interface, and implemented it into the APIs we support. Thus, accesses to
protected URIs are mapped to an object call, which determines if that
operation is allowed by that identity.
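<P>
A minimal sketch of such an object call, in modern Python 3, is given below.
It is not the Authorizer ISL itself: the method names <TT>grant</TT> and
<TT>is_allowed</TT>, and the choice to deny anything not explicitly granted,
are assumptions.

```python
class Authorizer:
    """Decides whether an identity may perform an operation on a URI."""

    def __init__(self):
        # uri -> identity -> set of permitted operations
        self._acl = {}

    def grant(self, uri, identity, *operations):
        allowed = self._acl.setdefault(uri, {}).setdefault(identity, set())
        allowed.update(operations)

    def is_allowed(self, identity, operation, uri):
        # Deny by default: unknown URIs and identities get no access.
        return operation in self._acl.get(uri, {}).get(identity, set())
```

<P>
An instance-based mechanism could subclass this and override
<TT>is_allowed</TT> to consult the state of the object being requested.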
<H3>
<A NAME="logger.isl">The Logger ISL</A>
</H3>
<P>
Another area we wanted to standardize on was an intelligent logging
mechanism. Currently, there is the Common Log File format for writing to
disk. However, we wanted something more structured and more dynamic. Thus,
we wrote an interface for logging, mapped it into the httpd's API functions,
and created an installable logging facility. A smart implementation could
publish an object which is registered for successful or unsuccessful requests
to document or ILU-based requests. If there is an error, you could decide
whether to send a page to someone's beeper. For all requests, you can take
the incoming data structure, and write parts of it into a miniSQL table.
<P>
We wanted to extend the httpd's logging facilities in new and interesting
ways. For instance, we wanted to do processing and take special actions if
an error was raised. Also, we wanted to investigate logging from a Unix
httpd into a Windows-based personal DBMS like Microsoft Access.
<P>
To do this, we made an interface for logging, and implemented it on the
APIs we support. This mapping forces the httpd to run our object call during
logging events. The interface is very simple; it just
sends the Request object to LoggerObject via an asynchronous method. One
could then subtype from there to do more interesting, platform-specific
things.
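<P>
The asynchronous hand-off can be illustrated with a sketch in modern Python 3.
A queue and a worker thread stand in for the ILU asynchronous method, so the
caller returns at once while the record is processed later. The
<TT>sink</TT> callable, the <TT>close</TT> method, and the record format are
assumptions; only the name LoggerObject comes from the text above.

```python
import queue
import threading

_STOP = object()  # sentinel that tells the worker thread to finish


class LoggerObject:
    """Accepts request records without making the caller wait."""

    def __init__(self, sink):
        self._sink = sink  # callable that stores or acts on one record
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def log(self, request):
        # Asynchronous from the caller's point of view: enqueue and return.
        self._queue.put(request)

    def _drain(self):
        while True:
            item = self._queue.get()
            if item is _STOP:
                break
            try:
                self._sink(item)
            except Exception:
                # A failing sink must not stop later records being logged.
                pass

    def close(self):
        # Process everything already queued, then stop the worker.
        self._queue.put(_STOP)
        self._worker.join()
```

<P>
A subtype would typically replace the sink, for example with one that writes
to a database table or pages an operator when the record reports an error.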
<H3>
The Stanford Digital Library Common Object Services
</H3>
<P>
The Stanford Digital Library team has produced interfaces and implementations
for CORBA-type
<A HREF="http://diglib.stanford.edu/diglib/pub/software/testbed/cos/">
Common Object Services (COS)</A>. Common Object Services are objects or groups
of
objects that provide the basic requirements which most objects need in order
to
function in a distributed environment. These services are designed to be
generic;
they do not depend on the type of client object or type of data passed. Note:
this
is hard to do in ILU since there is no concept of the Object or Any type.
<H3>
Other Interfaces
</H3>
<P>
There are other good candidates for interfaces. For instance, the Harvest
system has its own protocol for collecting indexing information, and doing
searches. If an interface was written, it could perhaps be moved into this
architecture.
<P>
We have started on some other standard interfaces, such as a Data Access
interface and an OLE interface (via Python). These, though, are not necessarily
related to the ILU Requester, and are thus outside the scope of this paper.
<H2>
<A NAME="performance">Performance Analysis</A>
</H2>
<P>
It should be apparent that the architecture lends itself to good performance.
However, we felt that some performance numbers were important, so we came
up
with a performance-testing program, and a regimen to exercise it.
<P>
To test, I used a Sparc 5 running Solaris 2.4 with 32 MB as the testing client,
and an Alpha with 64 MB running Digital Unix as the testing server. The test
program was written in Python, and used the httplib and thread modules to
make
concurrent requests. The server was running Netsite, using our requester
and ILU 2.0.
We had Netsite configured to use up to 32 processes.
<P>
We then ran through a series of URLs (listed below) in two tests:
a
latency test and a throughput test. The latency test sent a series of requests
on one
thread, to test the response time. The throughput test dispatched the same
number of
requests on each of several simultaneous threads, to test concurrent use. Thus, the
throughput test attempted to detail the effects of load, concurrency, and
aggregate response time for a batch of requests.
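<P>
The shape of such a test program can be sketched in modern Python 3. This is
not the original program: <TT>run_batch</TT> and its return format are
hypothetical, and <TT>fetch</TT> is any callable that retrieves one URI. In a
real run it might wrap <TT>http.client</TT>, the Python 3 successor to
httplib.

```python
import threading
import time


def run_batch(fetch, uri, threads=1, requests_per_thread=20):
    """Send requests_per_thread requests on each of `threads` threads.

    With threads=1 this is a latency run; with more threads it is a
    throughput run.
    """
    thread_times = [0.0] * threads

    def worker(index):
        start = time.perf_counter()
        for _ in range(requests_per_thread):
            fetch(uri)
        # Completion time of this thread's whole series of requests.
        thread_times[index] = time.perf_counter() - start

    pool = [threading.Thread(target=worker, args=(i,)) for i in range(threads)]
    batch_start = time.perf_counter()
    for thread in pool:
        thread.start()
    for thread in pool:
        thread.join()
    elapsed = time.perf_counter() - batch_start

    hits = threads * requests_per_thread
    return {
        "hits": hits,
        "elapsed": elapsed,
        "thread_times": thread_times,
        "hps": hits / elapsed,  # aggregate hits per second for the batch
    }
```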
<P>
For the URIs, the index.html test merely retrieved a very short HTML file.
The
others were:
<HR>
<PRE>
simple.sh
---------
#!/bin/sh
echo 'Content-type: text/html\n\n'
echo Hello.
simple.pl
---------
#!/usr/local/bin/perl
print "Content-type: text/html\n\n";
print "hello.";
simple.py
---------
#!/usr/users/paul/cgipython
"""Simple script to echo the dictionary back.
"""
print 'Content-type: text/html\n\n'
print 'Hello.'
simple1.py
----------
#!/usr/users/paul/cgipython
"""\
Simple script to echo the dictionary back.
"""
print 'Content-type: text/html\n\n'
import simple_lib
simple_lib.py
-------------
#!/usr/users/paul/cgipython
"""\
Simple script to echo the dictionary back.
"""
import cgi
f = cgi.SvFormContentDict()
print f.items()
dumb.py
-------
# Note: echo is equivalent to simple.py, and dumb is equivalent to simple1.py
"""\
The simplest, dumbest API script around.
This Python program has one goal: fewest lines for an interactive script.
The script reads the form variables, and sends them back, without very much
formatting.
Note that we have embedded the HTML into the class, which has added some
characters. Normally this class would be even shorter, as we would use the
"pyhtml" external representation. But, that would be smart, and this one is,
well, dumb.
"""
import sys
sys.path.append('wwworb')
sys.path.append('interfaces')
import string, ilu, wwworb
print ilu.Version
# Make a class derived from the Resource class in wwworb. Remember
# that the base class (wwworb.Resource) requires a parameter to be
# passed to its __init__ startup call. This parameter is the name
# of the published object.
class EchoforDummies(wwworb.Resource):
    def GET(self,request,connect):
        return wwworb.Response('Hello.')
    POST = GET

class ILUforDummies(wwworb.Resource):
    def GET(self,request,connect):
        request = wwworb.Request(request)
        response = wwworb.Response(`request`)
        return response
    POST = GET
# Create an ILU server
ilu.CreateServer('paul.demos')
# Now, create an instance of your class, passing it a parameter
# for the name of the published object.
nitwit = ILUforDummies('dumb')
echo = EchoforDummies('echo')
print nitwit.IluSBH()
print echo.IluSBH()
ilu.RunMainLoop()
</PRE><P><HR>
<P>These scripts were chosen to contrast the least that could be done with
a CGI script (echo back a string) with only slightly more work (parse
the incoming request into data structures, and echo it back). The Bourne
shell script and the Perl script are thrown in as reference points. It is
the comparison of Python scripts that is relevant.
<P>The Python interpreter used for the CGI scripts was very small. I removed
nearly everything from the Modules setup, and did not link with threads (a
source of startup time problems on Digital Unix). I used Python 1.3 for all
of these, and ILU 2.0a3.
<P>Thus, the comparison is between Python CGI and Python "API Scripting".
The two tests are a simple echo of a string, and a slightly-computational
parsing of the incoming information. Obviously, a real-world application,
where files have to be read, or marshals loaded, or databases connected to,
would tilt the scales towards API scripting, since the state is always in
memory.
<P>In the following, <B>HPS</B> refers to hits per second,
<B>SPH</B> refers to seconds per hit, and <B>SD</B> refers
to standard deviation.
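<P>
For readers who want to reproduce these statistics, the following sketch in
modern Python 3 shows one way to derive them from per-run timings. The
function name <TT>summarize</TT> is hypothetical, and the use of the sample
standard deviation is an assumption, since the original program's choice is
not stated.

```python
import statistics


def summarize(runs):
    """Compute average HPS and SPH, with standard deviations.

    runs: list of (hits, elapsed_seconds) tuples, one per run.
    At least two runs are needed for a standard deviation.
    """
    hps = [hits / elapsed for hits, elapsed in runs]  # hits per second
    sph = [elapsed / hits for hits, elapsed in runs]  # seconds per hit
    return {
        "avg_hps": statistics.mean(hps),
        "sd_hps": statistics.stdev(hps),
        "avg_sph": statistics.mean(sph),
        "sd_sph": statistics.stdev(sph),
    }
```

<P>
Note that the average SPH is computed from the per-run values, so it is not
in general the exact reciprocal of the average HPS.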
<H3>Latency test
</H3>
<P>
This test used 10 runs of 1 thread, 20 requests on the thread:
<TABLE BORDER=1>
<TR>
<TD>
URI
</TD>
<TD>
Min
</TD>
<TD>
Max
</TD>
<TD>
Avg<BR>HPS
</TD>
<TD>
SD<BR>(HPS)
</TD>
<TD>
Avg<BR>SPH
</TD>
<TD>
SD<BR>(SPH)
</TD>
</TR>
<TR>
<TD>
/index.html
</TD>
<TD>
0.142
</TD>
<TD>
0.580
</TD>
<TD>
4.774
</TD>
<TD>
0.1394
</TD>
<TD>
0.2097
</TD>
<TD>
0.0066
</TD>
</TR>
<TR>
<TD>
/cgi-bin/simple.sh
</TD>
<TD>
0.171
</TD>
<TD>
0.221
</TD>
<TD>
4.814
</TD>
<TD>
0.0074
</TD>
<TD>
0.2077
</TD>
<TD>
0.0003
</TD>
</TR>
<TR>
<TD>
/cgi-bin/simple.pl
</TD>
<TD>
0.182
</TD>
<TD>
0.224
</TD>
<TD>
4.821
</TD>
<TD>
0.0349
</TD>
<TD>
0.2074
</TD>
<TD>
0.0015
</TD>
</TR>
<TR>
<TD>
/cgi-bin/simple.py
</TD>
<TD>
0.176
</TD>
<TD>
0.222
</TD>
<TD>
4.825
</TD>
<TD>
0.0133
</TD>
<TD>
0.2073
</TD>
<TD>
0.0006
</TD>
</TR>
<TR>
<TD>
/cgi-bin/simple1.py?x=1&amp;y=2&amp;z=3&amp;z=4&amp;z=5
</TD>
<TD>
0.382
</TD>
<TD>
0.566
</TD>
<TD>
2.315
</TD>
<TD>
0.0436
</TD>
<TD>
0.4321
</TD>
<TD>
0.0083
</TD>
</TR>
<TR>
<TD>
/echo@paul.demos
</TD>
<TD>
0.111
</TD>
<TD>
0.847
</TD>
<TD>
4.687
</TD>
<TD>
0.4048
</TD>
<TD>
0.2152
</TD>
<TD>
0.0235
</TD>
</TR>
<TR>
<TD>
/dumb@paul.demos
</TD>
<TD>
0.182
</TD>
<TD>
0.351
</TD>
<TD>
4.824
</TD>
<TD>
0.0703
</TD>
<TD>
0.2073
</TD>
<TD>
0.0031
</TD>
</TR>
</TABLE>
<H3>
Throughput test
</H3>
<P>
This test used 10 runs of 10 threads, 20 requests apiece. In this
case, the Min and Max refer to the thread completion times:
<TABLE BORDER=1>
<TR>
<TD>
URI
</TD>
<TD>
Min
</TD>
<TD>
Max
</TD>
<TD>
Avg<BR>HPS
</TD>
<TD>
SD<BR>(HPS)
</TD>
<TD>
Avg<BR>SPH
</TD>
<TD>
SD<BR>(SPH)
</TD>
</TR>
<TR>
<TD>
/index.html
</TD>
<TD>
0.060
</TD>
<TD>
1.402
</TD>
<TD>
20.853
</TD>
<TD>
0.4736
</TD>
<TD>
0.0480
</TD>
<TD>
0.0011
</TD>
</TR>
<TR>
<TD>
/cgi-bin/simple.sh
</TD>
<TD>
0.106
</TD>
<TD>
1.279
</TD>
<TD>
15.128
</TD>
<TD>
0.4986
</TD>
<TD>
0.0662
</TD>
<TD>
0.0022
</TD>
</TR>
<TR>
<TD>
/cgi-bin/simple.pl
</TD>
<TD>
0.119
</TD>
<TD>
1.354
</TD>
<TD>
13.626
</TD>
<TD>
0.3116
</TD>
<TD>
0.0734
</TD>
<TD>
0.0017
</TD>
</TR>
<TR>
<TD>
/cgi-bin/simple.py
</TD>
<TD>
0.143
</TD>
<TD>
1.926
</TD>
<TD>
9.155
</TD>
<TD>
0.1296
</TD>
<TD>
0.1093
</TD>
<TD>
0.0015
</TD>
</TR>
<TR>
<TD>
/cgi-bin/simple1.py?x=1&amp;y=2&amp;z=3&amp;z=4&amp;z=5
</TD>
<TD>
0.738
</TD>
<TD>
4.817
</TD>
<TD>
2.597
</TD>
<TD>
0.0092
</TD>
<TD>
0.3850
</TD>
<TD>
0.0014
</TD>
</TR>
<TR>
<TD>
/echo@paul.demos
</TD>
<TD>
0.093
</TD>
<TD>
1.221
</TD>
<TD>
20.362
</TD>
<TD>
0.6996
</TD>
<TD>
0.0492
</TD>
<TD>
0.0017
</TD>
</TR>
<TR>
<TD>
/dumb@paul.demos
</TD>
<TD>
0.109
</TD>
<TD>
1.597
</TD>
<TD>
19.862
</TD>
<TD>
0.6895
</TD>
<TD>
0.0504
</TD>
<TD>
0.0017
</TD>
</TR>
</TABLE>
<P>
Understand that the HPS and SPH numbers on the throughput test
reflect the ability of the server to service multiple requests
simultaneously. Thus, each hit effectively is done faster.
<H3>
Analysis
</H3>
<P>
Looking at the latency tests, you see that HTML and the simple CGI
scripts are about the same HPS. These simple scripts don't parse the
environment, and thus do no calculation. The simple1.py script, which
does parse the environment and imports a module, runs at about half the
HPS, so its latency roughly doubles. Yet, the API Scripting apps stay at the same level as
the HTML and simple CGI, even though one is parsing the environment.
<P>
In the throughput test, the reference point -- the index.html file --
shows that a 10-thread request gets just over a four-fold bump in
throughput. Certainly not ten-fold, but enough to show that it is
handling simultaneous requests well. However, the CGI scripts start to
show less benefit. Yet, the API Scripting applications stay at the
same level as the HTML baseline:
<P>
<TABLE BORDER=1>
<TR>
<TD>
URI
</TD>
<TD>
Percent of<BR>single-threaded HPS
</TD>
</TR>
<TR>
<TD>
/index.html
</TD>
<TD>
437
</TD>
</TR>
<TR>
<TD>
/simple.sh
</TD>
<TD>
314
</TD>
</TR>
<TR>
<TD>
/simple.pl
</TD>
<TD>
282
</TD>
</TR>
<TR>
<TD>
/simple.py
</TD>
<TD>
190
</TD>
</TR>
<TR>
<TD>
/simple1.py?x=1&amp;y=2&amp;z=3&amp;z=4&amp;z=5
</TD>
<TD>
112
</TD>
</TR>
<TR>
<TD>
/echo@paul
</TD>
<TD>
434
</TD>
</TR>
<TR>
<TD>
/dumb@paul
</TD>
<TD>
411
</TD>
</TR>
</TABLE>
<P>
If we consider getting an HTML file -- both in single-threaded and
ten-threaded batches -- to be a baseline, we see the relation of these
tests. Again, we see that getting an HTML file gets a four-fold bump
from a ten-thread batch. A simple Bash CGI script yields a three-fold
improvement (317 percent) over single-thread HTML batches. A simple
CGI script that parses the environment, run in ten-threaded batches,
achieves <EM>only half</EM> the aggregate throughput of a single-threaded
HTML request. Thus, concurrent CGI is slower than single-request HTML.
Again, the API Scripting applications keep pace with the baseline.
<P>
<TABLE BORDER=1>
<TR>
<TD>
URI
</TD>
<TD>
1-thread % of<BR>1-thread HTML
</TD>
<TD>
10-thread % of<BR>1-thread HTML
</TD>
<TD>
10-thread % of<BR>10-thread HTML
</TD>
</TR>
<TR>
<TD>
/index.html
</TD>
<TD>
100
</TD>
<TD>
437
</TD>
<TD>
100
</TD>
</TR>
<TR>
<TD>
/simple.sh
</TD>
<TD>
101
</TD>
<TD>
317
</TD>
<TD>
73
</TD>
</TR>
<TR>
<TD>
/simple.pl
</TD>
<TD>
101
</TD>
<TD>
285
</TD>
<TD>
65
</TD>
</TR>
<TR>
<TD>
/simple.py
</TD>
<TD>
101
</TD>
<TD>
192
</TD>
<TD>
44
</TD>
</TR>
<TR>
<TD>
/simple1.py?x=1&amp;y=2&amp;z=3&amp;z=4&amp;z=5
</TD>
<TD>
49
</TD>
<TD>
54
</TD>
<TD>
12
</TD>
</TR>
<TR>
<TD>
/echo@paul
</TD>
<TD>
98
</TD>
<TD>
427
</TD>
<TD>
98
</TD>
</TR>
<TR>
<TD>
/dumb@paul
</TD>
<TD>
101
</TD>
<TD>
416
</TD>
<TD>
95
</TD>
</TR>
</TABLE>
<P>
In the rightmost column above, which is a throughput measurement, an API
Scripting application is over <EM>eight times faster</EM> than an equivalent
CGI application.
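<P>
The factor quoted above can be reproduced from the rightmost column of the
table. The sketch below assumes that /simple1.py is the CGI counterpart of
the two API Scripting applications. The variable names are illustrative only.

```python
# "10-thread % of 10-thread HTML" values copied from the table above.
TEN_THREAD_PCT = {
    "/simple1.py": 12,   # CGI script that parses the environment
    "/echo@paul": 98,    # API Scripting application
    "/dumb@paul": 95,    # API Scripting application
}

cgi_pct = TEN_THREAD_PCT["/simple1.py"]

# Ratio of each API Scripting percentage to the CGI percentage.
speedups = {
    uri: pct / cgi_pct
    for uri, pct in TEN_THREAD_PCT.items()
    if uri != "/simple1.py"
}

for uri, factor in speedups.items():
    print(f"{uri}: {factor:.2f} times the CGI throughput")
```

<P>
With these rounded percentages, /echo@paul comes out at about 8.2 times the
CGI figure and /dumb@paul at about 7.9 times.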
<P>
One conclusion is that, even for simple applications that only read in the
form data, CGI loses to API Scripting in latency, and loses significantly in
concurrent use. The performance win would likely increase even more
for complex applications, especially those that have to initialize some
state or make a connection to a SQL database. Setting up that state
is more complicated, and the resulting increase in latency and load means
requests pile up waiting for service.
<P>
A caveat in the testing must be noted. A more representative comparison of
API scripting vs. HTML would use an ILU C program and an API C program.
This would also allow the testing of HTML vs. CGI vs. straight API C apps
vs. ILU Requester with objects written in C.
<H2>
<A NAME="issues">Outstanding Issues</A>
</H2>
<P>
At this time, movement to ILU 2.0 is the biggest issue. First,
there are some minor bugs with the current prerelease. The real issue is
embracing some new capabilities:
<UL>
<LI>
poor ILU support for bulk data (e.g. RPC limit)
<LI>
missing
<A HREF="http://www.w3.org/pub/WWW/OOP/interfaces/vhll.isl">data types</A>
(soon to be
<A HREF="http://www-diglib.stanford.edu/ilu/ilu-archive/0611.html">alleviated
in Python</A>)
<LI>
investigation of Stanford Digital Library's COS (mentioned above)
<LI>
distributed concurrency and threading
<LI>
performance of surrogate object references
<LI>
true object inside httpd process
</UL>
<P>
For more on this, see
<H2>
<A NAME="future">Future Plans</A>
</H2>
<P>
We have a number of directions we intend to pursue internally, and a
suggested direction for industry adoption.
<H3>
Internal
</H3>
<P>
Some of our plans are:
<UL>
<LI>
discoverable interface for debugging and cataloging
<LI>
better performance numbers
<LI>
better story on concurrency
</UL>
<H3>
Industry
</H3>
<P>
Some requirements:
<UL>
<LI>
Use of ILU
<LI>
Integration of the ILU runtime into their product
<LI>
Support of a basic HTTP ISL
<LI>
Use of standard Resource, Request, and Response mechanism
<LI>
Mapping the HTTP spec's error codes into HTTP exceptions
<LI>
Ensure the safety of concurrent requesters running across threads or
forked daemons
</UL>
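<P>
As a rough illustration of the Resource, Request, and Response mechanism and
of mapping error codes into exceptions, the sketch below uses plain Python
classes rather than the ILU runtime. The names HTTPError, NotFound,
EchoResource and dispatch are hypothetical. Only the method name GET, its two
arguments, and the status codes are taken from the HTTP ISL in the appendix.

```python
from dataclasses import dataclass, field
from typing import Optional


class HTTPError(Exception):
    """Base class: each subclass stands for one HTTP error status code."""
    status = 500


class NotFound(HTTPError):
    status = 404


class Forbidden(HTTPError):
    status = 403


@dataclass
class Request:
    uri: str
    headers: list = field(default_factory=list)
    body: Optional[bytes] = None


@dataclass
class Response:
    status: int
    headers: list = field(default_factory=list)
    body: Optional[bytes] = None


class EchoResource:
    """Hypothetical resource that echoes the connection parameters."""

    def GET(self, request, connection):
        if request.uri != "/echo":
            raise NotFound(request.uri)
        text = "\n".join(f"{name}: {value}" for name, value in connection)
        return Response(200, [("content-type", "text/plain")], text.encode())


def dispatch(resource, method, request, connection):
    """Call the named method and turn raised exceptions into responses."""
    handler = getattr(resource, method, None)
    if handler is None:
        return Response(501)  # NotImplemented in the ISL status list
    try:
        return handler(request, connection)
    except HTTPError as err:
        return Response(err.status, [], str(err).encode())


reply = dispatch(EchoResource(), "GET", Request("/echo"),
                 [("remote-ip", "127.0.0.1")])
print(reply.status, reply.body)
```

<P>
In this arrangement the object code raises an exception instead of building
an error response by hand, and the requester side decides how the exception
is reported to the browser.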
<P>
Some optional support:
<UL>
<LI>
Extensions of the base HTTP ISL to expose advanced functionality within
the ILU type system
<LI>
Support for discoverable objects
<LI>
Connections through native ILU protocol
<LI>
Publishing true objects inside the httpd for high-performance apps
<LI>
Agreement on reference implementation suite for compliance testing
and performance testing
</UL>
<H2>
<A NAME="alternatives">Alternatives</A>
</H2>
<P>
Many ideas have floated around. Press releases have discussed, for instance,
embedding Java inside of Web servers as a better fit than APIs. While this
does get many of the benefits of this architecture, it is language-based, and
thus does not have language-independent interfaces. Some, though, view this as
a benefit.
<P>
Another option is <A HREF="http://ring.etl.go.jp/openlab/horb/">HORB</A>,
which is a Java-based remote object
operation environment. From the
<A HREF="http://ring.etl.go.jp/openlab/horb/doc/faq.htm">HORB FAQ</A>:
<BLOCKQUOTE>
I wanted to have a good language for parallel and distributed computing.
For those purposes,
however, the classic Java has very poor functionality. I like Java because
it's simple and
easy. But the basic idea of Java is not far from C++. C++ can also make objects,
threads,
and sockets. Java has no direct support for distributed object processing
as C++ does not.
So I decided to make a new framework for parallel and distributed computing.
</BLOCKQUOTE>
<P>
Also in the FAQ, a comparison of HORB to CORBA:
<BLOCKQUOTE>
CORBA and CORBA2 are desinged for Interoperability between different languages
and
different systems. You have to write interface definitions in CORBA IDL language
in
addition to real code. It must be annoying for casual use. CORBA cannot pass
instances.
It limits programming. CORBA ORB tends to huge to comply the CORBA standard.
Since HORB
ORB for clients is only 20KBytes, modem users can wait for dynamic loading.
Current CORBA
systems are very expensive. HORB is free of charge.
</BLOCKQUOTE>
<P>
As stated on a
<A HREF="http://ring.etl.go.jp/openlab/horb/examples/worldClock/WorldClock.htm">demo
page</A>,
HORB aims to "replace CGI or socket programming with smart remote object
operations of HORB".
<P>
For Windows-based platforms, Microsoft's server-extension solution in their
<A HREF="http://www.microsoft.com/INFOSERV/">IIS WWW server</A> is an
<A HREF="http://www.microsoft.com/intdev/iis/iis.htm">SDK</A>. One of the
sample applications for their API is an
<A HREF="http://www.microsoft.com/intdev/inttech/oleisapi.htm">OLE
interface</A>.
<H2>
<A NAME="references">References</A>
</H2>
<UL>
<LI>
CGI spec, ILU, Python
<LI>
Dan's web
<LI>
API thread in www-talk archives
<LI>
Our releases and HTTP.isl
</UL>
<H2>
<A NAME="appendices">Appendices</A>
</H2>
<H3>
<A NAME="http.isl">The HTTP ISL</A>
</H3>
<P>
The interface for HTTP is used to extend the WWW server by mapping the
browser-server interaction to an object request. We used the latest
HTTP specification, as mentioned in the comment.
<PRE>
(* $Id: WD-ilu-requestor-960307.html,v 1.6 1996/12/09 03:45:26 jigsaw Exp $ *)
(*
Proposed HTTP interface
Digital Creations &lt;info@digicool.com&gt;
Reference: http://www.w3.org/pub/WWW/Protocols/HTTP1.0/draft-ietf-http-spec.html
*)
(*
The following is a list of headers guaranteed to be included with
the request, regardless of the requester used. This list is probably
incomplete and will grow as I become more familiar with requesters
other than NetSite:
In "request.headers":
None
In "connection":
"remote-ip" == the IP address of the remote client
"remote-name" == the name of the remote client, or the IP
address if the name cannot be determined
*)
INTERFACE http;
TYPE field-name = ilu.CString;
TYPE field-value = ilu.CString;
TYPE optional-field-value = OPTIONAL field-value;
TYPE RequestURI = ilu.CString;
(* Should we handle URI parsing???
TYPE RequestURI = RECORD
scheme : ilu.CString,
net_loc : ilu.CString,
path : ilu.CString,
params : ilu.CString,
query : ilu.CString,
fragment : ilu.CString
END;
*)
TYPE Header = RECORD
name : field-name,
value : optional-field-value
END;
TYPE HTTPHeader = Header;
TYPE HTTPHeaders = SEQUENCE OF HTTPHeader;
TYPE EntityBody = SEQUENCE OF BYTE;
TYPE OptionalEntityBody = OPTIONAL EntityBody;
TYPE Request = RECORD
URI : RequestURI,
headers : HTTPHeaders,
body : OptionalEntityBody
END;
TYPE StatusCode = ENUMERATION
OK = 200,
Created = 201,
Accepted = 202,
NoContent = 204,
MovedPermanently = 301,
MovedTemporarily = 302,
NotModified = 304,
BadRequest = 400,
Unauthorized = 401,
Forbidden = 403,
NotFound = 404,
InternalError = 500,
NotImplemented = 501,
BadGateway = 502,
ServiceUnavailable = 503
END;
TYPE Response = RECORD
status : StatusCode,
headers : HTTPHeaders,
body : OptionalEntityBody
END;
TYPE ConnectionParameter = Header;
TYPE Connection = SEQUENCE OF ConnectionParameter;
TYPE Resource = OBJECT
METHODS
GET (request: Request, connection: Connection) : Response,
HEAD (request: Request, connection: Connection) : Response,
POST (request: Request, connection: Connection) : Response
END;
TYPE OptionalResource = OPTIONAL Resource;
</PRE><H3><A NAME="logger.isl">The Logger ISL</A>
</H3>
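<P>
Before the ISL itself, a minimal sketch of what asynchronous logging buys:
the caller hands over the name-value pairs and continues without waiting for
the log to be written. The sketch uses a Python queue and a worker thread in
place of an ILU ASYNCHRONOUS method. The class layout, the lines attribute,
and the flush method are assumptions made for illustration.

```python
import queue
import threading

# Names that the comment in the ISL below says must be present.
REQUIRED_FIELDS = (
    "content-length", "content-type", "method",
    "remote-ip", "remote-name", "status", "uri",
)


class LoggerObject:
    """Hypothetical logger that records requests on a background thread."""

    def __init__(self):
        self._pending = queue.Queue()
        self.lines = []
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def LogRequest(self, params):
        """Queue the pairs and return at once; nothing is sent back."""
        self._pending.put(dict(params))

    def _drain(self):
        while True:
            record = self._pending.get()
            missing = [name for name in REQUIRED_FIELDS if name not in record]
            if missing:
                self.lines.append("incomplete record, missing: "
                                  + ", ".join(missing))
            else:
                order = ("remote-ip", "method", "uri",
                         "status", "content-length")
                self.lines.append(" ".join(str(record[name]) for name in order))
            self._pending.task_done()

    def flush(self):
        """Block until every queued record has been written."""
        self._pending.join()


logger = LoggerObject()
logger.LogRequest([
    ("content-length", "12"), ("content-type", "text/plain"),
    ("method", "GET"), ("remote-ip", "127.0.0.1"),
    ("remote-name", "localhost"), ("status", "200"), ("uri", "/echo"),
])
logger.flush()
print(logger.lines)
```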
<PRE>
(* $Id: WD-ilu-requestor-960307.html,v 1.6 1996/12/09 03:45:26 jigsaw Exp $ *)
(* I've thought about just eliminating this ISL and using HTTP to do
logging, but I'm sticking with this right now to allow logging to be
asynchronous. Comments? *)
(*
The following are the name-value pairs that must be contained
in the headers (the separate requesters may include their own unique
headers, and various clients might send different headers which should
be passed along here):
"content-length" == the length in bytes of the returned data
"content-type" == the mime type of the returned data
"method" == the method of the request
"remote-ip" == the IP address of the remote client
"remote-name" == the name of the remote client, or the IP address if the name cannot be determined
"status" == the status code of the response
"uri" == the URI of the request
*)
INTERFACE logger IMPORTS ilu, http END;
TYPE LoggerObject = OBJECT
METHODS
ASYNCHRONOUS LogRequest(params: http.HTTPHeaders)
END;
</PRE><H3><A NAME="authorizer.isl">The Authorizer ISL</A>
</H3>
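<P>
The two object types declared in the ISL below split the work into checking
who the user is and checking what the user may do. The following sketch
mirrors that split in plain Python, without the ILU runtime. The constructor
arguments, the in-memory user table, and the use of a realm string as the
AuthorizationRequired value are assumptions, and the plain-text password
comparison is for illustration only.

```python
from dataclasses import dataclass, field


class AuthenticationFailed(Exception):
    pass


class Forbidden(Exception):
    pass


class AuthorizationRequired(Exception):
    """Carries a string; treated here as a realm name (assumption)."""


@dataclass
class AuthorizationRecord:
    name: str
    groups: list = field(default_factory=list)


class Authenticator:
    def __init__(self, users):
        # Hypothetical table: user name mapped to (password, groups).
        self._users = users

    def AuthenticateUser(self, name, password):
        entry = self._users.get(name)
        if entry is None or entry[0] != password:
            raise AuthenticationFailed(name)
        return AuthorizationRecord(name, list(entry[1]))


class Authorizer:
    def __init__(self, allowed_group, realm):
        self._allowed_group = allowed_group
        self._realm = realm

    def AuthorizeUser(self, authorization_record):
        # No record at all means the client has not authenticated yet.
        if authorization_record is None:
            raise AuthorizationRequired(self._realm)
        if self._allowed_group not in authorization_record.groups:
            raise Forbidden(authorization_record.name)


authenticator = Authenticator({"paul": ("secret", ["staff"])})
authorizer = Authorizer("staff", "demos")
record = authenticator.AuthenticateUser("paul", "secret")
authorizer.AuthorizeUser(record)  # returns quietly when access is allowed
print(record)
```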
<PRE>
(* $Id: WD-ilu-requestor-960307.html,v 1.6 1996/12/09 03:45:26 jigsaw Exp $ *)
INTERFACE authorize IMPORTS http END;
TYPE NameType = ilu.CString;
TYPE GroupList = SEQUENCE OF ilu.CString;
EXCEPTION AuthenticationFailed;
EXCEPTION Forbidden;
EXCEPTION AuthorizationRequired: ilu.CString;
TYPE AuthorizationRecord = RECORD
name: ilu.CString,
groups: GroupList
END;
TYPE OptionalAuthorizationRecord = OPTIONAL AuthorizationRecord;
TYPE Authenticator = OBJECT
METHODS
AuthenticateUser(name: NameType, password: ilu.CString): AuthorizationRecord RAISES AuthenticationFailed END
END;
TYPE Authorizer = OBJECT
METHODS
AuthorizeUser(authorization-record: OptionalAuthorizationRecord) RAISES Forbidden, AuthorizationRequired END
END;
</PRE><H2><A NAME="author">Author Info</A>
</H2>
<P>
Paul Everitt is Vice President of
<A HREF="http://www.digicool.com/">Digital Creations</A>. His email address
is <A HREF="mailto:paul@digicool.com">paul@digicool.com</A>.
</DIV>
</BODY></HTML>