You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
458 lines
15 KiB
458 lines
15 KiB
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
|
|
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
|
|
<html xmlns="http://www.w3.org/1999/xhtml">
|
|
<head>
|
|
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
|
|
<title>Semantic Web Application Integration: Travel Tools</title>
|
|
<link rel="stylesheet" href="../doc/style.css" />
|
|
<style type="text/css">
|
|
.mechanics { background-color: #b6b6b6; }
|
|
.figure { text-align: center }
|
|
</style>
|
|
</head>
|
|
|
|
<body>
|
|
<div class="noprint">
|
|
<a href="../../../../">W3C</a> | <a href="../../../01/sw/">SWAD</a> | <a
|
|
href="../">SWAP</a> | <a href="../doc/">Tutorial</a>
|
|
</div>
|
|
|
|
<h1>Semantic Web Application Integration: Travel Tools</h1>
|
|
|
|
<p><em>The bane of my existence is doing things I know the computer
|
|
could do for me.</em> When I got my proposed July 2001 travel
|
|
itinerary in email, I just couldn't bear the thought of manually
|
|
copying and pasting each field from the itinerary into my PDA calendar. I
|
|
started putting the Semantic Web approach to application integration
|
|
to work.</p>
|
|
|
|
<p>The Semantic Web approach to application integration emphasizes
|
|
data about real-world things like people, places, and events over
|
|
document structure. Documents are important real-world things too, of
|
|
course. And Semantic Web data formats benefit from the
|
|
internationalization support in XML and the growing infrastructure of
|
|
tools. But most XML schemas are too constrained, syntactically, and
|
|
not constrained enough, semantically, to accomplish these integration
|
|
tasks:</p>
|
|
|
|
<ul>
|
|
<li><a href="#map-viz">plot an itinerary on a map</a></li>
|
|
<li><a href="#ical-evo">import travel itineraries into my iCalendar-happy desktop PIM</a></li>
|
|
<li><a href="#pda-in">import travel itineraries into my PDA calendar</a></li>
|
|
<li><a href="#plain-text-sum">produce a brief summary of an itinerary for use in plain text email</a></li>
|
|
<li><a href="#ckcn">check proposed work travel itineraries against family constraints</a></li>
|
|
<li><a href="#fw">tell me when my travel schedule brings me unusually near a friend/colleague</a></li>
|
|
<li><a href="#fw">produce animated views of my travel schedule or past trips</a></li>
|
|
<li><a href="#fw">find conflicts between teleconferences and flights</a></li>
|
|
</ul>
|
|
|
|
|
|
<h2 id="grokLeg">Working with legacy data</h2>
|
|
|
|
<p>While more and more of the data in our lives is available
|
|
in the Semantic Web, there will always be a place for mechanisms
|
|
that extract the statements implicit in legacy data.</p>
|
|
|
|
<p>The data comes from the travel agency like this, probably
|
|
dumped from their database system:</p>
|
|
|
|
<pre>07 APR 03 - MONDAY
|
|
AIR AMERICAN AIRLINES FLT:3199 ECONOMY
|
|
OPERATED BY AMERICAN AIRLINES
|
|
LV KANSAS CITY INTL 940A EQP: MD-80
|
|
DEPART: TERMINAL BUILDING B 01HR 36MIN
|
|
AR DALLAS FT WORTH 1116A NON-STOP</pre>
|
|
|
|
<p>I hope that before too long they'll dump it from their
|
|
database directly into RDF
|
|
or perhaps in XML using some travel industry vocabulary, but</p>
|
|
|
|
<ul>
|
|
|
|
<li>before they'll do so, somebody will have to show them why it's
|
|
valuable</li>
|
|
|
|
<li>sometimes, for the short term, reverse-engineering the structure
|
|
of their data is cheaper than getting them to change their
|
|
processes</li>
|
|
|
|
</ul>
|
|
|
|
<p>So I wrote a perl script (<a
|
|
href="grokTravItin.pl">grokTravItin.pl</a>) to extract
|
|
statements from the data:</p>
|
|
|
|
<p><img src="grokLeg.png" alt="itin.txt -> itin.n3/rdf" /></p>
|
|
|
|
<p>The output of the perl script, <tt>itin.nt</tt>,
|
|
is in <a
|
|
href="http://www.w3.org/TR/rdf-testcases/#ntriples">n-triples</a>, a
|
|
line-oriented serialization developed in the RDF Core working group
|
|
for testing parsers. For visual inspection and debugging, we use cwm
|
|
to pretty-print it in N3. The results look like this:</p>
|
|
|
|
<pre>
|
|
:_gflt3199_3 a :_gECONOMY_5;
|
|
k:endingDate :_gdayMONDAY07_2;
|
|
k:fromLocation <http://www.daml.org/cgi-bin/airport?MCI>;
|
|
k:startingDate :_gdayMONDAY07_2;
|
|
k:toLocation <http://www.daml.org/cgi-bin/airport?DFW>;
|
|
t:arrivalTime "11:16";
|
|
t:carrier :_gAMERICANAIRLINES_4;
|
|
t:departureTime "09:40";
|
|
t:flightNumber "3199" .
|
|
|
|
:_gAMERICANAIRLINES_4 a k:AirlineCompany;
|
|
k:nameOfAgent "AMERICAN AIRLINES" .
|
|
|
|
:_gECONOMY_5 r:value "ECONOMY" .
|
|
|
|
:_gdayMONDAY07_2 a k:Monday;
|
|
dt:date "2003-04-07" .
|
|
|
|
|
|
</pre>
|
|
|
|
<h3>Choosing a Vocabulary: Build Or Buy?</h3>
|
|
|
|
|
|
<p>The import script not only bridges the syntactic
|
|
gap between the legacy data and RDF, but it also translates
|
|
the vocabulary of terms used in the data into URI space.
|
|
This raises the classic build-or-buy choice:</p>
|
|
|
|
<dl>
|
|
<dt>use (buy) an existing, general-purpose vocabulary</dt>
|
|
|
|
<dd>If we can accept the risk of putting word's into the source's
|
|
mouth, we can benefit from an economy of scale of shared vocabulary
|
|
such as the <a
|
|
href="http://www.cyc.com/cycdoc/vocab/transportation-vocab.html">cyc
|
|
ontology of common sense transportation terms</a>.</dd>
|
|
|
|
<dt>build a vocabulary just for this purpose</dt>
|
|
<dd>A special-purpose vocabulary isolates the data
|
|
from risks of version skew and such.</dd>
|
|
</dl>
|
|
|
|
<p>Early versions of the import script used a special-purpose
|
|
vocabulary; rules to relate this vocabulary to other
|
|
vocabularies were developed one at a time. But eventually
|
|
a pattern of using the general purpose cyc ontology
|
|
emerged, and the expected benefit of maintaining the
|
|
special-purpose ontology was dominated by the cost.
|
|
More recent versions convert directly to terms
|
|
in shared ontologies, except in the case where
|
|
custom terms were needed:</p>
|
|
|
|
<div class="figure"><img alt="travel terms" src="travelFig.png" />
|
|
|
|
<div class="noprint mechanics" >if your desktop is SVG-happy, <a href="travelFig.svg">travelFig.svg</a> is
|
|
nicer. I drew it with dia; you can play with <a
|
|
href="travelFig.xml">travelFig.xml</a>, the source. I'm working on <a
|
|
href="/2002/03owlt/umlp/dia2owl.xsl">dia2owl.xsl</a>, which converts it to
|
|
RDF/S: <a href="travelFig.rdf">travelFig.rdf</a>.
|
|
|
|
<p>TODO: more stuff in the diagram? socialParticipants, eventOccursAt, etc.</p>
|
|
|
|
</div>
|
|
</div>
|
|
|
|
|
|
<ul>
|
|
<li style="color: red">cyc terms (prefix <tt>k:</tt>) in red</li>
|
|
<li style="color: purple">DAML airport ontology terms in purple</li>
|
|
<li style="color: green">custom travelTerms (prefix <tt>t:</tt>) in green.
|
|
e.g. <code>departureTime</code>, <code>flightNumber</code>, ...; see <a
|
|
href="travelTerms">travelTerms</a>, in <a href="travelTerms.rdf">RDF/xml</a>,
|
|
<a href="travelTerms.rdf">RDF/n3</a>.
|
|
</li>
|
|
<li style="color: blue">RDF standard terms (prefix <tt>r:</tt>)in blue</li>
|
|
<li style="color: orange">XML Schema terms (prefix <tt>dt:</tt>)in orange</li>
|
|
</ul>
|
|
|
|
<p>Note that <strong>mixing vocabularies in RDF is easy</strong>; so
|
|
easy, compared with the general problem of mixing XML namespaces, that
|
|
I hardly notice it at all. Within the basic subject/predicate/object
|
|
abstract syntax, terms can be combined freely. Migrating to more
|
|
specialized or more generalized terms is cheap, using
|
|
<tt>rdfs:subPropertyOf</tt> and the like.</p>
|
|
|
|
<div class="noprint"> <em>(@@ok to just say that
|
|
without demonstrating it?)</em></div>
|
|
|
|
<h2 id="map-viz">Integration with mapping tools</h2>
|
|
|
|
<p>Let's exploit the effort we have put into going beyond formalized
|
|
document structure into formalized data about the real world.
|
|
Folks in the <a href="http://www.daml.org/">DAML project</a> have
|
|
imported airport lat/long data into the semantic web; we
|
|
can use log:semantics to reach out and get it with rules like
|
|
these, excerpted from <a href="airportLookup.n3">airportLookup.n3</a>:
|
|
</p>
|
|
|
|
<pre>
|
|
# well-known airports...
|
|
{ :X a :Y; #@@kludge...
|
|
log:uri [ str:startsWith "http://www.daml.org/cgi-bin/airport?" ] }
|
|
log:implies { :X a :AirportKnownToDAML }.
|
|
|
|
{ :X apt:iataCode :K.
|
|
:Y log:uri [ is str:concatenation of
|
|
("http://www.daml.org/cgi-bin/airport?" :K) ];
|
|
}
|
|
log:implies { :Y a apt:Airport; apt:iataCode :K; = :X }.
|
|
|
|
# we only want to look up certain airports...
|
|
{ [ k:toLocation :X ]. }
|
|
log:implies { :X a :InterestingPlace }.
|
|
{ [ k:fromLocation :X ]. }
|
|
log:implies { :X a :InterestingPlace }.
|
|
|
|
|
|
# believe what daml.org says about airport latitutde/longitudes...
|
|
:AirportProperty is rdf:type of
|
|
apt:latitude,
|
|
apt:name,
|
|
apt:iataCode,
|
|
apt:icaoCode,
|
|
apt:location,
|
|
apt:latitude,
|
|
apt:longitude,
|
|
apt:elevation.
|
|
|
|
{
|
|
:P a :AirportProperty.
|
|
[ a :AirportKnownToDAML, :InterestingPlace;
|
|
log:semantics [
|
|
log:includes {
|
|
:IT :P :V.
|
|
}
|
|
] ].
|
|
} log:implies {
|
|
:IT a apt:Airport; :P :V.
|
|
}.
|
|
</pre>
|
|
|
|
|
|
<p>For the convenience of consumers (including ourselves), we publish
|
|
in RDF/XML the results of reaching out with the rules; i.e. the
|
|
itinerary including the lat/long info. Then we use the (<em>little
|
|
documented</em>) <tt>cwm --strings</tt> output mode to generate two
|
|
files, <tt>itin-arcs</tt> and <tt>itin-markers</tt>, as input to <a
|
|
href="http://xplanet.sourceforge.net/">xplanet</a>:</p>
|
|
|
|
<div class="figure">
|
|
|
|
<img alt="map viz toolchain" src="mapVizFig.png" />
|
|
</div>
|
|
|
|
|
|
<p>The resulting map shows that we have given the
|
|
machine a fairly deep understanding of the itinerary:</p>
|
|
|
|
<div class="figure">
|
|
|
|
<p><img alt="MCI to YMX and back for Extreme 2002"
|
|
src="../../../../2003/04dc-mia/itin-mia.png" /></p>
|
|
|
|
<div class="noprint mechanics">more details are in the <a href="../../../../2003/04dc-mia/Makefile">Apr 2003 trip Makefile</a></div>
|
|
</div>
|
|
|
|
|
|
|
|
<h2 id="ical-evo">Integration with iCalendar Tools</h2>
|
|
|
|
<p>In fact, the published RDF/XML version of the itinerary is joined
|
|
not only with latitude/longitude data, but also timezone data, and
|
|
elaborated via <a href="itin2ical.n3">itin2ical.n3</a> rules into <a href="../../../../2002/12/cal/">an RDF
|
|
representation of the standard iCalendar syntax</a>.</p>
|
|
|
|
<pre>
|
|
|
|
{ :FLT
|
|
k:startingDate [ dt:date :YYMMDD];
|
|
k:endingDate [ dt:date :YYMMDD2];
|
|
t:departureTime :HH_MM;
|
|
k:fromLocation [ :timeZone [ cal:tzid :TZ] ];
|
|
t:arrivalTime :HH_MM2;
|
|
k:toLocation [ :timeZone [ cal:tzid :TZ2] ].
|
|
:DTSTART is str:concatenation of
|
|
(:YYMMDD "T" :HH_MM ":00"). #@@ extra punct in dates
|
|
:DTEND is str:concatenation of
|
|
(:YYMMDD2 "T" :HH_MM2 ":00").
|
|
|
|
( :FLT!log:rawUri "@uri-2-mid.w3.org") str:concatenation :UID. #@@hmm... kludge?
|
|
}
|
|
log:implies {
|
|
:FLT a cal:Vevent;
|
|
cal:uid :UID;
|
|
cal:dtstart [ cal:tzid :TZ; cal:dateTime :DTSTART ];
|
|
cal:dtend [ cal:tzid :TZ2; cal:dateTime :DTEND ].
|
|
}.
|
|
</pre>
|
|
|
|
|
|
<p> The final
|
|
syntactic export is more complex than the markers/arcs case, so we
|
|
wrote a python program, <tt><a
|
|
href="toIcal.py">toIcal.py</a></tt>, using the
|
|
cwm API, to generate iCalendar syntax.</p>
|
|
|
|
<div class="figure">
|
|
<img alt="calendar integration toolchain" src="calIntFig.png" />
|
|
</div>
|
|
|
|
<p>We can import the resulting iCalendar file into an of
|
|
a number of interoperable tools, such as Ximian Evolution:</p>
|
|
|
|
<div class="figure">
|
|
<img alt="evo screenshot" src="calIntShot.png" />
|
|
</div>
|
|
|
|
<div class="noprint mechanics">
|
|
FYI: python/evolution stuff: <a
|
|
href="http://heddley.com/edd/2002/03/05/evocal.py">evocal.py</a> from
|
|
Edd. relies on debian package: evolution-dev. also: <a
|
|
href="http://heddley.com/edd/2002/03/05/python-libversit-0.1.tar.gz">python-libversit</a>
|
|
</div>
|
|
|
|
<h2 id="pda-in">Conversion for PDA import</h2>
|
|
|
|
<p>
|
|
The <a href="itin2ical.n3">itin2ical.n3</a> rules
|
|
were actually developed after my Palm Pilot broke
|
|
and I switched to an iCalendar tool. The original
|
|
rules mapped to
|
|
an <a
|
|
href="../../../08/palm56/datebook">RDF vocabulary for the palmpilot
|
|
datebook</a>, developed for use with <a
|
|
href="http://dev.w3.org/cvsweb/2001/palmagent/">palmagent</a>, an
|
|
HTTP/RDF interface to my PDA:</p>
|
|
|
|
<div class="figure">
|
|
<img src="travelPdaRulesFig.png" alt="pda rules toolchain" />
|
|
|
|
<div class="noprint mechanics">see <a
|
|
href="../../../../2001/07dc-bos/Makefile">Makefile</a> for (some of
|
|
the?) details). The rules, <a
|
|
href="../../../../2001/07dc-bos/itin2datebook.n3">itin2datebook.n3</a>,
|
|
use some outdated vocabulary.
|
|
</div>
|
|
|
|
</div>
|
|
|
|
|
|
<div class="noprint"><em>[@@TODO: image of palm pilot showing flight
|
|
in the datebook]</em></div>
|
|
|
|
<h2 id="plain-text-sum">Plain Text Summaries</h2>
|
|
|
|
<p>All this rich integration is great when the tools are all working
|
|
and you have plenty of bandwidth and all that, but sometimes, plain
|
|
text is necessary and sufficient for the task at hand. For example, if
|
|
I get mail asking when I arrive at the meeting site, mailing back a
|
|
map is probably overkill, and I can't be 100% sure their desktop is
|
|
iCalendar-happy.</p>
|
|
|
|
<p>The <tt>cwm --strings</tt> output mode can
|
|
be really handy in these cases; we can use a few
|
|
<a href="../../../../2002/10dc-uk/itinBrief.n3">itinBrief.n3</a> rules
|
|
ala...</p>
|
|
|
|
<pre>python cwm.py itinBrief.n3 itin.nt --think --strings</pre>
|
|
|
|
<p>to get a summary ala...</p>
|
|
|
|
<pre>
|
|
2003-04-07 09:40 - 11:16 MCI->DFW Monday AMERICAN AIRLINES #3199
|
|
2003-04-07 12:03 - 15:49 DFW->MIA Monday AMERICAN AIRLINES #68
|
|
2003-04-10 19:12 - 21:32 MIA->ORD Thursday AMERICAN AIRLINES #1477
|
|
2003-04-10 22:33 - 23:54 ORD->MCI Thursday AMERICAN AIRLINES #1081
|
|
</pre>
|
|
|
|
|
|
<h2 id="ckcn">Checking Constraints</h2>
|
|
|
|
<p>Now that I have the proposed itinerary formalized, I can
|
|
automatically check it against various constraints before
|
|
I accept it and before I copy it to my PDA and to all the
|
|
other peers that need to know about it.</p>
|
|
|
|
<p>Rules like "itineraries that have me leaving
|
|
before 30 July are no good" are a bit tedious to
|
|
formalize, but my confidence in the results is
|
|
higher than my confidence in eyeballing it:</p>
|
|
|
|
<pre>{
|
|
?D a k:ItineraryDocument; k:containsInformationAbout-Focally ?TRIP.
|
|
?TRIP k:subEvents
|
|
[ k:startingDate [ dt:date ?D1 ];
|
|
k:fromLocation [ apt:iataCode "MCI" ];
|
|
t:departureTime ?T1;
|
|
].
|
|
?D1 str:lessThan "2001-07-30".
|
|
} => {
|
|
?TRIP <#leavesDaysTooSoon> ?D1;
|
|
<#at> ?T1.
|
|
}.</pre>
|
|
|
|
<p>These constraints can be checked with cwm ala:</p>
|
|
|
|
<pre>$ python cwm.py proposed-itinerary.nt --think=constraints.n3</pre>
|
|
|
|
<p>... and look for <tt><#leavesDaysTooSoon></tt> in the output.</p>
|
|
|
|
<div class="noprint mechanics">
|
|
<p>More details: <a href="../../../../2001/08swws67/">SWWS stuff</a>.</p>
|
|
</div>
|
|
|
|
|
|
|
|
<h2 id="fw">Conclusions and Future Work</h2>
|
|
|
|
<p>For the first few integration tasks, it might have been less work
|
|
to just manually copy the data, field by field. But the return on
|
|
investment increases with each trip I take, each system we integrate
|
|
with, and each collaborator who develops an interoperable tool.</p>
|
|
|
|
<p>The DAML project is a source of not just airport lat/long data, but
|
|
also tools such as <a
|
|
href="http://www.daml.org/2001/06/itinerary/">DAML itinerary</a> by
|
|
Mike Dean.</p>
|
|
|
|
<p>Ideas for future work include:</p>
|
|
|
|
<ul>
|
|
|
|
<li>animated views of my travel schedule or <a
|
|
href="../../../../People/Connolly/events/">past trips</a></li>
|
|
|
|
<li>peer-to-peer synchronization, ala "if it's in this schedule
|
|
scraped from an HTML page I maintain, but not in my evolution
|
|
calendar, print it out in .ics format for import".
|
|
|
|
<div class="noprint mechanics">
|
|
prototyped in <a href="/2002/08dc-ymx/Makefile">Aug 2002 trip Makefile</a>
|
|
</div>
|
|
|
|
</li>
|
|
|
|
<li>based on my addressbook and my travel schedule (and the travel
|
|
schedules of folks in my addressbook...) tell me when my travel
|
|
schedule brings me unusually near a friend/colleague</li>
|
|
|
|
<li>find conflicts between teleconferences and flights</li>
|
|
|
|
</ul>
|
|
|
|
<div class="noprint">
|
|
<hr />
|
|
<address>
|
|
<a href="../../../../People/Connolly/">Dan Connolly</a> and friends<br />
|
|
<small>started Jun 2002<br />
|
|
$Revision: 1.32 $ of $Date: 2004/09/21 15:13:59 $ by $Author: connolly $</small>
|
|
</address>
|
|
</div>
|
|
</body>
|
|
</html>
|