<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content="HTML Tidy, see www.w3.org" />
<title>Natural Language Semantics Markup Language: W3C Working
Draft</title>
<style type="text/css">
body {
font-family: sans-serif;
margin-left: 10%;
margin-right: 5%;
font-family: Tahoma, Verdana, "Myriad Web", Syntax, sans-serif;
}
h1,h2,h3,h4,h5,h6 {
color: rgb(0,92,160);
font-weight: normal;
margin-left: -4%;
}
img {
border-width: 0;
color: white;
}
h1 { clear: both; margin-top: 2em }
div.navbar { margin-bottom: 1em }
div.head { margin-bottom: 1em }
p.copyright {font-size: 70% }
span.term { color: rgb(0,0,192); font-style: italic }
blockquote {margin-left: 4% }
.toc {
list-style: none;
marker-offset: 1em;
}
.tocline { list-style: none }
ul.toc a { text-decoration: none }
.fig { text-align: center }
pre {
background-color: rgb(204,204,255);
border: none;
margin-left: 0;
margin-right: 0;
font-family: monospace;
padding: 0.5em;
white-space: pre;
width: 100%
}
code {
color: green;
font-family: monospace;
font-weight: bold
}
code.greenmono {
color: green;
font-family: monospace;
font-weight: bold
}
.good {
border-bottom: green 2px solid;
border-left: green 2px solid;
border-right: green 2px solid;
border-top: green 2px solid;
color: green;
font-weight: bold;
margin: 1em 5% 1em 0px
}
.bad {
border-bottom: red 2px solid;
border-left: red 2px solid;
border-right: red 2px solid;
border-top: red 2px solid;
color: rgb(192,101,101);
margin: 1em 5% 1em 0px
}
div.navbar { text-align: center }
div.contents {
background-color: rgb(204,204,255);
border-bottom: medium none;
border-left: medium none;
border-right: medium none;
border-top: medium none;
margin-right: 5%;
padding: 0.5em;
}
table.exceptions { background-color: rgb(255,255,153) }
.diff { color: rgb(128,0,0) }
.issues { color: green; font-style: italic }
.reqs { color: blue; font-style: italic }
</style>

<link rel="stylesheet" type="text/css"
href="http://www.w3.org/StyleSheets/TR/W3C-WD" />
</head>

<body>
<div class="head">
<div class="banner"><a href="http://www.w3.org/"><img
src="http://www.w3.org/Icons/WWW/w3c_home" alt="W3C"
border="0" /></a></div>

<h1 class="notoc">Natural Language Semantics Markup Language for
the Speech Interface Framework</h1>

<h2 class="notoc">W3C Working Draft <i>20 November 2000</i></h2>

<dl>
<dt>This version:</dt>
<dd><a
href="http://www.w3.org/TR/2000/WD-nl-spec-20001120/">http://www.w3.org/TR/2000/WD-nl-spec-20001120</a></dd>

<dt>Latest version:</dt>
<dd><a
href="http://www.w3.org/TR/nl-spec/">http://www.w3.org/TR/nl-spec</a></dd>

<dt><br />
Previous versions:</dt>
<dd><i>None; this is the first public version.</i></dd>

<dt><br />
Editor:</dt>
<dd>Deborah A. Dahl, Unisys</dd>
</dl>

<p class="copyright"><a
href="http://www.w3.org/Consortium/Legal/ipr-notice#Copyright">Copyright</a>
©2000 <a href="http://www.w3.org/"><abbr
title="World Wide Web Consortium">W3C</abbr></a><sup>®</sup>
(<a href="http://www.lcs.mit.edu/"><abbr
title="Massachusetts Institute of Technology">MIT</abbr></a>, <a
href="http://www.inria.fr/"><abbr lang="fr"
title="Institut National de Recherche en Informatique et Automatique">
INRIA</abbr></a>, <a href="http://www.keio.ac.jp/">Keio</a>), All
Rights Reserved. W3C <a
href="http://www.w3.org/Consortium/Legal/ipr-notice#Legal_Disclaimer">
liability</a>, <a
href="http://www.w3.org/Consortium/Legal/ipr-notice#W3C_Trademarks">
trademark</a>, <a
href="http://www.w3.org/Consortium/Legal/copyright-documents-19990405">
document use</a> and <a
href="http://www.w3.org/Consortium/Legal/copyright-software-19980720">
software licensing</a> rules apply.</p>

<hr />
</div>

<h2 class="notoc">Abstract</h2>

<p>The W3C Voice Browser Working Group aims to develop
specifications that enable access to the Web using spoken
interaction. This document is part of a set of specifications for
voice browsers, and provides details of an XML markup language for
describing the meanings of individual natural language utterances.
It is expected to be generated automatically by semantic
interpreters for use by components that act on the user's
utterances, such as dialog managers.</p>

<h2>Status of this Document</h2>

<p>This document is a W3C Working Draft for review by W3C members
and other interested parties. It is a draft document and may be
updated, replaced, or obsoleted by other documents at any time. It
is inappropriate to use W3C Working Drafts as reference material or
to cite them as other than "work in progress". A list of current
public W3C Working Drafts can be found at <a
href="http://www.w3.org/TR/">http://www.w3.org/TR</a>.</p>

<p>This specification describes markup for representing natural
language semantics, and forms part of the proposals for the W3C
Speech Interface Framework. This document has been produced as part
of the <a href="http://www.w3.org/Voice/">W3C Voice Browser
Activity</a>, following the procedures set out for the <a
href="http://www.w3.org/Consortium/Process/">W3C Process</a>. The
authors of this document are members of the <a
href="http://www.w3.org/Voice/Group/">Voice Browser Working
Group</a> (W3C Members only). This document is for public review,
and comments and discussion are welcomed on the public mailing list
&lt;<a href="mailto:www-voice@w3.org">www-voice@w3.org</a>&gt;. To
subscribe, send an email to &lt;<a
href="mailto:www-voice-request@w3.org">www-voice-request@w3.org</a>&gt;
with the word <tt>subscribe</tt> in the subject line (use the word
<tt>unsubscribe</tt> to unsubscribe). The <a
href="http://lists.w3.org/Archives/Public/www-voice/">archive</a>
for the list is accessible online.</p>

<!--
<h2>Process</h2>

<p>The specification development process will consist of the
following steps:</p>

<ol>
<li>Collect requirements on natural language markup, prioritize
those requirements and solicit public input.<br />
[Status: Requirements and priorities completed. Public feedback
in process.]</li>

<li>Analyze existing natural language markup languages against
requirements and determine the starting point for specification
development.<br />
[Status: The committee was unable to discover an existing
XML-based semantics markup. The XML format described in
this document was prepared by the Voice Browser Working
Group.]</li>

<li>Develop a specification based on the requirements for
delivery to the W3C Voice Browser Working Group. Iterate
specification through review and discussion by the working
group.<br />
[Status: initial draft complete.]</li>

<li>Agreement by committee on public release draft followed by
public review.<br />
[Status: agreed by committee vote.]</li>
</ol>
-->

<h2>General Issues</h2>

<p>The NL semantics representation uses the data models of the <a
href="http://www.w3.org/TR/2000/WD-xforms-datamodel-20000406">W3C
XForms</a> draft specification to represent application-specific
semantics. While XForms syntax may change in future revisions of
that specification, it is not expected to change in ways that
significantly affect the NL Semantics Markup Language.</p>

<h2 class="notoc">Table of Contents</h2>

<ul class="toc">
<li>1. <a href="#intro">Introduction</a>
<ul class="tocline">
<li>1.1 <a href="#uses">Uses</a></li>
<li>1.2 <a href="#markup">Markup Functions</a></li>
<li>1.3 <a href="#overview">Overview of Elements and their
Relationships</a></li>
</ul>
</li>

<li>2. <a href="#elements">Elements and Attributes</a>
<ul class="tocline">
<li>2.1 <a href="#result">"result" Root Element</a></li>
<li>2.2 <a href="#interpret">"interpretation" Element</a></li>
<li>2.3 <a href="#model">"model" Element</a></li>
<li>2.4 <a href="#instance">"instance" Element</a></li>
<li>2.5 <a href="#input">"input" Element</a></li>
<li>2.6 <a href="#nomatch">"nomatch" Element</a></li>
<li>2.7 <a href="#noinput">"noinput" Element</a></li>
<li>2.8 <a href="#meta">Interpreting Meta-Dialog and Meta-Task
Utterances</a></li>
<li>2.9 <a href="#anaphora">Anaphora and Deixis</a></li>
</ul>
</li>

<li>3. <a href="#ext">Extensibility</a></li>
<li>4. <a href="#compliance">Compliance</a></li>
<li>5. <a href="#dtd">Document Type Definition</a></li>

<li>6. <a href="#examples">Examples</a>
<ul class="tocline">
<li>6.1 <a href="#simple">Simple Ambiguity</a></li>
<li>6.2 <a href="#mixed">Mixed Initiative</a></li>
<li>6.3 <a href="#dtmf">DTMF</a></li>
</ul>
</li>

<li>7. <a href="#study">Future Study</a>
<ul class="tocline">
<li>7.1 <a href="#ambig">Representation of Ambiguities</a></li>
<li>7.2 <a href="#source">Representation of the Source of an
Ambiguity</a></li>
<li>7.3 <a href="#dialog">Representing Information Collected over
the Course of a Dialog</a></li>
<li>7.4 <a href="#compos">Composition of Multiple Data Models
within One Utterance</a></li>
<li>7.5 <a href="#multi">Representation of Multi-modal
Input</a></li>
<li>7.6 <a href="#xforms">Extensibility of XForms Data
Models</a></li>
<li>7.7 <a href="#recurse">Representation of Recursive
Structures</a></li>
<li>7.8 <a href="#unanalyzed">Representing Unanalyzed
Information: "unanalyzed" Element</a></li>
</ul>
</li>

<li>8. <a href="#acks">Acknowledgements</a></li>
</ul>

<h2><a id="intro" name="intro">1. Introduction</a></h2>

<p>This document presents an XML specification for a Natural
Language Semantics Markup Language, responding to the requirements
documented in <a href="http://www.w3.org/TR/voice-nlu-reqs/">W3C
Natural Language Processing Requirements for Voice Browsers</a>.
This markup language is intended for use by systems that provide
semantic interpretations for a variety of inputs, including, but
not necessarily limited to, speech and natural language text
input. These systems include Voice Browsers, web browsers and
accessible applications.</p>

<p>This markup is expected to be used primarily as a standard data
interchange format between Voice Browser components. In
particular, it will normally be generated automatically by a
semantic interpretation component to represent the semantics of
users' utterances, and will not be authored directly by
developers.</p>

<p>The language focuses on representing the semantic information
of a single utterance, as opposed to (possibly identical)
information that might have been collected over the course of a
dialog. See the <a href="#dialog">Future Study</a> section for a
detailed discussion of returning information from a dialog.</p>

<p>The language provides a set of elements focused on accurately
representing the semantics of a natural language input. The key
design criteria are the following:</p>

<ul>
<li>
<p><em>Fidelity:</em> The representation should be capable of
accurately reflecting the user's intended meaning in terms of the
application's goals. However, it should also provide a semantic
interpreter with the means to represent vagueness and ambiguity
when the user's meaning cannot be fully determined from the
information available to the semantic interpreter.</p>
</li>

<li>
<p><em>Interoperability:</em> The representation should support
use along with other W3C specifications, including (but not
limited to) the Dialog Markup Language, the <a
href="http://www.w3.org/TR/grammar-spec">Speech Grammar Markup
Language</a>, <a href="http://www.w3.org/AudioVideo/">SMIL</a>
and <a
href="http://www.w3.org/TR/2000/WD-xforms-datamodel-20000406">XForms</a>.</p>
</li>

<li>
<p><em>Implementability:</em> The required elements of the
specification should be implementable with existing, generally
available technology.</p>
</li>

<li>
<p><em>Extensibility:</em> The specification should be extensible
to accommodate emerging and future capabilities of automatic
speech recognizers (ASRs), natural language interpreters, and
voice browsers. For example, it should be compatible with
statistical ASRs, mixed-initiative dialogs and multi-modal
components.</p>
</li>

<li>
<p><em>Architectural Neutrality:</em> The specification should,
wherever possible, avoid requirements that imply commitments to
particular Voice Browser architectures, for example, whether
multi-modal integration takes place before or after natural
language interpretation.</p>
</li>

<li>
<p><em>Portability:</em> The specification should support
consistent behavior across platforms.</p>
</li>
</ul>

<p>This specification defines a set of draft <a
href="#elements">elements and attributes</a> and includes a <a
href="#dtd">draft DTD</a>.</p>

<h3><a id="uses" name="uses">1.1 Uses</a></h3>

<p>The general purpose of the NL Semantics Markup is to represent
information automatically extracted from a user's utterances by a
semantic interpretation component, where <i>utterance</i> is to be
taken in the general sense of a meaningful user input in any
modality supported by the platform. Referring to the sample Voice
Browser architecture in <a
href="http://www.w3.org/Voice/Group/2000/voice-intro-20000911.html">Introduction
and Overview of the W3C Speech Interface Framework</a>, a specific
architecture can take advantage of this representation by using it
to convey content among the system components that generate and
make use of the markup.</p>

<p>Components that generate NL Semantics Markup:</p>

<ol>
<li>ASR</li>
<li>Natural language understanding</li>
<li>Other input media interpreters (e.g., DTMF, pointing,
keyboard)</li>
<li>Reusable dialog component</li>
<li>Multimedia integration component</li>
</ol>

<p>Components that use NL Semantics Markup:</p>

<ol>
<li>Dialog manager</li>
<li>Multimedia integration component</li>
</ol>

<p>A platform may also choose to use this format as the basis of a
general semantic result that is carried along and filled out
during each stage of processing. In addition, future systems may
use this markup to convey abstract semantic content to be rendered
into natural language by a natural language generation
component.</p>

<h3><a id="markup" name="markup">1.2 Markup Functions</a></h3>

<p>A semantic interpretation system that supports the Natural
Language Semantics Markup Language is responsible for interpreting
natural language inputs and formatting the interpretation as
defined in this document. Semantic interpretation is typically
either included as part of the speech recognition process or
performed by one or more additional components, such as natural
language interpretation components and dialog interpretation
components. See the Voice Browser architecture described in <a
href="http://www.w3.org/TR/voice-intro/">http://www.w3.org/TR/voice-intro/</a>
for a sample architecture.</p>

<p>The elements of the markup fall into the following general
functional categories:</p>

<p><em>Input formats and ASR information:</em></p>

<p>The "<a href="#input">input</a>" element, representing the
input to the semantic interpreter.</p>

<p><em>Interpretation:</em></p>

<p>Elements and attributes representing the semantics of the
user's utterance, including the "<a href="#result">result</a>",
"<a href="#interpret">interpretation</a>", "<a
href="#model">model</a>", and "<a href="#instance">instance</a>"
elements. The "result" element contains the full result of
processing one utterance. It may contain multiple "interpretation"
elements if the utterance yields multiple alternative meanings
because of uncertainty in speech recognition or natural language
understanding. There are at least two reasons for providing
multiple interpretations:</p>

<ol>
<li>Another component, such as a dialog manager, might have
additional information (for example, information from a database)
that would allow it to select a preferred interpretation from
among the possible interpretations returned by the semantic
interpreter.</li>

<li>A dialog manager that is unable to select between several
competing interpretations could use this information to go back to
the user and find out what was intended. For example, <i>Did you
say "Boston" or "Austin"?</i></li>
</ol>
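
<p>To illustrate the second case, the following Python sketch shows
how a dialog manager might turn the input text of competing
interpretations into a confirmation question. It uses the standard
<code>xml.etree.ElementTree</code> module. The helper name
<code>confirmation_prompt</code>, the fallback wording, and the
choice to offer only the two best alternatives are assumptions made
for illustration. They are not part of this specification.</p>

<pre>
import xml.etree.ElementTree as ET


def confirmation_prompt(result: ET.Element) -> str:
    """Build a "Did you say ...?" question from competing interpretations.

    Hypothetical helper: it assumes each "interpretation" has an "input"
    child holding plain text and that interpretations are already sorted
    best-first.
    """
    choices = []
    for interp in result.findall("interpretation"):
        text = (interp.findtext("input") or "").strip()
        if text and text not in choices:
            choices.append(text)
    if not choices:
        return "Sorry, I did not understand."
    if len(choices) == 1:
        return f'Did you say "{choices[0]}"?'
    # Offer only the two best distinct alternatives.
    return f'Did you say "{choices[0]}" or "{choices[1]}"?'


# Usage: a result with two competing interpretations.
result = ET.Element("result")
for city in ("Boston", "Austin"):
    interp = ET.SubElement(result, "interpretation")
    ET.SubElement(interp, "input").text = city

print(confirmation_prompt(result))  # Did you say "Boston" or "Austin"?
</pre>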

<p>The "model" is an XForms data model for the semantic information
being returned in the interpretation. The "model" is a structured
representation of the interpretation and allows for type checking.
The "instance" is an instantiation of the data model containing the
semantic information for a specific interpretation of a specific
utterance. For example, the information in a travel application
might include three groups of information: flights, car rental and
hotels. The flight information, in turn, could contain values for
"to_city", "from_city", "departure_date" and so on, which would be
typed as strings.</p>

<p><em>Side Information:</em></p>

<p>Elements and attributes representing additional information
about the interpretation, over and above the interpretation
itself. Side information includes:</p>

<ol>
<li>
<p>Whether an interpretation was achieved (the "<a
href="#nomatch">nomatch</a>" element) and the system's confidence
in an interpretation (the "confidence" attribute of
"interpretation").</p>
</li>

<li>
<p>Alternative interpretations ("<a
href="#interpret">interpretation</a>").</p>
</li>
</ol>

<p><em>Multi-modal integration:</em></p>

<p>When more than one modality is available for input, the
interpretation of the inputs needs to be coordinated. The "mode"
attribute of "<a href="#input">input</a>" supports this by
indicating whether the utterance was input by speech, DTMF,
pointing, etc. The timestamp attributes of "input" also provide
for temporal coordination by indicating when inputs occurred.</p>

<h3><a id="overview" name="overview">1.3 Overview of Elements and
their Relationships</a></h3>

<p>The following figure shows a graphical view of the relationships
among the elements of the Natural Language Semantics Markup.</p>

<p style="MARGIN-LEFT: -10%"><img
alt="Diagram of the relationships among the elements of the Natural Language Semantics Markup"
border="0" src="nl-spe8.gif" width="537" height="360" /></p>

<p>The elements shown in the figure fall into two categories:</p>

<ol>
<li>a description of the input to be processed, shown in the left
box, "incoming data", in blue;</li>

<li>a description of the meaning extracted from the input, shown
in the right box, "meaning", in yellow.</li>
</ol>

<p>Next to each element in the figure are its attributes, in
italics. In addition, some elements can contain multiple instances
of other elements. For example, a "result" can contain multiple
"interpretation" elements, each of which is taken to be an
alternative. The element "xf:model" is an XForms data model as
specified in the <a
href="http://www.w3.org/TR/xforms-datamodel/">XForms data
model</a> draft, and is therefore not defined in this
document.</p>

<p>As a simple example of the basic usage of these elements,
consider the utterance <i>ok</i>, interpreted as "yes". The
following markup shows how that utterance and its interpretation
would be represented in the NL Semantics Markup.</p>

<pre>
&lt;result x-model="http://theYesNoModel"
        xmlns:xf="http://www.w3.org/2000/xforms"
        grammar="http://theYesNoGrammar"&gt;
  &lt;interpretation&gt;
    &lt;xf:instance&gt;
      &lt;myApp:yes_no&gt;
        &lt;response&gt;yes&lt;/response&gt;
      &lt;/myApp:yes_no&gt;
    &lt;/xf:instance&gt;
    &lt;input&gt;ok&lt;/input&gt;
  &lt;/interpretation&gt;
&lt;/result&gt;
</pre>

<p>This example includes only the minimum required information;
that is, it does not include any of the optional information
defined in this document. There is an overall "result" element
that includes one interpretation. The data model is defined
externally by referring to the URI "http://theYesNoModel". This
external model defines a "response" element. The "myApp" namespace
refers to the application-specific elements that are defined by
the XForms data model.</p>

<h2><a id="elements" name="elements">2. Elements and
Attributes</a></h2>

<h3><a id="result" name="result">2.1 "result" Root
Element</a></h3>

<h4>Attributes: grammar, x-model, xmlns</h4>

<p>The root element of the markup is "result". The "result" element
includes one or more "<a href="#interpret">interpretation</a>"
elements. Multiple interpretations result from ambiguities in the
input or in the semantic interpretation. If the "grammar",
"x-model", and "xmlns" attributes do not apply to all of the
interpretations in the result, they can be overridden for
individual interpretations at the "interpretation" level.</p>

<p>Attributes:</p>

<ol>
<li><b>grammar:</b> The grammar or recognition rule matched by this
result. (The format of the grammar attribute will match the rule
reference semantics defined in the <a
href="http://www.w3.org/TR/grammar-spec">grammar
specification</a>.) The grammar can be overridden by a grammar
attribute in the "interpretation" element if the input was
ambiguous as to which grammar it matched.</li>

<li><b>x-model:</b> The URI that defines the XForms data model used
for this result. The data model used by the interpretation can be
specified either here or by an in-line data model using the "<a
href="#model">model</a>" element. The x-model can be overridden by
an x-model attribute in the "interpretation" element if the input
was ambiguous as to which x-model it matched. (optional)</li>

<li><b>xmlns:</b> An XML namespace declaration is required to
define the namespace used by XForms elements and attributes. The
DTD defaults the "xmlns" namespace declaration to a standard
location, since it will rarely change.</li>
</ol>

<pre>
&lt;result grammar="http://grammar" x-model="http://dataModel"
        xmlns:xf="http://www.w3.org/2000/xforms"&gt;
  &lt;interpretation/&gt;
&lt;/result&gt;
</pre>

<h3><a id="interpret" name="interpret">2.2 "interpretation"
Element</a></h3>

<h4>Attributes: confidence, grammar, x-model, xmlns</h4>

<p>An "interpretation" element contains a single semantic
interpretation.</p>

<p>Attributes:</p>

<ol>
<li><b>confidence:</b> An integer from 0 to 100 indicating the
semantic analyzer's confidence in this interpretation. At this
point there is no formal, platform-independent definition of
confidence. (optional)</li>

<li><b>grammar:</b> The grammar or recognition rule matched by this
interpretation, if it is necessary to override, at the
"interpretation" level, the grammar specified for the "result". The
dialog markup interpreter needs to know the grammar rule matched by
the utterance because multiple rules may be active simultaneously.
The value filled in is the grammar URI used by the dialog markup
interpreter to specify the grammar. The format of the grammar
attribute will match the rule reference semantics defined in the
<a href="http://www.w3.org/TR/grammar-spec">grammar
specification</a>. Specifically, the rule reference will be in the
<a href="http://www.w3.org/TR/grammar-spec#S2.2">external XML form
for grammar rule references</a>. This attribute is only needed
under "interpretation" if it is necessary to override a grammar
that was defined at the "result" level. (optional)</li>

<li><b>x-model:</b> The location of the XForms data model used for
this interpretation. The XForms data model used by the
interpretation may be specified either here or by an in-line data
model using the "<a href="#model">model</a>" element. (As in the
case of "grammar", this attribute only needs to be defined under
"interpretation" if it is necessary to override, at the
"interpretation" level, the x-model specified for the "result".)
(optional)</li>
</ol>

<p>Interpretations must be sorted best-first by some measure of
"goodness". The goodness measure is "confidence" if present;
otherwise, it is some platform-specific indication of
quality.</p>
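
<p>A component that produces or checks a "result" might enforce the
best-first ordering as in the following Python sketch. It treats
"confidence" as the goodness measure and validates the 0 to 100
range. It does not model the platform-specific measure mentioned
above. As an assumption of the sketch, interpretations without a
confidence value are placed after those that have one. The function
names <code>confidence_of</code> and <code>sort_best_first</code>
are hypothetical.</p>

<pre>
import xml.etree.ElementTree as ET


def confidence_of(interp: ET.Element) -> int:
    """Return the 0-100 confidence of an interpretation, or -1 if absent."""
    raw = interp.get("confidence")
    if raw is None:
        return -1  # assumption of this sketch: unscored entries rank last
    score = int(raw)
    if score not in range(101):
        raise ValueError(f"confidence out of range: {score}")
    return score


def sort_best_first(result: ET.Element) -> None:
    """Reorder the "interpretation" children of a result, best first."""
    interps = result.findall("interpretation")
    for interp in interps:
        result.remove(interp)
    # sorted() is stable even with reverse=True, so ties keep their order.
    for interp in sorted(interps, key=confidence_of, reverse=True):
        result.append(interp)


result = ET.Element("result")
for score in ("40", "90", "75"):
    ET.SubElement(result, "interpretation", confidence=score)

sort_best_first(result)
print([i.get("confidence") for i in result])  # ['90', '75', '40']
</pre>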

<p>The x-model and grammar are expected to be specified most
frequently at the "result" level, because most often one data
model will be sufficient for the entire result. However, they can
be overridden at the "interpretation" level, because different
interpretations may have different data models, perhaps because
they match different grammar rules.</p>
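
<p>One reading of this override rule is that a value on the
"interpretation" wins, and otherwise the value on the enclosing
"result" applies. The Python sketch below resolves the effective
"grammar" and "x-model" of each interpretation under that reading.
The function name <code>effective_attributes</code> and the URI
"http://otherGrammar" are hypothetical.</p>

<pre>
import xml.etree.ElementTree as ET

INHERITED = ("grammar", "x-model")


def effective_attributes(result: ET.Element) -> list:
    """Return the grammar and x-model that apply to each interpretation.

    A value on the interpretation overrides the one on the result; a value
    missing from both is reported as None.
    """
    return [
        {name: interp.get(name, result.get(name)) for name in INHERITED}
        for interp in result.findall("interpretation")
    ]


result = ET.Element("result", {"grammar": "http://grammar",
                               "x-model": "http://dataModel"})
ET.SubElement(result, "interpretation")
ET.SubElement(result, "interpretation", {"grammar": "http://otherGrammar"})

for attrs in effective_attributes(result):
    print(attrs)
# {'grammar': 'http://grammar', 'x-model': 'http://dataModel'}
# {'grammar': 'http://otherGrammar', 'x-model': 'http://dataModel'}
</pre>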

<p>The "interpretation" element includes an "<a
href="#input">input</a>" element, which contains the input being
analyzed; optionally, a "<a href="#model">model</a>" element
defining the XForms data model; and an "<a
href="#instance">instance</a>" element containing the
instantiation of the data model for this utterance. The "instance"
would be empty if the interpreter was not able to produce any
interpretation.</p>

<pre>
&lt;interpretation confidence="75" grammar="http://grammar"
                x-model="http://dataModel"
                xmlns:xf="http://www.w3.org/2000/xforms"&gt;
  ...
&lt;/interpretation&gt;
</pre>

<h3><a id="model" name="model">2.3 "model" Element</a></h3>

<p>The "model" element contains an XForms data model for the data
and is part of the XForms namespace. The XForms data model provides
for a structured data model consisting of groups, which may contain
other groups or simple types. A simple type can be one of: string,
boolean, number, monetary value, date, time of day, duration, URI,
or binary. For further information on XForms data models, see the
<a href="http://www.w3.org/TR/2000/WD-xforms-datamodel-20000406">XForms
data model specification</a>. Note that XForms fields are optional
by default.</p>

<p>If no data model is supplied by either the "model" element or
the "x-model" attribute, it is assumed that the data model will be
provided by the dialog (or whatever other process receives the NL
semantic markup).</p>

<p>It is an error to specify both an x-model attribute and a
"model" element.</p>
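
<p>A validator could detect this error with a check along the lines
of the Python sketch below. As an assumption, the sketch only looks
for an interpretation that carries its own x-model attribute and
also has a direct child whose local name is "model", with or without
the XForms namespace. It does not decide whether an x-model on the
"result" conflicts with an in-line model on an interpretation. The
names <code>local_name</code> and <code>model_conflicts</code> are
hypothetical.</p>

<pre>
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop a "{namespace}" prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def model_conflicts(result: ET.Element) -> list:
    """Return indexes of interpretations that specify a data model twice."""
    conflicts = []
    for index, interp in enumerate(result.findall("interpretation")):
        has_attribute = interp.get("x-model") is not None
        has_inline = any(local_name(child.tag) == "model" for child in interp)
        if has_attribute and has_inline:
            conflicts.append(index)
    return conflicts


XF = "http://www.w3.org/2000/xforms"
result = ET.Element("result")
ET.SubElement(result, "interpretation", {"x-model": "http://dataModel"})
clash = ET.SubElement(result, "interpretation", {"x-model": "http://dataModel"})
ET.SubElement(clash, f"{{{XF}}}model")  # an in-line xf:model as well
print(model_conflicts(result))  # [1]
</pre>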

<p>Example: An XForms data model for name and address.</p>

<pre>
&lt;model&gt;
  &lt;xf:group name="nameAddress"&gt;
    &lt;string name="name"/&gt;
    &lt;string name="street"/&gt;
    &lt;string name="city"/&gt;
    &lt;string name="state"/&gt;
    &lt;string name="zip"&gt;
      &lt;mask&gt;ddddd&lt;/mask&gt;
    &lt;/string&gt;
  &lt;/xf:group&gt;
&lt;/model&gt;
</pre>

<h3><a id="instance" name="instance">2.4 "instance"
Element</a></h3>

<p>The "instance" element contains an instance of the XForms data
model for the data and is part of the XForms namespace.</p>

<p>Attributes:</p>

<ol>
<li><b>confidence:</b> Any element of the data instance may have an
optional confidence attribute, defined in the NL semantics
namespace. The confidence attribute contains an integer value in
the range 0 to 100 reflecting the system's confidence in the
analysis of that slot. The meaning of confidence scores has not
been defined in a platform-independent way. (optional)</li>
</ol>

<p>Using a confidence attribute from the NL semantics namespace
does not appear to present any document validation problems.
However, if future XForms specifications support an equivalent
attribute, that attribute would be preferable to the current
proposal.</p>

<pre>
&lt;xf:instance name="nameAddress"&gt;
  &lt;nameAddress&gt;
    &lt;street confidence="75"&gt;123 Maple Street&lt;/street&gt;
    &lt;city&gt;Mill Valley&lt;/city&gt;
    &lt;state&gt;CA&lt;/state&gt;
    &lt;zip&gt;90952&lt;/zip&gt;
  &lt;/nameAddress&gt;
&lt;/xf:instance&gt;
&lt;input&gt;
  My address is 123 Maple Street,
  Mill Valley, California, 90952
&lt;/input&gt;
</pre>
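
<p>A consumer such as a dialog manager might read the filled slots
and their optional per-slot confidences as in this Python sketch. It
assumes the unprefixed "confidence" attribute used in the example
above. A document that follows the text would place that attribute
in the NL semantics namespace. The helper name
<code>read_slots</code> is hypothetical.</p>

<pre>
import xml.etree.ElementTree as ET


def read_slots(group: ET.Element) -> dict:
    """Map each slot in a data-model group to (value, confidence).

    Confidence is None when the slot carries no confidence attribute.
    """
    slots = {}
    for slot in group:
        raw = slot.get("confidence")
        confidence = int(raw) if raw is not None else None
        slots[slot.tag] = ((slot.text or "").strip(), confidence)
    return slots


# Build the nameAddress group from the example above.
group = ET.Element("nameAddress")
ET.SubElement(group, "street", confidence="75").text = "123 Maple Street"
ET.SubElement(group, "city").text = "Mill Valley"
ET.SubElement(group, "state").text = "CA"
ET.SubElement(group, "zip").text = "90952"

for name, (value, confidence) in read_slots(group).items():
    print(name, value, confidence)
</pre>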

<h3><a id="input" name="input">2.5 "input" Element</a></h3>

<p>The "input" element is the text representation of a user's
input. It has an optional "confidence" attribute, which indicates
the recognizer's confidence in the recognition result (not the
confidence in the interpretation, which is indicated by the
"confidence" attribute of "interpretation"). Optional
"timestamp-start" and "timestamp-end" attributes indicate the start
and end times of a spoken utterance, in ISO 8601 format (<a
href="http://www.iso.ch/markete/8601.pdf">http://www.iso.ch/markete/8601.pdf</a>).</p>

<p>Attributes:</p>

<ol>
<li><b>timestamp-start:</b> The time at which the input began.
(optional)</li>

<li><b>timestamp-end:</b> The time at which the input ended.
(optional)</li>

<li><b>mode:</b> The modality of the input, for example, "speech",
"dtmf", etc. (optional)</li>

<li><b>confidence:</b> The confidence of the recognizer in the
correctness of the input. (optional)</li>
</ol>

<p>Note that it does not make sense for temporally overlapping
inputs to have the same mode; however, platforms are not expected
to enforce this constraint.</p>

<p>When there is no time zone designator, ISO 8601 time
representations default to local time.</p>
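
<p>The timestamp attributes support simple temporal reasoning. The
Python sketch below parses them with <code>datetime.strptime</code>,
whose <code>%f</code> directive accepts one to six fractional
digits. It then computes the duration of an input and flags
same-mode inputs whose time spans overlap. It handles only local
times without a time zone designator, and it represents each input
as a plain dictionary of its attributes. Both are simplifying
assumptions, and the function names are hypothetical.</p>

<pre>
from datetime import datetime, timedelta
from itertools import combinations


def parse_timestamp(value: str) -> datetime:
    """Parse a local ISO 8601 time such as 2000-04-03T00:00:00.25."""
    if "." not in value:
        value += ".0"  # so that the %f directive always has digits to read
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f")


def duration(start: str, end: str) -> timedelta:
    """Length of an input, from its timestamp-start and timestamp-end."""
    return parse_timestamp(end) - parse_timestamp(start)


def same_mode_overlaps(inputs: list) -> list:
    """Return index pairs of same-mode inputs whose time spans overlap."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(inputs), 2):
        if a["mode"] != b["mode"]:
            continue
        a_start = parse_timestamp(a["timestamp-start"])
        a_end = parse_timestamp(a["timestamp-end"])
        b_start = parse_timestamp(b["timestamp-start"])
        b_end = parse_timestamp(b["timestamp-end"])
        # Two spans overlap when each one ends after the other starts.
        if a_end > b_start and b_end > a_start:
            pairs.append((i, j))
    return pairs


words = [
    {"mode": "speech", "timestamp-start": "2000-04-03T00:00:00",
     "timestamp-end": "2000-04-03T00:00:00.2"},
    {"mode": "speech", "timestamp-start": "2000-04-03T00:00:00.25",
     "timestamp-end": "2000-04-03T00:00:00.6"},
]
print(duration(words[1]["timestamp-start"], words[1]["timestamp-end"]))
# 0:00:00.350000
print(same_mode_overlaps(words))  # []
</pre>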
|
|
|
|
<p>There are three possible formats for the "input" element.</p>
|
|
|
|
<p>a) The "input" element can contain simple text:</p>
|
|
|
|
<pre>
|
|
<input confidence = "100" mode="speech">onions</input>
|
|
</pre>
|
|
|
|
<p>b) The "input" element can also contain additional "input"
elements. Nested input elements allow the representation to
support future multi-modal inputs as well as finer-grained speech
information, such as timestamps for individual words and
word-level confidences.</p>

<pre>
<input>
  <input mode="speech" confidence="50"
         timestamp-start="2000-04-03T00:00:00"
         timestamp-end="2000-04-03T00:00:00.2">fried</input>
  <input mode="speech" confidence="100"
         timestamp-start="2000-04-03T00:00:00.25"
         timestamp-end="2000-04-03T00:00:00.6">onions</input>
</input>
</pre>
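<p>As an illustration of how a consumer might use the word-level
information in nested inputs, the Python sketch below reads the
example above and reports each word's confidence and duration. It
is a hypothetical helper, not part of this specification; it
assumes integer confidence values as in the examples and reports
<code>None</code> when the optional "confidence" attribute is
absent.</p>

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Example (b) from this section, with two-digit hours as ISO 8601 requires.
NESTED_INPUT = """
<input>
  <input mode="speech" confidence="50"
         timestamp-start="2000-04-03T00:00:00"
         timestamp-end="2000-04-03T00:00:00.2">fried</input>
  <input mode="speech" confidence="100"
         timestamp-start="2000-04-03T00:00:00.25"
         timestamp-end="2000-04-03T00:00:00.6">onions</input>
</input>
"""


def parse_timestamp(value):
    # %f accepts one to six fractional digits, e.g. ".2" or ".25".
    fmt = "%Y-%m-%dT%H:%M:%S.%f" if "." in value else "%Y-%m-%dT%H:%M:%S"
    return datetime.strptime(value, fmt)


def word_details(outer):
    """Return (word, confidence, duration in seconds) for each nested input."""
    details = []
    for word in outer.findall("input"):
        start = parse_timestamp(word.get("timestamp-start"))
        end = parse_timestamp(word.get("timestamp-end"))
        raw = word.get("confidence")
        # "confidence" is optional; report None rather than guessing a default.
        confidence = int(raw) if raw is not None else None
        details.append((word.text, confidence, (end - start).total_seconds()))
    return details


for text, confidence, seconds in word_details(ET.fromstring(NESTED_INPUT)):
    print(f"{text}: confidence={confidence}, duration={seconds:.2f}s")
```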
<p>c) Finally, the "input" element can contain "nomatch" and
"noinput" elements, which describe situations in which the speech
recognizer (or other media interpreter) received input that it was
unable to process, or did not receive any input at all,
respectively.</p>
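<p>A consumer of this markup has to tell the three formats apart.
The Python sketch below is one hypothetical way to do so; the
function name is illustrative, and the rule that "nomatch" or
"noinput" appears as the only child (with anything else rejected)
is an assumption drawn from the examples in this document rather
than a stated requirement.</p>

```python
import xml.etree.ElementTree as ET


def input_format(element):
    """Classify an "input" element as format (a), (b) or (c) from the list above."""
    child_tags = [child.tag for child in element]
    if not child_tags:
        return "text"  # (a) simple text
    if all(tag == "input" for tag in child_tags):
        return "nested"  # (b) nested "input" elements
    if len(child_tags) == 1 and child_tags[0] in ("nomatch", "noinput"):
        return child_tags[0]  # (c) input not processed, or no input at all
    raise ValueError(f"unexpected children of input: {child_tags}")


for xml in ('<input mode="speech">onions</input>',
            '<input><input>fried</input><input>onions</input></input>',
            '<input><nomatch>fried onions</nomatch></input>',
            '<input><noinput/></input>'):
    print(input_format(ET.fromstring(xml)))
```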
<h3><a id="nomatch" name="nomatch">2.6 "nomatch" Element</a></h3>

<p>The "nomatch" element under "input" indicates that the natural
language interpreter was unable to successfully match any input.
It can optionally contain the text of the best of the (rejected)
matches.</p>

<pre>
<interpretation>
  <instance/>
  <input>
    <nomatch/>
  </input>
</interpretation>
</pre>
<h3><a id="noinput" name="noinput">2.7 "noinput" Element</a></h3>

<p>The "noinput" element under "input" indicates that there was no
input: a timeout occurred in the speech recognizer due to
silence.</p>

<pre>
<interpretation>
  <instance/>
  <input>
    <noinput/>
  </input>
</interpretation>
</pre>
<p>If there are multiple levels of input, the most natural place
for the "noinput" element appears to be under the highest-level
"input", and the most natural place for the "nomatch" element
under the "input" for the appropriate modality. Thus "noinput"
means "no input at all", while "nomatch" means "no match in the
speech modality" or "no match in the dtmf modality". For example,
garbled speech combined with the DTMF input "1 2 3 4" would be
represented as follows:</p>

<pre>
<input>
  <input mode="speech"><nomatch/></input>
  <input mode="dtmf">1 2 3 4</input>
</input>
</pre>
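<p>The placement conventions above lend themselves to a simple
check. This Python sketch is hypothetical and reflects one reading
of the conventions: "noinput" must be a direct child of the
outermost "input", and a "nomatch" below the top level must sit in
an "input" that carries a "mode" attribute. A single-level
"nomatch", as in the 2.6 example, is accepted.</p>

```python
import xml.etree.ElementTree as ET


def placement_problems(top):
    """Check the "noinput"/"nomatch" placement conventions described above.

    top is the outermost "input" element; returns a list of problem strings.
    """
    # ElementTree elements do not know their parent, so build a lookup table.
    parent_of = {child: parent for parent in top.iter() for child in parent}
    problems = []
    for element in top.iter("noinput"):
        if parent_of[element] is not top:
            problems.append("noinput should be under the top-level input")
    for element in top.iter("nomatch"):
        parent = parent_of[element]
        # Below the top level, nomatch should sit in an input for a specific mode.
        if parent is not top and parent.get("mode") is None:
            problems.append("nomatch should be under an input with a mode")
    return problems


garbled_speech_and_dtmf = ET.fromstring(
    '<input><input mode="speech"><nomatch/></input>'
    '<input mode="dtmf">1 2 3 4</input></input>')
misplaced = ET.fromstring('<input><input mode="speech"><noinput/></input></input>')
print(placement_problems(garbled_speech_and_dtmf))  # []
print(placement_problems(misplaced))  # ['noinput should be under the top-level input']
```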
<h3><a id="meta" name="meta">2.8 Interpreting Meta-Dialog and
Meta-Task Utterances</a></h3>

<p>The natural language requirements state that the semantics
specification must be capable of representing a number of types
of meta-dialog and meta-task utterances. This specification is
flexible enough that meta utterances can be represented on an
application-specific basis without defining specific formats in
this specification.</p>

<p>Here are two examples of how meta-task and meta-dialog
utterances might be represented.</p>

<blockquote>System: <i>What toppings do you want on your
pizza?</i><br />
User: <i>What toppings do you have?</i></blockquote>
<pre>
<interpretation grammar="http://toppings"
    xmlns:xf="http://www.w3.org/2000/xforms">
  <input mode="speech">
    what toppings do you have?
  </input>
  <xf:x-model>
    <xf:group xf:name="question">
      <xf:string xf:name="questioned_item"/>
      <xf:string xf:name="questioned_property"/>
    </xf:group>
  </xf:x-model>
  <xf:instance>
    <xf:question>
      <xf:questioned_item>toppings</xf:questioned_item>
      <xf:questioned_property>
        availability
      </xf:questioned_property>
    </xf:question>
  </xf:instance>
</interpretation>
</pre>
<blockquote>User: <i>slow down.</i></blockquote>

<pre>
<interpretation grammar="http://generalCommandsGrammar"
    xmlns:xf="http://www.w3.org/2000/xforms">
  <xf:model>
    <group name="command">
      <string name="action"/>
      <string name="doer"/>
    </group>
  </xf:model>
  <xf:instance>
    <myApp:command>
      <action>reduce speech rate</action>
      <doer>system</doer>
    </myApp:command>
  </xf:instance>
  <input mode="speech">slow down</input>
</interpretation>
</pre>

<br class="reqs" />
<br />
<h3><a id="anaphora" name="anaphora">2.9 Anaphora and
Deixis</a></h3>

<p>This specification can be used on an application-specific
basis to represent utterances that contain unresolved anaphoric
and deictic references. Anaphoric references (pronouns and
definite noun phrases that refer to something mentioned in the
preceding linguistic context) and deictic references (which refer
to something present in the non-linguistic context) present
similar problems: there may not be enough unambiguous linguistic
context to determine their exact place in the data instance. To
represent unresolved anaphora and deixis using this specification,
the developer must define a more surface-oriented representation
that leaves the interpretation of the reference open. (This
assumes that a later component is responsible for actually
resolving the reference.)</p>

<p>Example (ignoring the issue of representing the input from the
pointing gesture):</p>
<blockquote>System: <i>What do you want to drink?</i><br />
User: <i>I want this</i> (clicks on picture of large root
beer)</blockquote>
<pre>
<result xmlns:xf="http://www.w3.org/2000/xforms">
  <interpretation>
    <xf:model>
      <group name="genericAction">
        <string name="doer"/>
        <string name="action"/>
        <string name="object"/>
      </group>
    </xf:model>
    <xf:instance>
      <doer>I</doer>
      <action>want</action>
      <object>this</object>
    </xf:instance>
    <input>
      <input mode="speech">I want this</input>
    </input>
  </interpretation>
</result>
</pre>
<h2><a id="ext" name="ext">3. Extensibility</a></h2>

<p>One of the natural language requirements states that the
specification must be extensible. The specification meets this
requirement through its flexibility, as described in the sections
on meta utterances and anaphora. The markup can easily be used in
sophisticated systems to convey application-specific information
that more basic systems would not use, for example speech acts, if
these are meaningful to the dialog manager. Standard
representations for items such as dates and times could also be
defined.</p>
<h2><a id="compliance" name="compliance">4. Compliance</a></h2>

<p>Compliance issues are deferred until a later revision of the
specification.</p>
<h2><a id="dtd" name="dtd">5. Document Type Definition</a></h2>

<p>(TBD)</p>

<p>Leading and trailing spaces in utterances are not significant.
This will be defined in the DTD by specifying
<code>xml:space="default"</code>.</p>
<h2><a id="examples" name="examples">6. Examples</a></h2>

<h3><a id="simple" name="simple">6.1 Simple Ambiguity:</a></h3>

<blockquote>System: <i>To which city will you be
traveling?</i><br />
User: <i>I want to go to Pittsburgh.</i></blockquote>
<pre>
<result xmlns:xf="http://www.w3.org/2000/xforms"
    grammar="http://flight">
  <interpretation confidence="60">
    <input mode="speech">
      I want to go to Pittsburgh
    </input>
    <xf:model>
      <group name="airline">
        <string name="to_city"/>
      </group>
    </xf:model>
    <xf:instance>
      <myApp:airline>
        <to_city>Pittsburgh</to_city>
      </myApp:airline>
    </xf:instance>
  </interpretation>
  <interpretation confidence="40">
    <input>I want to go to Stockholm</input>
    <xf:model>
      <group name="airline">
        <string name="to_city"/>
      </group>
    </xf:model>
    <xf:instance>
      <myApp:airline>
        <to_city>Stockholm</to_city>
      </myApp:airline>
    </xf:instance>
  </interpretation>
</result>
</pre>
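<p>A dialog manager receiving this result would typically rank the
alternatives by their "confidence" attribute. The Python sketch
below shows one hypothetical way to do that. It uses a trimmed
copy of the example with the "xf:model" elements omitted, and it
declares the <code>myApp</code> prefix with a made-up URI
(<code>http://example.com/myApp</code>) purely so the fragment
parses; treating a missing confidence as the lowest score is also
an assumption.</p>

```python
import xml.etree.ElementTree as ET

# Trimmed version of the example above. The xmlns:myApp URI is hypothetical;
# it is declared only so that the fragment parses as namespace-valid XML.
RESULT = """
<result xmlns:xf="http://www.w3.org/2000/xforms"
        xmlns:myApp="http://example.com/myApp"
        grammar="http://flight">
  <interpretation confidence="60">
    <input mode="speech">I want to go to Pittsburgh</input>
    <xf:instance>
      <myApp:airline><to_city>Pittsburgh</to_city></myApp:airline>
    </xf:instance>
  </interpretation>
  <interpretation confidence="40">
    <input>I want to go to Stockholm</input>
    <xf:instance>
      <myApp:airline><to_city>Stockholm</to_city></myApp:airline>
    </xf:instance>
  </interpretation>
</result>
"""


def ranked_interpretations(result):
    """Return the "interpretation" children, highest confidence first."""
    # Assumption: a missing confidence is treated as the lowest possible score.
    return sorted(result.findall("interpretation"),
                  key=lambda interp: float(interp.get("confidence", "0")),
                  reverse=True)


ranked = ranked_interpretations(ET.fromstring(RESULT))
best = ranked[0]
# to_city has no namespace prefix, so a plain descendant search finds it.
print(best.get("confidence"), best.find(".//to_city").text)  # 60 Pittsburgh
```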
<br class="issues" />
<br />

<h3><a id="mixed" name="mixed">6.2 Mixed Initiative:</a></h3>

<blockquote>System: <i>What would you like?</i><br />
User: <i>I would like 2 pizzas, one with pepperoni and cheese,
one with sausage and a bottle of coke, to go.</i></blockquote>
<p>This representation includes an order object, which in turn
contains objects named "food_item", "drink_item" and
"delivery_method". It assumes that there are no ambiguities in the
speech or natural language processing. Note that it also assumes
some intrasentential anaphora resolution, namely resolving both
occurrences of "one" to "pizza".</p>
<pre>
<result xmlns:xf="http://www.w3.org/2000/xforms"
    grammar="http://foodorder">
  <interpretation confidence="100">
    <xf:model>
      <group name="order">
        <group name="food_item" maxOccurs="*">
          <group name="pizza">
            <string name="ingredients" maxOccurs="*"/>
          </group>
          <group name="burger">
            <string name="ingredients" maxOccurs="*"/>
          </group>
        </group>
        <group name="drink_item" maxOccurs="*">
          <string name="size"/>
          <string name="type"/>
        </group>
        <string name="delivery_method"/>
      </group>
    </xf:model>
    <xf:instance>
      <myApp:order>
        <food_item confidence="100">
          <pizza>
            <ingredients confidence="100">pepperoni</ingredients>
            <ingredients confidence="100">cheese</ingredients>
          </pizza>
        </food_item>
        <food_item confidence="100">
          <pizza>
            <ingredients>sausage</ingredients>
          </pizza>
        </food_item>
        <drink_item confidence="100">
          <size>2-liter</size>
          <type>coke</type>
        </drink_item>
        <delivery_method>to go</delivery_method>
      </myApp:order>
    </xf:instance>
    <input mode="speech">I would like 2 pizzas,
      one with pepperoni and cheese, one with sausage
      and a bottle of coke, to go.
    </input>
  </interpretation>
</result>
</pre>
<h3><a id="dtmf" name="dtmf">6.3 DTMF:</a></h3>

<p>A combination of DTMF input and speech would be represented
using nested input elements. For example:</p>

<blockquote>User: <i>My pin is</i> (dtmf 1 2 3 4)</blockquote>
<pre>
<input>
  <input mode="speech" confidence="100"
         timestamp-start="2000-04-03T00:00:00"
         timestamp-end="2000-04-03T00:00:01.5">My pin is
  </input>
  <input mode="dtmf" confidence="100"
         timestamp-start="2000-04-03T00:00:01.5"
         timestamp-end="2000-04-03T00:00:02.0">1 2 3 4
  </input>
</input>
</pre>
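<p>An application that asked for a PIN would need to pull the DTMF
digits out of such a combined input. The Python sketch below is a
hypothetical consumer, not part of this specification: it groups
the nested inputs by "mode" in order of "timestamp-start" and
strips surrounding whitespace, relying on the rule in section 5
that leading and trailing spaces are not significant. Sorting the
raw timestamp strings is only chronological under the stated
assumption that all timestamps share one zero-padded format with
no time zone designator.</p>

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The DTMF example above, with two-digit hours in the timestamps.
PIN_INPUT = """
<input>
  <input mode="speech" confidence="100"
         timestamp-start="2000-04-03T00:00:00"
         timestamp-end="2000-04-03T00:00:01.5">My pin is
  </input>
  <input mode="dtmf" confidence="100"
         timestamp-start="2000-04-03T00:00:01.5"
         timestamp-end="2000-04-03T00:00:02.0">1 2 3 4
  </input>
</input>
"""


def texts_by_mode(outer):
    """Group the text of nested "input" elements by mode, in order of start time."""
    # Sorting the raw strings is chronological only because every timestamp here
    # uses the same zero-padded format with no time zone designator (an assumption).
    children = sorted(outer.findall("input"),
                      key=lambda child: child.get("timestamp-start", ""))
    grouped = defaultdict(list)
    for child in children:
        # Leading and trailing spaces in utterances are not significant (section 5).
        grouped[child.get("mode")].append((child.text or "").strip())
    return dict(grouped)


modes = texts_by_mode(ET.fromstring(PIN_INPUT))
pin = "".join(modes["dtmf"]).replace(" ", "")
print(modes["speech"], pin)  # ['My pin is'] 1234
```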
<h2><a id="study" name="study">7. Future Study</a></h2>

<h3><a id="ambig" name="ambig">7.1 Representation of
ambiguities</a></h3>

<p>In this markup, ambiguities are represented only at the top
level, using separate "interpretation" elements. Representing
"local" ambiguities, for example an ambiguity between two
ingredients (<i>peppers</i> vs. <i>pepperoni</i>), would be
useful, but it presents validation problems because of multiple
namespaces unless the XForms specification supports it. The more
compact representation using local ambiguities has not been
defined for three reasons:</p>
<ol>
<li>It is not possible to combine ambiguities with the XForms
notation and retain the ability to validate NL semantics documents
using XML Schema or DTDs.</li>

<li>When multiple filler elements are allowed, as for example with
pizza toppings, the representation of ambiguity can become very
complex and confusing.</li>

<li>Although fully spelling out ambiguities at the top level
results in a more verbose representation, current practical
systems seldom use more than two alternative interpretations, so
the added verbosity from repeating redundant information should
not be significant in practice.</li>
</ol>
<p>Local ambiguities may be supported in the future if
representation of ambiguity becomes part of the XForms
standard.</p>

<h3><a id="source" name="source">7.2 Representing the source of
an ambiguity</a></h3>
<p>If there is more than one interpretation, it may be useful to
add an attribute specifying the source of the ambiguity, for
example, "natural_language", "speech", "ocr", or "handwriting".
Speech ambiguities originate in uncertainties about the speech
recognition result, for example, <i>Austin</i> vs. <i>Boston</i>;
"handwriting" and "ocr" ambiguities are analogous. Natural language
ambiguities result from syntactic, semantic, or pragmatic
ambiguities in a single recognizer result. For example, <i>I want
fried onions and peppers</i> has two interpretations, one in which
the peppers are to be fried and one in which they are not. This
attribute would not be meaningful if there is only one
interpretation. This information could be used, for example, by a
dialog manager to construct a more helpful response (e.g. <i>I
didn't hear that</i> vs. <i>I didn't understand that</i>) or by a
scoring algorithm that treats different ambiguity sources
differently.</p>
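<p>The dialog manager use case can be sketched in a few lines of
Python. Everything here is hypothetical: this specification does
not define the attribute, so the name <code>ambiguity-source</code>,
its placement on "result", and the fallback prompt are all
assumptions chosen for illustration.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical: this specification does not define an "ambiguity-source"
# attribute; the name and the prompts below are illustrative only.
PROMPTS = {
    "speech": "I didn't hear that.",
    "ocr": "I couldn't read that.",
    "handwriting": "I couldn't read that.",
    "natural_language": "I didn't understand that.",
}


def clarification_prompt(result):
    """Pick a re-prompt based on the (hypothetical) ambiguity-source attribute."""
    if len(result.findall("interpretation")) < 2:
        return None  # the attribute is not meaningful for a single interpretation
    return PROMPTS.get(result.get("ambiguity-source"), "Could you repeat that?")


austin_or_boston = ET.fromstring(
    '<result ambiguity-source="speech">'
    '<interpretation confidence="50"><input>Austin</input></interpretation>'
    '<interpretation confidence="50"><input>Boston</input></interpretation>'
    '</result>')
print(clarification_prompt(austin_or_boston))  # I didn't hear that.
```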
<h3><a id="dialog" name="dialog">7.3 Representing information
collected over the course of a dialog</a></h3>

<p>In many cases identical information can be conveyed in one
utterance or over the course of several dialog turns. This
situation can arise both in a subdialog and in a reusable
component. For example, if the system's goal in the subdialog or
the reusable component is to collect travel information from a
user, the resulting information is the same whether the user says
<i>I want to go from Pittsburgh to Seattle on January 1, 2001</i>
in a single utterance or the same information is elicited from the
user during several dialog turns, as in</p>

<blockquote>
<p>System: <i>Where will you be departing from?</i><br />
User: <i>Pittsburgh.</i><br />
System: <i>Where will you be traveling to?</i><br />
User: <i>Seattle.</i></p>
</blockquote>

<p>etc.</p>
<p>It should be possible to use a substantially similar semantic
representation in both of these situations. The main issue is
that when information is collected over the course of a dialog, it
becomes very difficult to tie that information back to the
original inputs. Elements such as "input" and attributes such as
"timestamp-start", "timestamp-end", "grammar", and "mode", which
relate the semantic interpretation directly to the input, become
less meaningful when the information is collected in a dialog.
They also become less useful to the main dialog component, since
presumably it is the function of the subdialog or reusable
component to use this low-level information internally to guide
its own dialog and to shield the main dialog from these details.
One strategy under consideration is simply to omit these aspects
of the markup for dialog-based semantic information. This issue
may also be addressed by the reusable components group, since
return information is key to its charter.</p>
<h3><a id="compos" name="compos">7.4 Composition of multiple data
models within one utterance</a></h3>

<p>Some utterances could potentially make use of more than one
data model in their semantic representations. For example, in a
mixed initiative situation the user can combine multiple functions
in one utterance, as in:</p>

<blockquote>
<p>System: <i>I heard you say you want to go to Pittsburgh, is
that correct?</i></p>

<p>User: <i>Yes, and I'll be leaving around 8:00 a.m.</i></p>
</blockquote>
<p>It would be natural for there to be a generic data model for
the "yes" and an application-specific model for the flight
arrangements. One possibility would be for the interpreter to
create one joint data model on the fly from these models.
Alternatively, the developer could define one data model that
includes elements both for "yes_no" and for the
application-specific information. If there are two data models,
and consequently two instances, then the problem of associating
each instance with the correct data model must be addressed.</p>
<h3><a id="multi" name="multi">7.5 Representation of Multi-modal
input</a></h3>

<p>This is deferred until the specification for multi-modal
inputs is better defined, except for DTMF (see the DTMF <a
href="#dtmf">example</a> above).</p>
<h3><a id="xforms" name="xforms">7.6 Extensibility of XForms data
models</a></h3>

<p>It would be highly desirable if components in the dialog system
could extend the data model, so that grammars or reusable
components could return information in addition to a base data
model for, say, a time or date component or grammar. With the
current XForms specification, a complete new data model would have
to be provided in these cases. The XForms working group may extend
the XForms specification to include extensibility of the data
model.</p>
<p>Similarly, the current XForms data model definition does not
provide for the re-use of complex type definitions, i.e., groups,
in multiple locations. Thus, to represent travel information
consisting of both an outbound flight and an inbound flight, it is
not possible to define a single complex type "flight_details" that
is used for both outbound and inbound flight information. (See the
section on "Shared Datatype Libraries" in the <a
href="http://www.w3.org/TR/2000/WD-xforms-datamodel-20000406/#shared">XForms
Data Model</a> document for additional discussion.)</p>
<h3><a id="recurse" name="recurse">7.7 Representation of
recursive structures</a></h3>

<p>Some systems may find it useful to represent generic syntactic
parse trees in natural language output. Generic parse trees cannot
be represented by current XForms data models because these models
do not support recursion. However, it is not clear how frequently
this capability would be required.</p>
<h3><a id="unanalyzed" name="unanalyzed">7.8 Representing
unanalyzed information: "unanalyzed" Element</a></h3>

<p>An "unanalyzed" element could be used to represent a part of
the input that was left unanalyzed in the current interpretation.
A dialog manager could use this element to decide whether enough
of the input had been analyzed for the dialog to proceed, or
whether it should ask the user for clarification. The dialog
manager could also use the unanalyzed material to help it decide
which of several alternative interpretations is correct. Each
"unanalyzed" element would contain "input" elements holding the
portions of the full utterance that were unanalyzed.</p>
<p>"unanalyzed" has not been included in the current version of
the specification for several reasons:</p>

<ol>
<li>
<p>It is not clear that it has a platform-independent
interpretation.</p>
</li>

<li>
<p>It is not clear that current applications would make use of
it.</p>
</li>

<li>
<p>Although there is a requirement for representing "unanalyzed",
this can be accommodated in the current specification if the
developer incorporates "unanalyzed" into the data model in an
application-specific manner. In addition, natural language
interpreters can take unanalyzed information into account
internally when they compute confidences, so that this
information is available indirectly to dialog managers through
the confidence attributes.</p>
</li>
</ol>
<p>The most important consideration appears to be whether the
ability to represent unanalyzed material is in fact of interest to
current or near-future applications.</p>

<p>Note that "unanalyzed" would be mainly useful for systems with
robust natural language interpreters that are capable of ignoring
portions of the speech recognizer result that don't match the
natural language grammar. In tightly coupled ASR/NL systems, which
require all of the input to match a speech recognizer grammar, the
notion of "unanalyzed" isn't useful, since by the nature of the
system all of the input must be analyzed. Similarly, keyword
spotting systems with garbage models will not be able to make use
of this element, because the speech recognition process discards
any unrecognizable speech before natural language interpretation
begins.</p>
<p>Example:</p>

<blockquote>System: <i>Where do you want to go?</i><br />
User: <i>I'd like to fly from Boston and then continue on to
Philadelphia.</i></blockquote>

<p>(assuming that <i>"and then continue on"</i> is not included
in the speech grammar.)</p>
<pre>
<unanalyzed>
  <input>and then continue on</input>
</unanalyzed>
</pre>
<p>If there is duplicated unanalyzed material, as in <i>Please get
my email please,</i> every unanalyzed item should be represented
individually, so <i>please</i> should be duplicated if both
occurrences are unanalyzed.</p>
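<p>Because "unanalyzed" is not part of the current specification,
the following Python sketch is purely hypothetical. It assumes an
interpreter that reports which token positions of a
whitespace-tokenized recognizer result it used, and it emits one
"input" child per contiguous run of unused tokens. This keeps
<i>and then continue on</i> together as one item while
representing each unanalyzed <i>please</i> separately.</p>

```python
import xml.etree.ElementTree as ET


def unanalyzed_element(tokens, analyzed_positions):
    """Build an "unanalyzed" element for the tokens the interpreter did not use.

    Each run of adjacent unanalyzed tokens becomes its own "input" child, so a
    word that occurs twice is represented once per unanalyzed occurrence.
    """
    element = ET.Element("unanalyzed")
    run = []
    for position, token in enumerate(tokens + [None]):  # None flushes the last run
        if token is not None and position not in analyzed_positions:
            run.append(token)
        elif run:
            ET.SubElement(element, "input").text = " ".join(run)
            run = []
    return element


tokens = "Please get my email please".split()
# Hypothetical interpreter result: only "get my email" (positions 1-3) was analyzed.
print(ET.tostring(unanalyzed_element(tokens, {1, 2, 3}), encoding="unicode"))
# <unanalyzed><input>Please</input><input>please</input></unanalyzed>
```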
<h2><a id="acks" name="acks">8. Acknowledgements</a></h2>

<p>This document was written with the participation of the
members of the W3C Voice Browser Working Group <em>(listed in
alphabetical order)</em>:</p>
<blockquote>Daniel Austin, Ask Jeeves, Inc.<br />
Dan Burnett, Nuance<br />
Andrew Hunt, SpeechWorks<br />
Robert Keiller, VoxSurf International<br />
Andreas Kellner, Philips<br />
Bruce Lucas, IBM<br />
Dave Raggett, W3C/Phone.com<br />
</blockquote>
</body>
</html>