<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
|
|
<html>
|
|
<head>
|
|
<meta http-equiv="Content-Type" content=
|
|
"text/html; charset=iso-8859-1">
|
|
<title>Dialog Requirements for Voice Markup Languages</title>
|
|
<style type="text/css">
|
|
body {
|
|
font-family: sans-serif;
|
|
margin-left: 10%;
|
|
margin-right: 5%;
|
|
color: black;
|
|
background-color: white;
|
|
background-attachment: fixed;
|
|
background-image: url(http://www.w3.org/StyleSheets/TR/WD.gif);
|
|
background-position: top left;
|
|
background-repeat: no-repeat;
|
|
font-family: Tahoma, Verdana, "Myriad Web", Syntax, sans-serif;
|
|
}
|
|
.unfinished { font-style: normal; background-color: #FFFF33}
|
|
.dtd-code { font-family: monospace;
|
|
background-color: #dfdfdf; white-space: pre;
|
|
border: #000000; border-style: solid;
|
|
border-top-width: 1px; border-right-width: 1px;
|
|
border-bottom-width: 1px; border-left-width: 1px; }
|
|
h2,h3 {margin-top: 1em;}
|
|
p.copyright {font-size: smaller}
|
|
code {
|
|
color: green;
|
|
font-family: monospace;
|
|
font-weight: bold;
|
|
}
|
|
.example {
|
|
border: solid green;
|
|
border-width: 2px;
|
|
color: green;
|
|
font-weight: bold;
|
|
margin-right: 5%;
|
|
margin-left: 0;
|
|
}
|
|
.bad {
|
|
border: solid red;
|
|
border-width: 2px;
|
|
margin-left: 0;
|
|
margin-right: 5%;
|
|
color: rgb(192, 101, 101);
|
|
}
|
|
div.navbar { text-align: center; }
|
|
div.contents {
|
|
background-color: rgb(204,204,255);
|
|
padding: 0.5em;
|
|
border: none;
|
|
margin-right: 5%;
|
|
}
|
|
table {
|
|
margin-left: -4%;
|
|
margin-right: 4%;
|
|
font-family: sans-serif;
|
|
background: white;
|
|
border-width: 2px;
|
|
border-color: white;
|
|
}
|
|
th { font-family: sans-serif; background: rgb(204, 204, 153) }
|
|
td { font-family: sans-serif; background: rgb(255, 255, 153) }
|
|
.tocline { list-style: none; }
|
|
</style>
|
|
<link rel="stylesheet" type="text/css" href=
|
|
"http://www.w3.org/StyleSheets/TR/W3C-WD.css">
|
|
</head>
|
|
<body>
|
|
<div class="head">
|
|
<p><a href="http://www.w3.org/"><img class="head" src=
|
|
"http://www.w3.org/Icons/WWW/w3c_home.gif" alt="W3C"></a></p>
|
|
|
|
<h1 class="notoc">Dialog Requirements<br>
|
|
for Voice Markup Languages</h1>
|
|
|
|
<h3 class="notoc">W3C Working Draft <i>23 December 1999</i></h3>
|
|
|
|
<dl>
|
|
<dt>This version:</dt>
|
|
|
|
<dd><a href=
|
|
"http://www.w3.org/TR/1999/WD-voice-dialog-reqs-19991223">
|
|
http://www.w3.org/TR/1999/WD-voice-dialog-reqs-19991223</a></dd>
|
|
|
|
<dt>Latest version:</dt>
|
|
|
|
<dd><a href=
|
|
"http://www.w3.org/TR/voice-dialog-reqs">
|
|
http://www.w3.org/TR/voice-dialog-reqs</a></dd>
|
|
|
|
<dt>Previous version:</dt>
|
|
|
|
<dd><a href=
|
|
"http://www.w3.org/Voice/Group/1999/dialog-reqs-19991130.html">
|
|
http://www.w3.org/Voice/Group/1999/dialog-reqs-19991130</a></dd>
|
|
|
|
<dt>Editor:</dt>
|
|
|
|
<dd>Scott McGlashan</dd>
|
|
</dl>
|
|
|
|
<p class="copyright"><a href=
|
|
"http://www.w3.org/Consortium/Legal/ipr-notice#Copyright">
|
|
Copyright</a> © 1999 <a href="http://www.w3.org/">
|
|
W3C</a><sup>®</sup> (<a href=
|
|
"http://www.lcs.mit.edu/">MIT</a>, <a href=
|
|
"http://www.inria.fr/">INRIA</a>, <a href=
|
|
"http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. <abbr
|
|
title="World Wide Web Consortium">W3C</abbr> <a href=
|
|
"http://www.w3.org/Consortium/Legal/ipr-notice#Legal_Disclaimer">
|
|
liability</a>, <a href=
|
|
"http://www.w3.org/Consortium/Legal/ipr-notice#W3C_Trademarks">
|
|
trademark</a>, <a href=
|
|
"http://www.w3.org/Consortium/Legal/copyright-documents">document
|
|
use</a> and <a href=
|
|
"http://www.w3.org/Consortium/Legal/copyright-software">software
|
|
licensing</a> rules apply.</p>
|
|
|
|
<hr>
|
|
</div>
|
|
|
|
<h2 class="notoc">Abstract</h2>
|
|
|
|
<p>The W3C Voice Browser working group aims to develop
|
|
specifications to enable access to the Web using spoken
|
|
interaction. This document is part of a set of requirements
|
|
studies for voice browsers, and provides details of the
|
|
requirements for marking up spoken dialogs.</p>
|
|
|
|
<h2>Status of this document</h2>
|
|
|
|
<p>This document describes the requirements for marking up dialogs
|
|
for spoken interaction, as a precursor to starting work on
|
|
specifications. Related requirement drafts are linked from the <a
|
|
href="/TR/1999/WD-voice-intro-19991223">introduction</a>. The
|
|
requirements are being released as working drafts but are not
|
|
intended to become proposed recommendations.</p>
|
|
|
|
<p>This specification is a Working Draft of the Voice Browser working
|
|
group for review by W3C members and other interested parties. This is
|
|
the first public version of this document. It is a draft document and
|
|
may be updated, replaced, or obsoleted by other documents at any
|
|
time. It is inappropriate to use W3C Working Drafts as reference
|
|
material or to cite them as other than "work in progress".</p>
|
|
|
|
<p>Publication as a Working Draft does not imply endorsement by
|
|
the W3C membership, nor of members of the Voice Browser working
|
|
groups. This is still a draft document and may be updated,
|
|
replaced or obsoleted by other documents at any time. It is
|
|
inappropriate to cite W3C Working Drafts as other than "work in
|
|
progress."</p>
|
|
|
|
<p>This document has been produced as part of the <a href=
|
|
"http://www.w3.org/Voice/">W3C Voice Browser Activity</a>,
|
|
following the procedures set out for the <a href=
|
|
"http://www.w3.org/Consortium/Process/">W3C Process</a>. The
|
|
authors of this document are members of the <a href=
|
|
"http://www.w3.org/Voice/Group">Voice Browser Working Group</a>.
|
|
This document is for public review. Comments should be sent to
|
|
the public mailing list &lt;<a href=
"mailto:www-voice@w3.org">www-voice@w3.org</a>&gt; (<a href=
|
|
"http://www.w3.org/Archives/Public/www-voice/">archive</a>) by
|
|
14th January 2000.</p>
|
|
|
|
<p>A list of current W3C Recommendations and other technical
|
|
documents can be found at <a href="http://www.w3.org/TR">
|
|
http://www.w3.org/TR</a>.</p>
|
|
|
|
<h2> 0. Introduction</h2>
|
|
|
|
<p>The main goal of this subgroup is to establish a prioritized
|
|
list of requirements for spoken dialog interaction which any
|
|
proposed markup language (or extension thereof) should
|
|
address.</p>
|
|
|
|
<p>The process will consist of the following steps:</p>
|
|
|
|
<ol>
|
|
<li>Collect requirements on spoken dialog.</li>
|
|
|
|
<li>Prioritize these requirements.</li>
|
|
|
|
<li>Distribute requirements to, and take feedback from, relevant
|
|
groups working on specific markup languages supporting speech
|
|
dialog.</li>
|
|
|
|
<li>Propose further work on how the spoken dialogs can be
|
|
integrated and synchronized with other input/output media to
|
|
provide co-ordinated multi-modal interaction.</li>
|
|
</ol>
|
|
|
|
<h3> 0.1 Scope</h3>
|
|
|
|
<p>The core activity focuses on defining three types of
|
|
requirements on the voice markup language: modality, functional,
|
|
and format. Modality requirements concern the types of modalities
|
|
(media in combination with an input/output mechanism) supported
|
|
by the markup language for user input and system output.
|
|
Functional requirements concern the behaviour (or operational
|
|
semantics) which results from interpreting a voice markup
|
|
language. Format requirements constrain the format (or syntax) of
|
|
the voice markup language itself.</p>
|
|
|
|
<p> The environment and capabilities of the voice browser
|
|
interpreting the markup language will affect these requirements.
|
|
There may be differences in the modality and functional
|
|
requirements for desktop versus telephony-based environments (and
|
|
in the latter case, between fixed, mobile and internet telephony
|
|
environments). The capability of the voice browser device will
|
|
also have an important impact on requirements; for example,
|
|
telephones without graphical displays versus those with graphical
|
|
displays. Requirements affected by the environment or
|
|
capabilities of the voice browser device will be explicitly
|
|
marked as such.</p>
|
|
|
|
<p>The Subgroup will not directly address how these requirements
|
|
are implemented in specific SGML or XML languages. It is agnostic
|
|
between the (X)HTML approach (where, for example, <a href=
|
|
"http://www.conversa.com/web/web.asp">Conversational
|
|
Computing</a>, <a href="http://www.pipebeach.com">PipeBeach</a>,
|
|
<a href="http://www.prodworks.com">Productivity Works</a>, and <a
|
|
href="http://www.speechtml.com/">Vocalis</a> interpret HTML as
|
|
voice markup) and XML languages specifically designed for spoken
|
|
dialog (<a href="http://www.voxml.com/voxml.html">VoxML</a>, <a
|
|
href="http://www.alphaWorks.ibm.com/tech">SpeechML</a>, <a href=
|
|
"http://www.w3.org/Voice/TalkML/">TalkML</a>, <a href=
|
|
"http://www.vxml.org">VoiceXML</a>, etc). However, for
|
|
illustrative purposes, examples and explanations of requirements
|
|
may be given in specific markup languages.</p>
|
|
|
|
<p>This Subgroup does not arbitrate on the extent to which
|
|
dialogs have graphical web browsers as their reference model:
|
|
i.e. the 'dialog' could be provided by any standard spoken dialog
|
|
system, or a (voice) browser with spoken dialog capabilities.
|
|
However, in both cases the voice markup needs to support
|
|
meta-commands (see Section 2.6).</p>
|
|
|
|
<p>Finally, features like call processing and billing are not
|
|
regarded as dialog requirements but as application design issues.
|
|
While they may have an impact on dialog requirements, they are
|
|
not part of the dialog markup or behaviour itself.</p>
|
|
|
|
<h3> 0.2 Interaction with Other Groups</h3>
|
|
|
|
<p>The activities of the Dialog Requirements Subgroup will be
|
|
coordinated with the activities of the Grammar Representation
|
|
Subgroup, the Synthesis Markup Subgroup, the Natural Language
|
|
Subgroup, the Multimodal Interaction Subgroup and the Reusable
|
|
Dialog Components Subgroup.</p>
|
|
|
|
<h3> 0.3 Terminology</h3>
|
|
|
|
<p>Although defining a dialog is highly problematic, some basic
|
|
definition must be provided to establish a common basis of
|
|
understanding and avoid confusion. The following terminology is
|
|
based upon an event-driven model of dialog interaction.<br>
|
|
<br>
|
|
</p>
|
|
|
|
<table border="1" cellpadding="6" width="85%" summary="first
|
|
column gives term, second gives description">
|
|
<tr>
|
|
<th>Voice Markup Language</th>
|
|
<td>a language in which voice dialog behaviour is specified. The
|
|
language may include reference to style and scripting elements
|
|
which can also determine dialog behaviour.</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<th>Voice Browser</th>
|
|
<td>a software device which interprets a voice markup language
|
|
and generates a dialog with voice output and/or input, and
|
|
possibly other modalities.</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<th>Dialog</th>
|
|
<td>a model of interactive behaviour underlying the
|
|
interpretation of the markup language. The model consists of
|
|
states, variables, events, event handlers, inputs and
|
|
outputs.</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<th>State</th>
|
|
<td>the basic interactional unit defined in the markup language;
|
|
for example, an &lt;input&gt; element in HTML. A state can
|
|
specify variables, event handlers, outputs and inputs. A state
|
|
may describe output content to be presented to the user, input
which the user can enter, and event handlers describing, for example,
which variables to bind and which state to transition to when an
event occurs.</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<th>Events</th>
|
|
<td>generated when a state is executed by the voice browser; for
|
|
example, when outputs or inputs in a state are rendered or
|
|
interpreted. Events are typed and may include information; for
|
|
example, an input event generated when an utterance is recognized
|
|
may include the string recognized, an interpretation, confidence
|
|
score, and so on.</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<th>Event Handlers</th>
|
|
<td>are specified in the voice markup language and describe how
|
|
events generated by the voice browser are to be handled.
|
|
Interpretation of events may bind variables, or map the current
|
|
state into another state (possibly itself).</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<th>Output</th>
|
|
<td>content specified in an element of the markup language for
|
|
presentation to the user. The content is rendered by the voice
|
|
browser; for example, audio files or text rendered by a TTS.
|
|
Output can also contain parameters for the output device; for
|
|
example, volume of audio file playback, language for TTS, etc.
|
|
Events are generated when, for example, the audio file has been
|
|
played.</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<th>Input</th>
|
|
<td>content (and its interpretation) specified in an element of
|
|
the markup language which can be given as input by a user; for
|
|
example, a grammar for DTMF and speech input. Events are
|
|
generated by the voice browser when, for example, the user has
|
|
spoken an utterance and variables may be bound to information
|
|
contained in the event. Input can also specify parameters for the
|
|
input device; for example, timeout parameters, etc.</td>
|
|
</tr>
|
|
</table>
|
|
|
|
<p>The dialog requirements for the voice markup language are
|
|
annotated with the following priorities. If a feature is deferred
|
|
from the initial specification to a future release, consideration
|
|
may be given to leaving open a path for future incorporation of
|
|
the feature.<br>
|
|
<br>
|
|
</p>
|
|
|
|
<table border="1" cellpadding="6" width="85%" summary="first
|
|
column gives priority name, second its description">
|
|
<tr>
|
|
<th>must have</th>
|
|
<td>The first official specification must define the
|
|
feature.</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<th>should have</th>
|
|
<td>The first official specification should define the feature if
|
|
feasible but may defer it until a future release.</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<th>nice to have</th>
|
|
<td>The first official specification may define the feature if
|
|
time permits; however, its priority is low.</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<th>future revision</th>
|
|
<td>It is not intended that the first official specification
|
|
include the feature.</td>
|
|
</tr>
|
|
</table>
|
|
|
|
<h2> 1. Modality Requirements</h2>
|
|
|
|
<p>These requirements will be co-ordinated with the Multimodal
|
|
Interaction Subgroup.</p>
|
|
|
|
<h3> 1.1 Audio Modality Input and Output (must have)</h3>
|
|
|
|
<p>The markup language can specify which spoken user input is
|
|
interpreted by the voice browser, as well as the content rendered
|
|
as spoken output by the voice browser.</p>
|
|
|
|
<h3> 1.2 Sequential multi-modal Input (must have)</h3>
|
|
|
|
<p> The markup language specifies that user input from multiple
|
|
modalities is to be interpreted by the voice browser. There is no
|
|
requirement that the input modalities are simultaneously active.
|
|
For example, a voice browser interpreting the markup language in
|
|
a telephony environment could accept DTMF input in one dialog
|
|
state, and spoken input in another.</p>
|
|
|
|
<h3> 1.3 Unco-ordinated, Simultaneous, Multi-modal Input (should
|
|
have)</h3>
|
|
|
|
<p> The markup language specifies that user input from different
|
|
modalities is to be interpreted at the same time. There is no
|
|
requirement that interpretation of the input modalities is
co-ordinated. For example, a voice browser in a desktop
environment could accept keyboard input or spoken input in the same
|
|
dialog state.</p>
|
|
|
|
<h3> 1.4 Co-ordinated, Simultaneous Multi-modal Input (nice to
|
|
have)</h3>
|
|
|
|
<p> The markup language specifies that user input from multiple
|
|
modalities is interpreted at the same time and that
|
|
interpretation of the inputs is co-ordinated by the voice
browser. For example, in a telephony environment, the user can
type <em>200</em> on the keypad and say <em>transfer to checking
|
|
account</em> and the interpretations are co-ordinated so that
|
|
they are understood as <em> transfer 200 to checking
|
|
account</em>.</p>
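<p>The co-ordination described above can be sketched as a merge of
partial interpretations. This is only an illustration: the
<code>InputEvent</code> fields, the three-second window and the rule
that earlier input wins on conflict are assumptions, not part of the
requirement.</p>

<pre class="example">
```python
from dataclasses import dataclass


@dataclass
class InputEvent:
    """One recognized input; the field names are illustrative assumptions."""
    modality: str      # "dtmf" or "speech"
    slots: dict        # partial interpretation of the input
    timestamp: float   # seconds since the dialog state was entered


def coordinate(events, window=3.0):
    """Merge the slots of events arriving within `window` seconds of the first."""
    ordered = sorted(events, key=lambda e: e.timestamp)
    merged = {}
    if not ordered:
        return merged
    start = ordered[0].timestamp
    for event in ordered:
        if event.timestamp - start > window:
            break  # too late to belong to the same co-ordinated input
        for name, value in event.slots.items():
            merged.setdefault(name, value)  # earlier input wins on conflict
    return merged


keypad = InputEvent("dtmf", {"amount": 200}, timestamp=0.4)
speech = InputEvent("speech", {"action": "transfer", "target": "checking"},
                    timestamp=1.1)
result = coordinate([keypad, speech])
```
</pre>

<p>Here <code>result</code> combines the keypad amount with the spoken
action and target, which corresponds to the single interpretation
<em>transfer 200 to checking account</em>.</p>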
|
|
|
|
<h3> 1.5 Sequential multi-modal Output (must have)</h3>
|
|
|
|
<p> The markup language specifies that content is rendered in
|
|
multiple modalities by the voice browser. There is no requirement
that the output modalities are rendered simultaneously. For example, a
|
|
voice browser could output speech in one dialog state, and
|
|
graphics in another.</p>
|
|
|
|
<h3> 1.6 Unco-ordinated, Simultaneous, Multi-modal Output (nice
|
|
to have)</h3>
|
|
|
|
<p> The markup language specifies that content is rendered in
|
|
multiple modalities at the same time. There is no requirement that the
rendering of output modalities is co-ordinated. For example, a
|
|
voice browser in a desktop environment could display graphics and
|
|
provide audio output at the same time.</p>
|
|
|
|
<h3> 1.7 Co-ordinated, Simultaneous Multi-modal Output (nice to
|
|
have)</h3>
|
|
|
|
<p> The markup language specifies that content is to be
|
|
simultaneously rendered in multiple modalities and that output
|
|
rendering is co-ordinated. For example, graphical output on a
|
|
cellular telephone display is co-ordinated with spoken
|
|
output.</p>
|
|
|
|
<h2> 2. Functional Requirements</h2>
|
|
|
|
<p> These requirements are intended to ensure that the markup
|
|
language is capable of specifying co-operative dialog behaviour
|
|
characteristic of state-of-the-art spoken dialog systems. In
|
|
general, the voice browser should compensate for its own
|
|
limitations in knowledge and performance compared with equivalent
|
|
human agents; for example, compensate for limitations in speech
|
|
recognition capability by confirming spoken user input when
|
|
necessary.</p>
|
|
|
|
<h3> 2.1 Mixed Initiative: Form Level (must have)</h3>
|
|
|
|
<p>Mixed initiative refers to dialog where one participant takes
the initiative by, for example, asking a question and expects the
other participant to respond to this initiative by, for example,
answering the question. The other participant, however, responds
instead with an initiative by asking another question. Typically,
the first participant then responds to this initiative, before the
|
|
second participant responds to the original initiative. This
|
|
behaviour is illustrated below:<br>
|
|
<br>
|
|
<em>S-A1: When do you want to fly to Paris?<br>
|
|
U-B1: What did you say?<br>
|
|
S-B2: I said when do you want to fly to Paris?<br>
|
|
U-A2: Tuesday.</em></p>
|
|
|
|
<p> where A1 is responded to in A2 after a nested interaction, or
|
|
sub-dialog in B1 and B2. Note that the B2 response itself could
|
|
have been another initiative leading to further nesting of the
|
|
interaction.</p>
|
|
|
|
<p> The form-level mixed initiative requirement is that the
markup language can specify to the voice browser that it can take
the initiative when the user expects a response, and can also allow
the user to take the initiative when the voice browser expects a
response, where the content of these initiatives is relevant to the
task at hand, contains navigation instructions or concerns general
meta-communication issues. This mixed initiative requirement is
|
|
particularly important when processing form input (hence the
|
|
name) and is further elaborated in requirements 2.1.1, 2.1.2,
|
|
2.1.3 and 2.1.4 below.</p>
|
|
|
|
<h4> 2.1.1 Clarification Subdialog (must have)</h4>
|
|
|
|
<p> The markup language can specify that a clarification
|
|
sub-dialog should be performed when the user provides incomplete,
|
|
form-related information. For example, in a flight enquiry
|
|
service, the departure city and date may be required but the user
|
|
does not always provide all the information at once:<br>
|
|
<br>
|
|
<em>S1: How can I help you?<br>
|
|
U1: I want to fly to Paris.<br>
|
|
S2: When?<br>
|
|
U2: Monday</em></p>
|
|
|
|
<p> U1 is incomplete (or 'underinformative') with respect to the
|
|
service (or form) and the system then initiates a sub-dialog in
|
|
S2 to collect the required information. If additional parameters
|
|
are required, further sub-dialogs may be initiated.</p>
|
|
|
|
<h4> 2.1.2 Confirmation Subdialog (must have)</h4>
|
|
|
|
<p> The markup language can specify that a confirmation
|
|
sub-dialog is to be performed when the confidence associated with
|
|
the interpretation of the user input is too low.<br>
|
|
<br>
|
|
<em>U1: I want to fly to Paris.<br>
|
|
S1: Did you say 'I want to fly to Paris'?<br>
|
|
U2: Yes.<br>
|
|
S2: When?<br>
|
|
U3: ...</em></p>
|
|
|
|
<p> Note that confirmation sub-dialogs take precedence over
|
|
clarification sub-dialogs.</p>
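<p>The precedence of confirmation over clarification can be sketched
as follows. The function name, the slot names and the confidence
threshold of 0.5 are hypothetical; the requirement does not prescribe
any of them.</p>

<pre class="example">
```python
def next_subdialog(form, required, confidences, threshold=0.5):
    """Return ("confirm", slot), ("clarify", slot) or ("done", None)."""
    # Confirmation takes precedence: check filled slots with low confidence.
    for slot in required:
        if slot in form and threshold > confidences.get(slot, 1.0):
            return ("confirm", slot)
    # Only then ask for the first required slot that is still missing.
    for slot in required:
        if slot not in form:
            return ("clarify", slot)
    return ("done", None)


required = ["destination", "date"]
step1 = next_subdialog({"destination": "Paris"}, required,
                       {"destination": 0.3})
step2 = next_subdialog({"destination": "Paris"}, required,
                       {"destination": 0.9})
step3 = next_subdialog({"destination": "Paris", "date": "Monday"}, required,
                       {"destination": 0.9, "date": 0.8})
```
</pre>

<p>With a poorly recognized destination the sketch confirms it first
(<code>step1</code>); once the destination is trusted it asks for the
missing date (<code>step2</code>).</p>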
|
|
|
|
<h4> 2.1.3 Over-informative Input: corrective (must have)</h4>
|
|
|
|
<p> The markup language can specify that unsolicited user input
|
|
in a sub-dialog which corrects earlier input is to be interpreted
|
|
appropriately. For example, in a confirmation sub-dialog users
|
|
may provide corrective information relevant to the form:<br>
|
|
<br>
|
|
<em>S1: Did you say you wanted to travel from Paris?<br>
|
|
U1: No, from Perros.</em> (modification) <em><br>
|
|
U1': Yes, from Paris</em> (repetition)</p>
|
|
|
|
<h4> 2.1.4 Over-informative Input: additional (nice to have)</h4>
|
|
|
|
<p> The markup language can specify that unsolicited user input
|
|
in a sub-dialog which is not corrective but additional, relevant
|
|
information for the current form is to be interpreted
|
|
appropriately. For example, in a confirmation sub-dialog users
|
|
may provide additional information relevant to the form:<br>
|
|
<em>S1: Did you say you wanted to travel from Paris?<br>
|
|
U1: Yes, I want to fly to Paris on Monday around 11.30</em></p>
|
|
|
|
<h3> 2.2 Mixed Initiative: Task Level (must have)</h3>
|
|
|
|
<p>The markup language needs to address mixed initiative in
|
|
dialogs which involve more than one task (or topic). For example,
|
|
a portal service may allow the user to interact with a number of
|
|
specific services such as car hire, hotel reservation, flight
|
|
enquiries, etc, which may be located on different web sites
|
|
or servers. This requirement is further elaborated in
|
|
requirements 2.2.1, 2.2.2, 2.2.3, 2.2.4 and 2.2.5 below.</p>
|
|
|
|
<h4> 2.2.1 Explicit Task Switching (must have)</h4>
|
|
|
|
<p>The markup language can specify how users can explicitly
|
|
switch from one task to another. For example, by means of a set
|
|
of global commands which are active in all tasks and which take
|
|
the user to a specific task; e.g. <em>Take me to car hire</em>,
|
|
<em>Go to hotel reservations</em>.</p>
|
|
|
|
<h4> 2.2.2 Implicit Task Switching (should have)</h4>
|
|
|
|
<p>The markup language can specify how users can implicitly
|
|
switch from one task to another. For example, by means of simply
|
|
uttering a phrase relevant to another task; e.g. <em>I want to
reserve a McLaren F1 in Monaco next Wednesday</em>.</p>
|
|
|
|
<h4> 2.2.3 Manual Return from Task Switch (must have)</h4>
|
|
|
|
<p>The markup language can specify how users can explicitly
|
|
return to a previous task at any time. For example, by means of
|
|
global task navigation commands such as <em>previous
|
|
task</em>.</p>
|
|
|
|
<h4> 2.2.4 Automatic Return from Task Switch (should have)</h4>
|
|
|
|
<p>The markup language can specify that users can automatically
|
|
return to the previous task upon completion or explicit
|
|
cancellation of the current task.</p>
|
|
|
|
<h4> 2.2.5 Suspended Tasks (should have)</h4>
|
|
|
|
<p>The markup language can specify that when task switching
|
|
occurs the previous task is suspended rather than cancelled. Thus
|
|
when the user returns to the previous task, the interaction is
|
|
resumed at the point it was suspended.</p>
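<p>One way to picture suspension and return is a stack of tasks. The
<code>TaskManager</code> class and its method names below are
illustrative assumptions, not a proposed interface.</p>

<pre class="example">
```python
class TaskManager:
    """Keeps suspended tasks on a stack so they can be resumed later."""

    def __init__(self, first_task):
        self.current = {"name": first_task, "state": "start"}
        self.suspended = []

    def move_to(self, state):
        self.current["state"] = state

    def switch_to(self, task):
        # Suspend rather than cancel: the current dialog state is preserved.
        self.suspended.append(self.current)
        self.current = {"name": task, "state": "start"}

    def return_to_previous(self):
        # Serves both a manual "previous task" command and an automatic
        # return when the current task completes or is cancelled.
        if self.suspended:
            self.current = self.suspended.pop()
        return self.current


manager = TaskManager("flight enquiry")
manager.move_to("ask_date")
manager.switch_to("car hire")
resumed = manager.return_to_previous()
```
</pre>

<p>After the return, <code>resumed</code> is the flight enquiry task in
the <code>ask_date</code> state, i.e. the point at which it was
suspended.</p>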
|
|
|
|
<h3> 2.3 Help Behaviour (should have)</h3>
|
|
|
|
<p>The markup language can specify help information when
|
|
requested by the user. Help information should be available in
|
|
all dialog states.<br>
|
|
<em>S1: How can I help you?<br>
|
|
U1: What can you do?<br>
|
|
S2: I can give you flight information about flights between major
|
|
cities world-wide just like a travel agent. How can I help
|
|
you?<br>
|
|
U2: I want a flight to Paris ...</em><br>
|
|
</p>
|
|
|
|
<p> Help information can be tapered so that it can be elaborated
|
|
upon on subsequent user requests.</p>
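<p>Tapered help can be sketched as a counter over a list of
increasingly detailed prompts. The prompt texts and the
<code>HelpCounter</code> class are invented for illustration.</p>

<pre class="example">
```python
HELP_PROMPTS = [
    "I can give you flight information.",
    "I can give you information about flights between major cities.",
    "Say a departure city, a destination and a date, for example "
    "'I want to fly from London to Paris on Monday'.",
]


class HelpCounter:
    """Returns a more detailed help prompt on each successive request."""

    def __init__(self, prompts):
        self.prompts = prompts
        self.requests = 0

    def next_prompt(self):
        # Clamp to the last prompt once the most detailed level is reached.
        index = min(self.requests, len(self.prompts) - 1)
        self.requests += 1
        return self.prompts[index]


helper = HelpCounter(HELP_PROMPTS)
replies = [helper.next_prompt() for _ in range(4)]
```
</pre>

<p>The fourth request repeats the most detailed prompt rather than
failing, which is one possible design choice among several.</p>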
|
|
|
|
<h3> 2.4 Error Correction Behaviour (must have)</h3>
|
|
|
|
<p> The markup language can specify how error events generated by
|
|
the voice browser are to be handled. For example, by initiating a
|
|
sub-dialog to describe and correct the error:<br>
|
|
<em>S1: How can I help you?<br>
|
|
U1: &lt;audio but no interpretation&gt;<br>
|
|
S2: Sorry, I didn't understand that. Where do you want to travel
|
|
to?<br>
|
|
U2: Paris</em></p>
|
|
|
|
<p> The markup language can specify how specific types of errors
|
|
encountered in spoken dialog, e.g. no audio, too loud/soft, no
interpretation, internal error, etc, are to be handled
|
|
as well as providing a general 'catch all' method.</p>
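<p>A minimal sketch of type-specific error handling with a catch-all
fallback follows. The event type names such as
<code>error.noaudio</code> are hypothetical; this document does not
define an event naming scheme.</p>

<pre class="example">
```python
ERROR_PROMPTS = {
    "error.noaudio": "Sorry, I didn't hear anything.",
    "error.tooloud": "Sorry, that was too loud. Please speak more softly.",
    "error.nointerpretation": "Sorry, I didn't understand that.",
}
CATCH_ALL = "Sorry, something went wrong."


def handle_error(event_type, reprompt):
    """Pick the prompt for a specific error type, else use the catch-all."""
    apology = ERROR_PROMPTS.get(event_type, CATCH_ALL)
    return apology + " " + reprompt


reply = handle_error("error.nointerpretation",
                     "Where do you want to travel to?")
fallback = handle_error("error.internal",
                        "Where do you want to travel to?")
```
</pre>

<p>Here <code>reply</code> reproduces the system turn S2 in the example
above, while <code>fallback</code> shows an error type without a
specific prompt falling through to the general handler.</p>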
|
|
|
|
<h3> 2.5 Timeout Behaviour (must have)</h3>
|
|
|
|
<p> The markup language can specify what to do when the voice
|
|
browser times out waiting for input; for example, a timeout event
|
|
can be handled by repeating the current dialog state:<br>
|
|
<em>S1: Did you say Monday?<br>
U1: &lt;timeout&gt;<br>
|
|
S2: Did you say Monday?</em><br>
|
|
</p>
|
|
|
|
<p> Note that the strategy may be dependent upon the environment;
|
|
in a desktop environment, for example, repetition may be
|
|
irritating.</p>
|
|
|
|
<h3> 2.6 Meta-Commands (should have)</h3>
|
|
|
|
<p> The markup language specifies a set of meta-command functions
|
|
which are available in all dialog states; for example, repeat,
|
|
cancel, quit, operator, etc.</p>
|
|
|
|
<p> The precise set of meta-commands will be co-ordinated with
|
|
the Telephony Speech Standards Committee.</p>
|
|
|
|
<p> The markup language should specify how the scope of
|
|
meta-commands like 'cancel' is resolved.</p>
|
|
|
|
<h3> 2.7 Barge-in Behaviour (should have)</h3>
|
|
|
|
<p> The markup language specifies when the user is able to
|
|
barge in on the system output, and when this is not allowed.</p>
|
|
|
|
<p> Note: The output device may generate timestamped events when
|
|
barge-in occurs (see 3.9).</p>
|
|
|
|
<h3> 2.8 Call Transfer (should have)</h3>
|
|
|
|
<p> The markup language specifies a mechanism to allow transfer
|
|
of the caller to another line in a telephony environment. For
|
|
example, in cases of dialog breakdown, the user can be
|
|
transferred to an operator (cf. 'callto' in HTML). The markup
|
|
language also provides a mechanism to deal with transfer failures
|
|
such as when the called line is busy or engaged.</p>
|
|
|
|
<h3> 2.9 Quit Behaviour (must have)</h3>
|
|
|
|
<p> The markup language provides a mechanism to terminate the
|
|
session (cf. user-terminated sessions via a 'quit' meta-command
|
|
in 2.6).</p>
|
|
|
|
<h3> 2.10 Interaction with External Components (must have)</h3>
|
|
|
|
<p> The markup language must support a generic component
|
|
interface to allow for the use of external components on the
|
|
client and/or server side. The interface provides a mechanism for
|
|
transferring data between the markup language's variables and the
|
|
component. Examples of such data are: configuration parameters
|
|
(such as timeouts), and events for data input and error codes.
|
|
Except for event handling, a call to an external component does
|
|
not directly change the dialog state, i.e. the dialog continues
|
|
in the state from which the external component was called.</p>
|
|
|
|
<p> Examples of external components are pre-built dialog
|
|
components and server scripts. Pre-built dialogs are further
|
|
described in Section 3.3. Server scripts can be used to interact
|
|
with remote services, devices or databases.</p>
|
|
|
|
<h2> 3. Format Requirements</h2>
|
|
|
|
<h3> 3.1 Ease of Use (must have)</h3>
|
|
|
|
<p>The markup language should be easy for designers to understand
|
|
and author without special tools or knowledge of vendor
|
|
technology or protocols (dialog design knowledge is still
|
|
essential).</p>
|
|
|
|
<h3> 3.2 Simplicity and Power (must have)</h3>
|
|
|
|
<p>The markup language allows designers to rapidly develop simple
|
|
dialogs without the need to worry about interactional details but
|
|
also allows designers to take more control over interaction to
|
|
develop complex dialogs.</p>
|
|
|
|
<h3> 3.3 Support for Modularity and Re-use (should have)</h3>
|
|
|
|
<p> The markup language complies with the requirements of the
|
|
Reusable Dialog Components Subgroup.</p>
|
|
|
|
<p> The markup language can specify a number of pre-built dialog
|
|
components. This enables one to build a library of reusable
|
|
'dialogs'. This is useful for handling both application specific
|
|
input types, such as telephone numbers, credit card numbers, etc
|
|
as well as those that are more generic, such as times, dates,
|
|
numbers, etc.</p>
|
|
|
|
<h3> 3.4 Naming (must have)</h3>
|
|
|
|
<p> Dialogs, states, inputs and outputs can be referenced by a
|
|
URI in the markup language.</p>
|
|
|
|
<h3> 3.5 Variables (must have)</h3>
|
|
|
|
<p> Variables can be defined and assigned values.</p>
|
|
|
|
<p> Variables can be scoped within namespaces: for example,
|
|
state-level, dialog-level, document-level, application-level or
|
|
session-level. The markup language defines the precise scope of
|
|
all variables.</p>
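<p>Nested scopes can be illustrated with a chained lookup. The scope
names follow the list above; the lookup order (innermost scope first)
and the variable names are assumptions made for this sketch.</p>

<pre class="example">
```python
from collections import ChainMap

# One dictionary per scope level named in the text.
session = {"caller_id": "unknown"}
application = {"language": "en"}
document = {}
dialog = {"destination": "Paris"}
state = {"language": "fr"}  # shadows the application-level value

# ChainMap searches its maps left to right, so the innermost scope wins.
scopes = ChainMap(state, dialog, document, application, session)

language = scopes["language"]
destination = scopes["destination"]
caller = scopes["caller_id"]
```
</pre>

<p>In this sketch <code>language</code> resolves to the state-level
value, while <code>caller</code> is found only at session level.</p>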
|
|
|
|
<p> The markup language must specify if variables are atomic or
|
|
structured.</p>
|
|
|
|
<p> Variables can be assigned default values. Assignment may be
|
|
optional; for example, in a flight reservation form, a 'special
|
|
meal' variable need not be assigned a value by the user.</p>
|
|
|
|
<p> Variables may be referred to in the output content of the
|
|
markup language.</p>
|
|
|
|
<p> The precise requirements on variables may be affected by W3C
|
|
work on modularity and XML schema datatypes.</p>
|
|
|
|
<h3> 3.6 Variable Binding (must have)</h3>
|
|
|
|
<p> User input can bind one or more state variables. A single
|
|
input may bind a single variable or it may bind multiple
|
|
variables in any order; for example, the following utterances
|
|
result in the same variable bindings:<br>
|
|
</p>
|
|
|
|
<ul>
|
|
<li>Transfer $200 from savings to checking</li>
|
|
|
|
<li>Transfer $200 to checking from savings</li>
|
|
|
|
<li>Transfer from savings $200 to checking</li>
|
|
</ul>
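<p>The order-independence of the three utterances above can be checked
with a small sketch. Simple regular expressions stand in for a real
speech grammar here, and the variable names <code>amount</code>,
<code>source</code> and <code>target</code> are assumptions.</p>

<pre class="example">
```python
import re

# One pattern per variable; each may match anywhere in the utterance.
PATTERNS = {
    "amount": re.compile(r"\$(\d+)"),
    "source": re.compile(r"\bfrom (\w+)"),
    "target": re.compile(r"\bto (\w+)"),
}


def bind(utterance):
    """Bind each variable whose pattern occurs anywhere in the utterance."""
    bindings = {}
    text = utterance.lower()
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            bindings[name] = match.group(1)
    return bindings


utterances = [
    "Transfer $200 from savings to checking",
    "Transfer $200 to checking from savings",
    "Transfer from savings $200 to checking",
]
results = [bind(u) for u in utterances]
```
</pre>

<p>All three entries in <code>results</code> contain the same bindings,
regardless of the order in which the user supplied the information.</p>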
|
|
|
|
<h3> 3.7 Event Handler (must have)</h3>
|
|
|
|
<p> The markup language provides an explicit event handling
|
|
mechanism for specifying actions to be carried out when events
|
|
are generated in a dialog state.</p>
|
|
|
|
<p> Event handlers can be ordered so that if multiple event
|
|
handlers match the current event, only the handler with the
|
|
highest ranking is executed. By default, event handler ranking is
|
|
based on proximity and specificity: i.e. the handler closest in
|
|
the event hierarchy with the most specific matching
|
|
conditions.</p>
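<p>The default ranking can be sketched as a sort key. The sketch assumes
that proximity is compared first and specificity breaks ties, and that
specificity can be approximated by the number of matching conditions;
the text does not fix either point. The <code>Handler</code> class and
its fields are hypothetical.</p>

<pre class="example">
```python
from dataclasses import dataclass, field


@dataclass
class Handler:
    name: str
    event_type: str    # e.g. "input" or "error"
    distance: int      # 0 = declared in the current state, 1 = dialog, ...
    conditions: dict = field(default_factory=dict)

    def matches(self, event):
        if event["type"] != self.event_type:
            return False
        return all(event.get(k) == v for k, v in self.conditions.items())


def select_handler(handlers, event):
    """Return the closest matching handler; ties go to the most specific."""
    candidates = [h for h in handlers if h.matches(event)]
    if not candidates:
        return None
    return min(candidates, key=lambda h: (h.distance, -len(h.conditions)))


handlers = [
    Handler("document-level input", "input", distance=2),
    Handler("state-level input", "input", distance=0),
    Handler("state-level speech input", "input", distance=0,
            conditions={"media": "speech"}),
]
chosen = select_handler(handlers, {"type": "input", "media": "speech"})
other = select_handler(handlers, {"type": "input", "media": "dtmf"})
```
</pre>

<p>Only one handler is executed per event: for spoken input the more
specific state-level handler is chosen, and for DTMF input the general
state-level handler is chosen ahead of the document-level one.</p>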
|
|
|
|
<p> Actions can be conditional upon variable assignments, as well
|
|
as the type and content of events (e.g. input events specifying
|
|
media, content, confidence, and so on).</p>
|
|
|
|
<p> Actions include: the binding of variables with information,
|
|
for example, information contained in events; transition to
|
|
another dialog state (including the current state).</p>
<h3> 3.8 Builtin Event Handlers (should have)</h3>
<p> The markup language can provide implicit event handlers which
provide default handling of, for example, timeout and error
events as well as handlers for situations, such as confirmation
and clarification, where there is a transition to an implicit
dialog state. For example, there can be a default handler for
user input events such that if the recognition confidence score
is below a given threshold, then the input is confirmed in a
sub-dialog.</p>
<p> Properties of implicit event handlers (thresholds, counters,
locale, etc.) can be explicitly customized in the markup
language.</p>
<p> Implicit event handlers are always overridden by explicit
handlers.</p>
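<p>The behaviour of implicit handlers described in this section can
be sketched as follows. The event dictionary layout, the property
name <code>confirm_threshold</code>, its default of 0.5 and the
returned action names are all assumptions made for illustration.</p>

```python
# Hypothetical default properties of the implicit handlers.
DEFAULT_PROPERTIES = {"confirm_threshold": 0.5}


def implicit_input_handler(event: dict, properties: dict) -> str:
    """Default handling: confirm low-confidence input in a sub-dialog."""
    if event["confidence"] < properties["confirm_threshold"]:
        return "confirm"
    return "accept"


IMPLICIT_HANDLERS = {"input": implicit_input_handler}


def dispatch(event: dict, explicit_handlers: dict, properties=None) -> str:
    """Explicit handlers always win; otherwise fall back to the implicit one."""
    merged = {**DEFAULT_PROPERTIES, **(properties or {})}
    explicit = explicit_handlers.get(event["type"])
    if explicit is not None:
        return explicit(event)
    return IMPLICIT_HANDLERS[event["type"]](event, merged)
```

<p>Lowering <code>confirm_threshold</code> customizes the implicit
handler without replacing it, whereas registering an explicit handler
for the same event type bypasses it entirely.</p>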
<h3> 3.9 Output Content and Events (must have)</h3>
<p> The markup language complies with the requirements developed
by the Speech Synthesis Markup Subgroup for output text content
and parameter settings for the output device. Requirements on
multimodal output will be co-ordinated by the Multimodal
Interaction Subgroup (cf. Section 1).</p>
<p> In addition, the markup supports the following output
features (if not already defined in the Synthesis Markup):</p>
<ol>
<li>Pre-recorded audio file output</li>
<li>Streamed audio</li>
<li>Playing/synthesizing sounds such as tones and beeps</li>
<li>Variable level-of-detail control over structured text</li>
</ol>
<p> The output device generates timestamped events including
error events and progress events (output started/stopped, current
position).</p>
<h3> 3.10 Richer Output (nice to have)</h3>
<p>The markup language allows for richer output than variable
substitution in the output content, for example, natural language
generation of output content.</p>
<h3> 3.11 Input Content and Events (must have)</h3>
<p> The markup language complies with the requirements developed
by the Grammar Representation Subgroup for the representation of
speech grammar content. Requirements on multimodal input will be
co-ordinated by the Multimodal Interaction Subgroup (cf. Section
1).</p>
<p> The markup language can specify the activation and
deactivation of multiple speech grammars. These can be
user-defined or builtin grammars (digits, date, time, money,
etc.).</p>
<p> The markup language can specify parameters for speech grammar
content, including timeout parameters (maximum initial silence,
maximum utterance duration, maximum within-utterance pause),
energy thresholds necessary for bargein, etc.</p>
<p> The input device generates timestamped events including input
timeout and error events, progress events (utterance started,
interference, etc.), and recognition result events (including
content, interpretation/variable bindings, confidence).</p>
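<p>As a rough sketch of how timeout parameters might give rise to
timestamped input events, consider the following. The parameter
defaults (in seconds), the event type names and the choice to report
an over-long utterance as an error event are assumptions; the
requirements name the parameters and event categories but not their
values or mapping.</p>

```python
from dataclasses import dataclass


@dataclass
class InputTimeouts:
    # Hypothetical defaults, in seconds.
    max_initial_silence: float = 5.0
    max_utterance_duration: float = 20.0
    max_within_utterance_pause: float = 1.5  # not used in this sketch


def input_event(timeouts: InputTimeouts, initial_silence: float,
                utterance_duration: float, timestamp: float) -> dict:
    """Classify one input attempt and return a timestamped event."""
    if initial_silence > timeouts.max_initial_silence:
        kind = "timeout"   # the user never started speaking
    elif utterance_duration > timeouts.max_utterance_duration:
        kind = "error"     # assumed mapping for an over-long utterance
    else:
        kind = "result"    # recognition can proceed
    return {"type": kind, "timestamp": timestamp}
```

<p>The timestamp is passed in by the caller so that the example stays
deterministic; a real input device would stamp events itself.</p>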
<p> In addition to speech grammars, the markup language allows
input content and events to be specified for DTMF and keyboard
devices.</p>
<h2> 4. Other Requirements</h2>
<h3> 4.1 Event Handling (must have)</h3>
<p> One key difference between contemporary event models (e.g.
DOM Level 2, 'try-catch' in object-oriented programming) is
whether the same event can be handled by more than one event
handler within the hierarchy. The markup language must justify
whether or not it supports this feature.</p>
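<p>The two alternatives can be contrasted with a small sketch. The
scope names and the representation of the hierarchy as a list ordered
from innermost to outermost scope are hypothetical; the functions
only illustrate the distinction and do not model DOM Level 2 or any
particular language's exception semantics in detail.</p>

```python
def dispatch_first_match(chain, event):
    """'try-catch' style: the innermost matching handler consumes the event."""
    for scope, handled_events in chain:
        if event in handled_events:
            return [scope]
    return []


def dispatch_all_matches(chain, event):
    """Propagation style: every matching handler along the chain runs."""
    return [scope for scope, handled_events in chain if event in handled_events]


# Hypothetical hierarchy, ordered from innermost to outermost scope.
CHAIN = [
    ("field", {"noinput"}),
    ("form", {"noinput", "error"}),
    ("document", {"error"}),
]
```

<p>With this hierarchy, a <code>noinput</code> event reaches only the
<code>field</code> handler under the first model, but both the
<code>field</code> and <code>form</code> handlers under the
second.</p>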
<h3> 4.2 Logging (nice to have)</h3>
<p> For development and testing, it is important that data and
events be logged by the voice browser. At the most
detailed level, this will include logging of input and output
audio data. A mechanism that allows logged data to be retrieved
from a voice browser, preferably via a standard Internet protocol
(HTTP, FTP, etc.), is also required.</p>
<p> One approach is to require that the markup language can
control logging via, for example, an optional meta tag. Another
approach is for logging to be controlled by means other than the
markup language, such as via proprietary meta tags.</p>
<h3> 4.3 Speaker Verification (should have)</h3>
<p>The markup language could provide the ability to verify a
speaker's identity through a dialog containing both acoustic
verification and knowledge verification. The acoustic
verification may compare speech samples to an existing model
(kept in some, possibly external, repository) of that speaker's
voice. A verification result returns a value indicating whether
the acoustic and knowledge tests were accepted or rejected.
Results for verification and results for recognition may be
returned simultaneously.</p>
<h2>5. Acknowledgments</h2>
<h3>Subgroup Members</h3>
<blockquote>Laurence Ferrieux (France Telecom)<br>
Linda Dorrian (Productivity Works)<br>
Andreas Kellner (Philips)<br>
Kenneth Rehor (Bell Labs)<br>
David Attwater (BT)<br>
Daniel Burnett (Nuance)<br>
Deborah Dahl (Unisys)<br>
Andrew Hunt (Sun Labs)<br>
Robert Keiller (Canon)<br>
James Larson (Intel)<br>
William Ledingham (SpeechWorks)<br>
Bruce Lucas (IBM)<br>
Jen Marschner (Philips)<br>
Scott McGlashan (PipeBeach)<br>
Michael Phillips (SpeechWorks)<br>
Stephen Potter (Entropic)<br>
David Raggett (W3C/HP)<br>
Volker Steinbiss (Philips)<br>
Ramesh Sarukkai (L &amp; H)<br>
Dwight Smith (Motorola)<br>
Michael Brown (Bell Labs)<br>
Marianne Hickey (HP)<br>
George White (General Magic)</blockquote>
</body>
</html>