<?xml version="1.0" encoding="UTF-8"?>
|
|
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
|
|
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
|
|
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="EN" lang="EN">
|
|
<head>
|
|
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
|
|
<title>Voice Extensible Markup Language (VoiceXML) 3.0 Requirements</title>
|
|
<style type="text/css" xml:space="preserve">
|
|
.add { background-color: #FFFF99; }
|
|
.remove { background-color: #FF9999; text-decoration: line-through }
|
|
.issues { font-style: italic; font-weight: bold; color: green }
|
|
|
|
.tocline { list-style: none; }</style>
|
|
<link rel="stylesheet" type="text/css"
|
|
href="http://www.w3.org/StyleSheets/TR/W3C-WD.css" />
|
|
</head>
|
|
|
|
<body>
|
|
|
|
<div class="head">
|
|
<p><a href="http://www.w3.org/"><img alt="W3C"
|
|
src="http://www.w3.org/Icons/w3c_home" height="48" width="72" /></a></p>
|
|
|
|
<h1 class="notoc" id="h1">Voice Extensible Markup Language (VoiceXML) 3.0
|
|
Requirements</h1>
|
|
|
|
<h2 class="notoc" id="date">W3C Working Draft <i>8 August 2008</i></h2>
|
|
<dl>
|
|
<dt>This version:</dt>
|
|
<dd><a
|
|
href="http://www.w3.org/TR/2008/WD-vxml30reqs-20080808/">http://www.w3.org/TR/2008/WD-vxml30reqs-20080808/
|
|
</a></dd>
|
|
<dt>Latest version:</dt>
|
|
<dd><a
|
|
href="http://www.w3.org/TR/vxml30reqs/">http://www.w3.org/TR/vxml30reqs/
|
|
</a></dd>
|
|
<dt>Previous version:</dt>
|
|
<dd>This is the first version. </dd>
|
|
<dt>Editors:</dt>
|
|
<dd>Jeff Hoepfinger, SandCherry</dd>
|
|
<dd>Emily Candell, Comverse</dd>
|
|
<dt>Authors:</dt>
|
|
<dd>Jim Barnett, Aspect</dd>
|
|
<dd>Mike Bodell, Microsoft</dd>
|
|
<dd>Dan Burnett, Voxeo</dd>
|
|
<dd>Jerry Carter, Nuance</dd>
|
|
<dd>Scott McGlashan, HP</dd>
|
|
<dd>Ken Rehor, Cisco</dd>
|
|
</dl>
|
|
|
|
<p class="copyright"><a
|
|
href="http://www.w3.org/Consortium/Legal/ipr-notice#Copyright">Copyright</a>
|
|
© 2008 <a href="http://www.w3.org/"><acronym
|
|
title="World Wide Web Consortium">W3C</acronym></a><sup>®</sup> (<a
|
|
href="http://www.csail.mit.edu/"><acronym
|
|
title="Massachusetts Institute of Technology">MIT</acronym></a>, <a
|
|
href="http://www.ercim.org/"><acronym
|
|
title="European Research Consortium for Informatics and Mathematics">ERCIM</acronym></a>,
|
|
<a href="http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. W3C <a
|
|
href="http://www.w3.org/Consortium/Legal/ipr-notice#Legal_Disclaimer">liability</a>,
|
|
<a
|
|
href="http://www.w3.org/Consortium/Legal/ipr-notice#W3C_Trademarks">trademark</a>
|
|
and <a href="http://www.w3.org/Consortium/Legal/copyright-documents">document
|
|
use</a> rules apply.</p>
|
|
</div>
|
|
<hr />
|
|
|
|
<h2 class="notoc"><a id="abstract" name="abstract">Abstract</a></h2>
|
|
|
|
<p>The W3C Voice Browser working group aims to develop specifications to
|
|
enable access to the Web using spoken interaction. This document is part of a
|
|
set of requirement studies for voice browsers, and provides details of the
|
|
requirements for marking up spoken dialogs.</p>
|
|
|
|
<h2><a id="status" name="status">Status of this document</a></h2>
|
|
|
|
<p><em>This section describes the status of this document at the time of its
|
|
publication. Other documents may supersede this document. A list of current
|
|
W3C publications and the latest revision of this technical report can be
|
|
found in the <a href="http://www.w3.org/TR/">W3C technical reports index</a>
|
|
at http://www.w3.org/TR/.</em></p>
|
|
|
|
<p>This is the 8 August 2008 W3C Working Draft of "Voice Extensible Markup
|
|
Language (VoiceXML) 3.0 Requirements".</p>
|
|
|
|
<p>This document describes the requirements for marking up dialogs for spoken
|
|
interaction required to fulfill the charter given in <a
|
|
href="http://www.w3.org/2006/12/voice-charter.html#scope">the Voice Browser
|
|
Working Group Charter</a>, and indicates how the W3C Voice Browser Working
|
|
Group has satisfied these requirements via the publication of working drafts
|
|
and recommendations. This is a First Public Working Draft. The group does not
|
|
expect this document to become a W3C Recommendation.</p>
|
|
|
|
<p>This document has been produced as part of the <a
|
|
href="http://www.w3.org/Voice/Activity.html" shape="rect">W3C Voice Browser
|
|
Activity</a>, following the procedures set out for the <a
|
|
href="http://www.w3.org/Consortium/Process/" shape="rect">W3C Process</a>.
|
|
The authors of this document are members of the <a
|
|
href="http://www.w3.org/Voice/" shape="rect">Voice Browser Working Group</a>.
|
|
You are encouraged to subscribe to the public discussion list <<a
|
|
href="mailto:www-voice@w3.org" shape="rect">www-voice@w3.org</a>> and to
|
|
mail us your comments. To subscribe, send an email to <<a
|
|
href="mailto:www-voice-request@w3.org"
|
|
shape="rect">www-voice-request@w3.org</a>> with the word
|
|
<em>subscribe</em> in the subject line (include the word <em>unsubscribe</em>
|
|
if you want to unsubscribe). A <a
|
|
href="http://lists.w3.org/Archives/Public/www-voice/" shape="rect">public
|
|
archive</a> is available online.</p>
|
|
|
|
<p>This specification is a Working Draft of the Voice Browser working group
|
|
for review by W3C members and other interested parties. It is a draft
|
|
document and may be updated, replaced, or obsoleted by other documents at any
|
|
time. It is inappropriate to use W3C Working Drafts as reference material or
|
|
to cite them as other than "work in progress".</p>
|
|
|
|
<p> This document was produced by a group operating under the <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/">5 February 2004 W3C Patent Policy</a>. The group does not expect this document to become a W3C Recommendation. W3C maintains a <a rel="disclosure" href="http://www.w3.org/2004/01/pp-impl/34665/status">public list of any patent disclosures</a> made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#def-essential">Essential Claim(s)</a> must disclose the information in accordance with <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#sec-Disclosure">section 6 of the W3C Patent Policy</a>. </p>
|
|
|
|
<p>Publication as a Working Draft does not imply endorsement by the W3C
|
|
Membership. This is a draft document and may be updated, replaced or
|
|
obsoleted by other documents at any time. It is inappropriate to cite this
|
|
document as other than work in progress.</p>
|
|
|
|
<h2><a id="toc" name="toc" shape="rect">Table of Contents</a></h2>
|
|
<ul class="toc">
|
|
<li class="tocline">0. <a href="#intro" shape="rect">Introduction</a></li>
|
|
<li class="tocline">1. <a href="#modality-reqs" shape="rect">Modality
|
|
Requirements</a></li>
|
|
<li class="tocline">1.1 <a href="#mod-csmo" shape="rect">Coordinated,
|
|
Simultaneous Multimodal Output</a></li>
|
|
<li class="tocline">1.2 <a href="#mod-usmo" shape="rect">Uncoordinated,
|
|
Simultaneous Multimodal Output</a></li>
|
|
<li class="tocline">2. <a href="#functional-reqs" shape="rect">Functional
|
|
Requirements</a></li>
|
|
<li class="tocline">2.1 <a href="#funct-vcr" shape="rect">VCR
|
|
Controls</a></li>
|
|
<li class="tocline">2.2 <a href="#funct-media" shape="rect">Media
|
|
Control</a></li>
|
|
<li class="tocline">2.3 <a href="#funct-siv" shape="rect">Speaker
|
|
Verification</a></li>
|
|
<li class="tocline">2.4 <a href="#funct-event" shape="rect">External Event
|
|
Handling while a dialog is in progress</a></li>
|
|
<li class="tocline">2.5 <a href="#funct-pls" shape="rect">Pronunciation
|
|
Lexicon Specification</a></li>
|
|
<li class="tocline">2.6 <a href="#funct-emma" shape="rect">EMMA</a></li>
|
|
<li class="tocline">2.7 <a href="#funct-upload" shape="rect">Synchronous
|
|
Upload of Recordings</a></li>
|
|
<li class="tocline">2.8 <a href="#funct-speed" shape="rect">Speed
|
|
Control</a></li>
|
|
<li class="tocline">2.9 <a href="#funct-volume" shape="rect">Volume
|
|
Control</a></li>
|
|
<li class="tocline">2.10 <a href="#funct-record" shape="rect">Media
|
|
Recording</a></li>
|
|
<li class="tocline">2.11 <a href="#funct-mediaformat" shape="rect">Media
|
|
Formats</a></li>
|
|
<li class="tocline">2.12 <a href="#funct-datamodel" shape="rect">Data
|
|
Model</a></li>
|
|
<li class="tocline">2.13 <a href="#funct-submitprocessing"
|
|
shape="rect">Submit Processing</a></li>
|
|
<li class="tocline">3. <a href="#format-reqs" shape="rect">Format
|
|
Requirements</a></li>
|
|
<li class="tocline">3.1 <a href="#format-flow" shape="rect">Flow
|
|
Language</a></li>
|
|
<li class="tocline">3.2 <a href="#format-semmod" shape="rect">Semantic
|
|
Model Definition</a></li>
|
|
<li class="tocline">4. <a href="#other-reqs" shape="rect">Other
|
|
Requirements</a></li>
|
|
<li class="tocline">4.1 <a href="#other-vxml" shape="rect">Consistent with
|
|
other Voice Browser Working Group Specs</a></li>
|
|
<li class="tocline">4.2 <a href="#other-other" shape="rect">Consistent with
|
|
other Specs</a></li>
|
|
<li class="tocline">4.3 <a href="#other-simplify" shape="rect">Simplify
|
|
existing VoiceXML Tasks</a></li>
|
|
<li class="tocline">4.4 <a href="#other-maintain" shape="rect">Maintain
|
|
Functionality from Previous VXML Versions</a></li>
|
|
<li class="tocline">4.5 <a href="#other-crs" shape="rect">Address Change
|
|
Requests from Previous VXML Versions</a></li>
|
|
<li class="tocline">5. <a href="#acknowledgments"
|
|
shape="rect">Acknowledgments</a></li>
|
|
<li class="tocline">Appendix A. <a href="#prev-reqs" shape="rect">Previous
|
|
Requirements</a></li>
|
|
</ul>
|
|
|
|
<h2><a id="intro" name="intro">0. Introduction</a></h2>
|
|
|
|
<p>The main goal of this activity is to establish the current status of the
|
|
Voice Browser Working Group Activities relative to the requirements defined
|
|
in <a href="http://www.w3.org/TR/1999/WD-voice-dialog-reqs-19991223">Previous
|
|
Requirements Document</a> and to define additional requirements to drive
future Voice Browser Working Group activities, based on Voice Community
experience with existing standards.</p>
|
|
|
|
<p>The process will consist of the following steps:</p>
|
|
<ol>
|
|
<li>Identify how the existing requirements have been satisfied by the
|
|
standards defined by the Voice Browser Working Group, other W3C Working
|
|
Groups or other standards bodies. Note that references to VoiceXML 2.0
|
|
imply that VoiceXML 2.1 also satisfies the requirement.</li>
|
|
<li>Identify the requirements that have not yet been satisfied and
|
|
determine whether they are still valid requirements.</li>
|
|
<li>Identify new requirements based on input from working group members and
|
|
submission to the W3C Voice Browser Public Mailing List <<a
|
|
href="mailto:www-voice@w3.org">www-voice@w3.org</a>> (<a
|
|
href="http://www.w3.org/Archives/Public/www-voice/">archive</a>)</li>
|
|
<li>Prioritize remaining requirements and identify a road map by which the
Voice Browser Working Group plans to address these items.</li>
|
|
</ol>
|
|
|
|
<h3><a id="S0_1" name="S0_1"></a>0.1 Scope</h3>
|
|
|
|
<p>The previous requirements definition activity focused on defining three
|
|
types of requirements on the voice markup language: modality, functional, and
|
|
format.</p>
|
|
<ul>
|
|
<li><b>Modality</b> requirements concern the types of modalities (media in
|
|
combination with an input/output mechanism) supported by the markup
|
|
language for user input and system output. (For the Voice Browser Working
|
|
Group, the modalities supported are speech, video and DTMF. Requirements
|
|
regarding other modalities will be handled by the <a
|
|
href="http://www.w3.org/2002/mmi/">Multimodal Interaction Working
|
|
Group.</a>)</li>
|
|
<li><b>Functional</b> requirements concern the behavior (or operational
|
|
semantics) which results from interpreting a voice markup language.</li>
|
|
<li><b>Format</b> requirements constrain the format (or syntax) of the
|
|
voice markup language itself.</li>
|
|
</ul>
|
|
|
|
<p>The environment and capabilities of the voice browser interpreting the
markup language affect these requirements. There may be differences in the
modality and functional requirements for desktop versus telephony-based
environments (and, in the latter case, between fixed, mobile and Internet
telephony environments). The capabilities of the voice browser device also
impact requirements. Requirements affected by the environment or
capabilities of the voice browser device will be explicitly marked as
such.</p>
|
|
|
|
<h3><a id="S0_2" name="S0_2"></a>0.2 Terminology</h3>
|
|
|
|
<p>Although defining a dialog is highly problematic, some basic definitions
|
|
must be provided to establish a common basis of understanding and avoid
|
|
confusion. The following terminology is based upon an event-driven model of
|
|
dialog interaction.<br />
|
|
<br />
|
|
</p>
|
|
|
|
<table summary="first column gives term, second gives description" border="1"
|
|
cellpadding="6" width="85%">
|
|
<tbody>
|
|
<tr>
|
|
<th>Voice Markup Language</th>
|
|
<td>a language in which voice dialog behavior is specified. The
|
|
language may include reference to style and scripting elements which
|
|
can also determine dialog behavior.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Voice Browser</th>
|
|
<td>a software device which interprets a voice markup language and
|
|
generates a dialog with voice output and/or input, and possibly other
|
|
modalities.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Dialog</th>
|
|
<td>a model of interactive behavior underlying the interpretation of
|
|
the markup language. The model consists of states, variables, events,
|
|
event handlers, inputs and outputs.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>State</th>
|
|
<td>the basic interactional unit defined in the markup language; for
example, an &lt;input&gt; element in HTML. A state can specify
variables, event handlers, outputs and inputs. A state may describe
output content to be presented to the user, input which the user can
enter, and event handlers describing, for example, which variables to
bind and which state to transition to when an event occurs.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Events</th>
|
|
<td>generated when a state is executed by the voice browser; for
|
|
example, when outputs or inputs in a state are rendered or
|
|
interpreted. Events are typed and may include information; for
|
|
example, an input event generated when an utterance is recognized may
|
|
include the string recognized, an interpretation, confidence score,
|
|
and so on.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Event Handlers</th>
|
|
<td>are specified in the voice markup language and describe how events
|
|
generated by the voice browser are to be handled. Interpretation of
|
|
events may bind variables, or map the current state into another
|
|
state (possibly itself).</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Output</th>
|
|
<td>content specified in an element of the markup language for
|
|
presentation to the user. The content is rendered by the voice
|
|
browser; for example, audio files or text rendered by a TTS. Output
|
|
can also contain parameters for the output device; for example,
|
|
volume of audio file playback, language for TTS, etc. Events are
|
|
generated when, for example, the audio file has been played.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Input</th>
|
|
<td>content (and its interpretation) specified in an element of the
|
|
markup language which can be given as input by a user; for example, a
|
|
grammar for DTMF and speech input. Events are generated by the voice
|
|
browser when, for example, the user has spoken an utterance and
|
|
variables may be bound to information contained in the event. Input
|
|
can also specify parameters for the input device; for example,
|
|
timeout parameters, etc.</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
|
|
<p>The dialog requirements for the voice markup language are annotated with
|
|
the following priorities. If a feature is deferred from the initial
|
|
specification to a future release, consideration may be given to leaving open
|
|
a path for future incorporation of the feature.<br />
|
|
<br />
|
|
</p>
|
|
|
|
<table summary="first column gives priority name, second its description"
|
|
border="1" cellpadding="6" width="85%">
|
|
<tbody>
|
|
<tr>
|
|
<th>must have</th>
|
|
<td>The first official specification must define the feature.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>should have</th>
|
|
<td>The first official specification should define the feature if
|
|
feasible but may defer it until a future release.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>nice to have</th>
|
|
<td>The first official specification may define the feature if time
|
|
permits, however, its priority is low.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>future revision</th>
|
|
<td>It is not intended that the first official specification include
|
|
the feature.</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
|
|
<h2><a id="modality-reqs" name="modality-reqs">1. Modality
|
|
Requirements</a></h2>
|
|
<!-- <p><span class="owner">Owner: Scott McGlashan</span><br /> -->
|
|
<!-- <span class="note">Note: These requirements will be coordinated with the -->
|
|
<!-- Multimodal Interaction Subgroup.</span></p> -->
|
|
|
|
<h3><a id="mod-csmo" name="mod-csmo">1.1 Coordinated, Simultaneous Multimodal
|
|
Output (nice to have)</a></h3>
|
|
|
|
<p>1.1.1 The markup language specifies that content is to be simultaneously
|
|
rendered in multiple modalities (e.g. audio and video) and that output
|
|
rendering is coordinated. For example, graphical output on a cellular
|
|
telephone display is coordinated with spoken output.</p>
|
|
|
|
<h3><a id="mod-usmo" name="mod-usmo">1.2 Uncoordinated, Simultaneous
|
|
Multimodal Output (nice to have)</a></h3>
|
|
|
|
<p>1.2.1 The markup language specifies that content is to be simultaneously
|
|
rendered in multiple modalities (e.g. audio and video) and that output
|
|
rendering is uncoordinated. For example, graphical output on a cellular
|
|
telephone display is uncoordinated with spoken output.</p>
|
|
|
|
<h2><a id="functional-reqs" name="functional-reqs">2. Functional
|
|
Requirements</a></h2>
|
|
|
|
<p>These requirements are intended to ensure that the markup language is
|
|
capable of specifying cooperative dialog behavior characteristic of
|
|
state-of-the-art spoken dialog systems. In general, the voice browser should
|
|
compensate for its own limitations in knowledge and performance compared with
|
|
equivalent human agents; for example, compensate for limitations in speech
|
|
recognition capability by confirming spoken user input when necessary.</p>
|
|
|
|
<h3><a id="funct-vcr" name="funct-vcr">2.1 VCR Controls (must have)</a></h3>
|
|
<!-- <p><span class="owner">Owner: Emily Candell</span><br /> -->
|
|
<!-- <span class="note">Note: Emily reviewed and felt these were -->
|
|
<!-- complete.</span></p> -->
|
|
|
|
<h4><a id="S2_1_1" name="S2_1_1"></a>2.1.1 VoiceXML 3.0 MUST provide a
|
|
mechanism giving an application developer a high-level of control of audio
|
|
and video playback.</h4>
|
|
|
|
<h4><a id="S2_1_1_1" name="S2_1_1_1"></a>2.1.1.1 It MUST be possible to
|
|
invoke media controls by DTMF or speech input (other input mechanisms may be
|
|
supported).</h4>
|
|
|
|
<h4><a id="S2_1_1_2" name="S2_1_1_2"></a>2.1.1.2 Media controls MUST not
|
|
disable normal user input: i.e. input for media control and input for
|
|
application input MUST be possible simultaneously.</h4>
|
|
|
|
<h4><a id="S2_1_1_3" name="S2_1_1_3"></a>2.1.1.3 Input associated with media
|
|
controls MUST be treated in the same way as other inputs. Resolution of best
|
|
match follows standard VoiceXML 2.0 precedence and scoping rules.</h4>
|
|
|
|
<h4><a id="S2_1_1_4" name="S2_1_1_4"></a>2.1.1.4 It MUST be possible for user
|
|
input to be interpreted as seek controls -- fast forward and rewind -- during
|
|
media output playback.</h4>
|
|
|
|
<h4><a id="S2_1_1_5" name="S2_1_1_5"></a>2.1.1.5 The seek control MUST allow
|
|
fast forward and rewind to be specified in time - seconds, milliseconds -
|
|
relative to the current playback position.</h4>
|
|
|
|
<h4><a id="S2_1_1_6" name="S2_1_1_6"></a>2.1.1.6 The seek control MUST allow
|
|
fast forward and rewind to be specified relative to &lt;mark&gt; elements in
|
|
the output.</h4>
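
<p>For reference, a minimal sketch (not normative) of how such marks might be
written today, using the SSML &lt;mark&gt; element inside a prompt; the seek
syntax itself is not defined by this requirement:</p>

<pre>
&lt;prompt&gt;
  &lt;mark name="intro"/&gt;
  Welcome to your account summary.
  &lt;mark name="balance"/&gt;
  Your current balance is fifty dollars.
  &lt;mark name="history"/&gt;
  &lt;audio src="history.wav"/&gt;
&lt;/prompt&gt;
</pre>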
|
|
|
|
<h4><a id="S2_1_1_7" name="S2_1_1_7"></a>2.1.1.7 The seek control MUST not
|
|
affect the selection of alternative content: i.e. the same (alternative)
|
|
content MUST be used.</h4>
|
|
|
|
<h4><a id="S2_1_1_8" name="S2_1_1_8"></a>2.1.1.8 It MUST be possible for user
|
|
input to be interpreted as pause/resume during media output playback.</h4>
|
|
|
|
<h4><a id="S2_1_1_9" name="S2_1_1_9"></a>2.1.1.9 It MUST be possible for the
|
|
different inputs to control pause and resume.</h4>
|
|
|
|
<h3><a id="funct-media" name="funct-media">2.2 Media Control (must
|
|
have)</a></h3>
|
|
<!-- <p><span class="owner">Owner: Jeff Hoepfinger</span><br /> -->
|
|
<!-- <span class="note">Note: These requirements were reversed engineered from the -->
|
|
<!-- VoiceXML 3.0 spec editor's draft.</span></p> -->
|
|
|
|
<h4><a id="S2_2_1_" name="S2_2_1_"></a>2.2.1. It MUST be possible to specify
|
|
a media clip begin value, specified in time, as an offset from the start of
|
|
the media clip to begin playback.</h4>
|
|
|
|
<h4><a id="S2_2_2_" name="S2_2_2_"></a>2.2.2. It MUST be possible to specify
|
|
a media clip end value, specified in time, as an offset from the start of the
|
|
media clip to end playback.</h4>
|
|
|
|
<h4><a id="S2_2_3_" name="S2_2_3_"></a>2.2.3. It MUST be possible to specify
|
|
a repeat duration, specified in time, as the amount of time the media file
|
|
will repeat playback.</h4>
|
|
|
|
<h4><a id="S2_2_4_" name="S2_2_4_"></a>2.2.4. It MUST be possible to specify
|
|
a repeat count, specified as a non-negative integer, as the number of times
|
|
the media file will repeat playback.</h4>
|
|
|
|
<h4><a id="S2_2_5_" name="S2_2_5_"></a>2.2.5. It MUST be possible to specify
|
|
a gain , specified as a percentage, as the percent to adjust the amplitude
|
|
playback of the original waveform.</h4>
|
|
|
|
<h4><a id="S2_2_6_" name="S2_2_6_"></a>2.2.6. It MUST be possible to specify
|
|
a speed, specified as a percentage, as the percent to adjust the speed
|
|
playback of the original waveform.</h4>
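
<p>A hypothetical sketch of how requirements 2.2.1 through 2.2.6 might surface
on a media playback element. The attribute names are SMIL-style illustrations
only and are not defined by this document:</p>

<pre>
&lt;!-- Hypothetical markup; attribute names are illustrative, not normative --&gt;
&lt;audio src="promo.wav"
       clipBegin="2.5s"
       clipEnd="30s"
       repeatDur="60s"
       repeatCount="2"
       soundLevel="80%"
       speed="120%"/&gt;
</pre>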
|
|
|
|
<h3><a id="funct-siv" name="funct-siv">2.3 Speaker Verification (must
|
|
have)</a></h3>
|
|
<!-- <p><span class="owner">Owner: Ken Rehor</span><br /> -->
|
|
<!-- <span class="note">Note: Ken reviewed and thought these were -->
|
|
<!-- complete</span></p> -->
|
|
|
|
<h4><a id="S2_3_1" name="S2_3_1"></a>2.3.1 The markup language MUST provide
|
|
the ability to verify a speaker's identity through a dialog containing both
|
|
acoustic verification and knowledge verification.</h4>
|
|
|
|
<p>The acoustic verification may compare speech samples to an existing model
|
|
(kept in some, possibly external, repository) of that speaker's voice. A
|
|
verification result returns a value indicating whether the acoustic and
|
|
knowledge tests were accepted or rejected. Results for verification and
|
|
results for recognition may be returned simultaneously.</p>
|
|
|
|
<h4><a id="S2_3_1_1" name="S2_3_1_1"></a>2.3.1.1 VoiceXML 3.0 MUST support
|
|
SIV for end-user dialogs</h4>
|
|
|
|
<p>Note: The security administrator's interface is out-of-scope for
|
|
VoiceXML.</p>
|
|
|
|
<h4><a id="S2_3_1_2" name="S2_3_1_2"></a>2.3.1.2 SIV features MUST be
|
|
integrated with VoiceXML 3.0.</h4>
|
|
|
|
<p>SIV features such as enrollment and verification are voice dialogs. SIV
must be compatible with, and complementary to, other VoiceXML 3.0 dialog
constructs such as speech recognition.</p>
|
|
|
|
<h4><a id="S2_3_1_3" name="S2_3_1_3"></a>2.3.1.3 VoiceXML 3.0 MUST be able to
|
|
be used without SIV.</h4>
|
|
|
|
<p>SIV features must be part of VoiceXML 3.0 but may not be needed in all
|
|
application scenarios or implementations. Not all voice dialogs need SIV.</p>
|
|
|
|
<h4><a id="S2_3_1_4" name="S2_3_1_4"></a>2.3.1.4 SIV MUST be able to be used
|
|
without other input modalities.</h4>
|
|
|
|
<p>Some SIV processing techniques operate without using any ASR.</p>
|
|
|
|
<h4><a id="S2_3_1_5" name="S2_3_1_5"></a>2.3.1.5 SIV features MUST be able to
|
|
operate in multi-factor environments.</h4>
|
|
|
|
<p>Some applications require the use of SIV along with other means of
|
|
authentication: biometric (e.g. fingerprint, hand, retina, DNA) or
|
|
non-biometric (e.g. caller ID, geolocation, personal knowledge, etc.).</p>
|
|
|
|
<h4><a id="S2_3_1_6" name="S2_3_1_6"></a>2.3.1.6 SIV-specific events MUST be
|
|
defined.</h4>
|
|
|
|
<p>SIV processing engines and network protocols (e.g. MRCP) generate events
|
|
related to their operation and use. These events must be made available in a
|
|
manner consistent with other VoiceXML events. Event naming structure must
|
|
allow for vendor-specific and application-specific events.</p>
|
|
|
|
<h4><a id="S2_3_1_7" name="S2_3_1_7"></a>2.3.1.7 SIV-specific properties MUST
|
|
be defined.</h4>
|
|
|
|
<p>These properties are provided to configure the operation of the SIV
|
|
processing engines (analogous to "Generic Speech Recognition Properties"
|
|
defined in <a href="http://www.w3.org/TR/voicexml20/#dml6.3.2">VoiceXML 2.0
|
|
Section 6.3.2</a>).</p>
|
|
|
|
<h4><a id="S2_3_1_8" name="S2_3_1_8"></a>2.3.1.8 The SIV result MUST be
|
|
available in the result structure used by the host environment (e.g. VoiceXML
|
|
3.0, MMI).</h4>
|
|
|
|
<p>Note that this does not require EMMA in all cases, such as non-VoiceXML
|
|
3.0 environments. This also does not specify the version of EMMA.</p>
|
|
|
|
<h4><a id="S2_3_1_8_1" name="S2_3_1_8_1"></a>2.3.1.8.1 VoiceXML 3.0 SIV
|
|
result MUST be representable in EMMA.</h4>
|
|
|
|
<p>VoiceXML 3.0 must specify the format of the result structure and version
|
|
of EMMA.</p>
|
|
|
|
<h4><a id="S2_3_1_9" name="S2_3_1_9"></a>2.3.1.9 SIV syntax SHOULD adhere to
|
|
the W3C guidelines for security handling.</h4>
|
|
|
|
<p>This includes:</p>
|
|
<ul>
|
|
<li>XML encryption</li>
|
|
<li>XML signature processing,</li>
|
|
<li>possibly TLS or non-XML security, such as the NIST SP 800-63 guideline
|
|
for remote authentication.</li>
|
|
</ul>
|
|
|
|
<p>The following security aspects are out-of-charter for VoiceXML:<br />
|
|
</p>
|
|
<ul>
|
|
<li>The security administrator's interface</li>
|
|
<li>Whether security aspects may be modified by the security
|
|
administrators</li>
|
|
<li>Requirements for securing the SIV data</li>
|
|
</ul>
|
|
|
|
<h4><a id="S2_3_1_11" name="S2_3_1_11"></a>2.3.1.11 SIV features MUST support
|
|
enrollment.</h4>
|
|
|
|
<p>Enrollment is the process of collecting voice samples from a person and
|
|
the subsequent generation and storage of voice reference models associated
|
|
with that person.</p>
|
|
|
|
<h4><a id="S2_3_1_12" name="S2_3_1_12"></a>2.3.1.12 SIV features MUST support
|
|
verification.</h4>
|
|
|
|
<p>Verification is the process of comparing an utterance against a single
|
|
reference model based on a single claimed identity (e.g., user ID, account
|
|
number). A verification result includes both a score and a decision.</p>
|
|
|
|
<h4><a id="S2_3_1_13" name="S2_3_1_13"></a>2.3.1.13 SIV features MUST support
|
|
identification.</h4>
|
|
|
|
<p>Identification is verification with multiple identity claims. An
|
|
identification result includes both the verification results for all of the
|
|
individual identity claims, and the identifier of a single reference model
|
|
that matches the input utterance best.</p>
|
|
|
|
<h4><a id="S2_3_1_14" name="S2_3_1_14"></a>2.3.1.14 SIV features SHOULD
|
|
support supervised adaptation.</h4>
|
|
|
|
<p>The application should have control over whether a voice model is updated
|
|
or modified based on the results of a verification.<br />
|
|
</p>
|
|
|
|
<h4><a id="S2_3_1_15" name="S2_3_1_15"></a>2.3.1.15 SIV features MUST support
|
|
concurrent SIV processing.</h4>
|
|
|
|
<p>An application developer must be able to specify at the individual turn
|
|
level that one or more of the following types of processing need to be
|
|
performed concurrently:</p>
|
|
<ul>
|
|
<li>ASR</li>
|
|
<li>Audio recording</li>
|
|
<li>Buffering (SIV)</li>
|
|
<li>Authentication (SIV)</li>
|
|
<li>Enrollment (SIV)</li>
|
|
<li>Adaptation (SIV)</li>
|
|
</ul>
|
|
Note: "Concurrent" means at the dialog specification level. A platform may
|
|
choose to implement these functions sequentially.
|
|
|
|
<h4><a id="S2_3_1_15_1" name="S2_3_1_15_1"></a>2.3.1.15.1 SIV features SHOULD
|
|
support other concurrent audio processing.</h4>
|
|
|
|
<p>Concurrent processing of other forms of audio processing (e.g., channel
|
|
detection, gender detection) should also be permitted but remain optional.</p>
|
|
|
|
<h4><a id="S2_3_1_16" name="S2_3_1_16"></a>2.3.1.16 SIV features MUST be able
|
|
to accept text from the application for presentation to the user.</h4>
|
|
|
|
<p>Text-prompted SIV applications require prompts to match the expected
|
|
response. The application is responsible for the content of the dialog but
|
|
VoiceXML is responsible for the presentation.</p>
|
|
|
|
<h4><a id="S2_3_1_16_1" name="S2_3_1_16_1"></a>2.3.1.16.1 SIV SHOULD be
|
|
architecturally agnostic</h4>
|
|
|
|
<p>Many different SIV processing technologies exist. The VoiceXML 3.0 SIV
|
|
architecture should avoid dependencies upon specific engine technologies.</p>
|
|
|
|
<h3><a id="funct-event" name="funct-event">2.4 External Event handling while
|
|
a dialog is in progress (must have)</a></h3>
|
|
<!-- <p><span class="owner">Owner: Jim Barnett</span><br /> -->
|
|
<!-- <span class="note">Note: Jim reviewed and felt these were complete</span></p> -->
|
|
|
|
<h4><a id="S2_4_1" name="S2_4_1"></a>2.4.1 It MUST be possible for external
|
|
entities to inject events into running dialogs. The dialog author MUST be
|
|
able to control when such events are processed and what actions are taken
|
|
when they are processed.</h4>
|
|
|
|
<h4><a id="S2_4_2" name="S2_4_2"></a>2.4.2 Among the possible results of
|
|
processing such events MUST be pausing, resuming, and terminating the dialog.
|
|
The VoiceXML 3.0 specification MAY define default handlers for certain such
|
|
external events.</h4>
|
|
|
|
<h4><a id="S2_4_3" name="S2_4_3"></a>2.4.3 It MUST be possible for running
|
|
dialogs to send events into the <a
|
|
href="http://www.w3.org/TR/mmi-arch/">Multimodal Interaction
|
|
Framework.</a></h4>
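
<p>A hypothetical illustration of how a dialog author might handle an injected
external event by reusing the existing VoiceXML &lt;catch&gt; mechanism; the
event name and the delivery mechanism are assumptions, not defined by this
document:</p>

<pre>
&lt;!-- "external.call.transfer" is an invented event name --&gt;
&lt;catch event="external.call.transfer"&gt;
  &lt;prompt&gt;Please hold while your call is transferred.&lt;/prompt&gt;
  &lt;exit/&gt;
&lt;/catch&gt;
</pre>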
|
|
|
|
<h3><a id="funct-pls" name="funct-pls"></a>2.5 <a
|
|
href="http://www.w3.org/TR/pronunciation-lexicon/">Pronunciation Lexicon
|
|
Specification (must have)</a></h3>
|
|
<!-- <p><span class="owner">Owner: Jeff Hoepfinger</span><br /> -->
|
|
<!-- <span class="note">Note: There was some discussion in Orlando F2F on being -->
|
|
<!-- able to define lexicons using normal scoping rules, but there was no -->
|
|
<!-- agreement reached</span> </p> -->
|
|
|
|
<h4><a id="S2_5_1" name="S2_5_1"></a>2.5.1 The author MUST be able to define
|
|
lexicons that span an entire VoiceXML application.</h4>
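
<p>For comparison, SSML 1.0 already allows a pronunciation lexicon to be
referenced from an individual output document; the requirement above extends
this so that a lexicon can be declared once with application scope (that
syntax is not yet defined). A minimal SSML-level sketch, with an example
lexicon URI:</p>

<pre>
&lt;speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US"&gt;
  &lt;lexicon uri="http://www.example.com/names.pls" type="application/pls+xml"/&gt;
  Welcome back, Anneliese.
&lt;/speak&gt;
</pre>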
|
|
|
|
<h3><a id="funct-emma" name="funct-emma"></a>2.6 <a
|
|
href="http://www.w3.org/TR/emma/">EMMA Specification (must have)</a></h3>
|
|
|
|
<h4><a id="S2_6_1_" name="S2_6_1_"></a>2.6.1. The application author MUST be
|
|
able to specify the preferred format of the input result within VoiceXML. If
|
|
not specified, the default format is EMMA.</h4>
|
|
|
|
<h4><a id="S2_6_2" name="S2_6_2"></a>2.6.2 All available semantic information
|
|
(i.e. content that could have meaning) from the input MUST be accessible to
|
|
the application author. This result MUST be navigable by the application
|
|
author.</h4>
|
|
|
|
<p>The exact form of navigation will depend on the format and decisions
|
|
around the preferred data model made by the working group. If the result is a
|
|
string, string processing functions are expected to be available. If the
|
|
result is an XML document, DOM or E4X-like functions are expected to be
|
|
supported.</p>
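
<p>For illustration, a minimal EMMA 1.0 document of the kind a recognition
result might be delivered in; the annotations and the application payload
shown here are examples only:</p>

<pre>
&lt;emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma"&gt;
  &lt;emma:interpretation id="int1"
      emma:medium="acoustic" emma:mode="voice"
      emma:confidence="0.82"
      emma:tokens="transfer fifty dollars to savings"&gt;
    &lt;action&gt;transfer&lt;/action&gt;
    &lt;amount&gt;50&lt;/amount&gt;
    &lt;target&gt;savings&lt;/target&gt;
  &lt;/emma:interpretation&gt;
&lt;/emma:emma&gt;
</pre>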
|
|
|
|
<h4><a id="S2_6_3_" name="S2_6_3_"></a>2.6.3. VoiceXML 3 (or profiles) MUST
|
|
describe how the default result format is mapped into the application's data
|
|
model.</h4>
|
|
|
|
<p>VoiceXML 3 will declare one or more mandatory result formats.</p>
|
|
|
|
<h4><a id="S2_6_4" name="S2_6_4"></a>2.6.4 The application author SHOULD be
|
|
able to specify specific result content not to be logged.</h4>
|
|
|
|
<p>This will allow the author to prevent logging of confidential or sensitive
|
|
information.</p>
|
|
|
|
<h3><a id="funct-upload" name="funct-upload">2.7 Synchronous Upload of
|
|
Recordings (must have)</a></h3>
|
|
<!-- <p><span class="owner">Owner: Emily Candell</span><br /> -->
|
|
<!-- <span class="note">Note: Emily reviewed and felt these were -->
|
|
<!-- complete</span></p> -->
|
|
|
|
<h4><a id="S2_7_1" name="S2_7_1"></a>2.7.1 VoiceXML 3.0 MUST enable
|
|
synchronous uploads of recordings while the recording is in progress.</h4>
|
|
|
|
<h4><a id="S2_7_1_1" name="S2_7_1_1"></a>2.7.1.1 It MUST be possible to
|
|
specify the upload destination of the recording in the <record>
|
|
element</h4>
|
|
|
|
<h4><a id="S2_7_1_2" name="S2_7_1_2"></a>2.7.1.2 The upload destination MUST
|
|
be an HTTP URI.</h4>
|
|
|
|
<h4><a id="S2_7_1_3" name="S2_7_1_3"></a>2.7.1.3 The application developer
|
|
MAY specify HTTP PUT or HTTP POST as the recording upload method.</h4>
|
|
|
|
<h4><a id="S2_7_1_4" name="S2_7_1_4"></a>2.7.1.4 This feature MUST be
|
|
backward compatible with VoiceXML 2.0/2.1 record functionality.</h4>
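
<p>A hypothetical sketch of how requirements 2.7.1.1 through 2.7.1.3 might
appear on the &lt;record&gt; element; the dest and method attribute names are
illustrative assumptions (dest is also used informally in section 2.10.7.1.1),
not defined syntax:</p>

<pre>
&lt;!-- Hypothetical attributes: dest and method are illustrative only --&gt;
&lt;record name="msg" beep="true" maxtime="120s" type="audio/x-wav"
        dest="http://media.example.com/uploads/msg-0001"
        method="post"/&gt;
</pre>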
|
|
|
|
<h3><a id="funct-speed" name="funct-speed">2.8 Speed Control (must
|
|
have)</a></h3>
|
|
<!-- <p><span class="owner">Owner: Emily Candell</span><br /> -->
|
|
<!-- <span class="note">Note: Emily reviewed and felt these were -->
|
|
<!-- complete</span></p> -->
|
|
|
|
<h4><a id="S2_8_1" name="S2_8_1"></a>2.8.1 It MUST be possible for user input
|
|
to change the speed of media output playback.</h4>
|
|
|
|
<h4><a id="S2_8_2" name="S2_8_2"></a>2.8.2 It MUST be possible to map the
|
|
values for speed control to the rate attribute of prosody in SSML.</h4>
|
|
|
|
<h4><a id="S2_8_3" name="S2_8_3"></a>2.8.3 Values for speed controls MAY be
|
|
specified as properties which follow the standard VoiceXML scoping model.
|
|
Default values are specified at session scope. Values specified on the
|
|
control element take priority over inherited properties.</h4>
|
|
|
|
<h3><a id="funct-volume" name="funct-volume">2.9 Volume Control (must
|
|
have)</a></h3>
|
|
<!-- <p><span class="owner">Owner: Emily Candell</span><br /> -->
|
|
<!-- <span class="note">Note: Emily reviewed and felt these were -->
|
|
<!-- complete</span></p> -->
|
|
|
|
<h4><a id="S2_9_1" name="S2_9_1"></a>2.9.1 It MUST be possible for user input
|
|
to change the volume of media output playback.</h4>
|
|
|
|
<h4><a id="S2_9_1_1" name="S2_9_1_1"></a>2.9.1.1 Values for volume controls
|
|
MAY be specified as properties which follow the standard VoiceXML scoping
|
|
model. Default values are specified at session scope. Values specified on the
|
|
control element take priority over inherited properties.</h4>
|
|
|
|
<h4><a id="S2_9_1_2" name="S2_9_1_2"></a>2.9.1.2 It MUST be possible to map
|
|
the values for volume control to the volume attribute of prosody in SSML.</h4>
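
<p>For reference, the SSML prosody attributes referred to in sections 2.8.2
and 2.9.1.2 look like this in a prompt (SSML 1.0 syntax):</p>

<pre>
&lt;prompt&gt;
  &lt;prosody rate="fast" volume="loud"&gt;
    Your call may be monitored or recorded.
  &lt;/prosody&gt;
&lt;/prompt&gt;
</pre>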
|
|
|
|
<h3><a id="funct-record" name="funct-record">2.10 Media Recording (must
|
|
have)</a></h3>
|
|
<!-- <p><span class="owner">Owner: Ken Rehor</span></p> -->
|
|
|
|
<h4><a id="S2_10_1" name="S2_10_1"></a>2.10.1 Recording Modes</h4>
|
|
|
|
<p>Form item recording mode (Requirements sections 2.10.1.1 and 2.10.1.2)
captures media from the caller (only) during the collect phase of a dialog.
Partial- and whole-session recording captures media from the caller, system,
and/or called party (in the case of a transferred endpoint) in a
multichannel or single (mixed) channel recording. The duration of these
recordings depends on the recording type.</p>
|
|
|
|
<h4><a id="S2_10_1_1" name="S2_10_1_1"></a>2.10.1.1 Form Item equivalent
|
|
(e.g. VoiceXML 2.0 &lt;record&gt;)</h4>
|
|
<!-- <span class="note">Note: Audio endpointing controls are defined in Section
|
|
2.10.3.</span> -->
|
|
|
|
<h4><a id="S2_10_1_1_1" name="S2_10_1_1_1"></a>2.10.1.1.1 VoiceXML 3.0 MUST
|
|
be able to record input from a user.</h4>
|
|
|
|
<h4><a id="S2_10_1_2" name="S2_10_1_2"></a>2.10.1.2 Utterance Recording</h4>
|
|
<!-- <span class="note">Note: Should this be generalized to handle other media -->
|
|
<!-- like video?<br /> -->
|
|
<!-- Note: Should this be supported in the case of DTMF-only?</span> -->
|
|
|
|
<p>Utterance recording mode is recording that occurs during an ASR or SIV
|
|
form item. The audio may be endpointed, usually by the speech engine.</p>
|
|
|
|
<h4><a id="S2_10_1_2_1" name="S2_10_1_2_1"></a>2.10.1.2.1 VoiceXML 3.0 MUST
|
|
support recording of a user's utterance during a form item
[recordutterance].</h4>
|
|
|
|
<h4><a id="S2_10_1_2_2" name="S2_10_1_2_2"></a>2.10.1.2.2 VoiceXML 3.0 MUST
|
|
support the control of utterance recording via a &lt;property&gt;.</h4>
|
|
|
|
<h4><a id="S2_10_1_2_3" name="S2_10_1_2_3"></a>2.10.1.2.3 VoiceXML 3.0 MUST
|
|
support the control of utterance recording via an attribute on input
|
|
items.</h4>
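
<p>For comparison, VoiceXML 2.1 already provides utterance recording through
the recordutterance property, with the captured audio exposed via
application.lastresult$; a minimal sketch (the grammar URI is an example):</p>

<pre>
&lt;field name="account"&gt;
  &lt;property name="recordutterance" value="true"/&gt;
  &lt;grammar src="account.grxml" type="application/srgs+xml"/&gt;
  &lt;prompt&gt;Please say your account number.&lt;/prompt&gt;
  &lt;filled&gt;
    &lt;!-- The recorded utterance is available in application.lastresult$.recording --&gt;
    &lt;var name="lastAudio" expr="application.lastresult$.recording"/&gt;
  &lt;/filled&gt;
&lt;/field&gt;
</pre>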
|
|
|
|
<h4><a id="S2_10_1_3" name="S2_10_1_3"></a>2.10.1.3 Session Recording</h4>
|
|
|
|
<p>Session recording begins with a start command. It continues until:</p>
|
|
<ul>
|
|
<li>a pause command; a resume command continues recording;</li>
|
|
<li>a stop command;</li>
|
|
<li>the end of the VoiceXML session;</li>
|
|
<li>an error occurs.</li>
|
|
</ul>
|
|
|
|
<p>Recording configuration and parameter requirements are defined in Section
|
|
2.10.2.</p>
|
|
|
|
<h4><a id="S2_10_1_3_1" name="S2_10_1_3_1"></a>2.10.1.3.1 VoiceXML 3.0 MUST
|
|
be able to record part of a VoiceXML session.</h4>
|
|
|
|
<h4><a id="S2_10_1_3_2" name="S2_10_1_3_2"></a>2.10.1.3.2 VoiceXML 3.0 MUST
|
|
be able to record an entire dialog.</h4>
|
|
|
|
<h4><a id="S2_10_1_4" name="S2_10_1_4"></a>2.10.1.4 Restricted Session
|
|
Recording</h4>
|
|
|
|
<p>Restricted session recording begins with a start command and continues
|
|
until:</p>
|
|
<ul>
|
|
<li>the end of the session;</li>
|
|
<li>an error occurs.</li>
|
|
</ul>
|
|
|
|
<p>See Table 1 for applicable controls.</p>
|
|
|
|
<h4><a id="S2_10_1_5" name="S2_10_1_5"></a>2.10.1.5 Multiple instances</h4>
|
|
|
|
<h4><a id="S2_10_1_5_1" name="S2_10_1_5_1"></a>2.10.1.5.1 VoiceXML 3.0 MUST
|
|
be able to support multiple simultaneous recordings of different types during
|
|
a call.</h4>
|
|
|
|
<h4><a id="S2_10_2_" name="S2_10_2_"></a>2.10.2. Recording Configuration and
|
|
Parameters</h4>
|
|
|
|
<p>This matrix specifies which features apply to which recording types.</p>
|
|
|
|
<table style="text-align: left; width: 722px; height: 166px;" border="1"
|
|
cellpadding="1" cellspacing="0">
|
|
<tbody>
|
|
<tr>
|
|
<td>Feature Requirement /<br />
|
|
Recording type</td>
|
|
<td>Dialog</td>
|
|
<td>Utterance</td>
|
|
<td>Session</td>
|
|
<td>Restricted<br />
|
|
Session</td>
|
|
</tr>
|
|
<tr>
|
|
<td>2.10.2.1 Recording starts when caller begins speaking</td>
|
|
<td>Y</td>
|
|
<td>Y</td>
|
|
<td>N</td>
|
|
<td>N</td>
|
|
</tr>
|
|
<tr>
|
|
<td>2.10.2.2 Initial silence interval cancels recording</td>
|
|
<td>Y</td>
|
|
<td>N</td>
|
|
<td>N</td>
|
|
<td>N</td>
|
|
</tr>
|
|
<tr>
|
|
<td>2.10.2.3 Final silence ends recording</td>
|
|
<td>Y</td>
|
|
<td>N</td>
|
|
<td>N</td>
|
|
<td>N</td>
|
|
</tr>
|
|
<tr>
|
|
<td>2.10.2.4 Maximum recording time</td>
|
|
<td>Y</td>
|
|
<td>N</td>
|
|
<td>N</td>
|
|
<td>N</td>
|
|
</tr>
|
|
<tr>
|
|
<td>2.10.2.5 Terminate recording with DTMF input</td>
|
|
<td>Y</td>
|
|
<td>N</td>
|
|
<td>N</td>
|
|
<td>N</td>
|
|
</tr>
|
|
<tr>
|
|
<td>2.10.2.6 Grammar control: modal operation</td>
|
|
<td>Y</td>
|
|
<td>N</td>
|
|
<td>N</td>
|
|
<td>N</td>
|
|
</tr>
|
|
<tr>
|
|
<td>2.10.2.7 Media format</td>
|
|
<td>Y</td>
|
|
<td>Y</td>
|
|
<td>Y</td>
|
|
<td>Y</td>
|
|
</tr>
|
|
<tr>
|
|
<td>2.10.2.8 Recording indicator</td>
|
|
<td>N</td>
|
|
<td>N</td>
|
|
<td>Y</td>
|
|
<td>N</td>
|
|
</tr>
|
|
<tr>
|
|
<td>2.10.2.9 Channel assignment</td>
|
|
<td>N</td>
|
|
<td>N</td>
|
|
<td>Y</td>
|
|
<td>Y</td>
|
|
</tr>
|
|
<tr>
|
|
<td>2.10.2.10 Channel groups</td>
|
|
<td>N</td>
|
|
<td>N</td>
|
|
<td>Y</td>
|
|
<td>Y</td>
|
|
</tr>
|
|
<tr>
|
|
<td>2.10.2.11 Buffer control</td>
|
|
<td>Y</td>
|
|
<td>Y</td>
|
|
<td>N</td>
|
|
<td>N</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
|
|
<p>Table 1: Recording Configuration and Parameter Application</p>
|
|
|
|
<p>(Attributes from VoiceXML 2.0 are indicated in brackets [].)</p>
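
<p>For reference, the bracketed attribute names correspond to the VoiceXML 2.0
&lt;record&gt; element, for example:</p>

<pre>
&lt;record name="greeting" beep="true" maxtime="60s"
        finalsilence="3000ms" dtmfterm="true" type="audio/x-wav"&gt;
  &lt;prompt&gt;Record your greeting after the beep.&lt;/prompt&gt;
&lt;/record&gt;
</pre>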
|
|
|
|
<h4><a id="S2_10_2_1" name="S2_10_2_1"></a>2.10.2.1 Recording starts when
|
|
caller begins speaking</h4>
|
|
|
|
<p>VoiceXML 3.0 must support dynamic start-of-recording based on when a
caller starts to speak.</p>

<p>Voice Activity Detection is used to determine when to initiate
recording. This feature can be disabled.</p>
|
|
|
|
<h4><a id="S2_10_2_2" name="S2_10_2_2"></a>2.10.2.2 Initial silence interval
|
|
cancels recording</h4>
|
|
|
|
<p>VoiceXML 3.0 must support specification of an interval of silence at the
|
|
beginning of the recording cycle to terminate recording [timeout].</p>
|
|
|
|
<p>A noinput event will be thrown if no audio is collected.</p>
|
|
|
|
<h4><a id="S2_10_2_3" name="S2_10_2_3"></a>2.10.2.3 Final silence ends
|
|
recording</h4>
|
|
|
|
<p>VoiceXML 3.0 must support specification of an interval of silence that
indicates end of speech to terminate recording [finalsilence].</p>

<p>Voice Activity Detection is used to determine when to stop recording. This
feature can be disabled.</p>

<p>The finalsilence interval may be used to specify the amount of silent audio
to be removed from the recording.</p>
|
|
|
|
<h4><a id="S2_10_2_4" name="S2_10_2_4"></a>2.10.2.4 Maximum recording
|
|
time</h4>
|
|
|
|
<p>VoiceXML 3.0 must support specification of the maximum allowable recording
|
|
time [maxtime].</p>
|
|
|
|
<h4><a id="S2_10_2_5" name="S2_10_2_5"></a>2.10.2.5 Terminate recording via
|
|
DTMF input</h4>
|
|
|
|
<p>VoiceXML 3.0 must provide a mechanism to control DTMF termination of an
|
|
active recording [dtmfterm].</p>
|
|
|
|
<h4><a id="S2_10_2_6" name="S2_10_2_6"></a>2.10.2.6 Grammar control: Modal
|
|
operation</h4>
|
|
|
|
<h4><a id="S2_10_2_6_1" name="S2_10_2_6_1"></a>2.10.2.6.1 VoiceXML 3.0 MUST
|
|
provide a mechanism to control whether non-local DTMF grammars are active
|
|
during recording [modal]</h4>
|
|
|
|
<h4><a id="S2_10_2_6_2" name="S2_10_2_6_2"></a>2.10.2.6.2 VoiceXML 3.0 MUST
|
|
provide a mechanism to control whether non-local speech recognition grammars
|
|
are active during recording [modal]</h4>
|
|
|
|
<h4><a id="S2_10_2_7" name="S2_10_2_7"></a>2.10.2.7 Media format</h4>
|
|
|
|
<p>VoiceXML 3.0 must enable specification of the media type of the recording
|
|
[type]</p>
|
|
|
|
<h4><a id="S2_10_2_8" name="S2_10_2_8"></a>2.10.2.8 Recording Indicator</h4>
|
|
|
|
<h4><a id="S2_10_2_8_1" name="S2_10_2_8_1"></a>2.10.2.8.1 VoiceXML 3.0 MUST
|
|
optionally support playing a beep tone to the user before recording begins.
|
|
[beep]</h4>
|
|
|
|
<h4><a id="S2_10_2_8_2" name="S2_10_2_8_2"></a>2.10.2.8.2 VoiceXML 3.0 MUST
|
|
optionally support displaying a visual indication to the user before
|
|
recording begins.</h4>
|
|
|
|
<h4><a id="S2_10_2_8_3" name="S2_10_2_8_3"></a>2.10.2.8.3 VoiceXML 3.0 MUST
|
|
optionally support displaying a visual indication to the user during
|
|
recording.</h4>
|
|
|
|
<p>Use cases:</p>
|
|
<ol>
|
|
<li>Display a countdown timer to indicate when recording will begin (could
|
|
be accomplished by playing a file immediately before the record
|
|
function)</li>
|
|
<li>Display an indicator while recording is active (e.g. full screen,
|
|
partial screen, icon, etc.)</li>
|
|
</ol>
|
|
|
|
<h4><a id="S2_10_2_9" name="S2_10_2_9"></a>2.10.2.9 Channel Assignment</h4>
|
|
|
|
<h4><a id="S2_10_2_9_1" name="S2_10_2_9_1"></a>2.10.2.9.1 VoiceXML 3.0 MUST
|
|
be able to record and store each media path independently.</h4>
|
|
|
|
<h4><a id="S2_10_2_9_2" name="S2_10_2_9_2"></a>2.10.2.9.2 VoiceXML 3.0 MUST
|
|
enable each media path to be recorded in the same multi-channel file.</h4>
|
|
|
|
<h4><a id="S2_10_2_9_3" name="S2_10_2_9_3"></a>2.10.2.9.3 VoiceXML 3.0 MUST
|
|
enable each media path to be recorded into separate files.</h4>
|
|
|
|
<h4><a id="S2_10_2_9_4" name="S2_10_2_9_4"></a>2.10.2.9.4 VoiceXML 3.0 MAY be
|
|
able to mix all voice paths into a single recording channel.</h4>
|
|
|
|
<h4><a id="S2_10_2_10" name="S2_10_2_10"></a>2.10.2.10 Channel Groups</h4>
|
|
|
|
<h4><a id="S2_10_2_10_1" name="S2_10_2_10_1"></a>2.10.2.10.1 One or more
|
|
channels within the same session MUST be controllable as a group.</h4>
|
|
|
|
<p>These groups can be used to apply other recording controls to more than
one media channel at once (e.g. mute two channels simultaneously). This
applies whether the channels are in the same file or in separate files
(which implies that a group of channels need <em>not</em> be part of the same
file).</p>
|
|
|
|
<p>A command to "start recording" must specify the details for that recording
|
|
session:</p>
|
|
<ul>
|
|
<li>media type</li>
|
|
<li>number of channels and channel assignment (e.g. channel x, group y
|
|
represented as a variable of the format x.y)</li>
|
|
<li>channel assignment</li>
|
|
<li>(specific parameters to be determined)</li>
|
|
</ul>
|
|
|
|
<h4><a id="S2_10_2_11" name="S2_10_2_11"></a>2.10.2.11 Buffer Controls</h4>
|
|
|
|
<h4><a id="S2_10_2_11_1" name="S2_10_2_11_1"></a>2.10.2.11.1 VoiceXML 3.0
|
|
MUST provide a mechanism to enable additional recording time before the start
|
|
of speaking ("pre" buffer)</h4>
|
|
|
|
<h4><a id="S2_10_2_11_2" name="S2_10_2_11_2"></a>2.10.2.11.2 VoiceXML 3.0
|
|
MUST provide a mechanism to enable specification of additional recording time
|
|
after the end of speaking ("post" buffer).</h4>
|
|
|
|
<h4><a id="S2_10_2_11_3" name="S2_10_2_11_3"></a>2.10.2.11.3 VoiceXML 3.0 MAY
|
|
provide a mechanism to enable specification of the pre and post recording
|
|
duration.</h4>
|
|
|
|
<p>The duration provided by the platform is up to the amount of audio the
|
|
application requested. If that amount of audio is not available, the platform
|
|
is required to provide the amount of audio that is available.</p>
|
|
<!-- <span class="note">Note: Should this feature be under developer or platform -->
|
|
<!-- control?</span> -->
|
|
|
|
<h4><a id="S2_10_3_1" name="S2_10_3_1"></a>2.10.3.1 Audio Muting</h4>
|
|
|
|
<h4><a id="S2_10_3_1_1" name="S2_10_3_1_1"></a>2.10.3.1.1 VoiceXML 3.0 MUST
|
|
enable muting of an audio recording at any time for a specified length of
|
|
time or until otherwise indicated to un-mute.</h4>
|
|
|
|
<h4><a id="S2_10_3_1_2" name="S2_10_3_1_2"></a>2.10.3.1.2 Audio to insert
|
|
while muting can optionally be specified via a URI.</h4>
|
|
<!-- <span class="note">Note: Issues arise if inserted audio is shorter than mute -->
|
|
<!-- duration.</span> -->
|
|
|
|
<h4><a id="S2_10_3_1_3" name="S2_10_3_1_3"></a>2.10.3.1.3 Optionally record
|
|
the mute duration either in the recorded data or in associated meta data
|
|
(e.g. a mark (out of band) or via a log channel or some other method)</h4>
|
|
<!-- <span class="note">Note: Is it a breach of security to keep track of the -->
|
|
<!-- mute/blank/pause duration?</span> -->
|
|
|
|
<h4><a id="S2_10_3_1_5" name="S2_10_3_1_5"></a>2.10.3.1.5 Mute MUST be
|
|
controllable for each channel independently.</h4>
|
|
|
|
<h4><a id="S2_10_3_1_6" name="S2_10_3_1_6"></a>2.10.3.1.6 Mute MUST be
|
|
controllable for all channels in a group.</h4>
|
|
|
|
<h4><a id="S2_10_3_2" name="S2_10_3_2"></a>2.10.3.2 Blanking</h4>
|
|
|
|
<h4><a id="S2_10_3_2_1" name="S2_10_3_2_1"></a>2.10.3.2.1 VoiceXML 3.0 MUST
|
|
enable blanking of a video recording at any time for a specified length of
|
|
time or until otherwise indicated to un-blank.</h4>
|
|
|
|
<h4><a id="S2_10_3_2_2" name="S2_10_3_2_2"></a>2.10.3.2.2 A video or still
|
|
image to replace video stream while blanking can be optionally specified via
|
|
a URI.</h4>
|
|
|
|
<h4><a id="S2_10_3_2_2_1" name="S2_10_3_2_2_1"></a>2.10.3.2.2.1 An error will
|
|
be thrown if the platform cannot handle the media type referred
|
|
to by the URI.</h4>
|
|
|
|
<h4><a id="S2_10_3_2_3" name="S2_10_3_2_3"></a>2.10.3.2.3 The media inserted
|
|
by default MUST be the same length as the blank duration.</h4>
|
|
|
|
<p>If video, repeat until un-blank.</p>
|
|
|
|
<h4><a id="S2_10_3_2_4" name="S2_10_3_2_4"></a>2.10.3.2.4 The video being
|
|
inserted MUST optionally be specified to span a length less than the actual
|
|
mute/un-mute duration.</h4>
|
|
|
|
<h4><a id="S2_10_3_2_5" name="S2_10_3_2_5"></a>2.10.3.2.5 Blanking MUST be
|
|
controllable separately from other media channels.</h4>
|
|
|
|
<h4><a id="S2_10_3_3" name="S2_10_3_3"></a>2.10.3.3 Grouped Blanking and
|
|
Muting</h4>
|
|
|
|
<h4><a id="S2_10_3_3_1" name="S2_10_3_3_1"></a>2.10.3.3.1 It MUST be possible
|
|
to simultaneously blank video and mute audio that are in the same media
|
|
group.</h4>
|
|
|
|
<h4><a id="S2_10_3_4" name="S2_10_3_4"></a>2.10.3.4 Pause and Resume</h4>
|
|
|
|
<h4><a id="S2_10_3_4_1" name="S2_10_3_4_1"></a>2.10.3.4.1 VoiceXML 3.0 MUST
|
|
enable a recording to be paused until explicitly restarted.</h4>
|
|
|
|
<h4><a id="S2_10_3_4_2" name="S2_10_3_4_2"></a>2.10.3.4.2 VoiceXML 3.0 MUST
|
|
enable an indicator to be optionally specified in the file to denote that
|
|
recording was paused, then resumed.</h4>
|
|
|
|
<h4><a id="S2_10_3_4_3" name="S2_10_3_4_3"></a>2.10.3.4.3 VoiceXML 3.0 MAY
|
|
optionally enable notation of the pause duration either in the recorded
data or in associated metadata (e.g. a mark (out of band), via a log
channel, or some other method).</h4>
|
|
|
|
<p>The mechanism is platform-specific.</p>
|
|
|
|
<h4><a id="S2_10_3_5" name="S2_10_3_5"></a>2.10.3.5 Arbitrary Start, Stop,
|
|
Restart/append</h4>
|
|
|
|
<h4><a id="S2_10_3_5_1" name="S2_10_3_5_1"></a>2.10.3.5.1 VoiceXML 3.0 MUST
|
|
be able to start a recording at any time.</h4>
|
|
|
|
<h4><a id="S2_10_3_5_2" name="S2_10_3_5_2"></a>2.10.3.5.2 VoiceXML 3.0 MUST
|
|
be able to stop an active recording at any time.</h4>
|
|
|
|
<h4><a id="S2_10_3_5_3" name="S2_10_3_5_3"></a>2.10.3.5.3 VoiceXML 3.0 MUST
|
|
be able to restart or append to a previously active recording at any time
(during the session, via a reference to the recording).</h4>
|
|
|
|
<h4><a id="S2_10_3_5_4" name="S2_10_3_5_4"></a>2.10.3.5.4 optionally record
|
|
the pause duration either in the recorded data or in associated meta data
|
|
(e.g. a mark (out of band) or via a log channel or some other method)</h4>
|
|
|
|
<p>Recording is available for playback or upload once a recording is
|
|
'stopped'.</p>
|
|
|
|
<p>If a recording was stopped and uploaded, then later appended to, the
|
|
application will need to keep track of when to upload the new version.</p>
|
|
|
|
<h4><a id="S2_10_4_" name="S2_10_4_"></a>2.10.4. Media types</h4>
|
|
|
|
<h4><a id="S2_10_4_1" name="S2_10_4_1"></a>2.10.4.1 Audio recording</h4>
|
|
|
|
<h4><a id="S2_10_4_1_1" name="S2_10_4_1_1"></a>2.10.4.1.1 VoiceXML 3.0 MUST
|
|
be able to record an incoming audio stream.</h4>
|
|
|
|
<h4><a id="S2_10_4_2" name="S2_10_4_2"></a>2.10.4.2 Video recording</h4>
|
|
|
|
<h4><a id="S2_10_4_2_1" name="S2_10_4_2_1"></a>2.10.4.2.1 VoiceXML 3.0 MUST
|
|
support recording of an incoming video stream.</h4>
|
|
|
|
<h4><a id="S2_10_4_2_2" name="S2_10_4_2_2"></a>2.10.4.2.2 VoiceXML 3.0 MUST
|
|
support recording of an incoming video stream with synchronized audio.</h4>
|
|
|
|
<h4><a id="S2_10_4_3" name="S2_10_4_3"></a>2.10.4.3 Media Type
|
|
specification</h4>
|
|
|
|
<h4><a id="S2_10_4_3_1" name="S2_10_4_3_1"></a>2.10.4.3.1 VoiceXML 3.0 MUST
|
|
be able to specify the format of the recording as a media type according to
|
|
IETF RFC 4288 [RFC4288].</h4>
|
|
|
|
<h4><a id="S2_10_4_4" name="S2_10_4_4"></a>2.10.4.4 Media formats and
|
|
codecs</h4>
|
|
|
|
<h4><a id="S2_10_4_4_1" name="S2_10_4_4_1"></a>2.10.4.4.1 VoiceXML 3.0 MUST
|
|
support specification of the media format and corresponding codec.</h4>
|
|
|
|
<h4><a id="S2_10_4_5" name="S2_10_4_5"></a>2.10.4.5 Platform support of media
|
|
types</h4>
|
|
|
|
<h4><a id="S2_10_4_5_1" name="S2_10_4_5_1"></a>2.10.4.5.1 VoiceXML 3.0
|
|
platforms MUST support all media types that are indicated as required by the
|
|
VoiceXML 3.0 Recommendation (types to be determined).</h4>
|
|
|
|
<p>Note: This does not mean all possible media types are supported on all
|
|
platforms.</p>
|
|
|
|
<h4><a id="S2_10_5_" name="S2_10_5_"></a>2.10.5. Media Processing</h4>
|
|
|
|
<h4><a id="S2_10_5_1" name="S2_10_5_1"></a>2.10.5.1 Media processing MAY
|
|
occur either in real-time or as a post-processing function.</h4>
|
|
|
|
<p>DEFAULT: specific to each processing type</p>
|
|
|
|
<h4><a id="S2_10_5_2" name="S2_10_5_2"></a>2.10.5.2 Tone Clamping</h4>
|
|
|
|
<p>Use cases:</p>
|
|
<ol>
|
|
<li>Voicemail terminated with DTMF.</li>
|
|
<li>Whole-session recording where DTMF input must be removed for privacy or
|
|
other reasons.</li>
|
|
</ol>
|
|
|
|
<h4><a id="S2_10_5_2_1" name="S2_10_5_2_1"></a>2.10.5.2.1 VoiceXML 3.0 MAY
|
|
optionally provide a means to specify if DTMF tones are to be removed from
|
|
the recording.</h4>
|
|
|
|
<p>DEFAULT: Tones are not removed from the recording</p>
|
|
|
|
<p>DEFAULT: If tone clamping is enabled, it is performed after recording has
|
|
completed (not in real-time).</p>
|
|
|
|
<h4><a id="S2_10_5_3" name="S2_10_5_3"></a>2.10.5.3 Audio Processing Mode</h4>
|
|
|
|
<h4><a id="S2_10_5_3_1" name="S2_10_5_3_1"></a>2.10.5.3.1 VoiceXML 3.0 MUST
|
|
optionally provide a means to specify if automatic audio level controls (e.g.
|
|
Dynamic Range Compression, Limiting, Automatic Gain Control (AGC), etc.) are
|
|
to be applied to the recording or if the recording is to be raw.</h4>
|
|
|
|
<p>DEFAULT: raw</p>
|
|
Editor's note: how to specify:
|
|
<ul>
|
|
<li>raw or processed</li>
|
|
<li>type of processing</li>
|
|
<li>parameters specific to each processor or implementation</li>
|
|
<li>multiple processing operations (?)</li>
|
|
<li>real-time or post-processing</li>
|
|
</ul>
|
|
|
|
<h4><a id="S2_10_6_" name="S2_10_6_"></a>2.10.6. Recording data</h4>
|
|
|
|
<h4><a id="S2_10_6_1" name="S2_10_6_1"></a>2.10.6.1 The following information
|
|
MUST be reported after recording has completed.</h4>
|
|
<ul>
|
|
<li>Recording duration in milliseconds</li>
|
|
<li>Recording size in bytes</li>
|
|
<li>DTMF terminating string if recording was terminated via DTMFTERM, or
|
|
DTMF input available in application.lastresult$</li>
|
|
<li>Indication if recording was terminated due to reaching maxtime</li>
|
|
<li>Format of the recording, as specified by RFC 4288</li>
|
|
</ul>
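
<p>For comparison, VoiceXML 2.0 exposes similar data through the &lt;record&gt;
shadow variables, for example:</p>

<pre>
&lt;filled&gt;
  &lt;log&gt;Recorded &lt;value expr="greeting$.size"/&gt; bytes,
    duration &lt;value expr="greeting$.duration"/&gt; ms,
    termchar &lt;value expr="greeting$.termchar"/&gt;,
    maxtime reached: &lt;value expr="greeting$.maxtime"/&gt;&lt;/log&gt;
&lt;/filled&gt;
</pre>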
|
|
|
|
<h4><a id="S2_10_7" name="S2_10_7"></a>2.10.7 Upload, Storage, Caching</h4>
|
|
|
|
<h4><a id="S2_10_7_1" name="S2_10_7_1"></a>2.10.7.1 Destination</h4>
|
|
|
|
<h4><a id="S2_10_7_1_1" name="S2_10_7_1_1"></a>2.10.7.1.1 VoiceXML 3.0 MUST
|
|
support specification of the destination of the recording buffer [dest].</h4>
|
|
|
|
<h4><a id="S2_10_7_3" name="S2_10_7_3"></a>2.10.7.3 A local cache of the
|
|
recording MUST be optionally available to the application (e.g. the VoiceXML
2.0 semantics of the &lt;record&gt; form item).</h4>
|
|
|
|
<h4><a id="S2_10_7_4" name="S2_10_7_4"></a>2.10.7.4 It MUST be possible to
|
|
specify the upload to be either a synchronous or asynchronous operation.</h4>
|
|
|
|
<h4><a id="S2_10_7_5" name="S2_10_7_5"></a>2.10.7.5 It MUST be possible to
|
|
select the upload to be available in real time, at the end of the call, or
|
|
indefinitely after the end of the call.</h4>
|
|
|
|
<h4><a id="S2_10_7_6" name="S2_10_7_6"></a>2.10.7.6 All modes other than
|
|
indefinite upload shall expose any errors in recording or upload to the
|
|
application.</h4>
|
|
|
|
<h4><a id="S2_10_8_" name="S2_10_8_"></a>2.10.8. Errors and Events</h4>
|
|
|
|
<p>Errors and events that result from media recording must be presented to
the application.</p>

<p>Examples of the types of errors that may be reported:</p>
|
|
<ul>
|
|
<li>error.unsupported.format (the requested media type is not
|
|
supported)</li>
|
|
<li>error.unavailable.format (the requested media type is currently not
|
|
available)</li>
|
|
<li>error during upload</li>
|
|
<li>disk full, other disk errors</li>
|
|
<li>permissions:</li>

<li>error.noauthorization (or error.noresource if the cause should be hidden
from a potential attacker?)</li>
|
|
</ul>
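<p>If error names along these lines are adopted, applications could handle
them with ordinary catch handlers. The event names below are taken from the
examples above and are not normative:</p>
<pre>
<catch event="error.unsupported.format">
  <prompt>Recording is not available in the requested format.</prompt>
</catch>
<catch event="error">
  <!-- General fallback for any other recording or upload error. -->
  <prompt>Sorry, your message could not be recorded.</prompt>
</catch>
</pre>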
|
|
|
|
<h3><a id="funct-mediaformat" name="funct-mediaformat">2.11 Media
|
|
Formats</a></h3>
|
|
<!-- <p><span class="owner">Owner: Jeff Hoepfinger</span><br /> -->
|
|
<!-- <span class="note">Note: These were recently added on the 6/24/2008 -->
|
|
<!-- call.</span></p> -->
|
|
|
|
<h4><a id="CS2_10_8_" name="CS2_10_8_"></a>VoiceXML 3 MUST support these
|
|
categories of media capabilities:</h4>
|
|
<ul>
|
|
<li>Audio Basic: audio only, with or without a header (e.g. a RIFF or AU
header)</li>
|
|
<li>Audio Rich: audio (one or more channels), plus meta data (e.g. header,
|
|
marks, transcription, etc.)</li>
|
|
<li>Multi-media: one or more media channels (e.g. audio, video, images,
etc.) plus meta data (e.g. header, marks, transcription, etc.)</li>
|
|
</ul>
|
|
|
|
<p>This does not imply platform support requirements. For example, a
|
|
particular platform may support Audio Basic but not Audio Rich. Another might
|
|
support Audio Rich but not all meta data elements.</p>
|
|
|
|
<h3><a id="funct-datamodel" name="funct-datamodel">2.12 Data Model (must
|
|
have)</a></h3>
|
|
|
|
<p>TBD.</p>
|
|
|
|
<h3><a id="funct-submitprocessing" name="funct-submitprocessing">2.11 Submit
|
|
Processing (must have)</a></h3>
|
|
|
|
<p>TBD.</p>
|
|
|
|
<h2><a id="format-reqs" name="format-reqs">3. Format Requirements</a></h2>
|
|
|
|
<h3><a id="format-flow" name="format-flow">3.1 Flow Language (must
|
|
have)</a></h3>
|
|
<!-- <p><span class="owner">Owner: Jim Barnett</span><br /> -->
|
|
<!-- <span class="note">Note: Jim reviewed and felt these were complete</span></p> -->
|
|
|
|
<p>A flow control language will be developed in conjunction with VoiceXML 3.0
(i.e. <a href="http://www.w3.org/TR/scxml/">SCXML</a>).</p>
|
|
|
|
<h4><a id="S3_1_1" name="S3_1_1"></a>3.1.1 The flow control language will
|
|
allow the separation of business logic from media control and user
|
|
interaction.</h4>
|
|
|
|
<h4><a id="S3_1_2" name="S3_1_2"></a>3.1.2 The flow control language will be
|
|
able to invoke VoiceXML 3.0 scripts, passing data into them and receiving
|
|
results back when the scripts terminate.</h4>
|
|
|
|
<h4><a id="S3_1_3" name="S3_1_3"></a>3.1.3 The flow control language will be
|
|
suitable for use as an Interaction Manager in the Multimodal Architecture
|
|
Framework.</h4>
|
|
|
|
<h4><a id="S3_1_4" name="S3_1_4"></a>3.1.4 The flow control language will be
|
|
based on state-machine concepts.</h4>
|
|
|
|
<h4><a id="S3_1_5" name="S3_1_5"></a>3.1.5 The flow control language will be
|
|
able to receive asynchronous messages from external entities.</h4>
|
|
|
|
<h4><a id="S3_1_6" name="S3_1_6"></a>3.1.6 The flow control language will be
|
|
able to send messages to external entities.</h4>
|
|
|
|
<h4><a id="S3_1_7" name="S3_1_7"></a>3.1.7 The flow control language will not
|
|
contain any media-specific concepts such as ASR or TTS.</h4>
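<p>A minimal sketch of this division of labour, with SCXML holding the
business logic and invoking a VoiceXML document for media and user
interaction. SCXML is still a Working Draft; the invoke type token, the
document URI and the event names below are illustrative only:</p>
<pre>
<scxml xmlns="http://www.w3.org/2005/07/scxml" version="1.0" initial="collect">
  <state id="collect">
    <!-- Run a VoiceXML dialog, passing data in and receiving results back. -->
    <invoke type="vxml3" src="collect-order.vxml">
      <param name="account" expr="'12345'"/>
    </invoke>
    <transition event="done.invoke" target="finish"/>
  </state>
  <final id="finish"/>
</scxml>
</pre>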
|
|
|
|
<h3><a id="format-semmod" name="format-semmod">3.2 Semantic Model Definition
|
|
(must have)</a></h3>
|
|
<!-- <p><span class="owner">Owner: Mike Bodell</span></p> -->
|
|
|
|
<h4><a id="S3_2_1" name="S3_2_1"></a>3.2.1 The precise semantics of all VXML
|
|
3.0 tags MUST be provided</h4>
|
|
|
|
<h4><a id="S3_2_2" name="S3_2_2"></a>3.2.2 The semantic model MUST be the
|
|
authoritative description of VXML 3.0 functionality</h4>
|
|
|
|
<h4><a id="S3_2_3" name="S3_2_3"></a>3.2.3 Different conformance profiles
|
|
MUST be possible, but they MUST be defined in terms of the semantic
|
|
model.</h4>
|
|
|
|
<h4><a id="S3_2_4" name="S3_2_4"></a>3.2.4 The semantic model descriptions of
|
|
VXML 3.0 MUST be able to express all of the functionality of VXML 2.1</h4>
|
|
|
|
<h4><a id="S3_2_5" name="S3_2_5"></a>3.2.5 Extensions to VXML 3.0 SHOULD be
|
|
able to build on the semantic model descriptions</h4>
|
|
|
|
<h2><a id="other-reqs" name="other-reqs">4. Other Requirements</a></h2>
|
|
|
|
<h3><a id="other-vxml" name="other-vxml">4.1 Consistent with other Voice
|
|
Browser Working Group specs (must have)</a></h3>
|
|
<!-- <p><span class="owner">Owner: Dan Burnett</span></p> -->
|
|
|
|
<h4><a id="S4_1_1" name="S4_1_1"></a>4.1.1 Wherever similar functionality to
|
|
that of another Voice Browser Working Group specification is available, this
|
|
language MUST use a syntax similar to that used in the relevant
|
|
specification.</h4>
|
|
|
|
<h4><a id="S4_1_2" name="S4_1_2"></a>4.1.2 For data that is likely to be
|
|
represented in another Voice Browser Working Group markup language (e.g., SRGS
|
|
or EMMA) or used by another Voice Browser Working Group language, there MUST
|
|
be a clear definition of the mapping between the two data
|
|
representations.</h4>
|
|
|
|
<h4><a id="S4_1_3" name="S4_1_3"></a>4.1.3 It MUST be possible to pass
|
|
Internet-related document and server information (caching parameters,
|
|
xml:base, etc.) from this language to other VBWG language processors for
|
|
embedded VBWG languages.</h4>
|
|
|
|
<h3><a id="other-other" name="other-other">4.2 Consistent with other specs
|
|
(XML, MMI, I18N, Accessibility, MRCP, Backplane Activities) (must
|
|
have)</a></h3>
|
|
<!-- <p><span class="owner">Owner: Dan Burnett/Scott McGlashan</span></p> -->
|
|
|
|
<h4><a id="S4_2_1" name="S4_2_1"></a>4.2.1 MRCP</h4>
|
|
|
|
<h4><a id="S4_2_1_1" name="S4_2_1_1"></a>4.2.1.1 This language MUST support a
|
|
profile that can be implemented using MRCPv2.</h4>
|
|
|
|
<h4><a id="S4_2_1_2" name="S4_2_1_2"></a>4.2.1.2 Where possible, this
|
|
language SHOULD remain compatible with MRCPv2 in terms of data formats (SRGS,
|
|
SSML).</h4>
|
|
|
|
<h4><a id="S4_2_2_" name="S4_2_2_"></a>4.2.2. <a
|
|
href="http://www.w3.org/TR/mmi-arch/">MMI</a></h4>
|
|
<!-- <p><span class="owner">Owner: Jeff Hoepfinger</span></p> -->
|
|
|
|
<p>There must be at least one profile of VoiceXML 3.0 in which all of the
|
|
following requirements are supported.</p>
|
|
|
|
<h4><a id="S4_2_2_1" name="S4_2_2_1"></a>4.2.2.1 It MUST be possible for
|
|
VoiceXML 3.0 implementations to receive, process, and generate MMI life cycle
|
|
events. Some events may be handled automatically, while others may be under
|
|
author control.</h4>
|
|
|
|
<h4><a id="S4_2_2_2" name="S4_2_2_2"></a>4.2.2.2 VoiceXML 3.0 MUST provide a
|
|
way for the author to specify the exact functions required for the
|
|
application such that the platform can allocate the minimum necessary
|
|
resources.</h4>
|
|
|
|
<h4><a id="S4_2_2_3" name="S4_2_2_3"></a>4.2.2.3 VoiceXML 3.0 MUST be able to
|
|
provide EMMA-formatted information inside the data field of MMI life cycle
|
|
events.</h4>
|
|
|
|
<h4><a id="S4_2_2_4" name="S4_2_2_4"></a>4.2.2.4 VoiceXML 3.0 platforms MUST
|
|
specify one or more event I/O processors for interoperable exchange of life
|
|
cycle events. The Voice Browser Group requests public comment on what such
|
|
event processors should be or whether they should be part of the language at
|
|
all.</h4>
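<p>A sketch of requirement 4.2.2.3: a recognition result carried as EMMA inside
the data field of an MMI life cycle event. The EMMA markup uses the EMMA 1.0
namespace, but the surrounding event element and its attributes are
placeholders rather than normative MMI syntax:</p>
<pre>
<!-- Illustrative wrapper only; see the MMI Architecture for the actual
     life cycle event definitions. -->
<DoneNotification context="ctx-1" requestID="r42">
  <data>
    <emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
      <emma:interpretation id="int1" emma:confidence="0.82"
                           emma:medium="acoustic" emma:mode="voice">
        <destination>Paris</destination>
      </emma:interpretation>
    </emma:emma>
  </data>
</DoneNotification>
</pre>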
|
|
|
|
<h3><a id="other-simplify" name="other-simplify">4.3 Simplify Existing
|
|
VoiceXML Tasks (must have)</a></h3>
|
|
<!-- <p><span class="owner">Owner: Dan Burnett/Scott McGlashan</span></p> -->
|
|
|
|
<h4><a id="S4_3_1" name="S4_3_1"></a>4.3.1 This language MUST provide a
|
|
mechanism for authors to develop dialog managers (state-based, task-based,
|
|
rule-based, etc.) that are easily used and configured by other authors.</h4>
|
|
|
|
<h4><a id="S4_3_2" name="S4_3_2"></a>4.3.2 This language MUST provide
|
|
mechanisms to simplify authoring of these common tasks: (we need to collect a
|
|
list of common tasks)</h4>
|
|
|
|
<h3><a id="other-maintain" name="other-maintain">4.4 Maintain Functionality
|
|
from Previous VXML Versions</a></h3>
|
|
|
|
<h4><a id="S4_4_1" name="S4_4_1"></a>4.4.1 New features added in VoiceXML 3.0
|
|
MUST be backward compatible with previous VoiceXML versions</h4>
|
|
|
|
<h4><a id="S4_4_1_1" name="S4_4_1_1"></a>4.4.1.1 Functionality available in
|
|
VoiceXML 2.0 and VoiceXML 2.1 MUST be available in VoiceXML 3.0.</h4>
|
|
|
|
<h4><a id="S4_4_1_2" name="S4_4_1_2"></a>4.4.1.2 Applications written in
|
|
VoiceXML 2.0/2.1 MUST be portable to VoiceXML 3.0 without losing application
|
|
capabilities.</h4>
|
|
|
|
<h3><a id="other-crs" name="other-crs">4.5 Address Change Requests from
|
|
previous VoiceXML Versions (must have)</a></h3>
|
|
<!-- <p><span class="owner">Owner: Jeff Hoepfinger</span><br /> -->
|
|
<!-- <span class="note">Reviewed all deferred and open change requests from VXML -->
|
|
<!-- 2.0/2.1</span></p> -->
|
|
|
|
<h4><a id="S4_5_1" name="S4_5_1"></a>4.5.1 Deferred change requests from VXML
|
|
2.0 and 2.1 reevaluated for VXML 3.0</h4>
|
|
|
|
<p>In particular, the following deferred CRs reevaluated: R51, R92, R104,
|
|
R113, R145, R155, R156, R186, R230, R233, R348, R394, R528, R541, and
|
|
R565.</p>
|
|
|
|
<h4><a id="S4_5_2" name="S4_5_2"></a>4.5.2 Unassigned change requests from
|
|
VXML 2.0 and 2.1 reevaluated for VXML 3.0</h4>
|
|
|
|
<p>In particular, the following unassigned CRs reevaluated: R600, R614, R619,
|
|
R620, R622, R623, R624, R625, R626, R627, R628, R629, R631, and R632.</p>
|
|
|
|
<h2><a id="acknowledgments" name="acknowledgments">5. Acknowledgments</a></h2>
|
|
|
|
<p>TBD</p>
|
|
|
|
<h2><a id="prev-reqs" name="prev-reqs">Appendix A. Previous
|
|
Requirements</a></h2>
|
|
|
|
<p>The following requirements have been satisfied by previous Voice Browser
|
|
Working Group Specifications</p>
|
|
|
|
<h3><a id="A_1_1" name="A_1_1"></a>A.1.1 Audio Modality Input and Output
|
|
(must have) FULLY COVERED</h3>
|
|
|
|
<p>The markup language can specify which spoken user input is interpreted by
|
|
the voice browser, as well as the content rendered as spoken output by the
|
|
voice browser.</p>
|
|
|
|
<h4><a id="CA_1_1" name="CA_1_1"></a>Requirement Coverage</h4>
|
|
|
|
<p>Audio output: <prompt>, <audio> <a
|
|
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
|
|
|
|
<p>Audio input: <grammar> <a
|
|
href="http://www.w3.org/TR/2004/REC-speech-grammar-20040316/">SRGS 1.0</a></p>
|
|
|
|
<h3><a id="A_1_2" name="A_1_2"></a>A.1.2 Sequential multi-modal Input (must
|
|
have) FULLY COVERED</h3>
|
|
|
|
<p>The markup language specifies that user input from multiple modalities is
|
|
to be interpreted by the voice browser. There is no requirement that the
|
|
input modalities are simultaneously active. For example, a voice browser
|
|
interpreting the markup language in a telephony environment could accept DTMF
|
|
input in one dialog state, and spoken input in another.</p>
|
|
|
|
<h4><a id="CA_1_2" name="CA_1_2"></a>Requirement Coverage</h4>
|
|
|
|
<p><grammar> mode attribute: dtmf,voice <a
|
|
href="http://www.w3.org/TR/2004/REC-speech-grammar-20040316/">SRGS 1.0</a></p>
|
|
|
|
<h3><a id="A_1_3" name="A_1_3"></a>A.1.3 Unco-ordinated, Simultaneous,
|
|
Multi-modal Input (should have) FULLY COVERED</h3>
|
|
|
|
<p>The markup language specifies that user input from different modalities is
|
|
to be interpreted at the same time. There is no requirement that
|
|
interpretation of the input modalities are co-ordinated. For example, a voice
|
|
browser in a desktop environment could accept keyboard input or spoken input
|
|
in same dialog state.</p>
|
|
|
|
<h4><a id="CA_1_3" name="CA_1_3"></a>Requirement Coverage</h4>
|
|
|
|
<p><grammar> mode attribute: dtmf,voice <a
|
|
href="http://www.w3.org/TR/2004/REC-speech-grammar-20040316/">SRGS 1.0</a></p>
|
|
|
|
<p><field> defining multiple <grammar>s with different mode
|
|
attribute values <a
|
|
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
|
|
|
|
<h3><a id="A_1_4" name="A_1_4"></a>A.1.4 Co-ordinated, Simultaneous
|
|
Multi-modal Input (nice to have) FULLY COVERED</h3>
|
|
|
|
<p>The markup language specifies that user input from multiple modalities is
|
|
interpreted at the same time and that interpretation of the inputs are
|
|
co-ordinated by the voice browser. For example, in a telephony environment,
|
|
the user can type <em>200</em> on the keypad and say <em>transfer to checking
|
|
account</em> and the interpretations are co-ordinated so that they are
|
|
understood as <em>transfer 200 to checking account</em>.</p>
|
|
|
|
<h4><a id="CA_1_4" name="CA_1_4"></a>Requirement Coverage</h4>
|
|
|
|
<p><grammar> mode attribute: dtmf,voice <a
|
|
href="http://www.w3.org/TR/2004/REC-speech-grammar-20040316/">SRGS 1.0</a></p>
|
|
|
|
<p><field> defining multiple <grammar>s with different mode
|
|
attribute values <a
|
|
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
|
|
|
|
<h3><a id="A_1_5" name="A_1_5"></a>A.1.5 Sequential multi-modal Output (must
|
|
have) FULLY COVERED</h3>
|
|
|
|
<p>The markup language specifies that content is rendered in multiple
|
|
modalities by the voice browser. There is no requirement the output
|
|
modalities are rendered simultaneously. For example, a voice browser could
|
|
output speech in one dialog state, and graphics in another.</p>
|
|
|
|
<h4><a id="CA_1_5" name="CA_1_5"></a>Requirement Coverage</h4>
|
|
|
|
<p><prompt>, <audio> <a
|
|
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
|
|
|
|
<h3><a id="A_1_6" name="A_1_6"></a>A.1.6 Unco-ordinated, Simultaneous,
|
|
Multi-modal Output (nice to have) FULLY COVERED</h3>
|
|
|
|
<p>The markup language specifies that content is rendered in multiple
|
|
modalities at the same time. There is no requirement the rendering of output
|
|
modalities are co-ordinated. For example, a voice browser in a desktop
|
|
environment could display graphics and provide audio output at the same
|
|
time.</p>
|
|
|
|
<h4><a id="CA_1_6" name="CA_1_6"></a>Requirement Coverage</h4>
|
|
|
|
<p><prompt>, <audio> <a
|
|
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
|
|
|
|
<h3><a id="A_1_7" name="A_1_7"></a>A.1.7 Co-ordinated, Simultaneous
|
|
Multi-modal Output (nice to have) FULLY COVERED</h3>
|
|
|
|
<p>The markup language specifies that content is to be simultaneously
|
|
rendered in multiple modalities and that output rendering is co-ordinated.
|
|
For example, graphical output on a cellular telephone display is co-ordinated
|
|
with spoken output.</p>
|
|
|
|
<h4><a id="CA_1_7" name="CA_1_7"></a>Requirement Coverage</h4>
|
|
|
|
<p><prompt>, <audio> <a
|
|
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
|
|
|
|
<h3><a id="A_2_1" name="A_2_1"></a>A.2.1 Mixed Initiative: Form Level (must
|
|
have) FULLY COVERED</h3>
|
|
|
|
<p>Mixed initiative refers to dialog where one participant takes the
|
|
initiative by, for example, asking a question and expects the other
|
|
participant to respond to this initiative by, for example, answering the
|
|
question. The other participant, however, responds instead with an initiative
|
|
by asking another question. Typically, the first participant then responds to
|
|
this initiative, before the second participant responds to the original
|
|
initiative. This behavior is illustrated below:<br />
|
|
<br />
|
|
<em>S-A1: When do you want to fly to Paris?<br />
|
|
U-B1: What did you say?<br />
|
|
S-B2: I said when do you want to fly to Paris?<br />
|
|
U-A2: Tuesday.</em></p>
|
|
|
|
<p>where A1 is responded to in A2 after a nested interaction, or sub-dialog
|
|
in B1 and B2. Note that the B2 response itself could have been another
|
|
initiative leading to further nesting of the interaction.</p>
|
|
|
|
<p>The form-level mixed initiative requirement is that the markup language
|
|
can specify to the voice browser that it can take the initiative when the user
|
|
expects a response, and also allow the user to take the initiative when it
|
|
expects a response where the content of these initiatives is relevant to the
|
|
task at hand, contains navigation instructions or concerns general
|
|
meta-communication issues. This mixed initiative requirement is particularly
|
|
important when processing form input (hence the name) and is further
|
|
elaborated in requirements A.2.1.1, A.2.1.2, A.2.1.3 and A.2.1.4 below.</p>
|
|
|
|
<h4><a id="CA_2_1" name="CA_2_1"></a>Requirement Coverage</h4>
|
|
|
|
<p><field> <a
|
|
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
|
|
|
|
<p><noinput>, <nomatch> <a
|
|
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
|
|
|
|
<h4><a id="A_2_1_1" name="A_2_1_1"></a>A.2.1.1 Clarification Subdialog (must
|
|
have) FULLY COVERED</h4>
|
|
|
|
<p>The markup language can specify that a clarification sub-dialog should be
|
|
performed when the user provides incomplete, form-related information. For
|
|
example, in a flight enquiry service, the departure city and date may be
|
|
required but the user does not always provide all the information at once:<br
|
|
/>
|
|
<br />
|
|
<em>S1: How can I help you?<br />
|
|
U1: I want to fly to Paris.<br />
|
|
S2: When?<br />
|
|
U2: Monday</em></p>
|
|
|
|
<p>U1 is incomplete (or 'underinformative') with respect to the service (or
|
|
form) and the system then initiates a sub-dialog in S2 to collect the
|
|
required information. If additional parameters are required, further
|
|
sub-dialogs may be initiated.</p>
|
|
|
|
<h4><a id="CA_2_1_1" name="CA_2_1_1"></a>Requirement Coverage</h4>
|
|
|
|
<p><initial>, <field> <a
|
|
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
|
|
|
|
<h4><a id="A_2_1_2" name="A_2_1_2"></a>A.2.1.2 Confirmation Subdialog (must
|
|
have) FULLY COVERED</h4>
|
|
|
|
<p>The markup language can specify that a confirmation sub-dialog is to be
|
|
performed when the confidence associated with the interpretation of the user
|
|
input is too low.<br />
|
|
<br />
|
|
<em>U1: I want to fly to Paris.<br />
|
|
S1: Did you say 'I want a fly to Paris'?<br />
|
|
U2: Yes.<br />
|
|
S2: When?<br />
|
|
U3: ...</em></p>
|
|
|
|
<p>Note confirmation sub-dialogs take precedence over clarification
|
|
sub-dialogs.</p>
|
|
|
|
<h4><a id="CA_2_1_2" name="CA_2_1_2"></a>Requirement Coverage</h4>
|
|
|
|
<p><field> <a
|
|
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
|
|
|
|
<p><i>name$</i>.confidence shadow variable <a
|
|
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
|
|
|
|
<h4><a id="A_2_1_3" name="A_2_1_3"></a>A.2.1.3 Over-informative Input:
|
|
corrective (must have) FULLY COVERED</h4>
|
|
|
|
<p>The markup language can specify that unsolicited user input in a
|
|
sub-dialog which corrects earlier input is to be interpreted appropriately.
|
|
For example, in a confirmation sub-dialog users may provide corrective
|
|
information relevant to the form:<br />
|
|
<br />
|
|
<em>S1: Did you say you wanted to travel from Paris?<br />
|
|
U1: No, from Perros.</em> (modification) <em><br />
|
|
U1': Yes, from Paris</em> (repetition)</p>
|
|
|
|
<h4><a id="CA_2_1_3" name="CA_2_1_3"></a>Requirement Coverage</h4>
|
|
|
|
<p><field> <a
|
|
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
|
|
|
|
<p>$GARBAGE rule <a
|
|
href="http://www.w3.org/TR/2004/REC-speech-grammar-20040316/">SRGS 1.0</a></p>
|
|
|
|
<h4><a id="A_2_1_4" name="A_2_1_4"></a>A.2.1.4 Over-informative Input:
|
|
additional (nice to have) FULLY COVERED</h4>
|
|
|
|
<p>The markup language can specify that unsolicited user input in a
|
|
sub-dialog which is not corrective but additional, relevant information for
|
|
the current form is to be interpreted appropriately. For example, in a
|
|
confirmation sub-dialog users may provide additional information relevant to
|
|
the form:<br />
|
|
<em>S1: Did you say you wanted to travel from Paris?<br />
|
|
U1: Yes, I want to fly to Paris on Monday around 11.30</em></p>
|
|
|
|
<h4><a id="CA_2_1_4" name="CA_2_1_4"></a>Requirement Coverage</h4>
|
|
|
|
<p><initial>, <field> <a
|
|
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
|
|
|
|
<p>form level <grammar>s <a
|
|
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a>,
|
|
<a href="http://www.w3.org/TR/2004/REC-speech-grammar-20040316/">SRGS
|
|
1.0</a></p>
|
|
|
|
<h3><a id="A_2_2" name="A_2_2"></a>A.2.2 Mixed Initiative: Task Level (must
|
|
have) FULLY COVERED</h3>
|
|
|
|
<p>The markup language needs to address mixed initiative in dialogs which
|
|
involve more than one task (or topic). For example, a portal service may
|
|
allow the user to interact with a number of specific services such as car
|
|
hire, hotel reservation, flight enquiries, etc, which may be located on the
|
|
different web sites or servers. This requirement is further elaborated in
|
|
requirements A.2.2.1, A.2.2.2, A.2.2.3, A.2.2.4 and A.2.2.5 below.</p>
|
|
|
|
<h4><a id="A_2_2_1" name="A_2_2_1"></a>A.2.2.1 Explicit Task Switching (must
|
|
have) FULLY COVERED</h4>
|
|
|
|
<p>The markup language can specify how users can explicitly switch from one
|
|
task to another. For example, by means of a set of global commands which are
|
|
active in all tasks and which take the user to a specific task; e.g. <em>Take
|
|
me to car hire</em>, <em>Go to hotel reservations</em>.</p>
|
|
|
|
<h4><a id="CA_2_2_1" name="CA_2_2_1"></a>Requirement Coverage</h4>
|
|
|
|
<p><link>, <goto>, <submit> <a
|
|
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
|
|
|
|
<p>form level <grammar>s <a
|
|
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a>,
|
|
<a href="http://www.w3.org/TR/2004/REC-speech-grammar-20040316/">SRGS
|
|
1.0</a></p>
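<p>Such global commands are typically authored as application- or
document-scoped links in VoiceXML 2.0; the target URI and command phrase below
are placeholders:</p>
<pre>
<link next="http://example.com/carhire.vxml">
  <grammar mode="voice" version="1.0" root="carhire" xml:lang="en-US"
           xmlns="http://www.w3.org/2001/06/grammar">
    <rule id="carhire" scope="public">take me to car hire</rule>
  </grammar>
</link>
</pre>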
|
|
|
|
<h4><a id="A_2_2_2" name="A_2_2_2"></a>A.2.2.2 Implicit Task Switching
|
|
(should have) FULLY COVERED</h4>
|
|
|
|
<p>The markup language can specify how users can implicitly switch from one
|
|
task to another. For example, by means of simply uttering a phrase relevant
|
|
to another task; <em>I want to reserve a McLaren F1 in Monaco next
|
|
Wednesday</em>.</p>
|
|
|
|
<h4><a id="CA_2_2_2" name="CA_2_2_2"></a>Requirement Coverage</h4>
|
|
|
|
<p><link>, <goto>, <submit> <a
|
|
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
|
|
|
|
<p>form level <grammar>s <a
|
|
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a>,
|
|
<a href="http://www.w3.org/TR/2004/REC-speech-grammar-20040316/">SRGS
|
|
1.0</a></p>
|
|
|
|
<h4><a id="A_2_2_3" name="A_2_2_3"></a>A.2.2.3 Manual Return from Task Switch
|
|
(must have) FULLY COVERED</h4>
|
|
|
|
<p>The markup language can specify how users can explicitly return to a
|
|
previous task at any time. For example, by means of global task navigation
|
|
commands such as <em>previous task</em>.</p>
|
|
|
|
<h4><a id="CA_2_2_3" name="CA_2_2_3"></a>Requirement Coverage</h4>
|
|
|
|
<p><link>, <goto> <a
|
|
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
|
|
|
|
<h4><a id="A_2_2_4" name="A_2_2_4"></a>A.2.2.4 Automatic Return from Task
|
|
Switch (should have) FULLY COVERED</h4>
|
|
|
|
<p>The markup language can specify that users can automatically return to the
|
|
previous task upon completion or explicit cancellation of the current
|
|
task.</p>
|
|
|
|
<h4><a id="CA_2_2_4" name="CA_2_2_4"></a>Requirement Coverage</h4>
|
|
|
|
<p><link>, <goto> <a
|
|
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
|
|
|
|
<h4><a id="A_2_2_5" name="A_2_2_5"></a>A.2.2.5 Suspended Tasks (should have)
|
|
FULLY COVERED</h4>
|
|
|
|
<p>The markup language can specify that when task switching occurs the
|
|
previous task is suspended rather than canceled. Thus when the user returns
|
|
to the previous task, the interaction is resumed at the point it was
|
|
suspended.</p>
|
|
|
|
<h4><a id="CA_2_2_5" name="CA_2_2_5"></a>Requirement Coverage</h4>
|
|
|
|
<p><link> <a
|
|
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
|
|
|
|
<h3><a id="A_2_3" name="A_2_3"></a>A.2.3 Help Behavior (should have) FULLY
|
|
COVERED</h3>
|
|
|
|
<p>The markup language can specify help information when requested by the
|
|
user. Help information should be available in all dialog states.<br />
|
|
<em>S1: How can I help you?<br />
|
|
U1: What can you do?<br />
|
|
S2: I can give you flight information about flights between major cities
|
|
world-wide just like a travel agent. How can I help you?<br />
|
|
U2: I want a flight to Paris ...</em><br />
|
|
</p>
|
|
|
|
<p>Help information can be tapered so that it can be elaborated upon on
|
|
subsequent user requests.</p>
|
|
|
|
<h4><a id="CA_2_3" name="CA_2_3"></a>Requirement Coverage</h4>
|
|
|
|
<p><help> using count attribute for tapering <a
|
|
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
|
|
|
|
<h3><a id="A_2_4" name="A_2_4"></a>A.2.4 Error Correction Behavior (must
|
|
have) FULLY COVERED</h3>
|
|
|
|
<p>The markup language can specify how error events generated by the voice
|
|
browser are to be handled. For example, by initiating a sub-dialog to
|
|
describe and correct the error:<br />
|
|
<em>S1: How can I help you?<br />
|
|
U1: <audio but no interpretation><br />
|
|
S2: Sorry, I didn't understand that. Where do you want to travel to?<br />
|
|
U2: Paris</em></p>
|
|
|
|
<p>The markup language can specify how specific types of errors encountered
|
|
in spoken dialog, e.g. no audio, too loud/soft, no interpretation,
|
|
internal error, etc, are to be handled as well as providing a general 'catch
|
|
all' method.</p>
|
|
|
|
<h4><a id="CA_2_4" name="CA_2_4"></a>Requirement Coverage</h4>
|
|
|
|
<p><error>, <nomatch>, <noinput>, <catch> <a
|
|
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
|
|
|
|
<h3><a id="A_2_5" name="A_2_5"></a>A.2.5 Timeout Behavior (must have) FULLY
|
|
COVERED</h3>
|
|
|
|
<p>The markup language can specify what to do when the voice browser times
|
|
out waiting for input; for example, a timeout event can be handled by
|
|
repeating the current dialog state:<br />
|
|
<em>S1: Did you say Monday?<br />
|
|
U1: <timeout><br />
|
|
S2: Did you say Monday?</em><br />
|
|
</p>
|
|
|
|
<p>Note that the strategy may be dependent upon the environment; in a desktop
|
|
environment, repetition for example may be irritating.</p>
|
|
|
|
<h4><a id="CA_2_5" name="CA_2_5"></a>Requirement Coverage</h4>
|
|
|
|
<p><noinput>, <catch> <a
|
|
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
|
|
|
|
<h3><a id="A_2_6" name="A_2_6"></a>A.2.6 Meta-Commands (should have) FULLY
|
|
COVERED</h3>
|
|
|
|
<p>The markup language specifies a set of meta-command functions which are
|
|
available in all dialog states; for example, repeat, cancel, quit, operator,
|
|
etc.</p>
|
|
|
|
<p>The precise set of meta-commands will be co-ordinated with the Telephony
|
|
Speech Standards Committee.</p>
|
|
|
|
<p>The markup language should specify how the scope of meta-commands like
|
|
'cancel' is resolved.</p>
|
|
|
|
<h4><a id="CA_2_6" name="CA_2_6"></a>Requirement Coverage</h4>
|
|
|
|
<p>Universal Grammars <a
|
|
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
|
|
|
|
<h3><a id="A_2_7" name="A_2_7"></a>A.2.7 Barge-in Behavior (should have)
|
|
FULLY COVERED</h3>
|
|
|
|
<p>The markup language specifies when the user is able to barge in on the
|
|
system output, and when it is not allowed.</p>
|
|
|
|
<p>Note: The output device may generate timestamped events when barge-in
|
|
occurs (see 3.9).</p>
|
|
|
|
<h4><a id="CA_2_7" name="CA_2_7"></a>Requirement Coverage</h4>
|
|
|
|
<p>bargein property <a
|
|
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
|
|
|
|
<h3><a id="A_2_8" name="A_2_8"></a>A.2.8 Call Transfer (should have) FULLY
|
|
COVERED</h3>
|
|
|
|
<p>The markup language specifies a mechanism to allow transfer of the caller
|
|
to another line in a telephony environment. For example, in cases of dialog
|
|
breakdown, the user can be transferred to an operator (cf. 'callto' in HTML).
|
|
The markup language also provides a mechanism to deal with transfer failures
|
|
such as when the called line is busy or engaged.</p>
|
|
|
|
<h4><a id="CA_2_8" name="CA_2_8"></a>Requirement Coverage</h4>
|
|
|
|
<p><transfer> <a
|
|
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
|
|
|
|
<p><createcall>, <redirect> <a
|
|
href="http://www.w3.org/TR/2005/WD-ccxml-20050629/">CCXML 1.0</a></p>
|
|
|
|
<h3><a id="A_2_9" name="A_2_9"></a>A.2.9 Quit Behavior (must have) FULLY
|
|
COVERED</h3>
|
|
|
|
<p>The markup language provides a mechanism to terminate the session (cf.
|
|
user-terminated sessions via a 'quit' meta-command in 2.6).</p>
|
|
|
|
<h4><a id="CA_2_9" name="CA_2_9"></a>Requirement Coverage</h4>
|
|
|
|
<p>Universal Grammars <a
|
|
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
|
|
|
|
<h3><a id="A_2_10" name="A_2_10"></a>A.2.10 Interaction with External
|
|
Components (must have) FULLY COVERED</h3>
|
|
|
|
<p>The markup language must support a generic component interface to allow
|
|
for the use of external components on the client and/or server side. The
|
|
interface provides a mechanism for transferring data between the markup
|
|
language's variables and the component. Examples of such data are:
|
|
configuration parameters (such as timeouts), and events for data input and
|
|
error codes. Except for event handling, a call to an external component does
|
|
not directly change the dialog state, i.e. the dialog continues in the state
|
|
from which the external component was called.</p>
|
|
|
|
<p>Examples of external components are pre-built dialog components and server
|
|
scripts. Pre-built dialogs are further described in Section A.3.3. Server
|
|
scripts can be used to interact with remote services, devices or
|
|
databases.</p>
|
|
|
|
<h4><a id="CA_2_10" name="CA_2_10"></a>Requirement Coverage</h4>
|
|
|
|
<p><property> <a
|
|
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
|
|
|
|
<p><submit> namelist attribute, <submit>, <goto> query
|
|
string <a href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML
|
|
2.0</a></p>
|
|
|
|
<h3><a id="A_3_1" name="A_3_1"></a>A.3.1 Ease of Use (must have) FULLY
|
|
COVERED</h3>
|
|
|
|
<p>The markup language should be easy for designers to understand and author
|
|
without special tools or knowledge of vendor technology or protocols (dialog
|
|
design knowledge is still essential).</p>
|
|
|
|
<h4><a id="CA_3_1" name="CA_3_1"></a>Requirement Coverage</h4>
|
|
|
|
<p>Form Interpretation Algorithm (FIA) <a
|
|
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
|
|
|
|
<h3><a id="A_3_2" name="A_3_2"></a>A.3.2 Simplicity and Power (must have)
|
|
FULLY COVERED</h3>
|
|
|
|
<p>The markup language allows designers to rapidly develop simple dialogs
|
|
without the need to worry about interactional details but also allows
|
|
designers to take more control over interaction to develop complex
|
|
dialogs.</p>
|
|
|
|
<h4><a id="CA_3_2" name="CA_3_2"></a>Requirement Coverage</h4>
|
|
|
|
<p>Form Interpretation Algorithm (FIA) <a
|
|
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
|
|
|
|
<h3><a id="A_3_3" name="A_3_3"></a>A.3.3 Support for Modularity and Re-use
|
|
(should have) FULLY COVERED</h3>
|
|
|
|
<p>The markup language complies with the requirements of the Reusable Dialog
|
|
Components Subgroup.</p>
|
|
|
|
<p>The markup language can specify a number of pre-built dialog components.
|
|
This enables one to build a library of reusable 'dialogs'. This is useful for
|
|
handling both application-specific input types, such as telephone numbers,
credit card numbers, etc. as well as those that are more generic, such as
|
|
times, dates, numbers, etc.</p>
|
|
|
|
<h4><a id="CA_3_3" name="CA_3_3"></a>Requirement Coverage</h4>
|
|
|
|
<p><subdialog> <a
|
|
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
|
|
|
|
<h3><a id="A_3_4" name="A_3_4"></a>A.3.4 Naming (must have) FULLY COVERED</h3>
|
|
|
|
<p>Dialogs, states, inputs and outputs can be referenced by a URI in the
|
|
markup language.</p>
|
|
|
|
<h4><a id="CA_3_4" name="CA_3_4"></a>Requirement Coverage</h4>
|
|
|
|
<p><form> id attribute, form item name attribute <a
|
|
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
|
|
|
|
<h3><a id="A_3_5" name="A_3_5"></a>A.3.5 Variables (must have) FULLY
|
|
COVERED</h3>
|
|
|
|
<p>Variables can be defined and assigned values.</p>
|
|
|
|
<p>Variables can be scoped within namespaces: for example, state-level,
|
|
dialog-level, document-level, application-level or session-level. The markup
|
|
language defines the precise scope of all variables.</p>
|
|
|
|
<p>The markup language must specify if variables are atomic or structured.</p>
|
|
|
|
<p>Variables can be assigned default values. Assignment may be optional; for
|
|
example, in a flight reservation form, a 'special meal' variable need not be
|
|
assigned a value by the user.</p>
|
|
|
|
<p>Variables may be referred to in the output content of the markup
|
|
language.</p>
|
|
|
|
<p>The precise requirements on variables may be affected by W3C work on
|
|
modularity and XML schema datatypes.</p>
|
|
|
|
<h4><a id="CA_3_5" name="CA_3_5"></a>Requirement Coverage</h4>
|
|
|
|
<p><var>, <assign>, <script> <a
|
|
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
|
|
|
|
<h3><a id="A_3_6" name="A_3_6"></a>A.3.6 Variable Binding (must have) FULLY
|
|
COVERED</h3>
|
|
|
|
<p>User input can bind one or more state variables. A single input may bind a
|
|
single variable or it may bind multiple variables in any order; for example,
|
|
the following utterances result in the same variable bindings<br />
|
|
</p>
|
|
<ul>
|
|
<li>Transfer $200 from savings to checking</li>
|
|
<li>Transfer $200 to checking from savings</li>
|
|
<li>Transfer from savings $200 to checking</li>
|
|
</ul>
|
|
|
|
<h4><a id="CA_3_6" name="CA_3_6"></a>Requirement Coverage</h4>
|
|
|
|
<p>application.lastresult$.interpretation <a
|
|
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
|
|
|
|
<h3><a id="A_3_7" name="A_3_7"></a>A.3.7 Event Handler (must have) FULLY
|
|
COVERED</h3>
|
|
|
|
<p>The markup language provides an explicit event handling mechanism for
|
|
specifying actions to be carried out when events are generated in a dialog
|
|
state.</p>
|
|
|
|
<p>Event handlers can be ordered so that if multiple event handlers match the
|
|
current event, only the handler with the highest ranking is executed. By
|
|
default, event handler ranking is based on proximity and specificity: i.e.
|
|
the handler closest in the event hierarchy with the most specific matching
|
|
conditions.</p>
|
|
|
|
<p>Actions can be conditional upon variable assignments, as well as the type
|
|
and content of events (e.g. input events specifying media, content,
|
|
confidence, and so on).</p>
|
|
|
|
<p>Actions include: the binding of variables with information, for example,
|
|
information contained in events; transition to another dialog state
|
|
(including the current state).</p>
|
|
|
|
<h4><a id="CA_3_7" name="CA_3_7"></a>Requirement Coverage</h4>
|
|
|
|
<p><catch> <a
|
|
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
|
|
|
|
<p><transition> <a
|
|
href="http://www.w3.org/TR/2005/WD-ccxml-20050629/">CCXML 1.0</a></p>
|
|
|
|
<h3><a id="A_3_8" name="A_3_8"></a>A.3.8 Builtin Event Handlers (should have)
|
|
FULLY COVERED</h3>
|
|
|
|
<p>The markup language can provide implicit event handlers which provide
|
|
default handling of, for example, timeout and error events as well as
|
|
handlers for situations, such as confirmation and clarification, where there
|
|
is a transition to an implicit dialog state. For example, there can be a
|
|
default handler for user input events such that if the recognition confidence
|
|
score is below a given threshold, then the input is confirmed in a
|
|
sub-dialog.</p>
|
|
|
|
<p>Properties of implicit event handlers (thresholds, counters, locale, etc)
|
|
can be explicitly customized in the markup language.</p>
|
|
|
|
<p>Implicit event handlers are always overridden by explicit handlers.</p>
|
|
|
|
<h4><a id="CA_3_8" name="CA_3_8"></a>Requirement Coverage</h4>
|
|
|
|
<p>Default event handlers (nomatch, noinput, error, etc...) <a
|
|
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
|
|
|
|
<h3><a id="A_3_9" name="A_3_9"></a>A.3.9 Output Content and Events (must
|
|
have) FULLY COVERED</h3>
|
|
|
|
<p>The markup language complies with the requirements developed by the Speech
|
|
Synthesis Markup Subgroup for output text content and parameter settings for
|
|
the output device. Requirements on multimodal output will be co-ordinated by
|
|
the Multimodal Interaction Subgroup (cf. Section 1).</p>
|
|
|
|
<p>In addition, the markup supports the following output features (if not
|
|
already defined in the Synthesis Markup):</p>
|
|
<ol>
|
|
<li>Pre-recorded audio file output</li>
|
|
<li>Streamed audio</li>
|
|
<li>Playing/synthesizing sounds such as tones and beeps</li>
|
|
<li>variable level of detail control over structured text</li>
|
|
</ol>
|
|
|
|
<p>The output device generates timestamped events including error events and
|
|
progress events (output started/stopped, current position).</p>
|
|
|
|
<h4><a id="CA_3_9" name="CA_3_9"></a>Requirement Coverage</h4>
|
|
|
|
<p><audio>, <prompt> <a
|
|
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
|
|
|
|
<p><speak> and other SSML elements <a
|
|
href="http://www.w3.org/TR/speech-synthesis/">SSML 1.0</a></p>
|
|
|
|
<p>application.lastresult$.markname, application.lastresult$.marktime <a
|
|
href="http://www.w3.org/TR/2006/WD-voicexml21-20060915/">VoiceXML 2.1</a></p>
|
|
|
|
<h3><a id="A_3_10" name="A_3_10"></a>A.3.10 Richer Output (nice to have)
|
|
FULLY COVERED</h3>
|
|
|
|
<p>The markup language allows for richer output than variable substitution in
|
|
the output content. For example, natural language generation of output
|
|
content.</p>
|
|
|
|
<h4><a id="CA_3_10" name="CA_3_10"></a>Requirement Coverage</h4>
|
|
|
|
<p><prompt> <a
|
|
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
|
|
|
|
<p><speak> and other SSML elements <a
|
|
href="http://www.w3.org/TR/speech-synthesis/">SSML 1.0</a></p>
|
|
|
|
<h3><a id="A_3_11" name="A_3_11"></a>A.3.11 Input Content and Events (must
|
|
have) FULLY COVERED</h3>
|
|
|
|
<p>The markup language complies with the requirements developed by the
|
|
Grammar Representation Subgroup for the representation of speech grammar
|
|
content. Requirements on multimodal input will be co-ordinated by the
|
|
Multimodal Interaction Subgroup (cf. Section 1).</p>
|
|
|
|
<p>The markup language can specify the activation and deactivation of
|
|
multiple speech grammars. These can be user-defined, or builtin grammars
|
|
(digits, date, time, money, etc).</p>
|
|
|
|
<p>The markup language can specify parameters for speech grammar content
|
|
including timeout parameters --- maximum initial silence, maximum utterance
|
|
duration, maximum within-utterance pause --- energy thresholds necessary for
|
|
bargein, etc.</p>
|
|
|
|
<p>The input device generates timestamped events including input timeout and
|
|
error events, progress events (utterance started, interference, etc), and
|
|
recognition result events (including content, interpretation/variable
|
|
bindings, confidence).</p>
|
|
|
|
<p>In addition to speech grammars, the markup language allows input content
|
|
and events to be specified for DTMF and keyboard devices.</p>
|
|
|
|
<h4><a id="CA_3_11" name="CA_3_11"></a>Requirement Coverage</h4>
|
|
|
|
<p>timeout, completetimeout, incompletetimeout, interdigittimeout,
|
|
termtimeout properties <a
|
|
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
|
|
|
|
<p>application.lastresult$.interpretation, application.lastresult$.confidence
|
|
<a href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML
|
|
2.0</a></p>
|
|
|
|
<p>application.lastresult$.markname, application.lastresult$.marktime <a
|
|
href="http://www.w3.org/TR/2006/WD-voicexml21-20060915/">VoiceXML 2.1</a></p>
|
|
|
|
<p><grammar> and other elements <a
|
|
href="http://www.w3.org/TR/2004/REC-speech-grammar-20040316/">SRGS 1.0</a></p>
|
|
|
|
<h3><a id="A_4_1" name="A_4_1"></a>A.4.1 Event Handling (must have) FULLY
|
|
COVERED</h3>
|
|
|
|
<p>One key difference among contemporary event models (e.g. DOM Level 2,
|
|
'try-catch' in object-oriented programming) is whether the same event can be
|
|
handled by more than one event handler within the hierarchy. The markup
|
|
language must motivate whether it supports this feature or not.</p>
|
|
|
|
<h3><a id="A_4_2" name="A_4_2"></a>A.4.2 Logging (nice to have) FULLY
|
|
COVERED</h3>
|
|
|
|
<p>For development and testing it is important that data and events can be
|
|
logged by the voice browser. At the most detailed level, this will include
|
|
logging of input and output audio data. A mechanism which allows logged data
|
|
to be retrieved from a voice browser, preferably via standard Internet
|
|
protocol (http, ftp, etc), is also required.</p>
|
|
|
|
<p>One approach is to require that the markup language can control logging
|
|
via, for example, an optional meta tag. Another approach is for logging to be
|
|
controlled by means other than the markup language, such as via proprietary
|
|
meta tags.</p>
|
|
|
|
<h4><a id="CA_4_2" name="CA_4_2"></a>Requirement Coverage</h4>
|
|
|
|
<p><log> <a
|
|
href="http://www.w3.org/TR/2004/REC-voicexml20-20040316/">VoiceXML 2.0</a></p>
|
|
|
|
<p><log> <a href="http://www.w3.org/TR/2005/WD-ccxml-20050629/">CCXML
|
|
1.0</a></p>
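<p>For example, in VoiceXML 2.0 (the label value is arbitrary):</p>
<pre>
<log label="app">User said: <value expr="application.lastresult$.utterance"/></log>
</pre>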
|
|
</body>
|
|
</html>
|