<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Multimodal Application Developer Feedback</title>
<style type="text/css">
code { font-family: monospace; margin-left: 2em }
.ref { font-size: 80% }
.quote { margin-left: 5%; margin-right: 10% }
.definition { margin-left: 5%; margin-right: 10%; font-style: italic }
.diagram { text-align: center; font-size: 80%; font-weight: bold }
.changed {background-color: rgb(255, 255, 224)}
.deleted {background-color: rgb(240, 240, 240); text-decoration: line-through }
.comment {background-color: rgb(0, 204, 204)}
.pending {background-color: rgb(255, 224, 224)}
ul.toc li { list-style-type: none }
</style>
<link href="http://www.w3.org/StyleSheets/TR/W3C-WG-NOTE.css"
rel="stylesheet" type="text/css" />
</head>
<body xml:lang="en" lang="en">

<div class="head">
<a href="http://www.w3.org/"><img alt="W3C" height="48"
src="http://www.w3.org/Icons/w3c_home" width="72" /></a>

<h1>Multimodal Application Developer Feedback</h1>

<h2>W3C Working Group Note 14 April 2006</h2>

<dl>
<dt>This version:</dt>
<dd><a
href="http://www.w3.org/TR/2006/NOTE-mmi-dev-feedback-20060414/">http://www.w3.org/TR/2006/NOTE-mmi-dev-feedback-20060414/</a></dd>
<dt>Latest version:</dt>
<dd><a
href="http://www.w3.org/TR/mmi-dev-feedback/">http://www.w3.org/TR/mmi-dev-feedback/</a></dd>
<dt>Previous version:</dt>
<dd><em>This is the first publication.</em></dd>
<dt>Editors:</dt>
<dd>Andrew Wahbe, VoiceGenie Technologies</dd>
<dd>Gerald McCobb, IBM</dd>
<dd>Klaus Reifenrath, Nuance</dd>
<dd>Raj Tumuluri, Openstream</dd>
<dd>Sunil Kumar, V-Enable</dd>
</dl>

<p class="copyright"><a
href="http://www.w3.org/Consortium/Legal/ipr-notice#Copyright">Copyright</a>
© 2006 <a href="http://www.w3.org/"><acronym
title="World Wide Web Consortium">W3C</acronym></a><sup>®</sup> (<a
href="http://www.csail.mit.edu/"><acronym
title="Massachusetts Institute of Technology">MIT</acronym></a>, <a
href="http://www.ercim.org/"><acronym
title="European Research Consortium for Informatics and Mathematics">ERCIM</acronym></a>,
<a href="http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. W3C <a
href="http://www.w3.org/Consortium/Legal/ipr-notice#Legal_Disclaimer">liability</a>,
<a href="http://www.w3.org/Consortium/Legal/ipr-notice#W3C_Trademarks">trademark</a>
and <a href="http://www.w3.org/Consortium/Legal/copyright-documents">document
use</a> rules apply.</p>
</div>

<!-- end of head div -->

<hr title="Separator for header" />

<h2 id="abstract">Abstract</h2>

<p>Several years of multimodal application development in
various business areas and on various device platforms have
given developers enough experience to provide detailed
feedback about what they like, what they dislike, and what
they would like to see improved and continued. That experience
is summarized here as input to the specifications under
development in the W3C
<a href="http://www.w3.org/2002/mmi/">Multimodal Interaction</a>
and <a href="http://www.w3.org/voice">Voice Browser</a>
Activities.</p>

<h2 id="status">Status of this Document</h2>

<p><em>This section describes the status of this document at
the time of its publication. Other documents may supersede this
document. A list of current W3C publications and the latest revision
of this technical report can be found in the
<a href="http://www.w3.org/TR/">W3C technical reports
index</a> at http://www.w3.org/TR/.</em></p>

<p>This document is a W3C Working Group Note. It represents
the views of the W3C Multimodal Interaction Working Group at
the time of publication. The document may be updated as new
technologies emerge or mature. Publication as a Working
Group Note does not imply endorsement by the W3C Membership.
This is a draft document and may be updated, replaced or
obsoleted by other documents at any time. It is inappropriate
to cite this document as other than work in progress.</p>

<p>This document is one of a series produced by the
<a href="http://www.w3.org/2002/mmi/Group/">Multimodal
Interaction Working Group</a> <em>(<a
href="http://cgi.w3.org/MemberAccess/AccessRequest">Member
Only Link</a>)</em>, part of the <a
href="http://www.w3.org/2002/mmi/">W3C Multimodal
Interaction Activity</a>. The MMI Activity statement can
be found at
<a href="http://www.w3.org/2002/mmi/Activity">http://www.w3.org/2002/mmi/Activity</a>.</p>

<p>Comments on this document can be sent to <a
href="mailto:www-multimodal@w3.org">www-multimodal@w3.org</a>,
the public forum for discussion of the W3C's work on
Multimodal Interaction. To subscribe, send an email to
<a href="mailto:www-multimodal-request@w3.org">www-multimodal-request@w3.org</a>
with the word subscribe in the subject line (include the
word unsubscribe if you want to unsubscribe). The
<a href="http://lists.w3.org/Archives/Public/www-multimodal/">archive</a>
for the list is accessible online.</p>

<p>This document was produced by a group operating under the <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/">5 February 2004 W3C Patent Policy</a>. This document is informative only. W3C maintains a <a rel="disclosure" href="http://www.w3.org/2004/01/pp-impl/34607/status">public list of any patent disclosures</a> made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#def-essential">Essential Claim(s)</a> must disclose the information in accordance with <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#sec-Disclosure">section 6 of the W3C Patent Policy</a>.</p>

<h2 id="contents">Table of Contents</h2>

<ul class="toc">
<li>1 <a href="#s1">Introduction</a></li>
<li>2 <a href="#s2">What developers liked</a>
<ul>
<li>2.1 <a href="#s2.1">Reusable and pluggable modality components</a></li>
<li>2.2 <a href="#s2.2">Modular modality components</a></li>
<li>2.3 <a href="#s2.3">Declarative synchronization between modalities</a></li>
<li>2.4 <a href="#s2.4">Scripting and semantic interpretation</a></li>
<li>2.5 <a href="#s2.5">Styling</a></li>
</ul></li>
<li>3 <a href="#s3">What developers would like to see</a>
<ul>
<li>3.1 <a href="#s3.1">Global grammars</a></li>
<li>3.2 <a href="#s3.2">Speech grammars for HTML links and controls</a></li>
<li>3.3 <a href="#s3.3">Speech prompts for voice-enabled HTML links and controls</a></li>
<li>3.4 <a href="#s3.4">Speech-enabled widgets</a></li>
<li>3.5 <a href="#s3.5">Use speech to activate links and change focus</a></li>
<li>3.6 <a href="#s3.6">Back functionality</a></li>
</ul></li>
<li>4 <a href="#s4">What developers would like to see continue and improve</a>
<ul>
<li>4.1 <a href="#s4.1">Support for both off-line and on-line multimodal interaction</a></li>
<li>4.2 <a href="#s4.2">Support for events distributed over the network</a></li>
<li>4.3 <a href="#s4.3">Support for implicit events</a></li>
<li>4.4 <a href="#s4.4">VoiceXML tag and feature support</a></li>
<li>4.5 <a href="#s4.5">Support for both directed and user-initiated dialogs</a></li>
<li>4.6 <a href="#s4.6">Mixed-initiative interaction</a></li>
<li>4.7 <a href="#s4.7">Access to speech confidence scores and n-best list by the application</a></li>
<li>4.8 <a href="#s4.8">Access to device details</a></li>
<li>4.9 <a href="#s4.9">Choice of ASR</a></li>
<li>4.10 <a href="#s4.10">Controlling N-Best choice of ASR</a></li>
</ul></li>
</ul>

<hr />

<h2 id="s1">1 Introduction</h2>

<p>IBM, VoiceGenie Technologies, Nuance, V-Enable, and
Openstream customers have been developing multimodal
applications in a broad range of business areas, including
Field-Force Productivity, Health Care and Life Sciences,
Warehouse and Distribution, Industrial Plant Floor, Financial
and Information Services, Directory Assistance, and the
Mobile Web. Customer device platforms have included PCs
(desktops, laptops, and tablets), PDAs, kiosks, appliances,
equipment consoles, and web browser-based smart phones.
The multimodal applications primarily extended the traditional
GUI mode of interaction with speech, with the speech services
located either locally on the device or distributed on
a remote server. Several XML markup languages were used to
develop these applications, including <a
href="http://www.voicexml.org/specs/multimodal/x+v/12/">XHTML+Voice
(X+V)</a> and <a href="http://www.nuance.com/xhmi/">xHMI</a>.</p>

<p>While developing these applications,
developers found features they liked about the development
environment they were using, and features they thought
were lacking. Their experiences were collected and are
summarized here as feedback for the W3C <a
href="http://www.w3.org/2002/mmi/">Multimodal Interaction</a>
and <a href="http://www.w3.org/Voice/">Voice Browser</a>
Working Groups to consider when specifying future multimodal
and voice authoring capabilities. We also solicit comments
from the wider multimodal development community on the extent
to which these observations are consistent with their own
development experiences.</p>

<p>The developers surveyed were expert in various programming
languages and application environments. Developers expert in
C/C++ and Java generally speech-enabled native applications on
small devices. Device platforms included Windows Mobile, BREW,
embedded Linux, Symbian, and J2ME. Developers expert in the
Web generally speech-enabled browser-based applications. Web
browser platforms included Opera, Access' NetFront, Windows
Mobile Internet Explorer, and the Nokia Series 60. Web
developers understood the web programming model very well but
generally were new to speech. They liked XHTML, XML namespaces,
XML Events, CSS, JavaScript, and VoiceXML with its ability to
hide platform details. Developers expert in VoiceXML and
dictation had backgrounds in speech and telephony and generally
worked on adding a GUI to voice and dictation applications.</p>

<h2 id="s2">2 What developers liked</h2>

<h3 id="s2.1">2.1 Reusable and pluggable modality components</h3>

<p>Developers preferred to develop modality components that
are reusable and pluggable.</p>

<h4 id="s2.1.1">Use Case: VoiceXML modality component</h4>

<p>A VoiceXML modality component is reused without
modification in different multimodal applications.</p>

<h3 id="s2.2">2.2 Modular modality components</h3>

<p>Modular modality components are preferred because they
can be authored separately by the respective modality experts.</p>

<h4 id="s2.2.1">Use Case: XHTML and VoiceXML modality components</h4>

<p>A VoiceXML expert authors the voice modality component
and an XHTML expert authors the GUI component. Modality
component coordination is handled independently, for example
by the X+V &lt;sync&gt; and &lt;cancel&gt; elements.</p>

<h3 id="s2.3">2.3 Declarative synchronization between modalities</h3>

<p>Developers liked being able to declare the synchronization
of input data and focus between modalities, rather than having
to implement the synchronization with custom script.</p>

<h4 id="s2.3.1">Use Case: X+V &lt;sync&gt; element</h4>

<p>The X+V &lt;sync&gt; element provides declarative
synchronization of XHTML form control elements and the
VoiceXML &lt;field&gt; element. The &lt;sync&gt; element
allows input from either the speech or the visual modality to set
the field in the other modality. Also, setting the focus
of an &lt;input&gt; element that is synchronized with a
VoiceXML field causes the Form Interpretation Algorithm (FIA)
to visit that VoiceXML field.</p>
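
<p>For illustration, such a synchronization could be declared along
the following lines (a simplified sketch following the X+V 1.2 draft;
the ids and grammar URI are invented for this example):</p>

<pre>
&lt;!-- XHTML input field --&gt;
&lt;input type="text" id="city" name="city" /&gt;

&lt;!-- VoiceXML field inside an X+V voice form --&gt;
&lt;vxml:field name="cityfield"&gt;
  &lt;vxml:grammar src="city.grxml" /&gt;
&lt;/vxml:field&gt;

&lt;!-- Declarative synchronization of the two fields --&gt;
&lt;xv:sync xv:input="city" xv:field="#cityfield" /&gt;
</pre>

<p>Spoken input then fills the visual field, and visual focus moves
the voice dialog, without any author-written event-handling script.</p>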

<h3 id="s2.4">2.4 Scripting and semantic interpretation</h3>

<p>Developers liked support for modality component
integration via scripting and semantic interpretation.</p>

<h4 id="s2.4.1">Use Case: Timed notifications of an
operating room medical procedure</h4>

<p>A timed notification changes dynamically as time
progresses. The notification depends on the current
state of the application as well as the notification
state. For a GUI+speech multimodal application, a
notification may be a TTS output and a new GUI page
corresponding to the next step of an operating room
medical procedure.</p>

<h4 id="s2.4.2">Use Case: Integrated pen and speech
interaction with a map</h4>

<p>The user says "zoom in here" while drawing an area on a map.
The application responds by enlarging the detail of the area
within the boundary drawn by the user.</p>
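
<p>Semantic interpretation of the spoken command is typically
expressed with SRGS tags, for example (a sketch; the rule and result
names are invented for this example):</p>

<pre>
&lt;grammar xmlns="http://www.w3.org/2001/06/grammar" root="zoom"
         tag-format="semantics/1.0"&gt;
  &lt;rule id="zoom"&gt;
    zoom &lt;item repeat="0-1"&gt;in&lt;/item&gt; here
    &lt;tag&gt;out.action = "zoom"; out.target = "pen-area";&lt;/tag&gt;
  &lt;/rule&gt;
&lt;/grammar&gt;
</pre>

<p>A script in the interaction manager can then combine the
interpreted action with the boundary drawn by the pen.</p>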

<h3 id="s2.5">2.5 Styling</h3>

<p>Developers liked CSS for styling each modality. For example,
the CSS3 module for styling speech, based on SSML, was useful
for styling the voice modality.</p>

<h4 id="s2.5.1">Use Case: TTS rendering of a news article
on the web</h4>

<p>The news article is read by the computer in a realistic
voice that uses different-sounding voices for headlines,
section headings, and body text. There are also pauses between
paragraphs and before article headlines.</p>
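
<p>With CSS3 Speech module properties, the voices and pauses described
above could be styled roughly as follows (a sketch; the class names
are invented, and property support varies by platform):</p>

<pre>
h1.headline { voice-family: male; pause-before: 2s }
h2.section  { voice-family: female }
p.body      { voice-family: neutral; pause-after: 1s }
</pre>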

<h2 id="s3">3 What developers would like to see</h2>

<h3 id="s3.1">3.1 Global grammars</h3>

<p>Developers would like support for top-level ("global")
grammars that are active across multiple windows (e.g.,
HTML frames or portlets) of the application.</p>

<h4 id="s3.1.1">Use Case: Top-level menus</h4>

<p>An application has top-level menus "buy", "sell", and
"trade". At any time while involved in the "buy" dialog,
a user can say "trade" and be switched to the "trade"
multimodal dialog.</p>
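
<p>The grammar for such a menu might look like the following SRGS
fragment; the markup itself is standard, and what developers are
asking for is a way to keep it active across all of the application's
windows:</p>

<pre>
&lt;grammar xmlns="http://www.w3.org/2001/06/grammar" root="menu"&gt;
  &lt;rule id="menu"&gt;
    &lt;one-of&gt;
      &lt;item&gt;buy&lt;/item&gt;
      &lt;item&gt;sell&lt;/item&gt;
      &lt;item&gt;trade&lt;/item&gt;
    &lt;/one-of&gt;
  &lt;/rule&gt;
&lt;/grammar&gt;
</pre>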

<h3 id="s3.2">3.2 Speech grammars for HTML links and controls</h3>

<p>Developers would like support for explicitly adding
speech grammars to activate HTML links and controls.
An automatically created speech grammar may not capture
everything the user may say.</p>

<h4 id="s3.2.1">Use Case: Hotel booking application:
get list of hotels</h4>

<p>Before booking a hotel reservation, the user looks up a list
of available hotels. On the page along with the reservation form is
a link labeled "Available Hotels." The developer anticipates that
besides "available hotels", the user may say "show me the
available hotels" or ask "what hotels are available", and adds
these two phrases to the grammar for activating the link.</p>
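
<p>The extra phrases could be captured with a grammar along these
lines (a sketch):</p>

<pre>
&lt;grammar xmlns="http://www.w3.org/2001/06/grammar" root="hotels"&gt;
  &lt;rule id="hotels"&gt;
    &lt;one-of&gt;
      &lt;item&gt;available hotels&lt;/item&gt;
      &lt;item&gt;show me the available hotels&lt;/item&gt;
      &lt;item&gt;what hotels are available&lt;/item&gt;
    &lt;/one-of&gt;
  &lt;/rule&gt;
&lt;/grammar&gt;
</pre>

<p>What is missing today is a standard, declarative way to attach such
a grammar to the HTML link itself.</p>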

<h4 id="s3.2.2">Use Case: Hotel booking application:
submit reservation</h4>

<p>The reservation form's submit button says "submit reservation",
but the developer anticipates that a user might say "submit booking"
instead, and adds "submit booking" to the grammar for activating
the button.</p>

<h3 id="s3.3">3.3 Speech prompts for voice-enabled HTML
links and controls</h3>

<p>Developers would like support for explicitly adding speech
prompts to voice-enabled HTML hyperlinks and controls. The
prompts can provide more information than the visual labels
attached to the HTML hyperlinks and input fields.</p>

<h4 id="s3.3.1">Use Case: Hotel booking application:
enter hotel name</h4>

<p>The user is prompted to enter a hotel name with the
following TTS: "Please enter a hotel name. You can get a
list of available hotels by saying 'show me available
hotels.'"</p>
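
<p>In VoiceXML terms the prompt could be authored as follows (a
sketch; how the voice field is bound to the HTML control depends on
the multimodal language, and the grammar URI is invented):</p>

<pre>
&lt;field name="hotel"&gt;
  &lt;prompt&gt;Please enter a hotel name. You can get a list of
    available hotels by saying "show me available hotels."&lt;/prompt&gt;
  &lt;grammar src="hotel.grxml" /&gt;
&lt;/field&gt;
</pre>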

<h3 id="s3.4">3.4 Speech-enabled widgets</h3>

<p>Developers would like to see speech-enabled UI widgets
that contain a simple dialog flow (e.g., widgets that contain
confirmation or disambiguation steps). This allows an author
to configure the dialog properties (prompts, grammars,
confirmation mode, confidence thresholds, etc.) of an HTML
control or hyperlink.</p>

<h4 id="s3.4.1">Use Case: Hotel booking application:
confirm hotel</h4>

<p>The user says the name of one of the available hotels.
The application repeats the name of the hotel back to the
user and asks if it is correct. If the user says "yes", then
the application fills in the HTML field with the user's input.</p>

<h3 id="s3.5">3.5 Use speech to activate links and change focus</h3>

<p>It should be easy to use speech to do more than fill in
HTML form controls. For example, there should be declarative
support for activating an HTML link or changing focus within
an HTML page.</p>

<h4 id="s3.5.1">Use Case: Speech-enabled bookmark page</h4>

<p>A page that displays the user's bookmarks is speech-enabled
such that each bookmark has an associated grammar for moving
the browser to the bookmarked page.</p>

<h3 id="s3.6">3.6 Back functionality</h3>

<p>Developers would like to see support for consistent and
intuitive "back" handling across modalities. The multimodal
"back" functionality of the browser should be built in and
not require custom code.</p>

<h4 id="s3.6.1">Use Case: Browser "back" button</h4>

<p>The user can either press the browser back button or
say "browser go back" to return to the previous multimodal
page. All spoken commands that control the browser are
preceded by "browser" so there is no collision with an
application grammar.</p>

<h2 id="s4">4 What developers would like to see continue
and improve</h2>

<h3 id="s4.1">4.1 Support for both off-line and on-line
multimodal interaction</h3>

<p>Multimodal interaction should be supported both for
applications that are on-line, that is, connected to
the network, and for off-line applications. If the
multimodal application goes from an on-line to an off-line
state, multimodal interaction should still be supported by
the modality components that run locally on the device.</p>

<h4 id="s4.1.1">Use Case: Access to medical information
while walking down a hallway</h4>

<p>A doctor carrying a wireless tablet accesses patient
medical information while walking down a hallway. Loss of
wireless connectivity does not prevent the multimodal
application from interacting with the doctor or presenting
information it has stored on the doctor's tablet.</p>

<h4 id="s4.1.2">Use Case: Multimodal application in a hospital
operating room</h4>

<p>An off-line multimodal application in an operating room
delivers timely instructions to the doctor.</p>

<h3 id="s4.2">4.2 Support for events distributed over
the network</h3>

<p>Because a modality may be distributed on a remote server,
there must be support for distributed events between a
modality and the interaction manager.</p>

<h4 id="s4.2.1">Use Case: Driving directions</h4>

<p>A user accesses a multimodal driving directions application
using a cell phone. The application tells the user to turn
right at the next intersection. An arrow pointing right pops
up over a map. The application had received an event from the
server to display the arrow.</p>

<h3 id="s4.3">4.3 Support for implicit events</h3>

<p>Implicit event support includes both implicit event
generation and implicit event handling. At different
stages in the operation of the modality component, there
will be either event generation or event handling by the
component itself. For example, the VoiceXML modality
component could implicitly generate a focus event when
the FIA selects a new form input item.</p>

<h4 id="s4.3.1">Use Case: Hotel booking application:
name, address, phone number</h4>

<p>A hotel booking application has a form with separate
HTML input fields for entering name, street address, city,
state, and phone number. When the user selects one of the
fields, the user hears a prompt for entering the correct
information into the field. The visual input focus is
coordinated with the speech input focus.</p>

<h3 id="s4.4">4.4 VoiceXML tag and feature support</h3>

<p>VoiceXML support should include, for example, the
&lt;object&gt; and &lt;mark&gt; tags and the "record
while recognition is in progress" feature.</p>

<h4 id="s4.4.1">Use Case: Windows program for calculating
stock purchase totals</h4>

<p>The &lt;object&gt; element can be used to load a
reusable platform-specific plug-in. For example, the
application could use the &lt;object&gt; element to load a
Windows program that calculates stock purchase totals.</p>
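
<p>In VoiceXML 2.0 such a plug-in is loaded roughly as follows (a
sketch; the classid URI, parameter, and result field names are
invented for this example):</p>

<pre>
&lt;object name="totals"
        classid="method://stockcalc/purchase-totals"&gt;
  &lt;param name="symbol" expr="document.symbol" /&gt;
  &lt;filled&gt;
    &lt;prompt&gt;The total is &lt;value expr="totals.amount" /&gt;&lt;/prompt&gt;
  &lt;/filled&gt;
&lt;/object&gt;
</pre>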

<h4 id="s4.4.2">Use Case: Read part of an e-mail message</h4>

<p>The &lt;mark&gt; tag can be used to mark how much of
the text was actually read before the user left the page.
When the user returns to the page, the rest of the text can
be read beginning where the user left off.</p>
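
<p>For example (a sketch; the mark names are invented), marks can be
placed between paragraphs of the message:</p>

<pre>
&lt;prompt&gt;
  &lt;mark name="para1" /&gt; First paragraph of the message ...
  &lt;mark name="para2" /&gt; Second paragraph of the message ...
&lt;/prompt&gt;
</pre>

<p>When playback is interrupted, the platform reports the last mark
reached, and the application can resume from that point.</p>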

<h4 id="s4.4.3">Use Case: Unrecognized user input</h4>

<p>The recording of an unrecognized user input can be
logged by the speech recognizer.</p>

<h3 id="s4.5">4.5 Support for both directed and
user-initiated dialogs</h3>

<p>There must be arbitrary as well as procedural speech
access to the visual application. For a dialog mechanism
used in conjunction with a visual form, there should be
support for user-initiated dialogs. For example, the
user should be able to jump to arbitrary points in the
dialog by changing the visual focus (e.g., by clicking
on a text box).</p>

<h4 id="s4.5.1">Use Case: Form filling for an air travel
reservation</h4>

<p>The air travel reservation application takes the user
step by step through making a reservation, beginning with
the origin and destination of the flight. After the user
has been given a selection of flights, the user clicks on
the visual departure date field to change the departure date.</p>

<h4 id="s4.5.2">Use Case: Application with two HTML forms</h4>

<p>The user is taken step by step through filling out a
set of HTML fields in one form. Before all the fields have
been filled, the user clicks on a field belonging to the
other form.</p>

<h3 id="s4.6">4.6 Mixed-initiative interaction</h3>

<p>Dialog mechanisms that combine speech and text input
must support mixed-initiative interaction.</p>

<h4 id="s4.6.1">Use Case: Flight reservation application</h4>

<p>A flight reservation application has separate HTML
input fields for entering destination airport, date of
travel, and seating class. With the single utterance "I'd
like to go to San Francisco on April 20th, business class",
the user fills in all the fields at one time.</p>

<h3 id="s4.7">4.7 Access to speech confidence scores
and n-best list by the application</h3>

<p>Confidence scores and n-best lists are useful, for
example, to allow the user to pick from a set of results
supplied by an input recognizer.</p>

<h4 id="s4.7.1">Use Case: Select a football player</h4>

<p>A user says the name of a favorite football player.
A number of players match the user's input with the same
low confidence score. Instead of asking the user to repeat
the name, the application displays a visual list of the player
names that were matched. The user selects a name from the
list.</p>
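
<p>In VoiceXML 2.0 the n-best list is exposed through the
application.lastresult$ array, which the application can inspect to
build the visual list (a sketch; the grammar URI and the showList
function are invented for this example):</p>

<pre>
&lt;field name="player"&gt;
  &lt;grammar src="players.grxml" /&gt;
  &lt;filled&gt;
    &lt;if cond="application.lastresult$.length &gt; 1"&gt;
      &lt;!-- hand the alternatives, with their confidence
           scores, to the GUI for visual selection --&gt;
      &lt;script&gt;showList(application.lastresult$);&lt;/script&gt;
    &lt;/if&gt;
  &lt;/filled&gt;
&lt;/field&gt;
</pre>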

<h3 id="s4.8">4.8 Access to device details</h3>

<p>Developers would like access to device information
such as, for example, the cell phone number, phone model,
and display screen size. Typically, in a mobile
application the content is very specific to the device and
at times personalized for the user. Access to device-specific
details such as the device model (e.g., Nokia 6680) helps
the application reduce the grammar size and render
device-specific content. Access to user information such
as the phone number allows the application to personalize
the content for the user.</p>

<h4 id="s4.8.1">Use Case: Mobile appointment application</h4>

<p>When user George accesses the appointment application,
the application says "Welcome, George" and presents a list
of appointments for the day. The user can select any of his
appointments by saying an appointment label shown on his phone.
Each label is short enough to fit entirely on George's display.</p>

<h3 id="s4.9">4.9 Choice of ASR</h3>

<p>Developers would like to have more control over ASR.
An example is the capability of a multimodal application to
choose between local ASR and network-based ASR depending
on the location of the grammar. The developer should be
allowed to pick the ASR depending on the application logic.</p>

<h4 id="s4.9.1">Use Case: Music search mobile application</h4>

<p>A music search mobile application uses
network-based ASR to perform a search for a particular
artist or album, such as 'Green Day' or '50 Cent'. In the case
of network-based recognition, the grammar changes
dynamically and is large in size. The same music application
may use local ASR for navigating through the
application with commands such as 'Home' and 'Next Page'.</p>

<h3 id="s4.10">4.10 Controlling N-Best choice of ASR</h3>

<p>The application should be able to control the number of
results it wants from the ASR, based on either a number N
(say, return the top 5 matches) or a confidence score (say,
return only matches with a score above 0.8). The developer
should be able to author this N-best list control.</p>
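
<p>VoiceXML 2.0 already provides properties along these lines, which a
multimodal language could reuse (the values here are illustrative):</p>

<pre>
&lt;property name="maxnbest" value="5" /&gt;
&lt;property name="confidencelevel" value="0.8" /&gt;
</pre>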

<h4 id="s4.10.1">Use Case: Select a football player mobile
application</h4>

<p>As in the previous football player selection use case,
the list of players is visually displayed for the user to
select from. The ASR may return more than 10 results as
part of its N-best response mechanism. However, depending
on the screen size, the application may choose to display only
the top 5 entries. The application requests only
the top 5 players in the N-best result instead of receiving
10 results and then ignoring the last 5.</p>

<hr />
</body>
</html>