<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Common Sense Suggestions for Developing Multimodal User
Interfaces</title>
<style type="text/css">
ul.toc { font-size: 200%; }
ul.toc li { list-style: none; font-size: 80%; margin-top: 0.5em; }
li { margin-top: 0.5em; }
table { width: 100%; border: none; margin-top: 1em;
padding: 0.1em; background-color: gray }
td, th { border: none; background: white; padding: 0.3em; margin: 0.1em }
.suggestion { font-weight: bold; font-style: italic; margin-left: 1em; }
caption { font-style: italic; font-size: 80% }
</style>
<link href="http://www.w3.org/StyleSheets/TR/W3C-WG-NOTE.css"
rel="stylesheet" type="text/css" />
</head>
<body xml:lang="en" lang="en">
<div class="head">
<a href="http://www.w3.org/"><img alt="W3C" height="48"
src="http://www.w3.org/Icons/w3c_home" width="72" /></a>

<h1>Common Sense Suggestions for Developing Multimodal
User Interfaces</h1>

<h2>W3C Working Group Note 11 September 2006</h2>

<dl>
<dt>This version:</dt>
<dd><a
href="http://www.w3.org/TR/2006/NOTE-mmi-suggestions-20060911/">http://www.w3.org/TR/2006/NOTE-mmi-suggestions-20060911/</a></dd>
<dt>Latest version:</dt>

<dd><a
href="http://www.w3.org/TR/mmi-suggestions/">http://www.w3.org/TR/mmi-suggestions/</a></dd>
<dt>Previous version:</dt>
<dd><em>This is the first publication.</em></dd>
<dt>Editors:</dt>
<dd>Jim Larson, Intel</dd>
</dl>

<p class="copyright"><a
href="http://www.w3.org/Consortium/Legal/ipr-notice#Copyright">Copyright</a>
© 2006 <a href="http://www.w3.org/"><acronym
title="World Wide Web Consortium">W3C</acronym></a><sup>®</sup> (<a
href="http://www.csail.mit.edu/"><acronym
title="Massachusetts Institute of Technology">MIT</acronym></a>, <a
href="http://www.ercim.org/"><acronym
title="European Research Consortium for Informatics and Mathematics">ERCIM</acronym></a>,

<a href="http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. W3C <a
href="http://www.w3.org/Consortium/Legal/ipr-notice#Legal_Disclaimer">liability</a>,
<a href="http://www.w3.org/Consortium/Legal/ipr-notice#W3C_Trademarks">trademark</a>
and <a href="http://www.w3.org/Consortium/Legal/copyright-documents">document
use</a> rules apply.</p>
</div>

<!-- end of head div -->

<hr title="Separator for header" />

<h2 id="abstract">Abstract</h2>

<p>This document is based on the accumulated experience of several
years of developing multimodal applications. It provides a
collection of common sense advice for developers of multimodal
user interfaces.</p>

<h2 id="status">Status of this Document</h2>

<p><em>This section describes the status of this document at
the time of its publication. Other documents may supersede this
document. A list of current W3C publications and the latest revision
of this technical report can be found in the
<a href="http://www.w3.org/TR/">W3C technical reports
index</a> at http://www.w3.org/TR/.</em></p>

<p>This document is a W3C Working Group Note. It represents
the views of the W3C Multimodal Interaction Working Group at
the time of publication. The document may be updated as new
technologies emerge or mature. Publication as a Working
Group Note does not imply endorsement by the W3C Membership.
This is a draft document and may be updated, replaced or
obsoleted by other documents at any time. It is inappropriate
to cite this document as other than work in progress.</p>

<p>This document is one of a series produced by the
<a href="http://www.w3.org/2002/mmi/Group/">Multimodal
Interaction Working Group</a> <em>(<a
href="http://cgi.w3.org/MemberAccess/AccessRequest">Member
Only Link</a>)</em>, part of the <a
href="http://www.w3.org/2002/mmi/">W3C Multimodal
Interaction Activity</a>. The MMI activity statement can
be seen at
<a href="http://www.w3.org/2002/mmi/Activity">http://www.w3.org/2002/mmi/Activity</a>.</p>

<p>Comments on this document can be sent to <a
href="mailto:www-multimodal@w3.org">www-multimodal@w3.org</a>,
the public forum for discussion of the W3C's work on
Multimodal Interaction. To subscribe, send an email to
<a href="mailto:www-multimodal-request@w3.org">www-multimodal-request@w3.org</a>
with the word subscribe in the subject line (include the
word unsubscribe if you want to unsubscribe). The
<a href="http://lists.w3.org/Archives/Public/www-multimodal/">archive</a>
for the list is accessible online.</p>

<p>This document was produced by a group operating under the <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/">5 February 2004 W3C Patent Policy</a>. This document is informative only. W3C maintains a <a rel="disclosure" href="http://www.w3.org/2004/01/pp-impl/34607/status">public list of any patent disclosures</a> made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#def-essential">Essential Claim(s)</a> must disclose the information in accordance with <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#sec-Disclosure">section 6 of the W3C Patent Policy</a>.</p>

<h2 id="contents">Table of Contents</h2>

<ul class="toc">
<li><a href="#Four_Major_Principles">Four Major Principles</a></li>
<li>1. <a href="#Satisfy_real-world_constraints">Satisfy
Real-world Constraints</a>

<ul>
<li><a href="#Task-oriented_Suggestions">Task-oriented
Suggestions</a>

<ul>
<li>1.1 <a href="#G11">Suggestion: For each task, use the easiest
mode available on the device.</a></li>
</ul>
</li>
<li><a href="#Physical_Suggestions">Physical Suggestions</a>

<ul>
<li>1.2 <a href="#G12">Suggestion: If the user's hands are
busy, then use speech.</a></li>
<li>1.3 <a href="#G13">Suggestion: If the user's eyes are busy,
then use speech.</a></li>
<li>1.4 <a href="#G14">Suggestion: If the user may be walking, use
speech for input.</a></li>
</ul>
</li>
<li><a href="#Environmental_Suggestions">Environmental
Suggestions</a>

<ul>
<li>1.5 <a href="#G15">Suggestion: If the user may be in a noisy
environment, then use a pen or keys.</a></li>
<li>1.6 <a href="#G16">Suggestion: If the user's manual dexterity
may be impaired, then use speech.</a></li>
</ul>
</li>
</ul>
</li>
<li>2. <a href="#Communicate_Clearly_Concisely_and">Communicate
Clearly, Concisely, and Consistently with Users</a>

<ul>
<li><a href="#Consistency_Suggestions">Consistency Suggestions</a>

<ul>
<li>2.1 <a href="#G21">Suggestion: Phrase all prompts
consistently.</a></li>
<li>2.2 <a href="#G22">Suggestion: Enable the user to speak keyword
utterances rather than natural language sentences.</a></li>
<li>2.3 <a href="#G23">Suggestion: Switch presentation modes only
when the information is not easily presented in the current
mode.</a></li>
<li>2.4 <a href="#G24">Suggestion: Make commands
consistent.</a></li>
<li>2.5 <a href="#G25">Suggestion: Make the focus consistent across
modes</a></li>
</ul>
</li>
<li><a href="#Organizational_Suggestions">Organizational
Suggestions</a>

<ul>
<li>2.6 <a href="#G26">Suggestion: Use audio to indicate the verbal
structure.</a></li>
<li>2.7 <a href="#G28">Suggestion: Use pauses to divide information
into natural "chunks."</a></li>
<li>2.8 <a href="#G29">Suggestion: Use animation and sound to show
transitions.</a></li>
<li>2.9 <a href="#G210">Use voice navigation to reduce the number
of screens.</a></li>
<li>2.10 <a href="#G211">Synchronize multiple modalities
appropriately.</a></li>
<li>2.11 <a href="#G212">Keep the user interface as simple as
possible</a>.</li>
</ul>
</li>
</ul></li>

<li>3. <a href="#Help_Users_Recover_Quickly_and">Help Users
Recover Quickly and Efficiently from Errors</a>

<ul>
<li><a href="#Conversational_Suggestions">Conversational
Suggestions</a>

<ul>
<li>3.1 <a href="#G31">Suggestion: Users tend to use the same mode
that was used to prompt them.</a></li>
<li>3.2 <a href="#G32">Suggestion: If privacy is not a concern,
use speech as output to provide commentary or help.</a></li>
<li>3.3 <a href="#G33">Suggestion: Use directed user interfaces
unless the user is always knowledgeable and experienced in the
domain</a>.</li>
<li>3.4 <a href="#G34">Suggestion: Always provide context sensitive
help for every field and command</a></li>
</ul>
</li>
<li><a href="#Reliability_Suggestions">Reliability Suggestions</a>

<ul>
<li>3.5 <a href="#G35">Suggestion: The user always should be able
to easily determine if the device is listening to the
user.</a></li>
<li>3.6 <a href="#G36">Suggestion: The user always should be able
to easily determine how much longer the device will be
operational.</a></li>
<li>3.7 <a href="#G37">Suggestion: Support at least two input modes
so one input mode can be used when the other cannot.</a></li>
<li>3.8 <a href="#G38">Suggestion: Present words recognized by the
speech recognition system on the display so the user can verify
they are correct.</a></li>
<li>3.9 <a href="#G39">Suggestion: Display the n-best list to
enable easy speech recognition error correction</a></li>
<li>3.10 <a href="#G310">Try to keep response times less than 5
seconds. Inform the user of longer response times</a></li>
</ul>
</li>
</ul>
</li>

<li>4. <a href="#Make_Users_Feel_Comfortable">Make Users
Comfortable</a>

<ul>
<li><a href="#SpeakingMode">Listening mode</a>

<ul>
<li>4.1 <a href="#G41">Suggestion: Speak after pressing a speak key
which automatically releases after the user finishes
speaking.</a></li>
</ul>
</li>
<li><a href="#System_Status">System Status</a>

<ul>
<li>4.2 <a href="#G42">Suggestion: Always present the current
system status to the user.</a></li>
</ul>
</li>
<li><a href="#Human_memory_Constraints">Human-memory
Constraints</a>

<ul>
<li>4.3 <a href="#G43">Suggestion: Use the screen to ease stress on
the user's short-term memory.</a></li>
</ul>
</li>
<li><a href="#Social_Suggestions">Social Suggestions</a>

<ul>
<li>4.4 <a href="#G44">Suggestion: If the user may need privacy,
use a display rather than render speech.</a></li>
<li>4.5 <a href="#G45">Suggestion: If the user may desire privacy,
use a pen or keys.</a></li>
<li>4.6 <a href="#G46">Suggestion: If the device may be used during
a business meeting, then use a pen or keys (with the keyboard
sounds turned off).</a></li>
</ul>
</li>
<li><a href="#Advertising_Suggestions">Advertising Suggestions</a>

<ul>
<li>4.7 <a href="#G47">Suggestion: Use animation and sound to
attract the user's attention.</a></li>
<li>4.8 <a href="#G48">Suggestion: Use landmarks to help the user know
where he or she is.</a></li>
</ul>
</li>
<li><a href="#Ambience_Suggestion">Ambience Suggestion</a>

<ul>
<li>4.9 <a href="#G49">Suggestion: Use audio and graphics design to
set the mood and convey emotion in games and entertainment
applications.</a></li>
<li style="list-style: none"><a href="#Summary">Summary</a></li>
</ul>
</li>
</ul>
</li>
</ul>

<hr title="Separator for introduction" />

<h2 id="introduction">Introduction</h2>

<p>When fonts were first introduced, many messages looked like ransom notes from
kidnappers. When color was introduced, many reports looked like they had barely
survived an explosion in a paint factory. To avoid these annoying user interfaces,
developers adopted suggestions and best practices for using fonts and colors.</p>

<p>With the introduction of multiple modes of input (voice, pen, and
keys), inexperienced developers may design loud, confusing, and
annoying user interfaces that result in low user performance and
high user discontent. This document collects commonsense suggestions,
techniques, and principles from many diverse disciplines to help
developers build multimodal user interfaces that users both perform
well with and prefer.</p>

<p>This set of suggestions originated in a brainstorming session with some of my
students at the Oregon Graduate Institute of the Oregon Health &amp; Science
University. I categorized the suggestions and showed them to several multimodal
application developers, who contributed further suggestions. These have been reviewed
and revised by the W3C Multimodal Interaction Working Group. The suggestions
will be reviewed by other relevant W3C working groups including Accessibility,
Internationalization, and Mobile Web Initiative Best Practices.</p>

<p>Again, these are commonsense suggestions. You may think that no
one would ever develop user interfaces that violate these
suggestions, but developers have violated commonsense suggestions
before and will likely do so again. Use these suggestions as a
checklist when you design a multimodal interface. These suggestions
should help you to construct a multimodal user interface that
improves user performance and satisfaction, so the intended users can
use your application easily and effectively.</p>

<p>These suggestions can be used as follows:</p>

<ol>
<li>
<p>Review the suggestions before designing a multimodal user
interface. The suggestions will assist you in making decisions as
you design your multimodal user interface.</p>
</li>
<li>
<p>Review the suggestions after designing a multimodal user interface. Use
the suggestions as a checklist to assess your design after it is completed.
Some designers rank their user interface with respect to each suggestion,
giving a high score if the user interface conforms to the suggestion and
a low score if it does not.</p>
</li>
<li>
<p>The suggestions are only suggestions. Any suggestion may need to be
overridden in some situations, and these suggestions are no
exception. If there is a good reason for not following a
suggestion, then ignore the suggestion.</p>
</li>
<li>
<p>Some users will want to configure their user interface to satisfy their personal
preferences. We encourage the use of configuration dialogs to help the user
achieve the configuration that is best for him or her. We also note that
many users are afraid of configuration, and are happy to use the user interface
"as is," without ever configuring the system.</p>
</li>
</ol>

<h2 id="Four_Major_Principles">Four Major Principles</h2>

<p>The suggestions are organized into four major principles of user
interface design. The following four principles determine how
quickly users are able to learn and how effectively they are able
to perform desired tasks with the user interface:</p>

<ol>
<li>Satisfy real-world constraints</li>

<li>Communicate clearly, concisely, and consistently with users</li>

<li>Help users recover quickly and efficiently from errors</li>

<li>Make users comfortable</li>
</ol>

<p>Multimodal user interface developers should follow the above four principles
and apply the following suggestions to avoid many of the potential usability
problems caused by using modes incorrectly.</p>

<h2 id="Satisfy_real-world_constraints">1. Satisfy
Real-world Constraints</h2>

<p>Real-world constraints limit what users may achieve with an
application. These limitations may be due to the nature of the task
the user intends to perform, other activities the user is
performing, physical limitations of the user, and conditions of the
environment in which the user will perform the task. The user
interface should be designed to compensate for these
limitations.</p>

<h3 id="Task-oriented_Suggestions">Task-oriented
Suggestions</h3>

<p>The nature of the task influences the mode (or modes) users select to perform
the task. Tasks which are easy to perform in one mode may be difficult or impossible
to perform using another mode. Task-oriented suggestions indicate which input
mode best suits each kind of task.</p>

<p>New mobile devices will enable users to enter data by speaking
into a microphone, writing with a stylus, and pressing keys on a
small keypad. These input modes can be used to perform the
following four basic manipulation tasks:</p>

<ul>
<li>Select objects (e.g., menu options)</li>

<li>Enter text</li>

<li>Enter symbols (e.g., part of mathematical equations)</li>

<li>Enter sketches or illustrations</li>
</ul>

<p>There are other basic tasks, but the tasks mentioned above are
performed most frequently in common applications using handheld
computers.</p>

<p>Table 1 summarizes how users perform the four basic tasks using
the following popular input modes:</p>

<ul>
<li><em>Voice</em> - The user speaks into a microphone.</li>

<li><em>Pen</em> - The user manipulates a pen to write, draw, or
point.</li>

<li><em>Keys</em> - The user manipulates a keyboard or keypad by
pressing keys.</li>

<li><em>Mouse/joystick</em> - The user manipulates a mouse or joystick
to point, click, or drag.</li>
</ul>

<table summary="5 columns">
<caption>
Table 1: Performing the four basic manipulation tasks using four popular input
modes, ranked from easiest (1) to most difficult (4)
</caption>
<tr>
<th>Content Manipulation Task</th>
<th>Voice Mode</th>
<th>Pen Mode</th>
<th>Keyboard/keypad</th>
<th>Mouse/Joystick</th>
</tr>
<tr>
<td>Select objects</td>
<td>(3) Speak the name of the object</td>
<td>(1) Point to or circle the object</td>
<td>(4) Press keys to position the cursor on the object and press
the <i>select key</i></td>
<td class="c8" valign="top">(2) Point to and click on the object or drag to
select text</td>
</tr>
<tr>
<td>Enter text</td>
<td>(2) Speak the words in the text</td>
<td>(3) Write the text</td>
<td>(1) Press keys to spell the words in the text</td>
<td>(4) Spell the text by selecting letters from a soft
keyboard</td>
</tr>
<tr>
<td>Enter symbols</td>
<td>(3) Say the name of the symbol and where it should be
placed</td>
<td>(1) Draw the symbol where it should be placed</td>
<td>(4) Enter one or more characters that together represent the symbol</td>
<td class="c8" valign="top">(2) Select the symbol from a menu and
indicate where it should be placed</td>
</tr>
<tr>
<td>Enter sketches or illustrations</td>
<td>(2) Verbally describe the sketch or illustration</td>
<td>(1) Draw the sketch or illustration</td>
<td>(4) Impossible</td>
<td>(3) Create the sketch by moving the mouse so it leaves a trail (similar
to an Etch-a-Sketch™)</td>
</tr>
</table>

<p>Select objects. Object selection is easy with a pen: just point
to or circle the desired object. When using voice, just say the
name of the desired object, assuming the object has a name. With a
keyboard, press keys to position the cursor on the desired object
and press the <em>select</em> key.</p>

<p>Enter text. Each of the four modes can be used for text entry: the user speaks
words into a microphone, handwrites the words using a pen, presses keys on a
keypad to spell the words, or selects letters from a soft keyboard. Most users
can speak and write easily. However, some training and practice may be necessary
to use a keyboard or mouse efficiently.</p>

<p>Enter symbols. Entering mathematical equations, special
characters, and signatures is easy with a pen, awkward and
time-consuming with a mouse, and most difficult with speech.</p>

<p>Enter sketches or illustrations. Drawing simple illustrations
and maps is easy with a pen, awkward with a mouse, and nearly
impossible with speech. When speaking, users must verbally describe
the illustration or map.</p>

<p>Each input mode has its strengths and weaknesses. Voice is good
for describing attributes. The pen is good for pointing and
sketching. Keys are good for entering text, numbers, and symbols. A
useful and efficient multimodal system uses the appropriate mode
for each entry.</p>

<p class="suggestion" id="G11">1.1. Suggestion: For each task, use
the easiest mode available on the device.</p>

<p>Suggestion examples include:</p>

<ul>
<li>To select an icon, use a pen or stylus to point to the
icon. (To aid in object selection, highlight the object when
the cursor hovers above it. Highlight all selected objects.)</li>
<li>To enter text, use voice or a keypad.</li>
<li>To enter the symbols for a mathematical equation, use a pen
(or an onscreen keyboard with a key for each symbol).</li>
<li>To draw a map, use a pen.</li>
</ul>
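
<p>As an illustration only, Suggestion 1.1 can be sketched as a lookup over the
ranks in Table 1. The Python below is a minimal sketch and not part of this Note:
the names <code>TABLE_1_RANKS</code> and <code>easiest_mode</code> and the short
mode labels are hypothetical, and the keys mode is left out of the sketching task
because Table 1 lists that combination as impossible.</p>

```python
# Ranks transcribed from Table 1 (1 = easiest, 4 = most difficult).
# The dictionary name, task labels, and mode labels are illustrative assumptions.
TABLE_1_RANKS = {
    "select objects": {"voice": 3, "pen": 1, "keys": 4, "mouse": 2},
    "enter text": {"voice": 2, "pen": 3, "keys": 1, "mouse": 4},
    "enter symbols": {"voice": 3, "pen": 1, "keys": 4, "mouse": 2},
    # Table 1 marks keys as "Impossible" for sketches, so keys are omitted here.
    "enter sketches": {"voice": 2, "pen": 1, "mouse": 3},
}


def easiest_mode(task, available_modes):
    """Return the available mode with the lowest Table 1 rank for the task."""
    ranks = TABLE_1_RANKS[task]
    # Keep only the modes the device offers that can perform this task at all.
    candidates = [mode for mode in available_modes if mode in ranks]
    if not candidates:
        raise ValueError("no available input mode can perform this task")
    return min(candidates, key=ranks.get)


if __name__ == "__main__":
    # A phone-like device with a microphone and a keypad but no pen.
    print(easiest_mode("enter text", ["voice", "keys"]))
```

<p>A real application would probably treat such a ranking as a default and still
let the user pick another mode, since the suggestions in this document are meant
to be overridden when there is good reason.</p>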

<h3 id="Physical_Suggestions">Physical Suggestions</h3>

<p>Different physical devices exhibit different usability
characteristics. The device's size, shape, and weight affect how it
may be used. Most important, the placement of a microphone and
speaker, the size of the display and writing surface, and the size
of keys in a keypad all affect the ease with which a user can enter
information by speaking, writing or pressing keys. Table 2
summarizes the four modes of input with respect to physical
usability issues.</p>

<table summary="5 columns">
<caption>
Table 2: Physical usability issues for the four most popular modes of information
entry
</caption>
<tr>
<th>Device Usability Issues</th>
<th>Voice Mode</th>
<th>Pen Mode</th>
<th>Keystroke Mode</th>
<th>Mouse/Joystick Mode</th>
</tr>
<tr>
<td>Required number of user hands</td>
<td>None (plus possibly one to hold the device)</td>
<td>One (plus possibly one to hold the device)</td>
<td>One or two</td>
<td>One</td>
</tr>
<tr>
<td>Required use of eyes</td>
<td>No</td>
<td>Yes</td>
<td>Frequently, but some users can operate familiar keyboards without looking
at them</td>
<td>Yes</td>
</tr>
<tr>
<td>Portable</td>
<td>Yes, especially when walking</td>
<td>Yes, but difficult while walking</td>
<td>Yes, but difficult while walking</td>
<td>Yes, but difficult while walking</td>
</tr>
</table>

<p>Required number of user hands. A user's hands may be occupied when operating
machinery, assembling parts into a device, or creating an object of art. No
hands are needed to speak and listen to a voice user interface. A pen requires
one hand to hold the pen. A mouse requires one hand to hold the mouse and in
most cases requires a surface for the mouse to rest on. By their nature, handheld
devices also may require a hand to hold the device. A 12-key keypad requires
one hand to enter data, while a QWERTY keyboard requires two hands to enter data
efficiently. Some users become skilled at holding a small QWERTY keyboard with
both hands and using their thumbs to type.</p>

<p class="suggestion" id="G12">1.2. Suggestion: If the user's hands
are unavailable for use, then make speech available.</p>

<p>Suggestion examples include:</p>

<ul>
<li>If the user is driving a car, use speech to ask for directions
to a restaurant.</li>
<li>If the user is repairing a machine, use speech to ask for the
next repair instruction.</li>
<li>If the user is preparing a meal, use speech to ask for the next
recipe instruction.</li>
</ul>

<p>Required use of eyes. A user's eyes should be focused primarily
on the road while driving a vehicle, on a physical device to be
constructed or repaired, or on subjects and their activities while
observing an experiment. Usually, users must look at what they are
writing with a pen or typing on a keypad. However, the user's eyes
may be free to observe his or her environment while speaking.</p>

<p class="suggestion" id="G13">1.3. Suggestion: If the user's eyes
are busy or not available, then make speech available.</p>

<p>Suggestion examples include:</p>

<ul>
<li>If the user is driving a car, use speech to manipulate a
radio.</li>
<li>If a guard is watching a TV monitor, use speech or hand
controls to manipulate the camera.</li>
<li>If a scientist is looking into a microscope, use speech to
dictate his or her observations.</li>
</ul>

<p>Portable. Speech and pen devices are very portable. Users may
use them while sitting, standing, walking, and sometimes while
running. Traditionally, keyboard devices are used only while the
user is not moving. Keypads requiring only one hand, like those
frequently found on handheld devices and telephones, can be used
while sitting or standing.</p>

<p class="suggestion" id="G14">1.4. Suggestion: If the user may be
walking, then make speech available.</p>

<p>Suggestion examples include:</p>

<ul>
<li>While walking the streets of New York, use speech to ask
directions to the nearest subway station. (Both voice and a map may
be used to present directions to the user.)</li>
<li>While shopping in a department store, use speech to ask for the
location of a specific item.</li>
</ul>

<h3 id="Environmental_Suggestions">Environmental
Suggestions</h3>

<p>People work in environments that may not be ideal for some modes of user interfaces.
The environment might be noisy or quiet, hot or cold, light or dark, or moving
or stationary with a variety of distractions and possible dangers. Multimodal
user interfaces must be designed to work in the environments where they will
be used. Table 3 summarizes the environmental usability issues with respect
to four popular input modes.</p>

<table summary="5 columns">
<caption>
Table 3: Environmental usability issues for the four popular modes of information
entry
</caption>
<tr>
<th>Device Usability Issues</th>
<th>Voice Mode</th>
<th>Pen Mode</th>
<th>Keystroke Mode</th>
<th>Mouse/Joystick Mode</th>
</tr>
<tr>
<td>Noisy environment</td>
<td>Works poorly in a noisy environment</td>
<td>Works well in a noisy environment</td>
<td>Works well in a noisy environment</td>
<td>Works well in a noisy environment</td>
</tr>
<tr>
<td>Other environmental concerns</td>
<td>Works well independently of gloves</td>
<td>Does not work well when users must wear thick gloves</td>
<td>Does not work well when users must wear thick gloves</td>
<td>Does not work well when users must wear thick gloves</td>
</tr>
</table>

<p>Noisy environment. Because speech recognition systems pick up
background sounds, they often make mistakes if the user speaks in a
noisy environment.</p>

<p class="suggestion" id="G15">1.5. Suggestion: If the user may be in a noisy environment,
then use a pen, keys, or mouse.</p>

<p>Suggestion examples include:</p>

<ul>
<li>Use a pen or keys to enter a telephone number when in a noisy
airport.</li>
<li>Use a pen or keys to enter data when in a noisy shop.</li>
</ul>

<p>Other environmental concerns. Pen and keyboard devices are
difficult to use if the user must wear thick gloves, such as in a cold
environment or when protecting hands from rough objects.</p>

<p class="suggestion" id="G16">1.6. Suggestion: If the user's manual
dexterity may be impaired, then use speech.</p>

<p>A suggestion example is:</p>

<ul>
<li>If the user works in a cold meat locker, works on a construction
site and handles rough material, or works with dangerous chemicals
and must wear gloves, then use voice to enter data.</li>
</ul>
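
<p>Suggestions 1.2 through 1.6 can be read together as a filter over the input
modes a device offers. The following Python is a rough sketch under stated
assumptions, not a prescribed algorithm: the <code>Context</code> fields, the
<code>suggested_modes</code> function, and the mode labels are hypothetical
names, and the rule that noise takes priority over busy hands is one possible
way to resolve a conflict that this Note does not address.</p>

```python
from dataclasses import dataclass


@dataclass
class Context:
    """Hypothetical description of the user's situation."""
    hands_busy: bool = False          # Suggestion 1.2
    eyes_busy: bool = False           # Suggestion 1.3
    walking: bool = False             # Suggestion 1.4
    noisy: bool = False               # Suggestion 1.5
    dexterity_impaired: bool = False  # Suggestion 1.6, e.g. thick gloves


# Modes that need the user's hands (and usually eyes), per Tables 2 and 3.
MANUAL_MODES = {"pen", "keys", "mouse"}


def suggested_modes(context, device_modes):
    """Return the subset of device_modes suggested for the given context."""
    modes = set(device_modes)
    # Suggestion 1.5: speech recognition works poorly in noise.
    if context.noisy:
        modes.discard("voice")
    # Suggestions 1.2, 1.3, 1.4, 1.6: favor speech when manual modes are hard
    # to use, but only if speech is still a candidate.
    manual_is_hard = (context.hands_busy or context.eyes_busy
                      or context.walking or context.dexterity_impaired)
    if manual_is_hard and "voice" in modes:
        modes -= MANUAL_MODES
    return modes


if __name__ == "__main__":
    driver = Context(hands_busy=True, eyes_busy=True)
    print(suggested_modes(driver, ["voice", "pen", "keys"]))
```

<p>When the constraints conflict, such as a noisy environment with busy hands,
this sketch falls back to the manual modes. A real design would weigh the
conflict for the specific task and users.</p>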
|
|
|
|
<h2 id="Communicate_Clearly_Concisely_and">2.
|
|
Communicate Clearly, Concisely, and Consistently with Users</h2>
|
|
|
|
<p>Efficient communication is required if teams of people are to
|
|
achieve success in joint activities. Likewise, effective
|
|
communication between the user and the device is necessary for
|
|
achieving the user's goals. The multimodal user interface is the
|
|
conduit for all communication between the user and the device.
|
|
Communication should be clear and concise, avoiding ambiguities and
|
|
confusion. Communication styles should be consistent and systematic
|
|
so users know what to expect and can leverage the patterns and
|
|
rhythms in the dialog.</p>
|
|
|
|
<h3 id="Consistency_Suggestions">Consistency
|
|
Suggestions</h3>
|
|
|
|
<p>Consistency enables users to leverage conversational patterns to
|
|
accelerate their interaction. For example, users can follow a
|
|
consistent conversational rhythm without having to pause to adjust
|
|
to heterogeneous dialog styles.</p>
|
|
|
|
<p>Consistent prompts. If prompts are worded inconsistently, then
|
|
users must pause to decode each wording format. Users must spend
|
|
additional time and mental effort to respond to differently
|
|
structured questions. When prompts are consistently worded, users
|
|
can concentrate on the answers to questions rather than trying to
|
|
understand the questions.</p>
|
|
|
|
<p class="suggestion" id="G21">2.1. Suggestion: Phrase all prompts
|
|
consistently.</p>
|
|
|
|
<p>Suggestions examples include:</p>
|
|
|
|
<ul>
|
|
<li>To be consistent and encourage experienced users to barge-in,
|
|
consider using the following general voice prompt format:
|
|
|
|
<ol>
|
|
<li><em>Speak the name of the menu or form item.</em> The menu name
|
|
serves as a landmark. A <em>landmark</em> is a speech or non-speech
|
|
cue that marks a specific location within the dialog structure. By
|
|
providing a name, such as "main menu" or "thermostat," callers can
|
|
jump to this menu by speaking the menu name or return to the menu
|
|
when they get confused or lost. Also, repeating the menu name to
|
|
the caller confirms that the caller has reached the correct menu.
|
|
However, if the name is contained within the question and is not
|
|
needed as a landmark, then skip speaking the name.</li>
|
|
<li><em>Ask a question.</em> Often, this can be achieved with two
|
|
or three words. This should be enough to remind experienced callers
|
|
to respond without listening to the enumerated options. Novice
|
|
callers will listen to the enumerated options before speaking their
|
|
selection.</li>
|
|
<li><em>Enumerate options.</em> If there are a small number of
|
|
valid responses, then list the options so novice callers can hear
|
|
and select their desired option. However, if the user is likely to
|
|
know the set of valid responses, then skip this operation.</li>
|
|
</ol>
|
|
</li>
|
|
<li style="list-style: none">
|
|
|
|
<p>Experienced callers can barge-in after they hear the question,
|
|
while novice callers will respond after they hear the entire menu
|
|
option list.</p>
|
|
</li>
|
|
<li>Use the same terms in all prompts, whether the terms are text,
|
|
voice, or multimedia prompts.</li>
|
|
</ul>
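<p>The three-part prompt format above (menu name, question, enumerated options) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name <code>build_prompt</code>, its parameters, and the "Say ..." wording are assumptions made for this example and are not part of any speech platform.</p>

```python
from typing import Optional, Sequence


def build_prompt(question: str,
                 menu_name: Optional[str] = None,
                 options: Sequence[str] = ()) -> str:
    """Assemble a voice prompt as: menu name, question, enumerated options.

    menu_name and options are optional, mirroring the cases where the
    landmark is not needed or the user already knows the valid responses.
    """
    parts = []
    if menu_name:
        # The menu name acts as a landmark for the caller.
        parts.append(f"{menu_name}.")
    # A short question lets experienced callers barge in early.
    parts.append(question)
    if options:
        # Enumerated options come last so novice callers can hear them all.
        if len(options) == 1:
            listing = options[0]
        else:
            listing = ", ".join(options[:-1]) + f", or {options[-1]}"
        parts.append(f"Say {listing}.")
    return " ".join(parts)
```

<p>Because every prompt is produced by the same function, the order of landmark, question, and options stays the same throughout the dialog, which is the point of the suggestion.</p>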
|
|
|
|
<p><strong>Consistent command format.</strong> The current state of the art of
|
|
speech recognition and natural language technology does not always accurately
|
|
recognize and understand arbitrary complete sentences. Keyword recognition is
|
|
much faster and more accurate. Many tasks lend themselves to keyword commands better
|
|
than natural language sentences.</p>
|
|
|
|
<p class="suggestion" id="G22">2.2. Suggestion: Enable the user to
|
|
speak keyword utterances rather than natural language
|
|
sentences.</p>
|
|
|
|
<p>Switching modes. Switching modes can be jarring and sometimes
|
|
surprising. For example, a user who has just answered three verbal
|
|
questions will be surprised if a textual question suddenly pops
|
|
up.</p>
|
|
|
|
<p class="suggestion" id="G23">2.3. Suggestion: Switch presentation
|
|
modes only when the information is not easily presented in the
|
|
current mode.</p>
|
|
|
|
<p>Suggestion examples include:</p>
|
|
|
|
<ul>
|
|
<li>If the user repeatedly experiences errors when using voice or
|
|
handwriting recognition, consider switching to a text mode. Text
|
|
mode often avoids the recognition errors occurring because of
|
|
heavily-accented speakers or poor handwriting.</li>
|
|
<li>Switch from audio to text output if the result of a verbal
|
|
query is large and the user is likely to become anxious listening
|
|
to the result.</li>
|
|
<li>Switch from audio output to graphical output if the result can
|
|
be structured as a table, graphic, or other illustration.</li>
|
|
</ul>
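<p>The first example, switching to text after repeated recognition errors, might be tracked as in the following Python sketch. The class name <code>InputModeSelector</code> and the threshold of two consecutive errors are assumptions chosen for illustration; a suitable threshold would have to come from usability testing.</p>

```python
class InputModeSelector:
    """Switch from voice to text input after repeated recognition errors."""

    def __init__(self, max_errors: int = 2):
        # max_errors is an assumed threshold, not a recommended value.
        self.max_errors = max_errors
        self.consecutive_errors = 0
        self.mode = "voice"

    def record_result(self, recognized: bool) -> str:
        """Record one recognition attempt and return the mode to use next."""
        if recognized:
            # A success resets the count, so occasional errors
            # do not trigger a jarring mode switch.
            self.consecutive_errors = 0
        else:
            self.consecutive_errors += 1
            if self.consecutive_errors >= self.max_errors:
                self.mode = "text"
        return self.mode
```

<p>Counting only consecutive errors keeps the interface in the current mode unless the user is clearly struggling, in line with the suggestion to switch modes sparingly.</p>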
|
|
|
|
<p><strong>Command consistency.</strong> Using different commands
|
|
for the same purpose confuses users, as does using the same command
|
|
for multiple functions.</p>
|
|
|
|
<p class="suggestion" id="G24">2.4. Suggestion: Make commands
|
|
consistent.</p>
|
|
|
|
<p>Users tend to use the wording which is visually presented. Include the command
|
|
name on buttons and other navigational elements in the grammar for the voice
|
|
mode. All voice commands that achieve the same functionality should have the
|
|
same grammar. Users tend to use known commands from their daily use of computers.
|
|
Incorporate these commands into the grammar, even if they are not visually presented
|
|
in the GUI.</p>
|
|
|
|
<p>Suggestion examples:</p>
|
|
|
|
<ul>
|
|
<li>If a button is labeled "exit," then "exit" should be in the
|
|
grammar for the voice mode.</li>
|
|
<li>If a user may say "exit" from each of three visual pages,
|
|
then the grammar for this command should be the same for all
|
|
three pages.</li>
|
|
<li>If users often use "exit" in many other applications, then use
|
|
"exit" in this application so that the user can apply knowledge
|
|
from other applications to this application.</li>
|
|
</ul>
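<p>One way to keep command grammars consistent across pages is to build each page's grammar from a single shared vocabulary plus that page's visible button labels. The Python sketch below assumes a simple phrase-to-command dictionary; the names <code>SHARED_COMMANDS</code> and <code>page_grammar</code> are hypothetical and do not correspond to any grammar format or recognizer API.</p>

```python
from typing import Dict, Iterable

# Hypothetical shared vocabulary: every page accepts the same phrases
# for the same command.
SHARED_COMMANDS: Dict[str, str] = {
    "exit": "exit",
    "help": "help",
    "what can i say": "help",
}


def page_grammar(button_labels: Iterable[str]) -> Dict[str, str]:
    """Map each accepted phrase to a command for one visual page."""
    grammar = dict(SHARED_COMMANDS)
    for label in button_labels:
        phrase = label.strip().lower()
        # A visible label is always speakable; phrases already in the
        # shared vocabulary keep their shared meaning.
        grammar.setdefault(phrase, phrase)
    return grammar
```

<p>With this approach, "exit" resolves to the same command on every page, and any word shown on a button is automatically accepted by the voice mode.</p>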
|
|
|
|
<p class="suggestion" id="G25">2.5. Suggestion: Make the focus
|
|
consistent across modes</p>
|
|
|
|
<p>If the user is prompted to speak a value for a field, then
|
|
highlight that field in the GUI.</p>
|
|
|
|
<p>Suggestion examples:</p>
|
|
|
|
<ul>
|
|
<li>When filling out a form, highlight the field in the GUI when
|
|
the voice user interface prompts the user to speak a value for that
|
|
field.</li>
|
|
<li>Consistently highlight visual items in focus and consistently
|
|
highlight selected visual items.</li>
|
|
</ul>
|
|
|
|
<h3 id="Organizational_Suggestions">Organizational
|
|
Suggestions</h3>
|
|
|
|
<p>Grade school teachers always teach that organizing your thoughts before writing
|
|
a composition will dramatically improve its understandability. The same principle
|
|
applies to user interfaces. Organizing information and transitioning between
|
|
topics will improve the users' comprehension of and performance with the multimodal
|
|
interface. Information should be structured and organized in ways that are familiar
|
|
to the user.</p>
|
|
|
|
<p>Content structure. Audio cues help users understand audio
|
|
information. For example, use a click to introduce each item of a
|
|
bulleted list, increase the volume to emphasize highlighted text,
|
|
or use a whisper to speak parenthetical text.</p>
|
|
|
|
<p class="suggestion" id="G26">2.6. Suggestion: Use audio and/or
|
|
visual icons to indicate the content structure.</p>
|
|
|
|
<p>There are generally accepted icons to represent content
|
|
structure. For example, a clock may indicate that an application is
|
|
busy, arrows may represent next and previous pages, etc.</p>
|
|
|
|
<p>Because there are no standard assignments of meanings for sounds, common sense
|
|
and user testing should guide the dialog designer. Here are suggestions for
|
|
items that lend themselves to non-speech sounds:</p>
|
|
|
|
<ul>
|
|
<li><em>Links</em> - Identify words that the user may say to jump to
|
|
another VoiceXML document by introducing them with a unique
|
|
sound.</li>
|
|
<li><em>Turn-taking tone</em> - A tone signals to the user that the
|
|
system has finished talking and that the user may speak.</li>
|
|
<li><em>Brand earcon</em> - Many businesses have audio icons, such
|
|
as the distinctive bong sound of AT&amp;T, the three tones of NBC,
|
|
and the four tones for "Intel Inside." These audio icons can be
|
|
presented to the user to announce that the user has arrived at the
|
|
company's site.</li>
|
|
<li><em>Feedback</em> - The user needs to know if the speech application is
|
|
processing data or waiting for input. A non-speech sound, such as a percolating
|
|
coffee pot, is ideal for informing the user that the speech application system
|
|
is busy processing. It also reassures the user that the application is busy
|
|
and has not terminated abnormally. A bell tone is ideal for informing the
|
|
user that the system is ready for the user's input.</li>
|
|
<li><em>Barge-in temporarily disabled</em> - Designers may disable
|
|
barge-in when presenting advertisements or legal notices. To
|
|
prevent the user from barging-in, signal the user that barge-in is
|
|
temporarily disabled by presenting "barge-in disabled" and
|
|
"barge-in enabled" audio icons.</li>
|
|
<li><em>Bulleted list</em> - A short sound snippet can be used at
|
|
the beginning of each item on a list.</li>
|
|
</ul>
|
|
|
|
<p>Chunks of information. Users comprehend audio information more
|
|
easily if it is presented as blocks, or chunks, of information. For
|
|
example, users may not recognize "six, one, seven, two, two, five,
|
|
four, three, seven, six" as a telephone number, but they will
|
|
recognize "six, one, seven (pause) two, two, five (pause) four,
|
|
three, seven, six" as either an American or Canadian telephone
|
|
number.</p>
|
|
|
|
<p class="suggestion" id="G28">2.7. Suggestion: Use pauses to divide
|
|
information into natural "chunks."</p>
|
|
|
|
<p>Suggestion examples include:</p>
|
|
|
|
<ul>
|
|
<li><em>Chunking numbers</em> - Phone numbers, identification
|
|
numbers, and other sequences of numbers are frequently clustered
|
|
into groups of two or three numbers when spoken. A short pause
|
|
between the sets of groups helps users comprehend and remember the
|
|
number more easily. For example, North American telephone numbers are
|
|
frequently spoken in three chunks: the three-digit area code, the
|
|
three-digit exchange number, and the four-digit subscriber
|
|
number.</li>
|
|
<li><em>Pause between instructions and options</em> - Placing a
|
|
pause between instructions and the options for prompts signals the
|
|
user when the instructions are complete. Experienced users may
|
|
barge-in after the instructions, but before hearing the list of
|
|
options.</li>
|
|
</ul>
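<p>The number-chunking example can be illustrated with a short Python sketch that turns a digit string into spoken words with pause markers between groups. The function name <code>speak_in_chunks</code>, the default 3-3-4 grouping (the North American pattern described above), and the literal "(pause)" marker are assumptions for illustration; a real application would emit whatever pause markup its speech synthesizer expects.</p>

```python
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def speak_in_chunks(digits: str, chunk_sizes=(3, 3, 4)) -> str:
    """Render a digit string as words, with a pause marker between chunks."""
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected only the digits 0-9")
    if sum(chunk_sizes) != len(digits):
        raise ValueError("chunk sizes must add up to the number of digits")
    chunks = []
    start = 0
    for size in chunk_sizes:
        group = digits[start:start + size]
        # Digits inside a chunk are spoken in one run, separated by commas.
        chunks.append(", ".join(DIGIT_WORDS[int(d)] for d in group))
        start += size
    # The pause marker separates chunks, never individual digits.
    return " (pause) ".join(chunks)
```

<p>For the telephone number used earlier, <code>speak_in_chunks("6172254376")</code> yields "six, one, seven (pause) two, two, five (pause) four, three, seven, six".</p>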
|
|
|
|
<p>Transitions. A user may become disoriented if the information
|
|
content suddenly changes. Writers are well aware of the need for
|
|
transitions between topics. Similar transitions are needed for
|
|
visual and verbal information.</p>
|
|
|
|
<p class="suggestion" id="G29">2.8. Suggestion: Use animation and
|
|
sound to show transitions.</p>
|
|
|
|
<p>Suggestion examples:</p>
|
|
|
|
<ul>
|
|
<li>Display a turning page and present an audio sound to indicate
|
|
the transition between two pages.</li>
|
|
<li>Navigation: One study has shown that mobile users drop off at
|
|
the rate of 50% with each screen change. Voice navigation can be
|
|
used to reduce the number of screens.</li>
|
|
</ul>
|
|
|
|
<p class="suggestion" id="G210">2.9. Suggestion: Use voice navigation to reduce
|
|
the number of screens.</p>
|
|
|
|
<p><strong>Modality synchronization.</strong> Multiple modalities
|
|
should be appropriately synchronized. Here are some examples:</p>
|
|
|
|
<ol>
|
|
<li>Stop talking/listening when the visual browser is minimized or exited.</li>
|
|
<li>The visual and verbal browsers should present the same
|
|
information at the same time.</li>
|
|
<li>In a multifield form, the focus field of the visual browser
|
|
should correspond to the field prompt currently presented by the
|
|
verbal browser.</li>
|
|
</ol>
|
|
|
|
<p class="suggestion" id="G211">2.10. Suggestion: Synchronize multiple
|
|
modalities appropriately.</p>
|
|
|
|
<p><strong>Simplicity.</strong> Complex user interfaces are
|
|
confusing to the user and lead to errors. While this rule applies
|
|
to all user interfaces, it is especially important to multimodal
|
|
user interfaces.</p>
|
|
|
|
<p class="suggestion" id="G212">2.11. Suggestion: Keep the user interface as
|
|
simple as possible.</p>
|
|
|
|
<h2 id="Help_Users_Recover_Quickly_and">3.
|
|
Help Users Recover Quickly and Efficiently from Errors</h2>
|
|
|
|
<p>The user interface must help users recover quickly and
|
|
efficiently from errors. All users, especially novice users, will
|
|
occasionally fail to respond to a prompt appropriately. The user
|
|
interface must be designed to detect such errors and assist users
|
|
to recover naturally. The multimodal interface also should help
|
|
users learn how to use the user interface to achieve the desired
|
|
results quickly and efficiently.</p>
|
|
|
|
<h3 id="Conversational_Suggestions">Conversational
|
|
Suggestions</h3>
|
|
|
|
<p>Principles of conversational discourse suggest that the
|
|
suggestions for the nature, content, and format of information
|
|
exchanged between two humans may be applied to information
|
|
exchanged between a human and a computer.</p>
|
|
|
|
<p>Reflexive principle. The reflexive principle states that people
|
|
tend to respond in the same manner that they are prompted. For
|
|
example, if users are given long rambling prompts, they will likely
|
|
reply with long rambling responses.</p>
|
|
|
|
<p class="suggestion" id="G31">3.1. Suggestion: Enable users to use
|
|
the same mode that was used to prompt them.</p>
|
|
|
|
<p>Suggestion examples include:</p>
|
|
|
|
<ul>
|
|
<li>When spoken to, users will use their voices to respond.</li>
|
|
<li>When presented with a drawing, users will respond with another
|
|
drawing.</li>
|
|
<li>When presented with text, users will type their responses.</li>
|
|
</ul>
|
|
|
|
<p>Verbal help. Speech is more immediate and does not obscure
|
|
screen contents.</p>
|
|
|
|
<p class="suggestion" id="G32">3.2. Suggestion: If privacy is not a
|
|
concern, use speech as output to provide commentary or help.</p>
|
|
|
|
<p>Suggestion examples include:</p>
|
|
|
|
<ul>
|
|
<li>Use speech to present short messages such as help
|
|
information</li>
|
|
<li>Use keys to enter personal identification numbers.</li>
|
|
<li>When using an automatic bank teller, always use a keypad to
|
|
enter the account number.</li>
|
|
<li>When using a weight management application, enable users to
|
|
enter their weight using a pen or keypad.</li>
|
|
</ul>
|
|
|
|
<p>When privacy is not a concern, consider using speech for help
|
|
and error messages about the current contents in the display,
|
|
possibly augmenting the display by highlighting the area in which
|
|
the error occurs.</p>
|
|
|
|
<p><strong>Directed user interface</strong>. While user-directed
|
|
and mixed initiative user interfaces may be useful for experienced
|
|
users, they are confusing and inhibiting for novice users. Directed
|
|
user interfaces always work for all classes of users. Directed
|
|
search provides the user with results they want quickly and
|
|
accurately.</p>
|
|
|
|
<p class="suggestion" id="G33">3.3. Suggestion: Use directed user
|
|
interfaces unless the user is always knowledgeable and experienced
|
|
in the domain.</p>
|
|
|
|
<p><strong>Context sensitive help</strong>. As an application becomes more complex,
|
|
offering the user more choices, offering help becomes mandatory. For a simple
|
|
application with fewer choices, the user may need help only the first time the
|
|
application is run. A novice user may not know the meaning of a field or command.</p>
|
|
|
|
<p class="suggestion" id="G34">3.4. Suggestion: Always provide context
|
|
sensitive help for every field and command</p>
|
|
|
|
<p>Enable users to learn the purpose and function of every field,
|
|
and what values can be entered into the field.</p>
|
|
|
|
<p>Suggestion example:</p>
|
|
|
|
<ul>
|
|
<li>It may not be clear to the user if the year field of a date should be two
digits or four digits. Context sensitive help should provide instructions
|
|
and possibly an example to clarify this.</li>
|
|
<li>Enable the user to ask "what can I say" or "what can I say
|
|
here" as well as "help." Show a list of available commands and/or
|
|
options.</li>
|
|
</ul>
|
|
|
|
<p>One advantage of verbal and visual modalities is that help can be offered using
|
|
speech and/or GUI interfaces.</p>
|
|
|
|
<h3 id="Reliability_Suggestions">Reliability
|
|
Suggestions</h3>
|
|
|
|
<p>Few situations are more frustrating to users than to have a
|
|
device at hand but not be able to use it.</p>
|
|
|
|
<p><strong>Operational status</strong>. Users need to know when the
|
|
device is listening to them speak and when the device is not
|
|
listening.</p>
|
|
|
|
<p class="suggestion" id="G35">3.5. Suggestion: The user always
|
|
should be able to easily determine if the device is listening to
|
|
the user.</p>
|
|
|
|
<p>Operational status can be presented as a light or icons
|
|
indicating the operational status of the device.</p>
|
|
|
|
<p>Power status. One especially frustrating situation is when the
|
|
device suddenly goes dead because the batteries are low.</p>
|
|
|
|
<p class="suggestion" id="G36">3.6. Suggestion: For devices with
|
|
batteries, the user should always be able to easily determine how much
|
|
longer the device will be operational.</p>
|
|
|
|
<p>A suggestion example is:</p>
|
|
|
|
<ul>
|
|
<li>Use one or more icons or colors to present the operational status of a
device. Use a green icon to indicate that the device is operational. Use
yellow to indicate that power is in short supply. Better yet, display a meter
or clock indicating how much longer the battery will continue to support the
device. (Note: because about 6 per cent of the male population has some degree
of color blindness, always use another feature in addition to color. For
example, use a green "walking person" icon to indicate that the device is
operational, and a yellow, nearly empty battery icon to indicate that power is
in short supply.)</li>
|
|
</ul>
|
|
|
|
<p>Backup mode. In Section 1, Table 1 summarized the various
|
|
strengths and weaknesses of using voice, pen, and keys as input
|
|
methods. Because user tasks, environmental situations, and user
|
|
distractions change, users should be able to switch modes when it
|
|
becomes inconvenient or impossible to use the primary mode of
|
|
input.</p>
|
|
|
|
<p class="suggestion" id="G37">3.7. Suggestion: Support at least two
|
|
input modes so one input mode can be used when the other
|
|
cannot.</p>
|
|
|
|
<p>Suggestion examples include:</p>
|
|
|
|
<ul>
|
|
<li>Enable the user to use a keypad when speaking or using a pen in
|
|
the event that the speech or handwriting recognition engine
|
|
fails.</li>
|
|
<li>Enable the user to speak or type if the user loses the pen or
|
|
input stylus.</li>
|
|
<li>Enable the user to speak if rain or snow will damage a
|
|
keypad.</li>
|
|
</ul>
|
|
|
|
<p><strong>Visual feedback</strong>. Sometimes speech recognition
|
|
systems misrecognize the words which a user speaks. It is useful to
|
|
present words recognized by the speech recognition system to the
|
|
user who can verify their correctness. In speech only systems, the
|
|
tiresome phrase "Did you say ...?" is the only option. However, in
|
|
multimodal systems, the recognized word can be presented on a
|
|
display.</p>
|
|
|
|
<p class="suggestion" id="G38">3.8. Suggestion: Present words
|
|
recognized by the speech recognition system on the display so the
|
|
user can verify they are correct.</p>
|
|
|
|
<p><strong>Correction mode</strong>. When the speech recognition
|
|
fails, the user needs to correct the error by entering the correct
|
|
word. While the user could simply speak again, a better approach is
|
|
to display the n-best list (the list of words the speech
|
|
recognizer heard but did not select) so the user can select from
|
|
among these options rather than speak again (and possibly
|
|
experience the same error).</p>
|
|
|
|
<p class="suggestion" id="G39">3.9. Suggestion: Display the n-best
|
|
list to enable easy speech recognition error correction</p>
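<p>A minimal Python sketch of this correction step follows. It assumes the recognizer returns its n-best list as (text, confidence score) pairs, which is an assumption about the data shape rather than a description of any particular recognizer; the function name <code>correction_choices</code> and the limit of three alternatives are likewise illustrative.</p>

```python
from typing import List, Sequence, Tuple


def correction_choices(n_best: Sequence[Tuple[str, float]],
                       rejected: str,
                       limit: int = 3) -> List[str]:
    """Return the alternatives to display after the user rejects a result."""
    # Drop the hypothesis the user has already rejected.
    alternatives = [(text, score) for text, score in n_best if text != rejected]
    # Show the most likely remaining candidates first.
    alternatives.sort(key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in alternatives[:limit]]
```

<p>The returned list can be displayed for selection by pen or key, so the user does not have to speak again and risk repeating the same recognition error.</p>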
|
|
|
|
<p><strong>Response time.</strong> Response times greater than 5
|
|
seconds will significantly reduce usage. If a response time exceeds
|
|
this limit, inform the user that the computer is busy processing
|
|
the request.</p>
|
|
|
|
<p class="suggestion" id="G310">3.10. Suggestion: Try to keep response times less
|
|
than 5 seconds. Inform the user of longer response times.</p>
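<p>One possible way to inform the user when a request exceeds the limit is sketched below in Python, using the standard <code>concurrent.futures</code> module. The function name <code>run_with_busy_notice</code>, the <code>notify</code> callback, and the message text are assumptions for illustration; only the 5-second default comes from the text above.</p>

```python
import concurrent.futures
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_busy_notice(task: Callable[[], T],
                         notify: Callable[[str], None],
                         limit_seconds: float = 5.0) -> T:
    """Run task; if it exceeds limit_seconds, tell the user the system is busy."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(task)
        try:
            # Fast path: the response arrives within the limit.
            return future.result(timeout=limit_seconds)
        except concurrent.futures.TimeoutError:
            # Slow path: inform the user once, then keep waiting.
            notify("Processing, please wait")
            return future.result()
```

<p>The <code>notify</code> callback could display text, show a busy icon, or start a non-speech busy sound, depending on the modes available.</p>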
|
|
|
|
<h2 id="Make_Users_Feel_Comfortable">4. Make Users
|
|
Feel Comfortable</h2>
|
|
|
|
<p>Users often judge a computer application by its user interface.
|
|
If users do not like the user interface, the application will not
|
|
be used. If the user interface is not easy to learn and easy to
|
|
use, the application cannot be used successfully.</p>
|
|
|
|
<h3 id="SpeakingMode">Listening mode</h3>
|
|
|
|
<p>There are several possible listening modes, including</p>
|
|
|
|
<ul>
|
|
<li><em>Always listening</em> - Generally this requires an
|
|
attention word that signals the system that the user is about to
|
|
speak. Without first speaking the attention word, the system
|
|
assumes that the user is speaking to another person and does not
|
|
listen.</li>
|
|
<li><em>Push to speak</em> - The user must remember to hold down
|
|
the speak key while speaking to the computer</li>
|
|
<li><em>Speak after pressing a speak key and then press the speak
|
|
key again when finished</em> - The user must remember to press the
|
|
speak key a second time after the speaker stops speaking to the
|
|
computer.</li>
|
|
<li><em>Push to activate</em> - The user only needs to press a
|
|
speak key to speak to the computer.</li>
|
|
</ul>
|
|
|
|
<p>In theory, always listening would be the preferred listening mode. However,
|
|
this mode doesn't always work very well, and it makes heavy use of computer
|
|
resources. So the generally preferred mode is push to activate.</p>
|
|
|
|
<p class="suggestion" id="G41">4.1. Suggestion: Use the push to activate listening mode
to speak to a mobile device.</p>
|
|
|
|
<p>It is easy for users to press a speak key before talking. This
|
|
is similar to asking for permission to speak by raising your hand.
|
|
However, while speaking, it is desirable to concentrate on what is
|
|
being said without worrying about holding down a key or pressing a
|
|
key when finished speaking.</p>
|
|
|
|
<h3 id="System_Status">System Status</h3>
|
|
|
|
<p>Users need feedback to determine whether the computer is
|
|
processing input data, is waiting for input, or is
|
|
malfunctioning.</p>
|
|
|
|
<p class="suggestion" id="G42">4.2. Suggestion: Always present the
|
|
current system status to the user.</p>
|
|
|
|
<p>Some suggestions for indicating if the computer is idle or busy
|
|
are shown in Table 4.</p>
|
|
|
|
<table summary="4 columns">
|
|
<caption>Table 4: Suggested indicators for the
|
|
current system status</caption>
|
|
<tr>
|
|
<th>Mode</th>
|
|
<th>Idle</th>
|
|
<th>Busy</th>
|
|
<th>Error</th>
|
|
</tr>
|
|
<tr>
|
|
<td>Text</td>
|
|
<td>"Ready for next input"</td>
|
|
<td>"Processing, please wait"</td>
|
|
<td>Explanation for the cause of the error and how to fix it</td>
|
|
</tr>
|
|
<tr>
|
|
<td>Icons</td>
|
|
<td>Green*</td>
|
|
<td>Red*</td>
|
|
<td>Blinking "danger" icon</td>
|
|
</tr>
|
|
<tr>
|
|
<td>Audio</td>
|
|
<td>Silence</td>
|
|
<td>Sounds of a clicking clock or a percolating coffee pot</td>
|
|
<td>Emergency vehicle siren</td>
|
|
</tr>
|
|
</table>
|
|
|
|
<p>* Note: because about 6 per cent of the male population has some
|
|
degree of color blindness, always use another feature in addition
|
|
to color. For example, use a "standing person" icon that is green
|
|
to indicate the device is idle, and a "walking person" icon that is
|
|
red to indicate that the current system is busy.</p>
|
|
|
|
<h3 id="Human_memory_Constraints">Human-memory
|
|
Constraints</h3>
|
|
|
|
<p>Normally, human short-term memory holds only a limited number of items, so
|
|
it is necessary to keep verbal lists short. Instead of reading a list of options
|
|
to users, display the list so users will not forget the spoken information.</p>
|
|
|
|
<p class="suggestion" id="G43">4.3. Suggestion: Use the screen to
|
|
ease stress on the user's short-term memory.</p>
|
|
|
|
<p>Suggestion examples include:</p>
|
|
|
|
<ul>
|
|
<li>If a list of options contains more than 3 to 4 items, display
|
|
the list of options on a screen.</li>
|
|
<li>If possible, display the results of a query as a table. For
|
|
example, display travel schedules as a table instead of presenting
|
|
verbal text.</li>
|
|
<li>If the text contains more than two sentences, present the text
|
|
to the user visually rather than verbally</li>
|
|
</ul>
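<p>The thresholds in the first and third examples can be expressed as a simple decision rule. In the Python sketch below, the function names are hypothetical, the list threshold of 3 is an assumed reading of "3 to 4 items", and sentences are counted naively by splitting on terminal punctuation, so abbreviations such as "e.g." would be miscounted.</p>

```python
import re
from typing import Sequence


def choose_list_output(options: Sequence[str], max_spoken: int = 3) -> str:
    """Speak short option lists; display longer ones."""
    return "display" if len(options) > max_spoken else "speech"


def choose_text_output(text: str, max_spoken_sentences: int = 2) -> str:
    """Speak text of up to two sentences; display anything longer."""
    # Naive sentence split on ., ! and ? (an assumption, see lead-in).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return "display" if len(sentences) > max_spoken_sentences else "speech"
```

<p>Keeping both thresholds as parameters makes them easy to adjust after usability testing.</p>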
|
|
|
|
<h3 id="Social_Suggestions">Social Suggestions</h3>
|
|
|
|
<p>Social customs among people lead to suggestions for user
|
|
interfaces between users and devices.</p>
|
|
|
|
<p>Privacy. Speech presented by the device is not private. Others
|
|
in close proximity can hear the computer's speech. The display
|
|
provides greater privacy.</p>
|
|
|
|
<p class="suggestion" id="G44">4.4. Suggestion: If the user may need
|
|
privacy and the user is not using a headset, use a display rather
|
|
than render speech.</p>
|
|
|
|
<p>Speech uttered by the user is not private. Others in close
|
|
proximity can hear the user. The keyboard/mouse and pen
|
|
provide greater privacy. Also, present asterisks for password
|
|
fields.</p>
|
|
|
|
<p class="suggestion" id="G45">4.5. Suggestion: If the user may need
|
|
privacy while he/she enters data, use a pen or keys.</p>
|
|
|
|
<p>Suggestion examples include:</p>
|
|
|
|
<ul>
|
|
<li>Use keys to enter personal identification numbers.</li>
|
|
<li>When using an automatic bank teller, always use a keypad to
|
|
enter the account number.</li>
|
|
<li>When using a weight management application, enable users to
|
|
enter their weight using a pen or keypad.</li>
|
|
</ul>
|
|
|
|
<p>A related suggestion is to present asterisks instead of displaying
|
|
private information (e.g., passwords) entered by the user.</p>
|
|
|
|
<p>Acceptance in meetings. Pen devices are accepted in meetings.
|
|
They replace a pen and pad of paper for taking notes. Keyboards and
|
|
keypads are becoming acceptable with the widespread use of laptops.
|
|
However, key sounds should be turned off. Usually, devices that
|
|
speak or are spoken to are not accepted in meetings without
|
|
the use of earphones; and, in some cases, earphones may imply
|
|
that the user is not interested in the current discussion.</p>
|
|
|
|
<p class="suggestion" id="G46">4.6. Suggestion: If the device may be
|
|
used during a business meeting or in a public place, and no headset
|
|
is used, then use a pen or keys (with the keyboard sounds turned off).</p>
|
|
|
|
<h3 id="Advertising_Suggestions">Advertising
|
|
Suggestions</h3>
|
|
|
|
<p>Techniques from the field of advertising can be applied to user
|
|
interfaces to make them more appealing and interesting to the
|
|
user.</p>
|
|
|
|
<p>Important messages. Users must notice important messages.</p>
|
|
|
|
<p class="suggestion" id="G47">4.7. Suggestion: Use animation and
|
|
sound to attract the user's attention.</p>
|
|
|
|
<p>A suggestion example is:</p>
|
|
|
|
<ul>
|
|
<li>Animate the delivery of important events and messages so users
|
|
will notice them. Often this type of animation is accompanied with
|
|
sound, which also attracts the users' attention.</li>
|
|
</ul>
|
|
|
|
<p>Caution: Users tire of animation and sound quickly. Do not
|
|
overuse animation and sound.</p>
|
|
|
|
<p>Navigational aids. It is easy for a user to become "lost in space" when using
|
|
multimodal applications.</p>
|
|
|
|
<p class="suggestion" id="G48">4.8. Suggestion: Use landmarks to
|
|
help the user know where he/she is.</p>
|
|
|
|
<p>Example Suggestions include:</p>
|
|
|
|
<ul>
|
|
<li>The "bong" heard at the beginning of long distance telephone
|
|
calls indicates the service is being offered by AT&amp;T.</li>
|
|
<li>The "Intel Inside" audio logo indicates that Intel supplied the
|
|
computer chip inside of a computing device.</li>
|
|
<li>Use the sound volume to indicate how close or far a user is
|
|
from a landmark.</li>
|
|
</ul>
|
|
|
|
<h3 id="Ambience_Suggestion">Ambience Suggestion</h3>
|
|
|
|
<p>Television and movie directors set the mood with set design,
|
|
lighting, and background music. Screen layout, colors, and
|
|
background music also create moods in multimodal user interfaces.
|
|
However, in some cases, moods and emotion may not be appropriate in
|
|
productivity applications.</p>
|
|
|
|
<p class="suggestion" id="G49">4.9. Suggestion: Use audio and graphics
|
|
design to set the mood and convey emotion in games and
|
|
entertainment applications.</p>
|
|
|
|
<p>Suggestion examples include:</p>
|
|
|
|
<ul>
|
|
<li>Use background music to introduce new scenes with the
|
|
appropriate mood. For example, discordant music indicates trouble
|
|
lies ahead, cheerful music signals a scene filled with goodwill,
|
|
and a dirge indicates a depressing scene.</li>
|
|
<li>Use background music to "set the stage." For example, classical
|
|
music for an art museum, calliope music for a circus or fun fair,
|
|
or bagpipes for a lonely scene in a ghost story.</li>
|
|
</ul>
|
|
|
|
<h3 id="Accessibility_Suggestions">Accessibility Suggestions</h3>
|
|
|
|
<p>Some users have special needs that, when fulfilled, enable them
|
|
to gain all the benefits of computing generally available to users
|
|
without special needs. Users with limited or no sight, limited or
|
|
no hearing, or a cognitive impairment should be able to
|
|
access the computer.</p>
|
|
|
|
<p class="suggestion" id="G410">4.10. Suggestion: For each traditional
|
|
output technique, provide an alternative output technique.</p>
|
|
|
|
<p>Suggestion examples include:</p>
|
|
|
|
<ul>
|
|
<li>Upon request, provide audio output for each visual output.
|
|
Reading values in different voices can highlight their value
|
|
and aid comprehension. (Some audio should be presented as sound:
|
|
A few well designed audio sounds, used consistently, will convey
|
|
meaning very clearly and much more quickly than spoken words.)</li>
|
|
<li>Upon request, provide visual output for each audio output.
|
|
Provide "closed captioning" for speech and video output. For
|
|
verbal messages, use text equivalents or flashing icons.</li>
|
|
<li>Consider using tactile controls such as the 12-key touch
|
|
pad, and the four-way navigation cross + center. These can be
|
|
powerful selection devices for the blind.</li>
|
|
</ul>
|
|
|
|
<p class="suggestion" id="G411">4.11. Suggestion: Enable users
|
|
to adjust the output presentation</p>
|
|
|
|
<p>Example suggestions include:</p>
|
|
|
|
<ul>
|
|
<li>Enable users to adjust the lighting and contrast of their
|
|
display for improved readability.</li>
|
|
<li>Enable users to adjust the volume and speed of audio for
|
|
improved hearing.</li>
|
|
<li>Upon request (and when privacy is not a concern), echo the
|
|
character string typed by a user as audio.</li>
|
|
<li>Enable users to turn off background images to avoid
|
|
distraction.</li>
|
|
<li>Enable blind users to turn off the screen. This will increase
|
|
the user's privacy.</li>
|
|
</ul>
|
|
|
|
<p>Designing user interfaces to support accessibility generally
|
|
results in better usability for all users.</p>
|
|
|
|
<h2 id="Summary">Summary</h2>
|
|
|
|
<p>Use these suggestions as a checklist when you first construct a multimodal user
|
|
interface. However, the final decisions about the usefulness and friendliness
|
|
of the user interface rest in an abundance of iterative usability testing. If
|
|
users do not like or cannot use the user interface, it does not matter if the
|
|
suggestions were followed. The user interface needs to be changed so users will
|
|
like and be productive with it, even when some suggestions may not have been followed.
|
|
The users' needs should be the foremost concern for multimodal user interface
|
|
designers and developers.</p>
|
|
<h2 id="acknowledgements">Acknowledgements</h2>
|
|
<p>The following members of the W3C Multimodal Interaction Working Group contributed
|
|
suggestions to this Note:</p>
|
|
<p>Deborah Dahl, W3C Invited Expert, contributed points that were raised during
|
|
a tutorial on Multimodal Interfaces presented at the Spring 2006 SpeechTEK/AVIOS
|
|
meeting.</p>
|
|
<p>Ingmar Kliche, T-Systems, contributed suggestions based on his work with developers
|
|
of multimodal applications at T-Systems.</p>
|
|
<p>Gerald McCobb, IBM, contributed suggestions based on his work with developers
|
|
of multimodal applications at IBM.</p>
|
|
</body>
|
|
</html>
|