<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Common Sense Suggestions for Developing Multimodal User
Interfaces</title>
<style type="text/css">
ul.toc { font-size: 200%; }
ul.toc li { list-style: none; font-size: 80%; margin-top: 0.5em; }
li { margin-top: 0.5em; }
table { width: 100%; border: none; margin-top: 1em;
padding: 0.1em; background-color: gray }
td, th { border: none; background: white; padding: 0.3em; margin: 0.1em }
.suggestion { font-weight: bold; font-style: italic; margin-left: 1em; }
caption { font-style: italic; font-size: 80% }
</style>
<link href="http://www.w3.org/StyleSheets/TR/W3C-WG-NOTE.css"
rel="stylesheet" type="text/css" />
</head>
<body xml:lang="en" lang="en">
<div class="head">
<a href="http://www.w3.org/"><img alt="W3C" height="48"
src="http://www.w3.org/Icons/w3c_home" width="72" /></a>
<h1>Common Sense Suggestions for Developing Multimodal
User Interfaces</h1>
<h2>W3C Working Group Note 11 September 2006</h2>
<dl>
<dt>This version:</dt>
<dd><a
href="http://www.w3.org/TR/2006/NOTE-mmi-suggestions-20060911/">http://www.w3.org/TR/2006/NOTE-mmi-suggestions-20060911/</a></dd>
<dt>Latest version:</dt>
<dd><a
href="http://www.w3.org/TR/mmi-suggestions/">http://www.w3.org/TR/mmi-suggestions/</a></dd>
<dt>Previous version:</dt>
<dd><em>This is the first publication.</em></dd>
<dt>Editors:</dt>
<dd>Jim Larson, Intel</dd>
</dl>
<p class="copyright"><a
href="http://www.w3.org/Consortium/Legal/ipr-notice#Copyright">Copyright</a>
&#169; 2006 <a href="http://www.w3.org/"><acronym
title="World Wide Web Consortium">W3C</acronym></a><sup>&#174;</sup> (<a
href="http://www.csail.mit.edu/"><acronym
title="Massachusetts Institute of Technology">MIT</acronym></a>, <a
href="http://www.ercim.org/"><acronym
title="European Research Consortium for Informatics and Mathematics">ERCIM</acronym></a>,
<a href="http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. W3C <a
href="http://www.w3.org/Consortium/Legal/ipr-notice#Legal_Disclaimer">liability</a>,
<a href="http://www.w3.org/Consortium/Legal/ipr-notice#W3C_Trademarks">trademark</a>
and <a href="http://www.w3.org/Consortium/Legal/copyright-documents">document
use</a> rules apply.</p>
</div>
<!-- end of head div -->
<hr title="Separator for header" />
<h2 id="abstract">Abstract</h2>
<p>This document is based on the accumulated experience of several
years of developing multimodal applications. It provides a
collection of common sense advice for developers of multimodal
user interfaces.</p>
<h2 id="status">Status of this Document</h2>
<p><em>This section describes the status of this document at
the time of its publication. Other documents may supersede this
document. A list of current W3C publications and the latest revision
of this technical report can be found in the
<a href="http://www.w3.org/TR/">W3C technical reports
index</a> at http://www.w3.org/TR/.</em></p>
<p>This document is a W3C Working Group Note. It represents
the views of the W3C Multimodal Interaction Working Group at
the time of publication. The document may be updated as new
technologies emerge or mature. Publication as a Working
Group Note does not imply endorsement by the W3C Membership.
This is a draft document and may be updated, replaced or
obsoleted by other documents at any time. It is inappropriate
to cite this document as other than work in progress.</p>
<p>This document is one of a series produced by the
<a href="http://www.w3.org/2002/mmi/Group/">Multimodal
Interaction Working Group</a> <em>(<a
href="http://cgi.w3.org/MemberAccess/AccessRequest">Member
Only Link</a>)</em>, part of the <a
href="http://www.w3.org/2002/mmi/">W3C Multimodal
Interaction Activity</a>. The MMI activity statement can
be seen at
<a href="http://www.w3.org/2002/mmi/Activity">http://www.w3.org/2002/mmi/Activity</a>.</p>
<p>Comments on this document can be sent to <a
href="mailto:www-multimodal@w3.org">www-multimodal@w3.org</a>,
the public forum for discussion of the W3C's work on
Multimodal Interaction. To subscribe, send an email to
<a href="mailto:www-multimodal-request@w3.org">www-multimodal-request@w3.org</a>
with the word subscribe in the subject line (include the
word unsubscribe if you want to unsubscribe). The
<a href="http://lists.w3.org/Archives/Public/www-multimodal/">archive</a>
for the list is accessible online.</p>
<p>This document was produced by a group operating under the <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/">5 February 2004 W3C Patent Policy</a>. This document is informative only. W3C maintains a <a rel="disclosure" href="http://www.w3.org/2004/01/pp-impl/34607/status">public list of any patent disclosures</a> made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#def-essential">Essential Claim(s)</a> must disclose the information in accordance with <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#sec-Disclosure">section 6 of the W3C Patent Policy</a>.</p>
<h2 id="contents">Table of Contents</h2>
<ul class="toc">
<li><a href="#Four_Major_Principles">Four Major Principles</a></li>
<li>1. <a href="#Satisfy_real-world_constraints">Satisfy
Real-world Constraints</a>
<ul>
<li><a href="#Task-oriented_Suggestions">Task-oriented
Suggestions</a>
<ul>
<li>1.1 <a href="#G11"> Suggestion: For each task, use the easiest
mode available on the device.</a></li>
</ul>
</li>
<li><a href="#Physical_Suggestions">Physical Suggestions</a>
<ul>
<li>1.2 <a href="#G12">Suggestion: If the user's hands are
busy, then use speech.</a></li>
<li>1.3 <a href="#G13">Suggestion: If the user's eyes are busy,
then use speech.</a></li>
<li>1.4 <a href="#G14">Suggestion: If the user may be walking, use
speech for input.</a></li>
</ul>
</li>
<li><a href="#Environmental_Suggestions">Environmental
Suggestions</a>
<ul>
<li>1.5 <a href="#G15">Suggestion: If the user may be in a noisy
environment, then use a pen or keys</a></li>
<li>1.6 <a href="#G16">Suggestion: If the user's manual dexterity
may be impaired, then use speech.</a></li>
</ul>
</li>
</ul>
</li>
<li>2. <a href="#Communicate_Clearly_Concisely_and">Communicate
Clearly, Concisely, and Consistently with Users</a>
<ul>
<li><a href="#Consistency_Suggestions">Consistency Suggestions</a>
<ul>
<li>2.1 <a href="#G21">Suggestion: Phrase all prompts
consistently.</a></li>
<li>2.2 <a href="#G22">Suggestion: Enable the user to speak keyword
utterances rather than natural language sentences.</a></li>
<li>2.3 <a href="#G23">Suggestion: Switch presentation modes only
when the information is not easily presented in the current
mode.</a></li>
<li>2.4 <a href="#G24">Suggestion: Make commands
consistent.</a></li>
<li>2.5 <a href="#G25">Suggestion: Make the focus consistent across
modes</a></li>
</ul>
</li>
<li><a href="#Organizational_Suggestions">Organizational
Suggestions</a>
<ul>
<li>2.6 <a href="#G26">Suggestion: Use audio to indicate the verbal
structure.</a></li>
<li>2.7 <a href="#G28">Suggestion: Use pauses to divide information
into natural "chunks."</a></li>
<li>2.8 <a href="#G29">Suggestion: Use animation and sound to show
transitions.</a></li>
<li>2.9 <a href="#G210">Use voice navigation to reduce the number
of screens.</a></li>
<li>2.10 <a href="#G211">Synchronize multiple modalities
appropriately.</a></li>
<li>2.11 <a href="#G212">Keep the user interface as simple as
possible</a>.</li>
</ul>
</li>
</ul></li>
<li>3. <a href="#Help_Users_Recover_Quickly_and">Help Users
Recover Quickly and Efficiently from Errors</a>
<ul>
<li><a href="#Conversational_Suggestions">Conversational
Suggestions</a>
<ul>
<li>3.1 <a href="#G31">Suggestion: Users tend to use the same mode
that was used to prompt them.</a></li>
<li>3.2 <a href="#G32">Suggestion: If privacy is not a concern,
use speech as output to provide commentary or help.</a></li>
<li>3.3 <a href="#G33">Suggestion: Use directed user interfaces
unless the user is always knowledgeable and experienced in the
domain</a>.</li>
<li>3.4 <a href="#G34">Suggestion: Always provide context sensitive
help for every field and command</a></li>
</ul>
</li>
<li><a href="#Reliability_Suggestions">Reliability Suggestions</a>
<ul>
<li>3.5 <a href="#G35">Suggestion: The user should always be able
to easily determine if the device is listening to the
user.</a></li>
<li>3.6 <a href="#G36">Suggestion: The user should always be able
to easily determine how much longer the device will be
operational.</a></li>
<li>3.7 <a href="#G37">Suggestion: Support at least two input modes
so one input mode can be used when the other cannot.</a></li>
<li>3.8 <a href="#G38">Suggestion: Present words recognized by the
speech recognition system on the display so the user can verify
they are correct.</a></li>
<li>3.9 <a href="#G39">Suggestion: Display the n-best list to
enable easy speech recognition error correction</a></li>
<li>3.10 <a href="#G310">Try to keep response times less than 5
seconds. Inform the user of longer response times</a></li>
</ul>
</li>
</ul>
</li>
<li>4. <a href="#Make_Users_Feel_Comfortable">Make Users
Comfortable</a>
<ul>
<li><a href="#SpeakingMode">Listening mode</a>
<ul>
<li>4.1 <a href="#G41">Suggestion: Speak after pressing a speak key
that automatically releases after the user finishes
speaking.</a></li>
</ul>
</li>
<li><a href="#System_Status">System Status</a>
<ul>
<li>4.2 <a href="#G42">Suggestion: Always present the current
system status to the user.</a></li>
</ul>
</li>
<li><a href="#Human_memory_Constraints">Human-memory
Constraints</a>
<ul>
<li>4.3 <a href="#G43">Suggestion: Use the screen to ease stress on
the user's short-term memory.</a></li>
</ul>
</li>
<li><a href="#Social_Suggestions">Social Suggestions</a>
<ul>
<li>4.4 <a href="#G44">Suggestion: If the user may need privacy,
use a display rather than render speech.</a></li>
<li>4.5 <a href="#G45">Suggestion: If the user may desire privacy,
use a pen or keys.</a></li>
<li>4.6 <a href="#G46">Suggestion: If the device may be used during
a business meeting, then use a pen or keys (with the keyboard
sounds turned off).</a></li>
</ul>
</li>
<li><a href="#Advertising_Suggestions">Advertising Suggestions</a>
<ul>
<li>4.7 <a href="#G47">Suggestion: Use animation and sound to
attract the user's attention.</a></li>
<li>4.8 <a href="#G48">Suggestion: Use landmarks to help the user
know where he or she is.</a></li>
</ul>
</li>
<li><a href="#Ambience_Suggestion">Ambience Suggestion</a>
<ul>
<li>4.9 <a href="#G49">Suggestion: Use audio and graphics design to
set the mood and convey emotion in games and entertainment
applications.</a></li>
</ul>
</li>
<li><a href="#Accessibility_Suggestions">Accessibility
Suggestions</a>
<ul>
<li>4.10 <a href="#G410">Suggestion: For each traditional output
technique, provide an alternative output technique.</a></li>
<li>4.11 <a href="#G411">Suggestion: Enable users to adjust the
output presentation.</a></li>
</ul>
</li>
</ul>
</li>
<li><a href="#Summary">Summary</a></li>
</ul>
<hr title="Separator for introduction" />
<h2 id="introduction">Introduction</h2>
<p>When fonts were first introduced, many messages looked like ransom notes from
kidnappers. When color was introduced, many reports looked like they had barely
survived an explosion in a paint factory. To avoid these annoying user interfaces,
developers adopted suggestions and best practices for using fonts and colors.</p>
<p>With the introduction of multiple modes of input (voice, pen, and
keys), inexperienced developers may design loud, confusing, and
annoying user interfaces that result in low user performance and
high user discontent. This document attempts to enumerate a
collection of commonsense suggestions for developing high
performance and high preference multimodal user interfaces. We have
collected suggestions, techniques, and principles from many diverse
disciplines to generate the following suggestions for developing
multimodal user interfaces.</p>
<p>This set of suggestions originated in a brainstorming session with some of my
students at the Oregon Graduate Institute of the Oregon Health and Science
University. I categorized the suggestions and showed them to several multimodal
application developers, who added additional suggestions. These have been reviewed
and revised by the W3C Multimodal Interaction Working Group. The suggestions
will be reviewed by other relevant W3C working groups including Accessibility,
Internationalization, and Mobile Web Initiative Best Practices.</p>
<p>Again, these are commonsense suggestions. You may think that no
one would ever develop user interfaces that violate these
suggestions, but developers have violated commonsense suggestions
before and will likely do so again. Use these suggestions as a
checklist when you design a multimodal interface. These suggestions
should help you to construct a multimodal user interface that
improves user performance and satisfaction, so that your intended users can
use the application easily and effectively.</p>
<p>These suggestions can be used as follows:</p>
<ol>
<li>
<p>Review the suggestions before designing a multimodal user
interface. The suggestions will assist you in making decisions as
you design your multimodal user interface.</p>
<p>Review the suggestions after designing a multimodal user interface. Use
the suggestions as a checklist to assess your design after it is completed.
Some designers rank their user interface with respect to each suggestion,
giving a high score if the user interface conforms to the suggestions and
a low score if it does not.</p>
</li>
<li>
<p>The suggestions are only suggestions. Every suggestion has situations
in which it should be overridden, and these suggestions are no
exception. If there is a good reason for not following a
suggestion, then ignore it.</p>
</li>
<li>
<p>Some users will want to configure their user interface to satisfy their personal
preferences. We encourage the use of configuration dialogs to help the user
achieve the configuration that is best for him or her. We also note that
many users are afraid of configuration, and are happy to use the user interface
"as is," without ever configuring the system.</p>
</li>
</ol>
<h2 id="Four_Major_Principles">Four Major Principles</h2>
<p>The suggestions are organized into four major principles of user
interface design. The following four principles determine how
quickly users are able to learn and how effectively they are able
to perform desired tasks with the user interface:</p>
<ol>
<li>Satisfy real-world constraints</li>
<li>Communicate clearly, concisely, and consistently with users</li>
<li>Help users recover quickly and efficiently from errors</li>
<li>Make users comfortable</li>
</ol>
<p>Multimodal user interface developers should follow the above four principles
and apply the following suggestions to avoid many of the potential usability
problems caused by using modes incorrectly.</p>
<h2 id="Satisfy_real-world_constraints">1. Satisfy
Real-world Constraints</h2>
<p>Real-world constraints limit what the users may achieve with an
application. These limitations may be due to the nature of the task
the user intends to perform, other activities the user is
performing, physical limitations of the user, and conditions of the
environment in which the user will perform the task. The user
interface should be designed to compensate for these
limitations.</p>
<h3 id="Task-oriented_Suggestions"> Task-oriented
Suggestions</h3>
<p>The nature of the task influences the mode (or modes) users select to perform
the task. Tasks that are easy to perform in one mode may be difficult or impossible
to perform using another mode. Task-oriented suggestions indicate which tasks
lend themselves best to which modes of data entry.</p>
<p>New mobile devices will enable users to enter data by speaking
into a microphone, writing with a stylus, and pressing keys on a
small keypad. These input modes can be used to perform the
following four basic manipulation tasks:</p>
<ul>
<li>Select objects (e.g., menu options)</li>
<li>Enter text</li>
<li>Enter symbols (e.g., part of mathematical equations)</li>
<li>Enter sketches or illustrations</li>
</ul>
<p>There are other basic tasks, but the tasks mentioned above are
performed most frequently in common applications using handheld
computers.</p>
<p>Table 1 summarizes how users perform the four basic tasks using
the following popular input modes:</p>
<ul>
<li><em>Voice</em> - The user speaks into a microphone.</li>
<li><em>Pen</em> - The user manipulates a pen to write, draw, or
point.</li>
<li><em>Keys</em> - The user manipulates a keyboard or keypad by
pressing keys.</li>
<li><em>Mouse/Joystick</em> - The user manipulates a mouse or
joystick to point, click, drag, and select.</li>
</ul>
<table summary="5 columns">
<caption>
Table 1: Performing the four basic manipulation tasks using four popular input
modes, ranked from easiest (1) to most difficult (4)
</caption>
<tr>
<th>Content Manipulation Task</th>
<th>Voice Mode</th>
<th>Pen Mode</th>
<th>Keyboard/keypad</th>
<th>Mouse/Joystick</th>
</tr>
<tr>
<td>Select objects</td>
<td>(3) Speak the name of the object</td>
<td>(1) Point to or circle the object</td>
<td>(4) Press keys to position the cursor on the object and press
the <i>select key</i></td>
<td class="c8" valign="top">(2) Point to and click on the object or drag to
select text</td>
</tr>
<tr>
<td>Enter text</td>
<td>(2) Speak the words in the text</td>
<td>(3) Write the text</td>
<td>(1) Press keys to spell the words in the text</td>
<td>(4) Spell the text by selecting letters from a soft
keyboard</td>
</tr>
<tr>
<td>Enter symbols</td>
<td>(3) Say the name of the symbol and where it should be
placed.</td>
<td>(1) Draw the symbol where it should be placed</td> <td>(4) Enter one or more characters that together represent the symbol</td>
<td class="c8" valign="top">(2) Select the symbol from a menu and
indicate where it should be placed</td>
</tr>
<tr>
<td>Enter sketches or illustrations</td>
<td>(2) Verbally describe the sketch or illustration</td>
<td>(1) Draw the sketch or illustration</td>
<td>(4) Impossible</td> <td>(3) Create the sketch by moving the mouse so it leaves a trail (similar
to an Etch-a-Sketch&trade;)</td>
</tr>
</table>
<p>Select objects. Object selection is easy with a pen: just point
to or circle the desired object. When using voice, just say the
name of the desired object, assuming the object has a name. With a
keyboard, press keys to position the cursor on the desired object
and press the <em>select</em> key.</p>
<p>Enter text. Each of the four modes can be used for text entry: the user speaks
words into a microphone, handwrites the words using a pen, presses keys on a
keypad to spell the words, or selects letters from a soft keyboard. Most users
can speak and write easily. However, some training and practice may be necessary
to use a keyboard or mouse efficiently.</p>
<p>Enter symbols. Entering mathematical equations, special
characters, and signatures is easy with a pen, awkward and
time-consuming with a mouse, and most difficult with speech.</p>
<p>Enter sketches or illustrations. Drawing simple illustrations
and maps is easy with a pen, awkward with a mouse, and nearly
impossible with speech. When speaking, users must verbally describe
the illustration or map.</p>
<p>Each input mode has its strengths and weaknesses. Voice is good
for describing attributes. The pen is good for pointing and
sketching. Keys are good for entering text, numbers, and symbols. A
useful and efficient multimodal system uses the appropriate mode
for each entry.</p>
<p class="suggestion" id="G11"> 1.1. Suggestion: For each task, use
the easiest modes available on the device.</p>
<p>Suggestion examples include:</p>
<ul>
<li>To select an icon, use a pen or stylus to point to the
icon. (To aid in object selection, highlight the object when
the cursor hovers above it. Highlight all selected objects.)</li>
<li>To enter text, use voice or a keypad.</li>
<li>To enter the symbols for a mathematical equation, use a pen
(or an onscreen keyboard with options for each symbol).</li>
<li>To draw a map, use a pen.</li>
</ul>
<h3 id="Physical_Suggestions">Physical Suggestions</h3>
<p>Different physical devices exhibit different usability
characteristics. The device's size, shape, and weight affect how it
may be used. Most important, the placement of a microphone and
speaker, the size of the display and writing surface, and the size
of keys in a keypad all affect the ease with which a user can enter
information by speaking, writing, or pressing keys. Table 2
summarizes the four modes of input with respect to physical
usability issues.</p>
<table summary="5 columns">
<caption>
Table 2: Physical usability issues for the four most popular modes of information
entry
</caption>
<tr>
<th>Device Usability Issues</th>
<th>Voice Mode</th>
<th>Pen Mode</th>
<th>Keystrokes Mode</th>
<th>Mouse/joystick mode</th>
</tr>
<tr>
<td>Required number of user hands</td>
<td>None (plus possibly one to hold the device)</td>
<td>One (plus possibly one to hold the device)</td>
<td>One or two</td>
<td>One</td>
</tr>
<tr>
<td>Required use of eyes</td>
<td>No</td>
<td>Yes</td>
<td>Frequently, but some users can operate familiar keyboards without looking
at them</td>
<td>Yes</td>
</tr>
<tr>
<td>Portable</td>
<td>Yes, especially when walking</td>
<td>Yes, but difficult while walking</td>
<td>Yes, but difficult while walking</td>
<td>Yes, but difficult while walking</td>
</tr>
</table>
<p>Required number of user hands. A user's hands may be required when operating
machinery, assembling parts into a device, or creating a work of art. No
hands are needed to speak and listen to a voice user interface. A pen requires
one hand to hold the pen. A mouse requires one hand to hold the mouse and in
most cases requires a surface for the mouse to rest on. By their nature, handheld
devices also may require a hand to hold the device. A 12-key keypad requires
one hand to enter data, while a QWERTY keypad requires two hands to enter data
efficiently. Some users become skilled at holding a small QWERTY keyboard with
both hands and using their thumbs to type.</p>
<p class="suggestion" id="G12"> 1.2. Suggestion: If the user's hands
are unavailable for use, then make speech available.</p>
<p>Suggestion examples include:</p>
<ul>
<li>If the user is driving a car, use speech to ask for directions
to a restaurant.</li>
<li>If the user is repairing a machine, use speech to ask for the
next repair instruction.</li>
<li>If the user is preparing a meal, use speech to ask for the next
recipe instruction.</li>
</ul>
<p>Required use of eyes. A user's eyes should be focused primarily
on the road while driving a vehicle, on a physical device to be
constructed or repaired, or on subjects and their activities while
observing an experiment. Usually, users must look at what they are
writing with a pen or typing on a keypad. However, the user's eyes
may be free to observe his or her environment while speaking.</p>
<p class="suggestion" id="G13"> 1.3. Suggestion: If the user's eyes
are busy or not available, then make speech available.</p>
<p>Suggestion examples include:</p>
<ul>
<li>If the user is driving a car, use speech to manipulate a
radio.</li>
<li>If a guard is watching a TV monitor, use speech or hand
controls to manipulate the camera.</li>
<li>If a scientist is looking into a microscope, use speech to
dictate his or her observations.</li>
</ul>
<p>Portable. Speech and pen devices are very portable. Users may
use them while sitting, standing, walking, and sometimes while
running. Traditionally, keyboard devices are used only while the
user is not moving. Keypads requiring only one hand, like those
frequently found on handheld devices and telephones, can be used
while sitting or standing.</p>
<p class="suggestion" id="G14"> 1.4. Suggestion: If the user may be
walking, then make speech available</p>
<p>Suggestion examples include:</p>
<ul>
<li>While walking the streets of New York, use speech to ask
directions to the nearest subway station. (Both voice and a map may
be used to present directions to the user.)</li>
<li>While shopping in a department store, use speech to ask for the
location of a specific item.</li>
</ul>
<h3 id="Environmental_Suggestions">Environmental
Suggestions</h3>
<p>People work in environments that may not be ideal for some modes of user interfaces.
The environment might be noisy or quiet, hot or cold, light or dark, or moving
or stationary with a variety of distractions and possible dangers. Multimodal
user interfaces must be designed to work in the environments where they will
be used. Table 3 summarizes the environmental usability issues with respect
to four popular input modes.</p>
<table summary="5 columns">
<caption>
Table 3: Environmental usability issues for the four popular modes of information
entry
</caption>
<tr>
<th>Device Usability Issues</th>
<th>Voice Mode</th>
<th>Pen Mode</th>
<th>Keystroke Mode</th>
<th>Mouse/joystick mode</th>
</tr>
<tr>
<td>Noisy environment</td>
<td>Works poorly in a noisy environment</td>
<td>Works well in a noisy environment</td>
<td>Works well in a noisy environment</td>
<td>Works well in a noisy environment</td>
</tr>
<tr>
<td>Other environmental concerns</td>
<td>Works well independently of gloves</td>
<td>Does not work well when users must wear thick gloves</td>
<td>Does not work well when users must wear thick gloves</td>
<td>Does not work well when users must wear thick gloves</td>
</tr>
</table>
<p>Noisy environment. Because speech recognition systems pick up
background sounds, they often make mistakes if the user speaks in a
noisy environment.</p>
<p class="suggestion" id="G15"> 1.5. Suggestion: If the user may be in a noisy environment,
then use a pen, keys, or mouse.</p>
<p>Suggestion examples include:</p>
<ul>
<li>Use a pen or keys to enter a telephone number when in a noisy
airport.</li>
<li>Use a pen or keys to enter data when in a noisy shop.</li>
</ul>
<p>Other environmental concerns: Pen and keyboard devices are
difficult to use if the user must wear thick gloves, such as in a cold
environment or when protecting hands from rough objects.</p>
<p class="suggestion" id="G16"> 1.6. Suggestion: If the user's manual
dexterity may be impaired, then use speech.</p>
<p>A suggestion example is:</p>
<ul>
<li>If the user works in a cold meat locker, works on a construction
site and handles rough material, or works with dangerous chemicals
and must wear gloves, then use voice to enter data.</li>
</ul>
<h2 id="Communicate_Clearly_Concisely_and">2.
Communicate Clearly, Concisely, and Consistently with Users</h2>
<p>Efficient communication is required if teams of people are to
achieve success in joint activities. Likewise, effective
communication between the user and the device is necessary for
achieving the user's goals. The multimodal user interface is the
conduit for all communication between the user and the device.
Communication should be clear and concise, avoiding ambiguities and
confusion. Communication styles should be consistent and systematic
so users know what to expect and can leverage the patterns and
rhythms in the dialog.</p>
<h3 id="Consistency_Suggestions">Consistency
Suggestions</h3>
<p>Consistency enables users to leverage conversational patterns to
accelerate their interaction. For example, users can follow a
consistent conversational rhythm without having to pause to adjust
to heterogeneous dialog styles.</p>
<p>Consistent prompts. If prompts are worded inconsistently, then
users must pause to decode each wording format. Users must spend
additional time and mental effort to respond to differently
structured questions. When prompts are consistently worded, users
can concentrate on the answers to questions rather than trying to
understand the questions.</p>
<p class="suggestion" id="G21">2.1. Suggestion: Phrase all prompts
consistently.</p>
<p>Suggestion examples include (a brief VoiceXML sketch of the
recommended prompt format follows the list):</p>
<ul>
<li>To be consistent and encourage experienced users to barge-in,
consider using the following general voice prompt format:
<ol>
<li><em>Speak the name of the menu or form item.</em> The menu name
serves as a landmark. A <em>landmark</em> is a speech or non-speech
cue that marks a specific location within the dialog structure. By
providing a name, such as "main menu" or "thermostat," callers can
jump to this menu by speaking the menu name or return to the menu
when they get confused or lost. Also, repeating the menu name to
the caller confirms that the caller has reached the correct menu.
However, if the name is contained within the question and is not
needed as a landmark, then skip speaking the name.</li>
<li><em>Ask a question.</em> Often, this can be achieved with two
or three words. This should be enough to remind experienced callers
to respond without listening to the enumerated options. Novice
callers will listen to the enumerated options before speaking their
selection.</li>
<li><em>Enumerate options.</em> If there are a small number of
valid responses, then list the options so novice callers can hear
and select their desired option. However, if the user is likely to
know the set of valid responses, then skip this operation.</li>
</ol>
</li>
<li style="list-style: none">
<p>Experienced callers can barge-in after they hear the question,
while novice callers will respond after they hear the entire menu
option list.</p>
</li>
<li>Use the same terms in all prompts, whether the terms are text,
voice, or multimedia prompts.</li>
</ul>
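<p>As a minimal sketch (the "main menu" landmark and the option names
below are hypothetical, not part of this suggestion), the three-part
prompt format might be written in VoiceXML as follows:</p>
<pre>
&lt;menu id="main"&gt;
  &lt;prompt&gt;
    Main menu.                               &lt;!-- 1. landmark --&gt;
    What would you like to do?               &lt;!-- 2. short question --&gt;
    &lt;break time="500ms"/&gt;
    Say account balance, transfer funds, or operator.   &lt;!-- 3. options --&gt;
  &lt;/prompt&gt;
  &lt;choice next="#balance"&gt;account balance&lt;/choice&gt;
  &lt;choice next="#transfer"&gt;transfer funds&lt;/choice&gt;
  &lt;choice next="#operator"&gt;operator&lt;/choice&gt;
&lt;/menu&gt;
</pre>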
<p><strong>Consistent command format.</strong> The current state of the art of
speech recognition and natural language technology does not always accurately
recognize and understand arbitrary complete sentences. Keyword recognition is
much faster and more accurate. Many tasks lend themselves to keyword commands better
than natural language sentences.</p>
<p class="suggestion" id="G22">2.2. Suggestion: Enable the user to
speak keyword utterances rather than natural language
sentences.</p>
<p>Switching modes. Switching modes can be jarring and sometimes
surprising. For example, a user who has just answered three verbal
questions will be surprised if a textual question suddenly pops
up.</p>
<p class="suggestion" id="G23">2.3. Suggestion: Switch presentation
modes only when the information is not easily presented in the
current mode.</p>
<p>Suggestion examples include:</p>
<ul>
<li>If the user repeatedly experiences errors when using voice or
handwriting recognition, consider switching to a text mode. Text
mode often avoids the recognition errors occurring because of
heavily-accented speakers or poor handwriting.</li>
<li>Switch from audio to text output if the result of a verbal
query is large and the user is likely to become anxious listening
to the result.</li>
<li>Switch from audio output to graphical output if the result can
be structured as a table, graphic, or other illustration.</li>
</ul>
<p><strong>Command consistency.</strong> Using different commands
for the same purpose confuses users, as does using the same command
for multiple functions.</p>
<p class="suggestion" id="G24">2.4. Suggestion: Make commands
consistent.</p>
<p>Users tend to use the wording which is visually presented. Include the command
name on buttons and other navigational elements in the grammar for the voice
mode. All voice commands that achieve the same functionality should have the
same grammar. Users tend to use known commands from their daily use of computers.
Incorporate these commands into the grammar, even if they are not visually presented
in the GUI.</p>
<p>Suggestion examples include (a VoiceXML grammar sketch follows the list):</p>
<ul>
<li>If a button is labeled "exit," then "exit" should be in the
grammar for the voice mode.</li>
<li>If a user may say "exit" from each of three visual pages,
then the grammar for this command should be the same for all
three pages.</li>
<li>If users often use "exit" in many other applications, then use
"exit" in this application so that the user can apply knowledge
from other applications to this application.</li>
</ul>
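<p>A hedged VoiceXML sketch of a consistent "exit" command is shown
below; the target document name is a hypothetical placeholder. Declaring
the link once at document (or application) scope keeps the same grammar
active on every page:</p>
<pre>
&lt;!-- "exit" matches the label on the visual exit button and is active
     in every dialog of the document. --&gt;
&lt;link next="goodbye.vxml"&gt;
  &lt;grammar version="1.0" mode="voice" root="exitcmd"
           xmlns="http://www.w3.org/2001/06/grammar"&gt;
    &lt;rule id="exitcmd"&gt;exit&lt;/rule&gt;
  &lt;/grammar&gt;
&lt;/link&gt;
</pre>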
<p class="suggestion" id="G25">2.5. Suggestion: Make the focus
consistent across modes.</p>
<p>If the user is prompted to speak a value for a field, then
highlight that field in the GUI.</p>
<p>Suggestion examples:</p>
<ul>
<li>When filling out a form, highlight the field in the GUI when
the voice user interface prompts the user to speak a value for that
field.</li>
<li>Consistently highlight visual items in focus and consistently
highlight selected visual items.</li>
</ul>
<h3 id="Organizational_Suggestions">Organizational
Suggestions</h3>
<p>Grade school teachers always teach that organizing your thoughts before writing
a composition will dramatically improve its understandability. The same principle
applies to user interfaces. Organizing information and transitioning between
topics will improve the users' comprehension of and performance with the multimodal
interface. Information should be structured and organized in ways that are familiar
to the user.</p>
<p>Content structure. Audio cues help users understand audio
information. For example, use a click to introduce each item of a
bulleted list, increase the volume to emphasize highlighted text,
or use a whisper to speak parenthetical text.</p>
<p class="suggestion" id="G26">2.6. Suggestion: Use audio and/or
visual icons to indicate the content structure.</p>
<p>There are generally accepted icons to represent content
structure. For example, a clock may indicate that an application is
busy, arrows may represent next and previous pages, etc.</p>
<p>Because there are no standard assignments of meanings for sounds, common sense
and user testing should guide the dialog designer. Here are suggestions for
items that lend themselves to non-speech sounds (a short prompt sketch
follows the list):</p>
<ul>
<li><em>Links</em> - Identify words that the user may say to jump to
another VoiceXML document by introducing them with a unique
sound.</li>
<li><em>Turn-taking tone</em> - A tone signals to the user that the
system has finished talking and that the user may speak.</li>
<li><em>Brand earcon</em> - Many businesses have audio icons, such
as the distinctive bong sound of AT&amp;T, the three tones of NBC,
and the four tones for "Intel Inside." These audio icons can be
presented to the user to announce that the user has arrived at the
company's site.</li>
<li><em>Feedback</em> - The user needs to know if the speech application is
processing data or waiting for input. A non-speech sound, such as a percolating
coffee pot, is ideal for informing the user that the speech application system
is busy processing. It also reassures the user that the application is busy
and has not terminated abnormally. A bell tone is ideal for informing the
user that the system is ready for the user's input.</li>
<li><em>Barge-in temporarily disabled</em> - Designers may disable
barge-in when presenting advertisements or legal notices. To
prevent the user from barging-in, signal the user that barge-in is
temporarily disabled by presenting "barge-in disabled" and
"barge-in enabled" audio icons.</li>
<li><em>Bulleted list</em> - A short sound snippet can be used at
the beginning of each item on a list.</li>
</ul>
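<p>The following prompt fragment is a sketch of the bulleted-list and
turn-taking sounds described above; the audio file names are
hypothetical placeholders:</p>
<pre>
&lt;prompt&gt;
  You can:
  &lt;audio src="sounds/bullet.wav"/&gt; check messages,
  &lt;audio src="sounds/bullet.wav"/&gt; review your calendar, or
  &lt;audio src="sounds/bullet.wav"/&gt; call a contact.
  &lt;audio src="sounds/ready.wav"/&gt;  &lt;!-- turn-taking tone: the user may now speak --&gt;
&lt;/prompt&gt;
</pre>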
<p>Chunks of information. Users comprehend audio information more
easily if it is presented as blocks, or chunks, of information. For
example, users may not recognize "six, one, seven, two, two, five,
four, three, seven, six" as a telephone number, but they will
recognize "six, one, seven (pause) two, two, five (pause) four,
three, seven, six" as either an American or Canadian telephone
number.</p>
<p class="suggestion" id="G28">2.7. Suggestion: Use pauses to divide
information into natural "chunks."</p>
<p>Suggestion examples include:</p>
<ul>
<li><em>Chunking numbers</em> - Phone numbers, identification
numbers, and other sequences of numbers are frequently clustered
into groups of two or three numbers when spoken. A short pause
between the groups helps users comprehend and remember the
number more easily. For example, North American telephone numbers are
frequently spoken in three chunks: the three-digit area code, the
three-digit exchange number, and the four-digit subscriber
number (see the prompt sketch after this list).</li>
<li><em>Pause between instructions and options</em> - Placing a
pause between instructions and the options for prompts signals the
user when the instructions are complete. Experienced users may
barge-in after the instructions, but before hearing the list of
options.</li>
</ul>
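<p>A minimal prompt sketch using standard SSML pauses inside a VoiceXML
prompt to chunk a (made-up) North American telephone number into area
code, exchange, and subscriber number:</p>
<pre>
&lt;prompt&gt;
  The number is
  six one seven &lt;break time="300ms"/&gt;
  two two five &lt;break time="300ms"/&gt;
  four three seven six.
&lt;/prompt&gt;
</pre>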
<p>Transitions. A user may become disoriented if the information
content suddenly changes. Writers are well aware of the need for
transitions between topics. Similar transitions are needed for
visual and verbal information.</p>
<p class="suggestion" id="G29">2.8. Suggestion: Use animation and
sound to show transitions.</p>
<p>Suggestion examples:</p>
<ul>
<li>Display a turning page and present an audio sound to indicate
the transition between two pages.</li>
<li>Navigation: One study has shown that mobile users drop off at
the rate of 50% with each screen change. Voice navigation can be
used to reduce the number of screens.</li>
</ul>
<p class="suggestion" id="G210">2.9. Use voice navigation to reduce
the number of screens.</p>
<p><strong>Modality synchronization.</strong> Multiple modalities
should be appropriately synchronized. Here are some examples:</p>
<ol>
<li>Stop talking/listening when the visual browser is minimized or exited.</li>
<li>The visual and verbal browsers should present the same
information at the same time.</li>
<li>In a multifield form, the focus field of the visual browser
should correspond to the field prompt currently presented by the
verbal browser.</li>
</ol>
<p class="suggestion" id="G211">2.10. Synchronize multiple
modalities appropriately.</p>
<p><strong>Simplicity.</strong> Complex user interfaces are
confusing to the user and lead to errors. While this rule applies
to all user interfaces, it is especially important for multimodal
user interfaces.</p>
<p class="suggestion" id="G212">2.11. Keep the user interface as
simple as possible.</p>
<h2 id="Help_Users_Recover_Quickly_and">3.
Help Users Recover Quickly and Efficiently from Errors</h2>
<p>The user interface must help users recover quickly and
efficiently from errors. All users, especially novice users, will
occasionally fail to respond to a prompt appropriately. The user
interface must be designed to detect such errors and assist users
to recover naturally. The multimodal interface also should help
users learn how to use the user interface to achieve the desired
results quickly and efficiently.</p>
<h3 id="Conversational_Suggestions">Conversational
Suggestions</h3>
<p>Principles of conversational discourse suggest that the
suggestions for the nature, content, and format of information
exchanged between two humans may be applied to information
exchanged between a human and a computer.</p>
<p>Reflexive principle. The reflexive principle states that people
tend to respond in the same manner that they are prompted. For
example, if users are given long rambling prompts, they will likely
reply with long rambling responses.</p>
<p class="suggestion" id="G31">3.1. Suggestion: Enable users to use
the same mode that was used to prompt them.</p>
<p>Suggestion examples include:</p>
<ul>
<li>When spoken to, users will use their voices to respond.</li>
<li>When presented with a drawing, users will respond with another
drawing.</li>
<li>When presented with text, users will type their responses.</li>
</ul>
<p>Verbal help. Speech is more immediate and does not obscure
screen contents.</p>
<p class="suggestion" id="G32">3.2. Suggestion: If privacy is not a
concern, use speech as output to provide commentary or help.</p>
<p>Suggestion examples include:</p>
<ul>
<li>Use speech to present short messages such as help
information.</li>
<li>Use keys to enter personal identification numbers.</li>
<li>When using an automatic bank teller, always use a keypad to
enter the account number.</li>
<li>When using a weight management application, enable users to
enter their weight using a pen or keypad.</li>
</ul>
<p>When privacy is not a concern, consider using speech for help
and error messages about the current contents of the display,
possibly augmenting the display by highlighting the area in which
the error occurs.</p>
<p><strong>Directed user interface</strong>. While user-directed
and mixed initiative user interfaces may be useful for experienced
users, they are confusing and inhibiting for novice users. Directed
user interfaces always work for all classes of users. Directed
search provides the user with results they want quickly and
accurately.</p>
<p class="suggestion" id="G33">3.3. Suggestion: Use directed user
interfaces unless the user is always knowledgeable and experienced
in the domain.</p>
<p><strong>Context sensitive help</strong>. As an application becomes more complex,
offering the user more choices, offering help becomes mandatory. For a simple
application with fewer choices, the user may need help only the first time the
application is run. A novice user may not know the meaning of a field or command.</p>
<p class="suggestion" id="G34">3.4. Suggestion: Always provide context
sensitive help for every field and command.</p>
<p>Enable users to learn the purpose and function of every field,
and what values can be entered into the field.</p>
<p>Suggestion examples include:</p>
<ul>
<li>It may not be clear to the user whether the year field of a date should be two
digits or four digits. Context sensitive help should provide instructions
and possibly an example to clarify this.</li>
<li>Enable the user to ask "what can I say" or "what can I say
here" as well as "help." Show a list of available commands and/or
options.</li>
</ul>
<p>One advantage of verbal and visual modalities is that help can be offered using
speech and/or GUI interfaces.</p>
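<p>As an illustration, VoiceXML attaches context sensitive help directly
to each field through its help event handler; the field name, grammar
file, and wording below are hypothetical:</p>
<pre>
&lt;field name="year"&gt;
  &lt;grammar src="year.grxml" type="application/srgs+xml"/&gt;
  &lt;prompt&gt;In what year were you born?&lt;/prompt&gt;
  &lt;!-- Played whenever the user says "help" in this field. --&gt;
  &lt;help&gt;
    Please say the year as four digits, for example nineteen eighty four.
  &lt;/help&gt;
&lt;/field&gt;
</pre>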
<h3 id="Reliability_Suggestions">Reliability
Suggestions</h3>
<p>Few situations are more frustrating to users than to have a
device at hand but not be able to use it.</p>
<p><strong>Operational status</strong>. Users need to know when the
device is listening to them speak and when the device is not
listening.</p>
<p class="suggestion" id="G35">3.5. Suggestion: The user always
should be able to easily determine if the device is listening to
the user.</p>
<p>Operational status can be presented as a light or icons
indicating the operational status of the device.</p>
<p>Power status. One especially frustrating situation is when the
device suddenly goes dead because the batteries are low.</p>
<p class="suggestion" id="G36">3.6. Suggestion: For devices with
batteries, the user should always be able to easily determine how much
longer the device will be operational.</p>
<p>A suggestion example is:</p>
<ul>
<li>Use one or more icons or colors to present the operational status
of the device. Use a green icon to
indicate that the device is operational and yellow to indicate
that power is in short supply. Better yet, display a meter or clock
indicating how much longer the battery will continue to support the
device. (Note: because about 6 per cent of the male
population has some degree of color blindness, always use another
feature in addition to color. For example, use a "walking person"
icon that is green to indicate that the device is operational, and a battery
icon that is nearly empty and yellow to indicate that
the power is in short supply.)</li>
</ul>
<p>Backup mode. In Section 1, Table 1 summarized the various
strengths and weaknesses of using voice, pen, and keys as input
methods. Because user tasks, environmental situations, and user
distractions change, users should be able to switch modes when it
becomes inconvenient or impossible to use the primary mode of
input.</p>
<p class="suggestion" id="G37">3.7. Suggestion: Support at least two
input modes so one input mode can be used when the other
cannot.</p>
<p>Suggestion examples include:</p>
<ul>
<li>Enable the user to use a keypad when speaking or using a pen in
the event that the speech or handwriting recognition engine
fails.</li>
<li>Enable the user to speak or type if the user loses the pen or
input stylus.</li>
<li>Enable the user to speak if rain or snow will damage a
keypad.</li>
</ul>
<p><strong>Visual feedback</strong>. Sometimes speech recognition
systems misrecognize the words that a user speaks. It is useful to
present words recognized by the speech recognition system to the
user, who can verify their correctness. In speech-only systems, the
tiresome phrase "Did you say ...?" is the only option. However, in
multimodal systems, the recognized word can be presented on a
display.</p>
<p class="suggestion" id="G38">3.8. Suggestion: Present words
recognized by the speech recognition system on the display so the
user can verify they are correct.</p>
<p><strong>Correction mode</strong>. When the speech recognition
fails, the user needs to correct the error by entering the correct
word. While the user could simply speak again, a better approach is
to display the n-best list (the list of words that the speech
recognizer heard but did not select) so the user can select from
among these options rather than speak again (and possibly
experience the same error).</p>
<p class="suggestion" id="G39">3.9. Suggestion: Display the n-best
list to enable easy speech recognition error correction.</p>
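<p>In VoiceXML, for example, the recognizer's alternative hypotheses
become available through the application.lastresult$ array once the
maxnbest property is raised above one; a multimodal browser can hand
these strings to the visual layer for display. The field name and
grammar file below are hypothetical:</p>
<pre>
&lt;property name="maxnbest" value="3"/&gt;
&lt;field name="city"&gt;
  &lt;grammar src="cities.grxml" type="application/srgs+xml"/&gt;
  &lt;filled&gt;
    &lt;!-- application.lastresult$ holds up to maxnbest hypotheses;
         in a multimodal system these strings would be rendered on the
         display so the user can select the correct one rather than
         speak again. --&gt;
    &lt;log&gt;Best guess: &lt;value expr="application.lastresult$[0].utterance"/&gt;&lt;/log&gt;
  &lt;/filled&gt;
&lt;/field&gt;
</pre>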
<p><strong>Response time.</strong> Response times greater than 5
seconds will significantly reduce usage. If a response time exceeds
this limit, inform the user that the computer is busy processing
the request.</p>
<p class="suggestion" id="G310">3.10. Try to keep response times less
than 5 seconds. Inform the user of longer response times.</p>
<h2 id="Make_Users_Feel_Comfortable">4. Make Users
Feel Comfortable</h2>
<p>Users often judge a computer application by its user interface.
If users do not like the user interface, the application will not
be used. If the user interface is not easy to learn and easy to
use, the application cannot be used successfully.</p>
<h3 id="SpeakingMode">Listening mode</h3>
<p>There are several possible listening modes, including:</p>
<ul>
<li><em>Always listening</em> - Generally this requires an
attention word that signals the system that the user is about to
speak. Without first speaking the attention word, the system
assumes that the user is speaking to another person and does not
listen.</li>
<li><em>Push to speak</em> - The user must remember to hold down
the speak key while speaking to the computer.</li>
<li><em>Speak after pressing a speak key and then press the speak
key again when finished</em> - The user must remember to press the
speak key a second time after he or she finishes speaking to the
computer.</li>
<li><em>Push to activate</em> - The user only needs to press a
speak key to speak to the computer.</li>
</ul>
<p>In theory, always listening would be the preferred listening mode. However,
this mode doesn't always work very well, and it makes heavy use of computer
resources. So the generally preferred mode is push to activate.</p>
<p class="suggestion" id="G41">4.1. Suggestion: Use the push-to-activate listening mode
to speak to a mobile device.</p>
<p>It is easy for users to press a speak key before talking. This
is similar to asking for permission to speak by raising your hand.
However, while speaking, it is desirable to concentrate on what is
being said without worrying about holding down a key or pressing a
key when finished speaking.</p>
<h3 id="System_Status">System Status</h3>
<p>Users need feedback to determine whether the computer is
processing input data, is waiting for input, or is
malfunctioning.</p>
<p class="suggestion" id="G42">4.2. Suggestion: Always present the
current system status to the user.</p>
<p>Some suggestions for indicating if the computer is idle or busy
are shown in Table 4.</p>
<table summary="4 columns">
<caption>Table 4: Suggested indicators for the
current system status</caption>
<tr>
<th>Mode</th>
<th>Idle</th>
<th>Busy</th>
<th>Error</th>
</tr>
<tr>
<td>Text</td>
<td>"Ready for next input"</td>
<td>"Processing, please wait"</td>
<td>Explanation for the cause of the error and how to fix it</td>
</tr>
<tr>
<td>Icons</td>
<td>Green*</td>
<td>Red*</td>
<td>Blinking "danger" icon</td>
</tr>
<tr>
<td>Audio</td>
<td>Silence</td>
<td>Sounds of a clicking clock or a percolating coffee pot</td>
<td>Emergency vehicle siren</td>
</tr>
</table>
<p>* Note: because about 6 per cent of the male population has some
degree of color blindness, always use another feature in addition
to color. For example, use a "standing person" icon that is green
to indicate that the device is idle, and a "walking person" icon that is
red to indicate that the current system is busy.</p>
<h3 id="Human_memory_Constraints">Human-memory
Constraints</h3>
<p>Normally, human short-term memory holds only a limited number of items, so
it is necessary to keep verbal lists short. Instead of reading a list of options
to users, display the list so users will not forget the spoken information.</p>
<p class="suggestion" id="G43">4.3. Suggestion: Use the screen to
ease stress on the user's short-term memory.</p>
<p>Suggestion examples include:</p>
<ul>
<li>If a list of options contains more than 3 to 4 items, display
the list of options on a screen.</li>
<li>If possible, display the results of a query as a table. For
example, display travel schedules as a table instead of presenting
verbal text.</li>
<li>If the text contains more than two sentences, present the text
to the user visually rather than verbally.</li>
</ul>
<h3 id="Social_Suggestions">Social Suggestions</h3>
<p>Social customs among people suggest guidelines for user
interfaces between users and devices.</p>
<p>Privacy. Speech presented by the device is not private. Others
in close proximity can hear the computer's speech. The display
provides greater privacy.</p>
<p class="suggestion" id="G44">4.4. Suggestion: If the user may need
privacy and the user is not using a headset, use a display rather
than render speech.</p>
<p>Speech uttered by the user is not private. Others in close
proximity can hear the user. The keyboard/mouse and pen
provide greater privacy. Also, present asterisks for password
fields.</p>
<p class="suggestion" id="G45">4.5. Suggestion: If the user may need
privacy while he/she enters data, use a pen or keys.</p>
<p>Suggestion examples include:</p>
<ul>
<li>Use keys to enter personal identification numbers.</li>
<li>When using an automatic bank teller, always use a keypad to
enter the account number.</li>
<li>When using a weight management application, enable users to
enter their weight using a pen or keypad.</li>
</ul>
<p>A related suggestion is to present asterisks instead of displaying
private information (e.g., passwords) entered by the user.</p>
<p>Acceptance in meetings. Pen devices are accepted in meetings.
They replace a pen and pad of paper for taking notes. Keyboards and
keypads are becoming acceptable with the widespread use of laptops.
However, key sounds should be turned off. Usually, devices that
speak or are spoken to are not accepted in meetings without
the use of earphones; and, in some cases, earphones may imply
that the user is not interested in the current discussion.</p>
<p class="suggestion" id="G46">4.6. Suggestion: If the device may be
used during a business meeting or in a public place, and no headset
is used, then use a pen or keys (with the keyboard sounds turned off).</p>
<h3 id="Advertising_Suggestions">Advertising
Suggestions</h3>
<p>Techniques from the field of advertising can be applied to user
interfaces to make them more appealing and interesting to the
user.</p>
<p>Important messages. Users must notice important messages.</p>
<p class="suggestion" id="G47">4.7. Suggestion: Use animation and
sound to attract the user's attention.</p>
<p>A suggestion example is:</p>
<ul>
<li>Animate the delivery of important events and messages so users
will notice them. Often this type of animation is accompanied with
sound, which also attracts the users' attention.</li>
</ul>
<p>Caution: Users tire of animation and sound quickly. Do not
overuse animation and sound.</p>
<p>Navigational aids. It is easy for a user to become "lost in space" when using
multimodal applications.</p>
<p class="suggestion" id="G48">4.8. Suggestion: Use landmarks to
help the user know where he or she is.</p>
<p>Suggestion examples include:</p>
<ul>
<li>The "bong" heard at the beginning of long distance telephone
calls indicates the service is being offered by AT&amp;T.</li>
<li>The "Intel Inside" audio logo indicates that Intel supplied the
computer chip inside of a computing device.</li>
<li>Use the sound volume to indicate how close or far a user is
from a landmark.</li>
</ul>
<h3 id="Ambience_Suggestion">Ambience Suggestion</h3>
<p>Television and movie directors set the mood with set design,
lighting, and background music. Screen layout, colors, and
background music also create moods in multimodal user interfaces.
However, in some cases, moods and emotion may not be appropriate in
productivity applications.</p>
<p class="suggestion" id="G49">4.9. Suggestion: Use audio and graphics
design to set the mood and convey emotion in games and
entertainment applications.</p>
<p>Suggestion examples include:</p>
<ul>
<li>Use background music to introduce new scenes with the
appropriate mood. For example, discordant music indicates trouble
lies ahead, cheerful music signals a scene filled with goodwill,
and a dirge indicates a depressing scene.</li>
<li>Use background music to "set the stage." For example, classical
music for an art museum, calliope music for a circus or fun fair,
or bagpipes for a lonely scene in a ghost story.</li>
</ul>
<h3 id="Accessibility_Suggestions">Accessibility Suggestions</h3>
<p>Some users have special needs that, when fulfilled, enable them
to gain all the benefits of computing generally available to users
without special needs. Users with limited or no sight, limited or
no hearing, or a cognitive impairment should be able to
access the computer.</p>
<p class="suggestion" id="G410">4.10. Suggestion: For each traditional
output technique, provide an alternative output technique.</p>
<p>Suggestion examples include:</p>
<ul>
<li>Upon request, provide audio output for each visual output.
Reading values in different voices can highlight their value
and aid comprehension. (Some audio should be presented as sound:
A few well-designed audio sounds, used consistently, will convey
meaning very clearly and much more quickly than spoken words.)</li>
<li>Upon request, provide visual output for each audio output.
Provide "closed captioning" for speech and video output. For
verbal messages, use text equivalents or flashing icons.</li>
<li>Consider using tactile controls such as the 12-key touch
pad and the four-way navigation cross with center key. These can be
powerful selection devices for the blind.</li>
</ul>
<p class="suggestion" id="G411">4.11. Suggestion: Enable users
to adjust the output presentation.</p>
<p>Example suggestions include:</p>
<ul>
<li>Enable users to adjust the lighting and contrast of their
display for improved readability.</li>
<li>Enable users to adjust the volume and speed of audio for
improved hearing.</li>
<li>Upon request (and when privacy is not a concern), echo the
character string typed by a user as audio.</li>
<li>Enable users to turn off background images to avoid
distraction.</li>
<li>Enable blind users to turn off the screen. This will increase
the user's privacy.</li>
</ul>
<p>Designing user interfaces to support accessibility generally
results in better usability for all users.</p>
<h2 id="Summary">Summary</h2>
<p>Use these suggestions as a checklist when you first construct a multimodal user
interface. However, the final decisions about the usefulness and friendliness
of the user interface rest in an abundance of iterative usability testing. If
users do not like or cannot use the user interface, it does not matter if the
suggestions were followed. The user interface needs to be changed so users will
like and be productive with it, even when some suggestion may not have been followed.
The users' needs should be the foremost concern for multimodal user interface
designers and developers.</p>
<h2 id="acknowledgements">Acknowledgements</h2>
<p>The following members of the W3C Multimodal Interaction Working Group contributed
suggestions to this Note:</p>
<p>Deborah Dahl, W3C Invited Expert, contributed points that were raised during
a tutorial on Multimodal Interfaces presented at the Spring 2006 SpeechTEK/AVIOS
meeting.</p>
<p>Ingmar Kliche, T-Systems, contributed suggestions based on his work with developers
of multimodal applications at T-Systems.</p>
<p>Gerald McCobb, IBM, contributed suggestions based on his work with developers
of multimodal applications at IBM.</p>
</body>
</html>