<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator"
    content="HTML Tidy for Linux/x86 (vers 1st April 2002), see www.w3.org" />
<title>Multimodal Interaction Requirements</title>

<style type="text/css">
/*<![CDATA[*/
.ref { font-size: 80% }
.requirement { background-color: #C1DBFF }
.example { background-color: #f0ffff }
.change { background-color: #ffe0c0 }
.ednote { font-size: 80%; color: green }
.ednote :link { color: teal }
.ednote :visited { color: olive }
.c1 { display: none }
table.smaller { font-size: 80%; margin-left: 0% }
/*]]>*/
</style>
<link href="http://www.w3.org/StyleSheets/TR/W3C-NOTE"
    type="text/css" rel="stylesheet" />
</head>
<body lang="en">
<div class="head">
<p><a href="http://www.w3.org/"><img height="48" alt="W3C"
    src="http://www.w3.org/Icons/w3c_home" width="72" /></a></p>

<h1 id="name">Multimodal Interaction Requirements</h1>

<h2>W3C NOTE 8 January 2003</h2>

<dl>
<dt>This version:</dt>

<dd><a href="http://www.w3.org/TR/2003/NOTE-mmi-reqs-20030108/">http://www.w3.org/TR/2003/NOTE-mmi-reqs-20030108/</a></dd>

<dt>Latest version:</dt>

<dd><a href="http://www.w3.org/TR/mmi-reqs/">http://www.w3.org/TR/mmi-reqs/</a></dd>

<dt>Previous version:</dt>

<dd><i>this is the first publication</i></dd>

<dt>Editors:</dt>

<dd>Stéphane H. Maes, Oracle Corporation <a href="mailto:stephane.maes@oracle.com">&lt;stephane.maes@oracle.com&gt;</a></dd>

<dd>Vijay Saraswat, Penn State University <a href="mailto:saraswat@cse.psu.edu">&lt;saraswat@cse.psu.edu&gt;</a></dd>

<dt>Contributors:</dt>

<dd>See <a href="#Acknowledgments">Acknowledgements</a></dd>
</dl>

<p class="copyright"><a href="http://www.w3.org/Consortium/Legal/ipr-notice#Copyright">Copyright</a> © 2003 <a href="http://www.w3.org/"><acronym title="World Wide Web Consortium">W3C</acronym></a><sup>®</sup> (<a href="http://www.lcs.mit.edu/"><acronym title="Massachusetts Institute of Technology">MIT</acronym></a>, <a href="http://www.ercim.org/"><acronym title="European Research Consortium for Informatics and Mathematics">ERCIM</acronym></a>, <a href="http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. W3C <a href="http://www.w3.org/Consortium/Legal/ipr-notice#Legal_Disclaimer">liability</a>, <a href="http://www.w3.org/Consortium/Legal/ipr-notice#W3C_Trademarks">trademark</a>, <a href="http://www.w3.org/Consortium/Legal/copyright-documents">document use</a> and <a href="http://www.w3.org/Consortium/Legal/copyright-software">software licensing</a> rules apply.</p>
</div>

<hr title="Separator from Header" />
<h2 id="abstract">Abstract</h2>
|
|
|
|
<p>This document describes fundamental requirements for the
|
|
specifications under development in the W3C <a
|
|
href="http://www.w3.org/2002/mmi/">Multimodal Interaction
|
|
Activity</a>. These requirements were derived from use case studies
|
|
as discussed in <a href="#Appendixa">Appendix A</a>. They have been
|
|
developed for use by the <a
|
|
href="http://www.w3.org/2002/mmi/Group/">Multimodal Interaction
|
|
Working Group</a> (<a
|
|
href="http://cgi.w3.org/MemberAccess/AccessRequest">W3C Members
|
|
only</a>), but may also be relevant to other W3C working groups and
|
|
related external standard activities.</p>
|
|
|
|
<p>The requirements cover general issues, inputs, outputs,
|
|
architecture, integration, synchronization points, runtimes and
|
|
deployments, but this document does not address application or
|
|
deployment conformance rules.</p>
|
|
|
|
<h2 id="statusofthisdocument">Status of this Document</h2>
|
|
|
|
<p><em>This section describes the status of this document at the
|
|
time of its publication. Other documents may supersede this
|
|
document. The latest status of this document series is maintained
|
|
at the <abbr
|
|
title="the World Wide Web Consortium">W3C</abbr>.</em></p>
|
|
|
|
<p>W3C's <a href="http://www.w3.org/2002/mmi/">Multimodal
|
|
Interaction Activity</a> is developing specifications for extending
|
|
the Web to support multiple modes of interaction. This document
|
|
describes fundamental requirements for multimodal interaction.</p>
|
|
|
|
<p>This document has been produced as part of the <a
|
|
href="http://www.w3.org/2002/mmi/">W3C Multimodal Interaction
|
|
Activity</a>,<span class="c1"><a
|
|
href="http://www.w3.org/2002/mmi/Activity.html"></a></span>
|
|
following the procedures set out for the <a
|
|
href="http://www.w3.org/Consortium/Process/">W3C Process</a>. The
|
|
authors of this document are members of the <a
|
|
href="http://www.w3.org/2002/mmi/Group/">Multimodal Interaction
|
|
Working Group</a> (<a
|
|
href="http://cgi.w3.org/MemberAccess/AccessRequest">W3C Members
|
|
only</a>). This is a Royalty Free Working Group, as described in
|
|
W3C's <a href="/TR/2002/NOTE-patent-practice-20020124">Current
|
|
Patent Practice</a> NOTE. Working Group participants are required
|
|
to provide <a href="http://www.w3.org/2002/01/mmi-ipr.html">patent
|
|
disclosures</a>.</p>
|
|
|
|
<p>Please send comments about this document to the public mailing
|
|
list: <a
|
|
href="mailto:www-multimodal@w3.org">www-multimodal@w3.org</a> (<a
|
|
href="http://lists.w3.org/Archives/Public/www-multimodal/">public
|
|
archives</a>). To subscribe, send an email to <<a
|
|
href="mailto:www-multimodal-request@w3.org">www-multimodal-request@w3.org</a>>
|
|
with the word <em>subscribe</em> in the subject line (include the
|
|
word <em>unsubscribe</em> if you want to unsubscribe).</p>
|
|
|
|
<p>A list of current W3C Recommendations and other technical
|
|
documents including Working Drafts and Notes can be found at <a
|
|
href="http://www.w3.org/TR/">http://www.w3.org/TR/</a>.</p>
|
|
|
|
<h2 class="notoc" id="tableofcontent">Table of contents</h2>
|
|
|
|
<ul class="toc">
|
|
<li><a href="#abstract">Abstract</a></li>
|
|
|
|
<li><a href="#statusofthisdocument">Status of this
|
|
document</a></li>
|
|
|
|
<li><a href="#tableofcontent">Table of contents</a></li>
|
|
|
|
<li><a href="#introduction">Introduction</a></li>
|
|
|
|
<li><a href="#MMIframework">Multimodal interaction</a></li>
|
|
|
|
<li><a href="#generalrequirements">1. General Requirements</a>
|
|
<ul class="toc">
|
|
<li><a href="#Scalabilityacrosswiderangeofdevicescapabilities">1.1
|
|
Scalability across wide range of device capabilities</a></li>
|
|
|
|
<li><a
|
|
href="#Supplementaryandcomplementaryuseofdifferentmodalities">1.2
|
|
Supplementary and complementary use of different
|
|
modalities</a></li>
|
|
|
|
<li><a href="#Seamlesssynchronizationofmodalities">1.3 Seamless
|
|
synchronization of modalities</a></li>
|
|
|
|
<li><a href="#Multilingualsupport">1.4 Multilingual
|
|
support</a></li>
|
|
|
|
<li><a href="#Easytoimplement">1.5 Easy to implement</a></li>
|
|
|
|
<li><a href="#accessibility">1.6 Accessibility</a></li>
|
|
|
|
<li><a href="#Securityandprivacy">1.7 Security and privacy</a></li>
|
|
|
|
<li><a href="#Deliveryandcontext">1.8 Delivery and context</a></li>
|
|
|
|
<li><a href="#Navigationspecification">1.9 Navigation
|
|
specification</a></li>
|
|
</ul>
|
|
</li>
|
|
|
|
<li><a href="#Inputmodalityrequirements">2. Input Modality
|
|
Requirements</a>
|
|
<ul class="toc">
|
|
<li><a href="#Inputprocessing">2.1 Input processing</a></li>
|
|
|
|
<li><a href="#Sequentialmultimodalinput">2.2 Sequential multimodal
|
|
input</a></li>
|
|
|
|
<li><a href="#Simultaneousmultimodalinput">2.3 Simultaneous
|
|
multimodal input</a></li>
|
|
|
|
<li><a href="#Compositemultimodalinput">2.4 Composite multimodal
|
|
input</a></li>
|
|
|
|
<li><a href="#Inputmodessupported">2.5 Input modes supported</a>
|
|
<ul class="toc">
|
|
<li><a href="#InputMUSTspecify">2.5.1 MUST specify</a></li>
|
|
|
|
<li><a href="#InputNICEtospecify">2.5.2 NICE to specify</a></li>
|
|
|
|
<li><a href="#InputExtensibility">2.5.3 Extensibility</a></li>
|
|
</ul>
|
|
</li>
|
|
|
|
<li><a href="#SemanticsofinputgeneratedbyUIcomponents">2.6
|
|
Semantics of input generated by UI components</a></li>
|
|
|
|
<li><a href="#Coordinatedconstraintsandinterpretations">2.7
|
|
Coordinated constraints</a></li>
|
|
|
|
<li><a
|
|
href="#Supportforconflictinginputfromdifferentmodalities">2.8
|
|
Support for conflicting input from different modalities</a></li>
|
|
|
|
<li><a href="#Temporalpositioningofevents">2.9 Temporal positioning
|
|
of input events</a></li>
|
|
</ul>
|
|
</li>
|
|
|
|
<li><a href="#Outputmediarequirements">3. Output Media
|
|
Requirements</a>
|
|
<ul class="toc">
|
|
<li><a href="#Sequentialmediaoutput">3.1 Sequential media
|
|
output</a></li>
|
|
|
|
<li><a href="#Simultaneousmediaoutput">3.2. Simultaneous media
|
|
output</a></li>
|
|
|
|
<li><a href="#Supportedoutputmedias">3.3 Supported output
|
|
medias</a>
|
|
<ul class="toc">
|
|
<li><a href="#outputMUSTspecify">3.3.1 MUST specify</a></li>
|
|
|
|
<li><a href="#outputNicetospecify">3.3.2. Nice to specify</a></li>
|
|
|
|
<li><a href="#outputExtensibility">3.3.3. Extensibility</a></li>
|
|
</ul>
|
|
</li>
|
|
|
|
<li><a href="#Outputprocessing">3.4 Output processing</a></li>
|
|
</ul>
|
|
</li>
|
|
|
|
<li><a
|
|
href="#Architectureintegrationandsynchronizationpoints">4.Architecture,
|
|
integration and synchronization points</a>
|
|
<ul class="toc">
|
|
<li><a href="#Reusestandardmarkuplanguages">4.1 Reuse standard
|
|
markup languages</a></li>
|
|
|
|
<li><a href="#XHTMLmodularization">4.2 XHTML
|
|
Modularization</a></li>
|
|
|
|
<li><a href="#CompatibilitywithXForms">4.3 Separation of data
|
|
model, presentation layer and application logic</a></li>
|
|
|
|
<li><a href="#Detectionofavailablemodalities">4.4 Detection of
|
|
available modalities and changes</a></li>
|
|
|
|
<li><a href="#Synchronizationgranularities">4.5 Synchronization
|
|
granularities</a></li>
|
|
|
|
<li><a href="#Independentinputandoutput">4.6 Independent input and
|
|
output interfaces even in a same modality</a></li>
|
|
|
|
<li><a href="#Distributedsynchronization">4.7 Distributed
|
|
synchronization</a></li>
|
|
|
|
<li><a href="#Distributedprocessing">4.8 Distributed
|
|
processing</a></li>
|
|
|
|
<li><a href="#Externalinput">4.9 External input and output</a></li>
|
|
|
|
<li><a href="#Temporalpositioningofinputandoutputevents">4.10
|
|
Temporal positioning of input and output events</a></li>
|
|
</ul>
|
|
</li>
|
|
|
|
<li><a href="#Runtimesanddeployments">5. Runtimes and
|
|
deployments</a>
|
|
<ul class="toc">
|
|
<li><a href="#Configurations">5.1 Configurations</a></li>
|
|
|
|
<li><a href="#Mobiledeployments">5.2 Mobile deployments</a></li>
|
|
|
|
<li><a href="#EMMA">5.3 EMMA</a></li>
|
|
|
|
<li><a href="#Multimodalsynchronizationexchanges">5.4 Multimodal
|
|
synchronization exchanges</a></li>
|
|
</ul>
|
|
</li>
|
|
|
|
<li><a href="#References">6. References</a></li>
|
|
|
|
<li><a href="#Acknowledgments">7. Acknowledgements</a></li>
|
|
|
|
<li><a href="#Appendices">Appendices</a>
|
|
<ul class="toc">
|
|
<li><a href="#Appendixa">Appendix A: Use cases</a>
|
|
<ul class="toc">
|
|
<li><a href="#Overviewoftheusecases">A.1 Overview of the use
|
|
cases</a></li>
|
|
|
|
<li><a href="#appendixa2analysis">A.2 Event analysis</a></li>
|
|
</ul>
|
|
</li>
|
|
|
|
<li><a href="#Appendixb">Appendix B: Glossary</a></li>
|
|
</ul>
|
|
</li>
|
|
</ul>
|
|
|
|
<h2 id="introduction">Introduction</h2>
|
|
|
|
<p>Multimodal interactions extend the Web user interface to allow
|
|
multiple modes of interaction, offering users the choice of using
|
|
their voice, or an input device such as a key pad, keyboard, mouse
|
|
or stylus. For output, users will be able to listen to spoken
|
|
prompts and audio, and to view information on graphical displays.
|
|
This capability for the user to specify the mode or device for a
|
|
particular interaction in a particular situation is expected to
|
|
significantly improve the user interface, its accessibility and
|
|
reliability, especially for mobile applications. The W3C Multimodal
|
|
Interaction Working Group (WG) is developing markup specifications
|
|
for authoring applications synchronized across multiple modalities
|
|
or devices with a wide range of capabilities.</p>
|
|
|
|
<p>This document is an internal working draft prepared as part of
|
|
the discussions on multimodal interaction requirements for
|
|
multimodal interaction specifications.</p>
|
|
|
|
<p>The work on the present requirement document started from the
|
|
<em>multimodal requirements for voice markup languages public
|
|
working draft (version 1.0)</em> published by the W3C Voice
|
|
activity <a href="#MMReqVoice">[MM Req Voice]</a>. The outline of
|
|
the document remains very similar.</p>
|
|
|
|
<p>The present requirements scope the nature of the work and
|
|
specifications that will be developed by the W3C Multimodal
|
|
Interaction Working Group (as specified by the charter <a
|
|
href="#MMIcharter">[MMI Charter]</a>). These intended works may be
|
|
referred to below as "specification(s)".</p>
|
|
|
|
<p>The requirements in this document do not express conformance
|
|
rules on application, platform runtime implementation or
|
|
deployment.</p>
|
|
|
|
<p>In this document, the following conventions have been followed
|
|
when phrasing the requirements:</p>
|
|
|
|
<ul>
|
|
<li>"MUST specify": The specifications will address and satisfy the
|
|
requirement or supporting the features, starting from the first
|
|
version.</li>
|
|
|
|
<li>"SHOULD specify": The specifications will aim at addressing and
|
|
satisfying the requirement or supporting the features during the
|
|
lifetime of the working group. Early specifications will take this
|
|
into account to allow easy and interoperable updates.</li>
|
|
|
|
<li>"NICE to specify": The specifications will be designed with the
|
|
requirement or feature taken into account. If a technical solution
|
|
is available, the specifications will try to satisfy the
|
|
requirement or support the feature, provided that it does not
|
|
excessively delay the work plan.</li>
|
|
</ul>
|
|
|
|
<p>It is not required that a particular specification produced by
|
|
the W3C MMI working group addresses <em>all</em> the requirements
|
|
in this document. It is possible that the requirements be addressed
|
|
by different specifications and that all the "MUST specify"
|
|
requirement are only satisfied by combining the different
|
|
specifications produced by the W3C Multimodal Interaction Working
|
|
Group. However, in such a case, it should be possible to clearly
|
|
indicate which specification will address what requirements.</p>
|
|
|
|
<h2 id="MMIframework">Multimodal interactions</h2>
|
|
|
|
<p>To lay the groundwork for the technical requirements, we first
|
|
discuss an intended frame of reference for a multimodal system,
|
|
introducing various concepts and terms that will be referred to in
|
|
the normative sections below. For the reader's convenience, we have
|
|
collected the concepts and terms introduced in this frame of
|
|
reference in the <a href="#Appendixb">glossary</a>.</p>
|
|
|
|
<p>We are interested in defining the requirements for the design of
|
|
<a href="#multimodalsystem">multimodal systems</a> -- systems that
|
|
support a user communicating with an application by using different
|
|
<a href="#modality">modalities</a> such as voice (in a <a
|
|
href="#humanlanguage">human language</a>), gesture, <a
|
|
href="#handwriting">handwriting</a>, typing, <a
|
|
href="#audiovisualspeech">audio-visual speech</a>, etc. The user
|
|
may be considered to be operating in a <a
|
|
href="#deliverycontext">delivery context</a>: a term used to
|
|
specify the set of attributes that characterizes the capabilities
|
|
of the access mechanism in terms of <a href="#deviceprofile">device
|
|
profile</a>, <a href="#userprofile">user profile</a> (e.g.
|
|
identify, preferences and usage patterns) and <a
|
|
href="#situation">situation</a>. The user interacts with the
|
|
application in the context of a <a href="#session">session</a>,
|
|
using one or more modalities (which may be realized through one or
|
|
more devices). Within a session, the user may <a
|
|
href="#suspendresume">suspend and resume</a> interaction with the
|
|
application within the same modality or <a
|
|
href="#modalityswitch">switch</a> modalities. A session is
|
|
associated with a <a href="#sessioncontext">context</a>, which
|
|
records the interactions with the user.</p>
|
|
|
|
<p>In multimodal systems, an <a href="#event">event</a> is a
|
|
representation of some asynchronous occurrence of interest to the
|
|
multimodal system. Examples include mouse clicks, hanging up the
|
|
phone, speech recognition results or errors. Events may be
|
|
associated with information about the user interaction e.g. the
|
|
location the mouse was clicked. A typical event source is a user,
|
|
such events are called <a href="#input">input events</a>. An <a
|
|
href="#externalevent">external input event</a> is one not generated
|
|
by a user, e.g. a <a href="#GPS">GPS</a> signal. The multimodal
|
|
system may also produce <a href="#externalevent">external output
|
|
events</a> for external systems (e.g. a logging system). In order
|
|
to preserve temporal ordering, events may be <a
|
|
href="#timestamp">time stamped</a>. Typically, events are
|
|
formalized as generated by <a href="#eventsource">event
|
|
sources</a>, and associated with <a href="#eventhandler">event
|
|
handlers</a>, which <a href="#subscribe">subscribe</a> to the
|
|
event, and are <a href="#notify">notified</a> of its occurrence.
|
|
This is exemplified by the <a href="#XMLEvent">XML Event</a>
|
|
model.</p>
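<p>As an illustration of this pattern, the following sketch, loosely based on the XML Events working draft, attaches a handler to a click event; the observer and handler identifiers are hypothetical and are used purely for illustration:</p>

<pre class="example">
&lt;html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ev="http://www.w3.org/2001/xml-events"&gt;
  ...
  &lt;!-- the handler subscribed to "click" events on the "map" element
       is notified whenever such an event occurs --&gt;
  &lt;ev:listener event="click" observer="map" handler="#showCityInfo"/&gt;
  ...
&lt;/html&gt;
</pre>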
<p>The user typically provides input in one or more modalities, and receives output in one or more modalities. Input may be classified as <a href="#sequentialinput">sequential</a>, <a href="#simultaneousinput">simultaneous</a> or <a href="#compositeinput">composite</a>. Sequential input is input received on a single modality, though that modality can change over time. Simultaneous input is input received on multiple modalities, and treated separately by downstream processes (such as interpretation). Composite input is input received on multiple modalities at the same time and treated as a single, integrated "composite" input by downstream processes. Inputs are combined using the <a href="#coordinationcapability">coordination capability</a> of the multimodal system, typically driven by <a href="#inputconstraints">input constraints</a> or decided by the <a href="#dialogmanager">interaction manager</a>.</p>

<p>Input is typically subject to <a href="#inputprocessing">input processing</a>. For instance, speech input may be passed to a <a href="#speechrecognitionengine">speech recognition engine</a> (including, for instance, <a href="#semanticinterp">semantic interpretation</a>) in order to extract meaningful information (e.g. a <a href="#semanticrep">semantic representation</a>) for downstream processing. Note that simultaneous and composite input may be <a href="#conflictinginput">conflicting</a>, in that the interpretations of the input may not be consistent (e.g. the user says "yes" but clicks on "no").</p>

<p>Two fundamentally different uses of multimodality may be identified: <a href="#supplementarymm">supplementary multimodality</a>, and <a href="#complementarymm">complementary multimodality</a>. An application makes supplementary use of multimodality if it allows the user to carry every interaction (input or output) through to completion in each modality as if it were the only available modality. Such an application enables the user to select at each time the modality that is best suited to the nature of the interaction and the user's situation. Conversely, an application makes complementary use of multimodality if interactions in one modality are used to complement interactions in another. (For instance, the application may visually display several options in a form and aurally prompt the user "Choose the city to fly to".) Complementary use may help a particular class of users (e.g. those with dyslexia). Note that in an application supporting complementary use of different modalities each interaction may not be accessible separately in each modality. Therefore it may not be possible for the user to determine which modality to use. Instead, the document author may prescribe the modality (or modalities) to be used in a particular interaction.</p>

<p>The <a href="#synchronizationbehavior">synchronization behavior</a> of an application describes the way in which any input in one modality is reflected in the output in another modality, as well as the way input is combined across modalities (<a href="#coordinationcapability">coordination capability</a>). The <a href="#synchronizationlevel">synchronization granularity</a> specifies the level at which the application coordinates interactions. The application is said to exhibit <em>event-level synchronization</em> if user inputs in one modality are captured at the level of individual <a href="#DOM">DOM</a> events and immediately reflected in the other modality. The application exhibits <em>field-level synchronization</em> if inputs in one modality are reflected in the other after the user changes focus (e.g. moves from input field to input field) or completes the interaction (e.g. completes a selection in a menu). The application exhibits <em>form-level synchronization</em> if inputs in one modality are reflected in the other only after a particular point in the presentation is reached (e.g. after a certain number of fields have been completed in the form).</p>

<p>The output generated by a multimodal system can take various forms, e.g. audio (including spoken prompts and playback, e.g. using <a href="#NLG">natural language generation</a> and <a href="#TTS">text-to-speech (TTS)</a>, which <a href="#synthesis">synthesizes</a> audio), visual (e.g. XHTML or SVG markup rendered on displays), <a href="#lipsynch">lipsynch</a> (multimedia output in which there is a visual rendition of a face whose lip movements are synchronized with the audio), etc. Of relevance here is the W3C Recommendation <a href="#SMIL">SMIL 2.0</a>, which enables simple authoring of interactive audiovisual applications and supports <a href="#mediasynch">media synchronization</a>.</p>
<p>Interaction (input, output) between the user and the application may often be conceptualized as a series of dialogs, managed by an <a href="#dialogmanager">interaction manager</a>. A dialog is an interaction between the user and the application which involves <em>turn taking</em>. In each turn, the interaction manager (working on behalf of the application) collects input from the user, processes it (using the session context and possibly external knowledge sources), computes a response and updates the presentation for the user. An interaction manager generates or updates the presentation by processing user inputs, session context and possibly other external knowledge sources to determine the intent of the user. An interaction manager relies on strategies to determine focus and intent as well as to disambiguate, correct and confirm sub-dialogs. We typically distinguish <a href="#directeddialog">directed dialogs</a> (e.g. user-driven or application-driven) and <a href="#mixedinitiative">mixed initiative</a> or free flow dialogs.</p>

<p>The interaction manager may use (1) inputs from the user, (2) the session context, (3) external knowledge sources, and (4) disambiguation, correction, and confirmation sub-dialogs to determine the user's focus and intent. Based on the user's focus and intent, the interaction manager also (1) maintains the context and state of the application, (2) manages the composition of inputs and synchronization across modalities, (3) interfaces with business logic, and (4) produces output for presentation to the user. In some architectures, the interaction manager may have <a href="#distributedcomponents">distributed components</a>, utilizing an event based mechanism for coordination.</p>

<p>Finally, in this document, we use the term <a href="#executionmodel">configuration or execution model</a> to refer to the runtime structure of the various system components and their interconnection, in a particular manifestation of a multimodal system.</p>
<h2 id="generalrequirements">1. General Requirements</h2>
|
|
|
|
<h3 id="Scalabilityacrosswiderangeofdevicescapabilities">1.1
|
|
Scalability across wide range of device capabilities</h3>
|
|
|
|
<p>It is the intent of the WG to define specifications that apply
|
|
to a variety of multimodal capabilities and deployment
|
|
conditions.</p>
|
|
|
|
<p class="requirement"><strong><a id="MMI-G1"
|
|
name="MMI-G1">(MMI-G1)</a>:</strong> The multimodal specifications
|
|
MUST support authoring multimodal applications for a wide range of
|
|
multimodal capabilities (MUST specify).</p>
|
|
|
|
<p>The specifications should support different combinations of
|
|
input and output modalities, <a
|
|
href="#synchronizationlevel">synchronization granularity</a>, <a
|
|
href="#configuration">configurations</a> and <a
|
|
href="#device">devices</a>. Some aspects of this requirement are
|
|
elaborated in detail below. For instance, the range of <a
|
|
href="#synchronizationlevel">synchronization granularity</a> is
|
|
addressed by requirement <a href="#MMI-A6">MMI-A6</a>.</p>
|
|
|
|
<p>It is advantageous that the specifications allow the application
|
|
developer to author a single version of the application, instead of
|
|
multiple versions targeted at combinations of multimodal
|
|
capabilities.</p>
|
|
|
|
<p class="requirement"><strong><a id="MMI-G2"
|
|
name="MMI-G2">(MMI-G2)</a>:</strong> The multimodal specifications
|
|
SHOULD support authoring multimodal applications once for
|
|
deployment on difference devices with different multimodal
|
|
capabilities (NICE to specify).</p>
|
|
|
|
<p>The multimodal capabilities may differ based on available
|
|
modalities, presentation and interaction capability for each
|
|
modality (modality-specific delivery context), synchronization
|
|
granularity, available devices and their configurations
|
|
etc... They are to be captured in the delivery context
|
|
associated to the multimodal system.</p>
|
|
|
|
<h3 id="Supplementaryandcomplementaryuseofdifferentmodalities">1.2
|
|
Supplementary and complementary use of different modalities</h3>
|
|
|
|
<p class="requirement"><strong><a id="MMI-G3"
|
|
name="MMI-G3">(MMI-G3)</a>:</strong> The multimodal specifications
|
|
MUST support <a href="#supplementarymm">supplementary</a> use of
|
|
modalities (MUST specify).</p>
|
|
|
|
<p>Supplementary use of modalities in multimodal applications
|
|
significantly improves accessibility of the applications. The user
|
|
may select the modality best used to the nature of the interaction
|
|
and the context of use.</p>
|
|
|
|
<p>When supported by the runtime or prescribed by the author, it
|
|
may be possible for the user to combine modalities as discussed for
|
|
example in requirement <a href="#MMI-I7">MMI-I7</a> about composite
|
|
input.</p>
|
|
|
|
<p class="requirement"><strong><a id="MMI-G4"
|
|
name="MMI-G4">(MMI-G4)</a>:</strong> The multimodal specifications
|
|
MUST support <a href="#complementarymm">complementary</a> use of
|
|
modalities (MUST specify).</p>
|
|
|
|
<p>Authors of multimodal applications that rely on complementary
|
|
multimodality should pay special attention to the accessibility of
|
|
the application, for example by ensuring accessibility in each
|
|
modality or by providing supplementary alternatives.</p>
|
|
|
|
<h3 id="Seamlesssynchronizationofmodalities">1.3 Seamless
|
|
synchronization of modalities</h3>
|
|
|
|
<p class="requirement"><span class="requirement"><strong><a
|
|
id="MMI-G5" name="MMI-G5">(MMI-G5)</a>:</strong> The multimodal
|
|
specifications will be designed such that an author can write
|
|
applications where the <a
|
|
href="#synchronizationbehavior">synchronization</a> of the various
|
|
modalities is seamless from the user's point of view (MUST
|
|
specify).</span></p>
|
|
|
|
<p>To elaborate, an interaction event or an external event in one
|
|
modality results in a change in another; based on the <a
|
|
href="#synchronizationlevel">synchronization granularity</a>
|
|
supported by the application. See <a
|
|
href="#Synchronizationgranularities">section 4.5</a> for a
|
|
discussion of synchronization granularities.</p>
|
|
|
|
<p style="margin-top: 0; margin-bottom: 0">Seamlessness can
|
|
encompass multiple aspects:</p>
|
|
|
|
<ul>
|
|
<li>
|
|
<p style="margin-top: 0; margin-bottom: 0">Limited latency in the
|
|
synchronization behavior with respect to what is expected by the
|
|
user for the particular application and multimodal
|
|
capabilities.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p style="margin-top: 0; margin-bottom: 0">Predictable,
|
|
non-confusing multimodal behavior</p>
|
|
</li>
|
|
</ul>
|
|
|
|
<p>Expanding on the considerations made in <a
|
|
href="#Scalabilityacrosswiderangeofdevicescapabilities">section
|
|
1.1</a>, it is important to support authoring for any granularity
|
|
of synchronization covered in <a href="#MMI-A6">(MMI-A6)</a>:</p>
|
|
|
|
<p class="requirement"><span class="requirement"><strong><a
|
|
id="MMI-G6" name="MMI-G6">(MMI-G6)</a>:</strong> The multimodal
|
|
specifications MUST support authoring seamless synchronization of
|
|
various modalities for any any</span> <a
|
|
href="#synchronizationlevel">synchronization granularity</a> <span
|
|
class="requirement">and <a
|
|
href="#coordinationcapability">coordination capabilities</a> (MUST
|
|
specify).</span></p>
|
|
|
|
<p>Coordination is defined as the capability to combine multimodal
|
|
inputs into composite inputs based on an interpretation algorithm
|
|
that decides what makes sense to combine based on the context.
|
|
Composite inputs are further discussed in <a
|
|
href="#Compositemultimodalinput">section 2.4</a>. It is a notion
|
|
different from synchronization granularity described in <a
|
|
href="#Synchronizationgranularities">section 4.5</a>.</p>
|
|
|
|
<p>The following requirement is proposed in order to address the
|
|
combinatorial explosion of synchronization granularities that the
|
|
application developer must author for.</p>
|
|
|
|
<p class="requirement"><span class="requirement"><strong><a
|
|
id="MMI-G7" name="MMI-G7">(MMI-G7)</a>:</strong> The multimodal
|
|
specifications SHOULD support authoring seamless synchronization of
|
|
various modalities once for deployment across with a whole range
|
|
of</span> <a href="#synchronizationlevel">synchronization
|
|
granularity</a> <span class="requirement">or <a
|
|
href="#coordinationcapability">coordination capabilities</a> (NICE
|
|
to specify).</span></p>
|
|
|
|
<p>This requirement addresses the capability for the application
|
|
developer to write the application once for a particular
|
|
synchronization granularity or coordination capability and to have
|
|
the application able to adapt its synchronization behavior when
|
|
other levels are available.</p>
|
|
|
|
<h3 id="Multilingualsupport">1.4 Multilingual support</h3>
|
|
|
|
<p>Multimodal applications are not different from any other web
|
|
applications. It is important that the specifications be not
|
|
limited to specific languages. </p>
|
|
|
|
<p class="requirement"><span class="requirement"><strong><a
|
|
id="MMI-G8" name="MMI-G8">(MMI-G8)</a>:</strong> The multimodal
|
|
specifications MUST support authoring multimodal applications in
|
|
any <a href="#humanlanguage">human language</a> (MUST
|
|
specify).</span></p>
|
|
|
|
<p>In particular, it must be possible to apply conventional methods
|
|
for localization and internationalization of applications.</p>
|
|
|
|
<p class="requirement"><strong><a id="MMI-G9"
|
|
name="MMI-G9">(MMI-G9)</a><a>:</a></strong> The multimodal
|
|
specification MUST not preclude the capability to move multimodal
|
|
application from one <span class="requirement"><a
|
|
href="#humanlanguage">human language</a></span> to another, without
|
|
having to rewrite the whole application (MUST specify).</p>
|
|
|
|
<p>For example, it should be possible to encapsulate
|
|
language-specific items, separately encapsulated from the
|
|
language-independent description.</p>
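<p>A purely hypothetical illustration of such encapsulation, using the standard <code>xml:lang</code> attribute (the <code>prompt</code> element and its content are invented for this sketch and are not implied by the specifications):</p>

<pre class="example">
&lt;!-- language-independent logic refers to the prompt by id only --&gt;
&lt;prompt id="greeting" xml:lang="en"&gt;Welcome&lt;/prompt&gt;
&lt;prompt id="greeting" xml:lang="fr"&gt;Bienvenue&lt;/prompt&gt;
</pre>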
<h3 id="Easytoimplement">1.5 Easy to implement</h3>
|
|
|
|
<p>It is important that multimodal applications remain easy to
|
|
author and deploy in order to allow wide adoption by the web
|
|
community. </p>
|
|
|
|
<p class="requirement"><span class="requirement"><strong><a
|
|
id="MMI-G10" name="MMI-G10">(MMI-G10)</a>:</strong> The multimodal
|
|
specifications produced by the MMI working group MUST be easy to
|
|
implement and use (MUST specify).</span></p>
|
|
|
|
<p>This is a generic requirement that requires designers to
|
|
consider from the outset issues of: ease-of-authoring by
|
|
application developers; ease-of-implementation by platform
|
|
developers and ease-of-use by the user. Thus it affects authoring,
|
|
platform implementation and deployment.</p>
|
|
|
|
<p>The following requirement qualifies this further to guarantee
|
|
that the specifications will be widely deployable with existing
|
|
technologies (e.g. standards, network and client capabilities
|
|
etc...)</p>
|
|
|
|
<p class="requirement"><span class="requirement"><strong><a
|
|
id="MMI-G11" name="MMI-G11">(MMI-G11)</a>:</strong> The multimodal
|
|
specifications produced by the <a href="#MMIWG">MMI working
|
|
group</a> MUST depend only on technologies that are widely
|
|
available during the lifetime of the working group (MUST
|
|
specify).</span></p>
|
|
|
|
<p>For W3C specifications, wide availability is understood as
|
|
having reached at least the stage of candidate recommendation.</p>
|
|
|
|
<p>Related considerations are made in<a
|
|
href="#Reusestandardmarkuplanguages">section 4.1</a>.</p>
|
|
|
|
<h3 id="accessibility">1.6 Accessibility</h3>
|
|
|
|
<p>Multimodal applications will provide mechanisms to develop and
|
|
deploy accessible applications as discussed in <a
|
|
href="#Supplementaryandcomplementaryuseofdifferentmodalities">section
|
|
1.2</a>.</p>
|
|
|
|
<p>In addition, it is important that, as for all other web
|
|
applications; the following requirement be satisfied:</p>
|
|
|
|
<p class="requirement"><strong><a id="MMI-G12"
|
|
name="MMI-G12">(MMI-G12)</a>:</strong> The multimodal
|
|
specifications produced by the <span class="requirement"><a
|
|
href="#MMIWG">MMI working group</a></span> MUST not preclude
|
|
conforming to the W3C accessibility guidelines (MUST specify).</p>
|
|
|
|
<p>This is especially important for applications that make
|
|
complementary use of modalities.</p>
|
|
|
|
<h3 id="Securityandprivacy">1.7 Security and privacy</h3>
|
|
|
|
<p>Early deployments of multimodal applications show that security
|
|
and privacy issues can be very critical for multimodal deployments.
|
|
While addressing these issues is not directly within the scope of
|
|
the W3C Multimodal Interaction Working Group, it is important that
|
|
these issues be considered.</p>
|
|
|
|
<p class="requirement"><strong><a id="MMI-G13"
|
|
name="MMI-G13">(MMI-G13)</a>:</strong> The multimodal
|
|
specifications SHOULD be aligned with the W3C work and
|
|
specifications for security and privacy (SHOULD specify).</p>
|
|
|
|
<p>The follow<span style="color: #000000">ing sec</span><span
|
|
style="color: #000000">urity and privacy issues have been
|
|
identified for multimodal and multi-device interact</span><span
|
|
style="color: #000000">ions.</span></p>
|
|
|
|
<ul style="color: #000000">
|
|
<li style="color: #000000"><span
|
|
style="color: #000000">Security:</span>
|
|
<ul style="color: #000000">
|
|
<li><span style="color: #000000">In some distributed
|
|
configurations:</span>
|
|
<ul>
|
|
<li>the exchange of interaction events that can be intercepted by
|
|
unauthorized third parties. This would enable reconstruction of the
|
|
complete interaction with the application; especially in between
|
|
submits to the backend. Any note, temporary selections etc would be
|
|
accessible!</li>
|
|
|
|
<li>unauthorized third parties may be able to issue presentation
|
|
manipulations that would affect the user agent.</li>
|
|
</ul>
|
|
</li>
|
|
</ul>
|
|
</li>
|
|
|
|
<li>Privacy:
|
|
<ul>
|
|
<li>In some distributed configurations, the interaction events may
|
|
enable reconstruction of the complete interaction with the
|
|
application, including in between submits to the backend. This
|
|
information or aspect of it may be considered as private by the
|
|
user.</li>
|
|
|
|
<li>User profiles (preferences and usage habits), used to optimize
|
|
the user's interaction with multimodal applications, includes
|
|
information that users may consider as private.</li>
|
|
</ul>
|
|
</li>
|
|
</ul>
|
|
|
|
<p>Other considerations and issues may exist and should be
|
|
compiled.</p>
|
|
|
|
<h3 id="Deliveryandcontext">1.8 Delivery and context</h3>
|
|
|
|
<p>Notions of profile and <a href="#deliverycontext">delivery
|
|
context</a> have been widely introduced to characterize the the
|
|
capabilities of devices and preferences of users.</p>
|
|
|
|
<p>From a multimodal point of view, different types of profiles are
|
|
relevant:</p>
|
|
|
|
<ul>
|
|
<li><a href="#userprofile">User profile</a> that may include user
|
|
credentials and user preferences and usage patterns that captures
|
|
the information manually or automatically the way that a user
|
|
interacts or likes to interact with a multimodal application</li>
|
|
|
|
<li><a href="#deviceprofile">Device profiles</a> that captures the
|
|
characteristics the capability of a particular devices used to
|
|
access an application.</li>
|
|
</ul>
|
|
|
|
<p>These profiles are combined into the notion of <a
|
|
href="#deliverycontext">delivery context</a> introduced by the W3C
|
|
device independent activity <a href="#DIactivity">[DI Activity]</a>
|
|
. The delivery context captures the set of attributes that
|
|
characterize the capabilities of the access mechanism (device or
|
|
devices) (device profile), the dynamic preferences of the user (as
|
|
they relates to interaction through this device) and <a
|
|
href="#configuration">configurations</a>. Delivery context may
|
|
dynamically change as the application progresses, as the user
|
|
situation changes (situationalization) or as the number and
|
|
configurations of the devices change.</p>
|
|
|
|
<p>CC/PP is an example of formalism to describe and exchange the
|
|
delivery context <a href="#CCPP">[CC/PP]</a>.</p>
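<p>A minimal CC/PP sketch of a device profile follows; the <code>ex:</code> vocabulary and attribute names are hypothetical, only the overall RDF/component structure comes from CC/PP:</p>

<pre class="example">
&lt;rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ccpp="http://www.w3.org/2002/11/08-ccpp-schema#"
         xmlns:ex="http://example.com/schema#"&gt;
  &lt;rdf:Description rdf:about="http://example.com/profile#MyDevice"&gt;
    &lt;ccpp:component&gt;
      &lt;rdf:Description rdf:about="http://example.com/profile#Terminal"&gt;
        &lt;ex:displayWidth&gt;320&lt;/ex:displayWidth&gt;
        &lt;ex:supportsSpeechInput&gt;Yes&lt;/ex:supportsSpeechInput&gt;
      &lt;/rdf:Description&gt;
    &lt;/ccpp:component&gt;
  &lt;/rdf:Description&gt;
&lt;/rdf:RDF&gt;
</pre>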
<p>Users of multimodal interactions will expect to be able to rely on these profiles to optimize the way that multimodal applications are presented to them.</p>

<p class="requirement"><strong><a id="MMI-G14" name="MMI-G14">(MMI-G14)</a>:</strong> The multimodal specifications MUST enable optimization and adaptation of multimodal applications based on <a href="#deliverycontext">delivery context</a> or dynamic changes of delivery context (MUST specify).</p>

<p>Dynamic changes of delivery context encompass situations where the available devices, modalities and configurations, or the usage preferences, change dynamically. These changes can be involuntary or initiated by the user, the application developer or the service providers.</p>

<p class="requirement"><strong><a id="MMI-G15" name="MMI-G15">(MMI-G15)</a>:</strong> The multimodal specifications MUST enable authors to specify how <a href="#deliverycontext">delivery context</a> and changes of delivery context affect the multimodal interface of a particular application (MUST specify).</p>

<p>The description of such impacts on a multimodal application could be specified by the author but modified by the user, platform vendor or service provider. In particular, the author can describe how the application can be affected by or adapted to the delivery context, but the user and service providers should be able to modify the delivery context. <em>Other use cases should also be considered.</em></p>
<h3 id="Navigationspecification">1.9 Navigation specification</h3>
|
|
|
|
<p>It is expected that the author of multimodal application should
|
|
always be able to specify the expected flow of navigation (i.e.
|
|
sequence of interaction) through the application or the algorithm
|
|
to determine such a flow (e.g. in mixed initiative cases). This
|
|
leads to the following requirement:</p>
|
|
|
|
<p class="requirement"><span class="requirement"><strong><a
|
|
id="MMI-G16" name="MMI-G16">(MMI-G16)</a>:</strong> The multimodal
|
|
specifications MUST enable the author of an application to describe
|
|
the navigation flow through the application or indicate the
|
|
algorithms to determine the navigation flow (MUST
|
|
specify).</span></p>
|
|
|
|
<h2 id="Inputmodalityrequirements">2. Input Modality
|
|
Requirements</h2>
|
|
|
|
<h3 id="Inputprocessing">2.1 Input processing</h3>
|
|
|
|
<p>Numerous modalities or input types require some form of
|
|
processing before the nature of the input is identified. For
|
|
instance, speech input requires speech detection and speech
|
|
recognition which requires specific data files (e.g. grammars,
|
|
language models etc). Similarly handwritten input requires
|
|
recognition.</p>
|
|
|
|
<p class="requirement"><strong><a id="MMI-I1"
|
|
name="MMI-I1">(MMI-I1)</a>:</strong> The multimodal specifications
|
|
MUST provide a mechanism to specify and attach modality related
|
|
information when authoring a multimodal application. (MUST
|
|
specify).</p>
|
|
|
|
<p>This implies that authors should be able to include
|
|
modality-related information, such as the media types, processing
|
|
requirements or fallback mechanisms that a user agent will need for
|
|
the particular modality. Mechanisms should be available to make
|
|
this available to the user agent.</p>
|
|
|
|
<p>For example, audio input may be recognized (speech recognizer),
|
|
recorded or processed by speaker recognizers, natural language
|
|
processing, using specific data files (e.g. grammar, language
|
|
model), etc. The author must be able to completely define such
|
|
processing steps.</p>
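<p>As an illustration of such a data file, the following is a small grammar in the XML form of the W3C Speech Recognition Grammar Specification, developed in the Voice Browser Activity; the rule and its alternatives are invented for this sketch:</p>

<pre class="example">
&lt;grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         xml:lang="en-US" root="city" mode="voice"&gt;
  &lt;rule id="city"&gt;
    &lt;one-of&gt;
      &lt;item&gt;Boston&lt;/item&gt;
      &lt;item&gt;Denver&lt;/item&gt;
    &lt;/one-of&gt;
  &lt;/rule&gt;
&lt;/grammar&gt;
</pre>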
<h3 id="Sequentialmultimodalinput">2.2 Sequential multimodal
|
|
input</h3>
|
|
|
|
<p class="requirement"><span class="requirement"><strong><a
|
|
id="MMI-I2" name="MMI-I2">(MMI-I2)</a>:</strong> The multimodal
|
|
specifications developed by the MMI working group MUST support <a
|
|
href="#sequentialinput">sequential multimodal input</a> (MUST
|
|
specify).</span></p>
|
|
|
|
<p>It im<span style="color: #000000">plies that</span></p>
|
|
|
|
<ul style="color: #000000">
|
|
<li class="requirement"><span class="requirement"><strong><a
|
|
id="MMI-I2a" name="MMI-I2a">(MMI-I2a)</a>:</strong> It MUST be
|
|
possible to author <a href="#sequentialmm">sequential
|
|
multimodal</a> applications, where inputs across modality are
|
|
provided sequentially (MUST specify).</span></li>
|
|
|
|
<li class="requirement"><strong><a id="MMI-I2b"
|
|
name="MMI-I2b">(MMI-I2b)</a>:</strong> It MUST be possible to
|
|
specify what modality or device to use for input in <a
|
|
href="#sequentialmm">sequential multimodality</a> and hint or
|
|
enforce <a href="#modalityswitch">modality switches</a>. This is an
|
|
application developer's capability (MUST specify).</li>
|
|
|
|
<li class="requirement"><span class="requirement"><strong><a
|
|
id="MMI-I2c" name="MMI-I2c">(MMI-I2c)</a>:</strong> The
|
|
specifications MUST enable writing multimodal applications where
|
|
the user can select what modality or device to use at any time
|
|
based on the user's <a href="#situation">situation</a> and the
|
|
nature of the input interactions. More concretely, the
|
|
specifications must support writing multimodal applications that
|
|
can be accessed through each modality alone, and that support <a
|
|
href="#modalityswitch">modality switches</a> whenever desired by
|
|
the user (MUST specify).</span></li>
|
|
</ul>
|
|
|
|
<h3 id="Simultaneousmultimodalinput">2.3 Simultaneous multimodal
|
|
input</h3>
|
|
|
|
<p class="requirement"><span class="requirement"><strong><a
|
|
id="MMI-I3" name="MMI-I3">(MMI-I3)</a>:</strong> The multimodal
|
|
specifications developed by the MMI working group MUST support <a
|
|
href="#simultaneousinput">simultaneous multimodal input</a> (MUST
|
|
specify).</span></p>
|
|
|
|
<p class="requirement"><strong><a id="MMI-I4"
|
|
name="MMI-I4">(MMI-I4)</a>:</strong> The multimodal specifications
|
|
MUST enable the author to specify the <a
|
|
href="#synchronizationlevel">granularity of input
|
|
synchronization</a> (MUST specify).</p>
|
|
|
|
<p>It should be remarked, however, that the actual granularity of
|
|
input synchronization may be decided by the user, by the runtime or
|
|
by the network (delivery context) or some combination thereof.</p>
|
|
|
|
<p class="requirement"><span class="requirement"><strong><a
|
|
id="MMI-I5" name="MMI-I5">(MMI-I5)</a>:</strong> The multimodal
|
|
specifications MUST enable the author to specify how the multimodal
|
|
application evolves when the</span> <a
|
|
href="#synchronizationlevel">granularity of input
|
|
synchronization</a> <span class="requirement">is modified by
|
|
external factors (MUST specify).</span></p>
|
|
|
|
<p>This requirement enables the application developer to specify
|
|
how the performance of the application can degrade gracefully with
|
|
changes in the input mechanism. For instance, it should be possible
|
|
to access an application designed for event-level or field-level
|
|
synchronization between voice (on the server side) and GUI (on the
|
|
terminal) on a network that permits only session-level
|
|
synchronization (that is, permits only <a
|
|
href="#sequentialmm">sequential multimodality</a>).</p>
|
|
|
|
<p class="requirement"><span class="requirement"><strong><a
|
|
id="MMI-I6" name="MMI-I6">(MMI-I6)</a>:</strong> The multimodal
|
|
specifications SHOULD enable a default input synchronization
|
|
behavior and provide "overwrite" mechanisms (SHOULD
|
|
specify).</span></p>
|
|
|
|
<p>Therefore, it should be possible to author multimodal
|
|
applications while assuming a default synchronization behavior. For
|
|
example, <a href="#supplementarymm">supplementary</a> event-level
|
|
multimodal <a href="#synchronizationlevel">synchronization
|
|
granularity</a>.</p>
|
|
|
|
<h3 id="Compositemultimodalinput">2.4 Composite multimodal
|
|
input</h3>
|
|
|
|
<p class="requirement"><strong><a id="MMI-I7"
|
|
name="MMI-I7">(MMI-I7)</a>:</strong> The multimodal specifications
|
|
developed by the MMI working group MUST support <a
|
|
href="#compositeinput">composite multimodal input</a> (MUST
|
|
specify).</p>
|
|
|
|
<p class="requirement"><span class="requirement"><strong><a
|
|
id="MMI-I8" name="MMI-I8">(MMI-I8)</a>:</strong> The multimodal
|
|
specifications SHOULD allow the author to specify how input
|
|
combination is achieved, possibly taking into account the <a
|
|
href="#coordinationcapability">coordination capabilities</a>
|
|
available in the given <a href="#deliverycontext">delivery
|
|
context</a> (NICE to specify).</span></p>
|
|
|
|
<p>This can be achieved with explicit scripts that describe the
|
|
interpretation and composition algorithms. On the other hand, it
|
|
may also be left to the <a href="#dialogmanager">interaction
|
|
manager</a> to apply an interpretation strategy that includes
|
|
composition, for example by determining the most sensible
|
|
interpretation given the <a href="#sessioncontext">session
|
|
context</a> and therefore determining what input combination (if
|
|
any) to select. This is addressed by the following requirement.</p>
|
|
|
|
<p class="requirement"><span class="requirement"><strong><a
|
|
id="MMI-I9" name="MMI-I9">(MMI-I9)</a>:</strong> The multimodal
|
|
specifications SHOULD enable the author to specify the mechanism
|
|
used to decide when coordinated inputs are to be combined and how
|
|
they are combined (NICE to specify).</span></p>
|
|
|
|
<p>Possible ways to address this include:</p>
|
|
|
|
<ul>
|
|
<li>Time windowing</li>
|
|
|
|
<li>Interaction management strategy or algorithms based on ordering
|
|
of events and context</li>
|
|
</ul>
|
|
|
|
<h3 id="Inputmodessupported">2.5 Input modes supported</h3>
|
|
|
|
<h4 id="InputMUSTspecify">2.5.1 MUST specify</h4>
|
|
|
|
<p class="requirement"><span class="requirement"><strong><a
|
|
id="MMI-I10" name="MMI-I10">(MMI-I10)</a>:</strong> The multimodal
|
|
specifications must support the description of input to be obtained
|
|
from:</span></p>
|
|
|
|
<ul class="requirement">
|
|
<li>Keyboard / keypad (e.g. keyboard (i.e. Qwerty (i.e. US
|
|
keyboard), etc), Handset (e.g. DTMF) or customized keypad).</li>
|
|
|
|
<li>Pointing devices (e.g. mouse, stylus, touch screen)</li>
|
|
|
|
<li>Combined input interfaces like joystick and game
|
|
controllers</li>
|
|
|
|
<li>Audio input (e.g. speech input to be recognized or
|
|
recorded)</li>
|
|
|
|
<li>Video input</li>
|
|
|
|
<li>Sign languages</li>
|
|
|
|
<li>Pen / stylus handwriting and stroke input.
|
|
<ul>
|
|
<li>(hand-writing script and hand-writing gesture - e.g. to delete,
|
|
to insert)</li>
|
|
|
|
<li>This incorporates stroke input and recognized handwriting. This
|
|
is expected to be addressed by requirement <a
|
|
href="#MMI-I1">MMI-I1</a>.</li>
|
|
</ul>
|
|
</li>
|
|
</ul>
|
|
|
|
<p class="requirement"><span class="requirement">(MUST
|
|
specify).</span></p>
|
|
|
|
<h4 id="InputNICEtospecify">2.5.2 NICE to specify</h4>
|
|
|
|
<p class="requirement"><span class="requirement"><strong><a
|
|
id="MMI-I11" name="MMI-I11">(MMI-I11)</a>:</strong> The multimodal
|
|
specifications SHOULD support other input modes,
|
|
including:</span></p>
|
|
|
|
<ul class="requirement">
|
|
<li>gaze recognition (e.g. as a pointer).
|
|
<ul>
|
|
<li>This is also expected to be covered by the "pointing" aspect of
|
|
<a href="#MMI-I10">MMI-10</a> and requirement <a
|
|
href="#MMI-I1">MMI-I1</a>.</li>
|
|
</ul>
|
|
</li>
|
|
|
|
<li>Combined audio-visual speech recognition.</li>
|
|
|
|
<li>Haptic and tactile input</li>
|
|
|
|
<li>Non-spoken audio input (e.g. hummed tune, songs</li>
|
|
</ul>
|
|
|
|
<p class="requirement"><span class="requirement">(NICE to
|
|
specify).</span></p>
|
|
|
|
<h4 id="InputExtensibility">2.5.3 Extensibility</h4>
|
|
|
|
<p class="requirement"><strong><a id="MMI-I12"
|
|
name="MMI-I12">(MMI-I12)</a>:</strong> The multimodal
|
|
specifications MUST describe how extensibility is to be achieved
|
|
and how new devices or modalities can be added (MUST specify).</p>
|
|
|
|
<h3 id="SemanticsofinputgeneratedbyUIcomponents">2.6 Semantics of
|
|
input generated by UI components</h3>
|
|
|
|
<p class="requirement"><span class="requirement"><strong><a
|
|
id="MMI-I13" name="MMI-I13">(MMI-I13)</a>:</strong> The multimodal
|
|
specifications MUST support the representation of the meaning of a
|
|
user input (MUST specify).</span></p>
|
|
|
|
<ul>
|
|
<li class="requirement"><span><strong><a id="MMI-I14"
|
|
name="MMI-I14">(MMI-I14)</a>:</strong> The representation of the
|
|
meaning may be modality or device dependent. However, whenever
|
|
possible, the representation of the meaning SHOULD be independent
|
|
of the input modality (NICE to specify).</span></li>
|
|
|
|
<li class="requirement"><span class="requirement"><strong><a
|
|
id="MMI-I15" name="MMI-I15">(MMI-I15)</a>:</strong> The
|
|
representation of the input SHOULD indicate the modality(ies) where
|
|
the input(s) was (were) provided (SHOULD specify).</span></li>
|
|
</ul>
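<p>A purely hypothetical rendering of such a representation follows; no particular format is implied by these requirements (the element and attribute names are invented), but it shows a meaning that is modality-independent while still recording the originating modality:</p>

<pre class="example">
&lt;!-- the same meaning could have come from speech, pen or keypad --&gt;
&lt;interpretation modality="speech" confidence="0.85"&gt;
  &lt;destination&gt;Boston&lt;/destination&gt;
&lt;/interpretation&gt;
</pre>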
<h3 id="Coordinatedconstraintsandinterpretations">2.7 Coordinated
|
|
constraints</h3>
|
|
|
|
<p class="requirement"><span class="requirement"><strong><a
|
|
id="MMI-I16" name="MMI-I16">(MMI-I16)</a>:</strong> The multimodal
|
|
specifications MUST enable to coordinate the <a
|
|
href="#inputconstraints">input constraints</a> across modalities
|
|
(MUST specify).</span></p>
|
|
|
|
<p>Input constraints specify, for example through grammars, how
|
|
inputs are can be combined via rules or interaction management
|
|
strategies. For example the markup language may coordinates
|
|
grammars for modalities other than speech with speech grammars to
|
|
avoid duplication of effort in authoring multimodal grammars.</p>
|
|
|
|
<p>Possible ways to address this could include:</p>
|
|
|
|
<ul>
|
|
<li>Coordinated Grammars.</li>
|
|
|
|
<li>Constraints expressed in the data model (e.g. XForms, XML
|
|
Schema)</li>
|
|
|
|
<li>Interaction management algorithm or strategy.</li>
|
|
</ul>
|
|
|
|
<p>These methods will be considered during the specification
|
|
work.</p>
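<p>As an example of the data-model approach, a single XML Schema constraint such as the following (the type name and values are invented for this sketch) could drive both a GUI selection list and a generated speech grammar, so the constraint is authored only once:</p>

<pre class="example">
&lt;xs:simpleType name="city"&gt;
  &lt;xs:restriction base="xs:string"&gt;
    &lt;xs:enumeration value="Boston"/&gt;
    &lt;xs:enumeration value="Denver"/&gt;
  &lt;/xs:restriction&gt;
&lt;/xs:simpleType&gt;
</pre>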
<h3 id="Supportforconflictinginputfromdifferentmodalities">2.8 Support for conflicting input from different modalities</h3>

<p>When using multiple modalities or user agents, a user may introduce errors consciously or inadvertently. For example, in a voice and GUI multimodal application, the user may say "yes" while simultaneously clicking on "no" in the user interface. We require that the specifications support the detection of such conflicts.</p>

<p class="requirement"><strong><a id="MMI-I17" name="MMI-I17">(MMI-I17)</a>:</strong> The multimodal specifications MUST support the detection of conflicting input from several modalities (MUST specify).</p>

<p>It is naturally expected that the author will specify how to handle the conflict through an explicit script or piece of code. It is also possible that an interaction management strategy will be able to detect the possible conflict and provide a strategy or sub-dialog to resolve it.</p>
<h3 id="Temporalpositioningofevents">2.9 Temporal positioning of
|
|
input events</h3>
|
|
|
|
<p>The <a href="#dialogmanager">interaction manager</a> should be
|
|
able to place different input events on the timeline, in order to
|
|
determine the intent of the user.</p>
|
|
|
|
<p class="requirement"><strong><a id="MMI-I18"
|
|
name="MMI-I18">(MMI-I18)</a>:</strong> The multimodal
|
|
specifications MUST provide mechanisms to position the input events
|
|
relatively to each other in time (MUST specify).</p>
|
|
|
|
<p class="requirement"><span class="requirement"><strong><a
|
|
id="MMI-I19" name="MMI-I19">(MMI-I19)</a>:</strong> The multimodal
|
|
specifications SHOULD provide mechanisms to allow for temporal
|
|
grouping of input events (SHOULD specify).</span></p>
|
|
|
|
<p>These requirements may by satisfied by mechanisms to order of
|
|
the input events or, when needed, relative time stamping. For some
|
|
configurations, this may involve clock synchronization.</p>
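<p>A purely hypothetical sketch of time-stamped input events (the event vocabulary is invented; only the idea of relative temporal ordering matters here): the close timestamps would let an interaction manager group the pen click and the utterance into one composite input.</p>

<pre class="example">
&lt;event source="pen" name="click" target="map"
       time="2003-01-08T12:00:00.120Z"/&gt;
&lt;event source="speech" name="result" value="zoom here"
       time="2003-01-08T12:00:00.450Z"/&gt;
</pre>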
<h2 id="Outputmediarequirements">3. Output Media Requirements</h2>
|
|
|
|
<h3 id="Sequentialmediaoutput">3.1 Sequential media output</h3>
|
|
|
|
<p class="requirement"><span class="requirement"><strong><a
|
|
id="MMI-O1" name="MMI-O1">(MMI-O1)</a>:</strong> The multimodal
|
|
specifications developed by the MMI working group MUST support
|
|
sequential media output (MUST specify).</span></p>
|
|
|
|
<p>As <a href="#SMIL">SMIL</a> supports the sequencing of medias,
|
|
the specification is expected to rely on similar mechanism. This is
|
|
addressed in more details in other requirements.</p>
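<p>For instance, SMIL 2.0 expresses sequential presentation with the <code>seq</code> construct; the media file names below are hypothetical:</p>

<pre class="example">
&lt;!-- play the prompt first, then show the menu for five seconds --&gt;
&lt;seq&gt;
  &lt;audio src="welcome.wav"/&gt;
  &lt;img src="menu.png" dur="5s"/&gt;
&lt;/seq&gt;
</pre>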
<p>This implies that:</p>

<ul>
<li class="requirement"><strong><a id="MMI-O1a" name="MMI-O1a">(MMI-O1a)</a>:</strong> It MUST be possible to author <a href="#sequentialmm">sequential multimodal</a> applications where output media are presented sequentially to the user (MUST specify).</li>

<li class="requirement"><strong><a id="MMI-O1b" name="MMI-O1b">(MMI-O1b)</a>:</strong> It MUST be possible to specify what modality or device to use for output in <a href="#sequentialmm">sequential multimodality</a> and to hint or enforce <a href="#modalityswitch">modality switches</a>. This is an application developer's capability (MUST specify).</li>

<li class="requirement"><strong><a id="MMI-O1c" name="MMI-O1c">(MMI-O1c)</a>:</strong> The specifications MUST enable writing multimodal applications where the user can select what modality to use at any time based on the user's <a href="#situation">situation</a> and the nature of the output interactions. More concretely, the specifications must support writing multimodal applications that can be accessed through each modality alone, and that support <a href="#modalityswitch">modality switches</a> whenever desired by the user (MUST specify).</li>
</ul>
<h3 id="Simultaneousmediaoutput">3.2. Simultaneous media
|
|
output</h3>
|
|
|
|
<p class="requirement"><strong><a id="MMI-O2"
|
|
name="MMI-O2">(MMI-O2)</a>:</strong> The multimodal specifications
|
|
MUST provide the ability to synchronize different output medias
|
|
with different granularities (MUST specify).</p>
|
|
|
|
<p>This covers simultaneous outputs. The granularity of output
|
|
synchronization as provided by SMIL may range from no
|
|
synchronization at all between the medias other than the play in
|
|
parallel to tightly synchronization mechanisms.</p>
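
<p>SMIL 2.0 illustrates this range: media played inside a
&lt;par&gt; time container may merely start together, or may be
kept in step through the syncBehavior attribute, as sketched below
(the file names are illustrative):</p>

<pre class="example">
&lt;!-- SMIL 2.0: video and audio kept in lockstep, while an image
     is allowed to slip if resources are scarce --&gt;
&lt;par&gt;
  &lt;video src="talking-head.mpg" syncBehavior="locked"/&gt;
  &lt;audio src="narration.wav" syncBehavior="locked"/&gt;
  &lt;img src="slide1.png" dur="10s" syncBehavior="canSlip"/&gt;
&lt;/par&gt;
</pre>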

<p class="requirement"><span class="requirement"><strong><a
id="MMI-O3" name="MMI-O3">(MMI-O3)</a>:</strong> The multimodal
specifications MUST enable the author to specify the granularity of
output synchronization (MUST specify).</span></p>

<p>However, it should be possible that the granularity of output
media synchronization be decided by the user, runtime or network
(through the <a href="#deliverycontext">delivery context</a>).</p>

<p class="requirement"><strong><a id="MMI-O4"
name="MMI-O4">(MMI-O4)</a>:</strong> The multimodal markup MUST
enable the author to specify how the multimodal application
degrades when the granularity of output synchronization is modified
by external factors (MUST specify).</p>

<p class="requirement"><strong><a id="MMI-O5"
name="MMI-O5">(MMI-O5)</a>:</strong> The multimodal specifications
SHOULD rely on a default output synchronization behavior for a
particular granularity and SHOULD provide "overwrite" mechanisms
(SHOULD specify).</p>

<h3 id="Supportedoutputmedias">3.3 Supported output media</h3>

<h4 id="outputMUSTspecify">3.3.1 MUST specify</h4>

<p class="requirement"><strong><a id="MMI-O6"
name="MMI-O6">(MMI-O6)</a>:</strong> The multimodal specifications
MUST support as output media:</p>

<ul class="requirement">
<li>Audio, including spoken prompts and playback</li>

<li>Visual (XHTML, SVG), encompassing different display
characteristics (monitor, PDA, smart phone, etc.)</li>

<li>SMIL objects (animation, audio, img, video, text,
textstream)</li>

<li>Synthesis of audio</li>

<li>MIDI</li>

<li>Streaming</li>

<li>Sign languages</li>
</ul>

<p class="requirement">(MUST specify).</p>

<h4 id="outputNicetospecify">3.3.2 Nice to specify</h4>

<p class="requirement"><strong><a id="MMI-O7"
name="MMI-O7">(MMI-O7)</a>:</strong> The multimodal specifications
SHOULD support additional output media such as:</p>

<ul class="requirement">
<li>media types supported by CSS3</li>

<li>lip-synch face synthesis</li>

<li>tactile and haptic output</li>
</ul>

<p><span class="requirement">(NICE to specify).</span></p>

<h4 id="outputExtensibility">3.3.3 Extensibility</h4>

<p class="requirement"><strong><a id="MMI-O8"
name="MMI-O8">(MMI-O8)</a>:</strong> The multimodal specifications
MUST describe how extensibility is to be achieved and how new
output media can be added (MUST specify).</p>

<h3 id="Outputprocessing">3.4 Output processing</h3>

<p class="requirement"><strong><a id="MMI-O9"
name="MMI-O9">(MMI-O9)</a>:</strong> The multimodal specifications
MUST support the specification of which output media should be
processed and how it should be done. The specifications MUST
provide a mechanism that describes how this can be achieved or
extended for different modalities (MUST specify).</p>

<p>Examples of output processing may include: adaptation or styling
of presentation for particular modalities, speech synthesis of text
output into audio output, natural language generation, etc.</p>
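
<p>For example, text output destined for the audio modality could
be processed by a speech synthesizer. The following sketch is based
on the W3C Speech Synthesis Markup Language drafts and is
illustrative only:</p>

<pre class="example">
&lt;!-- Render a confirmation as speech, slowing down the flight
     number for clarity --&gt;
&lt;speak xml:lang="en-US"&gt;
  Your flight is confirmed.
  &lt;prosody rate="slow"&gt;Flight UA 210.&lt;/prosody&gt;
&lt;/speak&gt;
</pre>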

<h2 id="Architectureintegrationandsynchronizationpoints">4.
Architecture, integration and synchronization points</h2>

<h3 id="Reusestandardmarkuplanguages">4.1 Reuse standard markup
languages</h3>

<p class="requirement"><strong><a id="MMI-A1"
name="MMI-A1">(MMI-A1)</a>:</strong> Where the functionality is
appropriate, and clean integration is possible, the multimodal
specifications MUST enable the use and integration of existing
standard language specifications including visual, aural, voice and
multimedia standards (MUST specify).</p>

<p>In general, it is understood that in order to satisfy <a
href="#MMI-G11">MMI-G11</a>, dependencies of the multimodal
specifications on other specifications must be carefully evaluated
if these are not yet W3C recommendations or not yet widely
adopted.</p>

<p>SMIL 2.0 provides multimedia synchronization mechanisms.
Therefore, <a href="#MMI-A1">MMI-A1</a> implies:</p>

<p class="requirement"><strong><a id="MMI-A1a"
name="MMI-A1a">(MMI-A1a)</a>:</strong> The multimodal
specifications MUST enable the synchronization of input and output
media through SMIL 2.0 as a control mechanism (MUST specify).</p>

<h3 id="XHTMLmodularization">4.2 XHTML Modularization</h3>

<p>The following requirement results from <a
href="#MMI-A1">MMI-A1</a>.</p>

<p class="requirement"><strong><a id="MMI-A2"
name="MMI-A2">(MMI-A2)</a>:</strong> The multimodal specifications
MUST be expressible in terms of XHTML modularization (MUST
specify).</p>

<h3 id="CompatibilitywithXForms">4.3 Separation of data model,
presentation layer and application logic</h3>

<p class="requirement"><strong><a id="MMI-A3"
name="MMI-A3">(MMI-A3)</a>:</strong> The multimodal specification
MUST allow the separation of data model, presentation layer and
application logic in the following ways:</p>

<ul class="requirement">
<li>Enable an explicit data model for the back end (i.e. the data)
and its mapping to the front end.</li>

<li>Enable the separation of the data model from the presentation.
The presentation depends on the device and modality.</li>

<li>Application data must be modality independent.</li>

<li>Logic should be modality independent.</li>
</ul>

<p class="requirement">(MUST specify).</p>

<p>This will enable the multimodal specifications to be compatible
with XForms in environments which support XForms. This would comply
with <a href="#MMI-A1">MMI-A1</a>.</p>

<h3 id="Detectionofavailablemodalities">4.4 Detection of available
modalities and changes</h3>

<p>From an authoring point of view, it is important to have
mechanisms (events, protocols, handlers) to detect or prescribe the
modalities that are or should be available: i.e. to check the
delivery context and adapt to it. This is covered by <a
href="#MMI-G14">MMI-G14</a> and <a
href="#MMI-G15">MMI-G15</a>.</p>

<p class="requirement"><strong><a id="MMI-A4"
name="MMI-A4">(MMI-A4)</a>:</strong> There MUST be events
associated with changes of <a href="#deliverycontext">delivery
context</a> and mechanisms to specify how to handle these events by
adapting the multimodal application (MUST specify).</p>

<p class="requirement"><strong><a id="MMI-A5"
name="MMI-A5">(MMI-A5)</a>:</strong> There SHOULD be mechanisms
available to define the <a href="#deliverycontext">delivery
context</a> or behavior that is expected or recommended by the
author (SHOULD specify).</p>

<h3 id="Synchronizationgranularities">4.5 Synchronization
granularities</h3>

<p class="requirement"><strong><a id="MMI-A6"
name="MMI-A6">(MMI-A6)</a>:</strong> The multimodal specifications
MUST support the following <a
href="#synchronizationlevel">synchronization
granularities</a>:</p>

<ul class="requirement">
<li>Form-level input synchronization: Inputs in one modality are
reflected in the other only after reaching a particular point in
the presentation (e.g. completing a certain number of fields in a
form).</li>

<li>Field-level input synchronization: Inputs in one modality are
reflected in the other after the user finishes performing a
particular interaction with a field. This can be detected because
the user in general changes or sets a value in the data model. For
example, this results from a change of focus (e.g. moving from
input field to input field) or from completing the interaction
(e.g. completing a selection in a menu).</li>

<li>Page-level: Inputs in one modality are reflected in the other
only after submission of the page.</li>

<li>Event-level synchronization: User inputs in one modality are
captured at the level of individual DOM events and immediately
reflected in the other modality, when it makes sense.</li>

<li>Event-level input synchronization with output media.</li>

<li>Media synchronization: Synchronization between output media as
specified by SMIL.</li>

<li>Session level: <a href="#suspendresume">Suspend and resume</a>
behavior; an application suspended in one modality can be resumed
in the same or another modality.</li>
</ul>

<p><span class="requirement">(MUST specify).</span></p>

<p>In addition:</p>

<ul>
<li class="requirement"><strong><a id="MMI-A6a"
name="MMI-A6a">(MMI-A6a)</a>:</strong> It MUST be possible to
author <a href="#sequentialmm">sequential multimodal</a>
applications (MUST specify).</li>

<li class="requirement"><strong><a id="MMI-A6b"
name="MMI-A6b">(MMI-A6b)</a>:</strong> It MUST be possible to
specify what modality or device to use for interaction in <a
href="#sequentialmm">sequential multimodal</a> cases and to hint at
or enforce <a href="#modalityswitch">modality switches</a>. This is
an application developer's capability (MUST specify).</li>

<li class="requirement"><strong><a id="MMI-A6c"
name="MMI-A6c">(MMI-A6c)</a>:</strong> The specifications MUST
enable writing multimodal applications where the user can select
what modality or device to use at any time based on the user's <a
href="#situation">situation</a> and the nature of the input and
output interactions. More concretely, the specifications MUST
support writing multimodal applications that can be accessed
through each modality alone, and that support <a
href="#modalityswitch">modality switches</a> whenever desired by
the user (MUST specify).</li>
</ul>

<p>The following requirements result from <a
href="#MMI-A1">MMI-A1</a>.</p>

<p class="requirement"><strong><a id="MMI-A7a"
name="MMI-A7a">(MMI-A7a)</a>:</strong> Event-level synchronization
MUST follow the <a href="#DOM">DOM</a> event model (MUST
specify).</p>

<p class="requirement"><strong><a id="MMI-A7b"
name="MMI-A7b">(MMI-A7b)</a>:</strong> Event-level synchronization
SHOULD follow <a href="#XMLEvent">XML events</a> (SHOULD
specify).</p>

<p>Such events are not limited to events generated by user
interactions, as discussed in <a href="#MMI-A16">MMI-A16</a>.</p>
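
<p>As a sketch, XML Events allows such event-level synchronization
to be declared in markup. The observer and handler identifiers
below are illustrative:</p>

<pre class="example">
&lt;!-- XML Events: when the "destination" field changes in the GUI,
     run a handler that updates the voice dialog accordingly --&gt;
&lt;ev:listener xmlns:ev="http://www.w3.org/2001/xml-events"
             event="change"
             observer="destination-field"
             handler="#sync-voice-dialog"/&gt;
</pre>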

<p>It is important that the application developer be able to fully
define the synchronization granularity.</p>

<p class="requirement"><strong><a id="MMI-A8"
name="MMI-A8">(MMI-A8)</a>:</strong> The multimodal specifications
MUST enable the author to specify the <a
href="#synchronizationlevel">granularity of synchronization</a>
(MUST specify).</p>

<p>However:</p>

<p class="requirement"><span><strong><a id="MMI-A9"
name="MMI-A9">(MMI-A9)</a>:</strong> It MUST be possible that the
granularity of synchronization be decided by the user, runtime or
network (through the <a href="#deliverycontext">delivery
context</a>) (MUST specify).</span></p>

<p class="requirement"><strong><a id="MMI-A10"
name="MMI-A10">(MMI-A10)</a>:</strong> The multimodal
specifications MUST enable the author to specify how the multimodal
application degrades when the <a
href="#synchronizationlevel">granularity of synchronization</a> is
modified by external factors (MUST specify).</p>

<p class="requirement"><strong><a id="MMI-A11"
name="MMI-A11">(MMI-A11)</a>:</strong> The multimodal
specifications SHOULD rely on an input and output <a
href="#defaultsynchronization">default synchronization</a> behavior
and SHOULD provide "overwrite" mechanisms (SHOULD specify).</p>

<h3 id="Independentinputandoutput">4.6 Independent input and output
interfaces even in the same modality</h3>

<p>Nothing requires that input and output, even in the same
modality, be provided on the same device or user agent. The input
and output can be independent, and the granularity of interfaces
afforded by the specification should apply independently to the
mechanisms of input and output within a given modality when
necessary.</p>

<p class="requirement"><strong><a id="MMI-A12"
name="MMI-A12">(MMI-A12)</a>:</strong> The specification MUST
support separate interfaces for input and output even within the
same modality (MUST specify).</p>

<h3 id="Distributedsynchronization">4.7 Distributed
synchronization</h3>

<p class="requirement"><strong><a id="MMI-A13"
name="MMI-A13">(MMI-A13)</a>:</strong> The multimodal
specifications MUST support <a
href="#synchronizationbehavior">synchronization</a> of different
modalities or devices <a
href="#distributedcomponents">distributed</a> across the network,
providing the user with the capability to interact through
different devices (MUST specify).</p>

<p>In particular, this includes multi-device applications where
different devices or user agents are used to interact with the same
application; these may involve presentation in the same modality
but on different devices.</p>

<h3 id="Distributedprocessing">4.8 Distributed processing</h3>

<p>Distribution of input and output processing refers to cases
where the processing algorithms applied to input and output may be
performed by distributed components.</p>

<p class="requirement"><span class="requirement"><strong><a
id="MMI-A14" name="MMI-A14">(MMI-A14)</a>:</strong> The multimodal
specifications MUST support the distribution of input and <a
href="#outputprocessing">output processing</a> (MUST
specify).</span></p>

<p class="requirement"><span class="requirement"><strong><a
id="MMI-A15" name="MMI-A15">(MMI-A15)</a>:</strong> The multimodal
specifications MUST support the expression of some level of control
over the distributed processing of input and output (MUST
specify).</span></p>

<p>This requirement is related to <a href="#MMI-I1">MMI-I1</a> and
<a href="#MMI-O9">MMI-O9</a>.</p>

<h3 id="Externalinput">4.9 External input and output</h3>

<p class="requirement"><span class="requirement"><strong><a
id="MMI-A16" name="MMI-A16">(MMI-A16)</a>:</strong> The multimodal
specifications MUST enable the author to specify how multimodal
applications handle external input events and generate external
output events used by other processes (MUST specify).</span></p>

<p>Examples of input events include camera, sensor or GPS events.
Examples of output events include any form of notification or
trigger generated by the user interaction.</p>

<p>This is expected to be automatically satisfied if events are
treated as <a href="#XMLEvent">XML events</a>.</p>

<h3 id="Temporalpositioningofinputandoutputevents">4.10 Temporal
positioning of input and output events</h3>

<p>Requirements <a href="#MMI-I18">MMI-I18</a> and <a
href="#MMI-I19">MMI-I19</a> generalize as follows.</p>

<p class="requirement"><strong><a id="MMI-A17"
name="MMI-A17">(MMI-A17)</a>:</strong> The multimodal
specifications MUST provide mechanisms to position the input and
output events relative to each other in time (MUST specify).</p>

<p class="requirement"><span class="requirement"><strong><a
id="MMI-A18" name="MMI-A18">(MMI-A18)</a>:</strong> The multimodal
specifications SHOULD provide mechanisms to allow for temporal
grouping of input and output events (SHOULD specify).</span></p>

<p>These requirements may be satisfied by mechanisms that order the
events or, when needed, by relative time stamping. For some
configurations, this may involve clock synchronization.</p>

<h2 id="Runtimesanddeployments">5. Runtimes and deployments</h2>

<h3 id="Configurations">5.1 Configurations</h3>

<p>It is expected that users will interact with multimodal
applications through different deployment configurations (i.e.
architectures): the different modules responsible for media
rendering, input capture, processing, synchronization,
interpretation, etc., may be partitioned or combined on a single
device or distributed across several devices or servers. As
previously discussed, these configurations may dynamically
change.</p>

<p>The specification of such configurations is beyond the scope of
the W3C Multimodal Interaction Working Group. However:</p>

<p class="requirement"><strong><a id="MMI-C1"
name="MMI-C1">(MMI-C1)</a>:</strong> The multimodal specifications
MUST support the deployment of multimodal applications authored
according to the W3C MMI specifications, with all the relevant
deployment configurations where functions are partitioned or
combined on a single engine or distributed across several devices
or servers (MUST specify).</p>

<p>The possibility of interacting with multiple devices leads
naturally to multi-user access to applications.</p>

<p class="requirement"><strong><a id="MMI-C2"
name="MMI-C2">(MMI-C2)</a>:</strong> The multimodal specifications
SHOULD support multi-user deployments (NICE to specify).</p>

<h3 id="Mobiledeployments">5.2 Mobile deployments</h3>

<p>Multimodal interactions are especially important for mobile
deployments. Therefore, the W3C multimodal working group will pay
attention to the constraints associated with mobile deployments,
and especially cell phones.</p>

<p class="requirement"><strong><a id="MMI-R1"
name="MMI-R1">(MMI-R1)</a>:</strong> The multimodal specifications
MUST be compatible with deployments based on user agents /
renderers that run on mobile platforms (MUST specify).</p>

<p>Mobile platforms, like smart phones, are typically constrained
in terms of available processing power and memory. It is expected
that the multimodal specifications will take such constraints into
account and be designed so that multimodal deployments are possible
on smart phones.</p>

<p>In addition, it is important to pay attention to the challenges
introduced by mobile networks, such as limited bandwidth and
delays:</p>

<p class="requirement"><strong><a id="MMI-R2"
name="MMI-R2">(MMI-R2)</a>:</strong> The multimodal specifications
MUST support deployments over mobile networks, considering the
bandwidth limitations and delays that they may introduce (MUST
specify).</p>

<p>This may involve deployment techniques or specifications from
other standards activities to provision the necessary quality of
service.</p>

<h3 id="EMMA">5.3 EMMA</h3>

<p>The following requirements apply to the objectives for the
specification work on EMMA as defined in the <a
href="#Appendixb">glossary</a>. EMMA is intended to support the
necessary exchanges of information between the multimodal modules
mentioned in <a href="#Configurations">section 5.1</a>.</p>

<p class="requirement"><strong><a id="MMI-E1"
name="MMI-E1">(MMI-E1)</a>:</strong> The multimodal specifications
MUST support the generation, representation and exchange of input
events and results of input or output processing (MUST
specify).</p>

<p class="requirement"><strong><a id="MMI-E2"
name="MMI-E2">(MMI-E2)</a>:</strong> The multimodal specification
MUST support the generation, representation and exchange of
interpretations and combinations of input events and results of
input or output processing (MUST specify).</p>
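
<p>As a purely hypothetical sketch (EMMA is still being specified;
the element and attribute names below are invented for
illustration), an interpretation of the spoken input "flights from
Boston to Denver" might be represented and exchanged as:</p>

<pre class="example">
&lt;!-- Hypothetical EMMA-style annotation of a speech input --&gt;
&lt;interpretation mode="speech" confidence="0.85"&gt;
  &lt;origin&gt;Boston&lt;/origin&gt;
  &lt;destination&gt;Denver&lt;/destination&gt;
&lt;/interpretation&gt;
</pre>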

<h3 id="Multimodalsynchronizationexchanges">5.4 Multimodal
synchronization exchanges</h3>

<p class="requirement"><strong><a id="MMI-S1"
name="MMI-S1">(MMI-S1)</a>:</strong> The multimodal specifications
MUST enable the author to specify the generation of asynchronous
events and their handlers (MUST specify).</p>

<p class="requirement"><span class="requirement"><strong><a
id="MMI-S2" name="MMI-S2">(MMI-S2)</a>:</strong> The multimodal
specifications MUST enable the author to specify the generation of
synchronous events and their handlers (MUST specify).</span></p>

<p class="requirement"><span class="requirement"><strong><a
id="MMI-S3" name="MMI-S3">(MMI-S3)</a>:</strong> The multimodal
specifications MUST support event handlers local to the event
generator (MUST specify).</span></p>

<p class="requirement"><span class="requirement"><strong><a
id="MMI-S4" name="MMI-S4">(MMI-S4)</a>:</strong> The multimodal
specifications MUST support event handlers remote to the event
generator (MUST specify).</span></p>

<p class="requirement"><span class="requirement"><strong><a
id="MMI-S5" name="MMI-S5">(MMI-S5)</a>:</strong> The multimodal
specifications MUST support the exchange of EMMA fragments as part
of the synchronization events content (MUST specify).</span></p>

<p class="requirement"><span class="requirement"><strong><a
id="MMI-S6" name="MMI-S6">(MMI-S6)</a>:</strong> The multimodal
specifications MUST support the specification of event handlers for
externally generated events (MUST specify).</span></p>

<p class="requirement"><span class="requirement"><strong><a
id="MMI-S7" name="MMI-S7">(MMI-S7)</a>:</strong> The multimodal
specifications MUST support the specification of event handlers for
externally generated events that result from the interaction of the
user (MUST specify).</span></p>

<p class="requirement"><span class="requirement"><strong><a
id="MMI-S8" name="MMI-S8">(MMI-S8)</a>:</strong> The multimodal
specifications MUST support handlers that manipulate or update the
presentation associated with a particular modality (MUST
specify).</span></p>

<p>In distributed configurations, it is important that
synchronization exchanges take place with minimum delays. In
practical deployments this implies that the highest available
quality of service should be allocated to such exchanges.</p>

<p class="requirement"><span class="requirement"><strong><a
id="MMI-S9" name="MMI-S9">(MMI-S9)</a>:</strong> The multimodal
specifications MUST enable the identification of multimodal
synchronization exchanges (MUST specify).</span></p>

<p>This would enable the underlying network, if it is aware of such
needs, to allocate the highest quality of service to
synchronization exchanges. This network behavior is beyond the
scope of the multimodal specifications.</p>

<p class="requirement"><span class="requirement"><strong><a
id="MMI-S10" name="MMI-S10">(MMI-S10)</a>:</strong> The multimodal
specifications MUST support confirmation of event handling (MUST
specify).</span></p>

<p class="requirement"><span class="requirement"><strong><a
id="MMI-S11" name="MMI-S11">(MMI-S11)</a>:</strong> The multimodal
specifications MUST support event generation or event handling
pending confirmation of a particular event handling (MUST
specify).</span></p>

<p class="requirement"><span class="requirement"><strong><a
id="MMI-S12" name="MMI-S12">(MMI-S12a)</a>:</strong> The multimodal
specifications MUST be compatible with existing standards including
<a href="#DOM">DOM</a> events and <a href="#DOM">DOM</a>
specifications (MUST specify).</span></p>

<p class="requirement"><span class="requirement"><strong><a
id="MMI-S12b" name="MMI-S12b">(MMI-S12b)</a>:</strong> The
multimodal specifications SHOULD be compatible with existing
standards including <a href="#XMLEvent">XML events</a>
specifications (SHOULD specify).</span></p>

<p class="requirement"><span class="requirement"><strong><a
id="MMI-S13" name="MMI-S13">(MMI-S13)</a>:</strong> The multimodal
specification MUST allow lightweight multimodal synchronization
exchanges compatible with wireless networks and mobile terminals
(MUST specify).</span></p>

<p>This last requirement is derived from <a
href="#MMI-R1">MMI-R1</a> and <a href="#MMI-R2">MMI-R2</a>.</p>

<h2 id="References">6. References</h2>

<p><a id="CCPP" name="CCPP"><strong>[CC/PP]:</strong></a> W3C CC/PP
Working Group, URI: <a
href="http://www.w3c.org/Mobile/CCPP/">http://www.w3c.org/Mobile/CCPP/</a>.</p>

<p><a id="DIactivity" name="DIactivity"><strong>[DI
activity]:</strong></a> W3C Device Independence Activity, URI: <a
href="http://www.w3c.org/2001/di/">http://www.w3c.org/2001/di/</a>.</p>

<p><a id="MMIcharter" name="MMIcharter"><strong>[MMI
charter]:</strong></a> W3C Multimodal Interaction Working Group
Charter, URI: <a
href="http://www.w3c.org/2002/01/multimodal-charter.html">http://www.w3c.org/2002/01/multimodal-charter.html</a>.</p>

<p><a id="MMIWG" name="MMIWG"><strong>[MMI WG]:</strong></a> W3C
Multimodal Interaction Working Group, URI: <a
href="http://www.w3c.org/2002/mmi/">http://www.w3c.org/2002/mmi/</a>.</p>

<p><a id="MMReqVoice" name="MMReqVoice"><strong>[MM Req
Voice]:</strong></a> Multimodal Requirements for Voice Markup
Languages, W3C Working Draft, URI: <a
href="http://www.w3c.org/TR/multimodal-reqs">http://www.w3c.org/TR/multimodal-reqs</a>.</p>

<h2 id="Acknowledgments">7. Acknowledgements</h2>

<p><span style="font-style: italic; font-family: times">This
section is informative.</span></p>

<p>This document was jointly prepared by the members of the W3C
Multimodal Interaction Working Group.</p>

<p>Special acknowledgments to Jim Larson (Intel) and Emily Candell
(Comverse) for their significant editorial contributions.</p>

<h2 id="Appendices">Appendices</h2>

<h3 id="Appendixa">Appendix A: Use cases</h3>

<h4 id="Overviewoftheusecases">A.1 Overview of the use cases</h4>

<p>Analysis of use cases provides insight into the requirements for
applications likely to require a multimodal infrastructure.</p>

<p>The use cases described below were selected for analysis in
order to highlight different requirements resulting from
application variations in areas such as device requirements, event
handling, network dependencies and methods of user interaction.</p>

<p><strong>Use Case Device Classification</strong></p>

<h5 id="ThinClient">Thin Client</h5>

<p>A device with little processing power and capabilities that can
be used to capture user input (microphone, touch display, stylus,
etc.) as well as non-user input such as GPS. The device may have a
very limited capability to interpret the input, for example a small
vocabulary speech recognizer or a character recognizer. The bulk of
the processing occurs on the server, including natural language
processing and interaction management.</p>

<p>An example of such a device may be a mobile phone with DSR
capabilities and a visual browser (there could actually be thinner
clients than this).</p>

<h5 id="ThickClient">Thick Client</h5>

<p>A device with powerful processing capabilities, such that most
of the processing can occur locally. Such a device is capable of
input capture and interpretation. For example, the device can have
a medium vocabulary speech recognizer, a handwriting recognizer,
natural language processing and interaction management
capabilities. The data itself may still be stored on the
server.</p>

<p>An example of such a device may be a recent production PDA or an
in-car system.</p>

<h5 id="MediumClient">Medium Client</h5>

<p>A device capable of input capture and some degree of
interpretation. The processing is distributed in a client/server or
a multi-device architecture. For example, a medium client will have
the voice recognition capabilities to handle small vocabulary
command and control tasks but connects to a voice server for more
advanced dialog tasks.</p>

<p><strong>Use Case Summaries</strong></p>

<p><strong>Form Filling for air travel reservation</strong></p>

<table border="1" summary="4 column table">
<tbody>
<tr>
<td><b>Description</b></td>
<td><b>Device Classification</b></td>
<td><b>Device Details</b></td>
<td><b>Execution Model</b></td>
</tr>

<tr>
<td>The means for a user to reserve a flight using a wireless
personal mobile device and a combination of input and output
modalities. The dialog between the user and the application is
directed through the use of a form-filling paradigm.</td>
<td>Thin and medium clients</td>
<td>Touch-enabled display (i.e., supports pen input), voice input,
local ASR and Distributed Speech Recognition Framework, local
handwriting recognition, voice output, TTS, GPS, wireless
connectivity, roaming between various networks.</td>
<td>Client Side Execution</td>
</tr>
</tbody>
</table>

<h5 id="ReservationScenario">Scenario Details</h5>

<p>The user wants to make a flight reservation with his mobile
device while he is on the way to work. The user initiates the
service by making a phone call to a multimodal service (telephone
metaphor) or by selecting an application (portal environment
metaphor). The details are not described here.</p>

<p>As the user moves between networks with very different
characteristics, the user is offered the flexibility to interact
using the preferred and most appropriate modes for the situation.
For example, while sitting in a train, the use of stylus and
handwriting can achieve higher accuracy than speech (due to
surrounding noise) and protect privacy. When the user is walking,
the more appropriate input and output modalities would be voice
with some visual output. Finally, at the office the user can use
pen and voice in a synergistic way.</p>

<p>The dialog between the user and the application is driven by a
form-filling paradigm where the user provides input to fields such
as "Travel Origin:", "Travel Destination:", "Leaving on date",
"Returning on date". As the user selects each field in the
application to enter information, the corresponding input
constraints are activated to drive the recognition and
interpretation of the user input. The capability of providing
composite multimodal input is also examined, where input from
multiple modalities is combined for the interpretation of the
user's intent.</p>

<p><strong>Driving Directions</strong></p>

<table border="1" summary="4 column table">
<tbody>
<tr>
<td><b>Description</b></td>
<td><b>Device Classification</b></td>
<td><b>Device Details</b></td>
<td><b>Execution Model</b></td>
</tr>

<tr>
<td>This application provides a mechanism for a user to request and
receive driving directions via speech and graphical input and
output.</td>
<td>Medium Client</td>
<td>On-board system (in a car) with a graphical display, map
database, touch screen, voice and touch input, speech output, local
ASR and TTS processing and GPS.</td>
<td>Client Side Execution</td>
</tr>
</tbody>
</table>

<h5 id="DrivingScenario">Scenario Details</h5>

<p>The user wants to go to a specific address from his current
location and while driving wants to take a detour to a local
restaurant (the user knows neither the restaurant's address nor its
name). The user initiates the service via a button on his steering
wheel and interacts with the system via the touch screen and
speech.</p>

<p><strong>Name Dialing</strong></p>

<table border="1" summary="4 column table">
<tbody>
<tr>
<td><b>Description</b></td>
<td><b>Device Classification</b></td>
<td><b>Device Details</b></td>
<td><b>Execution Model</b></td>
</tr>

<tr>
<td>The means for users to call someone by saying their name.</td>
<td>Thin and thick devices</td>
<td>Telephone</td>
<td>The study covers several possibilities:
<ul>
<li>whether the application runs in the device or the server</li>

<li>whether the device supports limited local speech
recognition</li>
</ul>

<p>These choices determine the kinds of events that are needed to
coordinate the device and network based services.</p>
</td>
</tr>
</tbody>
</table>

<h5 id="DialingScenario">Scenario Details</h5>

<p>Janet presses a button on her multimodal phone and says one of
the following commands:</p>

<ul>
<li>Call Wendy</li>

<li>Call Wendy on her cell phone</li>

<li>Call Wendy at work</li>

<li>Call Wendy Smith at Acme Research</li>
</ul>

<p>The application initially looks for a match in Janet's personal
contact list and, if no match is found, then proceeds to look in
other directories. Directed dialog and tapered help are used to
narrow down the search, using aural and visual prompts. Janet is
able to respond by pressing buttons, by tapping with a stylus, or
by using her voice.</p>

<p>Once a selection has been made, rules defined by Wendy are used
to determine how the call should be handled. Janet may see a
picture of Wendy along with a personalized message (aural and
visual) that Wendy has left for her. Call handling may depend on
the time of day, the location and status of both parties, and the
relationship between them. An "ex" might be told to never call
again, while Janet might be told that Wendy will be free in half an
hour after Wendy's meeting has finished. The call may be
automatically directed to Wendy's home, office or mobile phone, or
Janet may be invited to leave a message.</p>

<h4 id="appendixa2analysis">A.2 Event analysis</h4>

<p>The use-case analysis exercise helped to identify the types of
events a multimodal system would likely need to support.</p>

<p>Based on the use case analysis, the following event
classifications were defined:</p>

<ul>
<li>Asynchronous vs. synchronous</li>

<li>Local vs. remote generation</li>

<li>Local vs. remote handling</li>

<li>Input interpretation events</li>

<li>Externally generated events vs. events generated as a result of
user action</li>

<li>Actions vs. notifications</li>
</ul>

<p>The events from the use cases described above have been
consolidated in the following table.</p>

<p><strong>Event Table:</strong></p>

<table border="1" summary="8 column table" cellspacing="0"
class="smaller">
<tbody>
<tr>
<td></td>
<td><b>Event Type</b></td>
<td><b>Asynchronous vs. Synchronous</b></td>
<td><b>Local vs. remote generation</b></td>
<td><b>Local vs. remote handling</b></td>
<td><b>Input interpretation</b></td>
<td><b>External vs. User</b></td>
<td><b>Notifications vs. actions</b></td>
<td><b>Comments</b></td>
</tr>

<tr>
<td>1.</td>
<td>Data Reply Event</td>
<td>Synchronous</td>
<td>Remote</td>
<td>Local</td>
<td>No</td>
<td>External</td>
<td>Notification</td>
<td>Event containing results from a previous data request</td>
</tr>

<tr>
<td>2.</td>
<td>HTTP Request</td>
<td>Asynchronous</td>
<td>Local</td>
<td>Remote</td>
<td>No</td>
<td>External</td>
<td>N/A</td>
<td>A request sent via the HTTP Protocol</td>
</tr>

<tr>
<td>3.</td>
<td>GPS_DATA_in</td>
<td>Synchronous</td>
<td>Remote</td>
<td>Local</td>
<td>No</td>
<td>External</td>
<td>Notification</td>
<td>Event containing GPS Location Data</td>
</tr>

<tr>
<td>4.</td>
<td>Touch Screen Event</td>
<td>Asynchronous</td>
<td>Local</td>
<td>Local</td>
<td>Yes</td>
<td>User</td>
<td>Action</td>
<td>Event that contains coordinates corresponding to a location on
a touch screen</td>
</tr>

<tr>
<td>5.</td>
<td>Start_Listening Event</td>
<td>Asynchronous</td>
<td>Local / Remote</td>
<td>Local / Remote</td>
<td>No</td>
<td>User</td>
<td>Action</td>
<td>Event to invoke the speech recognizer</td>
</tr>

<tr>
<td>6.</td>
<td>Return Reco Results</td>
<td>Synchronous</td>
<td>Local / Remote</td>
<td>Local</td>
<td>Yes</td>
<td>External</td>
<td>Notification</td>
<td>Event containing the results of a recognition</td>
</tr>

<tr>
<td>7.</td>
<td>Alert</td>
<td>Asynchronous</td>
<td>Remote</td>
<td>Local</td>
<td>No</td>
<td>External</td>
<td>Notification</td>
<td>Event containing unsolicited data which may be of use to an
application</td>
</tr>

<tr>
<td>8.</td>
<td>Register User Ack</td>
<td>Synchronous</td>
<td>Remote</td>
<td>Local</td>
<td>No</td>
<td>External</td>
<td>Notification</td>
<td>Event acknowledging that the user has registered with the
service</td>
</tr>

<tr>
<td>9.</td>
<td>Call</td>
<td>Asynchronous</td>
<td>Local</td>
<td>Remote</td>
<td>No</td>
<td>User</td>
<td>Action</td>
<td>Request to place an outgoing call</td>
</tr>

<tr>
<td>10.</td>
<td>Call Ack</td>
<td>Synchronous</td>
<td>Remote</td>
<td>Local</td>
<td>No</td>
<td>External</td>
<td>Notification</td>
<td>Event acknowledging request to place an outgoing call</td>
</tr>

<tr>
<td>11.</td>
<td>Leave Message</td>
<td>Asynchronous</td>
<td>Local</td>
<td>Remote</td>
<td>No</td>
<td>User</td>
<td>Action</td>
<td>Request to leave a message</td>
</tr>

<tr>
<td>12.</td>
<td>Message Ack</td>
<td>Synchronous</td>
<td>Remote</td>
<td>Local</td>
<td>No</td>
<td>External</td>
<td>Notification</td>
<td>Event acknowledging request to leave a message</td>
</tr>

<tr>
<td>13.</td>
<td>Send Mail</td>
<td>Asynchronous</td>
<td>Local</td>
<td>Remote</td>
<td>No</td>
<td>User</td>
<td>Action</td>
<td>Request to send a message</td>
</tr>

<tr>
<td>14.</td>
<td>Mail Ack</td>
<td>Synchronous</td>
<td>Remote</td>
<td>Local</td>
<td>No</td>
<td>External</td>
<td>Notification</td>
<td>Event acknowledging request to send a message</td>
</tr>

<tr>
<td>15.</td>
<td>Register_Device_Profile (delivery_context)</td>
<td>Synchronous</td>
<td>Local</td>
<td>Remote</td>
<td>No</td>
<td>External</td>
<td>Notification</td>
<td>Occurs on connection</td>
</tr>

<tr>
<td>16.</td>
<td>Update_Device_Profile (delivery_context)</td>
<td>Asynchronous / Synchronous</td>
<td>Local</td>
<td>Remote</td>
<td>No</td>
<td>External / User</td>
<td>Notification</td>
<td>The user selects a new set of modalities by pressing a button
or making menu selections (synchronous event). If the device can
detect changes in the network or location via GPS or beacons, then
the event is asynchronous.</td>
</tr>

<tr>
<td>17.</td>
<td>On_Focus (field_name)</td>
<td>Synchronous</td>
<td>Local</td>
<td>Remote</td>
<td>No</td>
<td>User</td>
<td>Action</td>
<td>Event sends the selected field to the multimodal
synchronization server for the purpose of loading the appropriate
input constraints for the field.</td>
</tr>

<tr>
<td>18.</td>
<td>Handwriting_Reco ()</td>
<td>Synchronous</td>
<td>Local</td>
<td>Local</td>
<td>Yes</td>
<td>User</td>
<td>Action</td>
<td>Event to invoke the handwriting recognizer (HWR) after pen
input in a field. In the current scenario, we consider that HWR is
handled locally, but this may be expanded later to include remote
processing.</td>
</tr>

<tr>
<td>19.</td>
<td>Submit_Partial_Result ()</td>
<td>Synchronous</td>
<td>Local</td>
<td>Remote</td>
<td>No</td>
<td>External</td>
<td>Notification</td>
<td>Result of recognition of field input is sent to the server</td>
</tr>

<tr>
<td>20.</td>
<td>Send_Ink (ink_data, time_stamp)</td>
<td>Synchronous</td>
<td>Local</td>
<td>Remote</td>
<td>Yes</td>
<td>User</td>
<td>Action</td>
<td>Ink collected for a pen gesture is sent to the multimodal
server for integration. As before, this event associates time
stamp information with the ink data for synchronization. The
result of the pen gesture can be transmitted as a sequence of
(x,y) coordinates relative to the device display.</td>
</tr>

<tr>
<td>21.</td>
<td>Collect_Pen_Input ()</td>
<td>Synchronous</td>
<td>Local</td>
<td>Local</td>
<td>Yes</td>
<td>User</td>
<td>Action</td>
<td>Ink collection could be interpreted first locally into basic
shapes (i.e., circles, lines) and have those transmitted to the
server.</td>
</tr>

<tr>
<td>22.</td>
<td>Send_Gesture (gesture_data, time_stamp)</td>
<td>Synchronous</td>
<td>Local</td>
<td>Remote</td>
<td>Yes</td>
<td>User</td>
<td>Action</td>
<td>The server can provide a deeper semantic interpretation than
the basic shapes that are recognized on the client</td>
</tr>
</tbody>
</table>
|
|
|
|
<h3 id="Appendixb">Appendix B: Glossary</h3>
|
|
|
|
<p><a id="audiovisualspeech"
|
|
name="audiovisualspeech"><strong>audio-visual
|
|
speech</strong></a></p>
|
|
|
|
<p>Combination of video and audio to process input (joint
|
|
face/lips/movement recognition and speech recognition) and generate
|
|
output (audio-visual media)</p>
|
|
|
|
<p class="Section1"><a id="complementarymm"
|
|
name="complementarymm"><strong>complementary use of
|
|
modalities</strong></a></p>
|
|
|
|
<p>A use of modalities where the interactions available to the user
|
|
differ per modality.</p>
|
|
|
|
<p class="Section1"><a id="compositeinput"
|
|
name="compositeinput"><strong>composite inputs</strong></a></p>
|
|
|
|
<p class="Section1">Composite input is input received on multiple
|
|
modalities at the same time and treated as a single, integrated
|
|
compound input by downstream processes.</p>
|
|
|
|
<p class="Section1"><a id="configuration"
|
|
name="configuration"><strong>configuration</strong></a></p>
|
|
|
|
<p class="Section1">See <a href="#executionmodel">execution
|
|
model.</a></p>
|
|
|
|
<p class="Section1"><a id="conflictinginput"
|
|
name="conflictinginput"><strong>conflicting inputs</strong></a></p>
|
|
|
|
<p class="Section1">Contradictory inputs provided by the user in
|
|
different modalities or on different devices. For examples, they
|
|
may indicate different exclusive selection.</p>
|
|
|
|
<p class="Section1"><a id="sessioncontext"
|
|
name="sessioncontext"><strong>context</strong></a></p>
|
|
|
|
<p class="Section1">A session context consists of the history of
|
|
the interaction between the user and the multimodal system,
|
|
including the input received from the user, the output presented to
|
|
the user, the current data model and the sequence of data model
|
|
changes.</p>
|
|
|
|
<p class="Section1"><a id="coordinationcapability"
|
|
name="coordinationcapability"><strong>coordination
|
|
capability</strong></a></p>
|
|
|
|
<p class="Section1">Capability of a multimodal system to combine
|
|
multimodal inputs into composite inputs based on an interpretation
|
|
algorithm that decides what makes sense to combine based on the
|
|
context</p>
|
|
|
|
<p class="Section1"><strong>CC/PP [ Composite Capability/Preference
|
|
Profiles],</strong></p>
|
|
|
|
<p class="Section1">A W3C working group which is developing an
|
|
RDF-based framework for the management of device profile
|
|
information. For more details about the group activity please visit
|
|
<a
|
|
href="http://www.w3.org/Mobile/CCPP/">http://www.w3.org/Mobile/CCPP/</a></p>
|
|
|
|
<p class="Section1"><strong>concatenation</strong></p>
|
|
|
|
<p class="Section1">The text-to-speech engine concatenates short
|
|
digital-audio segments and performs intersegment smoothing to
|
|
produce a continuous sound.</p>
|
|
|
|
<p><strong>CSS</strong></p>
|
|
|
|
<p class="Section1">Cascading Stylesheets</p>
|
|
|
|
<p class="Section1"><strong>data file</strong></p>
|
|
|
|
<p>Argument files to input or output processing algorithms</p>
|
|
|
|
<p class="Section1"><a id="defaultsynchronization"
|
|
name="defaultsynchronization"><strong>default
|
|
synchronization</strong></a></p>
|
|
|
|
<p class="Section1">Synchronization behavior supported by default
|
|
by a multimodal application.</p>
|
|
|
|
<p class="Section1"><a id="deliverycontext"
|
|
name="deliverycontext"><strong>delivery context</strong></a></p>
|
|
|
|
<p class="Section1">A set of attributes that characterizes the
|
|
capabilities of the access mechanism in terms of device profile,
|
|
user profile (e.g. identify, preferences and usage patterns) and
|
|
situation. Delivery context may have static and dynamic
|
|
components.</p>
|
|
|
|
<p class="Section1"><a id="device"
|
|
name="device"><strong>device</strong></a></p>
|
|
|
|
<p class="Section1">A piece of hardware used to access and interact
|
|
with an application.</p>
|
|
|
|
<p class="Section1"><a id="deviceprofile"
|
|
name="deviceprofile"><strong>device profile</strong></a></p>
|
|
|
|
<p class="Section1">A particular subset of the delivery context
|
|
that describes the device characteristics including for example
|
|
device form factor, available modalities, level of synchronization
|
|
and coordination.</p>
|
|
|
|
<p class="Section1"><strong>DI [Device Independence]</strong></p>
|
|
|
|
<p class="Section1">The W3C Device Independence Activity is working
|
|
to ensure seamless Web access with all kinds of devices, and
|
|
worldwide standards for the benefit of Web users and content
|
|
providers alike. For more details pleases refer to <a
|
|
href="http://www.w3.org/2001/di/">http://www.w3.org/2001/di/</a></p>
|
|
|
|
<p class="Section1"><a id="digitalink"
|
|
name="digitalink"><strong>digital ink</strong></a></p>
|
|
|
|
<p class="Section1">Stored or recognized handwriting input.</p>
|
|
|
|
<p class="Section1"><a id="directeddialog"
|
|
name="directeddialog"><strong>directed dialog</strong></a></p>
|
|
|
|
<p>A dialog in which one party (the user or the computer) follows a
|
|
pre-selected path, independent of the responses of the other. (cfr.
|
|
<a href="#mixedinitiative">mixed initiative</a> dialog).</p>
|
|
|
|
<p class="Section1"><a id="distributedcomponents"
|
|
name="distributedcomponents"><strong>distributed
|
|
components</strong></a></p>
|
|
|
|
<p class="Section1">System components may live at various points of
|
|
the network, including the local client.</p>
|
|
|
|
<p class="Section1"><a id="DOM" name="DOM"><strong>DOM [Document
|
|
Object Model]</strong></a></p>
|
|
|
|
<p class="Section1">A standard interface to the contents of a web
|
|
page. Please visit <a
|
|
href="http://www.w3.org/DOM/">http://www.w3.org/DOM/</a> for more
|
|
details.</p>
|
|
|
|
<p class="Section1"><strong>EMMA</strong></p>
|
|
|
|
<p class="Section1">Extensible MultiModal Annotation Markup
|
|
Language. Formerly known as NLSML—Natural Language
|
|
Semantics Markup Language. This markup language is intended for use
|
|
by systems to represent semantic interpretations for a variety of
|
|
inputs, including but not necessarily limited to, speech and
|
|
natural language text input</p>
|
|
|
|
<p class="Section1"><a id="event"
|
|
name="event"><strong>event</strong></a></p>
|
|
|
|
<p class="Section1">An event is a representation of some
|
|
asynchronous occurrence of interest to the multimodal system.
|
|
Examples include mouse clicks, hanging up the phone, speech
|
|
recognition errors. Events may be associated with data e.g. the
|
|
location the mouse was clicked.</p>
|
|
|
|
<p class="Section1"><a id="eventhandler"
|
|
name="eventhandler"><strong>event handler</strong></a></p>
|
|
|
|
<p class="Section1">A software object intended to interpret and
|
|
respond to a given class of events.</p>
|
|
|
|
<p class="Section1"><a id="eventsource"
|
|
name="eventsource"><strong>event source</strong></a></p>
|
|
|
|
<p class="Section1">An agent (human or software) capable of
|
|
generating events.</p>
|
|
|
|
<p class="Section1"><a id="executionmodel"
|
|
name="executionmodel"><strong>execution model</strong></a></p>
|
|
|
|
<p class="Section1">Runtime configuration of the various system
|
|
components in a particular manifestation of a multimodal
|
|
system.</p>
|
|
|
|
<p class="Section1"><a id="externalevent"
|
|
name="externalevent"><strong>external event</strong></a></p>
|
|
|
|
<p class="Section1">External input events are events that are not
|
|
originating from direct user input. External output events are
|
|
events that originate in the multimodal system and are handled by
|
|
other processes.</p>
|
|
|
|
<p class="Section1"><a id="GPS" name="GPS"><strong>GPS [Global
|
|
Positioning System]</strong></a></p>
|
|
|
|
<p class="Section1">A worldwide radio-navigation system formed from
|
|
a constellation of 24 satellites and their ground stations. GPS
|
|
uses these "man-made stars" as reference points to calculate
|
|
positions accurate to a matter of meters.</p>
|
|
|
|
<p class="Section1"><strong>grammar</strong></p>
|
|
|
|
<p class="Section1">A computational mechanism that defines a finite
|
|
or infinite set of legal strings, usually with some structure.</p>
|
|
|
|
<p class="Section1"><a id="handwriting"
|
|
name="handwriting"><strong>handwriting</strong></a></p>
|
|
|
|
<p class="Section1">use of the pen for input which is converted
|
|
into text or symbols. Involves handwriting recognition.</p>
|
|
|
|
<p class="Section1"><a id="history"
|
|
name="history"><strong>history</strong></a></p>
|
|
|
|
<p class="Section1">Portions of profile and session context
|
|
persisted for a same user across sessions.</p>
|
|
|
|
<p class="Section1"><strong>HTML [HyperText Markup
|
|
Language]</strong></p>
|
|
|
|
<p class="Section1">A simple markup language used to create
|
|
hypertext documents that are portable from one platform to another.
|
|
To find more information about specification of HTML and the
|
|
working group acitivity please visit <a
|
|
href="http://www.w3c.org/MarkUp/">http://www.w3c.org/MarkUp/</a></p>
|
|
|
|
<p class="Section1"><strong>HTTP [Hypertext Transfer
|
|
Protocol]</strong></p>
|
|
|
|
<p class="Section1">To get details about the HTTP working group and
|
|
the HTTP specification please visit <a
|
|
href="http://www.w3c.org/Protocols/">http://www.w3c.org/Protocols/</a>.</p>
|
|
|
|
<p class="Section1"><strong><a id="humanlanguage"
|
|
name="humanlanguage">human language</a></strong></p>
|
|
|
|
<p class="Section1">Any spoken language (e.g. French, Japanese,
|
|
English etc...).</p>
|
|
|
|
<p class="Section1"><strong>ink</strong></p>
|
|
|
|
<p class="Section1">See digital ink.</p>
|
|
|
|
<p class="Section1"><a id="input"
|
|
name="input"><strong>input</strong></a></p>
|
|
|
|
<p class="Section1">Event, set of events or macro-event generated
|
|
by a user interaction in a particular modality on a particular
|
|
device.<!--StartFragment-->
|
|
</p>
|
|
|
|
<p><a id="inputconstraints" name="inputconstraints"><strong>input
|
|
constraints</strong></a></p>
|
|
|
|
<p>Specify how inputs are can be combined via rules or interaction
|
|
management strategies. For example the markup language may
|
|
coordinates grammars for modalities other than speech with speech
|
|
grammars to avoid duplication of effort in authoring multimodal
|
|
grammars.</p>
|
|
|
|
<p class="Section1"><a id="inputprocessing"
|
|
name="inputprocessing"><strong>input processing</strong></a></p>
|
|
|
|
<p class="Section1">Algorithm to apply to a particular input in
|
|
order to transform or extract information from it (e.g. filtering,
|
|
speech recognition; spaker recognition, NL parsing,...). The
|
|
algorithm may rely on data files as argument (e.g. grammar,
|
|
acoustic model, NL models, ...)</p>
|
|
|
|
<p class="Section1"><a id="dialogmanager"
|
|
name="dialogmanager"><strong>interaction manager</strong></a></p>
|
|
|
|
<p class="Section1">An interaction manager generates or updates the
|
|
presentation by processing user inputs, session context and
|
|
possibly other external knowledge sources to determine the intent
|
|
of the user. An interaction manager relies on strategies to
|
|
determine focus and intent as well as to disambiguate, correct and
|
|
confirm sub-dialogs. We typically distinguish <a
|
|
href="#directeddialog">directed dialogs</a> (e.g. user-driven or
|
|
application-driven) and <a href="#mixedinitiative">mixed
|
|
initiative</a> or free flow dialogs.</p>
|
|
|
|
<p class="Section1"><a id="lipsynch"
|
|
name="lipsynch"><strong>lipsynch</strong></a></p>
|
|
|
|
<p class="Section1">Output media where at least a face has lip
|
|
movements synchronized with an output audio speech</p>
<p class="Section1"><strong>markup components</strong></p>

<p class="Section1">XML vocabularies that provide markup-level
access to various system components.</p>

<p class="Section1"><a id="mediasynch"
name="mediasynch"><strong>media synchronization</strong></a></p>

<p class="Section1">Synchronization between output media as
specified by SMIL: <a
href="http://www.w3.org/AudioVideo/">http://www.w3.org/AudioVideo/</a></p>

<p class="Section1"><strong>medium</strong></p>

<p class="Section1">A description that can be rendered into
physical effects that can be perceived and interacted with by the
user, in one or multiple modalities and on one or multiple
devices.</p>

<p class="Section1"><strong>MIDI</strong></p>

<p class="Section1">Musical Instrument Digital Interface, an audio
format.</p>
<p class="Section1"><a id="mixedinitiative"
name="mixedinitiative"><strong>mixed initiative
dialog</strong></a></p>

<p>A style of dialog where both parties (the computer and the user)
can control what is talked about and when. A party may on its own
change the course of the interaction (e.g., by asking questions,
providing more or less information than what was requested or
making digressions). Mixed initiative dialog is contrasted with
directed dialog, where only one party controls the conversation.
(cf. directed dialog)</p>

<p class="Section1"><strong>MMI: [Multimodal
Interaction]</strong></p>

<p class="Section1">A W3C Working Group which is developing markup
specifications that extend the Web user interface to allow
multiple modes of interaction. For more details of the MMI working
group and MMI activity, please visit <a
href="http://www.w3c.org/2002/mmi/">http://www.w3c.org/2002/mmi/</a></p>
<p class="Section1"><a id="modality"
name="modality"><strong>modality</strong></a></p>

<p>The type of communication channel used for interaction. It also
covers the way an idea is expressed or perceived, or the manner in
which an action is performed.</p>

<p class="Section1"><a id="modalityswitch"
name="modalityswitch"><strong>modality switch</strong></a></p>

<p class="Section1">Change of modality to perform a particular
interaction. It can be decided by the user or imposed by the
application or runtime (e.g. when a phone call drops).</p>

<p class="Section1"><strong>MPEG</strong></p>

<p class="Section1">Working group established under the joint
direction of the International Organization for Standardization and
the International Electrotechnical Commission (ISO/IEC), whose goal
is to create standards for digital video and audio compression.
More precisely, MPEG defines the syntax of audio and video formats
requiring low data rates, as well as the operations to be
undertaken by decoders.</p>
<p class="Section1"><strong>MP3 [MPEG Audio Layer-3]</strong></p>

<p class="Section1">An Internet music format. For MP3-related
technologies please refer to <a
href="http://www.mp3-tech.org/">http://www.mp3-tech.org/</a></p>

<p class="Section1"><a id="multimodalsystem"
name="multimodalsystem"><strong>multimodal system</strong></a></p>

<p class="Section1">A multimodal system supports communication with
the user through different modalities such as voice, gesture, and
typing. (cf. modality)</p>

<p class="Section1"><strong>must specify</strong></p>

<p class="Section1">A "must specify" requirement must be satisfied by
the multimodal specification(s), starting from their very first
version.</p>

<p class="Section1"><strong>natural language (NL)</strong></p>

<p class="Section1">Term used for human language, as opposed to
artificial languages (such as computer programming languages or
those based on mathematical logic). A processor capable of handling
NL must typically be able to deal with a flexible set of
sentences.</p>
<p class="Section1"><a id="NLG" name="NLG"><strong>natural language
generation (NLG)</strong></a></p>

<p class="Section1">A technique for generating natural language
sentences based on some higher-level information. Generation by
template is an example of a simple language generation technique.
"The flight from &lt;departure-city&gt; to &lt;arrival-city&gt;
leaves at &lt;departure-time&gt;" is an example of a template where
the slots indicated by &lt;...&gt; have to be filled
with the appropriate information by a higher-level process.</p>
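<p class="Section1">A minimal sketch of template-based generation,
using the flight template above. The element names ("template",
"slot") are purely illustrative and not part of any W3C
specification:</p>

<pre class="example">
&lt;!-- Hypothetical template markup; a higher-level process fills
     each slot before rendering the sentence --&gt;
&lt;template&gt;
  The flight from &lt;slot name="departure-city"/&gt;
  to &lt;slot name="arrival-city"/&gt;
  leaves at &lt;slot name="departure-time"/&gt;.
&lt;/template&gt;
</pre>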
<p class="Section1"><strong>natural language
processing</strong></p>

<p class="Section1">Natural language understanding, generation,
translation and other transformations on human language.</p>

<p class="Section1"><strong>natural language understanding
(NLU)</strong></p>

<p class="Section1">The process of interpreting natural language
phrases to specify their meaning, typically as a formula in formal
logic.</p>

<p class="Section1"><strong>nice to specify</strong></p>

<p class="Section1">A "nice to specify" requirement will be taken
into account when designing the specification. If a technical
solution is available, the specifications will try to satisfy the
requirement or support the feature, provided that it does not
excessively delay the work plan.</p>

<p class="Section1"><a id="notify"
name="notify"><strong>notify</strong></a></p>

<p class="Section1">The act of communicating an event (see
subscribe).</p>
<p class="Section1"><strong>override mechanism for
synchronization</strong></p>

<p class="Section1">Information that specifies how the
synchronization should behave when not following its default
behavior. (cf. default synchronization)</p>

<p class="Section1"><strong>output generation</strong></p>

<p class="Section1">Expressing information to be conveyed in a
user-friendly form, possibly using multiple output media
streams.</p>

<p class="Section1"><a id="outputprocessing"
name="outputprocessing"><strong>output processing</strong></a></p>

<p class="Section1">Algorithm to apply in order to transform or
generate an output (e.g. TTS, NLG).</p>

<p class="Section1"><strong>semantics</strong></p>

<p class="Section1">The meaning or interpretation of a word,
phrase, or sentence, as opposed to its syntactic form. In natural
language and dialog technology the term semantics is typically used
to indicate a representation of a phrase or a sentence whose
elements can be related to entities of the application (e.g.
departure airport and arrival time for a flight application), or to
dialog acts (e.g. request for help, repeat, etc.).</p>
<p class="Section1"><a id="semanticinterp"
name="semanticinterp"><strong>semantic
interpretation</strong></a></p>

<p class="Section1">The process of interpreting the semantic part
of a grammar. The result of the interpretation is a semantic
representation. This process is often referred to as semantic
tagging.</p>

<p class="Section1"><a id="semanticrep"
name="semanticrep"><strong>semantic representation</strong></a></p>

<p class="Section1">The semantic result of parsing a written
sentence or a spoken utterance. The semantic interpretation can be
expressed as attribute-value pairs or more complex structures. W3C
is working on the definition of a semantic representation
formalism.</p>
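<p class="Section1">As an illustration only, the utterance "I want
to fly from Paris to New York" might yield the following
attribute-value pairs; the element names are hypothetical and do
not anticipate the formalism under definition:</p>

<pre class="example">
&lt;!-- Hypothetical attribute-value representation of one utterance --&gt;
&lt;interpretation&gt;
  &lt;departure-city&gt;Paris&lt;/departure-city&gt;
  &lt;arrival-city&gt;New York&lt;/arrival-city&gt;
&lt;/interpretation&gt;
</pre>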
<p class="Section1"><a id="sequentialinput"
name="sequentialinput"><strong>sequential inputs</strong></a></p>

<p class="Section1">A sequential input is one received on a single
modality. The modality may change over time. (cf. <a
href="#simultaneousinput">simultaneous</a> or <a
href="#compositeinput">composite</a> input)</p>

<p class="Section1"><a id="sequentialmm"
name="sequentialmm"><strong>sequential
multimodality</strong></a></p>

<p class="Section1">A sequential multimodal application is one in
which the user may interact with the application in only one
modality at a time, <a href="#modalityswitch">switching</a> between
modalities as needed.</p>

<p class="Section1"><a id="session"
name="session"><strong>session</strong></a></p>

<p class="Section1">The time interval during which an application
and its context is associated with a user and persisted.
Within a session, users may suspend and resume interaction with an
application within the same modality or device, or switch modality
or device.</p>
<p class="Section1"><strong>session level synchronization
granularity</strong></p>

<p class="Section1">Multimodal application that supports suspend
and resume behavior across modalities.</p>

<p class="Section1"><strong>should specify</strong></p>

<p class="Section1">The specifications (multimodal markup language
and other) will aim at addressing and satisfying the requirement or
supporting the feature during the lifetime of the working group.
Early specifications will take this into account to allow easy and
interoperable updates.</p>
<p class="Section1"><a id="simultaneousinput"
name="simultaneousinput"><strong>simultaneous
inputs</strong></a></p>

<p class="Section1">Simultaneous inputs denote inputs that can come
from different modalities but are not combined into composite
inputs. Simultaneous multimodal inputs imply that the inputs from
several modalities are interpreted one after the other, in the
order in which they were received, instead of being combined before
interpretation.</p>

<p class="Section1"><strong><a id="situation"
name="situation">situation</a></strong></p>

<p class="Section1">External information that can affect the usage
or expected behavior of multimodal applications, including for
example ongoing activities (e.g. walking versus driving),
environment (e.g. noisy), privacy (e.g. alone versus in public),
etc.</p>

<p class="Section1"><a id="SMIL" name="SMIL"><strong>SMIL
[Synchronized Multimedia Integration Language]</strong></a></p>

<p class="Section1">A W3C Recommendation, SMIL 2.0 enables simple
authoring of interactive audiovisual applications. See <a
href="http://www.w3.org/TR/smil20/">http://www.w3.org/TR/smil20/</a>
for details.</p>
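<p class="Section1">A minimal SMIL 2.0 sketch, assuming media files
"welcome.wav" and "map.png" exist: the par container renders its
children in parallel, which is the kind of output media
synchronization referred to above.</p>

<pre class="example">
&lt;smil xmlns="http://www.w3.org/2001/SMIL20/Language"&gt;
  &lt;body&gt;
    &lt;par&gt;
      &lt;!-- both children are rendered at the same time --&gt;
      &lt;audio src="welcome.wav"/&gt;
      &lt;img src="map.png" dur="5s"/&gt;
    &lt;/par&gt;
  &lt;/body&gt;
&lt;/smil&gt;
</pre>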
<p class="Section1"><strong>speech recognition</strong></p>

<p class="Section1">The ability of a computer to understand the
spoken word for the purpose of receiving commands and data input
from the speaker.</p>

<p class="Section1"><a id="speechrecognitionengine"
name="speechrecognitionengine"><strong>speech-recognition
engine</strong></a></p>

<p class="Section1">A software/hardware component that performs
recognition from a digital-audio stream. Speech recognition engines
are supplied by vendors who specialize in the software.</p>

<p class="Section1"><a id="subscribe"
name="subscribe"><strong>subscribe</strong></a></p>

<p class="Section1">The act of informing an event source that you
want to be notified of some class of events.</p>
<p><a id="supplementarymm"
name="supplementarymm"><strong>supplementary use of
modalities</strong></a></p>

<p class="Section1">Describes multimodal applications in which
every interaction (input or output) can be carried through in each
modality as if it were the only available modality.</p>

<p class="Section1"><a id="suspendresume"
name="suspendresume"><strong>suspend and resume</strong></a></p>

<p class="Section1">Suspend and resume behavior: an application
suspended in one modality can be resumed in the same or another
modality.</p>

<p class="Section1"><a id="synchronizationbehavior"
name="synchronizationbehavior"><strong>synchronization
behavior</strong></a></p>

<p class="Section1">The way that an input in one modality is
reflected in the output in another modality/device, as well as the
way that it may be combined across modalities (<a
href="#coordinationcapability">coordination capability</a>).</p>
<p class="Section1"><a id="synchronizationlevel"
name="synchronizationlevel"><strong>synchronization granularity or
level</strong></a></p>

<ul>
<li><strong>Event-level synchronization</strong>: Inputs in one
modality are captured at the level of individual DOM events and
immediately reflected in the other modality, when it makes
sense.</li>

<li><strong>Field-level synchronization</strong>: Inputs in one
modality are reflected in the other after the user changes focus
(e.g. moves from input field to input field) or completes the
interaction with a field (e.g. completes a select in a menu), as
illustrated in the sketch after this list.</li>

<li><strong>Form-level synchronization</strong>: Inputs in one
modality are reflected in the other only after a particular point
in the presentation is reached (e.g. after a certain number of
fields have been completed in the form).</li>

<li><strong>Session-level synchronization</strong>: Inputs in one
modality are reflected in the other only after a switch from one
modality to another.</li>
</ul>
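<p class="Section1">A hypothetical authoring sketch for the
field-level case above; the "sync" attribute and the element names
are illustrative only and not defined by any W3C
specification:</p>

<pre class="example">
&lt;!-- Hypothetical markup: the voice and GUI views of this form are
     brought back in step each time a field is completed --&gt;
&lt;form id="payment" sync="field"&gt;
  &lt;field name="amount"/&gt;
  &lt;field name="date"/&gt;
&lt;/form&gt;
</pre>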
<p class="Section1"><a id="synthesis"
name="synthesis"><strong>synthesis</strong></a></p>

<p class="Section1">The text-to-speech engine synthesizes the
glottal pulse from human vocal cords and applies various filters to
simulate throat length, mouth cavity, lip shape, and tongue
position.</p>

<p class="Section1"><a id="TTS"
name="TTS"><strong>text-to-speech</strong></a></p>

<p class="Section1">Technologies for converting textual (ASCII)
information into synthetic speech output. Used in voice-processing
applications requiring production of broad, unrelated, and
unpredictable vocabularies, such as products in a catalog or names
and addresses. This technology is appropriate when system design
constraints prevent the more efficient use of speech concatenation
alone.</p>

<p class="Section1"><a id="timestamp" name="timestamp"><strong>time
stamping</strong></a></p>

<p class="Section1">Annotation of an event that characterizes the
relative (with respect to an agreed-upon reference) or absolute
time of occurrence of the event.</p>
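<p class="Section1">For illustration only, a time-stamped input
event might carry absolute start and end times as attributes; the
element and attribute names below are hypothetical:</p>

<pre class="example">
&lt;!-- Hypothetical annotation of a recognized speech input --&gt;
&lt;input modality="speech"
       start="2003-01-08T10:31:02.120Z"
       end="2003-01-08T10:31:03.450Z"&gt;flights to Boston&lt;/input&gt;
</pre>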
<p class="Section1"><strong>TTS</strong></p>

<p class="Section1">text-to-speech</p>

<p class="Section1"><strong>turn</strong></p>

<p class="Section1">Set of inputs collected from the user before
updating the output.</p>

<p class="Section1"><strong>URI</strong></p>

<p class="Section1">Uniform Resource Identifier - <a
href="http://www.w3.org/Addressing/">http://www.w3.org/Addressing/</a></p>

<p class="Section1"><a id="userprofile"
name="userprofile"><strong>user profile</strong></a></p>

<p class="Section1">A particular subset of the delivery context
that describes the user, including for example identity,
personal information, personal preferences and usage
preferences.</p>
<p class="Section1"><a id="XMLEvent" name="XMLEvent"><strong>XML
Events</strong></a></p>

<p class="Section1">An XML module that provides XML languages with
the ability to uniformly integrate event listeners and associated
event handlers with DOM Level 2 event interfaces. The result is an
interoperable way of associating behaviors with document-level
markup. For the XML Events specification please visit <a
href="http://www.w3.org/TR/2001/WD-xml-events-20011026/Overview.html#s_intro">
http://www.w3.org/TR/2001/WD-xml-events-20011026/Overview.html#s_intro</a></p>
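<p class="Section1">A minimal sketch based on the Working Draft
cited above: a declarative listener routes "click" events from an
observer element to a handler. The id values are illustrative:</p>

<pre class="example">
&lt;!-- attach the handler identified by #confirmHandler to clicks
     on the element with id "submitButton" --&gt;
&lt;listener xmlns="http://www.w3.org/2001/xml-events"
          event="click"
          observer="submitButton"
          handler="#confirmHandler"/&gt;
</pre>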
<p class="Section1"><strong>XSL</strong></p>

<p class="Section1">Extensible Stylesheet Language</p>

<p class="Section1"><strong>XSLT</strong></p>

<p class="Section1">Extensible Stylesheet Language
Transformations</p>
</body>
</html>