<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator"
    content="HTML Tidy for Linux/x86 (vers 1st April 2002), see www.w3.org" />
<title>Multimodal Interaction Requirements</title>

<style type="text/css">
/*<![CDATA[*/
.ref { font-size: 80% }
.requirement { background-color: #C1DBFF }
.example { background-color: #f0ffff }
.change { background-color: #ffe0c0 }
.ednote { font-size: 80%; color: green }
.ednote :link { color: teal }
.ednote :visited { color: olive }
.c1 { display: none }
table.smaller { font-size: 80%; margin-left: 0% }
/*]]>*/
</style>
<link href="http://www.w3.org/StyleSheets/TR/W3C-NOTE"
    type="text/css" rel="stylesheet" />
</head>
<body lang="en">
<div class="head">
<p><a href="http://www.w3.org/"><img height="48" alt="W3C"
    src="http://www.w3.org/Icons/w3c_home" width="72" /></a></p>

<h1 id="name">Multimodal Interaction Requirements</h1>

<h2>W3C NOTE 8 January 2003</h2>

<dl>
<dt>This version:</dt>

<dd><a href="http://www.w3.org/TR/2003/NOTE-mmi-reqs-20030108/">http://www.w3.org/TR/2003/NOTE-mmi-reqs-20030108/</a></dd>

<dt>Latest version:</dt>

<dd><a href="http://www.w3.org/TR/mmi-reqs/">http://www.w3.org/TR/mmi-reqs/</a></dd>

<dt>Previous version:</dt>

<dd><i>this is the first publication</i></dd>

<dt>Editors:</dt>

<dd>Stéphane H. Maes, Oracle Corporation <a href="mailto:stephane.maes@oracle.com">&lt;stephane.maes@oracle.com&gt;</a></dd>

<dd>Vijay Saraswat, Penn State University <a href="mailto:saraswat@cse.psu.edu">&lt;saraswat@cse.psu.edu&gt;</a></dd>

<dt>Contributors:</dt>

<dd>See <a href="#Acknowledgments">Acknowledgements</a></dd>
</dl>

<p class="copyright"><a href="http://www.w3.org/Consortium/Legal/ipr-notice#Copyright">Copyright</a> © 2003 <a href="http://www.w3.org/"><acronym title="World Wide Web Consortium">W3C</acronym></a><sup>®</sup> (<a href="http://www.lcs.mit.edu/"><acronym title="Massachusetts Institute of Technology">MIT</acronym></a>, <a href="http://www.ercim.org/"><acronym title="European Research Consortium for Informatics and Mathematics">ERCIM</acronym></a>, <a href="http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. W3C <a href="http://www.w3.org/Consortium/Legal/ipr-notice#Legal_Disclaimer">liability</a>, <a href="http://www.w3.org/Consortium/Legal/ipr-notice#W3C_Trademarks">trademark</a>, <a href="http://www.w3.org/Consortium/Legal/copyright-documents">document use</a> and <a href="http://www.w3.org/Consortium/Legal/copyright-software">software licensing</a> rules apply.</p>
</div>

<hr title="Separator from Header" />
<h2 id="abstract">Abstract</h2>
|
|
|
|
<p>This document describes fundamental requirements for the
|
|
specifications under development in the W3C <a
|
|
href="http://www.w3.org/2002/mmi/">Multimodal Interaction
|
|
Activity</a>. These requirements were derived from use case studies
|
|
as discussed in <a href="#Appendixa">Appendix A</a>. They have been
|
|
developed for use by the <a
|
|
href="http://www.w3.org/2002/mmi/Group/">Multimodal Interaction
|
|
Working Group</a> (<a
|
|
href="http://cgi.w3.org/MemberAccess/AccessRequest">W3C Members
|
|
only</a>), but may also be relevant to other W3C working groups and
|
|
related external standard activities.</p>
|
|
|
|
<p>The requirements cover general issues, inputs, outputs,
|
|
architecture, integration, synchronization points, runtimes and
|
|
deployments, but this document does not address application or
|
|
deployment conformance rules.</p>
|
|
|
|
<h2 id="statusofthisdocument">Status of this Document</h2>
|
|
|
|
<p><em>This section describes the status of this document at the
|
|
time of its publication. Other documents may supersede this
|
|
document. The latest status of this document series is maintained
|
|
at the <abbr
|
|
title="the World Wide Web Consortium">W3C</abbr>.</em></p>
|
|
|
|
<p>W3C's <a href="http://www.w3.org/2002/mmi/">Multimodal
|
|
Interaction Activity</a> is developing specifications for extending
|
|
the Web to support multiple modes of interaction. This document
|
|
describes fundamental requirements for multimodal interaction.</p>
|
|
|
|
<p>This document has been produced as part of the <a
|
|
href="http://www.w3.org/2002/mmi/">W3C Multimodal Interaction
|
|
Activity</a>,<span class="c1"><a
|
|
href="http://www.w3.org/2002/mmi/Activity.html"></a></span>
|
|
following the procedures set out for the <a
|
|
href="http://www.w3.org/Consortium/Process/">W3C Process</a>. The
|
|
authors of this document are members of the <a
|
|
href="http://www.w3.org/2002/mmi/Group/">Multimodal Interaction
|
|
Working Group</a> (<a
|
|
href="http://cgi.w3.org/MemberAccess/AccessRequest">W3C Members
|
|
only</a>). This is a Royalty Free Working Group, as described in
|
|
W3C's <a href="/TR/2002/NOTE-patent-practice-20020124">Current
|
|
Patent Practice</a> NOTE. Working Group participants are required
|
|
to provide <a href="http://www.w3.org/2002/01/mmi-ipr.html">patent
|
|
disclosures</a>.</p>
|
|
|
|
<p>Please send comments about this document to the public mailing
|
|
list: <a
|
|
href="mailto:www-multimodal@w3.org">www-multimodal@w3.org</a> (<a
|
|
href="http://lists.w3.org/Archives/Public/www-multimodal/">public
|
|
archives</a>). To subscribe, send an email to <<a
|
|
href="mailto:www-multimodal-request@w3.org">www-multimodal-request@w3.org</a>>
|
|
with the word <em>subscribe</em> in the subject line (include the
|
|
word <em>unsubscribe</em> if you want to unsubscribe).</p>
|
|
|
|
<p>A list of current W3C Recommendations and other technical
|
|
documents including Working Drafts and Notes can be found at <a
|
|
href="http://www.w3.org/TR/">http://www.w3.org/TR/</a>.</p>
|
|
|
|
<h2 class="notoc" id="tableofcontent">Table of contents</h2>
|
|
|
|
<ul class="toc">
|
|
<li><a href="#abstract">Abstract</a></li>
|
|
|
|
<li><a href="#statusofthisdocument">Status of this
|
|
document</a></li>
|
|
|
|
<li><a href="#tableofcontent">Table of contents</a></li>
|
|
|
|
<li><a href="#introduction">Introduction</a></li>
|
|
|
|
<li><a href="#MMIframework">Multimodal interaction</a></li>
|
|
|
|
<li><a href="#generalrequirements">1. General Requirements</a>
|
|
<ul class="toc">
|
|
<li><a href="#Scalabilityacrosswiderangeofdevicescapabilities">1.1
|
|
Scalability across wide range of device capabilities</a></li>
|
|
|
|
<li><a
|
|
href="#Supplementaryandcomplementaryuseofdifferentmodalities">1.2
|
|
Supplementary and complementary use of different
|
|
modalities</a></li>
|
|
|
|
<li><a href="#Seamlesssynchronizationofmodalities">1.3 Seamless
|
|
synchronization of modalities</a></li>
|
|
|
|
<li><a href="#Multilingualsupport">1.4 Multilingual
|
|
support</a></li>
|
|
|
|
<li><a href="#Easytoimplement">1.5 Easy to implement</a></li>
|
|
|
|
<li><a href="#accessibility">1.6 Accessibility</a></li>
|
|
|
|
<li><a href="#Securityandprivacy">1.7 Security and privacy</a></li>
|
|
|
|
<li><a href="#Deliveryandcontext">1.8 Delivery and context</a></li>
|
|
|
|
<li><a href="#Navigationspecification">1.9 Navigation
|
|
specification</a></li>
|
|
</ul>
|
|
</li>
|
|
|
|
<li><a href="#Inputmodalityrequirements">2. Input Modality
|
|
Requirements</a>
|
|
<ul class="toc">
|
|
<li><a href="#Inputprocessing">2.1 Input processing</a></li>
|
|
|
|
<li><a href="#Sequentialmultimodalinput">2.2 Sequential multimodal
|
|
input</a></li>
|
|
|
|
<li><a href="#Simultaneousmultimodalinput">2.3 Simultaneous
|
|
multimodal input</a></li>
|
|
|
|
<li><a href="#Compositemultimodalinput">2.4 Composite multimodal
|
|
input</a></li>
|
|
|
|
<li><a href="#Inputmodessupported">2.5 Input modes supported</a>
|
|
<ul class="toc">
|
|
<li><a href="#InputMUSTspecify">2.5.1 MUST specify</a></li>
|
|
|
|
<li><a href="#InputNICEtospecify">2.5.2 NICE to specify</a></li>
|
|
|
|
<li><a href="#InputExtensibility">2.5.3 Extensibility</a></li>
|
|
</ul>
|
|
</li>
|
|
|
|
<li><a href="#SemanticsofinputgeneratedbyUIcomponents">2.6
|
|
Semantics of input generated by UI components</a></li>
|
|
|
|
<li><a href="#Coordinatedconstraintsandinterpretations">2.7
|
|
Coordinated constraints</a></li>
|
|
|
|
<li><a
|
|
href="#Supportforconflictinginputfromdifferentmodalities">2.8
|
|
Support for conflicting input from different modalities</a></li>
|
|
|
|
<li><a href="#Temporalpositioningofevents">2.9 Temporal positioning
|
|
of input events</a></li>
|
|
</ul>
|
|
</li>
|
|
|
|
<li><a href="#Outputmediarequirements">3. Output Media
|
|
Requirements</a>
|
|
<ul class="toc">
|
|
<li><a href="#Sequentialmediaoutput">3.1 Sequential media
|
|
output</a></li>
|
|
|
|
<li><a href="#Simultaneousmediaoutput">3.2. Simultaneous media
|
|
output</a></li>
|
|
|
|
<li><a href="#Supportedoutputmedias">3.3 Supported output
|
|
medias</a>
|
|
<ul class="toc">
|
|
<li><a href="#outputMUSTspecify">3.3.1 MUST specify</a></li>
|
|
|
|
<li><a href="#outputNicetospecify">3.3.2. Nice to specify</a></li>
|
|
|
|
<li><a href="#outputExtensibility">3.3.3. Extensibility</a></li>
|
|
</ul>
|
|
</li>
|
|
|
|
<li><a href="#Outputprocessing">3.4 Output processing</a></li>
|
|
</ul>
|
|
</li>
|
|
|
|
<li><a
|
|
href="#Architectureintegrationandsynchronizationpoints">4.Architecture,
|
|
integration and synchronization points</a>
|
|
<ul class="toc">
|
|
<li><a href="#Reusestandardmarkuplanguages">4.1 Reuse standard
|
|
markup languages</a></li>
|
|
|
|
<li><a href="#XHTMLmodularization">4.2 XHTML
|
|
Modularization</a></li>
|
|
|
|
<li><a href="#CompatibilitywithXForms">4.3 Separation of data
|
|
model, presentation layer and application logic</a></li>
|
|
|
|
<li><a href="#Detectionofavailablemodalities">4.4 Detection of
|
|
available modalities and changes</a></li>
|
|
|
|
<li><a href="#Synchronizationgranularities">4.5 Synchronization
|
|
granularities</a></li>
|
|
|
|
<li><a href="#Independentinputandoutput">4.6 Independent input and
|
|
output interfaces even in a same modality</a></li>
|
|
|
|
<li><a href="#Distributedsynchronization">4.7 Distributed
|
|
synchronization</a></li>
|
|
|
|
<li><a href="#Distributedprocessing">4.8 Distributed
|
|
processing</a></li>
|
|
|
|
<li><a href="#Externalinput">4.9 External input and output</a></li>
|
|
|
|
<li><a href="#Temporalpositioningofinputandoutputevents">4.10
|
|
Temporal positioning of input and output events</a></li>
|
|
</ul>
|
|
</li>
|
|
|
|
<li><a href="#Runtimesanddeployments">5. Runtimes and
|
|
deployments</a>
|
|
<ul class="toc">
|
|
<li><a href="#Configurations">5.1 Configurations</a></li>
|
|
|
|
<li><a href="#Mobiledeployments">5.2 Mobile deployments</a></li>
|
|
|
|
<li><a href="#EMMA">5.3 EMMA</a></li>
|
|
|
|
<li><a href="#Multimodalsynchronizationexchanges">5.4 Multimodal
|
|
synchronization exchanges</a></li>
|
|
</ul>
|
|
</li>
|
|
|
|
<li><a href="#References">6. References</a></li>
|
|
|
|
<li><a href="#Acknowledgments">7. Acknowledgements</a></li>
|
|
|
|
<li><a href="#Appendices">Appendices</a>
|
|
<ul class="toc">
|
|
<li><a href="#Appendixa">Appendix A: Use cases</a>
|
|
<ul class="toc">
|
|
<li><a href="#Overviewoftheusecases">A.1 Overview of the use
|
|
cases</a></li>
|
|
|
|
<li><a href="#appendixa2analysis">A.2 Event analysis</a></li>
|
|
</ul>
|
|
</li>
|
|
|
|
<li><a href="#Appendixb">Appendix B: Glossary</a></li>
|
|
</ul>
|
|
</li>
|
|
</ul>
|
|
|
|
<h2 id="introduction">Introduction</h2>
|
|
|
|
<p>Multimodal interactions extend the Web user interface to allow
|
|
multiple modes of interaction, offering users the choice of using
|
|
their voice, or an input device such as a key pad, keyboard, mouse
|
|
or stylus. For output, users will be able to listen to spoken
|
|
prompts and audio, and to view information on graphical displays.
|
|
This capability for the user to specify the mode or device for a
|
|
particular interaction in a particular situation is expected to
|
|
significantly improve the user interface, its accessibility and
|
|
reliability, especially for mobile applications. The W3C Multimodal
|
|
Interaction Working Group (WG) is developing markup specifications
|
|
for authoring applications synchronized across multiple modalities
|
|
or devices with a wide range of capabilities.</p>
|
|
|
|
<p>This document is an internal working draft prepared as part of
|
|
the discussions on multimodal interaction requirements for
|
|
multimodal interaction specifications.</p>
|
|
|
|
<p>The work on the present requirement document started from the
|
|
<em>multimodal requirements for voice markup languages public
|
|
working draft (version 1.0)</em> published by the W3C Voice
|
|
activity <a href="#MMReqVoice">[MM Req Voice]</a>. The outline of
|
|
the document remains very similar.</p>
|
|
|
|
<p>The present requirements scope the nature of the work and
|
|
specifications that will be developed by the W3C Multimodal
|
|
Interaction Working Group (as specified by the charter <a
|
|
href="#MMIcharter">[MMI Charter]</a>). These intended works may be
|
|
referred to below as "specification(s)".</p>
|
|
|
|
<p>The requirements in this document do not express conformance
|
|
rules on application, platform runtime implementation or
|
|
deployment.</p>
|
|
|
|
<p>In this document, the following conventions have been followed
|
|
when phrasing the requirements:</p>
|
|
|
|
<ul>
|
|
<li>"MUST specify": The specifications will address and satisfy the
|
|
requirement or supporting the features, starting from the first
|
|
version.</li>
|
|
|
|
<li>"SHOULD specify": The specifications will aim at addressing and
|
|
satisfying the requirement or supporting the features during the
|
|
lifetime of the working group. Early specifications will take this
|
|
into account to allow easy and interoperable updates.</li>
|
|
|
|
<li>"NICE to specify": The specifications will be designed with the
|
|
requirement or feature taken into account. If a technical solution
|
|
is available, the specifications will try to satisfy the
|
|
requirement or support the feature, provided that it does not
|
|
excessively delay the work plan.</li>
|
|
</ul>
|
|
|
|
<p>It is not required that a particular specification produced by
|
|
the W3C MMI working group addresses <em>all</em> the requirements
|
|
in this document. It is possible that the requirements be addressed
|
|
by different specifications and that all the "MUST specify"
|
|
requirement are only satisfied by combining the different
|
|
specifications produced by the W3C Multimodal Interaction Working
|
|
Group. However, in such a case, it should be possible to clearly
|
|
indicate which specification will address what requirements.</p>
|
|
|
|
<h2 id="MMIframework">Multimodal interactions</h2>
|
|
|
|
<p>To lay the groundwork for the technical requirements, we first
|
|
discuss an intended frame of reference for a multimodal system,
|
|
introducing various concepts and terms that will be referred to in
|
|
the normative sections below. For the reader's convenience, we have
|
|
collected the concepts and terms introduced in this frame of
|
|
reference in the <a href="#Appendixb">glossary</a>.</p>
|
|
|
|
<p>We are interested in defining the requirements for the design of
|
|
<a href="#multimodalsystem">multimodal systems</a> -- systems that
|
|
support a user communicating with an application by using different
|
|
<a href="#modality">modalities</a> such as voice (in a <a
|
|
href="#humanlanguage">human language</a>), gesture, <a
|
|
href="#handwriting">handwriting</a>, typing, <a
|
|
href="#audiovisualspeech">audio-visual speech</a>, etc. The user
|
|
may be considered to be operating in a <a
|
|
href="#deliverycontext">delivery context</a>: a term used to
|
|
specify the set of attributes that characterizes the capabilities
|
|
of the access mechanism in terms of <a href="#deviceprofile">device
|
|
profile</a>, <a href="#userprofile">user profile</a> (e.g.
|
|
identify, preferences and usage patterns) and <a
|
|
href="#situation">situation</a>. The user interacts with the
|
|
application in the context of a <a href="#session">session</a>,
|
|
using one or more modalities (which may be realized through one or
|
|
more devices). Within a session, the user may <a
|
|
href="#suspendresume">suspend and resume</a> interaction with the
|
|
application within the same modality or <a
|
|
href="#modalityswitch">switch</a> modalities. A session is
|
|
associated with a <a href="#sessioncontext">context</a>, which
|
|
records the interactions with the user.</p>
|
|
|
|
<p>In multimodal systems, an <a href="#event">event</a> is a
|
|
representation of some asynchronous occurrence of interest to the
|
|
multimodal system. Examples include mouse clicks, hanging up the
|
|
phone, speech recognition results or errors. Events may be
|
|
associated with information about the user interaction e.g. the
|
|
location the mouse was clicked. A typical event source is a user,
|
|
such events are called <a href="#input">input events</a>. An <a
|
|
href="#externalevent">external input event</a> is one not generated
|
|
by a user, e.g. a <a href="#GPS">GPS</a> signal. The multimodal
|
|
system may also produce <a href="#externalevent">external output
|
|
events</a> for external systems (e.g. a logging system). In order
|
|
to preserve temporal ordering, events may be <a
|
|
href="#timestamp">time stamped</a>. Typically, events are
|
|
formalized as generated by <a href="#eventsource">event
|
|
sources</a>, and associated with <a href="#eventhandler">event
|
|
handlers</a>, which <a href="#subscribe">subscribe</a> to the
|
|
event, and are <a href="#notify">notified</a> of its occurrence.
|
|
This is exemplified by the <a href="#XMLEvent">XML Event</a>
|
|
model.</p>
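<p>As an illustration of this pattern, the following sketch, loosely based on the XML Events working draft, attaches a handler to a click event; the observer and handler identifiers are hypothetical and are used purely for illustration:</p>

<pre class="example">
&lt;html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ev="http://www.w3.org/2001/xml-events"&gt;
  ...
  &lt;!-- the handler subscribed to "click" events on the "map" element
       is notified whenever such an event occurs --&gt;
  &lt;ev:listener event="click" observer="map" handler="#showCityInfo"/&gt;
  ...
&lt;/html&gt;
</pre>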
<p>The user typically provides input in one or more modalities, and receives output in one or more modalities. Input may be classified as <a href="#sequentialinput">sequential</a>, <a href="#simultaneousinput">simultaneous</a> or <a href="#compositeinput">composite</a>. Sequential input is input received on a single modality, though that modality can change over time. Simultaneous input is input received on multiple modalities, and treated separately by downstream processes (such as interpretation). Composite input is input received on multiple modalities at the same time and treated as a single, integrated "composite" input by downstream processes. Inputs are combined using the <a href="#coordinationcapability">coordination capability</a> of the multimodal system, typically driven by <a href="#inputconstraints">input constraints</a> or decided by the <a href="#dialogmanager">interaction manager</a>.</p>

<p>Input is typically subject to <a href="#inputprocessing">input processing</a>. For instance, speech input may be passed to a <a href="#speechrecognitionengine">speech recognition engine</a> (including, for instance, <a href="#semanticinterp">semantic interpretation</a>) in order to extract meaningful information (e.g. a <a href="#semanticrep">semantic representation</a>) for downstream processing. Note that simultaneous and composite input may be <a href="#conflictinginput">conflicting</a>, in that the interpretations of the input may not be consistent (e.g. the user says "yes" but clicks on "no").</p>

<p>Two fundamentally different uses of multimodality may be identified: <a href="#supplementarymm">supplementary multimodality</a>, and <a href="#complementarymm">complementary multimodality</a>. An application makes supplementary use of multimodality if it allows the user to carry every interaction (input or output) through to completion in each modality as if it were the only available modality. Such an application enables the user to select at each time the modality that is best suited to the nature of the interaction and the user's situation. Conversely, an application makes complementary use of multimodality if interactions in one modality are used to complement interactions in another. (For instance, the application may visually display several options in a form and aurally prompt the user "Choose the city to fly to".) Complementary use may help a particular class of users (e.g. those with dyslexia). Note that in an application supporting complementary use of different modalities each interaction may not be accessible separately in each modality. Therefore it may not be possible for the user to determine which modality to use. Instead, the document author may prescribe the modality (or modalities) to be used in a particular interaction.</p>

<p>The <a href="#synchronizationbehavior">synchronization behavior</a> of an application describes the way in which any input in one modality is reflected in the output in another modality, as well as the way input is combined across modalities (<a href="#coordinationcapability">coordination capability</a>). The <a href="#synchronizationlevel">synchronization granularity</a> specifies the level at which the application coordinates interactions. The application is said to exhibit <em>event-level synchronization</em> if user inputs in one modality are captured at the level of individual <a href="#DOM">DOM</a> events and immediately reflected in the other modality. The application exhibits <em>field-level synchronization</em> if inputs in one modality are reflected in the other after the user changes focus (e.g. moves from input field to input field) or completes the interaction (e.g. completes a selection in a menu). The application exhibits <em>form-level synchronization</em> if inputs in one modality are reflected in the other only after a particular point in the presentation is reached (e.g. after a certain number of fields have been completed in the form).</p>

<p>The output generated by a multimodal system can take various forms, e.g. audio (including spoken prompts and playback, e.g. using <a href="#NLG">natural language generation</a> and <a href="#TTS">text-to-speech (TTS)</a>, which <a href="#synthesis">synthesizes</a> audio), visual (e.g. XHTML or SVG markup rendered on displays), <a href="#lipsynch">lipsynch</a> (multimedia output in which there is a visual rendition of a face whose lip movements are synchronized with the audio), etc. Of relevance here is the W3C Recommendation <a href="#SMIL">SMIL 2.0</a>, which enables simple authoring of interactive audiovisual applications and supports <a href="#mediasynch">media synchronization</a>.</p>
<p>Interaction (input, output) between the user and the application may often be conceptualized as a series of dialogs, managed by an <a href="#dialogmanager">interaction manager</a>. A dialog is an interaction between the user and the application which involves <em>turn taking</em>. In each turn, the interaction manager (working on behalf of the application) collects input from the user, processes it (using the session context and possibly external knowledge sources), computes a response and updates the presentation for the user. An interaction manager generates or updates the presentation by processing user inputs, session context and possibly other external knowledge sources to determine the intent of the user. An interaction manager relies on strategies to determine focus and intent as well as to disambiguate, correct and confirm sub-dialogs. We typically distinguish <a href="#directeddialog">directed dialogs</a> (e.g. user-driven or application-driven) and <a href="#mixedinitiative">mixed initiative</a> or free flow dialogs.</p>

<p>The interaction manager may use (1) inputs from the user, (2) the session context, (3) external knowledge sources, and (4) disambiguation, correction, and confirmation sub-dialogs to determine the user's focus and intent. Based on the user's focus and intent, the interaction manager also (1) maintains the context and state of the application, (2) manages the composition of inputs and synchronization across modalities, (3) interfaces with business logic, and (4) produces output for presentation to the user. In some architectures, the interaction manager may have <a href="#distributedcomponents">distributed components</a>, utilizing an event based mechanism for coordination.</p>

<p>Finally, in this document, we use the term <a href="#executionmodel">configuration or execution model</a> to refer to the runtime structure of the various system components and their interconnection, in a particular manifestation of a multimodal system.</p>
<h2 id="generalrequirements">1. General Requirements</h2>
|
|
|
|
<h3 id="Scalabilityacrosswiderangeofdevicescapabilities">1.1
|
|
Scalability across wide range of device capabilities</h3>
|
|
|
|
<p>It is the intent of the WG to define specifications that apply
|
|
to a variety of multimodal capabilities and deployment
|
|
conditions.</p>
|
|
|
|
<p class="requirement"><strong><a id="MMI-G1"
|
|
name="MMI-G1">(MMI-G1)</a>:</strong> The multimodal specifications
|
|
MUST support authoring multimodal applications for a wide range of
|
|
multimodal capabilities (MUST specify).</p>
|
|
|
|
<p>The specifications should support different combinations of
|
|
input and output modalities, <a
|
|
href="#synchronizationlevel">synchronization granularity</a>, <a
|
|
href="#configuration">configurations</a> and <a
|
|
href="#device">devices</a>. Some aspects of this requirement are
|
|
elaborated in detail below. For instance, the range of <a
|
|
href="#synchronizationlevel">synchronization granularity</a> is
|
|
addressed by requirement <a href="#MMI-A6">MMI-A6</a>.</p>
|
|
|
|
<p>It is advantageous that the specifications allow the application
|
|
developer to author a single version of the application, instead of
|
|
multiple versions targeted at combinations of multimodal
|
|
capabilities.</p>
|
|
|
|
<p class="requirement"><strong><a id="MMI-G2"
|
|
name="MMI-G2">(MMI-G2)</a>:</strong> The multimodal specifications
|
|
SHOULD support authoring multimodal applications once for
|
|
deployment on difference devices with different multimodal
|
|
capabilities (NICE to specify).</p>
|
|
|
|
<p>The multimodal capabilities may differ based on available
|
|
modalities, presentation and interaction capability for each
|
|
modality (modality-specific delivery context), synchronization
|
|
granularity, available devices and their configurations
|
|
etc... They are to be captured in the delivery context
|
|
associated to the multimodal system.</p>
|
|
|
|
<h3 id="Supplementaryandcomplementaryuseofdifferentmodalities">1.2
|
|
Supplementary and complementary use of different modalities</h3>
|
|
|
|
<p class="requirement"><strong><a id="MMI-G3"
|
|
name="MMI-G3">(MMI-G3)</a>:</strong> The multimodal specifications
|
|
MUST support <a href="#supplementarymm">supplementary</a> use of
|
|
modalities (MUST specify).</p>
|
|
|
|
<p>Supplementary use of modalities in multimodal applications
|
|
significantly improves accessibility of the applications. The user
|
|
may select the modality best used to the nature of the interaction
|
|
and the context of use.</p>
|
|
|
|
<p>When supported by the runtime or prescribed by the author, it
|
|
may be possible for the user to combine modalities as discussed for
|
|
example in requirement <a href="#MMI-I7">MMI-I7</a> about composite
|
|
input.</p>
|
|
|
|
<p class="requirement"><strong><a id="MMI-G4"
|
|
name="MMI-G4">(MMI-G4)</a>:</strong> The multimodal specifications
|
|
MUST support <a href="#complementarymm">complementary</a> use of
|
|
modalities (MUST specify).</p>
|
|
|
|
<p>Authors of multimodal applications that rely on complementary
|
|
multimodality should pay special attention to the accessibility of
|
|
the application, for example by ensuring accessibility in each
|
|
modality or by providing supplementary alternatives.</p>
|
|
|
|
<h3 id="Seamlesssynchronizationofmodalities">1.3 Seamless
|
|
synchronization of modalities</h3>
|
|
|
|
<p class="requirement"><span class="requirement"><strong><a
|
|
id="MMI-G5" name="MMI-G5">(MMI-G5)</a>:</strong> The multimodal
|
|
specifications will be designed such that an author can write
|
|
applications where the <a
|
|
href="#synchronizationbehavior">synchronization</a> of the various
|
|
modalities is seamless from the user's point of view (MUST
|
|
specify).</span></p>
|
|
|
|
<p>To elaborate, an interaction event or an external event in one
|
|
modality results in a change in another; based on the <a
|
|
href="#synchronizationlevel">synchronization granularity</a>
|
|
supported by the application. See <a
|
|
href="#Synchronizationgranularities">section 4.5</a> for a
|
|
discussion of synchronization granularities.</p>
|
|
|
|
<p style="margin-top: 0; margin-bottom: 0">Seamlessness can
|
|
encompass multiple aspects:</p>
|
|
|
|
<ul>
|
|
<li>
|
|
<p style="margin-top: 0; margin-bottom: 0">Limited latency in the
|
|
synchronization behavior with respect to what is expected by the
|
|
user for the particular application and multimodal
|
|
capabilities.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p style="margin-top: 0; margin-bottom: 0">Predictable,
|
|
non-confusing multimodal behavior</p>
|
|
</li>
|
|
</ul>
|
|
|
|
<p>Expanding on the considerations made in <a
|
|
href="#Scalabilityacrosswiderangeofdevicescapabilities">section
|
|
1.1</a>, it is important to support authoring for any granularity
|
|
of synchronization covered in <a href="#MMI-A6">(MMI-A6)</a>:</p>
|
|
|
|
<p class="requirement"><span class="requirement"><strong><a
|
|
id="MMI-G6" name="MMI-G6">(MMI-G6)</a>:</strong> The multimodal
|
|
specifications MUST support authoring seamless synchronization of
|
|
various modalities for any any</span> <a
|
|
href="#synchronizationlevel">synchronization granularity</a> <span
|
|
class="requirement">and <a
|
|
href="#coordinationcapability">coordination capabilities</a> (MUST
|
|
specify).</span></p>
|
|
|
|
<p>Coordination is defined as the capability to combine multimodal
|
|
inputs into composite inputs based on an interpretation algorithm
|
|
that decides what makes sense to combine based on the context.
|
|
Composite inputs are further discussed in <a
|
|
href="#Compositemultimodalinput">section 2.4</a>. It is a notion
|
|
different from synchronization granularity described in <a
|
|
href="#Synchronizationgranularities">section 4.5</a>.</p>
|
|
|
|
<p>The following requirement is proposed in order to address the
|
|
combinatorial explosion of synchronization granularities that the
|
|
application developer must author for.</p>
|
|
|
|
<p class="requirement"><span class="requirement"><strong><a
|
|
id="MMI-G7" name="MMI-G7">(MMI-G7)</a>:</strong> The multimodal
|
|
specifications SHOULD support authoring seamless synchronization of
|
|
various modalities once for deployment across with a whole range
|
|
of</span> <a href="#synchronizationlevel">synchronization
|
|
granularity</a> <span class="requirement">or <a
|
|
href="#coordinationcapability">coordination capabilities</a> (NICE
|
|
to specify).</span></p>
|
|
|
|
<p>This requirement addresses the capability for the application
|
|
developer to write the application once for a particular
|
|
synchronization granularity or coordination capability and to have
|
|
the application able to adapt its synchronization behavior when
|
|
other levels are available.</p>
|
|
|
|
<h3 id="Multilingualsupport">1.4 Multilingual support</h3>
|
|
|
|
<p>Multimodal applications are not different from any other web
|
|
applications. It is important that the specifications be not
|
|
limited to specific languages. </p>
|
|
|
|
<p class="requirement"><span class="requirement"><strong><a
|
|
id="MMI-G8" name="MMI-G8">(MMI-G8)</a>:</strong> The multimodal
|
|
specifications MUST support authoring multimodal applications in
|
|
any <a href="#humanlanguage">human language</a> (MUST
|
|
specify).</span></p>
|
|
|
|
<p>In particular, it must be possible to apply conventional methods
|
|
for localization and internationalization of applications.</p>
|
|
|
|
<p class="requirement"><strong><a id="MMI-G9"
|
|
name="MMI-G9">(MMI-G9)</a><a>:</a></strong> The multimodal
|
|
specification MUST not preclude the capability to move multimodal
|
|
application from one <span class="requirement"><a
|
|
href="#humanlanguage">human language</a></span> to another, without
|
|
having to rewrite the whole application (MUST specify).</p>
|
|
|
|
<p>For example, it should be possible to encapsulate
|
|
language-specific items, separately encapsulated from the
|
|
language-independent description.</p>
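<p>A purely hypothetical illustration of such encapsulation, using the standard <code>xml:lang</code> attribute (the <code>prompt</code> element and its content are invented for this sketch and are not implied by the specifications):</p>

<pre class="example">
&lt;!-- language-independent logic refers to the prompt by id only --&gt;
&lt;prompt id="greeting" xml:lang="en"&gt;Welcome&lt;/prompt&gt;
&lt;prompt id="greeting" xml:lang="fr"&gt;Bienvenue&lt;/prompt&gt;
</pre>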
<h3 id="Easytoimplement">1.5 Easy to implement</h3>
|
|
|
|
<p>It is important that multimodal applications remain easy to
|
|
author and deploy in order to allow wide adoption by the web
|
|
community. </p>
|
|
|
|
<p class="requirement"><span class="requirement"><strong><a
|
|
id="MMI-G10" name="MMI-G10">(MMI-G10)</a>:</strong> The multimodal
|
|
specifications produced by the MMI working group MUST be easy to
|
|
implement and use (MUST specify).</span></p>
|
|
|
|
<p>This is a generic requirement that requires designers to
|
|
consider from the outset issues of: ease-of-authoring by
|
|
application developers; ease-of-implementation by platform
|
|
developers and ease-of-use by the user. Thus it affects authoring,
|
|
platform implementation and deployment.</p>
|
|
|
|
<p>The following requirement qualifies this further to guarantee
|
|
that the specifications will be widely deployable with existing
|
|
technologies (e.g. standards, network and client capabilities
|
|
etc...)</p>
|
|
|
|
<p class="requirement"><span class="requirement"><strong><a
|
|
id="MMI-G11" name="MMI-G11">(MMI-G11)</a>:</strong> The multimodal
|
|
specifications produced by the <a href="#MMIWG">MMI working
|
|
group</a> MUST depend only on technologies that are widely
|
|
available during the lifetime of the working group (MUST
|
|
specify).</span></p>
|
|
|
|
<p>For W3C specifications, wide availability is understood as
|
|
having reached at least the stage of candidate recommendation.</p>
|
|
|
|
<p>Related considerations are made in<a
|
|
href="#Reusestandardmarkuplanguages">section 4.1</a>.</p>
|
|
|
|
<h3 id="accessibility">1.6 Accessibility</h3>
|
|
|
|
<p>Multimodal applications will provide mechanisms to develop and
|
|
deploy accessible applications as discussed in <a
|
|
href="#Supplementaryandcomplementaryuseofdifferentmodalities">section
|
|
1.2</a>.</p>
|
|
|
|
<p>In addition, it is important that, as for all other web
|
|
applications; the following requirement be satisfied:</p>
|
|
|
|
<p class="requirement"><strong><a id="MMI-G12"
|
|
name="MMI-G12">(MMI-G12)</a>:</strong> The multimodal
|
|
specifications produced by the <span class="requirement"><a
|
|
href="#MMIWG">MMI working group</a></span> MUST not preclude
|
|
conforming to the W3C accessibility guidelines (MUST specify).</p>
|
|
|
|
<p>This is especially important for applications that make
|
|
complementary use of modalities.</p>
|
|
|
|
<h3 id="Securityandprivacy">1.7 Security and privacy</h3>
|
|
|
|
<p>Early deployments of multimodal applications show that security
|
|
and privacy issues can be very critical for multimodal deployments.
|
|
While addressing these issues is not directly within the scope of
|
|
the W3C Multimodal Interaction Working Group, it is important that
|
|
these issues be considered.</p>
|
|
|
|
<p class="requirement"><strong><a id="MMI-G13"
|
|
name="MMI-G13">(MMI-G13)</a>:</strong> The multimodal
|
|
specifications SHOULD be aligned with the W3C work and
|
|
specifications for security and privacy (SHOULD specify).</p>
|
|
|
|
<p>The follow<span style="color: #000000">ing sec</span><span
|
|
style="color: #000000">urity and privacy issues have been
|
|
identified for multimodal and multi-device interact</span><span
|
|
style="color: #000000">ions.</span></p>
|
|
|
|
<ul style="color: #000000">
|
|
<li style="color: #000000"><span
|
|
style="color: #000000">Security:</span>
|
|
<ul style="color: #000000">
|
|
<li><span style="color: #000000">In some distributed
|
|
configurations:</span>
|
|
<ul>
|
|
<li>the exchange of interaction events that can be intercepted by
|
|
unauthorized third parties. This would enable reconstruction of the
|
|
complete interaction with the application; especially in between
|
|
submits to the backend. Any note, temporary selections etc would be
|
|
accessible!</li>
|
|
|
|
<li>unauthorized third parties may be able to issue presentation
|
|
manipulations that would affect the user agent.</li>
|
|
</ul>
|
|
</li>
|
|
</ul>
|
|
</li>
|
|
|
|
<li>Privacy:
|
|
<ul>
|
|
<li>In some distributed configurations, the interaction events may
|
|
enable reconstruction of the complete interaction with the
|
|
application, including in between submits to the backend. This
|
|
information or aspect of it may be considered as private by the
|
|
user.</li>
|
|
|
|
<li>User profiles (preferences and usage habits), used to optimize
|
|
the user's interaction with multimodal applications, includes
|
|
information that users may consider as private.</li>
|
|
</ul>
|
|
</li>
|
|
</ul>
|
|
|
|
<p>Other considerations and issues may exist and should be
|
|
compiled.</p>
|
|
|
|
<h3 id="Deliveryandcontext">1.8 Delivery and context</h3>
|
|
|
|
<p>Notions of profile and <a href="#deliverycontext">delivery
|
|
context</a> have been widely introduced to characterize the the
|
|
capabilities of devices and preferences of users.</p>
|
|
|
|
<p>From a multimodal point of view, different types of profiles are
|
|
relevant:</p>
|
|
|
|
<ul>
|
|
<li><a href="#userprofile">User profile</a> that may include user
|
|
credentials and user preferences and usage patterns that captures
|
|
the information manually or automatically the way that a user
|
|
interacts or likes to interact with a multimodal application</li>
|
|
|
|
<li><a href="#deviceprofile">Device profiles</a> that captures the
|
|
characteristics the capability of a particular devices used to
|
|
access an application.</li>
|
|
</ul>
|
|
|
|
<p>These profiles are combined into the notion of <a
|
|
href="#deliverycontext">delivery context</a> introduced by the W3C
|
|
device independent activity <a href="#DIactivity">[DI Activity]</a>
|
|
. The delivery context captures the set of attributes that
|
|
characterize the capabilities of the access mechanism (device or
|
|
devices) (device profile), the dynamic preferences of the user (as
|
|
they relates to interaction through this device) and <a
|
|
href="#configuration">configurations</a>. Delivery context may
|
|
dynamically change as the application progresses, as the user
|
|
situation changes (situationalization) or as the number and
|
|
configurations of the devices change.</p>
|
|
|
|
<p>CC/PP is an example of formalism to describe and exchange the
|
|
delivery context <a href="#CCPP">[CC/PP]</a>.</p>
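<p>A minimal CC/PP sketch of a device profile follows; the <code>ex:</code> vocabulary and attribute names are hypothetical, only the overall RDF/component structure comes from CC/PP:</p>

<pre class="example">
&lt;rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ccpp="http://www.w3.org/2002/11/08-ccpp-schema#"
         xmlns:ex="http://example.com/schema#"&gt;
  &lt;rdf:Description rdf:about="http://example.com/profile#MyDevice"&gt;
    &lt;ccpp:component&gt;
      &lt;rdf:Description rdf:about="http://example.com/profile#Terminal"&gt;
        &lt;ex:displayWidth&gt;320&lt;/ex:displayWidth&gt;
        &lt;ex:supportsSpeechInput&gt;Yes&lt;/ex:supportsSpeechInput&gt;
      &lt;/rdf:Description&gt;
    &lt;/ccpp:component&gt;
  &lt;/rdf:Description&gt;
&lt;/rdf:RDF&gt;
</pre>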
<p>Users of multimodal interactions will expect to be able to rely on these profiles to optimize the way that multimodal applications are presented to them.</p>

<p class="requirement"><strong><a id="MMI-G14" name="MMI-G14">(MMI-G14)</a>:</strong> The multimodal specifications MUST enable optimization and adaptation of multimodal applications based on <a href="#deliverycontext">delivery context</a> or dynamic changes of delivery context (MUST specify).</p>

<p>Dynamic changes of delivery context encompass situations where the available devices, modalities and configurations, or the usage preferences, change dynamically. These changes can be involuntary or initiated by the user, the application developer or the service providers.</p>

<p class="requirement"><strong><a id="MMI-G15" name="MMI-G15">(MMI-G15)</a>:</strong> The multimodal specifications MUST enable authors to specify how <a href="#deliverycontext">delivery context</a> and changes of delivery context affect the multimodal interface of a particular application (MUST specify).</p>

<p>The description of such impacts on a multimodal application could be specified by the author but modified by the user, platform vendor or service provider. In particular, the author can describe how the application can be affected by or adapted to the delivery context, but the user and service providers should be able to modify the delivery context. <em>Other use cases should also be considered.</em></p>
<h3 id="Navigationspecification">1.9 Navigation specification</h3>
|
|
|
|
<p>It is expected that the author of multimodal application should
|
|
always be able to specify the expected flow of navigation (i.e.
|
|
sequence of interaction) through the application or the algorithm
|
|
to determine such a flow (e.g. in mixed initiative cases). This
|
|
leads to the following requirement:</p>
|
|
|
|
<p class="requirement"><span class="requirement"><strong><a
|
|
id="MMI-G16" name="MMI-G16">(MMI-G16)</a>:</strong> The multimodal
|
|
specifications MUST enable the author of an application to describe
|
|
the navigation flow through the application or indicate the
|
|
algorithms to determine the navigation flow (MUST
|
|
specify).</span></p>
|
|
|
|
<h2 id="Inputmodalityrequirements">2. Input Modality
|
|
Requirements</h2>
|
|
|
|
<h3 id="Inputprocessing">2.1 Input processing</h3>
|
|
|
|
<p>Numerous modalities or input types require some form of
|
|
processing before the nature of the input is identified. For
|
|
instance, speech input requires speech detection and speech
|
|
recognition which requires specific data files (e.g. grammars,
|
|
language models etc). Similarly handwritten input requires
|
|
recognition.</p>
|
|
|
|
<p class="requirement"><strong><a id="MMI-I1"
|
|
name="MMI-I1">(MMI-I1)</a>:</strong> The multimodal specifications
|
|
MUST provide a mechanism to specify and attach modality related
|
|
information when authoring a multimodal application. (MUST
|
|
specify).</p>
|
|
|
|
<p>This implies that authors should be able to include
|
|
modality-related information, such as the media types, processing
|
|
requirements or fallback mechanisms that a user agent will need for
|
|
the particular modality. Mechanisms should be available to make
|
|
this available to the user agent.</p>
|
|
|
|
<p>For example, audio input may be recognized (speech recognizer),
|
|
recorded or processed by speaker recognizers, natural language
|
|
processing, using specific data files (e.g. grammar, language
|
|
model), etc. The author must be able to completely define such
|
|
processing steps.</p>
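<p>As an illustration of such a data file, the following is a small grammar in the XML form of the W3C Speech Recognition Grammar Specification, developed in the Voice Browser Activity; the rule and its alternatives are invented for this sketch:</p>

<pre class="example">
&lt;grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         xml:lang="en-US" root="city" mode="voice"&gt;
  &lt;rule id="city"&gt;
    &lt;one-of&gt;
      &lt;item&gt;Boston&lt;/item&gt;
      &lt;item&gt;Denver&lt;/item&gt;
    &lt;/one-of&gt;
  &lt;/rule&gt;
&lt;/grammar&gt;
</pre>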
<h3 id="Sequentialmultimodalinput">2.2 Sequential multimodal
|
|
input</h3>
|
|
|
|
<p class="requirement"><span class="requirement"><strong><a
|
|
id="MMI-I2" name="MMI-I2">(MMI-I2)</a>:</strong> The multimodal
|
|
specifications developed by the MMI working group MUST support <a
|
|
href="#sequentialinput">sequential multimodal input</a> (MUST
|
|
specify).</span></p>
|
|
|
|
<p>It im<span style="color: #000000">plies that</span></p>
|
|
|
|
<ul style="color: #000000">
|
|
<li class="requirement"><span class="requirement"><strong><a
|
|
id="MMI-I2a" name="MMI-I2a">(MMI-I2a)</a>:</strong> It MUST be
|
|
possible to author <a href="#sequentialmm">sequential
|
|
multimodal</a> applications, where inputs across modality are
|
|
provided sequentially (MUST specify).</span></li>
|
|
|
|
<li class="requirement"><strong><a id="MMI-I2b"
|
|
name="MMI-I2b">(MMI-I2b)</a>:</strong> It MUST be possible to
|
|
specify what modality or device to use for input in <a
|
|
href="#sequentialmm">sequential multimodality</a> and hint or
|
|
enforce <a href="#modalityswitch">modality switches</a>. This is an
|
|
application developer's capability (MUST specify).</li>
|
|
|
|
<li class="requirement"><span class="requirement"><strong><a
|
|
id="MMI-I2c" name="MMI-I2c">(MMI-I2c)</a>:</strong> The
|
|
specifications MUST enable writing multimodal applications where
|
|
the user can select what modality or device to use at any time
|
|
based on the user's <a href="#situation">situation</a> and the
|
|
nature of the input interactions. More concretely, the
|
|
specifications must support writing multimodal applications that
|
|
can be accessed through each modality alone, and that support <a
|
|
href="#modalityswitch">modality switches</a> whenever desired by
|
|
the user (MUST specify).</span></li>
|
|
</ul>
|
|
|
|
<h3 id="Simultaneousmultimodalinput">2.3 Simultaneous multimodal
|
|
input</h3>
|
|
|
|
<p class="requirement"><span class="requirement"><strong><a
|
|
id="MMI-I3" name="MMI-I3">(MMI-I3)</a>:</strong> The multimodal
|
|
specifications developed by the MMI working group MUST support <a
|
|
href="#simultaneousinput">simultaneous multimodal input</a> (MUST
|
|
specify).</span></p>
|
|
|
|
<p class="requirement"><strong><a id="MMI-I4"
|
|
name="MMI-I4">(MMI-I4)</a>:</strong> The multimodal specifications
|
|
MUST enable the author to specify the <a
|
|
href="#synchronizationlevel">granularity of input
|
|
synchronization</a> (MUST specify).</p>
|
|
|
|
<p>It should be remarked, however, that the actual granularity of
|
|
input synchronization may be decided by the user, by the runtime or
|
|
by the network (delivery context) or some combination thereof.</p>
|
|
|
|
<p class="requirement"><span class="requirement"><strong><a
|
|
id="MMI-I5" name="MMI-I5">(MMI-I5)</a>:</strong> The multimodal
|
|
specifications MUST enable the author to specify how the multimodal
|
|
application evolves when the</span> <a
|
|
href="#synchronizationlevel">granularity of input
|
|
synchronization</a> <span class="requirement">is modified by
|
|
external factors (MUST specify).</span></p>
|
|
|
|
<p>This requirement enables the application developer to specify
|
|
how the performance of the application can degrade gracefully with
|
|
changes in the input mechanism. For instance, it should be possible
|
|
to access an application designed for event-level or field-level
|
|
synchronization between voice (on the server side) and GUI (on the
|
|
terminal) on a network that permits only session-level
|
|
synchronization (that is, permits only <a
|
|
href="#sequentialmm">sequential multimodality</a>).</p>
|
|
|
|
<p class="requirement"><span class="requirement"><strong><a
|
|
id="MMI-I6" name="MMI-I6">(MMI-I6)</a>:</strong> The multimodal
|
|
specifications SHOULD enable a default input synchronization
|
|
behavior and provide "overwrite" mechanisms (SHOULD
|
|
specify).</span></p>
|
|
|
|
<p>Therefore, it should be possible to author multimodal
|
|
applications while assuming a default synchronization behavior. For
|
|
example, <a href="#supplementarymm">supplementary</a> event-level
|
|
multimodal <a href="#synchronizationlevel">synchronization
|
|
granularity</a>.</p>
|
|
|
|
<h3 id="Compositemultimodalinput">2.4 Composite multimodal
|
|
input</h3>
|
|
|
|
<p class="requirement"><strong><a id="MMI-I7"
|
|
name="MMI-I7">(MMI-I7)</a>:</strong> The multimodal specifications
|
|
developed by the MMI working group MUST support <a
|
|
href="#compositeinput">composite multimodal input</a> (MUST
|
|
specify).</p>
|
|
|
|
<p class="requirement"><span class="requirement"><strong><a
|
|
id="MMI-I8" name="MMI-I8">(MMI-I8)</a>:</strong> The multimodal
|
|
specifications SHOULD allow the author to specify how input
|
|
combination is achieved, possibly taking into account the <a
|
|
href="#coordinationcapability">coordination capabilities</a>
|
|
available in the given <a href="#deliverycontext">delivery
|
|
context</a> (NICE to specify).</span></p>
|
|
|
|
<p>This can be achieved with explicit scripts that describe the
|
|
interpretation and composition algorithms. On the other hand, it
|
|
may also be left to the <a href="#dialogmanager">interaction
|
|
manager</a> to apply an interpretation strategy that includes
|
|
composition, for example by determining the most sensible
|
|
interpretation given the <a href="#sessioncontext">session
|
|
context</a> and therefore determining what input combination (if
|
|
any) to select. This is addressed by the following requirement.</p>
|
|
|
|
<p class="requirement"><span class="requirement"><strong><a
|
|
id="MMI-I9" name="MMI-I9">(MMI-I9)</a>:</strong> The multimodal
|
|
specifications SHOULD enable the author to specify the mechanism
|
|
used to decide when coordinated inputs are to be combined and how
|
|
they are combined (NICE to specify).</span></p>
|
|
|
|
<p>Possible ways to address this include:</p>
|
|
|
|
<ul>
|
|
<li>Time windowing</li>
|
|
|
|
<li>Interaction management strategy or algorithms based on ordering
|
|
of events and context</li>
|
|
</ul>
|
|
|
|
<h3 id="Inputmodessupported">2.5 Input modes supported</h3>
|
|
|
|
<h4 id="InputMUSTspecify">2.5.1 MUST specify</h4>
|
|
|
|
<p class="requirement"><span class="requirement"><strong><a
|
|
id="MMI-I10" name="MMI-I10">(MMI-I10)</a>:</strong> The multimodal
|
|
specifications must support the description of input to be obtained
|
|
from:</span></p>
|
|
|
|
<ul class="requirement">
|
|
<li>Keyboard / keypad (e.g. keyboard (i.e. Qwerty (i.e. US
|
|
keyboard), etc), Handset (e.g. DTMF) or customized keypad).</li>
|
|
|
|
<li>Pointing devices (e.g. mouse, stylus, touch screen)</li>
|
|
|
|
<li>Combined input interfaces like joystick and game
|
|
controllers</li>
|
|
|
|
<li>Audio input (e.g. speech input to be recognized or
|
|
recorded)</li>
|
|
|
|
<li>Video input</li>
|
|
|
|
<li>Sign languages</li>
|
|
|
|
<li>Pen / stylus handwriting and stroke input.
|
|
<ul>
|
|
<li>(hand-writing script and hand-writing gesture - e.g. to delete,
|
|
to insert)</li>
|
|
|
|
<li>This incorporates stroke input and recognized handwriting. This
|
|
is expected to be addressed by requirement <a
|
|
href="#MMI-I1">MMI-I1</a>.</li>
|
|
</ul>
|
|
</li>
|
|
</ul>
|
|
|
|
<p class="requirement"><span class="requirement">(MUST
|
|
specify).</span></p>
|
|
|
|
<h4 id="InputNICEtospecify">2.5.2 NICE to specify</h4>
|
|
|
|
<p class="requirement"><span class="requirement"><strong><a
|
|
id="MMI-I11" name="MMI-I11">(MMI-I11)</a>:</strong> The multimodal
|
|
specifications SHOULD support other input modes,
|
|
including:</span></p>
|
|
|
|
<ul class="requirement">
|
|
<li>gaze recognition (e.g. as a pointer).
|
|
<ul>
|
|
<li>This is also expected to be covered by the "pointing" aspect of
|
|
<a href="#MMI-I10">MMI-10</a> and requirement <a
|
|
href="#MMI-I1">MMI-I1</a>.</li>
|
|
</ul>
|
|
</li>
|
|
|
|
<li>Combined audio-visual speech recognition.</li>
|
|
|
|
<li>Haptic and tactile input</li>
|
|
|
|
<li>Non-spoken audio input (e.g. hummed tune, songs</li>
|
|
</ul>
|
|
|
|
<p class="requirement"><span class="requirement">(NICE to
|
|
specify).</span></p>
|
|
|
|
<h4 id="InputExtensibility">2.5.3 Extensibility</h4>
|
|
|
|
<p class="requirement"><strong><a id="MMI-I12"
|
|
name="MMI-I12">(MMI-I12)</a>:</strong> The multimodal
|
|
specifications MUST describe how extensibility is to be achieved
|
|
and how new devices or modalities can be added (MUST specify).</p>
|
|
|
|
<h3 id="SemanticsofinputgeneratedbyUIcomponents">2.6 Semantics of
|
|
input generated by UI components</h3>
|
|
|
|
<p class="requirement"><span class="requirement"><strong><a
|
|
id="MMI-I13" name="MMI-I13">(MMI-I13)</a>:</strong> The multimodal
|
|
specifications MUST support the representation of the meaning of a
|
|
user input (MUST specify).</span></p>
|
|
|
|
<ul>
|
|
<li class="requirement"><span><strong><a id="MMI-I14"
|
|
name="MMI-I14">(MMI-I14)</a>:</strong> The representation of the
|
|
meaning may be modality or device dependent. However, whenever
|
|
possible, the representation of the meaning SHOULD be independent
|
|
of the input modality (NICE to specify).</span></li>
|
|
|
|
<li class="requirement"><span class="requirement"><strong><a
|
|
id="MMI-I15" name="MMI-I15">(MMI-I15)</a>:</strong> The
|
|
representation of the input SHOULD indicate the modality(ies) where
|
|
the input(s) was (were) provided (SHOULD specify).</span></li>
|
|
</ul>
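<p>A purely hypothetical rendering of such a representation follows; no particular format is implied by these requirements (the element and attribute names are invented), but it shows a meaning that is modality-independent while still recording the originating modality:</p>

<pre class="example">
&lt;!-- the same meaning could have come from speech, pen or keypad --&gt;
&lt;interpretation modality="speech" confidence="0.85"&gt;
  &lt;destination&gt;Boston&lt;/destination&gt;
&lt;/interpretation&gt;
</pre>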
<h3 id="Coordinatedconstraintsandinterpretations">2.7 Coordinated
|
|
constraints</h3>
|
|
|
|
<p class="requirement"><span class="requirement"><strong><a
|
|
id="MMI-I16" name="MMI-I16">(MMI-I16)</a>:</strong> The multimodal
|
|
specifications MUST enable to coordinate the <a
|
|
href="#inputconstraints">input constraints</a> across modalities
|
|
(MUST specify).</span></p>
|
|
|
|
<p>Input constraints specify, for example through grammars, how
|
|
inputs are can be combined via rules or interaction management
|
|
strategies. For example the markup language may coordinates
|
|
grammars for modalities other than speech with speech grammars to
|
|
avoid duplication of effort in authoring multimodal grammars.</p>
|
|
|
|
<p>Possible ways to address this could include:</p>
|
|
|
|
<ul>
|
|
<li>Coordinated Grammars.</li>
|
|
|
|
<li>Constraints expressed in the data model (e.g. XForms, XML
|
|
Schema)</li>
|
|
|
|
<li>Interaction management algorithm or strategy.</li>
|
|
</ul>
|
|
|
|
<p>These methods will be considered during the specification
|
|
work.</p>
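<p>As an example of the data-model approach, a single XML Schema constraint such as the following (the type name and values are invented for this sketch) could drive both a GUI selection list and a generated speech grammar, so the constraint is authored only once:</p>

<pre class="example">
&lt;xs:simpleType name="city"&gt;
  &lt;xs:restriction base="xs:string"&gt;
    &lt;xs:enumeration value="Boston"/&gt;
    &lt;xs:enumeration value="Denver"/&gt;
  &lt;/xs:restriction&gt;
&lt;/xs:simpleType&gt;
</pre>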
<h3 id="Supportforconflictinginputfromdifferentmodalities">2.8 Support for conflicting input from different modalities</h3>

<p>When using multiple modalities or user agents, a user may introduce errors consciously or inadvertently. For example, in a voice and GUI multimodal application, the user may say "yes" while simultaneously clicking on "no" in the user interface. We require that the specifications support the detection of such conflicts.</p>

<p class="requirement"><strong><a id="MMI-I17" name="MMI-I17">(MMI-I17)</a>:</strong> The multimodal specifications MUST support the detection of conflicting input from several modalities (MUST specify).</p>

<p>It is naturally expected that the author will specify how to handle the conflict through an explicit script or piece of code. It is also possible that an interaction management strategy will be able to detect the possible conflict and provide a strategy or sub-dialog to resolve it.</p>
<h3 id="Temporalpositioningofevents">2.9 Temporal positioning of
|
|
input events</h3>
|
|
|
|
<p>The <a href="#dialogmanager">interaction manager</a> should be
|
|
able to place different input events on the timeline, in order to
|
|
determine the intent of the user.</p>
|
|
|
|
<p class="requirement"><strong><a id="MMI-I18"
|
|
name="MMI-I18">(MMI-I18)</a>:</strong> The multimodal
|
|
specifications MUST provide mechanisms to position the input events
|
|
relatively to each other in time (MUST specify).</p>
|
|
|
|
<p class="requirement"><span class="requirement"><strong><a
|
|
id="MMI-I19" name="MMI-I19">(MMI-I19)</a>:</strong> The multimodal
|
|
specifications SHOULD provide mechanisms to allow for temporal
|
|
grouping of input events (SHOULD specify).</span></p>
|
|
|
|
<p>These requirements may by satisfied by mechanisms to order of
|
|
the input events or, when needed, relative time stamping. For some
|
|
configurations, this may involve clock synchronization.</p>
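<p>A purely hypothetical sketch of time-stamped input events (the event vocabulary is invented; only the idea of relative temporal ordering matters here): the close timestamps would let an interaction manager group the pen click and the utterance into one composite input.</p>

<pre class="example">
&lt;event source="pen" name="click" target="map"
       time="2003-01-08T12:00:00.120Z"/&gt;
&lt;event source="speech" name="result" value="zoom here"
       time="2003-01-08T12:00:00.450Z"/&gt;
</pre>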
<h2 id="Outputmediarequirements">3. Output Media Requirements</h2>
|
|
|
|
<h3 id="Sequentialmediaoutput">3.1 Sequential media output</h3>
|
|
|
|
<p class="requirement"><span class="requirement"><strong><a
|
|
id="MMI-O1" name="MMI-O1">(MMI-O1)</a>:</strong> The multimodal
|
|
specifications developed by the MMI working group MUST support
|
|
sequential media output (MUST specify).</span></p>
|
|
|
|
<p>As <a href="#SMIL">SMIL</a> supports the sequencing of medias,
|
|
the specification is expected to rely on similar mechanism. This is
|
|
addressed in more details in other requirements.</p>
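<p>For instance, SMIL 2.0 expresses sequential presentation with the <code>seq</code> construct; the media file names below are hypothetical:</p>

<pre class="example">
&lt;!-- play the prompt first, then show the menu for five seconds --&gt;
&lt;seq&gt;
  &lt;audio src="welcome.wav"/&gt;
  &lt;img src="menu.png" dur="5s"/&gt;
&lt;/seq&gt;
</pre>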
<p>This implies that:</p>

<ul>
<li class="requirement"><strong><a id="MMI-O1a" name="MMI-O1a">(MMI-O1a)</a>:</strong> It MUST be possible to author <a href="#sequentialmm">sequential multimodal</a> applications where output media are presented sequentially to the user (MUST specify).</li>

<li class="requirement"><strong><a id="MMI-O1b" name="MMI-O1b">(MMI-O1b)</a>:</strong> It MUST be possible to specify what modality or device to use for output in <a href="#sequentialmm">sequential multimodality</a> and to hint or enforce <a href="#modalityswitch">modality switches</a>. This is an application developer's capability (MUST specify).</li>

<li class="requirement"><strong><a id="MMI-O1c" name="MMI-O1c">(MMI-O1c)</a>:</strong> The specifications MUST enable writing multimodal applications where the user can select what modality to use at any time based on the user's <a href="#situation">situation</a> and the nature of the output interactions. More concretely, the specifications must support writing multimodal applications that can be accessed through each modality alone, and that support <a href="#modalityswitch">modality switches</a> whenever desired by the user (MUST specify).</li>
</ul>
<h3 id="Simultaneousmediaoutput">3.2. Simultaneous media
|
|
output</h3>
|
|
|
|
<p class="requirement"><strong><a id="MMI-O2"
|
|
name="MMI-O2">(MMI-O2)</a>:</strong> The multimodal specifications
|
|
MUST provide the ability to synchronize different output medias
|
|
with different granularities (MUST specify).</p>
|
|
|
|
<p>This covers simultaneous outputs. The granularity of output
|
|
synchronization as provided by SMIL may range from no
|
|
synchronization at all between the medias other than the play in
|
|
parallel to tightly synchronization mechanisms.</p>
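
<p>SMIL 2.0 illustrates this range: media played inside a
&lt;par&gt; time container may merely start together, or may be
kept in step through the syncBehavior attribute, as sketched below
(the file names are illustrative):</p>

<pre class="example">
&lt;!-- SMIL 2.0: video and audio kept in lockstep, while an image
     is allowed to slip if resources are scarce --&gt;
&lt;par&gt;
  &lt;video src="talking-head.mpg" syncBehavior="locked"/&gt;
  &lt;audio src="narration.wav" syncBehavior="locked"/&gt;
  &lt;img src="slide1.png" dur="10s" syncBehavior="canSlip"/&gt;
&lt;/par&gt;
</pre>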

<p class="requirement"><span class="requirement"><strong><a
id="MMI-O3" name="MMI-O3">(MMI-O3)</a>:</strong> The multimodal
specifications MUST enable the author to specify the granularity of
output synchronization (MUST specify).</span></p>

<p>However, it should be possible that the granularity of output
media synchronization be decided by the user, runtime or network
(through the <a href="#deliverycontext">delivery context</a>).</p>

<p class="requirement"><strong><a id="MMI-O4"
name="MMI-O4">(MMI-O4)</a>:</strong> The multimodal markup MUST
enable the author to specify how the multimodal application
degrades when the granularity of output synchronization is modified
by external factors (MUST specify).</p>

<p class="requirement"><strong><a id="MMI-O5"
name="MMI-O5">(MMI-O5)</a>:</strong> The multimodal specifications
SHOULD rely on a default output synchronization behavior for a
particular granularity and SHOULD provide "overwrite" mechanisms
(SHOULD specify).</p>

<h3 id="Supportedoutputmedias">3.3 Supported output media</h3>

<h4 id="outputMUSTspecify">3.3.1 MUST specify</h4>

<p class="requirement"><strong><a id="MMI-O6"
name="MMI-O6">(MMI-O6)</a>:</strong> The multimodal specifications
MUST support as output media:</p>

<ul class="requirement">
<li>Audio, including spoken prompts and playback</li>

<li>Visual (XHTML, SVG), encompassing different display
characteristics (monitor, PDA, smart phone, etc.)</li>

<li>SMIL objects (animation, audio, img, video, text,
textstream)</li>

<li>Synthesis of audio</li>

<li>MIDI</li>

<li>Streaming</li>

<li>Sign languages</li>
</ul>

<p class="requirement">(MUST specify).</p>

<h4 id="outputNicetospecify">3.3.2 Nice to specify</h4>

<p class="requirement"><strong><a id="MMI-O7"
name="MMI-O7">(MMI-O7)</a>:</strong> The multimodal specifications
SHOULD support additional output media such as:</p>

<ul class="requirement">
<li>media types supported by CSS3</li>

<li>lip-synch face synthesis</li>

<li>tactile and haptic output</li>
</ul>

<p><span class="requirement">(NICE to specify).</span></p>

<h4 id="outputExtensibility">3.3.3 Extensibility</h4>

<p class="requirement"><strong><a id="MMI-O8"
name="MMI-O8">(MMI-O8)</a>:</strong> The multimodal specifications
MUST describe how extensibility is to be achieved and how new
output media can be added (MUST specify).</p>

<h3 id="Outputprocessing">3.4 Output processing</h3>

<p class="requirement"><strong><a id="MMI-O9"
name="MMI-O9">(MMI-O9)</a>:</strong> The multimodal specifications
MUST support the specification of which output media should be
processed and how it should be done. The specifications MUST
provide a mechanism that describes how this can be achieved or
extended for different modalities (MUST specify).</p>

<p>Examples of output processing may include: adaptation or styling
of presentation for particular modalities, speech synthesis of text
output into audio output, natural language generation, etc.</p>
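
<p>For example, text output destined for the audio modality could
be processed by a speech synthesizer. The following sketch is based
on the W3C Speech Synthesis Markup Language drafts and is
illustrative only:</p>

<pre class="example">
&lt;!-- Render a confirmation as speech, slowing down the flight
     number for clarity --&gt;
&lt;speak xml:lang="en-US"&gt;
  Your flight is confirmed.
  &lt;prosody rate="slow"&gt;Flight UA 210.&lt;/prosody&gt;
&lt;/speak&gt;
</pre>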

<h2 id="Architectureintegrationandsynchronizationpoints">4.
Architecture, integration and synchronization points</h2>

<h3 id="Reusestandardmarkuplanguages">4.1 Reuse standard markup
languages</h3>

<p class="requirement"><strong><a id="MMI-A1"
name="MMI-A1">(MMI-A1)</a>:</strong> Where the functionality is
appropriate, and clean integration is possible, the multimodal
specifications MUST enable the use and integration of existing
standard language specifications including visual, aural, voice and
multimedia standards (MUST specify).</p>

<p>In general, it is understood that in order to satisfy <a
href="#MMI-G11">MMI-G11</a>, dependencies of the multimodal
specifications on other specifications must be carefully evaluated
if these are not yet W3C recommendations or not yet widely
adopted.</p>

<p>SMIL 2.0 provides multimedia synchronization mechanisms.
Therefore, <a href="#MMI-A1">MMI-A1</a> implies:</p>

<p class="requirement"><strong><a id="MMI-A1a"
name="MMI-A1a">(MMI-A1a)</a>:</strong> The multimodal
specifications MUST enable the synchronization of input and output
media through SMIL 2.0 as a control mechanism (MUST specify).</p>

<h3 id="XHTMLmodularization">4.2 XHTML Modularization</h3>

<p>The following requirement results from <a
href="#MMI-A1">MMI-A1</a>.</p>

<p class="requirement"><strong><a id="MMI-A2"
name="MMI-A2">(MMI-A2)</a>:</strong> The multimodal specifications
MUST be expressible in terms of XHTML modularization (MUST
specify).</p>

<h3 id="CompatibilitywithXForms">4.3 Separation of data model,
presentation layer and application logic</h3>

<p class="requirement"><strong><a id="MMI-A3"
name="MMI-A3">(MMI-A3)</a>:</strong> The multimodal specification
MUST allow the separation of data model, presentation layer and
application logic in the following ways:</p>

<ul class="requirement">
<li>Enable an explicit data model for the back end (i.e. the data)
and its mapping to the front end.</li>

<li>Enable the separation of the data model from the presentation.
The presentation depends on the device and modality.</li>

<li>Application data must be modality independent.</li>

<li>Logic should be modality independent.</li>
</ul>

<p class="requirement">(MUST specify).</p>

<p>This will enable the multimodal specifications to be compatible
with XForms in environments which support XForms. This would comply
with <a href="#MMI-A1">MMI-A1</a>.</p>

<h3 id="Detectionofavailablemodalities">4.4 Detection of available
modalities and changes</h3>

<p>From an authoring point of view, it is important to have
mechanisms (events, protocols, handlers) to detect or prescribe the
modalities that are or should be available: i.e. to check the
delivery context and adapt to it. This is covered by <a
href="#MMI-G14">MMI-G14</a> and <a
href="#MMI-G15">MMI-G15</a>.</p>

<p class="requirement"><strong><a id="MMI-A4"
name="MMI-A4">(MMI-A4)</a>:</strong> There MUST be events
associated with changes of <a href="#deliverycontext">delivery
context</a> and mechanisms to specify how to handle these events by
adapting the multimodal application (MUST specify).</p>

<p class="requirement"><strong><a id="MMI-A5"
name="MMI-A5">(MMI-A5)</a>:</strong> There SHOULD be mechanisms
available to define the <a href="#deliverycontext">delivery
context</a> or behavior that is expected or recommended by the
author (SHOULD specify).</p>

<h3 id="Synchronizationgranularities">4.5 Synchronization
granularities</h3>

<p class="requirement"><strong><a id="MMI-A6"
name="MMI-A6">(MMI-A6)</a>:</strong> The multimodal specifications
MUST support the following <a
href="#synchronizationlevel">synchronization
granularities</a>:</p>

<ul class="requirement">
<li>Form-level input synchronization: Inputs in one modality are
reflected in the other only after reaching a particular point in
the presentation (e.g. completing a certain number of fields in a
form).</li>

<li>Field-level input synchronization: Inputs in one modality are
reflected in the other after the user finishes performing a
particular interaction with a field. This can be detected because
the user in general changes or sets a value in the data model. For
example, this results from a change of focus (e.g. moving from
input field to input field) or from completing the interaction
(e.g. completing a selection in a menu).</li>

<li>Page-level: Inputs in one modality are reflected in the other
only after submission of the page.</li>

<li>Event-level synchronization: User inputs in one modality are
captured at the level of individual DOM events and immediately
reflected in the other modality, when it makes sense.</li>

<li>Event-level input synchronization with output media.</li>

<li>Media synchronization: Synchronization between output media as
specified by SMIL.</li>

<li>Session level: <a href="#suspendresume">Suspend and resume</a>
behavior; an application suspended in one modality can be resumed
in the same or another modality.</li>
</ul>

<p><span class="requirement">(MUST specify).</span></p>

<p>In addition:</p>

<ul>
<li class="requirement"><strong><a id="MMI-A6a"
name="MMI-A6a">(MMI-A6a)</a>:</strong> It MUST be possible to
author <a href="#sequentialmm">sequential multimodal</a>
applications (MUST specify).</li>

<li class="requirement"><strong><a id="MMI-A6b"
name="MMI-A6b">(MMI-A6b)</a>:</strong> It MUST be possible to
specify what modality or device to use for interaction in <a
href="#sequentialmm">sequential multimodal</a> cases and to hint at
or enforce <a href="#modalityswitch">modality switches</a>. This is
an application developer's capability (MUST specify).</li>

<li class="requirement"><strong><a id="MMI-A6c"
name="MMI-A6c">(MMI-A6c)</a>:</strong> The specifications MUST
enable writing multimodal applications where the user can select
what modality or device to use at any time based on the user's <a
href="#situation">situation</a> and the nature of the input and
output interactions. More concretely, the specifications MUST
support writing multimodal applications that can be accessed
through each modality alone, and that support <a
href="#modalityswitch">modality switches</a> whenever desired by
the user (MUST specify).</li>
</ul>

<p>The following requirements result from <a
href="#MMI-A1">MMI-A1</a>.</p>

<p class="requirement"><strong><a id="MMI-A7a"
name="MMI-A7a">(MMI-A7a)</a>:</strong> Event-level synchronization
MUST follow the <a href="#DOM">DOM</a> event model (MUST
specify).</p>

<p class="requirement"><strong><a id="MMI-A7b"
name="MMI-A7b">(MMI-A7b)</a>:</strong> Event-level synchronization
SHOULD follow <a href="#XMLEvent">XML events</a> (SHOULD
specify).</p>

<p>Such events are not limited to events generated by user
interactions, as discussed in <a href="#MMI-A16">MMI-A16</a>.</p>
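
<p>As a sketch, XML Events allows such event-level synchronization
to be declared in markup. The observer and handler identifiers
below are illustrative:</p>

<pre class="example">
&lt;!-- XML Events: when the "destination" field changes in the GUI,
     run a handler that updates the voice dialog accordingly --&gt;
&lt;ev:listener xmlns:ev="http://www.w3.org/2001/xml-events"
             event="change"
             observer="destination-field"
             handler="#sync-voice-dialog"/&gt;
</pre>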

<p>It is important that the application developer be able to fully
define the synchronization granularity.</p>

<p class="requirement"><strong><a id="MMI-A8"
name="MMI-A8">(MMI-A8)</a>:</strong> The multimodal specifications
MUST enable the author to specify the <a
href="#synchronizationlevel">granularity of synchronization</a>
(MUST specify).</p>

<p>However:</p>

<p class="requirement"><span><strong><a id="MMI-A9"
name="MMI-A9">(MMI-A9)</a>:</strong> It MUST be possible that the
granularity of synchronization be decided by the user, runtime or
network (through the <a href="#deliverycontext">delivery
context</a>) (MUST specify).</span></p>

<p class="requirement"><strong><a id="MMI-A10"
name="MMI-A10">(MMI-A10)</a>:</strong> The multimodal
specifications MUST enable the author to specify how the multimodal
application degrades when the <a
href="#synchronizationlevel">granularity of synchronization</a> is
modified by external factors (MUST specify).</p>

<p class="requirement"><strong><a id="MMI-A11"
name="MMI-A11">(MMI-A11)</a>:</strong> The multimodal
specifications SHOULD rely on an input and output <a
href="#defaultsynchronization">default synchronization</a> behavior
and SHOULD provide "overwrite" mechanisms (SHOULD specify).</p>

<h3 id="Independentinputandoutput">4.6 Independent input and output
interfaces even in the same modality</h3>

<p>Nothing requires that input and output, even in the same
modality, be provided on the same device or user agent. The input
and output can be independent, and the granularity of interfaces
afforded by the specification should apply independently to the
mechanisms of input and output within a given modality when
necessary.</p>

<p class="requirement"><strong><a id="MMI-A12"
name="MMI-A12">(MMI-A12)</a>:</strong> The specification MUST
support separate interfaces for input and output even within the
same modality (MUST specify).</p>

<h3 id="Distributedsynchronization">4.7 Distributed
synchronization</h3>

<p class="requirement"><strong><a id="MMI-A13"
name="MMI-A13">(MMI-A13)</a>:</strong> The multimodal
specifications MUST support <a
href="#synchronizationbehavior">synchronization</a> of different
modalities or devices <a
href="#distributedcomponents">distributed</a> across the network,
providing the user with the capability to interact through
different devices (MUST specify).</p>

<p>In particular, this includes multi-device applications where
different devices or user agents are used to interact with the same
application; these may involve presentation in the same modality
but on different devices.</p>

<h3 id="Distributedprocessing">4.8 Distributed processing</h3>

<p>Distribution of input and output processing refers to cases
where the processing algorithms applied to input and output may be
performed by distributed components.</p>

<p class="requirement"><span class="requirement"><strong><a
id="MMI-A14" name="MMI-A14">(MMI-A14)</a>:</strong> The multimodal
specifications MUST support the distribution of input and <a
href="#outputprocessing">output processing</a> (MUST
specify).</span></p>

<p class="requirement"><span class="requirement"><strong><a
id="MMI-A15" name="MMI-A15">(MMI-A15)</a>:</strong> The multimodal
specifications MUST support the expression of some level of control
over the distributed processing of input and output (MUST
specify).</span></p>

<p>This requirement is related to <a href="#MMI-I1">MMI-I1</a> and
<a href="#MMI-O9">MMI-O9</a>.</p>

<h3 id="Externalinput">4.9 External input and output</h3>

<p class="requirement"><span class="requirement"><strong><a
id="MMI-A16" name="MMI-A16">(MMI-A16)</a>:</strong> The multimodal
specifications MUST enable the author to specify how multimodal
applications handle external input events and generate external
output events used by other processes (MUST specify).</span></p>

<p>Examples of input events include camera, sensor or GPS events.
Examples of output events include any form of notification or
trigger generated by the user interaction.</p>

<p>This is expected to be automatically satisfied if events are
treated as <a href="#XMLEvent">XML events</a>.</p>

<h3 id="Temporalpositioningofinputandoutputevents">4.10 Temporal
positioning of input and output events</h3>

<p>Requirements <a href="#MMI-I18">MMI-I18</a> and <a
href="#MMI-I19">MMI-I19</a> generalize as follows.</p>

<p class="requirement"><strong><a id="MMI-A17"
name="MMI-A17">(MMI-A17)</a>:</strong> The multimodal
specifications MUST provide mechanisms to position the input and
output events relative to each other in time (MUST specify).</p>

<p class="requirement"><span class="requirement"><strong><a
id="MMI-A18" name="MMI-A18">(MMI-A18)</a>:</strong> The multimodal
specifications SHOULD provide mechanisms to allow for temporal
grouping of input and output events (SHOULD specify).</span></p>

<p>These requirements may be satisfied by mechanisms that order the
events or, when needed, by relative time stamping. For some
configurations, this may involve clock synchronization.</p>

<h2 id="Runtimesanddeployments">5. Runtimes and deployments</h2>

<h3 id="Configurations">5.1 Configurations</h3>

<p>It is expected that users will interact with multimodal
applications through different deployment configurations (i.e.
architectures): the different modules responsible for media
rendering, input capture, processing, synchronization,
interpretation, etc., may be partitioned or combined on a single
device or distributed across several devices or servers. As
previously discussed, these configurations may dynamically
change.</p>

<p>The specification of such configurations is beyond the scope of
the W3C Multimodal Interaction Working Group. However:</p>

<p class="requirement"><strong><a id="MMI-C1"
name="MMI-C1">(MMI-C1)</a>:</strong> The multimodal specifications
MUST support the deployment of multimodal applications authored
according to the W3C MMI specifications, with all the relevant
deployment configurations where functions are partitioned or
combined on a single engine or distributed across several devices
or servers (MUST specify).</p>

<p>The possibility of interacting with multiple devices leads
naturally to multi-user access to applications.</p>

<p class="requirement"><strong><a id="MMI-C2"
name="MMI-C2">(MMI-C2)</a>:</strong> The multimodal specifications
SHOULD support multi-user deployments (NICE to specify).</p>

<h3 id="Mobiledeployments">5.2 Mobile deployments</h3>

<p>Multimodal interactions are especially important for mobile
deployments. Therefore, the W3C multimodal working group will pay
attention to the constraints associated with mobile deployments,
and especially cell phones.</p>

<p class="requirement"><strong><a id="MMI-R1"
name="MMI-R1">(MMI-R1)</a>:</strong> The multimodal specifications
MUST be compatible with deployments based on user agents /
renderers that run on mobile platforms (MUST specify).</p>

<p>Mobile platforms, like smart phones, are typically constrained
in terms of available processing power and memory. It is expected
that the multimodal specifications will take such constraints into
account and be designed so that multimodal deployments are possible
on smart phones.</p>

<p>In addition, it is important to pay attention to the challenges
introduced by mobile networks, such as limited bandwidth and
delays:</p>

<p class="requirement"><strong><a id="MMI-R2"
name="MMI-R2">(MMI-R2)</a>:</strong> The multimodal specifications
MUST support deployments over mobile networks, considering the
bandwidth limitations and delays that they may introduce (MUST
specify).</p>

<p>This may involve deployment techniques or specifications from
other standards activities to provision the necessary quality of
service.</p>

<h3 id="EMMA">5.3 EMMA</h3>

<p>The following requirements apply to the objectives for the
specification work on EMMA as defined in the <a
href="#Appendixb">glossary</a>. EMMA is intended to support the
necessary exchanges of information between the multimodal modules
mentioned in <a href="#Configurations">section 5.1</a>.</p>

<p class="requirement"><strong><a id="MMI-E1"
name="MMI-E1">(MMI-E1)</a>:</strong> The multimodal specifications
MUST support the generation, representation and exchange of input
events and results of input or output processing (MUST
specify).</p>

<p class="requirement"><strong><a id="MMI-E2"
name="MMI-E2">(MMI-E2)</a>:</strong> The multimodal specification
MUST support the generation, representation and exchange of
interpretations and combinations of input events and results of
input or output processing (MUST specify).</p>
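
<p>As a purely hypothetical sketch (EMMA is still being specified;
the element and attribute names below are invented for
illustration), an interpretation of the spoken input "flights from
Boston to Denver" might be represented and exchanged as:</p>

<pre class="example">
&lt;!-- Hypothetical EMMA-style annotation of a speech input --&gt;
&lt;interpretation mode="speech" confidence="0.85"&gt;
  &lt;origin&gt;Boston&lt;/origin&gt;
  &lt;destination&gt;Denver&lt;/destination&gt;
&lt;/interpretation&gt;
</pre>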

<h3 id="Multimodalsynchronizationexchanges">5.4 Multimodal
synchronization exchanges</h3>

<p class="requirement"><strong><a id="MMI-S1"
name="MMI-S1">(MMI-S1)</a>:</strong> The multimodal specifications
MUST enable the author to specify the generation of asynchronous
events and their handlers (MUST specify).</p>

<p class="requirement"><span class="requirement"><strong><a
id="MMI-S2" name="MMI-S2">(MMI-S2)</a>:</strong> The multimodal
specifications MUST enable the author to specify the generation of
synchronous events and their handlers (MUST specify).</span></p>

<p class="requirement"><span class="requirement"><strong><a
id="MMI-S3" name="MMI-S3">(MMI-S3)</a>:</strong> The multimodal
specifications MUST support event handlers local to the event
generator (MUST specify).</span></p>

<p class="requirement"><span class="requirement"><strong><a
id="MMI-S4" name="MMI-S4">(MMI-S4)</a>:</strong> The multimodal
specifications MUST support event handlers remote to the event
generator (MUST specify).</span></p>

<p class="requirement"><span class="requirement"><strong><a
id="MMI-S5" name="MMI-S5">(MMI-S5)</a>:</strong> The multimodal
specifications MUST support the exchange of EMMA fragments as part
of the synchronization events content (MUST specify).</span></p>

<p class="requirement"><span class="requirement"><strong><a
id="MMI-S6" name="MMI-S6">(MMI-S6)</a>:</strong> The multimodal
specifications MUST support the specification of event handlers for
externally generated events (MUST specify).</span></p>

<p class="requirement"><span class="requirement"><strong><a
id="MMI-S7" name="MMI-S7">(MMI-S7)</a>:</strong> The multimodal
specifications MUST support the specification of event handlers for
externally generated events that result from the interaction of the
user (MUST specify).</span></p>

<p class="requirement"><span class="requirement"><strong><a
id="MMI-S8" name="MMI-S8">(MMI-S8)</a>:</strong> The multimodal
specifications MUST support handlers that manipulate or update the
presentation associated with a particular modality (MUST
specify).</span></p>

<p>In distributed configurations, it is important that
synchronization exchanges take place with minimum delays. In
practical deployments this implies that the highest available
quality of service should be allocated to such exchanges.</p>

<p class="requirement"><span class="requirement"><strong><a
id="MMI-S9" name="MMI-S9">(MMI-S9)</a>:</strong> The multimodal
specifications MUST enable the identification of multimodal
synchronization exchanges (MUST specify).</span></p>

<p>This would enable the underlying network, if it is aware of such
needs, to allocate the highest quality of service to
synchronization exchanges. This network behavior is beyond the
scope of the multimodal specifications.</p>

<p class="requirement"><span class="requirement"><strong><a
id="MMI-S10" name="MMI-S10">(MMI-S10)</a>:</strong> The multimodal
specifications MUST support confirmation of event handling (MUST
specify).</span></p>

<p class="requirement"><span class="requirement"><strong><a
id="MMI-S11" name="MMI-S11">(MMI-S11)</a>:</strong> The multimodal
specifications MUST support event generation or event handling
pending confirmation of a particular event handling (MUST
specify).</span></p>

<p class="requirement"><span class="requirement"><strong><a
id="MMI-S12" name="MMI-S12">(MMI-S12a)</a>:</strong> The multimodal
specifications MUST be compatible with existing standards including
<a href="#DOM">DOM</a> events and <a href="#DOM">DOM</a>
specifications (MUST specify).</span></p>

<p class="requirement"><span class="requirement"><strong><a
id="MMI-S12b" name="MMI-S12b">(MMI-S12b)</a>:</strong> The
multimodal specifications SHOULD be compatible with existing
standards including <a href="#XMLEvent">XML events</a>
specifications (SHOULD specify).</span></p>

<p class="requirement"><span class="requirement"><strong><a
id="MMI-S13" name="MMI-S13">(MMI-S13)</a>:</strong> The multimodal
specification MUST allow lightweight multimodal synchronization
exchanges compatible with wireless networks and mobile terminals
(MUST specify).</span></p>

<p>This last requirement is derived from <a
href="#MMI-R1">MMI-R1</a> and <a href="#MMI-R2">MMI-R2</a>.</p>

<h2 id="References">6. References</h2>

<p><a id="CCPP" name="CCPP"><strong>[CC/PP]:</strong></a> W3C CC/PP
Working Group, URI: <a
href="http://www.w3c.org/Mobile/CCPP/">http://www.w3c.org/Mobile/CCPP/</a>.</p>

<p><a id="DIactivity" name="DIactivity"><strong>[DI
activity]:</strong></a> W3C Device Independence Activity, URI: <a
href="http://www.w3c.org/2001/di/">http://www.w3c.org/2001/di/</a>.</p>

<p><a id="MMIcharter" name="MMIcharter"><strong>[MMI
charter]:</strong></a> W3C Multimodal Interaction Working Group
Charter, URI: <a
href="http://www.w3c.org/2002/01/multimodal-charter.html">http://www.w3c.org/2002/01/multimodal-charter.html</a>.</p>

<p><a id="MMIWG" name="MMIWG"><strong>[MMI WG]:</strong></a> W3C
Multimodal Interaction Working Group, URI: <a
href="http://www.w3c.org/2002/mmi/">http://www.w3c.org/2002/mmi/</a>.</p>

<p><a id="MMReqVoice" name="MMReqVoice"><strong>[MM Req
Voice]:</strong></a> Multimodal Requirements for Voice Markup
Languages, W3C Working Draft, URI: <a
href="http://www.w3c.org/TR/multimodal-reqs">http://www.w3c.org/TR/multimodal-reqs</a>.</p>

<h2 id="Acknowledgments">7. Acknowledgements</h2>

<p><span style="font-style: italic; font-family: times">This
section is informative.</span></p>

<p>This document was jointly prepared by the members of the W3C
Multimodal Interaction Working Group.</p>

<p>Special acknowledgments to Jim Larson (Intel) and Emily Candell
(Comverse) for their significant editorial contributions.</p>

<h2 id="Appendices">Appendices</h2>

<h3 id="Appendixa">Appendix A: Use cases</h3>

<h4 id="Overviewoftheusecases">A.1 Overview of the use cases</h4>

<p>Analysis of use cases provides insight into the requirements for
applications likely to require a multimodal infrastructure.</p>

<p>The use cases described below were selected for analysis in
order to highlight different requirements resulting from
application variations in areas such as device requirements, event
handling, network dependencies and methods of user interaction.</p>

<p><strong>Use Case Device Classification</strong></p>

<h5 id="ThinClient">Thin Client</h5>

<p>A device with little processing power and capabilities that can
be used to capture user input (microphone, touch display, stylus,
etc.) as well as non-user input such as GPS. The device may have a
very limited capability to interpret the input, for example a small
vocabulary speech recognizer or a character recognizer. The bulk of
the processing occurs on the server, including natural language
processing and interaction management.</p>

<p>An example of such a device may be a mobile phone with DSR
capabilities and a visual browser (there could actually be thinner
clients than this).</p>

<h5 id="ThickClient">Thick Client</h5>

<p>A device with powerful processing capabilities, such that most
of the processing can occur locally. Such a device is capable of
input capture and interpretation. For example, the device can have
a medium vocabulary speech recognizer, a handwriting recognizer,
natural language processing and interaction management
capabilities. The data itself may still be stored on the
server.</p>

<p>An example of such a device may be a recent production PDA or an
in-car system.</p>

<h5 id="MediumClient">Medium Client</h5>

<p>A device capable of input capture and some degree of
interpretation. The processing is distributed in a client/server or
a multi-device architecture. For example, a medium client will have
the voice recognition capabilities to handle small vocabulary
command and control tasks but connects to a voice server for more
advanced dialog tasks.</p>

<p><strong>Use Case Summaries</strong></p>

<p><strong>Form Filling for air travel reservation</strong></p>

<table border="1" summary="4 column table">
<tbody>
<tr>
<td><b>Description</b></td>
<td><b>Device Classification</b></td>
<td><b>Device Details</b></td>
<td><b>Execution Model</b></td>
</tr>

<tr>
<td>The means for a user to reserve a flight using a wireless
personal mobile device and a combination of input and output
modalities. The dialog between the user and the application is
directed through the use of a form-filling paradigm.</td>
<td>Thin and medium clients</td>
<td>Touch-enabled display (i.e., supports pen input), voice input,
local ASR and Distributed Speech Recognition Framework, local
handwriting recognition, voice output, TTS, GPS, wireless
connectivity, roaming between various networks.</td>
<td>Client Side Execution</td>
</tr>
</tbody>
</table>

<h5 id="ReservationScenario">Scenario Details</h5>

<p>The user wants to make a flight reservation with his mobile
device while he is on the way to work. The user initiates the
service by making a phone call to a multimodal service (telephone
metaphor) or by selecting an application (portal environment
metaphor). The details are not described here.</p>

<p>As the user moves between networks with very different
characteristics, the user is offered the flexibility to interact
using the preferred and most appropriate modes for the situation.
For example, while sitting in a train, the use of stylus and
handwriting can achieve higher accuracy than speech (due to
surrounding noise) and protect privacy. When the user is walking,
the more appropriate input and output modalities would be voice
with some visual output. Finally, at the office the user can use
pen and voice in a synergistic way.</p>

<p>The dialog between the user and the application is driven by a
form-filling paradigm where the user provides input to fields such
as "Travel Origin:", "Travel Destination:", "Leaving on date",
"Returning on date". As the user selects each field in the
application to enter information, the corresponding input
constraints are activated to drive the recognition and
interpretation of the user input. The capability of providing
composite multimodal input is also examined, where input from
multiple modalities is combined for the interpretation of the
user's intent.</p>

<p><strong>Driving Directions</strong></p>

<table border="1" summary="4 column table">
<tbody>
<tr>
<td><b>Description</b></td>
<td><b>Device Classification</b></td>
<td><b>Device Details</b></td>
<td><b>Execution Model</b></td>
</tr>

<tr>
<td>This application provides a mechanism for a user to request and
receive driving directions via speech and graphical input and
output.</td>
<td>Medium Client</td>
<td>On-board system (in a car) with a graphical display, map
database, touch screen, voice and touch input, speech output, local
ASR and TTS processing and GPS.</td>
<td>Client Side Execution</td>
</tr>
</tbody>
</table>

<h5 id="DrivingScenario">Scenario Details</h5>

<p>The user wants to go to a specific address from his current
location and while driving wants to take a detour to a local
restaurant (the user knows neither the restaurant's address nor its
name). The user initiates the service via a button on his steering
wheel and interacts with the system via the touch screen and
speech.</p>

<p><strong>Name Dialing</strong></p>

<table border="1" summary="4 column table">
<tbody>
<tr>
<td><b>Description</b></td>
<td><b>Device Classification</b></td>
<td><b>Device Details</b></td>
<td><b>Execution Model</b></td>
</tr>

<tr>
<td>The means for users to call someone by saying their name.</td>
<td>Thin and thick devices</td>
<td>Telephone</td>
<td>The study covers several possibilities:
<ul>
<li>whether the application runs in the device or the server</li>

<li>whether the device supports limited local speech
recognition</li>
</ul>

<p>These choices determine the kinds of events that are needed to
coordinate the device and network based services.</p>
</td>
</tr>
</tbody>
</table>

<h5 id="DialingScenario">Scenario Details</h5>

<p>Janet presses a button on her multimodal phone and says one of
the following commands:</p>

<ul>
<li>Call Wendy</li>

<li>Call Wendy on her cell phone</li>

<li>Call Wendy at work</li>

<li>Call Wendy Smith at Acme Research</li>
</ul>

<p>The application initially looks for a match in Janet's personal
contact list and, if no match is found, then proceeds to look in
other directories. Directed dialog and tapered help are used to
narrow down the search, using aural and visual prompts. Janet is
able to respond by pressing buttons, by tapping with a stylus, or
by using her voice.</p>

<p>Once a selection has been made, rules defined by Wendy are used
to determine how the call should be handled. Janet may see a
picture of Wendy along with a personalized message (aural and
visual) that Wendy has left for her. Call handling may depend on
the time of day, the location and status of both parties, and the
relationship between them. An "ex" might be told to never call
again, while Janet might be told that Wendy will be free in half an
hour after Wendy's meeting has finished. The call may be
automatically directed to Wendy's home, office or mobile phone, or
Janet may be invited to leave a message.</p>

<h4 id="appendixa2analysis">A.2 Event analysis</h4>

<p>The use-case analysis exercise helped to identify the types of
events a multimodal system would likely need to support.</p>

<p>Based on the use case analysis, the following event
classifications were defined:</p>

<ul>
<li>Asynchronous vs. synchronous</li>

<li>Local vs. remote generation</li>

<li>Local vs. remote handling</li>

<li>Input interpretation events</li>

<li>Externally generated events vs. events generated as a result of
user action</li>

<li>Actions vs. notifications</li>
</ul>

<p>The events from the use cases described above have been
consolidated in the following table.</p>

<p><strong>Event Table:</strong></p>

<table border="1" summary="8 column table" cellspacing="0"
class="smaller">
<tbody>
<tr>
<td></td>
<td><b>Event Type</b></td>
<td><b>Asynchronous vs. Synchronous</b></td>
<td><b>Local vs. remote generation</b></td>
<td><b>Local vs. remote handling</b></td>
<td><b>Input interpretation</b></td>
<td><b>External vs. User</b></td>
<td><b>Notifications vs. actions</b></td>
<td><b>Comments</b></td>
</tr>

<tr>
<td>1.</td>
<td>Data Reply Event</td>
<td>Synchronous</td>
<td>Remote</td>
<td>Local</td>
<td>No</td>
<td>External</td>
<td>Notification</td>
<td>Event containing results from a previous data request</td>
</tr>

<tr>
<td>2.</td>
<td>HTTP Request</td>
<td>Asynchronous</td>
<td>Local</td>
<td>Remote</td>
<td>No</td>
<td>External</td>
<td>N/A</td>
<td>A request sent via the HTTP Protocol</td>
</tr>

<tr>
<td>3.</td>
<td>GPS_DATA_in</td>
<td>Synchronous</td>
<td>Remote</td>
<td>Local</td>
<td>No</td>
<td>External</td>
<td>Notification</td>
<td>Event containing GPS Location Data</td>
</tr>

<tr>
<td>4.</td>
<td>Touch Screen Event</td>
<td>Asynchronous</td>
<td>Local</td>
<td>Local</td>
<td>Yes</td>
<td>User</td>
<td>Action</td>
<td>Event that contains coordinates corresponding to a location on
a touch screen</td>
</tr>

<tr>
<td>5.</td>
<td>Start_Listening Event</td>
<td>Asynchronous</td>
<td>Local / Remote</td>
<td>Local / Remote</td>
<td>No</td>
<td>User</td>
<td>Action</td>
<td>Event to invoke the speech recognizer</td>
</tr>

<tr>
<td>6.</td>
<td>Return Reco Results</td>
<td>Synchronous</td>
<td>Local / Remote</td>
<td>Local</td>
<td>Yes</td>
<td>External</td>
<td>Notification</td>
<td>Event containing the results of a recognition</td>
</tr>

<tr>
<td>7.</td>
<td>Alert</td>
<td>Asynchronous</td>
<td>Remote</td>
<td>Local</td>
<td>No</td>
<td>External</td>
<td>Notification</td>
<td>Event containing unsolicited data which may be of use to an
application</td>
</tr>

<tr>
<td>8.</td>
<td>Register User Ack</td>
<td>Synchronous</td>
<td>Remote</td>
<td>Local</td>
<td>No</td>
<td>External</td>
<td>Notification</td>
<td>Event acknowledging that the user has registered with the
service</td>
</tr>

<tr>
<td>9.</td>
<td>Call</td>
<td>Asynchronous</td>
<td>Local</td>
<td>Remote</td>
<td>No</td>
<td>User</td>
<td>Action</td>
<td>Request to place an outgoing call</td>
</tr>

<tr>
<td>10.</td>
<td>Call Ack</td>
<td>Synchronous</td>
<td>Remote</td>
<td>Local</td>
<td>No</td>
<td>External</td>
<td>Notification</td>
<td>Event acknowledging request to place an outgoing call</td>
</tr>

<tr>
<td>11.</td>
<td>Leave Message</td>
<td>Asynchronous</td>
<td>Local</td>
<td>Remote</td>
<td>No</td>
<td>User</td>
<td>Action</td>
<td>Request to leave a message</td>
</tr>

<tr>
<td>12.</td>
<td>Message Ack</td>
<td>Synchronous</td>
<td>Remote</td>
<td>Local</td>
<td>No</td>
<td>External</td>
<td>Notification</td>
<td>Event acknowledging request to leave a message</td>
</tr>

<tr>
<td>13.</td>
<td>Send Mail</td>
<td>Asynchronous</td>
<td>Local</td>
<td>Remote</td>
<td>No</td>
<td>User</td>
<td>Action</td>
<td>Request to send a message</td>
</tr>

<tr>
<td>14.</td>
<td>Mail Ack</td>
<td>Synchronous</td>
<td>Remote</td>
<td>Local</td>
<td>No</td>
<td>External</td>
<td>Notification</td>
<td>Event acknowledging request to send a message</td>
</tr>

<tr>
<td>15.</td>
<td>Register_Device_Profile (delivery_context)</td>
<td>Synchronous</td>
<td>Local</td>
<td>Remote</td>
<td>No</td>
<td>External</td>
<td>Notification</td>
<td>Occurs on connection</td>
</tr>

<tr>
<td>16.</td>
<td>Update_Device_Profile (delivery_context)</td>
<td>Asynchronous / Synchronous</td>
<td>Local</td>
<td>Remote</td>
<td>No</td>
<td>External / User</td>
<td>Notification</td>
<td>The user selects a new set of modalities by pressing a button
or making menu selections (synchronous event). If the device can
detect changes in the network or location via GPS or beacons, then
the event is asynchronous.</td>
</tr>

<tr>
<td>17.</td>
<td>On_Focus (field_name)</td>
<td>Synchronous</td>
<td>Local</td>
<td>Remote</td>
<td>No</td>
<td>User</td>
<td>Action</td>
<td>Event sends the selected field to the multimodal
synchronization server for the purpose of loading the appropriate
input constraints for the field.</td>
</tr>

<tr>
<td>18.</td>
<td>Handwriting_Reco ()</td>
<td>Synchronous</td>
<td>Local</td>
<td>Local</td>
<td>Yes</td>
<td>User</td>
<td>Action</td>
<td>Event to invoke the handwriting recognizer (HWR) after pen
input in a field. In the current scenario, we consider that HWR is
handled locally, but this may be expanded later to include remote
processing.</td>
</tr>

<tr>
<td>19.</td>
<td>Submit_Partial_Result ()</td>
<td>Synchronous</td>
<td>Local</td>
<td>Remote</td>
<td>No</td>
<td>External</td>
<td>Notification</td>
<td>Result of recognition of field input is sent to the server</td>
</tr>

<tr>
<td>20.</td>
<td>Send_Ink (ink_data, time_stamp)</td>
<td>Synchronous</td>
<td>Local</td>
<td>Remote</td>
<td>Yes</td>
<td>User</td>
<td>Action</td>
<td>Ink collected for a pen gesture is sent to the multimodal
server for integration. As before, this event associates time
stamp information with the ink data for synchronization. The
result of the pen gesture can be transmitted as a sequence of
(x,y) coordinates relative to the device display.</td>
</tr>

<tr>
<td>21.</td>
<td>Collect_Pen_Input ()</td>
<td>Synchronous</td>
<td>Local</td>
<td>Local</td>
<td>Yes</td>
<td>User</td>
<td>Action</td>
<td>Ink collection could be interpreted first locally into basic
shapes (i.e., circles, lines) and have those transmitted to the
server.</td>
</tr>

<tr>
<td>22.</td>
<td>Send_Gesture (gesture_data, time_stamp)</td>
<td>Synchronous</td>
<td>Local</td>
<td>Remote</td>
<td>Yes</td>
<td>User</td>
<td>Action</td>
<td>The server can provide a deeper semantic interpretation than
the basic shapes that are recognized on the client</td>
</tr>
</tbody>
</table>
|
|
|
|
<h3 id="Appendixb">Appendix B: Glossary</h3>
|
|
|
|
<p><a id="audiovisualspeech"
|
|
name="audiovisualspeech"><strong>audio-visual
|
|
speech</strong></a></p>
|
|
|
|
<p>Combination of video and audio to process input (joint
|
|
face/lips/movement recognition and speech recognition) and generate
|
|
output (audio-visual media)</p>
|
|
|
|
<p class="Section1"><a id="complementarymm"
|
|
name="complementarymm"><strong>complementary use of
|
|
modalities</strong></a></p>
|
|
|
|
<p>A use of modalities where the interactions available to the user
|
|
differ per modality.</p>
|
|
|
|
<p class="Section1"><a id="compositeinput"
|
|
name="compositeinput"><strong>composite inputs</strong></a></p>
|
|
|
|
<p class="Section1">Composite input is input received on multiple
|
|
modalities at the same time and treated as a single, integrated
|
|
compound input by downstream processes.</p>
|
|
|
|
<p class="Section1"><a id="configuration"
|
|
name="configuration"><strong>configuration</strong></a></p>
|
|
|
|
<p class="Section1">See <a href="#executionmodel">execution
|
|
model.</a></p>
|
|
|
|
<p class="Section1"><a id="conflictinginput"
|
|
name="conflictinginput"><strong>conflicting inputs</strong></a></p>
|
|
|
|
<p class="Section1">Contradictory inputs provided by the user in
|
|
different modalities or on different devices. For examples, they
|
|
may indicate different exclusive selection.</p>
|
|
|
|
<p class="Section1"><a id="sessioncontext"
|
|
name="sessioncontext"><strong>context</strong></a></p>
|
|
|
|
<p class="Section1">A session context consists of the history of
|
|
the interaction between the user and the multimodal system,
|
|
including the input received from the user, the output presented to
|
|
the user, the current data model and the sequence of data model
|
|
changes.</p>
|
|
|
|
<p class="Section1"><a id="coordinationcapability"
|
|
name="coordinationcapability"><strong>coordination
|
|
capability</strong></a></p>
|
|
|
|
<p class="Section1">Capability of a multimodal system to combine
|
|
multimodal inputs into composite inputs based on an interpretation
|
|
algorithm that decides what makes sense to combine based on the
|
|
context</p>
|
|
|
|
<p class="Section1"><strong>CC/PP [ Composite Capability/Preference
|
|
Profiles],</strong></p>
|
|
|
|
<p class="Section1">A W3C working group which is developing an
|
|
RDF-based framework for the management of device profile
|
|
information. For more details about the group activity please visit
|
|
<a
|
|
href="http://www.w3.org/Mobile/CCPP/">http://www.w3.org/Mobile/CCPP/</a></p>
|
|
|
|
<p class="Section1"><strong>concatenation</strong></p>
|
|
|
|
<p class="Section1">The text-to-speech engine concatenates short
|
|
digital-audio segments and performs intersegment smoothing to
|
|
produce a continuous sound.</p>
|
|
|
|
<p><strong>CSS</strong></p>
|
|
|
|
<p class="Section1">Cascading Stylesheets</p>
|
|
|
|
<p class="Section1"><strong>data file</strong></p>
|
|
|
|
<p>Argument files to input or output processing algorithms</p>
|
|
|
|
<p class="Section1"><a id="defaultsynchronization"
|
|
name="defaultsynchronization"><strong>default
|
|
synchronization</strong></a></p>
|
|
|
|
<p class="Section1">Synchronization behavior supported by default
|
|
by a multimodal application.</p>
|
|
|
|
<p class="Section1"><a id="deliverycontext"
|
|
name="deliverycontext"><strong>delivery context</strong></a></p>
|
|
|
|
<p class="Section1">A set of attributes that characterizes the
|
|
capabilities of the access mechanism in terms of device profile,
|
|
user profile (e.g. identify, preferences and usage patterns) and
|
|
situation. Delivery context may have static and dynamic
|
|
components.</p>
|
|
|
|
<p class="Section1"><a id="device"
|
|
name="device"><strong>device</strong></a></p>
|
|
|
|
<p class="Section1">A piece of hardware used to access and interact
|
|
with an application.</p>
|
|
|
|
<p class="Section1"><a id="deviceprofile"
|
|
name="deviceprofile"><strong>device profile</strong></a></p>
|
|
|
|
<p class="Section1">A particular subset of the delivery context
|
|
that describes the device characteristics including for example
|
|
device form factor, available modalities, level of synchronization
|
|
and coordination.</p>
|
|
|
|
<p class="Section1"><strong>DI [Device Independence]</strong></p>
|
|
|
|
<p class="Section1">The W3C Device Independence Activity is working
|
|
to ensure seamless Web access with all kinds of devices, and
|
|
worldwide standards for the benefit of Web users and content
|
|
providers alike. For more details pleases refer to <a
|
|
href="http://www.w3.org/2001/di/">http://www.w3.org/2001/di/</a></p>
|
|
|
|
<p class="Section1"><a id="digitalink"
|
|
name="digitalink"><strong>digital ink</strong></a></p>
|
|
|
|
<p class="Section1">Stored or recognized handwriting input.</p>
|
|
|
|
<p class="Section1"><a id="directeddialog"
|
|
name="directeddialog"><strong>directed dialog</strong></a></p>
|
|
|
|
<p>A dialog in which one party (the user or the computer) follows a
|
|
pre-selected path, independent of the responses of the other. (cfr.
|
|
<a href="#mixedinitiative">mixed initiative</a> dialog).</p>
|
|
|
|
<p class="Section1"><a id="distributedcomponents"
|
|
name="distributedcomponents"><strong>distributed
|
|
components</strong></a></p>
|
|
|
|
<p class="Section1">System components may live at various points of
|
|
the network, including the local client.</p>
|
|
|
|
<p class="Section1"><a id="DOM" name="DOM"><strong>DOM [Document
|
|
Object Model]</strong></a></p>
|
|
|
|
<p class="Section1">A standard interface to the contents of a web
|
|
page. Please visit <a
|
|
href="http://www.w3.org/DOM/">http://www.w3.org/DOM/</a> for more
|
|
details.</p>
|
|
|
|
<p class="Section1"><strong>EMMA</strong></p>
|
|
|
|
<p class="Section1">Extensible MultiModal Annotation Markup
|
|
Language. Formerly known as NLSML—Natural Language
|
|
Semantics Markup Language. This markup language is intended for use
|
|
by systems to represent semantic interpretations for a variety of
|
|
inputs, including but not necessarily limited to, speech and
|
|
natural language text input</p>
|
|
|
|
<p class="Section1"><a id="event"
|
|
name="event"><strong>event</strong></a></p>
|
|
|
|
<p class="Section1">An event is a representation of some
|
|
asynchronous occurrence of interest to the multimodal system.
|
|
Examples include mouse clicks, hanging up the phone, speech
|
|
recognition errors. Events may be associated with data e.g. the
|
|
location the mouse was clicked.</p>
|
|
|
|
<p class="Section1"><a id="eventhandler"
|
|
name="eventhandler"><strong>event handler</strong></a></p>
|
|
|
|
<p class="Section1">A software object intended to interpret and
|
|
respond to a given class of events.</p>
|
|
|
|
<p class="Section1"><a id="eventsource"
|
|
name="eventsource"><strong>event source</strong></a></p>
|
|
|
|
<p class="Section1">An agent (human or software) capable of
|
|
generating events.</p>
|
|
|
|
<p class="Section1"><a id="executionmodel"
|
|
name="executionmodel"><strong>execution model</strong></a></p>
|
|
|
|
<p class="Section1">Runtime configuration of the various system
|
|
components in a particular manifestation of a multimodal
|
|
system.</p>
|
|
|
|
<p class="Section1"><a id="externalevent"
|
|
name="externalevent"><strong>external event</strong></a></p>
|
|
|
|
<p class="Section1">External input events are events that are not
|
|
originating from direct user input. External output events are
|
|
events that originate in the multimodal system and are handled by
|
|
other processes.</p>
|
|
|
|
<p class="Section1"><a id="GPS" name="GPS"><strong>GPS [Global
|
|
Positioning System]</strong></a></p>
|
|
|
|
<p class="Section1">A worldwide radio-navigation system formed from
|
|
a constellation of 24 satellites and their ground stations. GPS
|
|
uses these "man-made stars" as reference points to calculate
|
|
positions accurate to a matter of meters.</p>
|
|
|
|
<p class="Section1"><strong>grammar</strong></p>
|
|
|
|
<p class="Section1">A computational mechanism that defines a finite
|
|
or infinite set of legal strings, usually with some structure.</p>
|
|
|
|
<p class="Section1"><a id="handwriting"
|
|
name="handwriting"><strong>handwriting</strong></a></p>
|
|
|
|
<p class="Section1">use of the pen for input which is converted
|
|
into text or symbols. Involves handwriting recognition.</p>
|
|
|
|
<p class="Section1"><a id="history"
|
|
name="history"><strong>history</strong></a></p>
|
|
|
|
<p class="Section1">Portions of profile and session context
|
|
persisted for a same user across sessions.</p>
|
|
|
|
<p class="Section1"><strong>HTML [HyperText Markup
|
|
Language]</strong></p>
|
|
|
|
<p class="Section1">A simple markup language used to create
|
|
hypertext documents that are portable from one platform to another.
|
|
To find more information about specification of HTML and the
|
|
working group acitivity please visit <a
|
|
href="http://www.w3c.org/MarkUp/">http://www.w3c.org/MarkUp/</a></p>
|
|
|
|
<p class="Section1"><strong>HTTP [Hypertext Transfer
|
|
Protocol]</strong></p>
|
|
|
|
<p class="Section1">To get details about the HTTP working group and
|
|
the HTTP specification please visit <a
|
|
href="http://www.w3c.org/Protocols/">http://www.w3c.org/Protocols/</a>.</p>
|
|
|
|
<p class="Section1"><strong><a id="humanlanguage"
|
|
name="humanlanguage">human language</a></strong></p>
|
|
|
|
<p class="Section1">Any spoken language (e.g. French, Japanese,
|
|
English etc...).</p>
|
|
|
|
<p class="Section1"><strong>ink</strong></p>
|
|
|
|
<p class="Section1">See digital ink.</p>
|
|
|
|
<p class="Section1"><a id="input"
|
|
name="input"><strong>input</strong></a></p>
|
|
|
|
<p class="Section1">Event, set of events or macro-event generated
|
|
by a user interaction in a particular modality on a particular
|
|
device.<!--StartFragment-->
|
|
</p>
|
|
|
|
<p><a id="inputconstraints" name="inputconstraints"><strong>input
|
|
constraints</strong></a></p>
|
|
|
|
<p>Specify how inputs are can be combined via rules or interaction
|
|
management strategies. For example the markup language may
|
|
coordinates grammars for modalities other than speech with speech
|
|
grammars to avoid duplication of effort in authoring multimodal
|
|
grammars.</p>
|
|
|
|
<p class="Section1"><a id="inputprocessing"
|
|
name="inputprocessing"><strong>input processing</strong></a></p>
|
|
|
|
<p class="Section1">Algorithm to apply to a particular input in
|
|
order to transform or extract information from it (e.g. filtering,
|
|
speech recognition; spaker recognition, NL parsing,...). The
|
|
algorithm may rely on data files as argument (e.g. grammar,
|
|
acoustic model, NL models, ...)</p>
|
|
|
|
<p class="Section1"><a id="dialogmanager"
|
|
name="dialogmanager"><strong>interaction manager</strong></a></p>
|
|
|
|
<p class="Section1">An interaction manager generates or updates the
|
|
presentation by processing user inputs, session context and
|
|
possibly other external knowledge sources to determine the intent
|
|
of the user. An interaction manager relies on strategies to
|
|
determine focus and intent as well as to disambiguate, correct and
|
|
confirm sub-dialogs. We typically distinguish <a
|
|
href="#directeddialog">directed dialogs</a> (e.g. user-driven or
|
|
application-driven) and <a href="#mixedinitiative">mixed
|
|
initiative</a> or free flow dialogs.</p>
|
|
|
|
<p class="Section1"><a id="lipsynch"
|
|
name="lipsynch"><strong>lipsynch</strong></a></p>
|
|
|
|
<p class="Section1">Output media where at least a face has lip
|
|
movements synchronized with an output audio speech</p>
<p class="Section1"><strong>markup components</strong></p>

<p class="Section1">XML vocabularies that provide markup-level
access to various system components.</p>

<p class="Section1"><a id="mediasynch"
name="mediasynch"><strong>media synchronization</strong></a></p>

<p class="Section1">Synchronization between output media as
specified by SMIL: <a
href="http://www.w3.org/AudioVideo/">http://www.w3.org/AudioVideo/</a></p>

<p class="Section1"><strong>medium</strong></p>

<p class="Section1">A description that can be rendered into
physical effects that can be perceived and interacted with by the
user, in one or multiple modalities and on one or multiple
devices.</p>

<p class="Section1"><strong>MIDI</strong></p>

<p class="Section1">Musical Instrument Digital Interface, an audio
format.</p>
<p class="Section1"><a id="mixedinitiative"
name="mixedinitiative"><strong>mixed initiative
dialog</strong></a></p>

<p>A style of dialog where both parties (the computer and the user)
can control what is talked about and when. A party may on its own
change the course of the interaction (e.g., by asking questions,
providing more or less information than what was requested or
making digressions). Mixed initiative dialog is contrasted with
directed dialog, where only one party controls the conversation.
(cf. directed dialog)</p>

<p class="Section1"><strong>MMI: [Multimodal
Interaction]</strong></p>

<p class="Section1">A W3C Working Group which is developing markup
specifications that extend the Web user interface to allow
multiple modes of interaction. For more details of the MMI working
group and MMI activity, please visit <a
href="http://www.w3c.org/2002/mmi/">http://www.w3c.org/2002/mmi/</a></p>
<p class="Section1"><a id="modality"
name="modality"><strong>modality</strong></a></p>

<p>The type of communication channel used for interaction. It also
covers the way an idea is expressed or perceived, or the manner in
which an action is performed.</p>

<p class="Section1"><a id="modalityswitch"
name="modalityswitch"><strong>modality switch</strong></a></p>

<p class="Section1">Change of modality to perform a particular
interaction. It can be decided by the user or imposed by the
application or runtime (e.g. when a phone call drops).</p>

<p class="Section1"><strong>MPEG</strong></p>

<p class="Section1">Working group established under the joint
direction of the International Organization for Standardization and
the International Electrotechnical Commission (ISO/IEC), whose goal
is to create standards for digital video and audio compression.
More precisely, MPEG defines the syntax of audio and video formats
requiring low data rates, as well as the operations to be
undertaken by decoders.</p>
<p class="Section1"><strong>MP3 [MPEG Audio Layer-3]</strong></p>

<p class="Section1">An Internet music format. For MP3-related
technologies please refer to <a
href="http://www.mp3-tech.org/">http://www.mp3-tech.org/</a></p>

<p class="Section1"><a id="multimodalsystem"
name="multimodalsystem"><strong>multimodal system</strong></a></p>

<p class="Section1">A multimodal system supports communication with
the user through different modalities such as voice, gesture, and
typing. (cf. modality)</p>

<p class="Section1"><strong>must specify</strong></p>

<p class="Section1">A "must specify" requirement must be satisfied by
the multimodal specification(s), starting from their very first
version.</p>

<p class="Section1"><strong>natural language (NL)</strong></p>

<p class="Section1">Term used for human language, as opposed to
artificial languages (such as computer programming languages or
those based on mathematical logic). A processor capable of handling
NL must typically be able to deal with a flexible set of
sentences.</p>
<p class="Section1"><a id="NLG" name="NLG"><strong>natural language
generation (NLG)</strong></a></p>

<p class="Section1">A technique for generating natural language
sentences based on some higher-level information. Generation by
template is an example of a simple language generation technique.
"The flight from &lt;departure-city&gt; to &lt;arrival-city&gt;
leaves at &lt;departure-time&gt;" is an example of a template where
the slots indicated by &lt;...&gt; have to be filled
with the appropriate information by a higher-level process.</p>
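<p class="Section1">A minimal sketch of template-based generation,
using the flight template above. The element names ("template",
"slot") are purely illustrative and not part of any W3C
specification:</p>

<pre class="example">
&lt;!-- Hypothetical template markup; a higher-level process fills
     each slot before rendering the sentence --&gt;
&lt;template&gt;
  The flight from &lt;slot name="departure-city"/&gt;
  to &lt;slot name="arrival-city"/&gt;
  leaves at &lt;slot name="departure-time"/&gt;.
&lt;/template&gt;
</pre>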
<p class="Section1"><strong>natural language
processing</strong></p>

<p class="Section1">Natural language understanding, generation,
translation and other transformations on human language.</p>

<p class="Section1"><strong>natural language understanding
(NLU)</strong></p>

<p class="Section1">The process of interpreting natural language
phrases to specify their meaning, typically as a formula in formal
logic.</p>

<p class="Section1"><strong>nice to specify</strong></p>

<p class="Section1">A "nice to specify" requirement will be taken
into account when designing the specification. If a technical
solution is available, the specifications will try to satisfy the
requirement or support the feature, provided that it does not
excessively delay the work plan.</p>

<p class="Section1"><a id="notify"
name="notify"><strong>notify</strong></a></p>

<p class="Section1">The act of communicating an event (see
subscribe).</p>
<p class="Section1"><strong>override mechanism for
synchronization</strong></p>

<p class="Section1">Information that specifies how the
synchronization should behave when not following its default
behavior. (cf. default synchronization)</p>

<p class="Section1"><strong>output generation</strong></p>

<p class="Section1">Expressing information to be conveyed in a
user-friendly form, possibly using multiple output media
streams.</p>

<p class="Section1"><a id="outputprocessing"
name="outputprocessing"><strong>output processing</strong></a></p>

<p class="Section1">Algorithm to apply in order to transform or
generate an output (e.g. TTS, NLG).</p>

<p class="Section1"><strong>semantics</strong></p>

<p class="Section1">The meaning or interpretation of a word,
phrase, or sentence, as opposed to its syntactic form. In natural
language and dialog technology the term semantics is typically used
to indicate a representation of a phrase or a sentence whose
elements can be related to entities of the application (e.g.
departure airport and arrival time for a flight application), or to
dialog acts (e.g. request for help, repeat, etc.).</p>
<p class="Section1"><a id="semanticinterp"
name="semanticinterp"><strong>semantic
interpretation</strong></a></p>

<p class="Section1">The process of interpreting the semantic part
of a grammar. The result of the interpretation is a semantic
representation. This process is often referred to as semantic
tagging.</p>

<p class="Section1"><a id="semanticrep"
name="semanticrep"><strong>semantic representation</strong></a></p>

<p class="Section1">The semantic result of parsing a written
sentence or a spoken utterance. The semantic interpretation can be
expressed as attribute-value pairs or more complex structures. W3C
is working on the definition of a semantic representation
formalism.</p>
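<p class="Section1">As an illustration only, the utterance "I want
to fly from Paris to New York" might yield the following
attribute-value pairs; the element names are hypothetical and do
not anticipate the formalism under definition:</p>

<pre class="example">
&lt;!-- Hypothetical attribute-value representation of one utterance --&gt;
&lt;interpretation&gt;
  &lt;departure-city&gt;Paris&lt;/departure-city&gt;
  &lt;arrival-city&gt;New York&lt;/arrival-city&gt;
&lt;/interpretation&gt;
</pre>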
<p class="Section1"><a id="sequentialinput"
name="sequentialinput"><strong>sequential inputs</strong></a></p>

<p class="Section1">A sequential input is one received on a single
modality. The modality may change over time. (cf. <a
href="#simultaneousinput">simultaneous</a> or <a
href="#compositeinput">composite</a> input)</p>

<p class="Section1"><a id="sequentialmm"
name="sequentialmm"><strong>sequential
multimodality</strong></a></p>

<p class="Section1">A sequential multimodal application is one in
which the user may interact with the application in only one
modality at a time, <a href="#modalityswitch">switching</a> between
modalities as needed.</p>

<p class="Section1"><a id="session"
name="session"><strong>session</strong></a></p>

<p class="Section1">The time interval during which an application
and its context is associated with a user and persisted.
Within a session, users may suspend and resume interaction with an
application within the same modality or device, or switch modality
or device.</p>
<p class="Section1"><strong>session level synchronization
granularity</strong></p>

<p class="Section1">Multimodal application that supports suspend
and resume behavior across modalities.</p>

<p class="Section1"><strong>should specify</strong></p>

<p class="Section1">The specifications (multimodal markup language
and other) will aim at addressing and satisfying the requirement or
supporting the feature during the lifetime of the working group.
Early specifications will take this into account to allow easy and
interoperable updates.</p>
<p class="Section1"><a id="simultaneousinput"
name="simultaneousinput"><strong>simultaneous
inputs</strong></a></p>

<p class="Section1">Simultaneous inputs denote inputs that can come
from different modalities but are not combined into composite
inputs. Simultaneous multimodal inputs imply that the inputs from
several modalities are interpreted one after the other, in the
order in which they were received, instead of being combined before
interpretation.</p>

<p class="Section1"><strong><a id="situation"
name="situation">situation</a></strong></p>

<p class="Section1">External information that can affect the usage
or expected behavior of multimodal applications, including for
example ongoing activities (e.g. walking versus driving),
environment (e.g. noisy), privacy (e.g. alone versus in public),
etc.</p>

<p class="Section1"><a id="SMIL" name="SMIL"><strong>SMIL
[Synchronized Multimedia Integration Language]</strong></a></p>

<p class="Section1">A W3C Recommendation, SMIL 2.0 enables simple
authoring of interactive audiovisual applications. See <a
href="http://www.w3.org/TR/smil20/">http://www.w3.org/TR/smil20/</a>
for details.</p>
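<p class="Section1">A minimal SMIL 2.0 sketch, assuming media files
"welcome.wav" and "map.png" exist: the par container renders its
children in parallel, which is the kind of output media
synchronization referred to above.</p>

<pre class="example">
&lt;smil xmlns="http://www.w3.org/2001/SMIL20/Language"&gt;
  &lt;body&gt;
    &lt;par&gt;
      &lt;!-- both children are rendered at the same time --&gt;
      &lt;audio src="welcome.wav"/&gt;
      &lt;img src="map.png" dur="5s"/&gt;
    &lt;/par&gt;
  &lt;/body&gt;
&lt;/smil&gt;
</pre>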
<p class="Section1"><strong>speech recognition</strong></p>

<p class="Section1">The ability of a computer to understand the
spoken word for the purpose of receiving commands and data input
from the speaker.</p>

<p class="Section1"><a id="speechrecognitionengine"
name="speechrecognitionengine"><strong>speech-recognition
engine</strong></a></p>

<p class="Section1">A software/hardware component that performs
recognition from a digital-audio stream. Speech recognition engines
are supplied by vendors who specialize in the software.</p>

<p class="Section1"><a id="subscribe"
name="subscribe"><strong>subscribe</strong></a></p>

<p class="Section1">The act of informing an event source that you
want to be notified of some class of events.</p>
<p><a id="supplementarymm"
name="supplementarymm"><strong>supplementary use of
modalities</strong></a></p>

<p class="Section1">Describes multimodal applications in which
every interaction (input or output) can be carried through in each
modality as if it were the only available modality.</p>

<p class="Section1"><a id="suspendresume"
name="suspendresume"><strong>suspend and resume</strong></a></p>

<p class="Section1">Suspend and resume behavior: an application
suspended in one modality can be resumed in the same or another
modality.</p>

<p class="Section1"><a id="synchronizationbehavior"
name="synchronizationbehavior"><strong>synchronization
behavior</strong></a></p>

<p class="Section1">The way that an input in one modality is
reflected in the output in another modality/device, as well as the
way that it may be combined across modalities (<a
href="#coordinationcapability">coordination capability</a>).</p>
<p class="Section1"><a id="synchronizationlevel"
name="synchronizationlevel"><strong>synchronization granularity or
level</strong></a></p>

<ul>
<li><strong>Event-level synchronization</strong>: Inputs in one
modality are captured at the level of individual DOM events and
immediately reflected in the other modality, when it makes
sense.</li>

<li><strong>Field-level synchronization</strong>: Inputs in one
modality are reflected in the other after the user changes focus
(e.g. moves from input field to input field) or completes the
interaction with a field (e.g. completes a select in a menu), as
illustrated in the sketch after this list.</li>

<li><strong>Form-level synchronization</strong>: Inputs in one
modality are reflected in the other only after a particular point
in the presentation is reached (e.g. after a certain number of
fields have been completed in the form).</li>

<li><strong>Session-level synchronization</strong>: Inputs in one
modality are reflected in the other only after a switch from one
modality to another.</li>
</ul>
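<p class="Section1">A hypothetical authoring sketch for the
field-level case above; the "sync" attribute and the element names
are illustrative only and not defined by any W3C
specification:</p>

<pre class="example">
&lt;!-- Hypothetical markup: the voice and GUI views of this form are
     brought back in step each time a field is completed --&gt;
&lt;form id="payment" sync="field"&gt;
  &lt;field name="amount"/&gt;
  &lt;field name="date"/&gt;
&lt;/form&gt;
</pre>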
<p class="Section1"><a id="synthesis"
name="synthesis"><strong>synthesis</strong></a></p>

<p class="Section1">The text-to-speech engine synthesizes the
glottal pulse from human vocal cords and applies various filters to
simulate throat length, mouth cavity, lip shape, and tongue
position.</p>

<p class="Section1"><a id="TTS"
name="TTS"><strong>text-to-speech</strong></a></p>

<p class="Section1">Technologies for converting textual (ASCII)
information into synthetic speech output. Used in voice-processing
applications requiring production of broad, unrelated, and
unpredictable vocabularies, such as products in a catalog or names
and addresses. This technology is appropriate when system design
constraints prevent the more efficient use of speech concatenation
alone.</p>

<p class="Section1"><a id="timestamp" name="timestamp"><strong>time
stamping</strong></a></p>

<p class="Section1">Annotation of an event that characterizes the
relative (with respect to an agreed-upon reference) or absolute
time of occurrence of the event.</p>
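<p class="Section1">For illustration only, a time-stamped input
event might carry absolute start and end times as attributes; the
element and attribute names below are hypothetical:</p>

<pre class="example">
&lt;!-- Hypothetical annotation of a recognized speech input --&gt;
&lt;input modality="speech"
       start="2003-01-08T10:31:02.120Z"
       end="2003-01-08T10:31:03.450Z"&gt;flights to Boston&lt;/input&gt;
</pre>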
<p class="Section1"><strong>TTS</strong></p>

<p class="Section1">text-to-speech</p>

<p class="Section1"><strong>turn</strong></p>

<p class="Section1">Set of inputs collected from the user before
updating the output.</p>

<p class="Section1"><strong>URI</strong></p>

<p class="Section1">Uniform Resource Identifier - <a
href="http://www.w3.org/Addressing/">http://www.w3.org/Addressing/</a></p>

<p class="Section1"><a id="userprofile"
name="userprofile"><strong>user profile</strong></a></p>

<p class="Section1">A particular subset of the delivery context
that describes the user, including for example identity,
personal information, personal preferences and usage
preferences.</p>
<p class="Section1"><a id="XMLEvent" name="XMLEvent"><strong>XML
Events</strong></a></p>

<p class="Section1">An XML module that provides XML languages with
the ability to uniformly integrate event listeners and associated
event handlers with DOM Level 2 event interfaces. The result is an
interoperable way of associating behaviors with document-level
markup. For the XML Events specification please visit <a
href="http://www.w3.org/TR/2001/WD-xml-events-20011026/Overview.html#s_intro">
http://www.w3.org/TR/2001/WD-xml-events-20011026/Overview.html#s_intro</a></p>
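<p class="Section1">A minimal sketch based on the Working Draft
cited above: a declarative listener routes "click" events from an
observer element to a handler. The id values are illustrative:</p>

<pre class="example">
&lt;!-- attach the handler identified by #confirmHandler to clicks
     on the element with id "submitButton" --&gt;
&lt;listener xmlns="http://www.w3.org/2001/xml-events"
          event="click"
          observer="submitButton"
          handler="#confirmHandler"/&gt;
</pre>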
<p class="Section1"><strong>XSL</strong></p>

<p class="Section1">Extensible Stylesheet Language</p>

<p class="Section1"><strong>XSLT</strong></p>

<p class="Section1">Extensible Stylesheet Language
Transformations</p>
</body>
</html>