<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content=
"HTML Tidy for Linux/x86 (vers 1 September 2005), see www.w3.org" />
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>EMMA: Extensible MultiModal Annotation markup
language</title>
<style type="text/css">
/*<![CDATA[*/
span.term {
  color: rgb(0,0,192);
  font-style: italic
}
blockquote { margin-left: 4% }
.toc { list-style-type: none; marker-offset: 1em }
.tocline { list-style-type: none }
ul.toc a { text-decoration: none }
.fig { text-align: center }
pre { font-family: monospace }
pre.example {
  margin-left: 0;
  padding: 0.5em;
  width: 98%;
  font-family: monospace;
  white-space: pre;
  border: none;
  font-size: 95%;
  background-color: rgb(230,230,255);
}
.note { color: red }
.new { color: green;}
.old { text-decoration: line-through }
.newer { text-decoration: underline }
.change { color: red }
.changeTable { color: orange }
.remove { text-decoration: line-through }
div.issues {
  border-width: thin;
  border-style: solid;
  border-color: maroon;
  background-color: #FFEECC;
  color: maroon;
  width: 95%; padding: 0.5em; }
div.issues h4 { margin-top: 0 }
code {
  font-weight:bold;
  color: green;
  font-family: monospace;
  font-size: 110%;
}
.good {
  border: green 2px solid;
  font-weight: bold;
  color: green;
  margin: 1em 5% 1em 0px;
}
.bad {
  border: red 2px solid;
  font-weight: bold;
  color: rgb(192,101,101);
  margin: 1em 5% 1em 0px;
}
div.navbar { text-align: center }
div.contents {
  border: medium none;
  padding: 0.5em;
  margin-right: 5%;
  background-color: rgb(230,230,255);
}
table.exceptions {
  background-color: rgb(255,255,153)
}
table.modes { font-size: 90% }
table.defn {
  border-width: thin;
  border-style: solid;
  border-color: black;
  color: black
}
table.defn th { background-color: rgb(220,220,255);
  border-style: solid; border-color: black; border-width: thin }
table.defn td { background-color: rgb(230,230,255);
  border-style: solid; border-color: black; border-width: thin }
.diff { color: rgb(128,0,0) }
.reqs { color: blue; font-style: italic }
.editorial { color: maroon; font-style: italic }
/*]]>*/
</style>
<link rel="stylesheet" type="text/css" href=
"http://www.w3.org/StyleSheets/TR/W3C-REC.css" />
</head>
|
|
<body>
|
|
<div class="head">
|
|
<div class="banner"><a href="http://www.w3.org/"><img alt="W3C"
|
|
src="http://www.w3.org/Icons/w3c_home" width="72" height=
|
|
"48" /></a></div>
|
|
<h1 class="notoc" id="s0">EMMA: Extensible MultiModal Annotation
|
|
markup language</h1>
|
|
<h2><a id="w3c-doctype" name="w3c-doctype"><acronym title=
|
|
"World Wide Web Consortium">W3C</acronym> Recommendation
|
|
10 February 2009</a></h2>
|
|
<dl>
|
|
<dt>This version:</dt>
|
|
<dd><a href=
|
|
"http://www.w3.org/TR/2009/REC-emma-20090210/">http://www.w3.org/TR/2009/REC-emma-20090210/</a></dd>
|
|
<dt>Latest version:</dt>
|
|
<dd><a href=
|
|
"http://www.w3.org/TR/emma/">http://www.w3.org/TR/emma/</a></dd>
|
|
<dt>Previous version:</dt>
|
|
<dd><a href=
|
|
"http://www.w3.org/TR/2008/PR-emma-20081215/">http://www.w3.org/TR/2008/PR-emma-20081215/</a></dd>
|
|
</dl>
|
|
<dl>
|
|
<dt>Editor:</dt>
|
|
<dd>Michael Johnston, AT&amp;T</dd>
|
|
<dt>Authors:</dt>
|
|
<dd>Paolo Baggia, Loquendo</dd>
|
|
<dd>Daniel C. Burnett, Voxeo (formerly of Vocalocity and Nuance)</dd>
|
|
<dd>Jerry Carter, Nuance</dd>
|
|
<dd>Deborah A. Dahl, Invited Expert</dd>
|
|
<dd>Gerry McCobb, Openstream</dd>
|
|
<dd>Dave Raggett, (until 2007, while at W3C/Volantis and W3C/Canon)</dd>
|
|
</dl>
|
|
|
|
<p>Please refer to the
|
|
<a href="http://www.w3.org/2009/02/emma-errata.html">
|
|
<strong>errata</strong></a>
|
|
for this document, which may include some normative
|
|
corrections.</p>
|
|
|
|
<p>See also
|
|
<a href="http://www.w3.org/2003/03/Translations/byTechnology?technology=emma">
|
|
<strong>translations</strong></a>.</p>
|
|
|
|
<p class="copyright"><a href="http://www.w3.org/Consortium/Legal/ipr-notice#Copyright">Copyright</a> © 2009 <a href="http://www.w3.org/"><acronym title="World Wide Web Consortium">W3C</acronym></a><sup>®</sup> (<a href="http://www.csail.mit.edu/"><acronym title="Massachusetts Institute of Technology">MIT</acronym></a>, <a href="http://www.ercim.org/"><acronym title="European Research Consortium for Informatics and Mathematics">ERCIM</acronym></a>, <a href="http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. W3C <a href="http://www.w3.org/Consortium/Legal/ipr-notice#Legal_Disclaimer">liability</a>, <a href="http://www.w3.org/Consortium/Legal/ipr-notice#W3C_Trademarks">trademark</a> and <a href="http://www.w3.org/Consortium/Legal/copyright-documents">document use</a> rules apply.</p>
|
|
|
|
<hr title="Separator for header" /></div>
|
|
<h2 class="notoc" id="abstract">Abstract</h2>
|
|
<p>The W3C Multimodal Interaction Working Group aims to develop
|
|
specifications to enable access to the Web using multimodal
|
|
interaction. This document is part of a set of specifications for
|
|
multimodal systems, and provides details of an XML markup language
|
|
for containing and annotating the interpretation of user input.
|
|
Examples of interpretation of user input are a transcription into
|
|
words of a raw signal, for instance derived from speech, pen or
|
|
keystroke input, a set of attribute/value pairs describing their
|
|
meaning, or a set of attribute/value pairs describing a gesture.
|
|
The interpretation of the user's input is expected to be generated
|
|
by signal interpretation processes, such as speech and ink
|
|
recognition, semantic interpreters, and other types of processors
|
|
for use by components that act on the user's inputs such as
|
|
interaction managers.</p>
|
|
<h2 id="status">Status of this Document</h2>
|
|
<p><em>This section describes the status of this document at the
|
|
time of its publication. Other documents may supersede this
|
|
document. A list of current W3C publications and the latest
|
|
revision of this technical report can be found in the <a href=
|
|
"http://www.w3.org/TR/">W3C technical reports index</a> at
|
|
http://www.w3.org/TR/.</em></p>
|
|
|
|
<p>This is the
|
|
<a href="http://www.w3.org/2005/10/Process-20051014/tr.html#RecsW3C">
|
|
Recommendation
|
|
</a>
|
|
of "EMMA: Extensible MultiModal Annotation markup language".
|
|
|
|
It has been produced by the
|
|
<a href="http://www.w3.org/2002/mmi/">Multimodal Interaction Working Group</a>,
|
|
which is part of the
|
|
<a href="http://www.w3.org/2002/mmi/Activity.html">Multimodal Interaction Activity</a>.
|
|
</p>
|
|
|
|
<p>Comments are welcome on <a href="mailto:www-multimodal@w3.org">www-multimodal@w3.org</a>
|
|
(<a href="http://lists.w3.org/Archives/Public/www-multimodal/">archive</a>).
|
|
|
|
See <a href="http://www.w3.org/Mail/">W3C mailing list and archive
|
|
usage guidelines</a>.</p>
|
|
|
|
<p>The design of EMMA has been widely reviewed
|
|
(see the <a href="http://www.w3.org/TR/2008/PR-emma-20081215/emma-disp.html">
|
|
disposition of comments</a>)
|
|
and satisfies the Working Group's technical requirements.
|
|
|
|
A list of implementations is included in the
|
|
<a href="http://www.w3.org/2002/mmi/2008/emma-ir/">
|
|
EMMA Implementation Report</a>.
|
|
|
|
The Working Group made a few editorial changes to the
|
|
<a href="http://www.w3.org/TR/2008/PR-emma-20081215/">
|
|
15 December 2008 Proposed Recommendation</a>.
|
|
Changes from the Proposed Recommendation can be found in
|
|
<a href="#appF">Appendix F</a>.
|
|
</p>
|
|
|
|
|
|
<p>This document has been reviewed by W3C Members, by software
|
|
developers, and by other W3C groups and interested parties, and is
|
|
endorsed by the Director as a W3C Recommendation. It is a stable
|
|
document and may be used as reference material or cited from another
|
|
document. W3C's role in making the Recommendation is to draw
|
|
attention to the specification and to promote its widespread
|
|
deployment. This enhances the functionality and interoperability of
|
|
the Web.</p>
|
|
|
|
<p>This specification describes markup for representing
|
|
interpretations of user input (speech, keystrokes, pen input etc.)
|
|
together with annotations for confidence scores, timestamps, input
|
|
medium etc., and forms part of the proposals for the <a href=
|
|
"http://www.w3.org/TR/mmi-framework/">W3C Multimodal Interaction
|
|
Framework</a>.</p>
|
|
|
|
<p>This document was produced by a group operating under the
|
|
<a href="http://www.w3.org/Consortium/Patent-Policy-20040205/">5
|
|
February 2004 W3C Patent Policy</a>. W3C maintains a <a rel=
|
|
"disclosure" href=
|
|
"http://www.w3.org/2004/01/pp-impl/34607/status">public list of any
|
|
patent disclosures</a> made in connection with the deliverables of
|
|
the group; that page also includes instructions for disclosing a
|
|
patent. An individual who has actual knowledge of a patent which
|
|
the individual believes contains <a href=
|
|
"http://www.w3.org/Consortium/Patent-Policy-20040205/#def-essential">
|
|
Essential Claim(s)</a> must disclose the information in accordance
|
|
with <a href=
|
|
"http://www.w3.org/Consortium/Patent-Policy-20040205/#sec-Disclosure">
|
|
section 6 of the W3C Patent Policy</a>.</p>
|
|
|
|
<p>The sections in the main body of this document are normative unless
|
|
otherwise specified. The appendices in this document are informative
|
|
unless otherwise indicated explicitly.</p>
|
|
|
|
|
|
<h2 class="notoc" id="conv">Conventions of this Document</h2>
|
|
<p>All sections in this specification are normative, unless
|
|
otherwise indicated. The informative parts of this specification
|
|
are identified by "Informative" labels within sections.</p>
|
|
<p>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
|
|
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL"
|
|
in this document are to be interpreted as described in [<a href=
|
|
"#ref-rfc2119">RFC2119</a>].</p>
|
|
<h2 class="notoc" id="toc">Table of Contents</h2>
|
|
<ul class="tocline">
|
|
<li>1. <a href="#s1">Introduction</a>
|
|
<ul class="tocline">
|
|
<li>1.1 <a href="#s1.1">Uses of EMMA</a></li>
|
|
<li>1.2 <a href="#s1.2">Terminology</a></li>
|
|
</ul>
|
|
</li>
|
|
<li>2. <a href="#s2">Structure of EMMA documents</a>
|
|
<ul class="tocline">
|
|
<li>2.<span>1</span> <a href="#s2.1">Data model</a></li>
|
|
<li>2.<span>2</span> <a href="#s2.2">EMMA namespace
|
|
prefixes</a></li>
|
|
</ul>
|
|
</li>
|
|
<li>3. <a href="#s3">EMMA structural elements</a>
|
|
<ul class="tocline">
|
|
<li>3.1 <a href="#s3.1">Root element:
|
|
<code>emma:emma</code></a></li>
|
|
<li>3.2 <a href="#s3.2">Interpretation element:
|
|
<code>emma:interpretation</code></a></li>
|
|
<li>3.3 <a href="#s3.3">Container elements</a>
|
|
<ul class="tocline">
|
|
<li>3.3.1 <a href="#s3.3.1"><code>emma:one-of</code>
|
|
element</a></li>
|
|
<li>3.3.2 <a href="#s3.3.2"><code>emma:group</code> element</a>
|
|
<ul class="tocline">
|
|
<li>3.3.2.1 <a href="#s3.3.2.1">Indirect grouping criteria:
|
|
<code>emma:group-info</code> element</a></li>
|
|
</ul>
|
|
</li>
|
|
<li>3.3.3 <a href="#s3.3.3"><code>emma:sequence</code>
|
|
element</a></li>
|
|
</ul>
|
|
</li>
|
|
<li>3.4 <a href="#s3.4">Lattice element</a>
|
|
<ul class="tocline">
|
|
<li>3.4.1 <a href="#s3.4.1">Lattice markup:
|
|
<code>emma:lattice</code>, <code>emma:arc</code>,
|
|
<code>emma:node</code> elements</a></li>
|
|
<li>3.4.2 <a href="#s3.4.2">Annotations on lattices</a></li>
|
|
<li>3.4.3 <a href="#s3.4.3">Relative timestamps on
|
|
lattices</a></li>
|
|
</ul>
|
|
</li>
|
|
<li>3.5 <a href="#s3.5">Literal semantics:
|
|
<code>emma:literal</code> element</a></li>
|
|
</ul>
|
|
</li>
|
|
<li>4. <a href="#s4">EMMA annotations</a>
|
|
<ul class="tocline">
|
|
<li>4.1 <a href="#s4.1">EMMA annotation elements</a>
|
|
<ul class="tocline">
|
|
<li>4.1.1 <a href="#s4.1.1">Data model: <code>emma:model</code>
|
|
element</a></li>
|
|
<li>4.1.2 <a href="#s4.1.2">Interpretation derivation:
|
|
<code>emma:derived-from</code> element and
|
|
<code>emma:derivation</code> element</a></li>
|
|
<li>4.1.3 <a href="#s4.1.3">Reference to grammar used:
|
|
<code>emma:grammar</code> element</a></li>
|
|
<li>4.1.4 <a href="#s4.1.4">Extensibility to application/vendor
|
|
specific annotations: <code>emma:info</code> element</a></li>
|
|
<li>4.1.5 <a href="#s4.1.5">Endpoint reference:
|
|
<code>emma:endpoint-info</code> element and
|
|
<code>emma:endpoint</code> element</a></li>
|
|
</ul>
|
|
</li>
|
|
<li>4.2 <a href="#s4.2">EMMA annotation attributes</a>
|
|
<ul class="tocline">
|
|
<li>4.2.1 <a href="#s4.2.1">Tokens of input:
|
|
<code>emma:tokens</code> attribute</a></li>
|
|
<li>4.2.2 <a href="#s4.2.2">Reference to processing:
|
|
<code>emma:process</code> attribute</a></li>
|
|
<li>4.2.3 <a href="#s4.2.3">Lack of input:
|
|
<code>emma:no-input</code> attribute</a></li>
|
|
<li>4.2.4 <a href="#s4.2.4">Uninterpreted input:
|
|
<code>emma:uninterpreted</code> attribute</a></li>
|
|
<li>4.2.5 <a href="#s4.2.5">Human language of input:
|
|
<code>emma:lang</code> attribute</a></li>
|
|
<li>4.2.6 <a href="#s4.2.6">Reference to signal:
|
|
<code>emma:signal</code> <span>and
|
|
<code>emma:signal-size</code></span> attributes</a></li>
|
|
<li>4.2.7 <a href="#s4.2.7">Media type:
|
|
<code>emma:media-type</code> attribute</a></li>
|
|
<li>4.2.8 <a href="#s4.2.8">Confidence scores:
|
|
<code>emma:confidence</code> attribute</a></li>
|
|
<li>4.2.9 <a href="#s4.2.9">Input source: <code>emma:source</code>
|
|
attribute</a></li>
|
|
<li>4.2.10 <a href="#s4.2.10">Timestamps</a>
|
|
<ul class="tocline">
|
|
<li>4.2.10.1 <a href="#s4.2.10.1">Absolute timestamps:
|
|
<code>emma:start</code>, <code>emma:end</code> attributes</a></li>
|
|
<li>4.2.10.2 <a href="#s4.2.10.2">Relative timestamps:
|
|
<code>emma:time-ref-uri</code>,
|
|
<code>emma:time-ref-anchor-point</code>,
|
|
<code>emma:offset-to-start</code> attributes</a></li>
|
|
<li>4.2.10.3 <a href="#s4.2.10.3">Duration of input:
|
|
<code>emma:duration</code> attribute</a></li>
|
|
<li><span>4.2.10.4 <a href="#s4.2.10.4">Composite Input and
|
|
Relative Timestamps</a></span></li>
|
|
</ul>
|
|
</li>
|
|
<li>4.2.11 <a href="#s4.2.11">Medium, mode, and function of user
|
|
inputs: <code>emma:medium</code>, <code>emma:mode</code>,
|
|
<code>emma:function</code>, <code>emma:verbal</code>
|
|
attributes</a></li>
|
|
<li>4.2.12 <a href="#s4.2.12">Composite multimodality:
|
|
<code>emma:hook</code> attribute</a></li>
|
|
<li>4.2.13 <a href="#s4.2.13">Cost: <code>emma:cost</code>
|
|
attribute</a></li>
|
|
<li>4.2.14 <a href="#s4.2.14">Endpoint properties:
|
|
<code>emma:endpoint-role</code>,
|
|
<code>emma:endpoint-address</code>, <code>emma:port-type</code>,
|
|
<code>emma:port-num</code>, <code>emma:message-id</code>,
|
|
<code>emma:service-name</code>, <code>emma:endpoint-pair-ref</code>,
|
|
<code>emma:endpoint-info-ref</code>
|
|
attributes</a></li>
|
|
<li>4.2.15 <a href="#s4.2.15">Reference to
|
|
<code>emma:grammar</code> element: <code>emma:grammar-ref</code>
|
|
attribute</a></li>
|
|
<li>4.2.16 <a href="#s4.2.16">Reference to <code>emma:model</code>
|
|
element: <code>emma:model-ref</code> attribute</a></li>
|
|
<li>4.2.17 <a href="#s4.2.17">Dialog turns:
|
|
<code>emma:dialog-turn</code> attribute</a></li>
|
|
</ul>
|
|
</li>
|
|
<li>4.3 <a href="#s4.3">Scope of EMMA annotations</a></li>
|
|
</ul>
|
|
</li>
|
|
<li>5. <a href="#s5">Conformance</a>
|
|
<ul class="tocline">
|
|
<li>5.1 <a href="#s5.1">Conforming EMMA Documents</a></li>
|
|
<li>5.2 <a href="#s5.2">Using EMMA with other Namespaces</a></li>
|
|
<li>5.3 <a href="#s5.3">Conforming EMMA Processors</a></li>
|
|
</ul>
|
|
</li>
|
|
<li><a href="#appendices">Appendices</a>
|
|
<ul class="tocline">
|
|
<li>Appendix A. <a href="#appA">XML and <span>RELAX NG</span>
|
|
schemata</a> <span>(Normative)</span></li>
|
|
<li>Appendix B. <a href="#appB">MIME type</a>
|
|
<span>(Normative)</span>
|
|
<ul>
|
|
<li><span>B.1 <a href="#media-type-registration">Registration of
|
|
MIME media type application/emma+xml</a></span></li>
|
|
</ul>
|
|
</li>
|
|
<li>Appendix C. <a href="#appC"><code>emma:hook</code> and SRGS</a>
|
|
<span>(Informative)</span></li>
|
|
<li>Appendix D. <a href="#appD">EMMA event interface</a>
|
|
<span>(Informative)</span></li>
|
|
<li>Appendix E. <a href="#appE">References</a>
|
|
<ul>
|
|
<li>E.1 <a href="#appE1">Normative references</a></li>
|
|
<li>E.2 <a href="#appE2"><span>Informative</span>
|
|
references</a></li>
|
|
</ul>
|
|
</li>
|
|
<li>Appendix F. <a href="#appF">Changes since last draft</a>
|
|
<span>(Informative)</span></li>
|
|
<li>Appendix G. <a href="#appG">Acknowledgements</a>
|
|
<span>(Informative)</span></li>
|
|
</ul>
|
|
</li>
|
|
</ul>
|
|
<h2 id="s1">1. Introduction</h2>
|
|
<p>This section is <span>I</span>nformative.</p>
|
|
<p>This document presents an XML specification for EMMA, an
|
|
Extensible MultiModal Annotation markup language, responding to the
|
|
requirements documented in <span>Requirements for EMMA</span>
|
|
[<a href="#EMMAreqs">EMMA <span>Requirements</span></a>]. This
|
|
markup language is intended for use by systems that provide
|
|
semantic interpretations for a variety of inputs, including, but not
necessarily limited to, speech, natural language text, GUI and ink
input.</p>
|
|
<p>It is expected that this markup will be used primarily as a
|
|
standard data interchange format between the components of a
|
|
multimodal system; in particular, it will normally be automatically
|
|
generated by interpretation components to represent the semantics
|
|
of users' inputs, not directly authored by developers.</p>
|
|
<p>The language is focused on annotating single inputs from users,
|
|
which may be either from a single mode or a composite input
|
|
combining information from multiple modes, as opposed to
|
|
information that might have been collected over multiple turns of a
|
|
dialog. The language provides a set of elements and attributes that
|
|
are focused on enabling annotations on user inputs and
|
|
interpretations of those inputs.</p>
|
|
<p>An EMMA document can be considered to hold three types of
|
|
data:</p>
|
|
<ul>
|
|
<li>
|
|
<p><b>instance data</b></p>
|
|
<p>Application-specific markup corresponding to input information
|
|
which is meaningful to the consumer of an EMMA document. Instances
|
|
are application-specific and built by input processors at runtime.
|
|
Given that utterances may be ambiguous with respect to input
|
|
values, an EMMA document may hold more than one instance.</p>
|
|
</li>
|
|
<li>
|
|
<p><b>data model</b></p>
|
|
<p>Constraints on structure and content of an instance. The data
|
|
model is typically pre-established by an application, and may be
|
|
implicit, that is, unspecified.</p>
|
|
</li>
|
|
<li>
|
|
<p><b>metadata</b></p>
|
|
<p>Annotations associated with the data contained in the instance.
|
|
Annotation values are added by input processors at runtime.</p>
|
|
</li>
|
|
</ul>
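<p>As an informative sketch of how these three types of data can sit
together in one document, the fragment below shows a single
interpretation of a spoken flight query. The application namespace
<code>http://www.example.com/example</code>, the element names
<code>origin</code> and <code>destination</code>, the model location
<code>http://www.example.com/models/flight.xsd</code> and all
annotation values are assumptions chosen for illustration only.</p>
<pre class="example">
&lt;emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example"&gt;
  &lt;!-- data model: optional reference to constraints on the instance --&gt;
  &lt;emma:model id="model1" ref="http://www.example.com/models/flight.xsd"/&gt;
  &lt;!-- metadata: the emma:* attributes on the interpretation below --&gt;
  &lt;emma:interpretation id="int1"
      emma:model-ref="model1"
      emma:medium="acoustic" emma:mode="voice"
      emma:confidence="0.75"
      emma:tokens="flights from boston to denver"&gt;
    &lt;!-- instance data: application-specific markup --&gt;
    &lt;origin&gt;Boston&lt;/origin&gt;
    &lt;destination&gt;Denver&lt;/destination&gt;
  &lt;/emma:interpretation&gt;
&lt;/emma:emma&gt;
</pre>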
|
|
<p>Given the assumptions above about the nature of data represented
|
|
in an EMMA document, the following general principles apply to the
|
|
design of EMMA:</p>
|
|
<ul>
|
|
<li>The main prescriptive content of the EMMA specification will
|
|
consist of metadata: EMMA will provide a means to express the
|
|
metadata annotations which require standardization. (Notice,
|
|
however, that such annotations may express the relationship among
|
|
all the types of data within an EMMA document.)</li>
|
|
<li>The instance and its data model are assumed to be specified in
|
|
XML, but EMMA will remain agnostic to the XML format used to
|
|
express these. (The instance XML is assumed to be sufficiently
|
|
structured to enable the association of annotative data.)</li>
|
|
<li>The extensibility of EMMA lies in the ability for additional
|
|
kinds of metadata to be included in application specific
|
|
vocabularies. EMMA itself can be extended with application and
|
|
vendor specific annotations contained within the
|
|
<code>emma:info</code> element <span>(<a href="#s4.1.4">Section
|
|
4.1.4</a>)</span>.</li>
|
|
</ul>
|
|
<p>The annotations of EMMA should be considered 'normative' in the
|
|
sense that if an EMMA component produces annotations as described
|
|
in <a href="#s3">Section 3</a> <span>and <a href="#s4">Section
|
|
4</a></span>, these annotations must be represented using the EMMA
|
|
syntax. The Multimodal Interaction Working Group may address in
|
|
later drafts the issues of modularization and profiling; that is,
|
|
which sets of annotations are to be supported by which classes of
|
|
EMMA component.</p>
|
|
<h3 id="s1.1">1.1 Uses of EMMA</h3>
|
|
<p>The general purpose of EMMA is to represent information
|
|
automatically extracted from a user's input by an interpretation
|
|
component, where input is to be taken in the general sense of a
|
|
meaningful user input in any modality supported by the platform.
|
|
The reader should refer to the sample architecture in <span>W3C
|
|
Multimodal Interaction Framework</span> <a href="#MMIF">[<span>MMI
|
|
Framework</span>]</a>, which shows EMMA conveying content between
|
|
user input modality components and an interaction manager.</p>
|
|
<p>Components that generate EMMA markup:</p>
|
|
<ol>
|
|
<li>Speech recognizers</li>
|
|
<li>Handwriting recognizers</li>
|
|
<li>Natural language understanding engines</li>
|
|
<li>Other input media interpreters (e.g. DTMF, pointing,
|
|
keyboard)</li>
|
|
<li>Multimodal integration component</li>
|
|
</ol>
|
|
<p>Components that use EMMA include:</p>
|
|
<ol>
|
|
<li>Interaction manager</li>
|
|
<li>Multimodal integration component</li>
|
|
</ol>
|
|
<p>Although not a primary goal of EMMA, a platform may also choose
|
|
to use this general format as the basis of a general semantic
|
|
result that is carried along and filled out during each stage of
|
|
processing. In addition, future systems may also potentially make
|
|
use of this markup to convey abstract semantic content to be
|
|
rendered into natural language by a natural language generation
|
|
component.</p>
|
|
<h3 id="s1.2">1.2 Terminology</h3>
|
|
<dl>
|
|
<dt id="anchor-point">anchor point</dt>
|
|
<dd>When referencing an input interval with
|
|
<code>emma:time-ref-uri</code>,
|
|
<code>emma:time-ref-anchor-point</code> allows you to specify
|
|
whether the referenced anchor is the start or end of the
|
|
interval.</dd>
|
|
<dt id="annotation">annotation</dt>
|
|
<dd>Information about the interpreted input, for example,
|
|
timestamps, confidence scores, links to raw input, etc.</dd>
|
|
<dt id="composite-input">composite input</dt>
|
|
<dd>An input formed from several pieces, often in different modes,
|
|
for example, a combination of speech and pen gesture, such as
|
|
saying "zoom in here" and circling a region on a map.</dd>
|
|
<dt id="confidence">confidence</dt>
|
|
<dd>A numerical score describing the degree of certainty in a
|
|
particular interpretation of user input.</dd>
|
|
<dt id="data-model">data model</dt>
|
|
<dd>For EMMA, a data model defines a set of constraints on possible
|
|
interpretations of user input.</dd>
|
|
<dt id="derivation">derivation</dt>
|
|
<dd>Interpretations of user input are said to be derived from that
|
|
input, and higher level interpretations may be derived from lower
|
|
level ones. EMMA allows you to reference the user input or
interpretation from which a given interpretation was derived; see
also <a href="#semantic-interpretation"><em>semantic
interpretation</em></a>.</dd>
|
|
<dt id="dialog">dialog</dt>
|
|
<dd>For EMMA, dialog can be considered as a sequence of
|
|
interactions between
|
|
a user and the application.</dd>
|
|
<dt id="endpoint">endpoint</dt>
|
|
<dd>In EMMA, this refers to a network location which is the source
|
|
or recipient of an EMMA document. It should be noted that the usage
|
|
of the term "endpoint" in this context is different from the way
|
|
that the term is used in speech processing, where it refers to the
|
|
end of a speech input.</dd>
|
|
<dt id="gestures">gestures</dt>
|
|
<dd>In multimodal applications gestures are communicative acts made
|
|
by the user or application. An example is circling an area on a map
|
|
to indicate a region of interest. Users may be able to gesture with
|
|
a pen, keystrokes, hand movements, head
|
|
movements, or sound. Gestures often form part of <a href=
|
|
"#composite-input"><em>composite input</em></a>. Application
|
|
gestures are typically animations and/or sound effects.</dd>
|
|
<dt id="grammar">grammar</dt>
|
|
<dd>A set of rules that describe a sequence of tokens expected in a
|
|
given input. These can be used by speech and handwriting
|
|
recognizers to increase recognition accuracy.</dd>
|
|
<dt id="handwriting-recognition">handwriting recognition</dt>
|
|
<dd>The process of converting pen strokes into text.</dd>
|
|
<dt id="ink-recognition">ink recognition</dt>
|
|
<dd>This includes the recognition of handwriting and pen
|
|
gestures.</dd>
|
|
<dt id="input-cost">input cost</dt>
|
|
<dd>In EMMA, this refers to a numerical measure indicating the
|
|
weight or processing cost associated with a user's input or part of
|
|
their input.</dd>
|
|
<dt id="input-device">input device</dt>
|
|
<dd>The device providing a particular input, for example, a
microphone, a pen, a mouse, a camera, or a keyboard.</dd>
|
|
<dt id="input-function">input function</dt>
|
|
<dd>In EMMA, this refers to <span>the</span> use a particular input
|
|
is serving, for example, as part of a recording or transcription,
|
|
as part of a dialog, or as a means to verify the user's
|
|
identity.</dd>
|
|
<dt id="input-medium">input medium</dt>
|
|
<dd>Whether the input is acoustic, visual, or tactile. For
instance, a spoken utterance is an example of an acoustic input, a
hand gesture as seen by a camera is an example of a visual input,
and pointing with a mouse or pen is an example of a tactile input.</dd>
|
|
<dt id="input-mode">input mode</dt>
|
|
<dd>This distinguishes a particular means of providing an input
|
|
within a general input medium, for example, speech, DTMF, ink,
keystrokes, video, photograph, etc.</dd>
|
|
<dt id="input-source">input source</dt>
|
|
<dd>This is the device that provided the input, for example a
|
|
particular microphone or camera. EMMA allows you to identify these
|
|
with a URI.</dd>
|
|
<dt id="input-tokens">input tokens</dt>
|
|
<dd>In EMMA, this refers to a sequence of characters, words or
|
|
other discrete units of input.</dd>
|
|
<dt id="instance-data">instance data</dt>
|
|
<dd>A representation in XML of an interpretation of user
|
|
input.</dd>
|
|
<dt id="interaction-manager">interaction manager</dt>
|
|
<dd>A processor that determines how an application interacts with a
|
|
user. This can be at multiple levels of abstraction, for example,
|
|
at a detailed level, determining what prompts to present to the
|
|
user and what actions to take in response to user input, versus a
|
|
higher level treatment in terms of goals and tasks for achieving
|
|
those goals. Interaction managers are frequently event driven.</dd>
|
|
<dt id="interpretation">interpretation</dt>
|
|
<dd>In EMMA, an interpretation of user input refers to information
|
|
derived from the user input that is meaningful to the
|
|
application.</dd>
|
|
<dt id="keystroke-input">keystroke input</dt>
|
|
<dd>Input provided by the user pressing on a sequence of keys
|
|
(buttons), such as a computer keyboard or keypad.</dd>
|
|
<dt id="lattice">lattice</dt>
|
|
<dd>A set of nodes interconnected with directed arcs such that, by
following the arcs, you can never return to a node you
have already visited (i.e. a directed acyclic graph). Lattices
|
|
provide a flexible means to represent the results of speech and
|
|
handwriting recognition, in terms of arcs representing words or
|
|
character sequences. Different arcs from the same node represent
|
|
different local hypotheses as to what the user said or wrote.</dd>
|
|
<dt id="metadata">metadata</dt>
|
|
<dd>Information describing another set of data, for instance, a
|
|
library catalog card with information on the author, title and
|
|
location of a book. EMMA is designed to support input processors in
|
|
providing metadata for interpretations of user input.</dd>
|
|
<dt id="multimodal-integration">multimodal integration</dt>
|
|
<dd>The process of combining inputs from different modes to create
|
|
an interpretation of composite input. This is also sometimes
|
|
referred to as <em>multimodal fusion</em>.</dd>
|
|
<dt id="multimodal-interaction">multimodal interaction</dt>
|
|
<dd>The means for a user to interact with an application using more
|
|
than one mode of interaction, for instance, offering the user the
|
|
choice of speaking or typing, or in some cases, allowing the user
|
|
to provide a composite input involving multiple modes.</dd>
|
|
<dt id="natural-language-understanding">natural language
|
|
understanding</dt>
|
|
<dd>The process of interpreting text in terms that are useful for
|
|
an application.</dd>
|
|
<dt id="N-best-list">N-best list</dt>
|
|
<dd>An N-best list is a list of the most likely hypotheses for what
|
|
the user actually said or wrote, where N is an integer,
for example 5 for the 5 most likely hypotheses.</dd>
|
|
<dt id="raw-signal">raw signal</dt>
|
|
<dd>An uninterpreted input, such as an audio waveform captured from
|
|
a microphone.</dd>
|
|
<dt id="semantic-interpretation">semantic interpretation</dt>
|
|
<dd>A normalized representation of the meaning of a user input, for
|
|
instance, mapping the speech for "San Francisco" into the airport
|
|
code "SFO".</dd>
|
|
<dt id="semantic-processor">semantic processor</dt>
|
|
<dd>In EMMA, this refers to systems that can derive interpretations
|
|
of user input, for instance, mapping the speech for "San Francisco"
|
|
into the airport code "SFO".</dd>
|
|
<dt id="signal-interpretation">signal interpretation</dt>
|
|
<dd>The process of mapping a discrete or continuous signal into a
|
|
symbolic representation that can be used by an application, for
|
|
instance, transforming the audio waveform corresponding to someone
|
|
saying "2005" into the number 2005.</dd>
|
|
<dt id="speech-recognition">speech recognition</dt>
|
|
<dd>The process of determining the textual transcription of a piece
|
|
of speech.</dd>
|
|
<dt id="speech-synthesis">speech synthesis</dt>
|
|
<dd>The process of rendering a piece of text into the corresponding
|
|
speech, i.e. synthesi<span>z</span>ing speech from text.</dd>
|
|
<dt id="text-to-speech">text to speech</dt>
|
|
<dd>The process of rendering a piece of text into the corresponding
|
|
speech.</dd>
|
|
<dt id="time-stamp">time stamp</dt>
|
|
<dd>The time that a particular input or part of an input began or
|
|
ended.</dd>
|
|
<dt id="term-uri">URI: Uniform Resource Identifier</dt>
|
|
<dd>A URI is a unifying syntax for the expression of names and
|
|
addresses of objects on the network as used in the World Wide Web.
|
|
<span>Within this specification, the term URI refers to a Uniform
|
|
Resource Identifier as defined in [<a href="#RFC3986">RFC3986</a>]
|
|
and extended in [<a href="#RFC3987">RFC3987</a>] with the new name
|
|
IRI. The term URI has been retained in preference to IRI to avoid
|
|
introducing new names for concepts such as "Base URI" that are
|
|
defined or referenced across the whole family of XML
|
|
specifications</span>. A URI is defined as any legal
|
|
<code>anyURI</code> primitive as defined in XML Schema Part 2:
|
|
Datatypes Second Edition Section 3.2.17 [<a href=
|
|
"#XSD2">SCHEMA2</a>].</dd>
|
|
<dt id="user-input">user input</dt>
|
|
<dd>An input provided by a user as opposed to something generated
|
|
automatically.</dd>
|
|
</dl>
|
|
<h2 id="s2">2. Structure of EMMA documents</h2>
|
|
<p>This section is <span>I</span>nformative.</p>
|
|
<p>As noted above, the main components of an interpreted user input
|
|
in EMMA are the instance data, an optional data model, and the
|
|
metadata annotations that may be applied to that input. The
|
|
realization of these components in EMMA is as follows:</p>
|
|
<ul>
|
|
<li><b>instance data</b> is contained within an EMMA
|
|
<i>interpretation</i></li>
|
|
<li>the <b>data model</b> is optionally specified as an annotation
|
|
of that instance</li>
|
|
<li>EMMA <b>annotations</b> may be applied at different levels of
|
|
an EMMA document.</li>
|
|
</ul>
|
|
<p>An EMMA <i>interpretation</i> is the primary unit for holding
|
|
user input as interpreted by an EMMA processor. As will be seen
|
|
below, multiple interpretations of a single input are possible.</p>
|
|
<p>EMMA provides a simple structural syntax for the organization of
|
|
interpretations and instances, and an annotative syntax to apply
|
|
the annotation to the input data at different levels.</p>
|
|
<p>An outline of the structural syntax and annotations found in
|
|
EMMA documents is as follows. A fuller definition may be found in
|
|
the description of individual elements and attributes in <a href=
|
|
"#s3"><span>S</span>ection 3</a> and <a href=
|
|
"#s4"><span>S</span>ection 4</a>.</p>
|
|
<ul>
|
|
<li><b><a href="#s3">EMMA <span>s</span>tructural
|
|
<span>e</span>lements</a></b> (<a href="#s3">Section 3</a>)
|
|
<ul>
|
|
<li><b><a href="#s3.1">Root element</a></b>: The root node of an
|
|
EMMA document, the <code>emma:emma</code> element, holds EMMA
|
|
version and namespace information, and provides a container for one
|
|
or more of the following interpretation and container elements
|
|
(<a href="#s3.1">Section 3.1</a>)</li>
|
|
<li><b><a href="#s3.2">Interpretation element</a></b>: The
|
|
<code>emma:interpretation</code> element contains a given
|
|
interpretation of the input and holds application specific markup
|
|
(<a href="#s3.2">Section 3.2</a>)</li>
|
|
<li><b><a href="#s3.3">Container elements</a>:</b>
|
|
<ul>
|
|
<li><code>emma:one-of</code> is a container for one or more
|
|
interpretation elements or container elements and denotes that
|
|
these are mutually exclusive interpretations (<a href=
|
|
"#s3.3.1">Section 3.3.1</a>)</li>
|
|
<li><code>emma:group</code> is a general container for one or more
|
|
interpretation elements or container elements. It can be associated
|
|
with arbitrary grouping criteria (<a href="#s3.3.2">Section
|
|
3.3.2</a>).</li>
|
|
<li><code>emma:sequence</code> is a container for one or more
|
|
interpretation elements or container elements and denotes that
|
|
these are sequential in time (<a href="#s3.3.3">Section
|
|
3.3.3</a>).</li>
|
|
</ul>
|
|
</li>
|
|
<li><b><a href="#s3.4">Lattice element</a></b>: The
|
|
<code>emma:lattice</code> element is used to contain a series of
|
|
<code>emma:arc</code> and <code>emma:node</code> elements that
|
|
define a lattice of words, gestures, meanings or other symbols. The
|
|
<code>emma:lattice</code> element appears within the
|
|
<code>emma:interpretation</code> element (<a href="#s3.4">Section
|
|
3.4</a>)</li>
|
|
<li><b><a href="#s3.5">Literal element</a></b>: The
|
|
<code>emma:literal</code> element is used as a wrapper when the
|
|
application semantics is a string literal. (<a href="#s3.5">Section
|
|
3.5</a>)</li>
|
|
</ul>
|
|
</li>
|
|
<li><b><a href="#s4">EMMA annotations</a></b> (<a href=
|
|
"#s4">Section 4</a>)
|
|
<ul>
|
|
<li><b><a href="#s4.1">EMMA annotation elements</a></b>: These are
|
|
EMMA annotations such as <code>emma:derived-from</code>,
|
|
<code>emma:endpoint-info</code>, and <code>emma:info</code> which
|
|
are represented as elements so that they can occur more than once
|
|
within an element and can contain internal structure. (<a href=
|
|
"#s4.1">Section 4.1</a>)</li>
|
|
<li><b><a href="#s4.2">EMMA annotation attributes</a></b>: These
|
|
are EMMA annotations such as <code>emma:start</code>,
|
|
<code>emma:end</code>, <code>emma:confidence</code>, and
|
|
<code>emma:tokens</code> which are represented as attributes. They
|
|
can appear on <code>emma:interpretation</code> elements<span>.
|
|
S</span>ome can appear on container elements, lattice elements, and
|
|
elements in the application-specific markup. (<a href=
|
|
"#s4.2">Section 4.2</a>)</li>
|
|
</ul>
|
|
</li>
|
|
</ul>
|
|
<p>From the defined root node <code>emma:emma</code> the structure
|
|
of an EMMA document consists of a tree of EMMA container elements
|
|
(<code>emma:one-of</code>, <code>emma:sequence</code>,
|
|
<code>emma:group</code>) terminating in a number of interpretation
|
|
elements (<code>emma:interpretation</code>). The
|
|
<code>emma:interpretation</code> elements serve as wrappers for
|
|
either application namespace markup describing the interpretation
|
|
of the user's input or an <code>emma:lattice</code> element or
|
|
<code>emma:literal</code> element. A single
|
|
<code>emma:interpretation</code> may also appear directly under the
|
|
root node.</p>
|
|
|
|
|
|
<p>
|
|
The EMMA elements
|
|
<code>emma:emma</code>,
|
|
<code>emma:interpretation</code>,
|
|
<code>emma:one-of</code>,
|
|
and <code>emma:literal</code>
|
|
and the EMMA attributes
|
|
<code>emma:no-input</code>,
|
|
<code>emma:uninterpreted</code>,
|
|
<code>emma:medium</code>,
|
|
and <code>emma:mode</code>
|
|
are required of all
|
|
implementations. The remaining elements and attributes are optional
|
|
and may be used in some implementations and not others, depending on the
|
|
specific modalities and processing being represented.
|
|
</p>
|
|
|
|
|
|
<p>To illustrate this, here is an example <span class="new">of
|
|
an</span> EMMA document <span class="new">representing</span> input
|
|
to a flight reservation application. In this example there are two
|
|
speech recognition results and associated semantic representations
|
|
of the input. The system is uncertain whether the user meant
|
|
"flights from Boston to Denver" or "flights from Austin to Denver".
|
|
The annotations to be captured are timestamps and confidence scores
|
|
for the two inputs.</p>
|
|
<p>Example:</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:one-of id="r1" emma:start="1087995961542" emma:end="1087995963542"
|
|
<span> emma:medium="acoustic" emma:mode="voice"</span>>
|
|
<emma:interpretation id="int1" emma:confidence="0.75"
|
|
emma:tokens="flights from boston to denver">
|
|
<origin>Boston</origin>
|
|
<destination>Denver</destination>
|
|
</emma:interpretation>
|
|
|
|
<emma:interpretation id="int2" emma:confidence="0.68"
|
|
emma:tokens="flights from austin to denver">
|
|
<origin>Austin</origin>
|
|
<destination>Denver</destination>
|
|
</emma:interpretation>
|
|
</emma:one-of>
|
|
</emma:emma>
|
|
</pre>
|
|
<p>Attributes on the root <code>emma:emma</code> element indicate
|
|
the version and namespace. The <code>emma:emma</code> element
|
|
contains an <code>emma:one-of</code> element which contains a
|
|
disjunctive list of possible interpretations of the input. The
|
|
actual semantic representation of each interpretation is within the
|
|
application namespace. In the example here the application specific
|
|
semantics involves elements <code>origin</code> and
|
|
<code>destination</code> indicating the origin and destination
|
|
cities for looking up a flight. The timestamp is the same for both
|
|
interpretations and it is annotated using values in milliseconds in
|
|
the <code>emma:start</code> and <code>emma:end</code> attributes on
|
|
the <code>emma:one-of</code>. The confidence scores and tokens
|
|
associated with each of the inputs are annotated using the EMMA
|
|
annotation attributes <code>emma:confidence</code> and
|
|
<code>emma:tokens</code> on each of the
|
|
<code>emma:interpretation</code> elements.</p>
|
|
<h3 id="s2.1">2.<span>1</span> Data model</h3>
|
|
<p>An EMMA data model expresses the constraints on the structure
|
|
and content of instance data, for the purposes of validation. As
|
|
such, the data model may be considered as a particular kind of
|
|
annotation (although, unlike other EMMA annotations, it is not a
|
|
feature pertaining <span>to</span> a specific user input at a
|
|
specific moment in time; rather, it is a static and, by its very
|
|
definition, application-specific structure). <span>The</span>
|
|
specification of <span>a data model</span> in EMMA is optional.</p>
|
|
<p>Since Web applications today use different formats to specify
|
|
data models, e.g. <span>XML Schema Part 1: Structures Second
|
|
Edition</span> [<a href="#XSD1">XML Schema
|
|
<span>Structures</span></a>], XForms <span>1.0 (Second
|
|
Edition)</span> [<a href="#XFORMS">XFORMS</a>], <span>RELAX NG
|
|
Specification</span> [<a href="#RELAXNG">RELAX-NG</a>], etc., EMMA
|
|
itself is agnostic to the format of data model used.</p>
|
|
<p>Data model definition and reference are specified in <a href=
|
|
"#s4.1.1">Section 4.1.1</a>.</p>
|
|
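<p>As a minimal sketch of treating the data model as an annotation,
the following assumes the <code>emma:model</code> element and the
<code>emma:model-ref</code> attribute, with <code>id</code> and
<code>ref</code> attributes on <code>emma:model</code> as described in
<a href="#s4.1.1">Section 4.1.1</a>. The schema URI and the identifiers
are hypothetical.</p>
<pre class="example">
<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <!-- hypothetical location of the application data model -->
  <emma:model id="model1" ref="http://www.example.com/models/flight.xsd"/>
  <emma:interpretation id="int1" emma:model-ref="model1"
      emma:medium="acoustic" emma:mode="voice">
    <origin>Boston</origin>
    <destination>Denver</destination>
  </emma:interpretation>
</emma:emma>
</pre>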
<h3 id="s2.2">2.<span>2</span> EMMA namespace prefixes</h3>
|
|
<p>An EMMA attribute is qualified with the EMMA namespace prefix if
|
|
the attribute can also be used as an in-line annotation on elements
|
|
in the application's namespace. Most of the EMMA annotation
|
|
attributes in <a href="#s4.2">Section 4.2</a> are in this category.
|
|
An EMMA attribute is not qualified with the EMMA namespace prefix
|
|
if the attribute only appears on an EMMA element. This rule ensures
|
|
consistent usage of the attributes across all examples.</p>
|
|
<p>Attributes from other namespaces are permissible on all EMMA
|
|
elements. As an example, <code>xml:lang</code> may be used to
|
|
annotate the human language of character data content.</p>
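<p>The prefixing rule can be illustrated with a short sketch. Here
<code>id</code> is unqualified because it only appears on EMMA
elements, while <code>emma:confidence</code> is qualified because it
can also annotate application markup in-line. The element names
<code>origin</code> and <code>destination</code> and the confidence
values are illustrative assumptions only.</p>
<pre class="example">
<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <!-- id: unqualified; emma:medium, emma:mode, emma:confidence: qualified -->
  <emma:interpretation id="int1" xml:lang="en-US"
      emma:medium="acoustic" emma:mode="voice" emma:confidence="0.75">
    <origin emma:confidence="0.6">Boston</origin>
    <destination emma:confidence="0.9">Denver</destination>
  </emma:interpretation>
</emma:emma>
</pre>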
|
|
<h2 id="s3">3. EMMA structural elements</h2>
|
|
<p>This section defines elements in the EMMA namespace which
|
|
provide the structural syntax of EMMA documents.</p>
|
|
<h3 id="s3.1">3.1 Root element: <code>emma:emma</code></h3>
|
|
<table class="defn" summary="property definition" width="98%"
|
|
cellpadding="5" cellspacing="0">
|
|
<tbody>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:emma</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>The root element of an EMMA document.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Children</th>
|
|
<td>The <code>emma:emma</code> element MUST immediately contain a
|
|
single <code>emma:interpretation</code> element or EMMA container
|
|
element: <code>emma:one-of</code>, <code>emma:group</code>,
|
|
<code>emma:sequence</code>. It MAY also contain an optional single
|
|
<code>emma:derivation</code> element and an optional single
|
|
<code>emma:info</code> annotation element. It MAY also contain
|
|
multiple optional <code>emma:grammar</code> annotation elements,
|
|
<code>emma:model</code> annotation elements, and
|
|
<code>emma:endpoint-info</code> annotation elements.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Attributes</th>
|
|
<td>
|
|
<ul>
|
|
<li><b>Required</b>:
|
|
<ul>
|
|
<li><code>version</code>: the version of EMMA used for the
|
|
interpretation(s). Interpretations expressed using this
|
|
specification MUST use <code>1.0</code> for the value.</li>
|
|
<li>Namespace declaration for EMMA, see below.</li>
|
|
</ul>
|
|
</li>
|
|
<li><b>Optional</b>:
|
|
<ul>
|
|
<li>any other namespace declarations for application specific
|
|
namespaces.</li>
|
|
</ul>
|
|
</li>
|
|
</ul>
|
|
</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td>None</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<p>The root element of an EMMA document is named
|
|
<code>emma:emma</code>. It holds a single
|
|
<code>emma:interpretation</code> or EMMA container element
|
|
(<code>emma:one-of</code>, <code>emma:sequence</code>,
|
|
<code>emma:group</code>). It MAY also contain a single
|
|
<code>emma:derivation</code> element containing earlier stages of
|
|
the processing of the input (See <a href="#s4.1.2">Section
|
|
4.1.2</a>). It MAY also contain an optional single annotation
|
|
element: <code>emma:info</code> and multiple optional
|
|
<code>emma:grammar</code>, <code>emma:model</code>, and
|
|
<code>emma:endpoint-info</code> elements.</p>
|
|
<p>It MAY hold attributes for information pertaining to EMMA
|
|
itself, along with any namespaces which are declared for the entire
|
|
document, and any other EMMA annotative data. The
|
|
<code>emma:emma</code> element and other elements and attributes
|
|
defined in this specification belong to the XML namespace
|
|
identified by the URI "http://www.w3.org/2003/04/emma". In the
|
|
examples, the EMMA namespace is generally declared using the
|
|
attribute <code>xmlns:emma</code> on the root
|
|
<code>emma:emma</code> element. EMMA processors MUST support the
|
|
full range of ways of declaring XML namespaces as defined by the
|
|
<span>Namespaces in XML 1.1 (Second Edition)</span> [<a href=
|
|
"#XMLNS">XMLNS</a>]. Application markup MAY be declared in an
|
|
explicit application namespace, or an undefined namespace
|
|
(equivalent to setting xmlns="").</p>
|
|
<p>For example:</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
|
|
....
|
|
</emma:emma>
|
|
</pre>
|
|
<p>or</p>
|
|
<pre class="example">
|
|
<emma version="1.0" xmlns="http://www.w3.org/2003/04/emma">
|
|
....
|
|
</emma>
|
|
</pre>
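<p>Application markup may likewise be given its own explicit prefix. In
the following sketch the prefix <code>app</code> and its namespace URI
are hypothetical and stand in for whatever namespace an application
defines:</p>
<pre class="example">
<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:app="http://www.example.com/example">
  <emma:interpretation id="int1" emma:medium="acoustic" emma:mode="voice">
    <app:origin>Boston</app:origin>
    <app:destination>Denver</app:destination>
  </emma:interpretation>
</emma:emma>
</pre>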
|
|
<h3 id="s3.2">3.2 Interpretation element:
|
|
<code>emma:interpretation</code></h3>
|
|
<table class="defn" summary="property definition" width="98%"
|
|
cellpadding="5" cellspacing="0">
|
|
<tbody>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:interpretation</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>The <code>emma:interpretation</code> element acts as a wrapper
|
|
for application instance data or lattices.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Children</th>
|
|
<td>The <code>emma:interpretation</code> element MUST immediately
|
|
contain either application instance data, or a single
|
|
<code>emma:lattice</code> element, or a single
|
|
<code>emma:literal</code> element, or in the case of uninterpreted
|
|
input or no input <code>emma:interpretation</code>
|
|
<span>MUST</span> be empty. It MAY also contain <span>multiple
|
|
optional</span> <code>emma:derived-from</code>
|
|
element<span>s</span> and <span>an optional single</span>
|
|
<code>emma:info</code> <span>element</span>.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Attributes</th>
|
|
<td>
|
|
<ul>
|
|
<li><b>Required</b>: Attribute <code>id</code> of type
|
|
<code>xsd:ID</code> that uniquely identifies the interpretation
|
|
within the EMMA document.</li>
|
|
<li><b>Optional</b>: The annotation attributes:
|
|
<code>emma:tokens</code>, <code>emma:process</code>,
|
|
<code>emma:no-input</code>, <code>emma:uninterpreted</code>,
|
|
<code>emma:lang</code>, <code>emma:signal</code>,
|
|
<code><span>emma:signal-size</span></code>,
|
|
<code>emma:media-type</code>, <code>emma:confidence</code>,
|
|
<code>emma:source</code>, <code>emma:start</code>,
|
|
<code>emma:end</code>, <code>emma:time-ref-uri</code>,
|
|
<code>emma:time-ref-anchor-point</code>,
|
|
<code>emma:offset-to-start</code>, <code>emma:duration</code>,
|
|
<code>emma:medium</code>, <code>emma:mode</code>,
|
|
<code>emma:function</code>, <code>emma:verbal</code>,
|
|
<code>emma:cost</code>, <code>emma:grammar-ref</code>,
|
|
<code>emma:endpoint-info-ref</code>, <code>emma:model-ref</code>,
|
|
<code>emma:dialog-turn</code>.</li>
|
|
</ul>
|
|
</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td>The <code>emma:interpretation</code> element is legal only as a
|
|
child of <code>emma:emma</code>, <code>emma:group</code>,
|
|
<code>emma:one-of</code>, <code>emma:sequence</code>, or
|
|
<code>emma:derivation</code>.</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<p>The <code>emma:interpretation</code> element holds a single
|
|
interpretation represented in application specific markup, or a
|
|
single <code>emma:lattice</code> element, or a single
|
|
<code>emma:literal</code> element.</p>
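<p>For instance, when the application semantics is simply a string, the
interpretation might wrap it in <code>emma:literal</code> as in the
following sketch (the string value is illustrative):</p>
<pre class="example">
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation id="int1" emma:medium="acoustic" emma:mode="voice">
    <emma:literal>boston</emma:literal>
  </emma:interpretation>
</emma:emma>
</pre>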
|
|
<p>The <code>emma:interpretation</code> element MUST be empty if it
|
|
is marked with <code>emma:no-input="true"</code> <span>(<a href=
|
|
"#s4.2.3">Section 4.2.3</a>)</span>. The
|
|
<code>emma:interpretation</code> element <span>MUST</span> be empty
|
|
if it has been annotated with
|
|
<code>emma:uninterpreted="true"</code> <span>(<a href=
|
|
"#s4.2.4">Section 4.2.4</a>)</span> or
|
|
<code>emma:function="recording"</code> <span>(<a href=
|
|
"#s4.2.11">Section 4.2.11</a>)</span>.</p>
|
|
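<p>As a sketch of the empty-content rule, the first document below
reports that no input was received and the second reports input that
could not be interpreted. In both cases the
<code>emma:interpretation</code> element has no children; the
identifiers are illustrative.</p>
<pre class="example">
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation id="int1" emma:no-input="true"
      emma:medium="acoustic" emma:mode="voice"/>
</emma:emma>
</pre>
<pre class="example">
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation id="int1" emma:uninterpreted="true"
      emma:medium="acoustic" emma:mode="voice"/>
</emma:emma>
</pre>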
<p>Attributes:</p>
|
|
<ol>
|
|
<li><b>id</b> a REQUIRED <code>xsd:ID</code> value that uniquely
|
|
identifies the interpretation within the EMMA document.</li>
|
|
</ol>
|
|
<pre class="example">
|
|
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:interpretation id="r1" emma:medium="acoustic" emma:mode="voice">
|
|
...
|
|
</emma:interpretation>
|
|
</emma:emma>
|
|
</pre>
|
|
<p>While <code>emma:medium</code> and <code>emma:mode</code> are
|
|
optional on <code>emma:interpretation</code>, note that all EMMA
|
|
interpretations must be annotated for <code>emma:medium</code> and
|
|
<code>emma:mode</code>, so either these attributes must appear
|
|
directly on <code>emma:interpretation</code> or they must appear on
|
|
an ancestor <code>emma:one-of</code> node or they must appear on an
|
|
earlier stage of the derivation listed in
|
|
<code>emma:derivation</code>.</p>
|
|
<h3 id="s3.3">3.3 Container elements</h3>
|
|
<h3 id="s3.3.1">3.3.1 <code>emma:one-of</code> element</h3>
|
|
<table class="defn" summary="property definition" width="98%"
|
|
cellpadding="5" cellspacing="0">
|
|
<tbody>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:one-of</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>A container element indicating a disjunction among a collection
|
|
of mutually exclusive interpretations of the input.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Children</th>
|
|
<td>The <code>emma:one-of</code> element MUST immediately contain a
|
|
collection of one or more <code>emma:interpretation</code> elements
|
|
or container elements: <code>emma:one-of</code>,
|
|
<code>emma:group</code>, <code>emma:sequence</code>. It MAY also
|
|
contain <span>multiple optional</span>
|
|
<code>emma:derived-from</code> element<span>s</span> and <span>an
|
|
optional single</span> <code>emma:info</code>
|
|
<span>element</span>.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Attributes</th>
|
|
<td>
|
|
<ul>
|
|
<li><b>Required</b>:
|
|
<ul>
|
|
<li>Attribute <code>id</code> of type <code>xsd:ID</code></li>
|
|
<li>The attribute <code>disjunction-type</code> MUST be present if
|
|
<code>emma:one-of</code> is embedded within
|
|
<code>emma:one-of</code>. <span>The possible values of
|
|
<code>disjunction-type</code> are {<code>recognition</code>,
|
|
<code>understanding</code>, <code>multi-device</code>, and
|
|
<code>multi-process</code>}.</span></li>
|
|
</ul>
|
|
</li>
|
|
<li><b>Optional</b>:
|
|
<ul>
|
|
<li>On a single non-embedded <code>emma:one-of</code> the attribute
|
|
<code>disjunction-type</code> is optional.</li>
|
|
<li>The following annotation attributes are optional:
|
|
<code>emma:tokens</code>, <code>emma:process</code>,
|
|
<code>emma:lang</code>, <code>emma:signal</code>,
|
|
<code><span>emma:signal-size</span></code>,
|
|
<code>emma:media-type</code>, <code>emma:confidence</code>,
|
|
<code>emma:source</code>, <code>emma:start</code>,
|
|
<code>emma:end</code>, <code>emma:time-ref-uri</code>,
|
|
<code>emma:time-ref-anchor-point</code>,
|
|
<code>emma:offset-to-start</code>, <code>emma:duration</code>,
|
|
<code>emma:medium</code>, <code>emma:mode</code>,
|
|
<code>emma:function</code>, <code>emma:verbal</code>,
|
|
<code>emma:cost</code>, <code>emma:grammar-ref</code>,
|
|
<code>emma:endpoint-info-ref</code>, <code>emma:model-ref</code>,
|
|
<code>emma:dialog-turn</code>.</li>
|
|
</ul>
|
|
</li>
|
|
</ul>
|
|
</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td>The <code>emma:one-of</code> element MAY only appear as a child
|
|
of <code>emma:emma</code>, <code>emma:one-of</code>,
|
|
<code>emma:group</code>, <code>emma:sequence</code>, or
|
|
<code>emma:derivation</code>.</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<p>The <code>emma:one-of</code> element acts as a container for a
|
|
collection of one or more interpretation
|
|
(<code>emma:interpretation</code>) or container elements
|
|
(<code>emma:one-of</code>, <code>emma:group</code>,
|
|
<code>emma:sequence</code>), and denotes that these are mutually
|
|
exclusive interpretations.</p>
|
|
<p>An N-best list of choices in EMMA MUST be represented as a set
|
|
of <code>emma:interpretation</code> elements contained within an
|
|
<code>emma:one-of</code> element. For instance, a series of
|
|
different recognition results in speech recognition might be
|
|
represented in this way.</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:one-of id="r1" <span>emma:medium="acoustic" emma:mode="voice"</span>>
|
|
<emma:interpretation id="int1">
|
|
<origin>Boston</origin>
|
|
<destination>Denver</destination>
|
|
<date>03112003</date>
|
|
</emma:interpretation>
|
|
|
|
<emma:interpretation id="int2">
|
|
<origin>Austin</origin>
|
|
<destination>Denver</destination>
|
|
<date>03112003</date>
|
|
</emma:interpretation>
|
|
</emma:one-of>
|
|
</emma:emma>
|
|
</pre>
|
|
<p>The function of the <code>emma:one-of</code> element is to
|
|
represent a disjunctive list of possible interpretations of a user
|
|
input. A disjunction of possible interpretations of an input can be
|
|
the result of different kinds of processing or ambiguity. One
|
|
source is multiple results from a recognition technology such as
|
|
speech or handwriting recognition. Multiple results can also occur
|
|
from parsing or understanding natural language. Another possible
|
|
source of ambiguity is from the application of multiple different
|
|
kinds of recognition or understanding components to the same input
|
|
signal. For example, a single ink input signal might be processed
|
|
by both handwriting recognition and gesture recognition. Another is
|
|
the use of more than one recording device for the same input
|
|
(multiple microphones).</p>
|
|
<p>In order to make explicit these different kinds of multiple
|
|
interpretations and allow for concise statement of the annotations
|
|
associated with each, the <code>emma:one-of</code> element MAY
|
|
appear within another <code>emma:one-of</code> element. If
|
|
<code>emma:one-of</code> elements are nested then they MUST
|
|
indicate the kind of disjunction using the attribute
|
|
<code>disjunction-type</code>. The values of
|
|
<code>disjunction-type</code> are <code>{recognition,
|
|
understanding, multi-device, and multi-process}</code>. For the
|
|
most common use case, where there are multiple recognition results
|
|
and some of them have multiple interpretations, the top-level
|
|
<code>emma:one-of</code> is
|
|
<code>disjunction-type="recognition"</code> and the embedded
|
|
<code>emma:one-of</code> has the attribute
|
|
<code>disjunction-type="understanding"</code>.</p>
|
|
<p>As an example, suppose that in an interactive flight reservation
application recognition yielded 'Boston' or 'Austin', and each had a
semantic interpretation as either the assertion of a city name or the
specification of a flight query with the city as the destination.
This would be represented as follows in EMMA:</p>
|
|
<pre class="example">
|
|
<span>
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:one-of id="r1" disjunction-type="recognition"
|
|
emma:start="12457990" emma:end="12457995"
|
|
<span>emma:medium="acoustic" emma:mode="voice"</span>>
|
|
<emma:one-of id="r2" disjunction-type="understanding"
|
|
emma:tokens="boston">
|
|
<emma:interpretation id="int1">
|
|
<assert><city>boston</city></assert>
|
|
</emma:interpretation>
|
|
<emma:interpretation id="int2">
|
|
<flight><dest><city>boston</city></dest></flight>
|
|
</emma:interpretation>
|
|
</emma:one-of>
|
|
<emma:one-of id="r3" disjunction-type="understanding"
|
|
emma:tokens="austin">
|
|
<emma:interpretation id="int3">
|
|
<assert><city>austin</city></assert>
|
|
</emma:interpretation>
|
|
<emma:interpretation id="int4">
|
|
<flight><dest><city>austin</city></dest></flight>
|
|
</emma:interpretation>
|
|
</emma:one-of>
|
|
</emma:one-of>
|
|
</emma:emma>
|
|
</span>
|
|
</pre>
|
|
<p>EMMA MAY explicitly represent ambiguity resulting from different
|
|
processes, devices, or sources using embedded
|
|
<code>emma:one-of</code> and the <code>disjunction-type</code>
|
|
attribute. Multiple different interpretations resulting from
|
|
different factors MAY also be listed within a single unstructured
|
|
<code>emma:one-of</code>, though in this case it is harder or
|
|
impossible to uncover the sources of the ambiguity if required by
|
|
later stages of processing. If there is no embedding in
|
|
<code>emma:one-of</code>, then the <code>disjunction-type</code>
|
|
attribute is not required. If the <code>disjunction-type</code>
|
|
attribute is missing then by default the source of disjunction is
|
|
unspecified.</p>
|
|
<p>The example case above could also be represented as:</p>
|
|
<pre class="example">
|
|
<span>
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:one-of id="r1" emma:start="12457990" emma:end="12457995"
|
|
<span> emma:medium="acoustic" emma:mode="voice"</span>>
|
|
<emma:interpretation id="int1" emma:tokens="boston">
|
|
<assert><city>boston</city></assert>
|
|
</emma:interpretation>
|
|
<emma:interpretation id="int2" emma:tokens="boston">
|
|
<flight><dest><city>boston</city></dest></flight>
|
|
</emma:interpretation>
|
|
<emma:interpretation id="int3" emma:tokens="austin">
|
|
<assert><city>austin</city></assert>
|
|
</emma:interpretation>
|
|
<emma:interpretation id="int4" emma:tokens="austin">
|
|
<flight><dest><city>austin</city></dest></flight>
|
|
</emma:interpretation>
|
|
</emma:one-of>
|
|
</emma:emma>
|
|
</span>
|
|
</pre>
|
|
<p>But in this case information about which interpretations
|
|
resulted from speech recognition and which resulted from language
|
|
understanding is lost.</p>
|
|
<p>A list of <code>emma:interpretation</code> elements within an
|
|
<code>emma:one-of</code> MUST be sorted best-first by some measure
|
|
of quality. The quality measure is <code>emma:confidence</code> if
|
|
present; otherwise, the quality metric is platform-specific.</p>
|
|
<p>With embedded <code>emma:one-of</code> structures there is no
|
|
requirement for the confidence scores within different
|
|
<code>emma:one-of</code> to be on the same scale. For example, the
|
|
scores assigned by handwriting recognition might not be comparable
|
|
to those assigned by gesture recognition. Similarly, if multiple
|
|
recognizers are used there is no guarantee that their confidence
|
|
scores will be comparable. For this reason the ordering requirement
|
|
on <code>emma:interpretation</code> within <code>emma:one-of</code>
|
|
only applies locally to sibling <code>emma:interpretation</code>
|
|
elements within each <code>emma:one-of</code>. There is no
|
|
requirement on the ordering of embedded <code>emma:one-of</code>
|
|
elements within a higher <code>emma:one-of</code> element.</p>
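<p>The local nature of the ordering requirement can be sketched as
follows, assuming a single ink input processed by both a handwriting
recognizer and a gesture recognizer. The application elements
<code>text</code> and <code>gesture</code>, the identifiers, and the
confidence values are hypothetical. Each embedded
<code>emma:one-of</code> is sorted best-first internally, but the two
sets of scores are not assumed to be comparable, and the order of the
two embedded containers carries no ranking.</p>
<pre class="example">
<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <emma:one-of id="r1" disjunction-type="multi-process"
      emma:medium="tactile" emma:mode="ink">
    <!-- handwriting results, sorted best-first among themselves -->
    <emma:one-of id="hw" disjunction-type="recognition">
      <emma:interpretation id="hw1" emma:confidence="0.9">
        <text>go</text>
      </emma:interpretation>
      <emma:interpretation id="hw2" emma:confidence="0.4">
        <text>90</text>
      </emma:interpretation>
    </emma:one-of>
    <!-- gesture results, scored on a separate scale -->
    <emma:one-of id="gs" disjunction-type="recognition">
      <emma:interpretation id="gs1" emma:confidence="0.55">
        <gesture>circle</gesture>
      </emma:interpretation>
      <emma:interpretation id="gs2" emma:confidence="0.5">
        <gesture>loop</gesture>
      </emma:interpretation>
    </emma:one-of>
  </emma:one-of>
</emma:emma>
</pre>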
|
|
<p>While <code>emma:medium</code> and <code>emma:mode</code> are
|
|
optional on <code>emma:one-of</code>, note that all EMMA
|
|
interpretations must be annotated for <code>emma:medium</code> and
|
|
<code>emma:mode</code>, so either these annotations must appear
|
|
directly on all of the contained <code>emma:interpretation</code>
|
|
elements within the <code>emma:one-of</code>, or they must appear
|
|
on the <code>emma:one-of</code> element itself, or they must appear
|
|
on an ancestor <code>emma:one-of</code> element, or they must
|
|
appear on an earlier stage of the derivation listed in
|
|
<code>emma:derivation</code>.</p>
|
|
<h3 id="s3.3.2">3.3.2 <code>emma:group</code> element</h3>
|
|
<table class="defn" summary="property definition" width="98%"
|
|
cellpadding="5" cellspacing="0">
|
|
<tbody>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:group</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>A container element indicating that a number of interpretations
|
|
of distinct user inputs are grouped according to some
|
|
criteria.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Children</th>
|
|
<td>The <code>emma:group</code> element MUST immediately contain a
|
|
collection of one or more <code>emma:interpretation</code> elements
|
|
or container elements: <code>emma:one-of</code>,
|
|
<code>emma:group</code>, <code>emma:sequence</code>. It MAY also
|
|
contain an <span>optional single</span>
|
|
<code>emma:group-info</code> element. It MAY also contain
|
|
<span>multiple optional</span> <code>emma:derived-from</code>
|
|
element<span>s</span> and <span>an optional single</span>
|
|
<code>emma:info</code> <span>element</span>.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Attributes</th>
|
|
<td>
|
|
<ul>
|
|
<li><b>Required</b>: Attribute <code>id</code> of type
|
|
<code>xsd:ID</code></li>
|
|
<li><b>Optional</b>: The annotation attributes:
|
|
<code>emma:tokens</code>, <code>emma:process</code>,
|
|
<code>emma:lang</code>, <code>emma:signal</code>,
|
|
<code><span>emma:signal-size</span></code>,
|
|
<code>emma:media-type</code>, <code>emma:confidence</code>,
|
|
<code>emma:source</code>, <code>emma:start</code>,
|
|
<code>emma:end</code>, <code>emma:time-ref-uri</code>,
|
|
<code>emma:time-ref-anchor-point</code>,
|
|
<code>emma:offset-to-start</code>, <code>emma:duration</code>,
|
|
<code>emma:medium</code>, <code>emma:mode</code>,
|
|
<code>emma:function</code>, <code>emma:verbal</code>,
|
|
<code>emma:cost</code>, <code>emma:grammar-ref</code>,
|
|
<code>emma:endpoint-info-ref</code>, <code>emma:model-ref</code>,
|
|
<code>emma:dialog-turn</code>.</li>
|
|
</ul>
|
|
</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td>The <code>emma:group</code> element is legal only as a child of
|
|
<code>emma:emma</code>, <code>emma:one-of</code>,
|
|
<code>emma:group</code>, <code>emma:sequence</code>, or
|
|
<code>emma:derivation</code>.</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<p>The <code>emma:group</code> element is used to indicate that the
|
|
contained interpretations are from distinct user inputs that are
|
|
related in some manner. <code>emma:group</code> MUST NOT be used
|
|
for containing the multiple stages of processing of a single user
|
|
input. Those MUST be contained in the <code>emma:derivation</code>
|
|
element instead <span>(<a href="#s4.1.2">Section 4.1.2</a>)</span>.
|
|
For groups of inputs in temporal order the more specialized
|
|
container <code>emma:sequence</code> MUST be used <span>(<a href=
|
|
"#s3.3.3">Section 3.3.3</a>)</span>. The following example shows
|
|
three interpretations derived from the speech input "Move this
|
|
ambulance here" and the tactile input related to two consecutive
|
|
points on a map.</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:group id="grp"
|
|
emma:start="1087995961542"
|
|
emma:end="1087995964542">
|
|
<emma:interpretation id="int1"
|
|
<span>emma:medium="acoustic" emma:mode="voice"</span>>
|
|
<action>move</action>
|
|
<object>ambulance</object>
|
|
<destination>here</destination>
|
|
</emma:interpretation>
|
|
|
|
<emma:interpretation id="int2"
|
|
<span>emma:medium="tactile" emma:mode="ink"</span>>
|
|
<x>0.253</x>
|
|
<y>0.124</y>
|
|
</emma:interpretation>
|
|
|
|
<emma:interpretation id="int3"
|
|
<span>emma:medium="tactile" emma:mode="ink"</span>>
|
|
<x>0.866</x>
|
|
<y>0.724</y>
|
|
</emma:interpretation>
|
|
</emma:group>
|
|
</emma:emma>
|
|
|
|
</pre>
|
|
<p>The <code>emma:one-of</code> and <code>emma:group</code>
|
|
containers MAY be nested arbitrarily.</p>
|
|
<h4 id="s3.3.2.1">3.3.2.1 Indirect grouping criteria:
|
|
<code>emma:group-info</code> element</h4>
|
|
<table class="defn" summary="property definition" width="98%"
|
|
cellpadding="5" cellspacing="0">
|
|
<tbody>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:group-info</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>The <code>emma:group-info</code> element contains or references
|
|
criteria used in establishing the grouping of interpretations in an
|
|
<code>emma:group</code> element.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Children</th>
|
|
<td>The <code>emma:group-info</code> element MUST either
|
|
immediately contain inline instance data specifying grouping
|
|
criteria or have the attribute <code>ref</code> referencing the
|
|
criteria.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Attributes</th>
|
|
<td>
|
|
<ul>
|
|
<li><b>Optional</b>: <code>ref</code> of type
|
|
<code>xsd:anyURI</code> referencing the grouping criteria;
|
|
alternatively the criteria MAY be provided inline as the content of
|
|
the <code>emma:group-info</code> element.</li>
|
|
</ul>
|
|
</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td>The <code>emma:group-info</code> element is legal only as a
|
|
child of <code>emma:group</code>.</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<p>Sometimes it may be convenient to indirectly associate a given
|
|
group with information, such as grouping criteria. The
|
|
<code>emma:group-info</code> element might be used to make explicit
|
|
the criteria by which members of a group are associated. In the
|
|
following example, a group of two points is associated with a
|
|
description of grouping criteria based upon a sliding temporal
|
|
window of two seconds duration.</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example"
|
|
xmlns:ex="http://www.example.com/ns/group">
|
|
<emma:group id="grp">
|
|
<emma:group-info>
|
|
<ex:mode>temporal</ex:mode>
|
|
<ex:duration>2s</ex:duration>
|
|
</emma:group-info>
|
|
|
|
<emma:interpretation id="int1"
|
|
      <span>emma:medium="tactile" emma:mode="ink"</span>>
|
|
<x>0.253</x>
|
|
<y>0.124</y>
|
|
</emma:interpretation>
|
|
|
|
<emma:interpretation id="int2"
|
|
<span>emma:medium="tactile" emma:mode="ink"</span>>
|
|
<x>0.866</x>
|
|
<y>0.724</y>
|
|
</emma:interpretation>
|
|
</emma:group>
|
|
</emma:emma>
|
|
</pre>
|
|
<p>You might also use <code>emma:group-info</code> to refer to a
|
|
named grouping criterion using an external reference, for
|
|
instance:</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example"
|
|
xmlns:ex="http://www.example.com/ns/group">
|
|
<emma:group id="grp">
|
|
<emma:group-info ref="http://www.example.com/criterion42"/>
|
|
<emma:interpretation id="int1"
|
|
<span>emma:medium="tactile" emma:mode="ink"</span>>
|
|
<x>0.253</x>
|
|
<y>0.124</y>
|
|
</emma:interpretation>
|
|
|
|
<emma:interpretation id="int2"
|
|
<span>emma:medium="tactile" emma:mode="ink"</span>>
|
|
<x>0.866</x>
|
|
<y>0.724</y>
|
|
</emma:interpretation>
|
|
</emma:group>
|
|
</emma:emma>
|
|
</pre>
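<p>As a non-normative illustration, a consumer could tell the two forms of <code>emma:group-info</code> shown above apart roughly as follows. This is a minimal Python sketch using the standard <code>xml.etree.ElementTree</code> module. The function name <code>grouping_criteria</code> and its return convention are hypothetical, and the fragments omit the interpretations that a real <code>emma:group</code> contains.</p>

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"

# Fragments reduced to the emma:group-info part of the examples above.
INLINE = """<emma:group id="grp"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:ex="http://www.example.com/ns/group">
  <emma:group-info>
    <ex:mode>temporal</ex:mode>
    <ex:duration>2s</ex:duration>
  </emma:group-info>
</emma:group>"""

BY_REF = """<emma:group id="grp"
    xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:group-info ref="http://www.example.com/criterion42"/>
</emma:group>"""


def grouping_criteria(group):
    """Return ("ref", uri), ("inline", [elements]) or None for an emma:group."""
    info = group.find(f"{{{EMMA_NS}}}group-info")
    if info is None:
        return None  # emma:group-info is optional on emma:group
    ref = info.get("ref")  # the ref attribute is not namespace-qualified
    if ref is not None:
        return ("ref", ref)
    return ("inline", list(info))  # inline criteria are the child elements


kind, children = grouping_criteria(ET.fromstring(INLINE))
print(kind, [child.text for child in children])
print(grouping_criteria(ET.fromstring(BY_REF)))
```

<p>How the criteria themselves are interpreted (here the <code>ex:mode</code> and <code>ex:duration</code> elements) is left to the application.</p>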
|
|
<h3 id="s3.3.3">3.3.3 <code>emma:sequence</code> element</h3>
|
|
<table class="defn" summary="property definition" width="98%"
|
|
cellpadding="5" cellspacing="0">
|
|
<tbody>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:sequence</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>A container element indicating that a number of interpretations
|
|
of distinct user inputs are in temporal sequence.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Children</th>
|
|
<td>The <code>emma:sequence</code> element MUST immediately contain
|
|
a collection of one or more <code>emma:interpretation</code>
|
|
elements or container elements: <code>emma:one-of</code>,
|
|
<code>emma:group</code>, <code>emma:sequence</code>. It MAY also
|
|
contain <span>multiple optional</span>
|
|
<code>emma:derived-from</code> element<span>s</span> and <span>an
|
|
optional single</span> <code>emma:info</code>
|
|
<span>element</span>.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Attributes</th>
|
|
<td>
|
|
<ul>
|
|
<li><b>Required</b>: Attribute <code>id</code> of type
|
|
<code>xsd:ID</code></li>
|
|
<li><b>Optional</b>: The annotation attributes:
|
|
<code>emma:tokens</code>, <code>emma:process</code>,
|
|
<code>emma:lang</code>, <code>emma:signal</code>,
|
|
<code><span>emma:signal-size</span></code>,
|
|
<code>emma:media-type</code>, <code>emma:confidence</code>,
|
|
<code>emma:source</code>, <code>emma:start</code>,
|
|
<code>emma:end</code>, <code>emma:time-ref-uri</code>,
|
|
<code>emma:time-ref-anchor-point</code>,
|
|
<code>emma:offset-to-start</code>, <code>emma:duration</code>,
|
|
<code>emma:medium</code>, <code>emma:mode</code>,
|
|
<code>emma:function</code>, <code>emma:verbal</code>,
|
|
<code>emma:cost</code>, <code>emma:grammar-ref</code>,
|
|
<code>emma:endpoint-info-ref</code>, <code>emma:model-ref</code>,
|
|
<code>emma:dialog-turn</code>.</li>
|
|
</ul>
|
|
</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td>The <code>emma:sequence</code> element is legal only as a child
|
|
of <code>emma:emma</code>, <code>emma:one-of</code>,
|
|
<code>emma:group</code>, <code>emma:sequence</code>, or
|
|
<code>emma:derivation</code>.</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<p>The <code>emma:sequence</code> element is used to indicate that
|
|
the contained interpretations are sequential in time, as in the
|
|
following example, which indicates that two points made with a pen
|
|
are in temporal order.</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:sequence id="seq1">
|
|
<emma:interpretation id="int1"
|
|
<span>emma:medium="tactile"</span> emma:mode="ink">
|
|
<x>0.253</x>
|
|
<y>0.124</y>
|
|
</emma:interpretation>
|
|
|
|
<emma:interpretation id="int2"
|
|
<span>emma:medium="tactile"</span> emma:mode="ink">
|
|
<x>0.866</x>
|
|
<y>0.724</y>
|
|
</emma:interpretation>
|
|
</emma:sequence>
|
|
</emma:emma>
|
|
</pre>
|
|
<p>The <code>emma:sequence</code> container MAY be combined with
|
|
<code>emma:one-of</code> and <code>emma:group</code> in arbitrary
|
|
nesting structures. The order of children in the content of the
|
|
<code>emma:sequence</code> element corresponds to a sequence of
|
|
interpretations. This ordering does not imply any particular
|
|
definition of sequentiality. EMMA processors are expected therefore
|
|
to use the <code>emma:sequence</code> element to hold
|
|
interpretations which are either strictly sequential in nature
|
|
(e.g. the end-time of an interpretation precedes the start-time of
|
|
its follower), or which overlap in some manner (e.g. the start-time
|
|
of a follower interpretation precedes the end-time of its
|
|
precedent). It is possible to use timestamps to provide fine-grained
annotation for the sequence of interpretations that are
sequential in time <span>(see <a href="#s4.2.10">Section
4.2.10</a>)</span>.</p>
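<p>The two cases described above (strictly sequential versus overlapping) can be distinguished when the members of an <code>emma:sequence</code> carry <code>emma:start</code> and <code>emma:end</code>. The following Python sketch is non-normative. The timestamps, the third interpretation, the function name <code>classify_neighbours</code> and the labels it returns are all assumptions made for illustration, and the sketch assumes every child carries both timestamps.</p>

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"
START = f"{{{EMMA_NS}}}start"
END = f"{{{EMMA_NS}}}end"

# Hypothetical timestamps (milliseconds), added for illustration only.
SEQUENCE = """<emma:sequence id="seq1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <emma:interpretation id="int1" emma:medium="tactile" emma:mode="ink"
      emma:start="1087995961542" emma:end="1087995962042">
    <x>0.253</x><y>0.124</y>
  </emma:interpretation>
  <emma:interpretation id="int2" emma:medium="tactile" emma:mode="ink"
      emma:start="1087995962542" emma:end="1087995963042">
    <x>0.866</x><y>0.724</y>
  </emma:interpretation>
  <emma:interpretation id="int3" emma:medium="tactile" emma:mode="ink"
      emma:start="1087995962900" emma:end="1087995963500">
    <x>0.500</x><y>0.500</y>
  </emma:interpretation>
</emma:sequence>"""


def classify_neighbours(sequence):
    """Label each adjacent pair of children as strict, overlap or touching."""
    children = list(sequence)  # document order is the sequence order
    result = []
    for prev, nxt in zip(children, children[1:]):
        prev_end = int(prev.attrib[END])
        next_start = int(nxt.attrib[START])
        if prev_end < next_start:
            relation = "strict"      # end-time precedes follower's start-time
        elif next_start < prev_end:
            relation = "overlap"     # follower starts before precedent ends
        else:
            relation = "touching"    # equal times; label is this sketch's own
        result.append((prev.get("id"), nxt.get("id"), relation))
    return result


for pair in classify_neighbours(ET.fromstring(SEQUENCE)):
    print(pair)
```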
|
|
<p>In the following more complex example, a sequence of two pen
|
|
gestures in <code>emma:sequence</code> and a speech input in
|
|
<code>emma:interpretation</code> <span>are</span> contained in an
|
|
<code>emma:group</code>.</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:group id="grp">
|
|
<emma:interpretation id="int1" emma:medium="acoustic"
|
|
emma:mode="voice">
|
|
<action>move</action>
|
|
<object>this-battleship</object>
|
|
<destination>here</destination>
|
|
</emma:interpretation>
|
|
|
|
<emma:sequence id="seq1">
|
|
<emma:interpretation id="int2" emma:medium="tactile"
|
|
emma:mode="ink">
|
|
<x>0.253</x>
|
|
<y>0.124</y>
|
|
</emma:interpretation>
|
|
|
|
<emma:interpretation id="int3" emma:medium="tactile"
|
|
emma:mode="ink">
|
|
<x>0.866</x>
|
|
<y>0.724</y>
|
|
</emma:interpretation>
|
|
</emma:sequence>
|
|
</emma:group>
|
|
</emma:emma>
|
|
</pre>
|
|
<h3 id="s3.4">3.4 Lattice element</h3>
|
|
<p>In addition to providing the ability to represent N-best lists
|
|
of interpretations using <code>emma:one-of</code>, EMMA also
|
|
provides the capability to represent lattices of words or other
|
|
symbols using the <code>emma:lattice</code> element. Lattices
|
|
provide a compact representation of large lists of possible
|
|
recognition results or interpretations for speech, pen, or
|
|
multimodal inputs.</p>
|
|
<p>In addition to providing a representation for lattice output
|
|
from speech recognition, another important use case for lattices is
|
|
for representation of the results of gesture and handwriting
|
|
recognition from a pen modality component. Lattices can also be
|
|
used to compactly represent multiple possible meaning
|
|
representations. Another use case for the lattice representation is
|
|
for associating confidence scores and other annotations with
|
|
individual words within a speech recognition result string.</p>
|
|
<p>Lattices are compactly described by a list of transitions
|
|
between nodes. For each transition the start and end nodes MUST be
|
|
defined, along with the label for the transition. Initial and final
|
|
nodes MUST also be indicated. The following figure provides a
|
|
graphical representation of a speech recognition lattice which
|
|
compactly represents eight different sequences of words.</p>
|
|
<p><img alt="speech lattice" src="lattice.png" /></p>
|
|
<p>which expands to:</p>
|
|
<pre>
|
|
a. flights to boston from portland today please
|
|
b. flights to austin from portland today please
|
|
c. flights to boston from oakland today please
|
|
d. flights to austin from oakland today please
|
|
e. flights to boston from portland tomorrow
|
|
f. flights to austin from portland tomorrow
|
|
g. flights to boston from oakland tomorrow
|
|
h. flights to austin from oakland tomorrow
|
|
</pre>
|
|
<h4 id="s3.4.1">3.4.1 Lattice markup: <code>emma:lattice</code>,
|
|
<code>emma:arc</code>, <code>emma:node</code> elements</h4>
|
|
<table class="defn" summary="property definition" width="98%"
|
|
cellpadding="5" cellspacing="0">
|
|
<tbody>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:lattice</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>An element which encodes a lattice representation of user
|
|
input.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Children</th>
|
|
<td>The <code>emma:lattice</code> element MUST immediately contain
|
|
one or more <code>emma:arc</code> elements and zero or more
|
|
<code>emma:node</code> elements.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Attributes</th>
|
|
<td>
|
|
<ul>
|
|
<li><b>Required</b>:
|
|
<ul>
|
|
<li><code>initial</code> <span>of type
|
|
<code>xsd:nonNegativeInteger</code></span> indicating the number of
|
|
the initial node of the lattice.</li>
|
|
<li><code>final</code> contains a space-separated list of
|
|
<code>xsd:nonNegativeInteger</code> indicating the numbers of the
|
|
final nodes in the lattice.</li>
|
|
</ul>
|
|
</li>
|
|
<li><b>Optional</b>: <code>emma:time-ref-uri</code>,
|
|
<code>emma:time-ref-anchor-point</code>.</li>
|
|
</ul>
|
|
</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td>The <code>emma:lattice</code> element is legal only as a child
|
|
of the <code>emma:interpretation</code> element.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:arc</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>An element which encodes a transition between two nodes in a
|
|
lattice. The label associated with the arc in the lattice is
|
|
represented in the content of <code>emma:arc</code>.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Children</th>
|
|
<td>The <code>emma:arc</code> element MUST immediately contain
|
|
either character data or a single application namespace element or
|
|
be empty, in the case of epsilon transitions. It MAY contain an
|
|
<code>emma:info</code> element containing application or vendor
|
|
specific annotations.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Attributes</th>
|
|
<td>
|
|
<ul>
|
|
<li><b>Required</b>:
|
|
<ul>
|
|
<li><code>from</code> <span>of type
|
|
<code>xsd:nonNegativeInteger</code></span> indicating the number of
|
|
the starting node for the arc.</li>
|
|
<li><code>to</code> <span>of type
|
|
<code>xsd:nonNegativeInteger</code></span> indicating the number of
|
|
the ending node for the arc.</li>
|
|
</ul>
|
|
</li>
|
|
<li><b>Optional</b>: <code>emma:start</code>,
|
|
<code>emma:end</code>, <code>emma:offset-to-start</code>,
|
|
<code>emma:duration</code>, <code>emma:confidence</code>,
|
|
<code>emma:cost</code>, <code>emma:lang</code>,
|
|
<code>emma:medium</code>, <code>emma:mode</code>,
|
|
<code>emma:source</code>.</li>
|
|
</ul>
|
|
</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td>The <code>emma:arc</code> element is legal only as a child of
|
|
the <code>emma:lattice</code> element.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:node</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>An element which represents a node in the lattice. The
|
|
<code>emma:node</code> elements are not required to describe a
|
|
lattice but might be added to provide a location for annotations on
|
|
nodes in a lattice. There MUST be at most one
|
|
<code>emma:node</code> specification for each numbered node in the
|
|
lattice.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Children</th>
|
|
<td>An OPTIONAL <code>emma:info</code> element for application or
|
|
vendor specific annotations on the node.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Attributes</th>
|
|
<td>
|
|
<ul>
|
|
<li><b>Required</b>:
|
|
<ul>
|
|
<li><code>node-number</code> <span>of type
|
|
<code>xsd:nonNegativeInteger</code></span> indicating the
|
|
<span>node number</span> in the lattice.</li>
|
|
</ul>
|
|
</li>
|
|
<li><b>Optional</b>: <code>emma:confidence</code>,
|
|
<code>emma:cost</code>.</li>
|
|
</ul>
|
|
</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td>The <code>emma:node</code> element is legal only as a child of
|
|
the <code>emma:lattice</code> element.</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<p>In EMMA, a lattice is represented using an element
|
|
<code>emma:lattice</code>, which has attributes
|
|
<code>initial</code> and <code>final</code> for indicating the
|
|
initial and final nodes of the lattice. For the lattice
|
|
<span>below</span>, this will be: <code><emma:lattice
|
|
initial="1" final="8"/></code>. The nodes are numbered with
|
|
integers. If there is more than one distinct final node in the
|
|
lattice the nodes MUST be represented as a space separated list in
|
|
the value of the <code>final</code> attribute e.g.
|
|
<code><emma:lattice initial="1" final="9 10 23"/></code>.
|
|
There MUST only be one initial node in an EMMA lattice. Each
|
|
transition in the lattice is represented as an element
|
|
<code>emma:arc</code> with attributes <code>from</code> and
|
|
<code>to</code> which indicate the nodes where the transition
|
|
starts and ends. The arc's label is represented as the content of
|
|
the <code>emma:arc</code> element and MUST be any well-formed
|
|
character or XML content. In the example here the contents are
|
|
words. Empty (epsilon) transitions in a lattice MUST be represented
|
|
in the <code>emma:lattice</code> representation as
|
|
<code>emma:arc</code> <span>empty</span> elements, e.g.
|
|
<code><emma:arc from="1" to="8"/></code>.</p>
|
|
<p>The example speech lattice above would be represented in EMMA
|
|
markup as follows:</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:interpretation id="interp1"
|
|
<span>emma:medium="acoustic" emma:mode="voice"</span>>
|
|
<emma:lattice initial="1" final="8">
|
|
<emma:arc from="1" to="2">flights</emma:arc>
|
|
|
|
<emma:arc from="2" to="3">to</emma:arc>
|
|
<emma:arc from="3" to="4">boston</emma:arc>
|
|
<emma:arc from="3" to="4">austin</emma:arc>
|
|
<emma:arc from="4" to="5">from</emma:arc>
|
|
|
|
<emma:arc from="5" to="6">portland</emma:arc>
|
|
<emma:arc from="5" to="6">oakland</emma:arc>
|
|
<emma:arc from="6" to="7">today</emma:arc>
|
|
<emma:arc from="7" to="8">please</emma:arc>
|
|
|
|
<emma:arc from="6" to="8">tomorrow</emma:arc>
|
|
</emma:lattice>
|
|
</emma:interpretation>
|
|
</emma:emma>
|
|
</pre>
|
|
<p>Alternatively, if we wish to represent the same information as
|
|
an N-best list using <code>emma:one-of</code>, we would have the
|
|
more verbose representation:</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:one-of id="nbest1" <span>emma:medium="acoustic" emma:mode="voice"</span>>
|
|
<emma:interpretation id="interp1">
|
|
<text>flights to boston from portland today please</text>
|
|
</emma:interpretation>
|
|
|
|
    <emma:interpretation id="interp2">
|
|
<text>flights to boston from portland tomorrow</text>
|
|
</emma:interpretation>
|
|
|
|
<emma:interpretation id="interp3">
|
|
<text>flights to austin from portland today please</text>
|
|
</emma:interpretation>
|
|
|
|
<emma:interpretation id="interp4">
|
|
<text>flights to austin from portland tomorrow</text>
|
|
</emma:interpretation>
|
|
|
|
<emma:interpretation id="interp5">
|
|
<text>flights to boston from oakland today please</text>
|
|
</emma:interpretation>
|
|
|
|
<emma:interpretation id="interp6">
|
|
<text>flights to boston from oakland tomorrow</text>
|
|
</emma:interpretation>
|
|
|
|
<emma:interpretation id="interp7">
|
|
<text>flights to austin from oakland today please</text>
|
|
</emma:interpretation>
|
|
|
|
<emma:interpretation id="interp8">
|
|
<text>flights to austin from oakland tomorrow</text>
|
|
</emma:interpretation>
|
|
</emma:one-of>
|
|
</emma:emma>
|
|
</pre>
|
|
<p>The lattice representation avoids the need to enumerate all of
|
|
the possible word sequences. Also, as detailed below, the
|
|
<code>emma:lattice</code> representation enables placement of
|
|
annotations on individual words in the input.</p>
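<p>To make the relationship between the two representations concrete, the following non-normative Python sketch expands the arcs of the speech lattice into the word sequences it encodes. The function name <code>expand</code> is hypothetical, and the sketch assumes an acyclic lattice whose arcs contain only character data.</p>

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"

LATTICE = """<emma:lattice initial="1" final="8"
    xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:arc from="1" to="2">flights</emma:arc>
  <emma:arc from="2" to="3">to</emma:arc>
  <emma:arc from="3" to="4">boston</emma:arc>
  <emma:arc from="3" to="4">austin</emma:arc>
  <emma:arc from="4" to="5">from</emma:arc>
  <emma:arc from="5" to="6">portland</emma:arc>
  <emma:arc from="5" to="6">oakland</emma:arc>
  <emma:arc from="6" to="7">today</emma:arc>
  <emma:arc from="7" to="8">please</emma:arc>
  <emma:arc from="6" to="8">tomorrow</emma:arc>
</emma:lattice>"""


def expand(lattice):
    """Return every label sequence from the initial node to a final node."""
    initial = int(lattice.get("initial"))
    finals = {int(n) for n in lattice.get("final").split()}

    # Index the arcs by their starting node.
    outgoing = {}
    for arc in lattice.findall(f"{{{EMMA_NS}}}arc"):
        label = (arc.text or "").strip()  # empty for epsilon transitions
        outgoing.setdefault(int(arc.get("from")), []).append(
            (int(arc.get("to")), label))

    paths = []

    def walk(node, words):
        if node in finals:
            paths.append(" ".join(words))
        for target, label in outgoing.get(node, []):
            walk(target, words + ([label] if label else []))

    walk(initial, [])
    return paths


sentences = expand(ET.fromstring(LATTICE))
print(len(sentences))  # 8
for sentence in sentences:
    print(sentence)
```

<p>Ten arcs are enough here to encode the eight word sequences that the N-best list spells out in full.</p>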
|
|
<p>For use cases involving the representation of gesture/ink
|
|
lattices and use cases involving lattices of semantic
|
|
interpretations, EMMA allows for application namespace elements to
|
|
appear within <code>emma:arc</code>.</p>
|
|
<p>For example a sequence of two gestures, each of which is
|
|
recognized as either a line or a circle<span>,</span> might be
|
|
represented as follows:</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:interpretation id="interp1"
|
|
      <span>emma:medium="tactile" emma:mode="ink"</span>>
|
|
<emma:lattice initial="1" final="3">
|
|
<emma:arc from="1" to="2">
|
|
<circle radius="100"/>
|
|
</emma:arc>
|
|
<emma:arc from="2" to="3">
|
|
<line length="628"/>
|
|
</emma:arc>
|
|
<emma:arc from="1" to="2">
|
|
<circle radius="200"/>
|
|
</emma:arc>
|
|
<emma:arc from="2" to="3">
|
|
<line length="1256"/>
|
|
</emma:arc>
|
|
</emma:lattice>
|
|
</emma:interpretation>
|
|
</emma:emma>
|
|
</pre>
|
|
<p>As an example of a lattice of semantic interpretations, in a
|
|
travel application where the source is either "Boston" or
"Austin" and the destination is either "Newark" or "New York", the
|
|
possibilities might be represented in a lattice as follows:</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:interpretation id="interp1"
|
|
<span>emma:medium="acoustic" emma:mode="voice"</span>>
|
|
<emma:lattice initial="1" final="3">
|
|
<emma:arc from="1" to="2">
|
|
<source city="boston"/>
|
|
</emma:arc>
|
|
<emma:arc from="2" to="3">
|
|
<destination city="newark"/>
|
|
</emma:arc>
|
|
<emma:arc from="1" to="2">
|
|
<source city="austin"/>
|
|
</emma:arc>
|
|
<emma:arc from="2" to="3">
|
|
<destination city="new york"/>
|
|
</emma:arc>
|
|
</emma:lattice>
|
|
</emma:interpretation>
|
|
</emma:emma>
|
|
</pre>
|
|
<p>The <code>emma:arc</code> element MAY contain either an
|
|
application namespace element or character data. It MUST NOT
|
|
contain combinations of application namespace elements and
|
|
character data. However, an <code>emma:info</code> element MAY
|
|
appear within an <code>emma:arc</code> element alongside character
|
|
data, in order to allow for the association of vendor or
|
|
application specific annotations on a single word or symbol in a
|
|
lattice.</p>
|
|
<p>So, in summary, there are four groupings of content that can
|
|
appear within <code>emma:arc</code>:</p>
|
|
<ul>
|
|
<li>Character Data e.g. a recognized word in a speech lattice.</li>
|
|
<li>Character Data and a single <code>emma:info</code> element
|
|
providing vendor or application specific annotations that apply to
|
|
the character data.</li>
|
|
<li>An application namespace element e.g. the gesture and
|
|
<span>semantic interpretation</span> lattice examples above.</li>
|
|
<li>An application namespace element and a single
|
|
<code>emma:info</code> element providing vendor or application
|
|
specific annotations that apply to the element.</li>
|
|
</ul>
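<p>The content rules summarized in the list above can be checked mechanically. The following Python sketch is a non-normative illustration. The names <code>classify_arc</code> and <code>parse_arc</code> are hypothetical, and reporting an empty arc as <code>"epsilon"</code> is this sketch's own labelling.</p>

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"
EMMA_PREFIX = "{" + EMMA_NS + "}"


def classify_arc(arc):
    """Return (kind, has_info) where kind is 'text', 'element' or 'epsilon'.

    Raises ValueError for content the prose above rules out, such as
    character data mixed with an application namespace element.
    """
    info = [c for c in arc if c.tag == EMMA_PREFIX + "info"]
    app = [c for c in arc if not c.tag.startswith(EMMA_PREFIX)]
    other_emma = len(arc) - len(info) - len(app)
    # Character data may sit before the first child or after any child.
    text = (arc.text or "") + "".join(c.tail or "" for c in arc)
    has_text = bool(text.strip())
    if other_emma or len(info) > 1 or len(app) > 1 or (has_text and app):
        raise ValueError("content not allowed in emma:arc")
    kind = "element" if app else "text" if has_text else "epsilon"
    return kind, bool(info)


def parse_arc(body):
    """Wrap illustrative content in an emma:arc element and parse it."""
    return ET.fromstring(
        '<emma:arc from="1" to="2" xmlns:emma="%s" '
        'xmlns="http://www.example.com/example">%s</emma:arc>'
        % (EMMA_NS, body))


print(classify_arc(parse_arc("flights")))                 # ('text', False)
print(classify_arc(parse_arc('<circle radius="100"/>')))  # ('element', False)
print(classify_arc(parse_arc("")))                        # ('epsilon', False)
```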
|
|
<h4 id="s3.4.2">3.4.2 Annotations on lattices</h4>
|
|
<p>The encoding of lattice arcs as XML elements
|
|
(<code>emma:arc</code>) enables arcs to be annotated with metadata
|
|
such as timestamps, costs, or confidence scores:</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:interpretation id="interp1"
|
|
<span>emma:medium="acoustic" emma:mode="voice"</span>>
|
|
<emma:lattice initial="1" final="8">
|
|
<emma:arc
|
|
from="1"
|
|
to="2"
|
|
emma:start="1087995961542"
|
|
emma:end="1087995962042"
|
|
emma:cost="30">
|
|
flights
|
|
</emma:arc>
|
|
|
|
<emma:arc
|
|
from="2"
|
|
to="3"
|
|
emma:start="1087995962042"
|
|
emma:end="1087995962542"
|
|
emma:cost="20">
|
|
to
|
|
</emma:arc>
|
|
|
|
<emma:arc
|
|
from="3"
|
|
to="4"
|
|
emma:start="1087995962542"
|
|
emma:end="1087995963042"
|
|
emma:cost="50">
|
|
boston
|
|
</emma:arc>
|
|
|
|
<emma:arc
|
|
from="3"
|
|
to="4"
|
|
emma:start="1087995963042"
|
|
emma:end="1087995963742"
|
|
emma:cost="60">
|
|
austin
|
|
</emma:arc>
|
|
...
|
|
</emma:lattice>
|
|
</emma:interpretation>
|
|
</emma:emma>
|
|
</pre>
|
|
<p>The following EMMA attributes MAY be placed on
|
|
<code>emma:arc</code> elements: absolute timestamps
|
|
(<code>emma:start</code>, <code>emma:end</code>), relative
|
|
timestamps (<code>emma:offset-to-start</code>,
|
|
<code>emma:duration</code>), <code>emma:confidence</code>,
|
|
<code>emma:cost</code>, the human language of the input
|
|
(<code>emma:lang</code>), <code>emma:medium</code>,
|
|
<code>emma:mode</code>, and <code>emma:source</code>. The use case
|
|
for <code>emma:medium</code>, <code>emma:mode</code>, and
|
|
<code>emma:source</code> is for lattices which contain content
|
|
from different input modes. The <code>emma:arc</code> element MAY
|
|
also contain an <code>emma:info</code> element for specification of
|
|
vendor and application specific annotations on the arc.</p>
|
|
<p>The timestamps that appear on <code>emma:arc</code> elements do
|
|
not necessarily indicate the start and end of the arc itself. They
|
|
MAY indicate the start and end of the signal corresponding to the
|
|
label on the arc. As a result there is no requirement that the
|
|
<code>emma:end</code> timestamp on an arc going into a node should
|
|
be equivalent to the <code>emma:start</code> of all arcs going out
|
|
of that node. Furthermore there is no guarantee that the left to
|
|
right order of arcs in a lattice will correspond to the temporal
|
|
order of the input signal. The lattice representation is an
|
|
abstraction that represents a range of possible interpretations of
|
|
a user's input and is not intended to necessarily be a
|
|
representation of temporal order.</p>
|
|
<p>Costs are typically application and device dependent. There are
|
|
a variety of ways that individual arc costs might be combined to
|
|
produce costs for specific paths through the lattice. This
|
|
specification does not standardize the way for these costs to be
|
|
combined; it is up to the applications and devices to determine how
|
|
such derived costs would be computed and used.</p>
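<p>Purely as an illustration of one possible choice, the following Python sketch sums arc costs along each path. This specification does not prescribe summation; the additive combination, the treatment of a missing <code>emma:cost</code> as zero, the assumption that a lower total is preferable, and the function name <code>additive_path_costs</code> are all assumptions of the sketch. The lattice is cut down to the four arcs shown in the example above, with node 4 treated as final for that reason only.</p>

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"

# Only the four arcs spelled out in the example; final="4" is an
# assumption made because the remaining arcs are elided there.
LATTICE = """<emma:lattice initial="1" final="4"
    xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:arc from="1" to="2" emma:cost="30">flights</emma:arc>
  <emma:arc from="2" to="3" emma:cost="20">to</emma:arc>
  <emma:arc from="3" to="4" emma:cost="50">boston</emma:arc>
  <emma:arc from="3" to="4" emma:cost="60">austin</emma:arc>
</emma:lattice>"""


def additive_path_costs(lattice):
    """Map each label sequence to the sum of its arc costs (one option of many)."""
    initial = int(lattice.get("initial"))
    finals = {int(n) for n in lattice.get("final").split()}
    outgoing = {}
    for arc in lattice.findall(f"{{{EMMA_NS}}}arc"):
        label = (arc.text or "").strip()
        cost = float(arc.get(f"{{{EMMA_NS}}}cost", "0"))  # missing cost -> 0
        outgoing.setdefault(int(arc.get("from")), []).append(
            (int(arc.get("to")), label, cost))

    totals = {}

    def walk(node, words, total):
        if node in finals:
            totals[" ".join(words)] = total
        for target, label, cost in outgoing.get(node, []):
            walk(target, words + [label], total + cost)

    walk(initial, [], 0.0)
    return totals


costs = additive_path_costs(ET.fromstring(LATTICE))
best = min(costs, key=costs.get)  # assumes lower cost is preferable
print(costs)
print(best)
```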
|
|
<p>For some lattice formats, it is also desirable to annotate the
|
|
nodes in the lattice themselves with information such as costs. For
|
|
example in speech recognition, costs might be placed on nodes as a
|
|
result of word penalties or redistribution of costs. For this
|
|
purpose EMMA also provides an <code>emma:node</code> element which
|
|
can host annotations such as <code>emma:cost</code>. The
|
|
<code>emma:node</code> element MUST have an attribute
|
|
<code>node-number</code> which indicates the number of the node.
|
|
There MUST be at most one <code>emma:node</code> specification for
|
|
a given numbered node in the lattice. In our example, if there was
|
|
a cost of <b>100</b> on the final state this could be represented
|
|
as follows:</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:interpretation id="interp1"
|
|
<span>emma:medium="acoustic" emma:mode="voice"</span>>
|
|
<emma:lattice initial="1" final="8">
|
|
<emma:arc
|
|
from="1"
|
|
to="2"
|
|
emma:start="1087995961542"
|
|
emma:end="1087995962042"
|
|
emma:cost="30">
|
|
flights
|
|
</emma:arc>
|
|
<emma:arc
|
|
from="2"
|
|
to="3"
|
|
emma:start="1087995962042"
|
|
emma:end="1087995962542"
|
|
emma:cost="20">
|
|
to
|
|
</emma:arc>
|
|
|
|
<emma:arc
|
|
from="3"
|
|
to="4"
|
|
emma:start="1087995962542"
|
|
emma:end="1087995963042"
|
|
emma:cost="50">
|
|
boston
|
|
</emma:arc>
|
|
<emma:arc
|
|
from="3"
|
|
to="4"
|
|
emma:start="1087995963042"
|
|
emma:end="1087995963742"
|
|
emma:cost="60">
|
|
austin
|
|
</emma:arc>
|
|
...
|
|
<emma:node node-number="8" emma:cost="100"/>
|
|
</emma:lattice>
|
|
</emma:interpretation>
|
|
</emma:emma>
|
|
</pre>
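<p>Several of the structural requirements on lattices stated above (one initial node, a space-separated list of final nodes, numbered <code>from</code> and <code>to</code> attributes on each arc, and at most one <code>emma:node</code> per numbered node) lend themselves to a simple check. The Python sketch below is non-normative and deliberately partial. The function name <code>check_lattice</code> and the wording of its messages are hypothetical, and it accepts only plain ASCII digit strings, which is narrower than the full lexical space of <code>xsd:nonNegativeInteger</code>.</p>

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"


def as_node(value):
    """Return the node number as int, or None if it is not plain digits."""
    if value is not None and value.isascii() and value.isdigit():
        return int(value)
    return None


def check_lattice(lattice):
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    if as_node(lattice.get("initial")) is None:
        problems.append("initial must be a single non-negative integer")
    finals = (lattice.get("final") or "").split()
    if not finals or any(as_node(f) is None for f in finals):
        problems.append("final must list one or more non-negative integers")
    arcs = lattice.findall(f"{{{EMMA_NS}}}arc")
    if not arcs:
        problems.append("at least one emma:arc is required")
    for arc in arcs:
        if as_node(arc.get("from")) is None or as_node(arc.get("to")) is None:
            problems.append("emma:arc needs numeric from and to attributes")
    seen = set()
    for node in lattice.findall(f"{{{EMMA_NS}}}node"):
        number = as_node(node.get("node-number"))
        if number is None:
            problems.append("emma:node needs a numeric node-number")
        elif number in seen:
            problems.append(f"duplicate emma:node for node {number}")
        else:
            seen.add(number)
    return problems


GOOD = """<emma:lattice initial="1" final="8"
    xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:arc from="1" to="8"/>
  <emma:node node-number="8" emma:cost="100"/>
</emma:lattice>"""

# Hypothetical malformed input: bad final list and a repeated emma:node.
BAD = """<emma:lattice initial="1" final="8 x"
    xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:arc from="1" to="8"/>
  <emma:node node-number="8"/>
  <emma:node node-number="8"/>
</emma:lattice>"""

print(check_lattice(ET.fromstring(GOOD)))
print(check_lattice(ET.fromstring(BAD)))
```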
|
|
<h4 id="s3.4.3">3.4.3 Relative timestamps on lattices</h4>
|
|
<p>The relative timestamp mechanism in EMMA is intended to provide
|
|
temporal information about arcs in a lattice in relative terms
|
|
using offsets in milliseconds. In order to do this the absolute
|
|
time MAY be specified on <code>emma:interpretation</code>; both
|
|
<code>emma:time-ref-uri</code> and
|
|
<code>emma:time-ref-anchor-point</code> apply to
|
|
<code>emma:lattice</code> and MAY be used there to set the anchor
|
|
point for offsets to the start of the absolute time specified on
|
|
<code>emma:interpretation</code>. The offset in milliseconds to the
|
|
beginning of each arc MAY then be indicated on each
|
|
<code>emma:arc</code> in the <code>emma:offset-to-start</code>
|
|
attribute.</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
|
|
<emma:interpretation id="interp1"
|
|
emma:start="1087995961542" emma:end="1087995963042"
|
|
<span>emma:medium="acoustic" emma:mode="voice"</span>>
|
|
<emma:lattice emma:time-ref-uri="#interp1"
|
|
emma:time-ref-anchor-point="start"
|
|
initial="1" final="4">
|
|
<emma:arc
|
|
from="1"
|
|
to="2"
|
|
emma:offset-to-start="0">
|
|
flights
|
|
</emma:arc>
|
|
<emma:arc
|
|
from="2"
|
|
to="3"
|
|
emma:offset-to-start="500">
|
|
to
|
|
</emma:arc>
|
|
|
|
<emma:arc
|
|
from="3"
|
|
to="4"
|
|
emma:offset-to-start="1000">
|
|
boston
|
|
</emma:arc>
|
|
</emma:lattice>
|
|
</emma:interpretation>
|
|
</emma:emma>
|
|
</pre>
|
|
<p>Note that the offset for the first <code>emma:arc</code> MUST
|
|
always be zero since the EMMA attribute
|
|
<code>emma:offset-to-start</code> indicates the number of
|
|
milliseconds from the anchor point to the <i>start</i> of the piece
|
|
of input associated with the <code>emma:arc</code>, in this case
|
|
the word "flights".</p>
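<p>The arithmetic implied by the example above can be sketched as follows. This non-normative Python fragment handles only the configuration shown there, in which <code>emma:time-ref-uri</code> points at the enclosing <code>emma:interpretation</code> and the anchor point is <code>start</code>. The function name <code>absolute_arc_starts</code> is hypothetical.</p>

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"


def q(name):
    """Qualify an EMMA element or attribute name for ElementTree."""
    return f"{{{EMMA_NS}}}{name}"


DOC = """<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <emma:interpretation id="interp1"
      emma:start="1087995961542" emma:end="1087995963042"
      emma:medium="acoustic" emma:mode="voice">
    <emma:lattice emma:time-ref-uri="#interp1"
        emma:time-ref-anchor-point="start"
        initial="1" final="4">
      <emma:arc from="1" to="2" emma:offset-to-start="0">flights</emma:arc>
      <emma:arc from="2" to="3" emma:offset-to-start="500">to</emma:arc>
      <emma:arc from="3" to="4" emma:offset-to-start="1000">boston</emma:arc>
    </emma:lattice>
  </emma:interpretation>
</emma:emma>"""


def absolute_arc_starts(interp):
    """Return (label, absolute start in ms) for each arc in the lattice."""
    lattice = interp.find(q("lattice"))
    points_here = lattice.get(q("time-ref-uri")) == "#" + interp.get("id")
    anchored_at_start = lattice.get(q("time-ref-anchor-point")) == "start"
    if not (points_here and anchored_at_start):
        raise NotImplementedError("sketch covers only the case in the example")
    anchor = int(interp.get(q("start")))
    return [
        (arc.text.strip(), anchor + int(arc.get(q("offset-to-start"))))
        for arc in lattice.findall(q("arc"))
    ]


root = ET.fromstring(DOC)
starts = absolute_arc_starts(root.find(q("interpretation")))
for label, start in starts:
    print(label, start)
```

<p>The resulting values for "flights", "to" and "boston" coincide with the absolute <code>emma:start</code> timestamps used on the same arcs in <a href="#s3.4.2">Section 3.4.2</a>.</p>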
|
|
<h3 id="s3.5">3.5 Literal semantics: <code>emma:literal</code>
|
|
element</h3>
|
|
<table class="defn" summary="property definition" width="98%"
|
|
cellpadding="5" cellspacing="0">
|
|
<tbody>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:literal</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>An element that contains string literal output.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Children</th>
|
|
<td>String literal</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Attributes</th>
|
|
<td>None.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td>The <code>emma:literal</code> is a child of
|
|
<code>emma:interpretation</code>.</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<p>Certain EMMA processing components produce semantic results in
|
|
the form of string literals without any surrounding application
|
|
namespace markup. These MUST be placed in the EMMA element
|
|
<code>emma:literal</code> within <code>emma:interpretation</code>.
|
|
For example, if a semantic interpreter simply returned "boston"
|
|
this could be represented in EMMA as:</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:interpretation <span>id="r1" <br />
|
|
emma:medium="acoustic" emma:mode="voice"</span>>
|
|
<emma:literal>boston</emma:literal>
|
|
</emma:interpretation>
|
|
</emma:emma>
|
|
</pre>
|
|
<p>Note that a raw recognition result of a sequence of words from
|
|
speech recognition is also a kind of string literal and can be
|
|
contained within <code>emma:literal</code>. For example,
|
|
recognition of the string "flights to san francisco" can be
|
|
represented in EMMA as follows:</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:interpretation <span>id="r1" <br />
|
|
emma:medium="acoustic" emma:mode="voice"</span>>
|
|
<emma:literal>flights to san francisco</emma:literal>
|
|
</emma:interpretation>
|
|
</emma:emma>
|
|
</pre>
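<p>A consumer of such a document might read the string literal as in the following informative sketch. It uses the Python standard library, and the helper name <code>literals</code> is an assumption made for illustration only.</p>

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"

# The raw recognition example from above, without the schema location attributes.
DOC = """<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <emma:interpretation id="r1" emma:medium="acoustic" emma:mode="voice">
    <emma:literal>flights to san francisco</emma:literal>
  </emma:interpretation>
</emma:emma>"""


def literals(xml_text):
    """Map each interpretation id to the text of its emma:literal child."""
    root = ET.fromstring(xml_text)
    out = {}
    for interp in root.iter("{%s}interpretation" % EMMA_NS):
        lit = interp.find("{%s}literal" % EMMA_NS)
        if lit is not None:
            out[interp.get("id")] = (lit.text or "").strip()
    return out


print(literals(DOC))
```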
|
|
<h2 id="s4">4. EMMA annotations</h2>
|
|
<p>This section defines annotations in the EMMA namespace including
|
|
both attributes and elements. The values are specified in terms of
|
|
the data types defined by XML Schema Part 2: Datatypes <span>Second
|
|
Edition</span> [<a href="#XSD2"><span>XML Schema
|
|
Datatypes</span></a>].</p>
|
|
<h3 id="s4.1">4.1 EMMA annotation elements</h3>
|
|
<h4 id="s4.1.1">4.1.1 Data model: <code>emma:model</code>
|
|
element</h4>
|
|
<table class="defn" summary="property definition" width="98%"
|
|
cellpadding="5" cellspacing="0">
|
|
<tbody>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:model</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>The <code>emma:model</code> either references or provides
|
|
inline the data model for the instance data.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Children</th>
|
|
<td>If a <code>ref</code> attribute is not specified then this
|
|
element contains the data model inline.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Attributes</th>
|
|
<td>
|
|
<ul>
|
|
<li><b>Required</b>:
|
|
<ul>
|
|
<li><code>id</code> of type <code>xsd:ID</code>.</li>
|
|
</ul>
|
|
</li>
|
|
<li><b>Optional</b>:
|
|
<ul>
|
|
<li><code>ref</code> of type <code>xsd:anyURI</code> that
|
|
references the data model. Note that either a <code>ref</code>
|
|
attribute or in-line data model (but not both) MUST be
|
|
specified.</li>
|
|
</ul>
|
|
</li>
|
|
</ul>
|
|
</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td>The <code>emma:model</code> element MAY appear only as a child
|
|
of <code>emma:emma</code>.</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<p>The data model that may be used to express constraints on the
|
|
structure and content of instance data is specified as one of the
|
|
annotations of the instance. Specifying the data model is OPTIONAL,
|
|
in which case the data model can be said to be implicit. Typically
|
|
the data model is pre-established by the application.</p>
|
|
<p>The data model is specified with the <code>emma:model</code>
|
|
annotation defined as an element in the EMMA namespace. If the data
|
|
model for the contents of an <code>emma:interpretation</code>,
|
|
container elements, or application namespace element is to be
|
|
specified in EMMA, the attribute <code>emma:model-ref</code> MUST
|
|
be specified on the <code>emma:interpretation</code>, container
|
|
element, or application namespace element. Note that since multiple
|
|
<code>emma:model</code> elements might be specified under the
|
|
<code>emma:emma</code> it is possible to refer to multiple data
|
|
models within a single EMMA document. For example, different
|
|
alternative interpretations under an <code>emma:one-of</code> might
|
|
have different data models. In this case, an
|
|
<code>emma:model-ref</code> attribute would appear on each
|
|
<code>emma:interpretation</code> element in the N-best list with
|
|
its value being the <code>id</code> of the <code>emma:model</code>
|
|
element for that particular interpretation.</p>
|
|
<p>The data model is closely related to the interpretation data,
|
|
and is typically specified as the annotation related to the
|
|
<code>emma:interpretation</code> or <code>emma:one-of</code>
|
|
elements.</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:model id="model1" ref="http://example.com/models/city.xml"/>
|
|
<emma:interpretation id="int1" emma:model-ref="model1"
|
|
<span>emma:medium="acoustic" emma:mode="voice"</span>>
|
|
<city> London </city>
|
|
<country> UK </country>
|
|
</emma:interpretation>
|
|
</emma:emma>
|
|
</pre>
|
|
<p>The <code>emma:model</code> annotation MAY reference any element
|
|
or attribute in the application instance data, as well as any EMMA
|
|
container element (<code>emma:one-of</code>,
|
|
<code>emma:group</code>, or <code>emma:sequence</code>).</p>
|
|
<p>The data model annotation MAY be used to either reference an
|
|
external data model with the <code>ref</code> attribute or provide
|
|
a data model as in-line content. Either a <code>ref</code>
|
|
attribute or in-line data model (but not both) MUST be
|
|
specified.</p>
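<p>The two constraints stated above (exactly one of <code>ref</code> or an in-line model, and <code>emma:model-ref</code> values that name an existing <code>emma:model</code>) can be checked mechanically. The following is an informative sketch only. <code>check_models</code> is a hypothetical helper, and the <code>BAD</code> document is an invalid variant constructed for this illustration.</p>

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"

GOOD = """<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <emma:model id="model1" ref="http://example.com/models/city.xml"/>
  <emma:interpretation id="int1" emma:model-ref="model1"
      emma:medium="acoustic" emma:mode="voice">
    <city> London </city>
    <country> UK </country>
  </emma:interpretation>
</emma:emma>"""

# Hypothetical invalid variant: the model has neither ref nor in-line content,
# and the interpretation refers to a model id that does not exist.
BAD = (GOOD
       .replace(' ref="http://example.com/models/city.xml"', "")
       .replace('emma:model-ref="model1"', 'emma:model-ref="model9"'))


def q(name):
    return "{%s}%s" % (EMMA_NS, name)


def check_models(xml_text):
    """Return a list of human-readable problems; empty if none were found."""
    root = ET.fromstring(xml_text)
    problems = []
    model_ids = set()
    # emma:model may appear only as a child of emma:emma.
    for model in root.findall(q("model")):
        model_id = model.get("id")
        if model_id is None:
            problems.append("emma:model is missing the required id")
        else:
            model_ids.add(model_id)
        has_ref = model.get("ref") is not None
        has_inline = len(model) > 0 or bool((model.text or "").strip())
        if has_ref == has_inline:
            problems.append("emma:model %r needs exactly one of ref or "
                            "an in-line data model" % model_id)
    for el in root.iter():
        ref = el.get(q("model-ref"))
        if ref is not None and ref not in model_ids:
            problems.append("emma:model-ref %r names no emma:model" % ref)
    return problems


print(check_models(GOOD))
print(check_models(BAD))
```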
|
|
<h4 id="s4.1.2">4.1.2 Interpretation derivation:
|
|
<code>emma:derived-from</code> element and
|
|
<code>emma:derivation</code> element</h4>
|
|
<table class="defn" summary="property definition" width="98%"
|
|
cellpadding="5" cellspacing="0">
|
|
<tbody>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:derived-from</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>An empty element which provides a reference to the
|
|
interpretation which the element it appears on was derived
|
|
from.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Children</th>
|
|
<td>None</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Attributes</th>
|
|
<td>
|
|
<ul>
|
|
<li><b>Required</b>:
|
|
<ul>
|
|
<li><code>resource</code> of type <code>xsd:anyURI</code> that
|
|
references the interpretation from which the current interpretation
|
|
is derived.</li>
|
|
</ul>
|
|
</li>
|
|
<li><b>Optional</b>:
|
|
<ul>
|
|
<li><code>composite</code> of type <code>xsd:boolean</code> that is
|
|
<code>"true"</code> if the derivation step combines multiple inputs
|
|
and <code>"false"</code> if not. If <code>composite</code> is not
|
|
specified the value is <code>"false"</code> by default.</li>
|
|
</ul>
|
|
</li>
|
|
</ul>
|
|
</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td>The <code>emma:derived-from</code> element is legal only as a
|
|
child of <code>emma:interpretation</code>,
|
|
<code>emma:one-of</code>, <code>emma:group</code>, or
|
|
<code>emma:sequence</code>.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:derivation</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>An element which contains interpretation and container elements
|
|
representing earlier stages in the processing of the input.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Children</th>
|
|
<td>One or more <code>emma:interpretation</code>,
|
|
<code>emma:one-of</code>, <code>emma:sequence</code>, or
|
|
<code>emma:group</code> elements.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Attributes</th>
|
|
<td>None</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td>The <code>emma:derivation</code> MAY appear only as a child of
|
|
the <code>emma:emma</code> element.</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<p>Instances of interpretations are in general derived from other
|
|
instances of interpretation in a process that goes from raw data to
|
|
increasingly refined representations of the input. The derivation
|
|
annotation is used to link any two interpretations that are related
|
|
by representing the source and the outcome of an interpretation
|
|
process. For instance, a speech recognition process can return the
|
|
following result in the form of raw text:</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:interpretation id="raw"<br />
|
|
<span>emma:medium="acoustic" emma:mode="voice"</span>>
|
|
<answer>From Boston to Denver tomorrow</answer>
|
|
</emma:interpretation>
|
|
|
|
</emma:emma>
|
|
</pre>
|
|
<p>A first interpretation process will produce:</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:interpretation id="better"<br />
|
|
<span>emma:medium="acoustic" emma:mode="voice"</span>>
|
|
<origin>Boston</origin>
|
|
<destination>Denver</destination>
|
|
<date>tomorrow</date>
|
|
</emma:interpretation>
|
|
</emma:emma>
|
|
</pre>
|
|
<p>A second interpretation process, aware of the current date, will
|
|
be able to produce a more refined instance, such as:</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:interpretation id="best"
|
|
<span>emma:medium="acoustic" emma:mode="voice"</span>>
|
|
<origin>Boston</origin>
|
|
<destination>Denver</destination>
|
|
<date>20030315</date>
|
|
</emma:interpretation>
|
|
</emma:emma>
|
|
</pre>
|
|
<p>The interaction manager might need to have access to the three
|
|
levels of interpretation. The <code>emma:derived-from</code>
|
|
annotation element can be used to establish a chain of derivation
|
|
relationships as in the following example:</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:derivation>
|
|
<emma:interpretation id="raw"<br />
|
|
<span> emma:medium="acoustic" emma:mode="voice"</span>>
|
|
<answer>From Boston to Denver tomorrow</answer>
|
|
</emma:interpretation>
|
|
|
|
<emma:interpretation id="better">
|
|
<emma:derived-from resource="#raw" composite="false"/>
|
|
<origin>Boston</origin>
|
|
<destination>Denver</destination>
|
|
<date>tomorrow</date>
|
|
</emma:interpretation>
|
|
</emma:derivation>
|
|
|
|
<emma:interpretation id="best">
|
|
<emma:derived-from resource="#better" composite="false"/>
|
|
<origin>Boston</origin>
|
|
<destination>Denver</destination>
|
|
<date>20030315</date>
|
|
</emma:interpretation>
|
|
</emma:emma>
|
|
</pre>
|
|
<p>The <code>emma:derivation</code> element MAY be used as a
|
|
container for representations of the earlier stages in the
|
|
interpretation of the input. The latest stage of processing MUST be
|
|
a direct child of <code>emma:emma</code>.</p>
|
|
<p>The <code>resource</code> attribute on <code>emma:derived-from</code> is a
|
|
URI which can reference IDs in the current or other EMMA
|
|
documents.</p>
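<p>As an informative sketch, a consumer such as an interaction manager could walk a sequential chain of <code>emma:derived-from</code> references from the latest stage back to the raw input. The sketch assumes same-document fragment references such as <code>#raw</code> and does not follow references into other EMMA documents. It follows only the first <code>emma:derived-from</code> child of each element, and the function name <code>derivation_chain</code> is hypothetical.</p>

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"

# The three-stage derivation example from above, without schema location attributes.
DOC = """<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <emma:derivation>
    <emma:interpretation id="raw" emma:medium="acoustic" emma:mode="voice">
      <answer>From Boston to Denver tomorrow</answer>
    </emma:interpretation>
    <emma:interpretation id="better">
      <emma:derived-from resource="#raw" composite="false"/>
      <origin>Boston</origin>
      <destination>Denver</destination>
      <date>tomorrow</date>
    </emma:interpretation>
  </emma:derivation>
  <emma:interpretation id="best">
    <emma:derived-from resource="#better" composite="false"/>
    <origin>Boston</origin>
    <destination>Denver</destination>
    <date>20030315</date>
  </emma:interpretation>
</emma:emma>"""


def derivation_chain(xml_text, start_id):
    """Return ids from start_id back to the earliest stage reachable in this document."""
    root = ET.fromstring(xml_text)
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    chain = [start_id]
    current = by_id[start_id]
    while True:
        link = current.find("{%s}derived-from" % EMMA_NS)
        if link is None:
            break
        resource = link.get("resource", "")
        # Assumption: references to other documents are not resolved here.
        if not resource.startswith("#"):
            break
        target = resource[1:]
        # Stop on dangling references and guard against cycles.
        if target not in by_id or target in chain:
            break
        chain.append(target)
        current = by_id[target]
    return chain


print(derivation_chain(DOC, "best"))
```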
|
|
<p>In addition to representing sequential derivations, the EMMA
|
|
<code>emma:derived-from</code> element can also be used to capture
|
|
composite derivations. Composite derivations involve combination of
|
|
inputs from different modes.</p>
|
|
<p>In order to indicate whether an <code>emma:derived-from</code>
|
|
element describes a sequential derivation step or a composite
|
|
derivation step, the <code>emma:derived-from</code> element has an
|
|
attribute <code>composite</code> which has a boolean value. A
|
|
composite <code>emma:derived-from</code> MUST be marked as
|
|
<code>composite="true"</code> while a sequential
|
|
<code>emma:derived-from</code> element is marked as
|
|
<code>composite="false"</code>. If this attribute is not specified
|
|
the value is <code>false</code> by default.</p>
|
|
<p>In the following composite derivation example the user said
|
|
"destination" using the voice mode and circled Boston on a map
|
|
using the ink mode:</p>
|
|
<div>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:derivation>
|
|
<emma:interpretation id="voice1"
|
|
emma:start="1087995961500"
|
|
emma:end="1087995962542"
|
|
emma:process="http://example.com/myasr.xml"
|
|
emma:source="http://example.com/microphone/NC-61"
|
|
emma:signal="http://example.com/signals/sg23.wav"
|
|
emma:confidence="0.6"
|
|
emma:medium="acoustic"
|
|
emma:mode="voice"
|
|
emma:function="dialog"
|
|
emma:verbal="true"
|
|
emma:lang="en-US"
|
|
emma:tokens="destination">
|
|
<rawinput>destination</rawinput>
|
|
</emma:interpretation>
|
|
|
|
<emma:interpretation id="ink1"
|
|
emma:start="1087995961600"
|
|
emma:end="1087995964000"
|
|
emma:process="http://example.com/mygesturereco.xml"
|
|
emma:source="http://example.com/pen/wacom123"
|
|
emma:signal="http://example.com/signals/ink5.inkml"
|
|
emma:confidence="0.5"
|
|
emma:medium="tactile"
|
|
emma:mode="ink"
|
|
emma:function="dialog"
|
|
emma:verbal="false">
|
|
<rawinput>Boston</rawinput>
|
|
</emma:interpretation>
|
|
</emma:derivation>
|
|
|
|
<emma:interpretation id="multimodal1"
|
|
|
|
|
|
emma:confidence="0.3"
|
|
<span>emma:start="1087995961500"</span>
|
|
<span>emma:end="1087995964000"</span>
|
|
emma:medium="<span>acoustic tactile</span>"
|
|
emma:mode="<span>voice ink</span>"
|
|
emma:function="dialog"
|
|
emma:verbal="true"
|
|
emma:lang="en-US"
|
|
emma:tokens="destination">
|
|
<emma:derived-from resource="#voice1" composite="true"/>
|
|
<emma:derived-from resource="#ink1" composite="true"/>
|
|
<destination>Boston</destination>
|
|
</emma:interpretation>
|
|
</emma:emma>
|
|
</pre></div>
|
|
<p>In this example, annotations on the multimodal interpretation
|
|
indicate the process used for the integration and there are two
|
|
<code>emma:derived-from</code> elements, one pointing to the speech
|
|
and one pointing to the pen gesture.</p>
|
|
<p>The only constraints the EMMA specification places on the
|
|
annotations that appear on a composite input are that the
|
|
<code>emma:medium</code> attribute MUST contain the union of the
|
|
<code>emma:medium</code> attributes on the combining inputs,
|
|
represented as a space delimited set of <code>nmtokens</code> as
|
|
defined in <a href="#s4.2.11">Section 4.2.11</a>, and that the
|
|
<code>emma:mode</code> attribute MUST contain the union of the
|
|
<code>emma:mode</code> attributes on the combining inputs,
|
|
represented as a space delimited set of <span><code>nmtokens</code>
|
|
as defined in <a href="#s4.2.11">Section 4.2.11</a></span>. In the
|
|
example above this means that the <code>emma:medium</code> value
|
|
is <code>"acoustic tactile"</code> and the <code>emma:mode</code>
|
|
attribute is <code>"voice ink"</code>. How all other annotations
|
|
are handled is author defined. In the following paragraph,
|
|
informative examples on how specific annotations might be handled
|
|
are given.</p>
|
|
<p>With reference to the illustrative example above, this paragraph
|
|
provides informative guidance regarding the determination of
|
|
annotations (beyond <code>emma:medium</code> and
|
|
<code>emma:mode</code>) on a composite multimodal interpretation.
|
|
Generally the timestamp on a combined input should contain the
|
|
intervals indicated by the combining inputs. For the absolute
|
|
timestamps <code>emma:start</code> and <code>emma:end</code> this
|
|
can be achieved by taking the earlier of the
|
|
<code>emma:start</code> values
|
|
(<code>emma:start="1087995961500"</code> in our example) and the
|
|
later of the <code>emma:end</code> values
|
|
(<code>emma:end="1087995964000"</code> in the example). The
|
|
determination of relative timestamps for composite inputs is more complex;
|
|
informative guidance is given in <a href="#s4.2.10.4">Section
|
|
4.2.10.4</a>. Generally speaking the <code>emma:confidence</code>
|
|
value will be some numerical combination of the confidence scores
|
|
assigned to the combining inputs. In our example, it is the result
|
|
of multiplying the voice and ink confidence scores
|
|
(<code>0.3</code>). In other cases there may not be a confidence
|
|
score for one of the combining inputs and the author may choose to
|
|
copy the confidence score from the input which does have one.
|
|
Generally, for <code>emma:verbal</code>, if either of the inputs
|
|
has the value <code>true</code> then the multimodal interpretation
|
|
will also be <code>emma:verbal="true"</code> as in the example. In
|
|
other words the annotation for the composite input is the result of
|
|
an inclusive OR of the boolean values of the annotations on the
|
|
inputs. If an annotation is only specified on one of the combining
|
|
inputs then it may in some cases be assumed to apply to the
|
|
multimodal interpretation of the composite input. In the example,
|
|
<code>emma:lang="en-US"</code> is only specified for the speech
|
|
input, and this annotation appears on the composite result also.
|
|
Similarly in our example, only the voice has
|
|
<code>emma:tokens</code> and the author has chosen to annotate the
|
|
combined input with the same <code>emma:tokens</code> value. In
|
|
this example, the <code>emma:function</code> is the same on both
|
|
combining inputs and the author has chosen to use the same
|
|
annotation on the composite interpretation.</p>
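<p>The guidance in the two preceding paragraphs can be sketched in code. Only the union of <code>emma:medium</code> and of <code>emma:mode</code> is required by this specification. The remaining rules (earliest start, latest end, product of confidences, inclusive OR for <code>emma:verbal</code>) are the informative choices made in the example, and an author may choose differently. The function name <code>combine</code> and the dictionary representation of annotations are assumptions of this sketch.</p>

```python
# Annotations of the two combining inputs from the example, keyed by
# attribute local name (emma:medium -> "medium", and so on).
VOICE = {"medium": "acoustic", "mode": "voice",
         "start": "1087995961500", "end": "1087995962542",
         "confidence": "0.6", "verbal": "true"}
INK = {"medium": "tactile", "mode": "ink",
       "start": "1087995961600", "end": "1087995964000",
       "confidence": "0.5", "verbal": "false"}


def combine(inputs):
    """Derive annotations for a composite interpretation from its inputs."""

    def union(key):
        # Space-delimited set of tokens, in order of first appearance.
        seen = []
        for ann in inputs:
            for token in ann.get(key, "").split():
                if token not in seen:
                    seen.append(token)
        return " ".join(seen)

    # Required by the specification: union of medium and of mode.
    combined = {"medium": union("medium"), "mode": union("mode")}

    # Informative: the combined interval contains all input intervals.
    starts = [int(a["start"]) for a in inputs if "start" in a]
    ends = [int(a["end"]) for a in inputs if "end" in a]
    if starts:
        combined["start"] = str(min(starts))
    if ends:
        combined["end"] = str(max(ends))

    # Informative: multiply the available confidence scores. With a single
    # score this reduces to copying it.
    scores = [float(a["confidence"]) for a in inputs if "confidence" in a]
    if scores:
        product = 1.0
        for score in scores:
            product *= score
        combined["confidence"] = product

    # Informative: inclusive OR of the boolean emma:verbal values.
    verbal = [a["verbal"] == "true" for a in inputs if "verbal" in a]
    if verbal:
        combined["verbal"] = "true" if any(verbal) else "false"
    return combined


print(combine([VOICE, INK]))
```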
|
|
<p>In annotating derivations of the processing of the input, EMMA
|
|
provides the flexibility of either coarse-grained or fine-grained
|
|
annotation of relations among interpretations. For example, when
relating two N-best lists within <code>emma:one-of</code> elements,
there can either be a single <code>emma:derived-from</code> element
|
|
under <code>emma:one-of</code> referring to the ID of the
|
|
<code>emma:one-of</code> for the earlier processing stage:</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:derivation>
|
|
<emma:one-of id="nbest1"
|
|
<span>emma:medium="acoustic" emma:mode="voice"</span>>
|
|
<emma:interpretation id="int1">
|
|
<res>from boston to denver on march eleven two thousand three</res>
|
|
</emma:interpretation>
|
|
|
|
<emma:interpretation id="int2">
|
|
<res>from austin to denver on march eleven two thousand three</res>
|
|
</emma:interpretation>
|
|
</emma:one-of>
|
|
</emma:derivation>
|
|
|
|
<emma:one-of id="nbest2">
|
|
<emma:derived-from resource="#nbest1" composite="false"/>
|
|
<emma:interpretation id="int1b">
|
|
<origin>Boston</origin>
|
|
<destination>Denver</destination>
|
|
<date>03112003</date>
|
|
</emma:interpretation>
|
|
|
|
<emma:interpretation id="int2b">
|
|
<origin>Austin</origin>
|
|
<destination>Denver</destination>
|
|
<date>03112003</date>
|
|
</emma:interpretation>
|
|
</emma:one-of>
|
|
|
|
</emma:emma>
|
|
</pre>
|
|
<p>Or there can be a separate <code>emma:derived-from</code>
|
|
element on each <code>emma:interpretation</code> element referring
|
|
to the specific <code>emma:interpretation</code> element it was
|
|
derived from.</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:one-of id="nbest2">
|
|
<emma:interpretation id="int1b">
|
|
<emma:derived-from resource="#int1" composite="false"/>
|
|
<origin>Boston</origin>
|
|
<destination>Denver</destination>
|
|
<date>03112003</date>
|
|
</emma:interpretation>
|
|
|
|
<emma:interpretation id="int2b">
|
|
<emma:derived-from resource="#int2" composite="false"/>
|
|
<origin>Austin</origin>
|
|
<destination>Denver</destination>
|
|
<date>03112003</date>
|
|
</emma:interpretation>
|
|
</emma:one-of>
|
|
<emma:derivation>
|
|
<emma:one-of id="nbest1"<br />
|
|
<span>emma:medium="acoustic" emma:mode="voice"</span>>
|
|
<emma:interpretation id="int1">
|
|
<res>from boston to denver on march eleven two thousand three</res>
|
|
</emma:interpretation>
|
|
|
|
<emma:interpretation id="int2">
|
|
<res>from austin to denver on march eleven two thousand three</res>
|
|
</emma:interpretation>
|
|
</emma:one-of>
|
|
</emma:derivation>
|
|
</emma:emma>
|
|
</pre>
|
|
<p><a href="#s4.3">Section 4.3</a> provides further examples of the
|
|
use of <code>emma:derived-from</code> to represent sequential
|
|
derivations and addresses the issue of the scope of EMMA
|
|
annotations across derivations of user input.</p>
|
|
<h4 id="s4.1.3">4.1.3 Reference to grammar used:
|
|
<code>emma:grammar</code> element</h4>
|
|
<table class="defn" summary="property definition" width="98%"
|
|
cellpadding="5" cellspacing="0">
|
|
<tbody>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:grammar</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>An element used to provide a reference to the grammar used in
|
|
processing the input.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Children</th>
|
|
<td>None</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Attributes</th>
|
|
<td>
|
|
<ul>
|
|
<li><b>Required</b>:
|
|
<ul>
|
|
<li><code><span>ref</span></code> of type <code>xsd:anyURI</code>
|
|
that references a grammar used in processing the input.</li>
|
|
<li><code>id</code> of type <code>xsd:ID</code>.</li>
|
|
</ul>
|
|
</li>
|
|
</ul>
|
|
</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td>The <code>emma:grammar</code> is legal only as a child of the
|
|
<code>emma:emma</code> element.</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<p>The grammar that was used to derive the EMMA result MAY be
|
|
specified with the <code>emma:grammar</code> annotation defined as
|
|
an element in the EMMA namespace.</p>
|
|
<p>Example:</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:grammar id="gram1" <span>ref</span>="someURI"/>
|
|
<emma:grammar id="gram2" <span>ref</span>="anotherURI"/>
|
|
<emma:one-of id="r1"<br />
|
|
<span>emma:medium="acoustic" emma:mode="voice"</span>>
|
|
<emma:interpretation id="int1" emma:grammar-ref="gram1">
|
|
<origin>Boston</origin>
|
|
</emma:interpretation>
|
|
|
|
<emma:interpretation id="int2" emma:grammar-ref="gram1">
|
|
<origin>Austin</origin>
|
|
</emma:interpretation>
|
|
|
|
<emma:interpretation id="int3" emma:grammar-ref="gram2">
|
|
<command>help</command>
|
|
</emma:interpretation>
|
|
</emma:one-of>
|
|
</emma:emma>
|
|
</pre>
|
|
<p>The <code>emma:grammar</code> annotation is a child of
|
|
<code>emma:emma</code>.</p>
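<p>The following informative sketch resolves each <code>emma:grammar-ref</code> in the example above to the <code>ref</code> URI of the corresponding <code>emma:grammar</code> element. The helper name <code>grammar_for_interpretations</code> is hypothetical.</p>

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"

# The grammar example from above, without the schema location attributes.
DOC = """<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <emma:grammar id="gram1" ref="someURI"/>
  <emma:grammar id="gram2" ref="anotherURI"/>
  <emma:one-of id="r1" emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation id="int1" emma:grammar-ref="gram1">
      <origin>Boston</origin>
    </emma:interpretation>
    <emma:interpretation id="int2" emma:grammar-ref="gram1">
      <origin>Austin</origin>
    </emma:interpretation>
    <emma:interpretation id="int3" emma:grammar-ref="gram2">
      <command>help</command>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>"""


def q(name):
    return "{%s}%s" % (EMMA_NS, name)


def grammar_for_interpretations(xml_text):
    """Map interpretation id to the URI of the grammar it refers to."""
    root = ET.fromstring(xml_text)
    # emma:grammar is legal only as a child of emma:emma.
    grammars = {g.get("id"): g.get("ref") for g in root.findall(q("grammar"))}
    out = {}
    for interp in root.iter(q("interpretation")):
        grammar_ref = interp.get(q("grammar-ref"))
        if grammar_ref is not None:
            # None is stored if the reference names no emma:grammar.
            out[interp.get("id")] = grammars.get(grammar_ref)
    return out


print(grammar_for_interpretations(DOC))
```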
|
|
<h3 id="s4.1.4">4.1.4 Extensibility to application/vendor specific
|
|
annotations: <code>emma:info</code> element</h3>
|
|
<table class="defn" summary="property definition" width="98%"
|
|
cellpadding="5" cellspacing="0">
|
|
<tbody>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:info</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>The <code>emma:info</code> element acts as a container for
|
|
vendor and/or application specific metadata regarding a user's
|
|
input.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Children</th>
|
|
<td><span>One or more</span> elements in the application namespace
|
|
providing metadata about the input.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Attributes</th>
|
|
<td>
|
|
<ul>
|
|
<li><b>Optional</b>:
|
|
<ul>
|
|
<li><code>id</code> of type <code>xsd:ID</code>.</li>
|
|
</ul>
|
|
</li>
|
|
</ul>
|
|
</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td>The <code>emma:info</code> element is legal only as a child of
|
|
the EMMA elements <code>emma:emma</code>,
|
|
<code>emma:interpretation</code>, <code>emma:group</code>,
|
|
<code>emma:one-of</code>, <code>emma:sequence</code>,
|
|
<code>emma:arc</code>, or <code>emma:node</code>.</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<p>In <a href="#s4.2">Section 4.2</a>, a series of attributes are
|
|
defined for representation of metadata about user inputs in a
|
|
standardized form. EMMA also provides an extensibility mechanism
|
|
for annotation of user inputs with vendor or application specific
|
|
metadata not covered by the standard set of EMMA annotations. The
|
|
element <code>emma:info</code> MUST be used as a container for
|
|
these annotations, UNLESS they are explicitly covered by
|
|
<code>emma:endpoint-info</code>. For example, if an input to a
|
|
dialog system needed to be annotated with the number that the call
|
|
originated from, the caller's state, some indication of the type of
|
|
customer, and the name of the service, these pieces of information
|
|
could be represented within <code>emma:info</code> as in the
|
|
following example:</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:info>
|
|
<caller_id>
|
|
<phone_number>2121234567</phone_number>
|
|
<state>NY</state>
|
|
</caller_id>
|
|
|
|
<customer_type>residential</customer_type>
|
|
<service_name>acme_travel_service</service_name>
|
|
</emma:info>
|
|
|
|
<emma:one-of id="r1" emma:start="1087995961542"
|
|
emma:end="1087995963542"
|
|
<span>emma:medium="acoustic" emma:mode="voice"</span>>
|
|
<emma:interpretation id="int1" emma:confidence="0.75">
|
|
<origin>Boston</origin>
|
|
<destination>Denver</destination>
|
|
<date>03112003</date>
|
|
</emma:interpretation>
|
|
|
|
<emma:interpretation id="int2" emma:confidence="0.68">
|
|
<origin>Austin</origin>
|
|
<destination>Denver</destination>
|
|
<date>03112003</date>
|
|
</emma:interpretation>
|
|
</emma:one-of>
|
|
</emma:emma>
|
|
</pre>
|
|
<p>It is important to have an EMMA container element for
|
|
application/vendor specific annotations since EMMA elements provide
|
|
a structure for representation of multiple possible interpretations
|
|
of the input. As a result it is cumbersome to state
|
|
application/vendor specific metadata as part of the application
|
|
data within each <code>emma:interpretation</code>. An element is
|
|
used rather than an attribute so that internal structure can be
|
|
given to the annotations within <code>emma:info</code>.</p>
|
|
<p>In addition to <code>emma:emma</code>, <code>emma:info</code>
|
|
MAY also appear as a child of other structural elements such as
|
|
<code>emma:interpretation</code>, <code>emma:one-of</code> and so on.
|
|
When <code>emma:info</code> appears as a child of one of these
|
|
elements the application/vendor specific annotations contained
|
|
within <code>emma:info</code> are assumed to apply to all of the
|
|
<code>emma:interpretation</code> elements within the containing
|
|
element. The semantics of conflicting annotations in
|
|
<code>emma:info</code>, for example when different values are found
|
|
within <code>emma:emma</code> and <code>emma:interpretation</code>,
|
|
are left to the developer of the vendor/application specific
|
|
annotations.</p>
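<p>As an informative sketch of this scoping, the following code gathers the <code>emma:info</code> annotations that apply to a given <code>emma:interpretation</code> by walking from <code>emma:emma</code> down to the interpretation itself. Because this specification leaves conflicts to the developer, the sketch makes one possible choice and lets the annotation closest to the interpretation win. That policy, the flattening of nested metadata into leaf names, and the function name <code>info_for</code> are all assumptions of this illustration.</p>

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"

# The emma:info example from above, without the schema location attributes.
DOC = """<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <emma:info>
    <caller_id>
      <phone_number>2121234567</phone_number>
      <state>NY</state>
    </caller_id>
    <customer_type>residential</customer_type>
    <service_name>acme_travel_service</service_name>
  </emma:info>
  <emma:one-of id="r1" emma:start="1087995961542"
      emma:end="1087995963542"
      emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation id="int1" emma:confidence="0.75">
      <origin>Boston</origin>
    </emma:interpretation>
    <emma:interpretation id="int2" emma:confidence="0.68">
      <origin>Austin</origin>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>"""


def q(name):
    return "{%s}%s" % (EMMA_NS, name)


def info_for(xml_text, interp_id):
    """Collect leaf metadata from every emma:info in scope for an interpretation."""
    root = ET.fromstring(xml_text)
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    node = next(el for el in root.iter(q("interpretation"))
                if el.get("id") == interp_id)
    scope = []
    while node is not None:
        scope.append(node)
        node = parent_of.get(node)
    merged = {}
    # Outermost container first, so that nearer annotations override
    # (an assumed policy, not one mandated by EMMA).
    for container in reversed(scope):
        for info in container.findall(q("info")):
            for el in info.iter():
                if el is not info and len(el) == 0:
                    local_name = el.tag.split("}")[-1]
                    merged[local_name] = (el.text or "").strip()
    return merged


print(info_for(DOC, "int2"))
```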
|
|
<h3 id="s4.1.5" class="notoc">4.1.5 Endpoint reference:
|
|
<code>emma:endpoint-info</code> element and
|
|
<code>emma:endpoint</code> element</h3>
|
|
<table class="defn" summary="property definition" width="98%"
|
|
cellpadding="5" cellspacing="0">
|
|
<tbody>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:endpoint-info</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>The <code>emma:endpoint-info</code> element acts as a container
|
|
for all application specific annotation regarding the communication
|
|
environment.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Children</th>
|
|
<td>One or more <code>emma:endpoint</code> elements.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Attributes</th>
|
|
<td>
|
|
<ul>
|
|
<li><b>Required</b>:
|
|
<ul>
|
|
<li><code>id</code> of type <code>xsd:ID</code>.</li>
|
|
</ul>
|
|
</li>
|
|
</ul>
|
|
</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td>The <code>emma:endpoint-info</code> element is legal only as a
|
|
child of <code>emma:emma</code>.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:endpoint</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>The element acts as a container for application specific
|
|
endpoint information.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Children</th>
|
|
<td>Elements in the application namespace providing metadata about
|
|
the input.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Attributes</th>
|
|
<td>
|
|
<ul>
|
|
<li><b>Required</b>:
|
|
<ul>
|
|
<li><code>id</code> of type <code>xsd:ID</code></li>
|
|
</ul>
|
|
</li>
|
|
<li><b>Optional</b>: <code>emma:endpoint-role</code>,
|
|
<code>emma:endpoint-address</code>, <code>emma:message-id</code>,
|
|
<code>emma:port-num</code>, <code>emma:port-type</code>,
|
|
<code>emma:endpoint-pair-ref</code>,
|
|
<code>emma:service-name</code>, <code>emma:media-type</code>,
|
|
<code>emma:medium</code>, <code>emma:mode</code>.</li>
|
|
</ul>
|
|
</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td><code>emma:endpoint-info</code></td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<p>In order to conduct multimodal interaction, there is a need in
|
|
EMMA to specify the properties of the endpoint that receives the
|
|
input which leads to the EMMA annotation. This allows subsequent
|
|
components to utilize the endpoint properties as well as the
|
|
annotated inputs to conduct meaningful multimodal interaction. The EMMA
|
|
element <code>emma:endpoint</code> can be used for this purpose. It
|
|
can specify the endpoint properties based on a set of common
|
|
endpoint property attributes in EMMA, such as
|
|
<code>emma:endpoint-address</code>, <code>emma:port-num</code>,
|
|
<code>emma:port-type</code>, etc. (<a href="#s4.2.14">Section
|
|
4.2.14</a>). Moreover, it provides an extensible annotation
|
|
structure that allows the inclusion of application and vendor
|
|
specific endpoint properties.</p>
|
|
<p>Note that the usage of the term "endpoint" in this context is
|
|
different from the way that the term is used in speech processing,
|
|
where it refers to the end of a speech input. As used here,
|
|
"endpoint" refers to a network location which is the source or
|
|
recipient of an EMMA document.</p>
|
|
<p>In multimodal interaction, multiple devices can be used and each
|
|
device can open multiple communication endpoints at the same time.
|
|
These endpoints are used to transmit and receive data, such as raw
|
|
input, EMMA documents, etc. The EMMA element
|
|
<code>emma:endpoint</code> provides a generic representation of
|
|
endpoint information which is relevant to multimodal interaction.
|
|
It allows the annotation to be interoperable, and it eliminates the
|
|
need for EMMA processors to create their own specialized
|
|
annotations for existing protocols, potential protocols or yet
|
|
undefined private protocols that they may use.</p>
|
|
<p>Moreover, <code>emma:endpoint-info</code> provides a container
|
|
to hold all annotations regarding the endpoint information,
|
|
including <code>emma:endpoint</code> and other application and
|
|
vendor specific annotations that are related to the communication,
|
|
allowing the same communication environment to be referenced and
|
|
used in multiple interpretations.</p>
|
|
<p>Note that EMMA provides two locations (i.e.
|
|
<code>emma:info</code> and <code>emma:endpoint-info</code>) for
|
|
specifying vendor/application specific annotations. If the
|
|
annotation is specifically related to the description of the
|
|
endpoint, then the vendor/application specific annotation SHOULD be
|
|
placed within <code>emma:endpoint-info</code>, otherwise it SHOULD
|
|
be placed within <code>emma:info</code>.</p>
|
|
<p>The following example illustrates the annotation of endpoint
|
|
reference properties in EMMA.</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example"
|
|
xmlns:ex="http://www.example.com/emma/port">
|
|
<emma:endpoint-info id="audio-channel-1">
|
|
<emma:endpoint id="endpoint1"
|
|
emma:endpoint-role="sink"
|
|
emma:endpoint-address="135.61.71.103"
|
|
emma:port-num="50204"
|
|
emma:port-type="rtp"
|
|
emma:endpoint-pair-ref="endpoint2"
|
|
emma:media-type="audio/dsr-es202212; rate:8000; maxptime:40"
|
|
emma:service-name="travel"
|
|
emma:mode="voice">
|
|
<ex:app-protocol>SIP</ex:app-protocol>
|
|
</emma:endpoint>
|
|
|
|
<emma:endpoint id="endpoint2"
|
|
emma:endpoint-role="source"
|
|
emma:endpoint-address="136.62.72.104"
|
|
emma:port-num="50204"
|
|
emma:port-type="rtp"
|
|
emma:endpoint-pair-ref="endpoint1"
|
|
emma:media-type="audio/dsr-es202212; rate:8000; maxptime:40"
|
|
emma:service-name="travel"
|
|
emma:mode="voice">
|
|
<ex:app-protocol>SIP</ex:app-protocol>
|
|
</emma:endpoint>
|
|
</emma:endpoint-info>
|
|
|
|
<emma:interpretation id="int1"
|
|
emma:start="1087995961542" emma:end="1087995963542"
|
|
emma:endpoint-info-ref="audio-channel-1"<br />
|
|
<span>emma:medium="acoustic" emma:mode="voice"</span>>
|
|
<destination>Chicago</destination>
|
|
</emma:interpretation>
|
|
</emma:emma>
|
|
</pre>
|
|
<p>The <code>ex:app-protocol</code> is provided by the application
|
|
or the vendor specification. It specifies that the application
|
|
layer protocol used to establish the speech transmission from the
|
|
"source" port to the "sink" port is Session Initiation Protocol
|
|
(SIP). This is specific to SIP-based VoIP communication, in which
|
|
the actual media transmission and the call signaling that controls
|
|
the communication sessions are separated and typically based on
|
|
different protocols. In the above example, the Real-time
|
|
Transport Protocol (RTP) is used in the media transmission
|
|
between the source port and the sink port.</p>
|
|
<h2 id="s4.2">4.2 EMMA annotation attributes</h2>
|
|
<h3 id="s4.2.1">4.2.1 Tokens of input: <code>emma:tokens</code>
|
|
attribute</h3>
|
|
<table class="defn" summary="property definition" width="98%"
|
|
cellpadding="5" cellspacing="0">
|
|
<tbody>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:tokens</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>An attribute of type <code>xsd:string</code> holding a sequence
|
|
of input tokens.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td><code>emma:interpretation</code>, <code>emma:group</code>,
|
|
<code>emma:one-of</code>, <code>emma:sequence</code>, and
|
|
application instance data.</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<p>The <code>emma:tokens</code> annotation holds a list of input
|
|
tokens. In the following description, the term <i>tokens</i> is
|
|
used in the computational and syntactic sense of <i>units of
|
|
input</i>, and not in the sense of <i>XML tokens</i>. The value
|
|
held in <code>emma:tokens</code> is the list of the tokens of input
|
|
as produced by the processor which generated the EMMA document;
|
|
there is no language associated with this value.</p>
|
|
<p>In the case where a grammar is used to constrain input, the
|
|
value will correspond to tokens as defined by the grammar. So for
|
|
an EMMA document produced by input to an SRGS grammar [<a href=
|
|
"#SRGS">SRGS</a>], the value of <code>emma:tokens</code> will be
|
|
the list of words and/or phrases that are defined as tokens in SRGS
|
|
(<span>see</span> Section 2.1 <span>of [<a href=
|
|
"#SRGS">SRGS</a>]</span>). Items in the <code>emma:tokens</code>
|
|
list are delimited by white space and/or quotation marks for
|
|
phrases containing white space. For example:</p>
|
|
<pre class="example">
|
|
emma:tokens="arriving at 'Liverpool Street'"
|
|
</pre>
|
|
<p>where the three tokens of input are <i>arriving</i>, <i>at</i>
|
|
and <i>Liverpool Street</i>.</p>
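<p>The delimiting rule above can be illustrated with a minimal Python
sketch. It is not part of EMMA; the helper name
<code>split_emma_tokens</code> is hypothetical, and the use of POSIX
shell quoting rules (via the standard <code>shlex</code> module) is an
assumption made for brevity. A processor with stricter requirements
would need its own tokenizer.</p>

```python
import shlex


def split_emma_tokens(value):
    """Split an emma:tokens value on white space, keeping quoted phrases whole.

    Assumption: POSIX shell quoting rules are close enough to the
    "white space and/or quotation marks" rule for illustration. A token
    containing a bare apostrophe (for example "s'il") would raise
    ValueError here and needs application-specific handling.
    """
    return shlex.split(value)


# The three tokens are: arriving, at, Liverpool Street
print(split_emma_tokens("arriving at 'Liverpool Street'"))
```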
|
|
<p>The <code>emma:tokens</code> annotation MAY be applied not just
|
|
to the lexical words and phrases of language but to any level of
|
|
input processing. Other examples of tokenization include phonemes,
|
|
ink strokes, gestures and any other discrete units of input at any
|
|
level.</p>
|
|
<p>Example:</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:interpretation id="int1"
|
|
emma:tokens="From Cambridge to London tomorrow"<br />
|
|
<span>emma:medium="acoustic" emma:mode="voice"</span>>
|
|
<origin emma:tokens="From Cambridge">Cambridge</origin>
|
|
<destination emma:tokens="to London">London</destination>
|
|
<date emma:tokens="tomorrow">20030315</date>
|
|
</emma:interpretation>
|
|
</emma:emma>
|
|
</pre>
|
|
<h3 id="s4.2.2">4.2.2 Reference to processing:
|
|
<code>emma:process</code> attribute</h3>
|
|
<table class="defn" summary="property definition" width="98%"
|
|
cellpadding="5" cellspacing="0">
|
|
<tbody>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:process</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>An attribute of type <code>xsd:anyURI</code> referencing the
|
|
process used to generate the interpretation.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td><code>emma:interpretation</code>, <code>emma:one-of</code>,
|
|
<code>emma:group</code>, <code>emma:sequence</code></td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<p>A reference to the information concerning the processing that
|
|
was used for generating an interpretation MAY be made using the
|
|
<code>emma:process</code> attribute. For example:</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:derivation>
|
|
<emma:interpretation id="raw"<br />
|
|
<span>emma:medium="acoustic" emma:mode="voice"</span>>
|
|
<answer>From Boston to Denver tomorrow</answer>
|
|
</emma:interpretation>
|
|
|
|
<emma:interpretation id="better"
|
|
emma:process="http://example.com/mysemproc1.xml">
|
|
<origin>Boston</origin>
|
|
<destination>Denver</destination>
|
|
<date>tomorrow</date>
|
|
<emma:derived-from resource="#raw"/>
|
|
</emma:interpretation>
|
|
</emma:derivation>
|
|
|
|
<emma:interpretation id="best"
|
|
emma:process="http://example.com/mysemproc2.xml">
|
|
<origin>Boston</origin>
|
|
<destination>Denver</destination>
|
|
<date>03152003</date>
|
|
<emma:derived-from resource="#better"/>
|
|
</emma:interpretation>
|
|
</emma:emma>
|
|
</pre>
|
|
<p>The process description document, referenced by the
|
|
<code>emma:process</code> annotation, MAY include information on the
|
|
process itself, such as grammar, type of parser, etc. EMMA is not
|
|
normative about the format of the process description document.</p>
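<p>As a non-normative illustration, a consumer might collect the
<code>emma:process</code> reference of each processing stage by following
<code>emma:derived-from</code> links. The Python sketch below is one
possible approach; the function name <code>process_chain</code> is
hypothetical, the sample document is abbreviated from the example above,
and only same-document fragment references such as <code>#raw</code> are
assumed.</p>

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"

# Abbreviated copy of the derivation example, for illustration only.
DOC = """\
<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <emma:derivation>
    <emma:interpretation id="raw" emma:medium="acoustic" emma:mode="voice">
      <answer>From Boston to Denver tomorrow</answer>
    </emma:interpretation>
    <emma:interpretation id="better"
        emma:process="http://example.com/mysemproc1.xml">
      <origin>Boston</origin>
      <emma:derived-from resource="#raw"/>
    </emma:interpretation>
  </emma:derivation>
  <emma:interpretation id="best"
      emma:process="http://example.com/mysemproc2.xml">
    <origin>Boston</origin>
    <emma:derived-from resource="#better"/>
  </emma:interpretation>
</emma:emma>
"""


def process_chain(root, start_id):
    """Return (id, emma:process URI or None) pairs, newest stage first."""
    tag = "{%s}interpretation" % EMMA_NS
    by_id = {el.get("id"): el for el in root.iter(tag)}
    chain, seen = [], set()
    current = by_id.get(start_id)
    while current is not None and current.get("id") not in seen:
        seen.add(current.get("id"))  # guard against circular references
        chain.append((current.get("id"), current.get("{%s}process" % EMMA_NS)))
        link = current.find("{%s}derived-from" % EMMA_NS)
        if link is None:
            break
        # Assumption: resource is a same-document fragment such as "#raw".
        current = by_id.get(link.get("resource", "").lstrip("#"))
    return chain


for interp_id, process_uri in process_chain(ET.fromstring(DOC), "best"):
    print(interp_id, process_uri)
```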
|
|
<h3 id="s4.2.3">4.2.3 Lack of input: <code>emma:no-input</code>
|
|
attribute</h3>
|
|
<table class="defn" summary="property definition" width="98%"
|
|
cellpadding="5" cellspacing="0">
|
|
<tbody>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:no-input</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>An attribute holding an <code>xsd:boolean</code> value that is true
|
|
if there was no input.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td><code>emma:interpretation</code></td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<p>The case of lack of input MUST be annotated as follows:</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:interpretation id="int1" emma:no-input="true"<br />
|
|
<span>emma:medium="acoustic" emma:mode="voice"</span>/>
|
|
</emma:emma>
|
|
</pre>
|
|
<p>If the <code>emma:interpretation</code> is annotated with
|
|
<code>emma:no-input="true"</code> then the
|
|
<code>emma:interpretation</code> MUST be empty.</p>
|
|
<h3 id="s4.2.4">4.2.4 Uninterpreted input:
|
|
<code>emma:uninterpreted</code> attribute</h3>
|
|
<table class="defn" summary="property definition" width="98%"
|
|
cellpadding="5" cellspacing="0">
|
|
<tbody>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:uninterpreted</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>An attribute holding an <code>xsd:boolean</code> value that is true
|
|
if <span>no interpretation was produced in response to the
|
|
input</span></td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td><code>emma:interpretation</code></td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<p>An <code>emma:interpretation</code> element representing input
|
|
<span>for which no interpretation was produced</span> MUST be
|
|
annotated with <code>emma:uninterpreted="true"</code>. For
|
|
example:</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:interpretation id="interp1" emma:uninterpreted="true"<br />
|
|
<span>emma:medium="acoustic" emma:mode="voice"</span>/>
|
|
</emma:emma>
|
|
</pre>
|
|
<p>The notation for uninterpreted input MAY refer to any possible
|
|
stage of interpretation processing, including raw transcriptions.
|
|
For instance, no interpretation would be produced for stages
|
|
performing pure signal capture such as audio recordings. Likewise,
|
|
if a spoken input was recognized but cannot be parsed by a language
|
|
understanding component, it can be tagged as
|
|
<code>emma:uninterpreted</code> as in the following example:</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:interpretation id="understanding"
|
|
emma:process="http://example.com/mynlu.xml"
|
|
emma:uninterpreted="true"
|
|
emma:tokens="From Cambridge to London tomorrow"<br />
|
|
<span>emma:medium="acoustic" emma:mode="voice"</span>/>
|
|
</emma:emma>
|
|
</pre>
|
|
<p>The <code>emma:interpretation</code> MUST be empty <span class=
|
|
"add">if</span> the <code>emma:interpretation</code> element is
|
|
annotated with <code>emma:uninterpreted="true"</code>.</p>
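<p>The emptiness constraints on <code>emma:no-input</code> and
<code>emma:uninterpreted</code> lend themselves to a simple check. The
following Python sketch is illustrative only: the function name
<code>nonempty_flagged</code> and the sample document are hypothetical,
and treating white-space-only content as empty is an assumption rather
than something this specification states.</p>

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"
TRUE_VALUES = {"true", "1"}  # the two lexical forms of xsd:boolean true


def nonempty_flagged(root):
    """Return ids of flagged interpretations that nevertheless have content."""
    offenders = []
    for interp in root.iter("{%s}interpretation" % EMMA_NS):
        flagged = any(
            interp.get("{%s}%s" % (EMMA_NS, name)) in TRUE_VALUES
            for name in ("no-input", "uninterpreted")
        )
        # Assumption: white-space-only text is treated as empty.
        has_content = len(interp) > 0 or bool((interp.text or "").strip())
        if flagged and has_content:
            offenders.append(interp.get("id"))
    return offenders


# Hypothetical document: int2 deliberately violates the constraint.
DOC = """\
<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <emma:group id="grp1">
    <emma:interpretation id="int1" emma:no-input="true"
        emma:medium="acoustic" emma:mode="voice"/>
    <emma:interpretation id="int2" emma:uninterpreted="true"
        emma:medium="acoustic" emma:mode="voice">
      <answer>unexpected content</answer>
    </emma:interpretation>
  </emma:group>
</emma:emma>
"""

print(nonempty_flagged(ET.fromstring(DOC)))
```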
|
|
<h3 id="s4.2.5">4.2.5 Human language of input:
|
|
<code>emma:lang</code> attribute</h3>
|
|
<table class="defn" summary="property definition" width="98%"
|
|
cellpadding="5" cellspacing="0">
|
|
<tbody>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:lang</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>An attribute of type <code>xsd:language</code> indicating the
|
|
language for the input.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td><code>emma:interpretation</code>, <code>emma:group</code>,
|
|
<code>emma:one-of</code>, <code>emma:sequence</code>, and
|
|
application instance data.</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<p>The <code>emma:lang</code> annotation is used to indicate the
|
|
human language for the input that it annotates. The values of the
|
|
<code>emma:lang</code> attribute are language identifiers as
|
|
defined by <span>IETF Best Current Practice 47 [<a href=
|
|
"#BCP47">BCP47</a>]</span>. For example,
|
|
<code>emma:lang="fr"</code> denotes French, and
|
|
<code>emma:lang="en-US"</code> denotes US English.
|
|
<code>emma:lang</code> MAY be applied to any
|
|
<code>emma:interpretation</code> element. Its annotative scope
|
|
follows the annotative scope of these elements. Unlike the
|
|
<code>xml:lang</code> attribute in XML, <code>emma:lang</code> does
|
|
not specify the language used by element contents or attribute
|
|
values.</p>
|
|
<p>The following example shows the use of <code>emma:lang</code>
|
|
for annotating an input interpretation.</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:interpretation id="int1" emma:lang="fr"<br />
|
|
<span>emma:medium="acoustic" emma:mode="voice"</span>>
|
|
<answer>arretez</answer>
|
|
</emma:interpretation>
|
|
</emma:emma>
|
|
</pre>
|
|
<p>Many kinds of input including some inputs made through pen,
|
|
computer vision, and other kinds of sensors are inherently
|
|
non-linguistic. Examples include drawing areas, arrows etc. using a
|
|
pen and music input for tune recognition. If these non-linguistic
|
|
inputs are annotated with <code>emma:lang</code> then they MUST be
|
|
annotated as <code>emma:lang="zxx"</code>. For example, pen input
|
|
where a user circles an area on a map display could be represented as
|
|
follows where <code>emma:lang="zxx"</code> indicates that the ink
|
|
input is not in any human language.</p>
|
|
<pre class="example">
|
|
<span><emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:interpretation id="pen1"
|
|
emma:medium="tactile"
|
|
emma:mode="ink"
|
|
emma:lang="zxx">
|
|
<location>
|
|
<type>area</type>
|
|
<points>42.1345 -37.128 42.1346 -37.120 ... </points>
|
|
</location>
|
|
</emma:interpretation>
|
|
</emma:emma></span>
|
|
</pre>
|
|
<p>If inputs for which there is no information about whether the
|
|
source input is in a particular human language, and if so which
|
|
language, are annotated with <code>emma:lang</code>, then they MUST
|
|
be annotated as <code>emma:lang=""</code>. Furthermore, in cases
|
|
where there is no explicit <code>emma:lang</code> annotation, and
|
|
none is inherited from a higher element in the document, the
|
|
default value for <code>emma:lang</code> is <code>""</code> meaning
|
|
that there is no information about whether the source input is in a
|
|
language and if so which language.</p>
|
|
<p>The <code>xml:lang</code> and <code>emma:lang</code> attributes
|
|
serve uniquely different and equally important purposes. The role
|
|
of the <code>xml:lang</code> attribute in XML 1.0 is to indicate
|
|
the language used for character data content in an XML element or
|
|
document. In contrast, the <code>emma:lang</code> attribute is used
|
|
to indicate the language employed by a user when entering an input.
|
|
Critically, <code>emma:lang</code> annotates the language of the
|
|
signal originating from the user rather than the specific tokens
|
|
used at a particular stage of processing. This is most clearly
|
|
illustrated through consideration of an example involving multiple
|
|
stages of processing of a user input. Consider the following
|
|
scenario: EMMA is being used to represent three stages in the
|
|
processing of a spoken input to a system for ordering products.
|
|
The user input is in Italian; after speech recognition, the user
|
|
input is first translated into English, then a natural language
|
|
understanding system converts the English translation into a
|
|
product ID (which is not in any particular language). Since the
|
|
input signal is a user speaking Italian, the <code>emma:lang</code>
|
|
will be <code>emma:lang="it"</code> on all of these three stages of
|
|
processing. The <code>xml:lang</code> attribute, in contrast, will
|
|
initially be <code>"it"</code>, after translation the
|
|
<code>xml:lang</code> will be <code>"en-US"</code>, and after
|
|
language understanding it will be <code>"zxx"</code> since the
|
|
product ID is non-linguistic content. The following are examples of
|
|
EMMA documents corresponding to these three processing stages,
|
|
abbreviated to show the critical attributes for discussion here.
|
|
Note that <code><transcription></code>,
|
|
<code><translation></code>, and
|
|
<code><understanding></code> are application namespace
|
|
elements, not part of the EMMA markup.<br /></p>
|
|
<pre class="example">
|
|
<span><emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:interpretation emma:lang="it" emma:mode="voice" emma:medium="acoustic"><br />
|
|
<transcription xml:lang="it">condizionatore</transcription><br />
|
|
</emma:interpretation>
|
|
</emma:emma>
|
|
</span>
|
|
</pre>
|
|
<pre class="example">
|
|
<span><emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:interpretation emma:lang="it" emma:mode="voice" emma:medium="acoustic">
|
|
<translation xml:lang="en-US">air conditioner</translation><br />
|
|
</emma:interpretation>
|
|
</emma:emma></span>
|
|
</pre>
|
|
<pre class="example">
|
|
<span><emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:interpretation emma:lang="it" emma:mode="voice" emma:medium="acoustic"> <br />
|
|
<understanding xml:lang="zxx">id1456</understanding><br />
|
|
</emma:interpretation>
|
|
</emma:emma></span>
|
|
</pre>
|
|
<p>In order <span>to</span> handle inputs involving multiple
|
|
languages, such as through code switching, the
|
|
<code>emma:lang</code> attribute MAY contain several language identifiers
|
|
separated by spaces.</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:interpretation id="int1"
|
|
emma:tokens="please stop arretez s'il vous plait"
|
|
emma:lang="en fr"
|
|
<span>emma:medium="acoustic" emma:mode="voice"</span>>
|
|
<command> CANCEL </command>
|
|
</emma:interpretation>
|
|
</emma:emma>
|
|
</pre>
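<p>The rules in this section (inheritance from a higher element, the
default value <code>""</code>, and space-separated identifiers) can be
combined into a small lookup. The Python sketch below is a non-normative
illustration; the function name <code>effective_langs</code> and the
sample document are hypothetical. It returns an empty list when no
language information is available.</p>

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"
LANG = "{%s}lang" % EMMA_NS


def effective_langs(root, element):
    """Return the language identifiers in effect for element.

    Looks for emma:lang on the element, then on each ancestor. An absent
    or empty annotation yields [], meaning "no information".
    """
    # ElementTree keeps no parent pointers, so build a child-to-parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    node = element
    while node is not None:
        value = node.get(LANG)
        if value is not None:
            return value.split()  # several identifiers may be listed
        node = parents.get(node)
    return []


# Hypothetical document used only to exercise the lookup.
DOC = """\
<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <emma:group id="grp1" emma:lang="en fr">
    <emma:interpretation id="int1" emma:medium="acoustic" emma:mode="voice">
      <command>CANCEL</command>
    </emma:interpretation>
    <emma:interpretation id="pen1" emma:lang="zxx"
        emma:medium="tactile" emma:mode="ink">
      <location><type>area</type></location>
    </emma:interpretation>
  </emma:group>
</emma:emma>
"""

root = ET.fromstring(DOC)
for interp_id in ("int1", "pen1"):
    print(interp_id, effective_langs(root, root.find(".//*[@id='%s']" % interp_id)))
```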
|
|
<h3 id="s4.2.6">4.2.6 Reference to signal: <code>emma:signal</code>
|
|
<span>and <code>emma:signal-size</code></span> attributes</h3>
|
|
<table class="defn" summary="property definition" width="98%"
|
|
cellpadding="5" cellspacing="0">
|
|
<tbody>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:signal</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>An attribute of type <code>xsd:anyURI</code> referencing the
|
|
input signal.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td><code>emma:interpretation</code>, <code>emma:one-of</code>,
|
|
<code>emma:group</code>, <code>emma:sequence</code>,
|
|
<span>and</span> application instance data.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:signal-size</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>An attribute <span>of type <code>xsd:nonNegativeInteger</code>
|
|
specifying</span> the size in 8-bit octets of the referenced
|
|
source.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td><code>emma:interpretation</code>, <code>emma:one-of</code>,
|
|
<code>emma:group</code>, <code>emma:sequence</code>,
|
|
<span>and</span> application instance data.</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<p>A URI reference to the signal that originated the input
|
|
recognition process MAY be represented in EMMA using the
|
|
<code>emma:signal</code> annotation.</p>
|
|
<p>Here is an example where the reference to a speech signal is
|
|
represented using the <code>emma:signal</code> annotation on the
|
|
<code>emma:interpretation</code> element:</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:interpretation id="intp1"
|
|
emma:signal="http://example.com/signals/sg23.bin"<br />
|
|
<span>emma:medium="acoustic" emma:mode="voice"</span>>
|
|
<origin>Boston</origin>
|
|
<destination>Denver</destination>
|
|
<date>03152003</date>
|
|
</emma:interpretation>
|
|
</emma:emma>
|
|
</pre>
|
|
<p>The <code>emma:signal-size</code> annotation can be used to
|
|
declare the exact size of the associated signal in 8-bit octets. An
|
|
example of the use of an EMMA document to represent a recording,
|
|
with <code>emma:signal-size</code> indicating the size, is as
|
|
follows:</p>
|
|
<pre class="example">
|
|
<span>
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:interpretation id="intp1"
|
|
emma:medium="acoustic"
|
|
emma:mode="voice"
|
|
emma:function="recording"
|
|
emma:uninterpreted="true"
|
|
emma:signal="http://example.com/signals/recording.mpg"
|
|
emma:signal-size="82102"
|
|
emma:duration="10000">
|
|
</emma:interpretation>
|
|
</emma:emma>
|
|
</span>
|
|
</pre>
|
|
<h3 id="s4.2.7">4.2.7 Media type: <code>emma:media-type</code>
|
|
attribute</h3>
|
|
<table class="defn" summary="property definition" width="98%"
|
|
cellpadding="5" cellspacing="0">
|
|
<tbody>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:media-type</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>An attribute of type <code>xsd:string</code> holding the MIME
|
|
type associated with the signal's data format.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td><code>emma:interpretation</code>, <code>emma:one-of</code>,
|
|
<code>emma:group</code>, <code>emma:sequence</code>,
|
|
<code>emma:endpoint</code>, <span>and</span> application instance
|
|
data.</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<p>The data format of the signal that originated the input MAY be
|
|
represented in EMMA using the <code>emma:media-type</code>
|
|
annotation. An initial set of MIME media types is defined by
|
|
[<a href="#RFC2046">RFC2046</a>].</p>
|
|
<p>Here is an example where the media type for the ETSI ES 202 212
|
|
audio codec for Distributed Speech Recognition (DSR) is applied to
|
|
the <code>emma:interpretation</code> element. The example also
|
|
specifies an optional sampling rate of 8 kHz and maxptime of 40
|
|
milliseconds.</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:interpretation id="intp1"<span>
|
|
emma:signal="http://example.com/signals/signal.dsr"</span>
|
|
emma:media-type="audio/dsr-<span>es</span>202212; rate:8000; maxptime:40"<br />
|
|
<span>emma:medium="acoustic" emma:mode="voice"</span>>
|
|
<origin>Boston</origin>
|
|
<destination>Denver</destination>
|
|
<date>03152003</date>
|
|
</emma:interpretation>
|
|
</emma:emma>
|
|
</pre>
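<p>A consumer that needs the individual parts of an
<code>emma:media-type</code> value might split it as in the following
Python sketch. This is an illustration under stated assumptions: the
function name <code>parse_media_type</code> is hypothetical, and the
parser accepts the <code>name:value</code> notation used in the example
above as well as the <code>name=value</code> form commonly used for MIME
parameters. Quoted parameter values are not handled.</p>

```python
import re


def parse_media_type(value):
    """Split an emma:media-type value into (type, {parameter: value})."""
    parts = [part.strip() for part in value.split(";")]
    media_type = parts[0]
    params = {}
    for part in parts[1:]:
        if not part:
            continue  # tolerate a trailing semicolon
        # Assumption: parameters are written name:value or name=value.
        pieces = re.split(r"[:=]", part, maxsplit=1)
        name = pieces[0].strip()
        params[name] = pieces[1].strip() if len(pieces) > 1 else ""
    return media_type, params


media_type, params = parse_media_type("audio/dsr-es202212; rate:8000; maxptime:40")
print(media_type, params)
```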
|
|
<h3 id="s4.2.8">4.2.8 Confidence scores:
|
|
<code>emma:confidence</code> attribute</h3>
|
|
<table class="defn" summary="property definition" width="98%"
|
|
cellpadding="5" cellspacing="0">
|
|
<tbody>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:confidence</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>An attribute of type <code>xsd:decimal</code> in range 0.0 to
|
|
1.0, indicating the processor's confidence in the result.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td><code>emma:interpretation</code>, <code>emma:one-of</code>,
|
|
<code>emma:group</code>, <code>emma:sequence</code>, and
|
|
application instance data.</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<p>The confidence score in EMMA is used to indicate the quality of
|
|
the input, and if confidence is annotated on an input it MUST be
|
|
given as the value of <code>emma:confidence</code>. The confidence
|
|
score MUST be a number in the range from 0.0 to 1.0 inclusive. A
|
|
value of 0.0 indicates minimum confidence, and a value of 1.0
|
|
indicates maximum confidence. Note that
|
|
<code>emma:confidence</code> represents not only the confidence of
|
|
the speech recognizer, but rather the confidence of whatever
|
|
processor was responsible for creating the EMMA result, based on
|
|
whatever evidence it has. For a natural language interpretation,
|
|
for example, this might include semantic heuristics in addition to
|
|
speech recognition scores. Moreover, the confidence score values do
|
|
not have to be interpreted as probabilities. In fact, confidence
|
|
score values are platform-dependent, since their computation is
|
|
likely to differ between platforms and different EMMA processors.
|
|
Confidence scores are annotated explicitly in EMMA in order to
|
|
provide this information to the subsequent processes for multimodal
|
|
interaction. The example below illustrates how confidence scores
|
|
are annotated in EMMA.</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:one-of id="nbest1"<br />
|
|
<span>emma:medium="acoustic" emma:mode="voice"</span>>
|
|
<emma:interpretation id="meaning1" emma:confidence="0.6">
|
|
<location>Boston</location>
|
|
</emma:interpretation>
|
|
|
|
<emma:interpretation id="meaning2" emma:confidence="0.4">
|
|
<location> Austin </location>
|
|
</emma:interpretation>
|
|
</emma:one-of>
|
|
</emma:emma>
|
|
</pre>
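<p>A subsequent component might use these scores to choose among the
alternatives in an <code>emma:one-of</code>. The Python sketch below
shows one plausible policy, selecting the highest-scoring interpretation
after checking that each score lies in the range 0.0 to 1.0. EMMA does
not mandate this policy, and the function name
<code>best_interpretation</code> is hypothetical.</p>

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

EMMA_NS = "http://www.w3.org/2003/04/emma"

DOC = """\
<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <emma:one-of id="nbest1" emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation id="meaning1" emma:confidence="0.6">
      <location>Boston</location>
    </emma:interpretation>
    <emma:interpretation id="meaning2" emma:confidence="0.4">
      <location>Austin</location>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
"""


def best_interpretation(one_of):
    """Return the child interpretation with the highest emma:confidence."""
    scored = []
    for interp in one_of.findall("{%s}interpretation" % EMMA_NS):
        raw = interp.get("{%s}confidence" % EMMA_NS)
        if raw is None:
            continue  # unscored alternatives are ignored in this sketch
        score = Decimal(raw)
        if not (Decimal(0) <= score <= Decimal(1)):
            raise ValueError("emma:confidence out of range: %s" % raw)
        scored.append((score, interp))
    if not scored:
        return None
    return max(scored, key=lambda pair: pair[0])[1]


root = ET.fromstring(DOC)
best = best_interpretation(root.find("{%s}one-of" % EMMA_NS))
print(best.get("id"))
```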
|
|
<p>In addition to its use as an attribute on the EMMA
|
|
interpretation and container elements, the
|
|
<code>emma:confidence</code> attribute MAY also be used to assign
|
|
confidences to elements in instance data in the application
|
|
namespace. This can be seen in the following example, where the
|
|
<code><destination></code> and <code><origin></code>
|
|
elements have confidences.</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:interpretation id="meaning1" emma:confidence="0.6"
|
|
<span>emma:medium="acoustic" emma:mode="voice"</span>>
|
|
<destination emma:confidence="0.8"> Boston</destination>
|
|
<origin emma:confidence="0.6"> Austin </origin>
|
|
</emma:interpretation>
|
|
</emma:emma>
|
|
</pre>
|
|
<p>Although in general instance data can be represented in XML
|
|
using a combination of elements and attributes in the application
|
|
namespace, EMMA does not provide a standard way to annotate
|
|
processors' confidences in attributes. Consequently, instance data
|
|
that is expected to be assigned confidences SHOULD be represented
|
|
using elements, as in the above example.</p>
|
|
<h3 id="s4.2.9">4.2.9 Input source: <code>emma:source</code>
|
|
attribute</h3>
|
|
<table class="defn" summary="property definition" width="98%"
|
|
cellpadding="5" cellspacing="0">
|
|
<tbody>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:source</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>An attribute of type <code>xsd:anyURI</code> referencing the
|
|
source of input.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td><code>emma:interpretation</code>, <code>emma:one-of</code>,
|
|
<code>emma:group</code>, <code>emma:sequence</code>, and
|
|
application instance data.</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<p>The source of an interpreted input MAY be represented in EMMA as
|
|
a URI resource using the <code>emma:source</code> annotation.</p>
|
|
<p>Here is an example that shows different input sources for
|
|
different input interpretations.</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example"
|
|
xmlns:myapp="http://www.example.com/myapp">
|
|
<emma:one-of id="nbest1"<br />
|
|
<span>emma:medium="acoustic" emma:mode="voice"</span>>
|
|
<emma:interpretation id="intp1"
|
|
emma:source="http://example.com/microphone/NC-61">
|
|
<myapp:destination>Boston</myapp:destination>
|
|
</emma:interpretation>
|
|
|
|
<emma:interpretation id="intp2"
|
|
emma:source="http://example.com/microphone/NC-4024">
|
|
<myapp:destination>Austin</myapp:destination>
|
|
</emma:interpretation>
|
|
</emma:one-of>
|
|
</emma:emma>
|
|
</pre>
|
|
<h3 id="s4.2.10">4.2.10 Timestamps</h3>
|
|
<p>The start and end times for input MAY be indicated using either
|
|
absolute timestamps or relative timestamps. Both are in
|
|
milliseconds for ease in processing timestamps. Note that the
|
|
ECMAScript Date object's <code>getTime()</code> function is a
|
|
convenient way to determine the absolute time.</p>
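<p>Outside ECMAScript the same value is easy to compute. The Python sketch below is illustrative only: the helper name <code>epoch_millis</code> is an assumption and not part of EMMA. It uses integer arithmetic so that a known instant maps to an exact millisecond count.</p>

```python
from datetime import datetime, timedelta, timezone

# Reference instant used by the absolute timestamps: 1 January 1970 00:00:00 UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_millis(moment=None):
    """Return whole milliseconds since 1 January 1970 00:00:00 UTC.

    `moment` must be a timezone-aware datetime; the current time is used
    when it is omitted. (Hypothetical helper, not defined by EMMA.)
    """
    if moment is None:
        moment = datetime.now(timezone.utc)
    # Floor division of two timedeltas yields an int, avoiding float rounding.
    return (moment - EPOCH) // timedelta(milliseconds=1)
```

<p>For instance, <code>epoch_millis(datetime(2004, 6, 23, 13, 6, 1, 542000, tzinfo=timezone.utc))</code> yields <code>1087995961542</code>.</p>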
|
|
<h4 id="s4.2.10.1">4.2.10.1 Absolute timestamps:
|
|
<code>emma:start</code>, <code>emma:end</code> attributes</h4>
|
|
<table class="defn" summary="property definition" width="98%"
|
|
cellpadding="5" cellspacing="0">
|
|
<tbody>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:start, emma:end</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>Attributes <span>of type
|
|
<code>xsd:nonNegativeInteger</code></span> indicating the absolute
|
|
starting and ending times of an input in terms of the number of
|
|
milliseconds since 1 January 1970 00:00:00 GMT.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td><code>emma:interpretation</code>, <code>emma:group</code>,
|
|
<code>emma:one-of</code>, <code>emma:sequence</code>,
|
|
<code>emma:arc</code>, <span>and</span> application instance
|
|
data</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<p>Here is an example of a timestamp for an absolute time.</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:interpretation id="int1"
|
|
emma:start="1087995961542"
|
|
emma:end="1087995963542"<br />
|
|
<span>emma:medium="acoustic" emma:mode="voice"</span>>
|
|
<destination>Chicago</destination>
|
|
</emma:interpretation>
|
|
</emma:emma>
|
|
</pre>
|
|
<p>The <code>emma:start</code> and <code>emma:end</code>
|
|
annotations on an input MAY be identical; however, the
|
|
<code>emma:end</code> value MUST NOT be less than the
|
|
<code>emma:start</code> value.</p>
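<p>A consumer might check this constraint as in the following minimal Python sketch. The function name <code>timestamp_violations</code> is hypothetical, and the check covers only elements that carry both attributes.</p>

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"
START = f"{{{EMMA_NS}}}start"
END = f"{{{EMMA_NS}}}end"


def timestamp_violations(emma_xml):
    """Return the ids (or tags) of elements whose emma:end precedes emma:start.

    Hypothetical helper: elements lacking either attribute are skipped.
    """
    root = ET.fromstring(emma_xml)
    bad = []
    for elem in root.iter():
        start, end = elem.get(START), elem.get(END)
        if start is None or end is None:
            continue
        # Identical values are allowed; only end < start is a violation.
        if int(end) < int(start):
            bad.append(elem.get("id", elem.tag))
    return bad
```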
|
|
<h4 id="s4.2.10.2">4.2.10.2 Relative timestamps:
|
|
<code>emma:time-ref-uri</code>,
|
|
<code>emma:time-ref-anchor-point</code>,
|
|
<code>emma:offset-to-start</code> attributes</h4>
|
|
<table class="defn" summary="property definition" width="98%"
|
|
cellpadding="5" cellspacing="0">
|
|
<tbody>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:time-ref-uri</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>Attribute of type <code>xsd:anyURI</code> indicating the URI
|
|
used to anchor the relative timestamp.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td><code>emma:interpretation</code>, <code>emma:group</code>,
|
|
<code>emma:one-of</code>, <code>emma:sequence</code>,
|
|
<code>emma:lattice</code>, <span>and</span> application instance
|
|
data</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:time-ref-anchor-point</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>Attribute with a value of <code>start</code> or
|
|
<code>end</code>, defaulting to <code>start</code>. It indicates
|
|
whether to measure the time from the start or end of the interval
|
|
designated with <code>emma:time-ref-uri</code>.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td><code>emma:interpretation</code>, <code>emma:group</code>,
|
|
<code>emma:one-of</code>, <code>emma:sequence</code>,
|
|
<code>emma:lattice</code>, <span>and</span> application instance
|
|
data</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:offset-to-start</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>Attribute <span>of type <code>xsd:integer</code></span>,
|
|
defaulting to zero. It specifies the offset in milliseconds for the
|
|
start of input from the anchor point designated with
|
|
<span><code>emma:time-ref-uri</code></span> and
|
|
<span><code>emma:time-ref-anchor-point</code></span>.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td><code>emma:interpretation</code>, <code>emma:group</code>,
|
|
<code>emma:one-of</code>, <code>emma:sequence</code>,
|
|
<code>emma:arc</code>, <span>and</span> application instance
|
|
data</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<p>Relative timestamps define the start of an input relative to the
|
|
start or end of a reference interval such as another input.</p>
|
|
<p><img alt="relative timestamps" src=
|
|
"relativetimestamps.png" /></p>
|
|
<p>The reference interval is designated with the
<code>emma:time-ref-uri</code> attribute. This MAY be combined with
the <code>emma:time-ref-anchor-point</code> attribute to specify
whether the anchor point is the start or end of this interval. The
start of an input relative to this anchor point is then specified
with the <code>emma:offset-to-start</code> attribute.</p>
|
|
<p>Here is an example where the referenced input is in the same
|
|
document:</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:sequence>
|
|
<emma:interpretation id="int1"<br />
|
|
<span>emma:medium="acoustic" emma:mode="voice"</span>>
|
|
<origin>Denver</origin>
|
|
</emma:interpretation>
|
|
<emma:interpretation id="int2"<br />
|
|
<span>emma:medium="acoustic" emma:mode="voice"</span>
|
|
emma:time-ref-uri="#int1"
|
|
emma:time-ref-anchor-point="start"
|
|
emma:offset-to-start="5000">
|
|
<destination>Chicago</destination>
|
|
</emma:interpretation>
|
|
</emma:sequence>
|
|
</emma:emma>
|
|
</pre>
|
|
<p>Note that the reference point refers to an input, but not
|
|
necessarily to a complete input. For example, if a speech
|
|
recognizer timestamps each word in an utterance, the anchor point
|
|
might refer to the timestamp for just one word.</p>
|
|
<p>The absolute and relative timestamps are not mutually exclusive;
|
|
that is, it is possible to have both relative and absolute
|
|
timestamp attributes on the same EMMA container element.</p>
|
|
<p>Timestamps of inputs collected by different devices will be
|
|
subject to variation if the times maintained by the devices are not
|
|
synchronized. This concern is outside of the scope of the EMMA
|
|
specification.</p>
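<p>When the absolute start and end of the reference interval are known, a relative timestamp can be resolved to an absolute one. The Python sketch below illustrates the arithmetic under that assumption; <code>resolve_start</code> and its parameter names are hypothetical and are not defined by EMMA.</p>

```python
def resolve_start(ref_start, ref_end, offset_to_start=0, anchor_point="start"):
    """Resolve a relative timestamp to an absolute start time in milliseconds.

    ref_start / ref_end: absolute times (ms) of the interval named by
    emma:time-ref-uri (assumed to be known to the caller).
    offset_to_start: value of emma:offset-to-start (defaults to zero).
    anchor_point: value of emma:time-ref-anchor-point (defaults to "start").
    """
    if anchor_point not in ("start", "end"):
        raise ValueError("anchor_point must be 'start' or 'end'")
    anchor = ref_start if anchor_point == "start" else ref_end
    # The offset is an xsd:integer, so it may be negative.
    return anchor + offset_to_start
```

<p>For example, if the reference interval runs from 1087995961542 to 1087995963542, an offset of 5000 from its start resolves to 1087995966542.</p>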
|
|
<h4 id="s4.2.10.3">4.2.10.3 Duration of input:
|
|
<code>emma:duration</code> attribute</h4>
|
|
<table class="defn" summary="property definition" width="98%"
|
|
cellpadding="5" cellspacing="0">
|
|
<tbody>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:duration</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>Attribute <span>of type
|
|
<code>xsd:nonNegativeInteger</code></span>, defaulting to zero. It
|
|
specifies the duration of the input in milliseconds.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td><code>emma:interpretation</code>, <code>emma:group</code>,
|
|
<code>emma:one-of</code>, <code>emma:sequence</code>,
|
|
<code>emma:arc</code>, <span>and</span> application instance
|
|
data</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<p>The duration of an input in milliseconds MAY be specified with
|
|
the <code>emma:duration</code> attribute. The
|
|
<code>emma:duration</code> attribute MAY be used either in
|
|
combination with timestamps or independently, for example in the
|
|
annotation of speech corpora.</p>
|
|
<p>In the following example, the duration of the signal that gave
|
|
rise to the interpretation is indicated using
|
|
<code>emma:duration</code>.</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:interpretation id="int1" emma:duration="2300"<br />
|
|
<span>emma:medium="acoustic" emma:mode="voice"</span>>
|
|
<origin>Denver</origin>
|
|
</emma:interpretation>
|
|
</emma:emma>
|
|
</pre>
|
|
<h4 id="s4.2.10.4">4.2.10.4 Composite Input and Relative
|
|
Timestamps</h4>
|
|
<p>This section is informative.</p>
|
|
<p>The following table provides guidance on how to determine the
|
|
values of relative timestamps on a composite input.</p>
|
|
<div>
|
|
<table summary="3 columns" border="1" cellpadding="3" cellspacing=
|
|
"0">
|
|
<caption>Informative Guidance on Relative Timestamps in Composite
|
|
Derivations</caption>
|
|
<tbody>
|
|
<tr>
|
|
<td><code>emma:time-ref-uri</code></td>
|
|
<td>If the reference interval URI is the same for both inputs then
|
|
it should be the same for the composite input. If it is not the
|
|
same then relative timestamps will have to be resolved to absolute
|
|
timestamps in order to determine the combined timestamp. .</td>
|
|
</tr>
|
|
<tr>
|
|
<td><code>emma:time-ref-anchor-point</code></td>
|
|
<td>If the anchor value is the same for both inputs then it should
|
|
be the same for the composite input. If it is not the same then
|
|
relative timestamps will have to be resolved to absolute timestamps
|
|
in order to determine the combined timestamp.</td>
|
|
</tr>
|
|
<tr>
|
|
<td><code>emma:offset-to-start</code></td>
|
|
<td>Given that the <code>emma:time-ref-uri</code> and
|
|
<code>emma:time-ref-anchor-point</code> are the same for both
|
|
combining inputs, then the <code>emma:offset-to-start</code> for
|
|
the combination should be the lesser of the two. If they are not
|
|
the same then relative timestamps will have to be resolved to
|
|
absolute timestamps in order to determine the combined
|
|
timestamp.</td>
|
|
</tr>
|
|
<tr>
|
|
<td><code>emma:duration</code></td>
|
|
<td>Given that the <code>emma:time-ref-uri</code> and
|
|
<code>emma:time-ref-anchor-point</code> are the same for both
|
|
combining inputs, then the <code>emma:duration</code> is calculated
|
|
as follows. Add together the <code>emma:offset-to-start</code> and
|
|
<code>emma:duration</code> for each of the inputs. Take whichever
|
|
of these is greater and subtract from it the lesser of the
|
|
<code>emma:offset-to-start</code> values in order to determine the
|
|
combined duration. If <code>emma:time-ref-uri</code> and
|
|
<code>emma:time-ref-anchor-point</code> are not the same then
|
|
relative timestamps will have to be resolved to absolute timestamps
|
|
in order to determine the combined timestamp.</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
</div>
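<p>The guidance for <code>emma:offset-to-start</code> and <code>emma:duration</code> in the table above can be sketched in Python as follows. This assumes both inputs share the same <code>emma:time-ref-uri</code> and <code>emma:time-ref-anchor-point</code>; the function name <code>combine_relative</code> is hypothetical.</p>

```python
def combine_relative(offset_a, duration_a, offset_b, duration_b):
    """Combine the relative timing of two inputs that share one anchor point.

    Returns (offset_to_start, duration) for the composite input, all in
    milliseconds. Hypothetical helper following the informative table.
    """
    # The composite starts at the lesser of the two offsets.
    offset = min(offset_a, offset_b)
    # Each input ends at offset + duration; the composite ends at the later one.
    latest_end = max(offset_a + duration_a, offset_b + duration_b)
    return offset, latest_end - offset
```

<p>For example, inputs with offsets 0 and 5000 and durations 2300 and 1500 combine to an offset of 0 and a duration of 6500.</p>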
|
|
<h3 id="s4.2.11">4.2.11 Medium, mode, and function of user inputs:
|
|
<code>emma:medium</code>, <code>emma:mode</code>,
|
|
<code>emma:function</code>, <code>emma:verbal</code>
|
|
attributes</h3>
|
|
<table class="defn" summary="property definition" width="98%"
|
|
cellpadding="5" cellspacing="0">
|
|
<tbody>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:medium</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>An attribute of type <span><code>xsd:nmtokens</code></span>
|
|
<span>which contains a space delimited set of values from the
|
|
set</span> {<code>acoustic</code>, <code>tactile</code>,
|
|
<code>visual</code>}.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td><code>emma:interpretation</code>, <code>emma:group</code>,
|
|
<code>emma:one-of</code>, <code>emma:sequence</code>,
|
|
<code>emma:endpoint</code>, and application instance data</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:mode</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>An attribute of type <span><code>xsd:nmtokens</code></span>
|
|
<span>which contains a space delimited set of values from</span> an
|
|
open set of values including: {<span><code>voice</code>,
|
|
<code>dtmf</code></span>, <code>ink</code>, <code>gui</code>,
|
|
<code>keys</code>, <code>video</code>, <code>photograph</code>,
|
|
...}.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td><code>emma:interpretation</code>, <code>emma:group</code>,
|
|
<code>emma:one-of</code>, <code>emma:sequence</code>,
|
|
<code>emma:endpoint</code>, and application instance data</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:function</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>An attribute of type <code>xsd:string</code> constrained to
|
|
values in the open set {<code>recording</code>,
|
|
<code>transcription</code>, <code>dialog</code>,
|
|
<code>verification</code>, ...}.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td><code>emma:interpretation</code>, <code>emma:group</code>,
|
|
<code>emma:one-of</code>, <code>emma:sequence</code>, and
|
|
application instance data</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:verbal</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>An attribute of type <code>xsd:boolean</code>.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td><code>emma:interpretation</code>, <code>emma:group</code>,
|
|
<code>emma:one-of</code>, <code>emma:sequence</code>, and
|
|
application instance data</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<p>EMMA provides two properties for the annotation of input
modality: one indicating the broader medium or channel
(<code>emma:medium</code>) and another indicating the specific mode
of communication used on that channel (<code>emma:mode</code>). The
input medium is defined from the user's perspective and indicates
whether they use their voice (<code>acoustic</code>), touch
(<code>tactile</code>), or visual appearance/motion
(<code>visual</code>) as input. Tactile includes most
<i>hands-on</i> input device types such as pen, mouse, keyboard, and
touch screen. Visual is used for camera input.</p>
|
|
<pre class="example">
|
|
emma:medium = <span>space delimited sequence of values from the set: </span>
|
|
[acoustic|tactile|visual]
|
|
</pre>
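<p>Because <code>emma:medium</code> draws on a closed set of values, it can be checked mechanically. The Python sketch below is one way to do so; <code>invalid_media</code> is a hypothetical helper. No comparable check is shown for <code>emma:mode</code>, since its value set is open.</p>

```python
# The closed set of emma:medium values.
ALLOWED_MEDIA = {"acoustic", "tactile", "visual"}


def invalid_media(value):
    """Return the tokens of an emma:medium value that are not allowed.

    `value` is the raw attribute string, a space delimited list of tokens.
    An empty result means every token is valid. (Hypothetical helper.)
    """
    return [token for token in value.split() if token not in ALLOWED_MEDIA]
```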
|
|
<p>The mode property provides the ability to distinguish between
|
|
different modes of communication that may be within a particular
|
|
medium. For example, in the tactile medium, modes include
|
|
electronic ink (<code>ink</code>), and pointing and clicking on a
|
|
graphical user interface (<code>gui</code>).</p>
|
|
<pre class="example">
|
|
emma:mode = <span>space delimited sequence of values from the set: </span>
|
|
[<span>voice|dtmf</span>|ink|gui|keys|video|photograph| ... ]
|
|
</pre>
|
|
<p>The <code>emma:medium</code> classification is based on the
|
|
boundary between the user and the device that they use. For
|
|
<code>emma:medium="tactile"</code> the user physically touches the
|
|
device in order to provide input. For
|
|
<code>emma:medium="visual"</code> the user's movement is captured
|
|
by sensors (cameras, infrared) resulting in an input to the system.
|
|
In the case where <code>emma:medium="acoustic"</code> the user
|
|
provides input to the system by producing an acoustic signal. Note
|
|
then that DTMF input will be classified as
|
|
<code>emma:medium="tactile"</code> since in order to provide DTMF
|
|
input the user physically presses keys on a keypad.</p>
|
|
<p>While <code>emma:medium</code> and <code>emma:mode</code> are
optional on specific elements such as
<code>emma:interpretation</code> and <code>emma:one-of</code>, note
that all EMMA interpretations must be annotated for
<code>emma:medium</code> and <code>emma:mode</code>. These
attributes must therefore appear directly on
<code>emma:interpretation</code>, on an ancestor
<code>emma:one-of</code> node, or on an earlier
stage of the derivation listed in <code>emma:derivation</code>.</p>
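<p>A simplified version of this check can be sketched in Python. The sketch covers only the first two cases (attributes on the interpretation itself or on an ancestor <code>emma:one-of</code>). It does not follow references into <code>emma:derivation</code>, so it may report interpretations that are in fact annotated at an earlier derivation stage. The function name is hypothetical.</p>

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"
MEDIUM = f"{{{EMMA_NS}}}medium"
MODE = f"{{{EMMA_NS}}}mode"
INTERPRETATION = f"{{{EMMA_NS}}}interpretation"
ONE_OF = f"{{{EMMA_NS}}}one-of"


def unannotated_interpretations(emma_xml):
    """Return ids of interpretations lacking emma:medium or emma:mode.

    Simplification (assumption): derivation stages are not consulted.
    """
    root = ET.fromstring(emma_xml)
    missing = []

    def walk(elem, medium, mode):
        # Values found on emma:one-of (or the interpretation itself)
        # override whatever was inherited from further up the tree.
        if elem.tag in (ONE_OF, INTERPRETATION):
            medium = elem.get(MEDIUM, medium)
            mode = elem.get(MODE, mode)
        if elem.tag == INTERPRETATION and (medium is None or mode is None):
            missing.append(elem.get("id"))
        for child in elem:
            walk(child, medium, mode)

    walk(root, None, None)
    return missing
```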
|
|
<p>Orthogonal to the mode, user inputs can also be classified with
|
|
respect to their communicative function. This enables a simpler
|
|
mode classification.</p>
|
|
<pre class="example">
|
|
emma:function = [recording|transcription|dialog|verification| ... ]
|
|
</pre>
|
|
<p>For example, speech can be used for recording (e.g. voicemail),
|
|
transcription (e.g. dictation), dialog (e.g. interactive spoken
|
|
dialog systems), and verification (e.g. identifying users through
|
|
their voiceprints).</p>
|
|
<p>EMMA also supports an additional property
|
|
<code>emma:verbal</code> which distinguishes verbal use of an input
|
|
mode from non-verbal. This MAY be used to distinguish the use of
|
|
electronic ink to convey handwritten commands from the use of
electronic ink for symbolic gestures such as circles and arrows.
Handwritten commands, such as writing <i>downtown</i> in order to
change a map display to show the downtown, are classified as verbal
(<code>emma:function="dialog" emma:verbal="true"</code>). Pen
gestures (arrows, lines, circles, etc.), such as circling a
|
|
building, are classified as non-verbal dialog
|
|
(<code>emma:function="dialog" emma:verbal="false"</code>). The use
|
|
of handwritten words to transcribe an email message is classified
|
|
as transcription (<code>emma:function="transcription"
|
|
emma:verbal="true"</code>).</p>
|
|
<pre class="example">
|
|
emma:verbal = [true|false]
|
|
</pre>
|
|
<p>Handwritten words and ink gestures are typically recognized
|
|
using different kinds of recognition components (handwriting
|
|
recognizer vs. gesture recognizer) and the verbal annotation will
|
|
be added by the recognition component which classifies the input.
|
|
The original input source, a pen in this case, will not be aware of
|
|
this difference. The input source identifier will tell you that the
|
|
input was from a pen of some kind but will not tell you if the mode
|
|
of input was handwriting (<i>show downtown</i>) or gesture (e.g.
|
|
circling an object or area).</p>
|
|
<p>Here is an example of the EMMA annotation for a pen input where
|
|
the user's ink is recognized as either a word ("Boston") or as an
|
|
arrow:</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:one-of id="nbest1">
|
|
<emma:interpretation id="interp1"
|
|
emma:confidence="0.6"
|
|
emma:medium="tactile"
|
|
emma:mode="ink"
|
|
emma:function="dialog"
|
|
emma:verbal="true">
|
|
<location>Boston</location>
|
|
</emma:interpretation>
|
|
|
|
<emma:interpretation id="interp2"
|
|
emma:confidence="0.4"
|
|
emma:medium="tactile"
|
|
emma:mode="ink"
|
|
emma:function="dialog"
|
|
emma:verbal="false">
|
|
<direction>45</direction>
|
|
</emma:interpretation>
|
|
</emma:one-of>
|
|
</emma:emma>
|
|
</pre>
|
|
<p>Here is an example of the EMMA annotation for a spoken command
|
|
which is recognized as either "Boston" or "Austin":</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:one-of>
|
|
<emma:interpretation id="interp1"
|
|
emma:confidence="0.6"
|
|
emma:medium="acoustic"
|
|
emma:mode="voice"
|
|
emma:function="dialog"
|
|
emma:verbal="true">
|
|
<location>Boston</location>
|
|
</emma:interpretation>
|
|
|
|
<emma:interpretation id="interp2"
|
|
emma:confidence="0.4"
|
|
emma:medium="acoustic"
|
|
emma:mode="voice"
|
|
emma:function="dialog"
|
|
emma:verbal="true">
|
|
<location>Austin</location>
|
|
</emma:interpretation>
|
|
</emma:one-of>
|
|
</emma:emma>
|
|
</pre>
|
|
<p>The following table shows the relationship between the medium,
|
|
mode, and function properties and serves as an aid for classifying
|
|
inputs. For the dialog function it also shows some examples of the
|
|
classification of inputs as verbal vs. non-verbal.</p>
|
|
<table class="modes" summary="7 columns" border="1" cellpadding="3"
|
|
cellspacing="0">
|
|
<tbody>
|
|
<tr>
|
|
<th rowspan="2">Medium</th>
|
|
<th rowspan="2">Device</th>
|
|
<th rowspan="2">Mode</th>
|
|
<th colspan="4">Function</th>
|
|
</tr>
|
|
<tr>
|
|
<th>recording</th>
|
|
<th>dialog</th>
|
|
<th>transcription</th>
|
|
<th>verification</th>
|
|
</tr>
|
|
<tr>
|
|
<td rowspan="2">acoustic</td>
|
|
<td rowspan="2">microphone</td>
|
|
<td rowspan="2">voice</td>
|
|
<td rowspan="2">audiofile (e.g. voicemail)</td>
|
|
<td>spoken command / query / response (verbal = true)</td>
|
|
<td rowspan="2">dictation</td>
|
|
<td rowspan="2">speaker recognition</td>
|
|
</tr>
|
|
<tr>
|
|
<td>singing a note (verbal = false)</td>
|
|
</tr>
|
|
<tr>
|
|
<td rowspan="14">tactile</td>
|
|
<td rowspan="2">keypad</td>
|
|
<td rowspan="2">dtmf</td>
|
|
<td rowspan="2">audiofile / character stream</td>
|
|
<td>typed command / query / response (verbal = true)</td>
|
|
<td rowspan="2">text entry (T9-tegic, word completion, or word
|
|
grammar)</td>
|
|
<td rowspan="2">password / pin entry</td>
|
|
</tr>
|
|
<tr>
|
|
<td>command key "Press 9 for sales" (verbal = false)</td>
|
|
</tr>
|
|
<tr>
|
|
<td rowspan="2">keyboard</td>
|
|
<td rowspan="2">keys</td>
|
|
<td rowspan="2">character / key-code stream</td>
|
|
<td>typed command / query / response (verbal = true)</td>
|
|
<td rowspan="2">typing</td>
|
|
<td rowspan="2">password / pin entry</td>
|
|
</tr>
|
|
<tr>
|
|
<td>command key "Press S for sales" (verbal = false)</td>
|
|
</tr>
|
|
<tr>
|
|
<td rowspan="4">pen</td>
|
|
<td rowspan="2">ink</td>
|
|
<td rowspan="2">trace, sketch</td>
|
|
<td>handwritten command / query / response (verbal = true)</td>
|
|
<td rowspan="2">handwritten text entry</td>
|
|
<td rowspan="2">signature, handwriting recognition</td>
|
|
</tr>
|
|
<tr>
|
|
<td>gesture (e.g. circling building) (verbal = false)</td>
|
|
</tr>
|
|
<tr>
|
|
<td rowspan="2">gui</td>
|
|
<td rowspan="2">N/A</td>
|
|
<td>tapping on named button (verbal = true)</td>
|
|
<td rowspan="2">soft keyboard</td>
|
|
<td rowspan="2">password / pin entry</td>
|
|
</tr>
|
|
<tr>
|
|
<td>drag and drop, tapping on map (verbal = false)</td>
|
|
</tr>
|
|
<tr>
|
|
<td rowspan="4">mouse</td>
|
|
<td rowspan="2">ink</td>
|
|
<td rowspan="2">trace, sketch</td>
|
|
<td>handwritten command / query / response (verbal = true)</td>
|
|
<td rowspan="2">handwritten text entry</td>
|
|
<td rowspan="2">N/A</td>
|
|
</tr>
|
|
<tr>
|
|
<td>gesture (e.g. circling building) (verbal = false)</td>
|
|
</tr>
|
|
<tr>
|
|
<td rowspan="2">gui</td>
|
|
<td rowspan="2">N/A</td>
|
|
<td>clicking named button (verbal = true)</td>
|
|
<td rowspan="2">soft keyboard</td>
|
|
<td rowspan="2">password / pin entry</td>
|
|
</tr>
|
|
<tr>
|
|
<td>drag and drop, clicking on map (verbal = false)</td>
|
|
</tr>
|
|
<tr>
|
|
<td rowspan="2">joystick</td>
|
|
<td>ink</td>
|
|
<td>trace, sketch</td>
|
|
<td>gesture (e.g. circling building) (verbal = false)</td>
|
|
<td>N/A</td>
|
|
<td>N/A</td>
|
|
</tr>
|
|
<tr>
|
|
<td>gui</td>
|
|
<td>N/A</td>
|
|
<td>pointing, clicking button / menu (verbal = false)</td>
|
|
<td>soft keyboard</td>
|
|
<td>password / pin entry</td>
|
|
</tr>
|
|
<tr>
|
|
<td rowspan="5">visual</td>
|
|
<td rowspan="2">page scanner</td>
|
|
<td rowspan="2">photograph</td>
|
|
<td rowspan="2">image</td>
|
|
<td>handwritten command / query / response (verbal = true)</td>
|
|
<td rowspan="2">optical character recognition, object/scene
|
|
recognition (markup, e.g. SVG)</td>
|
|
<td rowspan="2">N/A</td>
|
|
</tr>
|
|
<tr>
|
|
<td>drawings and images (verbal = false)</td>
|
|
</tr>
|
|
<tr>
|
|
<td>still camera</td>
|
|
<td>photograph</td>
|
|
<td>image</td>
|
|
<td>objects (verbal = false)</td>
|
|
<td>visual object/scene recognition</td>
|
|
<td>face id, retinal scan</td>
|
|
</tr>
|
|
<tr>
|
|
<td rowspan="2">video camera</td>
|
|
<td rowspan="2">video</td>
|
|
<td rowspan="2">movie</td>
|
|
<td>sign language (verbal = true)</td>
|
|
<td rowspan="2">audio/visual recognition</td>
|
|
<td rowspan="2">face id, gait id, retinal scan</td>
|
|
</tr>
|
|
<tr>
|
|
<td>face / hand / arm / body gesture (e.g. pointing, facing)
|
|
(verbal = false)</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<h3 id="s4.2.12">4.2.12 Composite multimodality:
|
|
<code>emma:hook</code> attribute</h3>
|
|
<table class="defn" summary="property definition" width="98%"
|
|
cellpadding="5" cellspacing="0">
|
|
<tbody>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:hook</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>An attribute of type <code>xsd:string</code> constrained to
|
|
values in the open set {<code>voice</code>, <code>dtmf</code>,
|
|
<code>ink</code>, <code>gui</code>, <code>keys</code>,
|
|
<code>video</code>, <code>photograph</code>, ...} or the wildcard
|
|
<code>any</code>.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td>Application instance data</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<p>The attribute <code>emma:hook</code> MAY be used to mark the
|
|
elements in the application semantics within an
|
|
<code>emma:interpretation</code> which are expected to be
|
|
integrated with content from input in another mode to yield a
|
|
complete interpretation. The <code>emma:mode</code> to be
|
|
integrated at that point in the application semantics is indicated
|
|
as the value of the <code>emma:hook</code> attribute. The possible
|
|
values of <code>emma:hook</code> are the list of input modes that
|
|
can be values of <code>emma:mode</code> <span>(see <a href=
|
|
"#s4.2.11">Section 4.2.11</a>)</span>. In addition to these, the
|
|
value of <code>emma:hook</code> can also be the wildcard
|
|
<code>any</code> indicating that the other content can come from
|
|
any source. The annotation <code>emma:hook</code> differs in
|
|
semantics from <code>emma:mode</code> as follows. Annotating an
|
|
element in the application semantics with
|
|
<code>emma:mode="ink"</code> indicates that that part of the
|
|
semantics came from the <code>ink</code> mode. Annotating an
|
|
element in the application semantics with
|
|
<code>emma:hook="ink"</code> indicates that that part of the semantics
|
|
needs to be integrated with content from the <code>ink</code>
|
|
mode.</p>
|
|
<p>To illustrate the use of <code>emma:hook</code> consider an
|
|
example composite input in which the user says "zoom in here" in
|
|
the speech input mode while drawing an area on a graphical display
|
|
in the ink input mode. <span>The fact that the
|
|
<code>location</code> element needs to come from the
|
|
<code>ink</code> mode is indicated by annotating this application
|
|
namespace element using <code>emma:hook</code>:</span></p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:interpretation <span>emma:medium="acoustic"</span> emma:mode="voice">
|
|
<command>
|
|
<action>zoom</action>
|
|
<location emma:hook="ink">
|
|
<type>area</type>
|
|
</location>
|
|
</command>
|
|
</emma:interpretation>
|
|
</emma:emma>
|
|
</pre>
|
|
<p>For a more detailed explanation of this example see <a href=
|
|
"#appC">Appendix C</a>.</p>
|
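<p>An integration component needs to locate the hooked elements before it can fill them with content from the other mode. The Python sketch below shows one possible way to list them; <code>find_hooks</code> is a hypothetical helper, and the integration step itself is not shown.</p>

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"
HOOK = f"{{{EMMA_NS}}}hook"


def find_hooks(emma_xml):
    """Return (local element name, expected mode) for each emma:hook found.

    Hypothetical helper: the mode is the raw attribute value, which may
    also be the wildcard "any".
    """
    root = ET.fromstring(emma_xml)
    hooks = []
    for elem in root.iter():
        mode = elem.get(HOOK)
        if mode is not None:
            # ElementTree tags look like "{namespace}local"; keep the local part.
            hooks.append((elem.tag.rsplit("}", 1)[-1], mode))
    return hooks
```

<p>Applied to the "zoom in here" example above, this returns a single entry pairing the <code>location</code> element with the <code>ink</code> mode.</p>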
|
<h3 id="s4.2.13">4.2.13 Cost: <code>emma:cost</code> attribute</h3>
|
|
<table class="defn" summary="property definition" width="98%"
|
|
cellpadding="5" cellspacing="0">
|
|
<tbody>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:cost</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>An attribute of type <code>xsd:decimal</code> in range 0.0 to
|
|
10000000, indicating the processor's cost or weight associated with
|
|
an input or part of an input.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td><code>emma:interpretation</code>, <code>emma:group</code>,
|
|
<code>emma:one-of</code>, <code>emma:sequence</code>,
|
|
<code>emma:arc</code>, <code>emma:node</code>, and application
|
|
instance data.</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<p>The cost annotation in EMMA indicates the weight or cost
|
|
associated with a user's input or part of their input. The most
|
|
common use of <code>emma:cost</code> is for representing the costs
|
|
encoded on a lattice output from speech recognition or other
|
|
recognition or understanding processes. <code>emma:cost</code> MAY
|
|
also be used to indicate the total cost associated with particular
|
|
recognition results or semantic interpretations.</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:one-of <span>emma:medium="acoustic" emma:mode="voice"</span>>
|
|
<emma:interpretation id="meaning1" emma:cost="1600">
|
|
<location>Boston</location>
|
|
</emma:interpretation>
|
|
|
|
<emma:interpretation id="meaning2" emma:cost="400">
|
|
<location>Austin</location>
|
|
</emma:interpretation>
|
|
</emma:one-of>
|
|
</emma:emma>
|
|
</pre>
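<p>The costs in a document such as the one above can be read and range-checked with a short Python sketch. The function name <code>interpretation_costs</code> is hypothetical. How the costs are then compared or ranked depends on the processor that produced them and is not assumed here.</p>

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

EMMA_NS = "http://www.w3.org/2003/04/emma"
COST = f"{{{EMMA_NS}}}cost"
INTERPRETATION = f"{{{EMMA_NS}}}interpretation"
MAX_COST = Decimal(10000000)  # upper bound of the emma:cost range


def interpretation_costs(emma_xml):
    """Map each interpretation id to its emma:cost as a Decimal.

    Interpretations without emma:cost are skipped; a value outside the
    range 0.0 to 10000000 raises ValueError. (Hypothetical helper.)
    """
    root = ET.fromstring(emma_xml)
    costs = {}
    for interp in root.iter(INTERPRETATION):
        raw = interp.get(COST)
        if raw is None:
            continue
        cost = Decimal(raw)
        if not (Decimal(0) <= cost <= MAX_COST):
            raise ValueError(f"emma:cost out of range: {raw}")
        costs[interp.get("id")] = cost
    return costs
```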
|
|
<h3 id="s4.2.14">4.2.14 Endpoint properties:
|
|
<code>emma:endpoint-role</code>,
|
|
<code>emma:endpoint-address</code>, <code>emma:port-type</code>,
|
|
<code>emma:port-num</code>, <code>emma:message-id</code>,
|
|
<code>emma:service-name</code>, <code>emma:endpoint-pair-ref</code>,
|
|
<code>emma:endpoint-info-ref</code>
|
|
attributes</h3>
|
|
<table class="defn" summary="property definition" width="98%"
|
|
cellpadding="5" cellspacing="0">
|
|
<tbody>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:endpoint-role</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>An attribute of type <code>xsd:string</code> constrained to
|
|
values in the set {<code>source</code>, <code>sink</code>,
|
|
<code>reply-to</code>, <code>router</code>}.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td><code>emma:endpoint</code></td>
|
|
</tr>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:endpoint-address</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>An attribute of type <code>xsd:anyURI</code> that uniquely
|
|
specifies the network address of the
|
|
<code>emma:endpoint</code>.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td><code>emma:endpoint</code></td>
|
|
</tr>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:port-type</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>An attribute of type <code>xsd:QName</code> that specifies the
|
|
type of the port.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td><code>emma:endpoint</code></td>
|
|
</tr>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:port-num</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>An attribute of type <code>xsd:nonNegativeInteger</code> that
|
|
specifies the port number.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td><code>emma:endpoint</code></td>
|
|
</tr>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:message-id</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>An attribute of type <code>xsd:anyURI</code> that specifies the
|
|
message ID associated with the data.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td><code>emma:endpoint</code></td>
|
|
</tr>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:service-name</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>An attribute of type <code>xsd:string</code> that specifies the
|
|
name of the service.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td><code>emma:endpoint</code></td>
|
|
</tr>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:endpoint-pair-ref</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>An attribute of type <code>xsd:anyURI</code> that specifies the
|
|
pairing between sink and source endpoints.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td><code>emma:endpoint</code></td>
|
|
</tr>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:endpoint-info-ref</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>An attribute of type <code>xsd:IDREF</code> referring to the
|
|
<code>id</code> attribute of an <code>emma:endpoint-info</code>
|
|
element.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td><code>emma:interpretation</code>, <code>emma:group</code>,
|
|
<code>emma:one-of</code>, <code>emma:sequence</code>, and
|
|
application instance data.</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<p>The <code>emma:endpoint-role</code> attribute specifies the role
|
|
that the particular <code>emma:endpoint</code> performs in
|
|
multimodal interaction. The role value <code>sink</code> indicates
|
|
that the particular endpoint is the receiver of the input data. The
|
|
role value <code>source</code> indicates that the particular
|
|
endpoint is the sender of the input data. The role value
|
|
<code>reply-to</code> indicates that the particular
|
|
<code>emma:endpoint</code> is the intended endpoint for the reply.
|
|
The same <code>emma:endpoint-address</code> MAY appear in multiple
|
|
<code>emma:endpoint</code> elements, provided that the same
|
|
endpoint address is used to serve multiple roles, e.g. sink,
|
|
source, reply-to, or router, or is associated with multiple
|
|
interpretations.</p>
|
|
<p>The <code>emma:endpoint-address</code> specifies the network
|
|
address of the <code>emma:endpoint</code>, and
|
|
<code>emma:port-type</code> specifies the port type of the
|
|
<code>emma:endpoint</code>. The <code>emma:port-num</code>
|
|
annotates the port number of the endpoint (e.g. the typical port
|
|
number for an HTTP endpoint is 80). The
|
|
<code>emma:message-id</code> annotates the message ID information
|
|
associated with the annotated input. This meta information is used
|
|
to establish and maintain the communication context for both
|
|
inbound processing and outbound operation. The service
|
|
specification of the <code>emma:endpoint</code> is annotated by
|
|
<code>emma:service-name</code>, which contains the definition of the
|
|
service that the <code>emma:endpoint</code> performs. The matching
|
|
of the <code>sink</code> endpoint and its pairing
|
|
<code>source</code> endpoint is annotated by the
|
|
<code>emma:endpoint-pair-ref</code> attribute. One sink endpoint
|
|
MAY link to multiple source endpoints through
|
|
<code>emma:endpoint-pair-ref</code>. Further bounding of the
|
|
<code>emma:endpoint</code> is possible by using the annotation of
|
|
<code>emma:group</code> (see <a href="#s3.3.2">Section
|
|
3.3.2</a>).</p>
|
|
<p>The <code>emma:endpoint-info-ref</code> attribute associates the
|
|
EMMA result in the container element with an
|
|
<code>emma:endpoint-info</code> element.</p>
|
|
<p>The following example illustrates the use of these attributes in
|
|
multimodal interactions where multiple modalities are used.</p>
|
|
<pre>
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example"
|
|
xmlns:ex="http://www.example.com/emma/port">
|
|
<emma:endpoint-info id="audio-channel-1" >
|
|
<emma:endpoint id="endpoint1"
|
|
emma:endpoint-role="sink"
|
|
emma:endpoint-address="135.61.71.103"
|
|
emma:port-num="50204"
|
|
emma:port-type="rtp"
|
|
emma:endpoint-pair-ref="endpoint2"
|
|
emma:media-type="audio/dsr-202212; rate:8000; maxptime:40"
|
|
emma:service-name="travel"
|
|
emma:mode="voice">
|
|
<ex:app-protocol>SIP</ex:app-protocol>
|
|
</emma:endpoint>
|
|
|
|
<emma:endpoint id="endpoint2" emma:endpoint-role="source"
|
|
emma:endpoint-address="136.62.72.104"
|
|
emma:port-num="50204"
|
|
emma:port-type="rtp"
|
|
emma:endpoint-pair-ref="endpoint1"
|
|
emma:media-type="audio/dsr-202212; rate:8000; maxptime:40"
|
|
emma:service-name="travel"
|
|
emma:mode="voice">
|
|
<ex:app-protocol>SIP</ex:app-protocol>
|
|
</emma:endpoint>
|
|
</emma:endpoint-info>
|
|
|
|
<emma:endpoint-info id="ink-channel-1">
|
|
<emma:endpoint id="endpoint3" emma:endpoint-role="sink"
|
|
emma:endpoint-address="http://emma.example/sink"
|
|
emma:endpoint-pair-ref="endpoint4"
|
|
emma:port-num="80" emma:port-type="http"
|
|
emma:message-id="uuid:2e5678"
|
|
emma:service-name="travel"
|
|
emma:mode="ink"/>
|
|
<emma:endpoint id="endpoint4"
|
|
emma:endpoint-role="source"
|
|
emma:endpoint-address="http://emma.example/source"
|
|
emma:endpoint-pair-ref="endpoint3"
|
|
emma:port-num="80"
|
|
emma:port-type="http"
|
|
emma:message-id="uuid:2e5678"
|
|
emma:service-name="travel"
|
|
emma:mode="ink"/>
|
|
</emma:endpoint-info>
|
|
|
|
<emma:group>
|
|
<emma:interpretation id="int1" emma:start="1087995961542"
|
|
emma:end="1087995963542"
|
|
emma:endpoint-info-ref="audio-channel-1"
|
|
emma:medium="acoustic" emma:mode="voice">
|
|
<destination>Chicago</destination>
|
|
</emma:interpretation>
|
|
|
|
<emma:interpretation id="int2" emma:start="1087995961542"
|
|
emma:end="1087995963542"
|
|
emma:endpoint-info-ref="ink-channel-1"
|
|
emma:medium="tactile" emma:mode="ink">
|
|
<location>
|
|
<type>area</type>
|
|
<points>34.13 -37.12 42.13 -37.12 ... </points>
|
|
</location>
|
|
</emma:interpretation>
|
|
</emma:group>
|
|
</emma:emma>
|
|
</pre>
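<p>The lookup from an interpretation to its endpoints can be
sketched in a few lines of Python. This is a minimal illustration
rather than a normative algorithm: the helper name
<code>endpoints_for</code> and the shortened sample document are
assumptions made for this sketch, and it only uses the standard
library <code>xml.etree.ElementTree</code> module.</p>

```python
import xml.etree.ElementTree as ET

# ElementTree exposes namespaced names in Clark notation: {namespace}local
E = "{http://www.w3.org/2003/04/emma}"

# Shortened, hypothetical variant of the example above.
DOC = """
<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <emma:endpoint-info id="audio-channel-1">
    <emma:endpoint id="endpoint1" emma:endpoint-role="sink"
        emma:endpoint-address="135.61.71.103" emma:port-num="50204"
        emma:endpoint-pair-ref="endpoint2"/>
    <emma:endpoint id="endpoint2" emma:endpoint-role="source"
        emma:endpoint-address="136.62.72.104" emma:port-num="50204"
        emma:endpoint-pair-ref="endpoint1"/>
  </emma:endpoint-info>
  <emma:interpretation id="int1"
      emma:endpoint-info-ref="audio-channel-1"
      emma:medium="acoustic" emma:mode="voice">
    <destination>Chicago</destination>
  </emma:interpretation>
</emma:emma>
"""


def endpoints_for(root, interpretation_id):
    """Return {role: address} for the endpoint-info an interpretation refers to."""
    # Index every emma:endpoint-info element by its (unqualified) id attribute.
    infos = {info.get("id"): info for info in root.iter(E + "endpoint-info")}
    for interp in root.iter(E + "interpretation"):
        if interp.get("id") == interpretation_id:
            info = infos[interp.get(E + "endpoint-info-ref")]
            return {
                ep.get(E + "endpoint-role"): ep.get(E + "endpoint-address")
                for ep in info.iter(E + "endpoint")
            }
    raise KeyError(interpretation_id)


root = ET.fromstring(DOC)
roles = endpoints_for(root, "int1")
print(roles)
```

<p>The sketch keys the result by role, so it assumes each role
occurs at most once per <code>emma:endpoint-info</code>; an
application with several sources per sink would need a list per
role instead.</p>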
|
|
<h3 id="s4.2.15">4.2.15 Reference to <code>emma:grammar</code>
|
|
element: <code>emma:grammar-ref</code> attribute</h3>
|
|
<table class="defn" summary="property definition" width="98%"
|
|
cellpadding="5" cellspacing="0">
|
|
<tbody>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:grammar-ref</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>An attribute of type <code>xsd:IDREF</code> referring to the
|
|
<code>id</code> attribute of an <code>emma:grammar</code>
|
|
element<span>.</span></td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td><code>emma:interpretation</code>, <code>emma:group</code>,
|
|
<code>emma:one-of</code>, <code>emma:sequence</code>.</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<p>The <code>emma:grammar-ref</code> annotation associates the EMMA
|
|
result in the container element with an <code>emma:grammar</code>
|
|
element.</p>
|
|
<p>Example:</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:grammar id="gram1" <span>ref</span>="someURI"/>
|
|
|
|
<emma:grammar id="gram2" <span>ref</span>="anotherURI"/>
|
|
|
|
<emma:one-of id="r1"
|
|
<span>emma:medium="acoustic" emma:mode="voice"</span>>
|
|
<emma:interpretation id="int1" emma:grammar-ref="gram1">
|
|
<origin>Boston</origin>
|
|
</emma:interpretation>
|
|
|
|
<emma:interpretation id="int2" emma:grammar-ref="gram1">
|
|
<origin>Austin</origin>
|
|
</emma:interpretation>
|
|
|
|
<emma:interpretation id="int3" emma:grammar-ref="gram2">
|
|
<command>help</command>
|
|
</emma:interpretation>
|
|
</emma:one-of>
|
|
</emma:emma>
|
|
</pre>
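<p>Because <code>emma:grammar-ref</code> is an
<code>xsd:IDREF</code>, a consumer can resolve it with a simple
index over the <code>emma:grammar</code> elements. The following
Python sketch shows one way to do this; the function name
<code>grammar_uris</code> and the trimmed sample document are
illustrative assumptions, not part of EMMA.</p>

```python
import xml.etree.ElementTree as ET

# Clark-notation prefix for names in the EMMA namespace.
E = "{http://www.w3.org/2003/04/emma}"

# Trimmed, hypothetical variant of the example above.
DOC = """
<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <emma:grammar id="gram1" ref="someURI"/>
  <emma:grammar id="gram2" ref="anotherURI"/>
  <emma:one-of id="r1" emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation id="int1" emma:grammar-ref="gram1">
      <origin>Boston</origin>
    </emma:interpretation>
    <emma:interpretation id="int3" emma:grammar-ref="gram2">
      <command>help</command>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
"""


def grammar_uris(root):
    """Map each interpretation id to the ref URI of the grammar it cites."""
    # id -> ref for every emma:grammar element in the document.
    grammars = {g.get("id"): g.get("ref") for g in root.iter(E + "grammar")}
    result = {}
    for interp in root.iter(E + "interpretation"):
        ref = interp.get(E + "grammar-ref")
        if ref is not None:
            # A dangling IDREF yields None here instead of raising.
            result[interp.get("id")] = grammars.get(ref)
    return result


print(grammar_uris(ET.fromstring(DOC)))
```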
|
|
<h3 id="s4.2.16">4.2.16 Reference to <code>emma:model</code>
|
|
element: <code>emma:model-ref</code> attribute</h3>
|
|
<table class="defn" summary="property definition" width="98%"
|
|
cellpadding="5" cellspacing="0">
|
|
<tbody>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:model-ref</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>An attribute of type <code>xsd:IDREF</code> referring to the
|
|
<code>id</code> attribute of an <code>emma:model</code>
|
|
element<span>.</span></td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td><code>emma:interpretation</code>, <code>emma:group</code>,
|
|
<code>emma:one-of</code>, <code>emma:sequence</code>, and
|
|
application instance data.</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<p>The <code>emma:model-ref</code> annotation associates the EMMA
|
|
result in the container element with an <code>emma:model</code>
|
|
element.</p>
|
|
<p>Example:</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:model id="model1" ref="someURI"/>
|
|
|
|
<emma:model id="model2" ref="anotherURI"/>
|
|
|
|
<emma:one-of id="r1"
|
|
<span>emma:medium="acoustic" emma:mode="voice"</span>>
|
|
<emma:interpretation id="int1" emma:model-ref="model1">
|
|
<origin>Boston</origin>
|
|
</emma:interpretation>
|
|
|
|
<emma:interpretation id="int2" emma:model-ref="model1">
|
|
<origin>Austin</origin>
|
|
</emma:interpretation>
|
|
|
|
<emma:interpretation id="int3" emma:model-ref="model2">
|
|
<command>help</command>
|
|
</emma:interpretation>
|
|
</emma:one-of>
|
|
</emma:emma>
|
|
</pre>
|
|
<h3 id="s4.2.17">4.2.17 Dialog turns: <code>emma:dialog-turn</code>
|
|
attribute</h3>
|
|
<table class="defn" summary="property definition" width="98%"
|
|
cellpadding="5" cellspacing="0">
|
|
<tbody>
|
|
<tr>
|
|
<th>Annotation</th>
|
|
<th>emma:dialog-turn</th>
|
|
</tr>
|
|
<tr>
|
|
<th>Definition</th>
|
|
<td>An attribute of type <code>xsd:string</code> referring to the
|
|
dialog turn associated with a given container element.</td>
|
|
</tr>
|
|
<tr>
|
|
<th>Applies to</th>
|
|
<td><code>emma:interpretation</code>, <code>emma:group</code>,
|
|
<code>emma:one-of</code>, and <code>emma:sequence</code>.</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<p>The <code>emma:dialog-turn</code> annotation associates the EMMA
|
|
result in the container element with a dialog turn. The syntax and
|
|
semantics of dialog turns are left open to suit the needs of
|
|
individual applications. For example, some applications might use
|
|
an integer value, where successive turns are represented by
|
|
successive integers. Other applications might combine a name of a
|
|
dialog participant with an integer value representing the turn
|
|
number for that participant. Ordering semantics for comparison of
|
|
<code>emma:dialog-turn</code> is deliberately unspecified and left
|
|
for applications to define.</p>
|
|
<p>Example:</p>
|
|
<pre class="example">
|
|
<span>
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:interpretation id="int1" emma:dialog-turn="u8"
|
|
<span>emma:medium="acoustic" emma:mode="voice"</span>>
|
|
<quantity>3</quantity>
|
|
</emma:interpretation>
|
|
</emma:emma></span>
|
|
</pre>
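<p>Since EMMA leaves the syntax and ordering of dialog turns to the
application, each application has to supply its own parser and
comparison. The Python sketch below assumes one possible
convention, a participant name followed by a turn number (as in
<code>u8</code>), and orders turns by number. The pattern and the
helper names <code>parse_turn</code> and <code>turn_sort_key</code>
are hypothetical and not defined by this specification.</p>

```python
import re

# Assumed application convention: letters naming the participant,
# followed by a decimal turn number, e.g. "u8".
TURN_PATTERN = re.compile(r"^([A-Za-z]+)(\d+)$")


def parse_turn(value):
    """Split an emma:dialog-turn value into (participant, turn number)."""
    match = TURN_PATTERN.match(value)
    if match is None:
        raise ValueError("unrecognized dialog-turn value: %r" % value)
    return match.group(1), int(match.group(2))


def turn_sort_key(value):
    """Order turns by number first, then by participant name."""
    participant, number = parse_turn(value)
    return (number, participant)


turns = sorted(["u10", "s9", "u8"], key=turn_sort_key)
print(turns)
```

<p>Comparing the numeric part as an integer avoids the pitfall of
plain string comparison, under which <code>u10</code> would sort
before <code>u8</code>.</p>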
|
|
<h2 class="notoc" id="s4.3">4.3 Scope of EMMA annotations</h2>
|
|
<p>The <code>emma:derived-from</code> element (<a href=
|
|
"#s4.1.2">Section 4.1.2</a>) can be used to capture both sequential
|
|
and composite derivations. This section concerns the scope of EMMA
|
|
annotations across <span>sequential</span> derivations of user
|
|
input connected using the <code>emma:derived-from</code> element
|
|
(<a href="#s4.1.2">Section 4.1.2</a>). Sequential derivations
|
|
involve processing steps that do not involve multimodal
|
|
integration, such as applying natural language understanding and
|
|
then reference resolution to a speech transcription. EMMA
|
|
derivations describe only single turns of user input and are not
|
|
intended to describe a sequence of dialog turns.</p>
|
|
<p>For example, an EMMA document could contain
|
|
<code>emma:interpretation</code> elements for the transcription,
|
|
interpretation, and reference resolution of a speech input,
|
|
utilizing the <code>id</code> values: <code>raw</code>,
|
|
<code>better</code>, and <code>best</code> respectively:</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:derivation>
|
|
<emma:interpretation id="raw"
|
|
emma:process="http://example.com/myasr1.xml"
|
|
<span>emma:medium="acoustic" emma:mode="voice"</span>>
|
|
<answer>From Boston to Denver tomorrow</answer>
|
|
</emma:interpretation>
|
|
|
|
<emma:interpretation id="better"
|
|
emma:process="http://example.com/mynlu1.xml">
|
|
<emma:derived-from resource="#raw" composite="false"/>
|
|
<origin>Boston</origin>
|
|
<destination>Denver</destination>
|
|
<date>tomorrow</date>
|
|
</emma:interpretation>
|
|
</emma:derivation>
|
|
|
|
<emma:interpretation id="best"
|
|
emma:process="http://example.com/myrefresolution1.xml">
|
|
<emma:derived-from resource="#better" composite="false"/>
|
|
<origin>Boston</origin>
|
|
<destination>Denver</destination>
|
|
<date>03152003</date>
|
|
</emma:interpretation>
|
|
</emma:emma>
|
|
</pre>
|
|
<p>Each member of the derivation chain is linked to the previous
|
|
one by a <code>derived-from</code> element (<a href=
|
|
"#s4.1.2">Section 4.1.2</a>), which has an attribute
|
|
<code>resource</code> that provides a pointer to the
|
|
<code>emma:interpretation</code> from which it is derived. The
|
|
<code>emma:process</code> annotation (<a href="#s4.2.2">Section
|
|
4.2.2</a>) provides a pointer to the process used for each stage of
|
|
the derivation.</p>
|
|
<p>The following EMMA example represents the same derivation as
|
|
above but with a more fully specified set of annotations:</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:derivation>
|
|
<emma:interpretation id="raw"
|
|
emma:process="http://example.com/myasr1.xml"
|
|
emma:source="http://example.com/microphone/NC-61"
|
|
emma:signal="http://example.com/signals/sg23.wav"
|
|
emma:confidence="0.6"
|
|
emma:medium="acoustic"
|
|
emma:mode="voice"
|
|
emma:function="dialog"
|
|
emma:verbal="true"
|
|
emma:tokens="from boston to denver tomorrow"
|
|
emma:lang="en-US">
|
|
<answer>From Boston to Denver tomorrow</answer>
|
|
</emma:interpretation>
|
|
|
|
<emma:interpretation id="better"
|
|
emma:process="http://example.com/mynlu1.xml"
|
|
emma:source="http://example.com/microphone/NC-61"
|
|
emma:signal="http://example.com/signals/sg23.wav"
|
|
emma:confidence="0.8"
|
|
emma:medium="acoustic"
|
|
emma:mode="voice"
|
|
emma:function="dialog"
|
|
emma:verbal="true"
|
|
emma:tokens="from boston to denver tomorrow"
|
|
emma:lang="en-US">
|
|
<emma:derived-from resource="#raw" composite="false"/>
|
|
<origin>Boston</origin>
|
|
<destination>Denver</destination>
|
|
<date>tomorrow</date>
|
|
</emma:interpretation>
|
|
</emma:derivation>
|
|
|
|
<emma:interpretation id="best"
|
|
emma:process="http://example.com/myrefresolution1.xml"
|
|
emma:source="http://example.com/microphone/NC-61"
|
|
emma:signal="http://example.com/signals/sg23.wav"
|
|
emma:confidence="0.8"
|
|
emma:medium="acoustic"
|
|
emma:mode="voice"
|
|
emma:function="dialog"
|
|
emma:verbal="true"
|
|
emma:tokens="from boston to denver tomorrow"
|
|
emma:lang="en-US">
|
|
<emma:derived-from resource="#better" composite="false"/>
|
|
<origin>Boston</origin>
|
|
<destination>Denver</destination>
|
|
<date>03152003</date>
|
|
</emma:interpretation>
|
|
</emma:emma>
|
|
</pre>
|
|
<p>EMMA annotations on earlier stages of the derivation often
|
|
remain accurate at later stages of the derivation. Although this
|
|
can be captured in EMMA by repeating the annotations on each
|
|
<code>emma:interpretation</code> within the derivation, as in the
|
|
example above, there are two disadvantages of this approach to
|
|
annotation. First, the repetition of annotations makes the
|
|
resulting EMMA documents significantly more verbose. Second, EMMA
|
|
processors used for intermediate tasks such as natural language
|
|
understanding and reference resolution will need to read in all of
|
|
the annotations and write them all out again.</p>
|
|
<p>EMMA overcomes these problems by assuming that annotations on
|
|
earlier stages of a derivation automatically apply to later stages
|
|
of the derivation unless a new value is specified. Later stages of
|
|
the derivation essentially inherit annotations from earlier stages
|
|
in the derivation. For example, if there was an
|
|
<code>emma:source</code> annotation on the transcription
|
|
(<code>raw</code>) it would also apply to the later stages of the
|
|
derivation such as the result of natural language understanding
|
|
(<code>better</code>) or reference resolution
|
|
(<code>best</code>).</p>
|
|
<p>Because of the assumption in EMMA that annotations have scope
|
|
over later stages of a sequential derivation, the example EMMA
|
|
document above can be equivalently represented as follows:</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:derivation>
|
|
<emma:interpretation id="raw"
|
|
emma:process="http://example.com/myasr1.xml"
|
|
emma:source="http://example.com/microphone/NC-61"
|
|
emma:signal="http://example.com/signals/sg23.wav"
|
|
emma:confidence="0.6"
|
|
emma:medium="acoustic"
|
|
emma:mode="voice"
|
|
emma:function="dialog"
|
|
emma:verbal="true"
|
|
emma:tokens="from boston to denver tomorrow"
|
|
emma:lang="en-US">
|
|
<answer>From Boston to Denver tomorrow</answer>
|
|
</emma:interpretation>
|
|
|
|
<emma:interpretation id="better"
|
|
emma:process="http://example.com/mynlu1.xml"
|
|
emma:confidence="0.8">
|
|
<emma:derived-from resource="#raw" composite="false"/>
|
|
<origin>Boston</origin>
|
|
<destination>Denver</destination>
|
|
<date>tomorrow</date>
|
|
</emma:interpretation>
|
|
</emma:derivation>
|
|
|
|
<emma:interpretation id="best"
|
|
emma:process="http://example.com/myrefresolution1.xml">
|
|
<emma:derived-from resource="#better" composite="false"/>
|
|
<origin>Boston</origin>
|
|
<destination>Denver</destination>
|
|
<date>03152003</date>
|
|
</emma:interpretation>
|
|
</emma:emma>
|
|
</pre>
|
|
<p>The fully specified derivation illustrated above is equivalent
|
|
to the reduced form derivation following it where only annotations
|
|
with new values are specified at each stage. These two EMMA
|
|
documents MUST yield the same result when processed by an EMMA
|
|
processor.</p>
|
|
<p>The <code>emma:confidence</code> annotation is respecified on
|
|
the <code>better</code> interpretation. This indicates the
|
|
confidence score for natural language understanding, whereas
|
|
<code>emma:confidence</code> on the <code>raw</code> interpretation
|
|
indicates the speech recognition confidence score.</p>
|
|
<p>In order to determine the full set of annotations that apply to
|
|
an <code>emma:interpretation</code> element, an EMMA processor or
script needs to access the annotations directly on that element and,
for any that are not specified, follow the reference in the
|
|
<code>resource</code> attribute of the
|
|
<code>emma:derived-from</code> element to add in annotations from
|
|
earlier stages of the derivation.</p>
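<p>The lookup just described can be sketched in Python as follows.
This is a minimal sketch under stated assumptions, not a normative
algorithm: the function name <code>effective_annotations</code> and
its <code>not_inherited</code> parameter are hypothetical, the
sample document is a shortened version of the reduced-form example,
and the sketch stops at <code>emma:derived-from</code> links marked
<code>composite="true"</code> because this section covers only
sequential derivations. It also assumes that
<code>emma:process</code> is never inherited, since it is specified
at every stage.</p>

```python
import xml.etree.ElementTree as ET

# Clark-notation prefix for names in the EMMA namespace.
E = "{http://www.w3.org/2003/04/emma}"

# Shortened, hypothetical variant of the reduced-form example.
DOC = """
<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <emma:derivation>
    <emma:interpretation id="raw"
        emma:process="http://example.com/myasr1.xml"
        emma:source="http://example.com/microphone/NC-61"
        emma:confidence="0.6"
        emma:medium="acoustic" emma:mode="voice">
      <answer>From Boston to Denver tomorrow</answer>
    </emma:interpretation>
    <emma:interpretation id="better"
        emma:process="http://example.com/mynlu1.xml"
        emma:confidence="0.8">
      <emma:derived-from resource="#raw" composite="false"/>
      <origin>Boston</origin>
    </emma:interpretation>
  </emma:derivation>
  <emma:interpretation id="best"
      emma:process="http://example.com/myrefresolution1.xml">
    <emma:derived-from resource="#better" composite="false"/>
    <origin>Boston</origin>
  </emma:interpretation>
</emma:emma>
"""


def effective_annotations(root, element_id, not_inherited=("process",)):
    """Collect emma:* attributes for an element, filling gaps from earlier stages."""
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    result = {}
    seen = set()  # guards against accidental reference cycles
    current = by_id[element_id]
    is_start = True
    while current is not None and current.get("id") not in seen:
        seen.add(current.get("id"))
        for name, value in current.attrib.items():
            if not name.startswith(E):
                continue  # skip non-EMMA attributes such as id
            local = name[len(E):]
            if not is_start and local in not_inherited:
                continue
            # Values found on later stages take precedence.
            result.setdefault(local, value)
        link = current.find(E + "derived-from")
        if link is None or link.get("composite") == "true":
            break
        current = by_id.get(link.get("resource", "").lstrip("#"))
        is_start = False
    return result


annotations = effective_annotations(ET.fromstring(DOC), "best")
print(annotations)
```

<p>In this sample, <code>best</code> takes its
<code>emma:confidence</code> from <code>better</code>, the nearest
stage that specifies one, and its <code>emma:source</code>,
<code>emma:medium</code>, and <code>emma:mode</code> from
<code>raw</code>.</p>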
|
|
<p>The EMMA annotations break down into three groups with respect
|
|
to their scope in sequential derivations. One group of annotations
|
|
always hold<span>s</span> true for all members of a sequential
|
|
derivation. A second group <span>is</span> always respecified on
|
|
each stage of the derivation. A third group may or may not be
|
|
respecified.</p>
|
|
<table summary="2 columns" border="1" cellpadding="3" cellspacing=
|
|
"0">
|
|
<caption>Scope of Annotations in Sequential Derivations</caption>
|
|
<tbody>
|
|
<tr>
|
|
<th>Classification</th>
|
|
<th>Annotation</th>
|
|
</tr>
|
|
<tr>
|
|
<td rowspan="16">Applies to whole derivation</td>
|
|
<td><code>emma:signal</code></td>
|
|
</tr>
|
|
<tr>
|
|
<td><code><span>emma:signal-size</span></code></td>
|
|
</tr>
|
|
<tr>
|
|
<td><code><span>emma:dialog-turn</span></code></td>
|
|
</tr>
|
|
<tr>
|
|
<td><code>emma:source</code></td>
|
|
</tr>
|
|
<tr>
|
|
<td><code>emma:medium</code></td>
|
|
</tr>
|
|
<tr>
|
|
<td><code>emma:mode</code></td>
|
|
</tr>
|
|
<tr>
|
|
<td><code>emma:function</code></td>
|
|
</tr>
|
|
<tr>
|
|
<td><code>emma:verbal</code></td>
|
|
</tr>
|
|
<tr>
|
|
<td><code>emma:lang</code></td>
|
|
</tr>
|
|
<tr>
|
|
<td><code>emma:tokens</code></td>
|
|
</tr>
|
|
<tr>
|
|
<td><code>emma:start</code></td>
|
|
</tr>
|
|
<tr>
|
|
<td><code>emma:end</code></td>
|
|
</tr>
|
|
<tr>
|
|
<td><code>emma:time-ref-uri</code></td>
|
|
</tr>
|
|
<tr>
|
|
<td><code>emma:time-ref-anchor-point</code></td>
|
|
</tr>
|
|
<tr>
|
|
<td><code>emma:offset-to-start</code></td>
|
|
</tr>
|
|
<tr>
|
|
<td><code>emma:duration</code></td>
|
|
</tr>
|
|
<tr>
|
|
<td rowspan="2">Specified at each stage of derivation</td>
|
|
<td><code>emma:derived-from</code></td>
|
|
</tr>
|
|
<tr>
|
|
<td><code>emma:process</code></td>
|
|
</tr>
|
|
<tr>
|
|
<td rowspan="6">May be respecified</td>
|
|
<td><code>emma:confidence</code></td>
|
|
</tr>
|
|
<tr>
|
|
<td><code>emma:cost</code></td>
|
|
</tr>
|
|
<tr>
|
|
<td><code>emma:grammar-ref</code></td>
|
|
</tr>
|
|
<tr>
|
|
<td><code>emma:model-ref</code></td>
|
|
</tr>
|
|
<tr>
|
|
<td><code>emma:no-input</code></td>
|
|
</tr>
|
|
<tr>
|
|
<td><code>emma:uninterpreted</code></td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
<p>One potential problem with this annotation scoping mechanism is
|
|
that earlier annotations could be lost if earlier stages of a
|
|
derivation were dropped in order to reduce message size. This
|
|
problem can be overcome by considering annotation scope at the
|
|
point where earlier derivation stages are discarded and populating
|
|
the final interpretation in the derivation with all of the
|
|
annotations which it could inherit. For example, if the
|
|
<code>raw</code> and <code>better</code> stages were dropped the
|
|
resulting EMMA document would be:</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:interpretation id="best"
|
|
emma:start="1087995961542"
|
|
emma:end="1087995963542"
|
|
emma:process="http://example.com/myrefresolution1.xml"
|
|
emma:source="http://example.com/microphone/NC-61"
|
|
emma:signal="http://example.com/signals/sg23.wav"
|
|
emma:confidence="0.8"
|
|
emma:medium="acoustic"
|
|
emma:mode="voice"
|
|
emma:function="dialog"
|
|
emma:verbal="true"
|
|
emma:tokens="from boston to denver tomorrow"
|
|
emma:lang="en-US">
|
|
<emma:derived-from resource="#better" composite="false"/>
|
|
<origin>Boston</origin>
|
|
<destination>Denver</destination>
|
|
<date>03152003</date>
|
|
</emma:interpretation>
|
|
</emma:emma>
|
|
</pre>
|
|
<p>Annotations on an <code>emma:one-of</code> element are assumed
|
|
to apply to all of the container elements within the
|
|
<code>emma:one-of</code>.</p>
|
|
<p>If <code>emma:one-of</code> appears within another
|
|
<code>emma:one-of</code> then annotations on the parent
|
|
<code>emma:one-of</code> are assumed to apply to the children of
|
|
the child <code>emma:one-of</code>.</p>
|
|
<p>Annotations on <code>emma:group</code> or
|
|
<code>emma:sequence</code> do not apply to their child
|
|
elements.</p>
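<p>These container scoping rules can be sketched in Python as shown
below. The function name <code>scoped_annotations</code> and the
sample document are assumptions made for illustration. The sketch
also makes two choices this section does not spell out: a value on
the element itself takes precedence over an inherited one, and
inheritance stops at the first ancestor that is not an
<code>emma:one-of</code>.</p>

```python
import xml.etree.ElementTree as ET

# Clark-notation prefix for names in the EMMA namespace.
E = "{http://www.w3.org/2003/04/emma}"

# Hypothetical document with a nested emma:one-of and an emma:group.
DOC = """
<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <emma:one-of id="outer" emma:medium="acoustic" emma:mode="voice">
    <emma:one-of id="inner" emma:lang="en-US">
      <emma:interpretation id="int1" emma:confidence="0.7">
        <origin>Boston</origin>
      </emma:interpretation>
    </emma:one-of>
  </emma:one-of>
  <emma:group id="grp" emma:lang="fr-FR">
    <emma:interpretation id="int2" emma:medium="tactile" emma:mode="ink">
      <origin>Austin</origin>
    </emma:interpretation>
  </emma:group>
</emma:emma>
"""


def scoped_annotations(root, element_id):
    """Return the emma:* attributes on an element plus those from enclosing one-ofs."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    target = next(el for el in root.iter() if el.get("id") == element_id)
    result = {k[len(E):]: v for k, v in target.attrib.items() if k.startswith(E)}
    node = parent_of.get(target)
    # Only emma:one-of passes annotations down; group and sequence do not.
    while node is not None and node.tag == E + "one-of":
        for key, value in node.attrib.items():
            if key.startswith(E):
                result.setdefault(key[len(E):], value)
        node = parent_of.get(node)
    return result


root = ET.fromstring(DOC)
in_one_of = scoped_annotations(root, "int1")
in_group = scoped_annotations(root, "int2")
print(in_one_of, in_group)
```

<p>Here <code>int1</code> picks up <code>emma:lang</code> from the
inner <code>emma:one-of</code> and <code>emma:medium</code> and
<code>emma:mode</code> from the outer one, while <code>int2</code>
does not receive the <code>emma:lang</code> value placed on its
enclosing <code>emma:group</code>.</p>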
|
|
<h2 id="s5">5. Conformance</h2>
|
|
<p>The contents of this section are normative.</p>
|
|
<h3 id="s5.1">5.1 Conforming EMMA Documents</h3>
|
|
<p>A document is a Conforming EMMA Document if it meets both the
|
|
following conditions:</p>
|
|
<ul>
|
|
<li>It is a well-formed XML document [<a href="#XML">XML</a>]
|
|
conforming to Namespaces in XML [<a href="#XMLNS">XMLNS</a>].</li>
|
|
<li>It adheres to the specification described in this document
|
|
(EMMA Specification) including the constraints expressed in the
|
|
Schema (see <a href="#appA">Appendix A</a>) and having an XML
|
|
Prolog and root element as specified in <a href="#s3.1">Section
|
|
3.1</a>.</li>
|
|
</ul>
|
|
<p>The EMMA specification and these conformance criteria provide no
|
|
designated size limits on any aspect of EMMA documents. There are
|
|
no maximum values on the number of elements, the amount of
|
|
character data, or the number of characters in attribute
|
|
values.</p>
|
|
<p><span>Within this specification, the term URI refers to a
|
|
Uniform Resource Identifier as defined in [<a href=
|
|
"#RFC3986">RFC3986</a>] and extended in [<a href=
|
|
"#RFC3987">RFC3987</a>] with the new name IRI. The term URI has
|
|
been retained in preference to IRI to avoid introducing new names
|
|
for concepts such as "Base URI" that are defined or referenced
|
|
across the whole family of XML specifications</span>.</p>
|
|
<h3 id="s5.2">5.2 Using EMMA with other Namespaces</h3>
|
|
<p>The EMMA namespace is intended to be used with other XML
|
|
namespaces as per the Namespaces in XML Recommendation [<a href=
|
|
"#XMLNS">XMLNS</a>]. Future work by W3C is expected to address ways
|
|
to specify conformance for documents involving multiple
|
|
namespaces.</p>
|
|
<h3 id="s5.3">5.3 Conforming EMMA Processors</h3>
|
|
<p>An EMMA processor is a program that can process and/or generate
|
|
Conforming EMMA documents.</p>
|
|
<p>In a Conforming EMMA Processor, the XML parser MUST be able to
|
|
parse and process all XML constructs defined by XML 1.1 [<a href=
|
|
"#XML">XML</a>] and Namespaces in XML [<a href="#XMLNS">XMLNS</a>].
|
|
It is not required that a Conforming EMMA Processor uses a
|
|
validating XML parser.</p>
|
|
<p>A Conforming EMMA Processor MUST correctly understand and apply
|
|
the semantics of each markup element or attribute as described by
|
|
this document.</p>
|
|
<p>There is, however, no conformance requirement with respect to
|
|
performance characteristics of the EMMA Processor. For instance, no
|
|
statement is required regarding the accuracy, speed or other
|
|
characteristics of output produced by the processor. No statement
|
|
is made regarding the size of input that an EMMA Processor is
|
|
required to support.</p>
|
|
<h2 id="appendices">Appendices</h2>
|
|
<h3 id="appA">Appendix A. XML and <span>RELAX NG</span>
|
|
schemata</h3>
|
|
<p>This section is Normative.</p>
|
|
<p>This section defines the formal syntax for EMMA documents in
|
|
terms of a normative XML Schema.</p>
|
|
<p>There are both an XML Schema and <span>RELAX NG</span> Schema
|
|
for the EMMA markup. The latest version of the XML Schema for EMMA
|
|
is available at <a href=
|
|
"http://www.w3.org/TR/emma/emma.xsd">http://www.w3.org/TR/emma/emma.xsd</a>
|
|
and the RELAX NG Schema can be found at <a href=
|
|
"http://www.w3.org/TR/emma/emma.rng">http://www.w3.org/TR/emma/emma.rng</a>.</p>
|
|
<p>For stability it is RECOMMENDED that you use the dated URI
|
|
available at <a href=
|
|
"http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd">http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd</a>
|
|
and <a href=
|
|
"http://www.w3.org/TR/2009/REC-emma-20090210/emma.rng">http://www.w3.org/TR/2009/REC-emma-20090210/emma.rng</a>.</p>
|
|
<h2 id="appB">Appendix B. MIME type</h2>
|
|
<p>This section is <span>N</span>ormative.</p>
|
|
<p>This appendix registers a new MIME media type,
|
|
"<code>application/emma+xml</code>".</p>
|
|
|
|
<p>The "<code>application/emma+xml</code>" media type is
|
|
registered with IANA at
|
|
<a href="http://www.iana.org/assignments/media-types/application/">
|
|
http://www.iana.org/assignments/media-types/application/</a>.
|
|
</p>
|
|
|
|
|
|
<div>
|
|
<h3 id="media-type-registration">B.1 Registration of MIME media
|
|
type application/emma+xml</h3>
|
|
<dl>
|
|
<dt>MIME media type name:</dt>
|
|
<dd>
|
|
<p><code>application</code></p>
|
|
</dd>
|
|
<dt>MIME subtype name:</dt>
|
|
<dd>
|
|
<p><code>emma+xml</code></p>
|
|
</dd>
|
|
<dt>Required parameters:</dt>
|
|
<dd>
|
|
<p>None.</p>
|
|
</dd>
|
|
<dt>Optional parameters:</dt>
|
|
<dd>
|
|
<dl>
|
|
<dt><code>charset</code></dt>
|
|
<dd>
|
|
<p>This parameter has identical semantics to the
|
|
<code>charset</code> parameter of the <code>application/xml</code>
|
|
media type as specified in [<a href="#RFC3023">RFC3023</a>] or its
|
|
successor.</p>
|
|
</dd>
|
|
</dl>
|
|
</dd>
|
|
<dt>Encoding considerations:</dt>
|
|
<dd>
|
|
<p>By virtue of EMMA content being XML, it has the same
|
|
considerations when sent as "<code>application/emma+xml</code>" as
|
|
does XML. See RFC 3023 (or its successor), section 3.2.</p>
|
|
</dd>
|
|
<dt>Security considerations:</dt>
|
|
<dd>
|
|
<p>Several features of EMMA require dereferencing arbitrary URIs.
|
|
Implementers are advised to heed the security issues of [<a href=
|
|
"#RFC3986">RFC3986</a>] section 7.</p>
|
|
<p>In addition, because of the extensibility features for EMMA, it
|
|
is possible that "<code>application/emma+xml</code>" will describe
|
|
content that has security implications beyond those described here.
|
|
However, if the processor follows only the normative semantics of
|
|
this specification, this content will be ignored. Only in the case
|
|
where the processor recognizes and processes the additional
|
|
content, or where further processing of that content is dispatched
|
|
to other processors, would security issues potentially arise. And
|
|
in that case, they would fall outside the domain of this
|
|
registration document.</p>
|
|
</dd>
|
|
<dt>Interoperability considerations:</dt>
|
|
<dd>
|
|
<p>This specification describes processing semantics that dictate
|
|
the required behavior for dealing with, among other things,
|
|
unrecognized elements.</p>
|
|
<p>Because EMMA is extensible, conformant
|
|
"<code>application/emma+xml</code>" processors MAY expect that
|
|
content received is well-formed XML, but processors SHOULD NOT
|
|
assume that the content is valid EMMA or expect to recognize all of
|
|
the elements and attributes in the document.</p>
|
|
</dd>
|
|
<dt>Published specification:</dt>
|
|
<dd>
|
|
<p>
|
|
This media type registration is extracted from Appendix B of the
|
|
"<a href="http://www.w3.org/TR/emma/">EMMA: Extensible MultiModal Annotation markup language</a>"
|
|
specification.
|
|
</p>
|
|
</dd>
|
|
<dt>Additional information:</dt>
|
|
<dd>
|
|
<dl>
|
|
<dt>Magic number(s):</dt>
|
|
<dd>
|
|
<p>There is no single initial octet sequence that is always present
|
|
in EMMA documents.</p>
|
|
</dd>
|
|
<dt>File extension(s):</dt>
|
|
<dd>
|
|
<p>EMMA documents are most often identified with the extension
|
|
"<code>.emma</code>"<!-- or "<code>.mma</code>"-->.</p>
|
|
</dd>
|
|
<dt>Macintosh File Type Code(s):</dt>
|
|
<dd>
|
|
<p>TEXT</p>
|
|
</dd>
|
|
</dl>
|
|
</dd>
|
|
<dt>Person & email address to contact for further
|
|
information:</dt>
|
|
<dd>
|
|
<p>Kazuyuki Ashimura, <<a href=
|
|
"mailto:ashimura@w3.org">ashimura@w3.org</a>>.</p>
|
|
</dd>
|
|
<dt>Intended usage:</dt>
|
|
<dd>
|
|
<p>COMMON</p>
|
|
</dd>
|
|
<dt>Author/Change controller:</dt>
|
|
<dd>
|
|
<p>The EMMA specification is a work product of the World Wide Web
|
|
Consortium's Multimodal Interaction Working Group. The W3C has
|
|
change control over these specifications.</p>
|
|
</dd>
|
|
</dl>
|
|
</div>
|
|
<h2 id="appC">Appendix C. <code>emma:hook</code> and SRGS</h2>
|
|
<p>This section is <span>I</span>nformative.</p>
|
|
<div>
|
|
<p>One of the most powerful aspects of multimodal interfaces is
|
|
their ability to provide support for user inputs which are
|
|
distributed over the available input modes. These <b>composite</b>
|
|
inputs are contributions made by the user within a single turn
|
|
which have component parts in different modes. For example, the
|
|
user might say "zoom in here" in the speech mode while drawing an
|
|
area on a graphical display in the ink mode. One of the central
|
|
motivating factors for this kind of input is that different kinds
|
|
of communicative content are best suited to different input modes.
|
|
In the example of a user drawing an area on a map and saying "zoom
|
|
in here", the zoom command is easiest to provide in speech but the
|
|
spatial information, the specific area, is easier to provide in
|
|
ink.</p>
|
|
<p>Enabling composite multimodality is critical in ensuring that
|
|
multimodal systems support more natural and effective interaction
|
|
for users. In order to support composite inputs, a multimodal
|
|
architecture must provide some kind of multimodal integration
|
|
mechanism. In the W3C Multimodal Interaction Framework
|
|
<span>[<a href="#MMIF">MMI Framework</a>]</span>, multimodal
|
|
integration can be handled by an integration component which
|
|
follows the application of speech understanding and other kinds of
|
|
interpretation procedures for individual modes.</p>
|
|
<p>Given the broad range of different techniques being employed for
|
|
multimodal integration and the extent to which this is an ongoing
|
|
research problem, standardization of the specific method or
|
|
algorithm used for multimodal integration is not appropriate at
|
|
this time. In order to facilitate the development and
|
|
inter-operation of different multimodal integration mechanisms, EMMA
|
|
provides markup enabling application-independent
|
|
specification of elements in the application markup where content
|
|
from another mode needs to be integrated. These representation
|
|
'hooks' can then be used by different kinds of multimodal
|
|
integration components and algorithms to drive the process of
|
|
multimodal integration. In the processing of a composite multimodal
|
|
input, the result of applying a mode-specific interpretation
|
|
component to each of the individual modes will be EMMA markup
|
|
describing the possible interpretation of that input.</p>
|
|
</div>
|
|
<p>One way to build an EMMA representation of a spoken input such
|
|
as "zoom in here" is to use grammar rules in the W3C Speech
|
|
Recognition Grammar Specification [<a href="#SRGS">SRGS</a>] using
|
|
the Semantic Interpretation <span>[<a href="#SI">SISR</a>]</span>
|
|
tags to build the application semantics with the
|
|
<code>emma:hook</code> attribute. In this approach <span>[<a href=
|
|
"#ECMASCRIPT">ECMAScript</a>]</span> is specified in order to build
|
|
up an object representing the semantics. The resulting ECMAScript
|
|
object is then translated to XML.</p>
|
|
<p>For our example case of "zoom in here", the following SRGS rule
|
|
could be used. The <span>Semantic Interpretation for Speech
|
|
Recognition</span> specification <span>[<a href=
|
|
"#SI">SISR</a>]</span> provides a reserved property
|
|
<b>_nsprefix</b> for indicating the namespace to be used with an
|
|
attribute.</p>
|
|
<pre class="example">
|
|
<rule id="zoom">
|
|
zoom in here
|
|
<tag>
|
|
$.command = new Object();
|
|
$.command.action = "zoom";
|
|
$.command.location = new Object();
|
|
$.command.location._attributes = new Object();
|
|
$.command.location._attributes.hook = new Object();
|
|
$.command.location._attributes.hook._nsprefix = "emma";
|
|
$.command.location._attributes.hook._value = "ink";
|
|
$.command.location.type = "area";
|
|
</tag>
|
|
</rule>
|
|
</pre>
|
|
<p>Application of this rule will result in the following ECMAScript
|
|
object being built.</p>
|
|
<pre class="example">
|
|
command: {
|
|
action: "zoom"
|
|
location: {
|
|
_attributes: {
|
|
hook: {
|
|
_nsprefix: "emma"
|
|
_value: "ink"
|
|
}
|
|
}
|
|
type: "area"
|
|
}
|
|
}
|
|
</pre>
|
|
<p><a href="#SI">SI</a> processing in an XML environment would
|
|
generate the following document:</p>
|
|
<pre class="example">
|
|
<command>
|
|
<action>zoom</action>
|
|
<location emma:hook="ink">
|
|
<type>area</type>
|
|
</location>
|
|
</command>
|
|
</pre>
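<p>The translation from the object to XML can be sketched as follows. This is a minimal illustration in Python, not the normative SISR serialization: a Python dictionary stands in for the ECMAScript object, and the names <code>to_xml</code> and <code>PREFIXES</code> are hypothetical helpers introduced only for this example.</p>

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"
ET.register_namespace("emma", EMMA_NS)

# Python dict standing in for the ECMAScript object built by the rule above.
semantics = {
    "command": {
        "action": "zoom",
        "location": {
            "_attributes": {"hook": {"_nsprefix": "emma", "_value": "ink"}},
            "type": "area",
        },
    }
}

# Hypothetical prefix-to-namespace table; only "emma" is needed here.
PREFIXES = {"emma": EMMA_NS}


def to_xml(name, value):
    """Serialize one property: nested objects become child elements,
    entries under _attributes become XML attributes."""
    elem = ET.Element(name)
    if not isinstance(value, dict):
        elem.text = str(value)
        return elem
    for key, child in value.items():
        if key == "_attributes":
            for attr_name, spec in child.items():
                prefix = spec.get("_nsprefix")
                # Qualify the attribute name when a namespace prefix is given.
                qname = f"{{{PREFIXES[prefix]}}}{attr_name}" if prefix else attr_name
                elem.set(qname, spec["_value"])
        else:
            elem.append(to_xml(key, child))
    return elem


root = to_xml("command", semantics["command"])
print(ET.tostring(root, encoding="unicode"))
```

<p>The printed element carries the same structure as the fragment above, with <code>emma:hook="ink"</code> attached to <code>location</code>.</p>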
|
|
<p>This XML fragment might then appear within an EMMA document as
|
|
follows:</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:interpretation id="voice1"
|
|
emma:medium="acoustic"
|
|
emma:mode="voice">
|
|
<command>
|
|
<action>zoom</action>
|
|
<location emma:hook="ink">
|
|
<type>area</type>
|
|
</location>
|
|
</command>
|
|
</emma:interpretation>
|
|
</emma:emma>
|
|
</pre>
|
|
<p>The <code>emma:hook</code> annotation indicates that this speech
|
|
input needs to be combined with ink input such as the
|
|
following:</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:interpretation id="pen1"
|
|
emma:medium="tactile"
|
|
emma:mode="ink">
|
|
<location>
|
|
<type>area</type>
|
|
<points>42.1345 -37.128 42.1346 -37.120 ... </points>
|
|
</location>
|
|
</emma:interpretation>
|
|
</emma:emma>
|
|
|
|
</pre>
|
|
<p>This representation could be generated by a pen modality
|
|
component performing gesture recognition and interpretation. The
|
|
input to the component would be an <span>Ink Markup Language</span>
|
|
specification <span>[<a href="#InkML">INKML</a>]</span> of the ink
|
|
trace and the output would be the EMMA document above.</p>
|
|
<p>The combination will result in the following EMMA document for
|
|
the combined speech and pen multimodal input.</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:interpretation
|
|
emma:medium="acoustic tactile"
|
|
emma:mode="<span>voice ink</span>"
|
|
emma:process="http://example.com/myintegrator.xml">
|
|
<emma:derived-from resource="<span>http://example.com/voice1.emma/</span>#voice1" composite="true"/>
|
|
<emma:derived-from resource="<span>http://example.com/pen1.emma/</span>#pen1" composite="true"/>
|
|
<command>
|
|
<action>zoom</action>
|
|
<location>
|
|
<type>area</type>
|
|
<points>42.1345 -37.128 42.1346 -37.120 ... </points>
|
|
</location>
|
|
</command>
|
|
</emma:interpretation>
|
|
</emma:emma>
|
|
</pre>
|
|
<div>
|
|
<p>There are two components to the process of integrating these two
|
|
pieces of semantic markup. The first is to ensure that the two are
|
|
compatible; that is, that no semantic constraints are violated. The
|
|
second is to fuse the content from the two sources. In our example,
|
|
the <code><type>area</type></code> element is intended
|
|
to indicate that this speech command requires integration with an
|
|
area gesture rather than, for example, a line gesture, which would
|
|
have the subelement <code><type>line</type></code>.
|
|
This constraint needs to be enforced by whatever mechanism is
|
|
responsible for multimodal integration.</p>
|
|
<p>Many different techniques could be used for achieving this
|
|
integration of the semantic interpretation of the pen input, a
|
|
<code><location></code> element, with the corresponding
|
|
<code><location></code> element in the speech. The
|
|
<span><code>emma:hook</code></span> simply serves to indicate the
|
|
existence of this relationship.</p>
|
|
<p>One way to achieve both the compatibility checking and fusion of
|
|
content from the two modes is to use a well-defined general purpose
|
|
matching mechanism such as unification. <span>Graph unification
|
|
[</span><a href="#graphunification">Graph
|
|
unification</a><span>]</span> is a mathematical operation defined
|
|
over directed acyclic graphs which captures both of the components
|
|
of integration in a single operation: the application of the
|
|
semantic constraints and the fusing of content. One possible
|
|
semantics for the <code>emma:hook</code> markup indicates that
|
|
content from the required mode needs to be unified with that
|
|
position in the application semantics. In order to unify, two
|
|
elements must not have any conflicting values for subelements or
|
|
attributes. This procedure can be defined recursively so that
|
|
elements within the subelements must also not clash and so on. The
|
|
result of unification is the union of all of the elements and
|
|
attributes of the two elements that are being unified.</p>
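<p>A recursive unification of this kind might be sketched as follows. This is one possible reading of the description above, not a prescribed algorithm: the function name <code>unify</code> is hypothetical, and the sketch assumes that each child element name occurs at most once within its parent and that leaf values are compared as plain text.</p>

```python
import copy
import xml.etree.ElementTree as ET


def unify(a, b):
    """Return a new element that is the union of a and b, or None on a clash."""
    if a.tag != b.tag:
        return None
    # Attributes: names present on both sides must carry equal values.
    for name, value in b.attrib.items():
        if name in a.attrib and a.attrib[name] != value:
            return None
    result = ET.Element(a.tag, {**a.attrib, **b.attrib})
    # Leaf text: two different non-empty values clash.
    text_a = (a.text or "").strip()
    text_b = (b.text or "").strip()
    if text_a and text_b and text_a != text_b:
        return None
    if text_a or text_b:
        result.text = text_a or text_b
    # Subelements: unify those present on both sides, copy the rest.
    seen = set()
    for child_a in a:
        child_b = b.find(child_a.tag)
        if child_b is None:
            result.append(copy.deepcopy(child_a))
        else:
            merged = unify(child_a, child_b)
            if merged is None:
                return None
            result.append(merged)
        seen.add(child_a.tag)
    for child_b in b:
        if child_b.tag not in seen:
            result.append(copy.deepcopy(child_b))
    return result


speech = ET.fromstring("<location><type>area</type></location>")
ink = ET.fromstring(
    "<location><type>area</type>"
    "<points>42.1345 -37.128 42.1346 -37.120</points></location>"
)
line = ET.fromstring("<location><type>line</type></location>")

fused = unify(speech, ink)
print(ET.tostring(fused, encoding="unicode"))
print(unify(speech, line))  # the area/line clash yields None
```

<p>The compatibility check and the fusion happen in the same traversal: the area gesture unifies with the speech <code>location</code> and contributes its <code>points</code>, while a line gesture is rejected.</p>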
|
|
<p>In addition to the unification operation, in the resulting
|
|
<code>emma:interpretation</code> the <code>emma:hook</code>
|
|
attribute needs to be removed and the <code>emma:mode</code>
|
|
attribute changed to <span>the list of the modes of the individual
|
|
inputs</span> <span>, e.g. <code>"voice ink"</code></span>.</p>
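<p>This post-processing step might look like the following sketch. The helper name <code>finalize</code> is hypothetical, and the example omits the application namespace used in the documents above for brevity.</p>

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"
ET.register_namespace("emma", EMMA_NS)
HOOK = f"{{{EMMA_NS}}}hook"
MODE = f"{{{EMMA_NS}}}mode"


def finalize(interpretation, input_modes):
    """Strip emma:hook from every element and record the modes of all inputs."""
    for elem in interpretation.iter():
        elem.attrib.pop(HOOK, None)
    interpretation.set(MODE, " ".join(input_modes))
    return interpretation


fused = ET.fromstring(
    f'<emma:interpretation xmlns:emma="{EMMA_NS}" emma:mode="voice">'
    "<command><action>zoom</action>"
    '<location emma:hook="ink"><type>area</type></location>'
    "</command></emma:interpretation>"
)
finalize(fused, ["voice", "ink"])
print(fused.get(MODE))
```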
|
|
<p>Instead of the unification operation, for a specific application
|
|
semantics, integration could be achieved using some other algorithm
|
|
or script. The benefit of using the unification semantics for
|
|
<code>emma:hook</code> is that it provides a general purpose
|
|
mechanism for checking the compatibility of elements and fusing
|
|
them, whatever the specific elements are in the application
|
|
specific semantic representation.</p>
|
|
<p>The benefit of using the <code>emma:hook</code> annotation for
|
|
authors is that it provides an application independent method for
|
|
indicating where integration with content from another mode is
|
|
required. If a general purpose integration mechanism is used, such
|
|
as the unification approach described above, authors should be able
|
|
to use the same integration mechanism for a range of different
|
|
applications without having to change the integration rules or
|
|
logic. For each application the speech grammar rules [<a href=
|
|
"#SRGS">SRGS</a>] need to assign <code>emma:hook</code> to the
|
|
appropriate elements in the semantic representation of the speech.
|
|
The general purpose multimodal integration mechanism will use the
|
|
<code>emma:hook</code> annotations in order to determine where to
|
|
add in content from other modes. Another benefit of the
|
|
<code>emma:hook</code> mechanism is that it facilitates
|
|
interoperability among different multimodal integration components,
|
|
so long as they are all general purpose and utilize
|
|
<code>emma:hook</code> in order to determine where to integrate
|
|
content.</p>
|
|
<p>The following provides a more detailed example of the use of the
|
|
<code>emma:hook</code> annotation. In this example, spoken input is
|
|
combined with two <span>ink</span> gestures. The semantic
|
|
representation assigned to the spoken input "send this file to
|
|
this" indicates two locations where content is required from ink
|
|
input using <code>emma:hook="ink"</code>:</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:interpretation<span> id="voice2"
|
|
emma:medium="acoustic"
|
|
emma:mode="voice"
|
|
emma:tokens="send this file to this"
|
|
emma:start="1087995961500"
|
|
emma:end="1087995963542"</span>>
|
|
<command>
|
|
<action>send</action>
|
|
<arg1>
|
|
<object emma:hook="ink">
|
|
<type>file</type>
|
|
<number>1</number>
|
|
</object>
|
|
</arg1>
|
|
<arg2>
|
|
<object emma:hook="ink">
|
|
<number>1</number>
|
|
</object>
|
|
</arg2>
|
|
</command>
|
|
</emma:interpretation>
|
|
</emma:emma>
|
|
</pre>
|
|
<p>The user gesturing on the two locations on the display can be
|
|
represented using <code>emma:sequence</code>:</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:sequence<span> id="ink2"</span>>
|
|
<emma:interpretation <span>emma:start="1087995960500"
|
|
emma:end="1087995960900"<br />
|
|
emma:medium="tactile"
|
|
emma:mode="ink"</span>>
|
|
<object>
|
|
<type>file</type>
|
|
<number>1</number>
|
|
<id>test.pdf</id>
|
|
      </object>
|
|
</emma:interpretation>
|
|
<emma:interpretation <span>emma:start="1087995961000"
|
|
emma:end="1087995961100"<br />
|
|
emma:medium="tactile"
|
|
emma:mode="ink"</span>>
|
|
<object>
|
|
<type>printer</type>
|
|
<number>1</number>
|
|
<id>lpt1</id>
|
|
      </object>
|
|
</emma:interpretation>
|
|
</emma:sequence>
|
|
</emma:emma>
|
|
</pre>
|
|
<p>A general purpose unification-based multimodal integration
|
|
algorithm could use the <code>emma:hook</code> annotation as
|
|
follows. It identifies the elements marked with
|
|
<code>emma:hook</code> in document order. For each of those in
|
|
turn, it attempts to unify the element with the corresponding
|
|
element in order in the <code>emma:sequence</code>. Since none of
|
|
the subelements conflict, the unification goes through and as a
|
|
result, we have the following EMMA for the composite result:</p>
|
|
<pre class="example">
|
|
<emma:emma version="1.0"
|
|
xmlns:emma="http://www.w3.org/2003/04/emma"
|
|
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
|
|
xsi:schemaLocation="http://www.w3.org/2003/04/emma
|
|
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
|
|
xmlns="http://www.example.com/example">
|
|
<emma:interpretation<span> id="multimodal2"
|
|
emma:medium="acoustic tactile"
|
|
emma:mode="voice ink"
|
|
emma:tokens="send this file to this"
|
|
emma:process="http://example.com/myintegration.xml"
|
|
emma:start="1087995960500"
|
|
emma:end="1087995963542"</span>>
|
|
<emma:derived-from resource="<span>http://example.com/voice2.emma/</span>#voice2" composite="true"/>
|
|
<emma:derived-from resource="<span>http://example.com/ink2.emma/</span>#ink2" composite="true"/>
|
|
<command>
|
|
<action>send</action>
|
|
<arg1>
|
|
<object>
|
|
<type>file</type>
|
|
<number>1</number>
|
|
<id>test.pdf</id>
|
|
</object>
|
|
</arg1>
|
|
<arg2>
|
|
<object>
|
|
<type>printer</type>
|
|
<number>1</number>
|
|
<id>lpt1</id>
|
|
</object>
|
|
</arg2>
|
|
</command>
|
|
</emma:interpretation>
|
|
</emma:emma>
|
|
</pre></div>
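<p>The document-order pairing described above can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: <code>integrate</code> and <code>merge_flat</code> are hypothetical names, <code>merge_flat</code> is a simplified unification that only handles elements whose children are leaves (which is sufficient for this example), and the application namespace and most EMMA annotations are omitted for brevity.</p>

```python
import copy
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"
HOOK = f"{{{EMMA_NS}}}hook"
MODE = f"{{{EMMA_NS}}}mode"


def merge_flat(target, source):
    """Simplified unification for elements with leaf children only.
    Returns False on a conflicting value; otherwise adds missing children."""
    if target.tag != source.tag:
        return False
    for child in source:
        existing = target.find(child.tag)
        if existing is not None and existing.text != child.text:
            return False
    for child in source:
        if target.find(child.tag) is None:
            ET.SubElement(target, child.tag).text = child.text
    return True


def integrate(speech, gestures):
    """Unify each emma:hook element, in document order, with the gesture
    interpretation at the same position. Returns a new tree or None."""
    result = copy.deepcopy(speech)
    hooks = [e for e in result.iter() if HOOK in e.attrib]
    if len(hooks) != len(gestures):
        return None
    for hook_elem, gesture in zip(hooks, gestures):
        # The hook names the mode that must supply the content.
        if hook_elem.get(HOOK) != gesture.get(MODE):
            return None
        if not merge_flat(hook_elem, gesture[0]):
            return None
        del hook_elem.attrib[HOOK]
    return result


speech = ET.fromstring(f"""
<command xmlns:emma="{EMMA_NS}">
  <action>send</action>
  <arg1><object emma:hook="ink"><type>file</type><number>1</number></object></arg1>
  <arg2><object emma:hook="ink"><number>1</number></object></arg2>
</command>""")
sequence = ET.fromstring(f"""
<emma:sequence xmlns:emma="{EMMA_NS}">
  <emma:interpretation emma:mode="ink">
    <object><type>file</type><number>1</number><id>test.pdf</id></object>
  </emma:interpretation>
  <emma:interpretation emma:mode="ink">
    <object><type>printer</type><number>1</number><id>lpt1</id></object>
  </emma:interpretation>
</emma:sequence>""")

combined = integrate(speech, list(sequence))
print(combined.find("arg1/object/id").text, combined.find("arg2/object/id").text)
```

<p>If the two gestures arrived in the opposite order, the first pairing would set <code>file</code> against <code>printer</code> and the integration would fail, which illustrates how unification enforces the semantic constraints while fusing content.</p>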
|
|
<h2 id="appD">Appendix D. EMMA event interface</h2>
|
|
<p>This section is <span>I</span>nformative.</p>
|
|
<p>The W3C Document Object Model [<a href="#DOM">DOM</a>] defines
|
|
platform and language neutral interfaces that give programs and
|
|
scripts the means to dynamically access and update the content,
|
|
structure and style of documents. DOM Events define a generic event
|
|
system which allows registration of event handlers, describes event
|
|
flow through a tree structure, and provides basic contextual
|
|
information for each event.</p>
|
|
<p>This section of the EMMA specification extends the DOM Event
|
|
interface for use with events that describe interpreted user input
|
|
in terms of a DOM Node for an EMMA document.</p>
|
|
<pre class="example">
|
|
// File: emma.idl
|
|
|
|
#ifndef _EMMA_IDL_
|
|
#define _EMMA_IDL_
|
|
|
|
#include "dom.idl"
#include "views.idl"
#include "events.idl"
|
|
#pragma prefix "dom.w3c.org"
module emma
|
|
{
|
|
typedef dom::DOMString DOMString;
|
|
typedef dom::Node Node;
|
|
|
|
interface EMMAEvent : events::UIEvent {
|
|
readonly attribute dom::Node node;
|
|
void initEMMAEvent(in DOMString typeArg,
|
|
in boolean canBubbleArg,
|
|
in boolean cancelableArg,
|
|
in Node node);
|
|
};
|
|
};
|
|
|
|
#endif // _EMMA_IDL_
|
|
</pre>
|
|
<h2 id="appE">Appendix E. References</h2>
|
|
<h3 id="appE1">E.1 Normative references</h3>
|
|
<dl>
|
|
<dt id="BCP47">BCP47</dt>
|
|
<dd>A. Phillips and M. Davis, editors. <a href=
|
|
"http://www.rfc-editor.org/rfc/bcp/bcp47.txt">Tags for the
|
|
Identification of Languages</a>, IETF, September 2006.</dd>
|
|
<dt id="RFC3023">RFC3023</dt>
|
|
<dd>M. Murata et al.<span>,</span> editors. <a href=
|
|
"http://www.ietf.org/rfc/rfc3023.txt">XML Media Types</a>. IETF RFC
|
|
3023<span>, January 2001</span>.</dd>
|
|
<dt id="RFC2046">RFC2046</dt>
|
|
<dd>N. Freed and N. Borenstein<span>,</span> editors. <a href=
|
|
"http://www.ietf.org/rfc/rfc2046.txt">Multipurpose Internet Mail
|
|
Extensions (MIME) Part Two: Media Types</a>. IETF RFC 2046<span>,
|
|
November 1996</span>.</dd>
|
|
<dt><a id="ref-rfc2119" name="ref-rfc2119" shape=
|
|
"rect">RFC2119</a></dt>
|
|
<dd>S. Bradner, <span>e</span>ditor. <a href=
|
|
"http://www.ietf.org/rfc/rfc2119.txt">Key words for use in RFCs to
|
|
Indicate Requirement Levels</a>, IETF <span>RFC 2119</span>, March
|
|
1997.</dd>
|
|
<dt id="RFC3986">RFC3986</dt>
|
|
<dd>T. Berners-Lee et al.<span>,</span> editors. <a href=
|
|
"http://www.ietf.org/rfc/rfc3986.txt">Uniform Resource Identifier
|
|
(URI): Generic Syntax</a>. IETF RFC 3986<span>, January
|
|
2005</span>.</dd>
|
|
<dt id="RFC3987">RFC3987</dt>
|
|
<dd>M. Duerst and M. Suignard<span>,</span> editors. <a href=
|
|
"http://www.ietf.org/rfc/rfc3987.txt">Internationalized Resource
|
|
Identifiers (IRIs)</a>. IETF RFC 3987<span>, January
|
|
2005</span>.</dd>
|
|
<dt id="XML">XML</dt>
|
|
<dd>Tim Bray <span>et al.,</span> editors. <a href=
|
|
"http://www.w3.org/TR/2004/REC-xml11-20040204/">Extensible Markup
|
|
Language (XML) 1.1</a>. World Wide Web Consortium, <span>W3C
|
|
Recommendation,</span> 2004.</dd>
|
|
<dt id="XMLNS">XMLNS</dt>
|
|
<dd>Tim Bray <span>et al.</span>, editors<span>.</span> <a href=
|
|
"http://www.w3.org/TR/xml-names11/">Namespaces in XML 1.1</a>,
|
|
World Wide Web Consortium, <span>W3C Recommendation,</span>
|
|
200<span>6</span>.</dd>
|
|
<dt id="XSD1">XML Schema Structures</dt>
|
|
<dd>Henry S. Thompson <span>et al.</span>, editors. <a href=
|
|
"http://www.w3.org/TR/xmlschema-1/">XML Schema Part 1: Structures
|
|
Second Edition</a>, World Wide Web Consortium<span>, W3C
|
|
Recommendation</span>, 2004.</dd>
|
|
<dt id="XSD2">XML Schema Datatypes</dt>
|
|
<dd>Paul V. Biron <span>and</span> Ashok Malhotra, editors.
|
|
<a href="http://www.w3.org/TR/xmlschema-2/">XML Schema Part 2:
|
|
Datatypes Second Edition</a>, World Wide Web Consortium, <span>W3C
|
|
Recommendation,</span> 2004.</dd>
|
|
</dl>
|
|
<h3 id="appE2">E.2 Informative references</h3>
|
|
<dl>
|
|
<dt id="DOM">DOM</dt>
|
|
<dd><a href="http://www.w3.org/DOM/">Document Object Model</a>,
|
|
World Wide Web Consortium, 2005.</dd>
|
|
<dt id="ECMASCRIPT">ECMAScript</dt>
|
|
<dd><a href=
|
|
"http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf">
|
|
ECMAScript</a></dd>
|
|
<dt id="InkML">INKML</dt>
|
|
<dd>Yi-Min Chee, Max Froumentin, Stephen M. Watt, editors. <a href=
|
|
"http://www.w3.org/TR/InkML/">Ink Markup Language (InkML)</a>,
|
|
World Wide Web Consortium, W3C Working Draft, 2006.</dd>
|
|
<dt id="SI">SI<span>SR</span></dt>
|
|
<dd>Luc Van Tichelen <span>and Dave Burke</span>,
|
|
editor<span>s</span>. <a href=
|
|
"http://www.w3.org/TR/semantic-interpretation/">Semantic
|
|
Interpretation for Speech Recognition</a>, World Wide Web
|
|
Consortium, <span>W3C Proposed Recommendation, 2007</span>.</dd>
|
|
<dt id="SRGS">SRGS</dt>
|
|
<dd>Andrew Hunt, Scott McGlashan, editors. <a href=
|
|
"http://www.w3.org/TR/speech-grammar/">Speech Recognition Grammar
|
|
Specification Version 1.0</a>, World Wide Web Consortium<span>, W3C
|
|
Recommendation,</span> 2004.</dd>
|
|
<dt id="XFORMS">XFORMS</dt>
|
|
<dd><span>John M. Boyer et al., editors.</span> <a href=
|
|
"http://www.w3.org/TR/2006/REC-xforms-20060314/">XForms <span>1.0
|
|
(Second Edition)</span></a>, World Wide Web Consortium, <span>W3C
|
|
Recommendation,</span> 2006.</dd>
|
|
<dt id="RELAXNG">RELAX-NG</dt>
|
|
<dd><span>James Clark and Makoto Murata, editors.</span> <a href=
|
|
"http://www.oasis-open.org/committees/relax-ng/spec-20011203.html"><span>
|
|
RELAX NG Specification</span></a><span>, OASIS, Committee
|
|
Specification, 2001.</span></dd>
|
|
<dt id="EMMAreqs">EMMA Requirements</dt>
|
|
<dd>Stephane H. Maes and Stephen Potter, editors. <a href=
|
|
"http://www.w3.org/TR/EMMAreqs/">Requirements for EMMA</a>, World
|
|
Wide Web Consortium, <span>W3C Note,</span> 2003<span>.</span></dd>
|
|
<dt id="graphunification">Graph Unification</dt>
|
|
<dd>Bob Carpenter. <cite>The Logic of Typed Feature
|
|
Structures</cite>, Cambridge Tracts in Theoretical Computer Science
|
|
32, Cambridge University Press, 1992.</dd>
|
|
<dd>Kevin Knight. <cite>Unification: A Multidisciplinary
|
|
Survey</cite>, ACM Computing Surveys, 21(1), 1989.</dd>
|
|
<dd>Michael Johnston. <cite>Unification-based Multimodal
|
|
Parsing</cite>, Proceedings of Association for Computational
|
|
Linguistics, pp. 624-630, 1998.</dd>
|
|
<dt id="MMIF">MMI Framework</dt>
|
|
<dd>James A. Larson, T.V. Raman and Dave Raggett, editors. <a href=
|
|
"http://www.w3.org/TR/mmi-framework/">W3C Multimodal Interaction
|
|
Framework</a>, World Wide Web Consortium<span>, W3C Note</span>,
|
|
2003<span>.</span></dd>
|
|
<dt id="MMIreqs">MMI Requirements</dt>
|
|
<dd>Stephane H. Maes and Vijay Saraswat, editors. <a href=
|
|
"http://www.w3.org/TR/mmi-reqs/">Multimodal Interaction
|
|
Requirements</a>, World Wide Web Consortium<span>, W3C Note</span>,
|
|
2003<span>.</span></dd>
|
|
</dl>
|
|
<h2 id="appF">Appendix F. Changes since last draft</h2>
|
|
<p>This section is <span>I</span>nformative.</p>
|
|
<p>
|
|
Since the publication of the Proposed Recommendation of the EMMA
|
|
specification, the following minor editorial changes have been
|
|
made to the draft.
|
|
</p>
|
|
<ul>
|
|
<li>
|
|
Fixed wrong style of text.
|
|
(<a href="#s1.2">1.2 Terminology</a>)
|
|
</li>
|
|
|
|
<li>
|
|
Changed the schemaLocation URI in the code examples
|
|
from
|
|
"http://www.w3.org/TR/2008/PR-emma-20081215/"
|
|
to
|
|
"http://www.w3.org/TR/2009/REC-emma-20090210/".
|
|
(<a href="#s2">2. Structure of EMMA documents</a>,
|
|
<a href="#s3">3. EMMA structural elements</a>
|
|
and
|
|
<a href="#s4">4. EMMA annotations</a>)
|
|
</li>
|
|
|
|
<li>
|
|
Changed the note on the status of MIME type registration from
|
|
"being submitted to the IESG for review, approval, and registration
|
|
with IANA" to "registered with IANA at
|
|
http://www.iana.org/assignments/media-types/application/" because
|
|
the EMMA MIME type is registered with IANA.
|
|
(<a href="#appB">Appendix B</a>)
|
|
</li>
|
|
</ul>
|
|
|
|
<h2 id="appG">Appendix G. Acknowledgements</h2>
|
|
<p>This section is <span>I</span>nformative.</p>
|
|
<p>The editors would like to recognize the contributions of the
|
|
current and former members of the W3C Multimodal Interaction Working Group
|
|
<em>(listed in alphabetical order)</em>:</p>
|
|
<dl>
|
|
<dd>Kazuyuki Ashimura, W3C</dd>
|
|
<dd>Patrizio Bergallo, (until 2008, while at Loquendo)</dd>
|
|
<dd>Wu Chou, Avaya</dd>
|
|
<dd>Max Froumentin, (until 2006, while at W3C)</dd>
|
|
<dd>Katriina Halonen, Nokia</dd>
|
|
<dd>Jin Liu, T-Systems</dd>
|
|
<dd>Roberto Pieraccini, Speechcycle</dd>
|
|
<dd>Stephen Potter, Microsoft</dd>
|
|
<dd>Massimo Romanelli, DFKI</dd>
|
|
<dd>Yuan Shao, Canon</dd>
|
|
</dl>
|
|
</body>
|
|
</html>
|