<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type" content=
"text/html; charset=iso-8859-1">
<title>Grammar Requirements for Voice Markup</title>
<link rel="stylesheet" type="text/css" href=
"http://www.w3.org/StyleSheets/TR/W3C-WD.css">
<style type="text/css">
body {
font-family: sans-serif;
margin-left: 10%;
margin-right: 5%;
color: black;
background-color: white;
background-attachment: fixed;
background-image: url(http://www.w3.org/StyleSheets/TR/WD.gif);
background-position: top left;
background-repeat: no-repeat;
font-family: Tahoma, Verdana, "Myriad Web", Syntax, sans-serif;
}
.unfinished { font-style: normal; background-color: #FFFF33}
.dtd-code { font-family: monospace;
background-color: #dfdfdf; white-space: pre;
border: #000000; border-style: solid;
border-top-width: 1px; border-right-width: 1px;
border-bottom-width: 1px; border-left-width: 1px; }
p.copyright {font-size: smaller}
h2,h3 {margin-top: 1em;}
.extra { font-style: italic; color: #338033 }
code {
color: green;
font-family: monospace;
font-weight: bold;
}
.example {
border: solid green;
border-width: 2px;
color: green;
font-weight: bold;
margin-right: 5%;
margin-left: 0;
}
.bad {
border: solid red;
border-width: 2px;
margin-left: 0;
margin-right: 5%;
color: rgb(192, 101, 101);
}
div.navbar { text-align: center; }
div.contents {
background-color: rgb(204,204,255);
padding: 0.5em;
border: none;
margin-right: 5%;
}
table {
margin-left: -4%;
margin-right: 4%;
font-family: sans-serif;
background: white;
border-width: 2px;
border-color: white;
}
th { font-family: sans-serif; background: rgb(204, 204, 153) }
td { font-family: sans-serif; background: rgb(255, 255, 153) }
.tocline { list-style: none; }
</style>
</head>
<body>
<div class="head">
<p><a href="http://www.w3.org/"><img class="head" src=
"http://www.w3.org/Icons/WWW/w3c_home.gif" alt="W3C"></a></p>

<h1 class="head">Grammar Representation Requirements for Voice
Markup Languages</h1>

<h3 class="notoc">W3C Working Draft <i>23 December 1999</i></h3>

<dl>
<dt>This version:</dt>
<dd><a href=
"http://www.w3.org/TR/1999/WD-voice-grammar-reqs-19991223">
http://www.w3.org/TR/1999/WD-voice-grammar-reqs-19991223</a></dd>

<dt>Latest version:</dt>
<dd><a href=
"http://www.w3.org/TR/voice-grammar-reqs">
http://www.w3.org/TR/voice-grammar-reqs</a></dd>

<dt>Previous versions (Member-only):</dt>
<dd><a href=
"http://www.w3.org/Voice/Group/1999/grammar-reqs-19991116.html">
http://www.w3.org/Voice/Group/1999/grammar-reqs-19991116</a></dd>

<dt>Editor:</dt>
<dd>M. K. Brown, Bell Labs, Murray Hill, NJ</dd>
</dl>

<p class="copyright"><a href=
"http://www.w3.org/Consortium/Legal/ipr-notice#Copyright">
Copyright</a> &copy; 1999 <a href="http://www.w3.org/">
W3C</a><sup>&reg;</sup> (<a href=
"http://www.lcs.mit.edu/">MIT</a>, <a href=
"http://www.inria.fr/">INRIA</a>, <a href=
"http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. <abbr
title="World Wide Web Consortium">W3C</abbr> <a href=
"http://www.w3.org/Consortium/Legal/ipr-notice#Legal_Disclaimer">
liability</a>, <a href=
"http://www.w3.org/Consortium/Legal/ipr-notice#W3C_Trademarks">
trademark</a>, <a href=
"http://www.w3.org/Consortium/Legal/copyright-documents">document
use</a> and <a href=
"http://www.w3.org/Consortium/Legal/copyright-software">software
licensing</a> rules apply.</p>

<hr>
</div>
|
|
<h2 class="notoc">Abstract</h2>

<p>The W3C Voice Browser working group aims to develop
specifications to enable access to the Web using spoken
interaction. This document is part of a set of requirements
studies for voice browsers, and provides details of the
requirements for grammars for speech recognition.</p>

<h2>Status of this document</h2>

<p>This document describes the requirements for grammars used for
speech recognition, as a precursor to starting work on
specifications. Related requirement drafts are linked from the <a
href="/TR/1999/WD-voice-intro-19991223">introduction</a>. The
requirements are being released as working drafts but are not
intended to become proposed recommendations.</p>

<p>This specification is a Working Draft of the Voice Browser working
group for review by W3C members and other interested parties. This is
the first public version of this document. It is a draft document and
may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use W3C Working Drafts as reference
material or to cite them as other than "work in progress".</p>

<p>Publication as a Working Draft does not imply endorsement by
the W3C membership, nor by members of the Voice Browser working
group.</p>

<p>This document has been produced as part of the <a href=
"http://www.w3.org/Voice/">W3C Voice Browser Activity</a>,
following the procedures set out for the <a href=
"http://www.w3.org/Consortium/Process/">W3C Process</a>. The
authors of this document are members of the <a href=
"http://www.w3.org/Voice/Group">Voice Browser Working Group</a>.
This document is for public review. Comments should be sent to
the public mailing list &lt;<a href=
"mailto:www-voice@w3.org">www-voice@w3.org</a>&gt; (<a href=
"http://www.w3.org/Archives/Public/www-voice/">archive</a>) by
14th January 2000.</p>

<p>A list of current W3C Recommendations and other technical
documents can be found at <a href="http://www.w3.org/TR">
http://www.w3.org/TR</a>.</p>
|
|
<h2>0. Introduction</h2>

<p>The main goal of this subgroup is to define a speech
recognition grammar specification language that will be generally
useful across a variety of speech platforms used in the context
of a dialog and synthesis markup environment. The process will
consist of four main steps:</p>

<ol>
<li>establish an appropriate set of requirements for grammar
specifications</li>

<li>evaluate existing grammar languages against these
requirements</li>

<li>settle upon a language specification, modifying it as
necessary</li>

<li>deliver a specific language proposal to the full W3C working
group.</li>
</ol>

<p>The scope of issues discussed includes semantics and contexts
as well as natural language syntax. Therefore the activities of
the Grammar Representation Subgroup are to be coordinated with
the activities of both the Natural Language Subgroup and the
Dialog Subgroup.</p>

<p>The following eight main topic areas have been identified as
important:</p>

<ol>
<li>Natural Language Syntax</li>

<li>Large Vocabulary/Dictation</li>

<li>Grammar Contexts</li>

<li>Semantics</li>

<li>Post-Processing Issues</li>

<li>Efficiency Issues</li>

<li>XML Compatibility</li>

<li>Grammar Specification Language Syntax</li>
</ol>

<p>Each topic area consists of several issues that will be
discussed in detail in the following sections. Example
specifications presented in this document are for illustration
purposes only and do not necessarily represent recommended
formats.</p>
|
|
<h3>0.1 Terminology</h3>

<table border="1" cellpadding="6" width="85%" summary="first
column gives the term, second its definition">
<tr>
<th>BNF</th>
<td>Backus-Naur Form.</td>
</tr>

<tr>
<th>Context</th>
<td>A <b>context</b> is a subset of the full domain. A context
can possess <b>state</b>.</td>
</tr>

<tr>
<th>CFG</th>
<td>Context-Free Grammar.</td>
</tr>

<tr>
<th>Domain</th>
<td>The scope of task semantics over which the associated <b>
language</b> and its attribute-values are meaningful.</td>
</tr>

<tr>
<th>Grammar</th>
<td>The representation of constraints defining the set of
allowable sentences in the <b>language</b>.</td>
</tr>

<tr>
<th>Language</th>
<td>The collection or set of sentences associated with a
particular <b>domain</b>. Language may refer to a natural or a
programming language.</td>
</tr>

<tr>
<th>N-Best</th>
<td>The top N hypotheses; here from speech recognition, though
they could also come from natural language processing.</td>
</tr>

<tr>
<th>N-Gram</th>
<td>Probabilistic grammar using conditional probabilities
P(w<sub>n</sub> | w<sub>n-1</sub> w<sub>n-2</sub> ...).</td>
</tr>

<tr>
<th>OOV</th>
<td>Out Of Vocabulary (words).</td>
</tr>

<tr>
<th>State</th>
<td>The current condition or value of variables and attributes of
a system.</td>
</tr>

<tr>
<th>URI</th>
<td>Uniform Resource Identifier.</td>
</tr>

<tr>
<th>URL</th>
<td>Uniform Resource Locator.</td>
</tr>

<tr>
<th>XML</th>
<td>Extensible Markup Language.</td>
</tr>
</table>
|
|
<h3>0.2 Symbology</h3>

<h4>Regular Expressions:</h4>

<table border="1" cellpadding="6" width="85%" summary="first
column gives the symbol, second its meaning">
<tr>
<th>?</th>
<td>Postfix operator; zero or one occurrence</td>
</tr>

<tr>
<th>*</th>
<td>Postfix operator; zero or more occurrences</td>
</tr>

<tr>
<th>+</th>
<td>Postfix operator; one or more occurrences</td>
</tr>

<tr>
<th>()</th>
<td>Scoping symbols</td>
</tr>

<tr>
<th>( ... | ... )</th>
<td>Disjunction; exclusive OR</td>
</tr>

<tr>
<th>A:B</th>
<td>Acceptor token; input A yields output B</td>
</tr>

<tr>
<th>.</th>
<td>&lt;FILLER&gt; or equivalent</td>
</tr>
</table>
|
|
<h4>Proposed Markup Tags (so far):</h4>

<table border="1" cellpadding="6" width="85%" summary="first
column gives element, second its description, and the third any
associated attributes">
<tr align="center">
<th><tt>Tag</tt></th>
<th>Definition</th>
<th>Attributes<br>
<i>(definitions in text)</i></th>
</tr>

<tr align="center">
<td><tt>&lt;ALPHABET= ... &gt;</tt></td>
<td>Phonetic alphabet definition.</td>
<td>Phonetic alphabets.</td>
</tr>

<tr align="center">
<td><tt>&lt;FILLER&gt;</tt></td>
<td>Generic tag for OOV word(s).</td>
<td>None.</td>
</tr>

<tr align="center">
<td><tt>&lt;GRAMMAR&gt; ... &lt;/GRAMMAR&gt;</tt></td>
<td>Grammar definition section.</td>
<td><tt>TYPE, ARY</tt></td>
</tr>

<tr align="center">
<td><tt>&lt;ITEM&gt; ... &lt;/ITEM&gt;</tt></td>
<td>XML grammar rule item.</td>
<td>None.</td>
</tr>

<tr align="center">
<td><tt>&lt;N-GRAM ... &gt; ... &lt;/N-GRAM&gt;</tt></td>
<td>N-gram token specifier.</td>
<td><tt>ARY, P, PBO</tt></td>
</tr>

<tr align="center">
<td><tt>&lt;RULE ... &gt; ... &lt;/RULE&gt;</tt></td>
<td>XML format grammar rule.</td>
<td><tt>NAME</tt></td>
</tr>
</table>
|
|
<h3>0.3 Priorities</h3>

<p>The following priorities are used to indicate the level of
importance of each requirement in this document.</p>

<table border cellspacing="1" cellpadding="6" width="85%"
summary="term in first column, explanation in second">
<tr>
<th>Must Specify</th>
<td>The specification must define the feature.</td>
</tr>

<tr>
<th>Should Specify</th>
<td>The specification should define the feature, if
possible.</td>
</tr>

<tr>
<th>Nice to Specify</th>
<td>The specification may optionally define the feature.</td>
</tr>

<tr>
<th>Future Revision</th>
<td>The feature needs additional study before specification.</td>
</tr>
</table>
|
|
<h3>0.4 Subgroup Coordination</h3>

<p>The requirements and specification of the Grammar
Representation Subgroup will be coordinated with the overlapping
requirements and specifications of the Natural Language, Dialog,
and Universal Access subgroups.</p>

<h2>1. Natural Language Syntax</h2>

<p>Specification of the natural language syntax in the grammar
representation shall conform to the following requirements:</p>
|
|
<h4>1.1 Context-Free Grammar Support (must specify)</h4>

<p>The grammar representation must support the definition of a
Context-Free Grammar (CFG) and, by subsumption, a Finite-State
Grammar (FSG). Some platforms will not support recursive rules, so
content developers will need to be aware of product-specific
limitations.</p>
|
|
<h4>1.2 CFG Specification (must specify)</h4>

<p>CFGs must be represented by specification in a well-known
format. The right-hand side of each CFG rule will be a regular
expression.</p>

<p class="extra">An example of a well-known CFG representation is
Backus-Naur Form (BNF), and an example of a well-known regular
expression syntax is {<i>expr</i>}?, {<i>expr</i>}*, and
{<i>expr</i>}+ for optional, zero or more, and one or more
occurrences, respectively, with { <i>expr</i> | <i>expr</i> } for
exclusive OR.</p>
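
<p class="extra">To illustrate how such rule bodies can be interpreted, the following Python sketch represents a rule as a small expression tree and enumerates the sentences it accepts. The constructor names <tt>seq</tt>, <tt>alt</tt> and <tt>opt</tt> are hypothetical and stand for concatenation, exclusive OR and the <tt>?</tt> operator. The <tt>*</tt> and <tt>+</tt> operators are deliberately left out: they generate unbounded sets of sentences, which a recognizer would compile into loops in a finite-state network rather than enumerate.</p>

<pre>
```python
# Hypothetical expression-tree constructors for the operators of section 0.2.
def seq(*parts):
    return ("seq", parts)  # concatenation


def alt(*choices):
    return ("alt", choices)  # ( ... | ... ), exclusive OR


def opt(part):
    return ("opt", part)  # postfix ?, zero or one occurrence


def expand(node):
    """Return the finite set of word tuples that a rule body accepts."""
    if isinstance(node, str):  # a terminal word
        return {(node,)}
    kind, body = node
    if kind == "alt":
        return set().union(*(expand(choice) for choice in body))
    if kind == "opt":
        return {()} | expand(body)
    if kind == "seq":
        results = {()}
        for part in body:
            # Concatenate every prefix so far with every expansion of this part.
            results = {left + right for left in results for right in expand(part)}
        return results
    raise ValueError(f"unknown node kind: {kind}")


# greeting = (hello | hi) there?
greeting = seq(alt("hello", "hi"), opt("there"))
print(sorted(" ".join(words) for words in expand(greeting)))
# ['hello', 'hello there', 'hi', 'hi there']
```
</pre>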
|
|
<h4>1.3 N-Gram Grammar Enabled (should specify)</h4>

<p>The grammar representation should enable the definition of an
N-Gram Grammar (NGG).</p>

<p class="extra">An example format is described in the
Appendix.</p>
|
|
<h4>1.4 Out-of-Vocabulary Words (must specify)</h4>

<p>The grammar representation must support the processing of
out-of-vocabulary (OOV) words and define a method for
representing such words. The content developer can specify the
action to be taken upon encountering OOV words.</p>

<h4>1.5 Disfluency and Noise Management (should specify)</h4>

<p>The grammar representation should support the handling of
disfluency and noise.</p>

<p class="extra">An example token syntax for a model that absorbs
speech disfluencies and noise is <tt>&lt;FILLER&gt;</tt>.</p>
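
<p class="extra">A minimal Python sketch of the OOV handling described in sections 1.4 and 1.5, assuming the recognizer output is available as a list of words: every run of words outside the grammar's vocabulary is collapsed into a single filler token. The function name <tt>absorb_oov</tt> and the token spelling <tt>FILLER</tt> (standing in for <tt>&lt;FILLER&gt;</tt>) are assumptions for illustration; a real platform would more likely absorb such segments inside the acoustic search.</p>

<pre>
```python
FILLER = "FILLER"  # hypothetical spelling of the FILLER markup token


def absorb_oov(words, vocabulary):
    """Replace each run of out-of-vocabulary words with one FILLER token."""
    out = []
    for word in words:
        if word in vocabulary:
            out.append(word)
        elif not out or out[-1] != FILLER:
            out.append(FILLER)  # start of a new OOV run
        # otherwise: still inside an OOV run, absorb the word silently
    return out


vocab = {"call", "home", "office"}
print(absorb_oov("uh call um er home".split(), vocab))
# ['FILLER', 'call', 'FILLER', 'home']
```
</pre>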
|
|
<h4>1.6 Notational Convenience (nice to specify, future
study)</h4>

<p>The grammar representation may allow for certain syntactic
conveniences. An example is the permutation of items in a list of
N items taken M=N at a time.</p>

<p class="extra">An example format for permutation is:</p>

<blockquote class="extra">Symbols: A, B, C<br>
<tt>'A || B'</tt> means <tt>'AB | BA'</tt><br>
<tt>'A || B || C'</tt> means <tt>'ABC | ACB | BAC | BCA | CAB |
CBA'</tt></blockquote>
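
<p class="extra">The convenience of such an operator is easiest to see by expanding it mechanically: N items yield N! alternatives (6 for three items, 24 for four). The Python sketch below performs the expansion shown in the example; the function name <tt>expand_permutation</tt> and its <tt>sep</tt> parameter are hypothetical. With the default <tt>sep=""</tt> it reproduces the concatenated notation used above; <tt>sep=" "</tt> would suit multi-word symbols.</p>

<pre>
```python
from itertools import permutations


def expand_permutation(expr, sep=""):
    """Rewrite 'A || B || C' as the disjunction of every ordering of its items."""
    items = [item.strip() for item in expr.split("||")]
    return " | ".join(sep.join(order) for order in permutations(items))


print(expand_permutation("A || B"))       # AB | BA
print(expand_permutation("A || B || C"))  # ABC | ACB | BAC | BCA | CAB | CBA
```
</pre>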
|
|
<h2>2. Large Vocabulary and Dictation</h2>

<p>Special consideration for large vocabularies shall include the
following:</p>

<h4>2.1 Large Vocabulary Definition (must specify; cf. section
1.3)</h4>

<p>The grammar representation must support the definition of large
vocabularies suitable for dictation applications. Associated
attributes of such grammars shall be made available to the speech
recognizer to improve its interpretation characteristics.
Specifications for large vocabularies will not preclude the
definition of small grammars.</p>

<h4>2.2 Efficiency (must specify; cf. section 6)</h4>

<p>The grammar representation must not preclude efficient
processing of speech grammars with large vocabularies.</p>
|
|
<h2>3. Grammar Contexts</h2>

<p>Multiple grammar contexts shall be supported with the
following requirements:</p>

<h4>3.1 External Grammars (must specify)</h4>

<p>The grammar representation must support the inclusion of
grammars defined outside of the current context. Access to
grammar contexts shall be provided by a suitable reference
mechanism.</p>

<h4>3.2 Run-Time Definition (must specify)</h4>

<p>Grammar constraint rules must be redefinable, in part or in
their entirety, while the system is operating. Several mechanisms
are possible, including, but not limited to, unconstrained
redefinition of inferior grammar rules, prior declaration of
volatile rules, and partitioning of the rule space into static and
dynamic arenas.</p>
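
<p class="extra">Two of the mechanisms listed in section 3.2, prior declaration of volatile rules and a static/dynamic partition of the rule space, can be combined as in the Python sketch below. The class <tt>RuleSpace</tt>, its methods and the <tt>$contact</tt> rule-reference notation are hypothetical; the sketch only shows how a platform might refuse run-time changes to static rules while allowing them for rules declared volatile in advance.</p>

<pre>
```python
class RuleSpace:
    """Hypothetical rule store partitioned into static and dynamic arenas."""

    def __init__(self, static_rules, volatile_names):
        self.static = dict(static_rules)
        # Rules declared volatile in advance may be redefined at run time.
        self.volatile_names = set(volatile_names)
        self.dynamic = {}

    def redefine(self, name, body):
        if name in self.static:
            raise PermissionError(f"rule {name!r} is static and cannot be redefined")
        if name not in self.volatile_names:
            raise KeyError(f"rule {name!r} was not declared volatile")
        self.dynamic[name] = body

    def lookup(self, name):
        # Dynamic definitions take precedence; None means the rule is undefined.
        return self.dynamic.get(name, self.static.get(name))


rules = RuleSpace({"command": "call $contact"}, volatile_names={"contact"})
rules.redefine("contact", "alice | bob")  # e.g. after loading an address book
print(rules.lookup("command"), "/", rules.lookup("contact"))
# call $contact / alice | bob
```
</pre>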
|
|
<h2>4. Semantics</h2>

<p>Semantic specifications are to be coordinated with the Natural
Language, Dialog and Universal Access subgroups. In many cases,
semantic definitions required by these other groups will be
implemented as part of the specification language of associated
grammars.</p>

<h4>4.1 Semantics Support (must specify)</h4>

<p>The grammar representation must support the specification of
semantics in association with the grammar syntax.</p>

<h4>4.2 Semantic Tagging (must specify)</h4>

<p>The grammar representation must support the tagging of syntax
for semantic interpretation. Semantic values shall be returned as
attribute-value pairs.</p>
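
<p class="extra">The following Python sketch shows the shape of the result required by section 4.2: phrases carry semantic tags, and interpretation returns the collected attribute-value pairs. The table <tt>TAGGED_PHRASES</tt>, the function <tt>interpret</tt> and the attribute names are hypothetical, and the sketch uses simple phrase spotting on word boundaries rather than a grammar parser; in a real grammar the tags would be attached to rules.</p>

<pre>
```python
# Hypothetical tagged grammar: each phrase carries the attribute-value pairs it sets.
TAGGED_PHRASES = {
    "to boston": {"destination": "BOS"},
    "to denver": {"destination": "DEN"},
    "tomorrow": {"date": "+1d"},
}


def interpret(utterance):
    """Return the attribute-value pairs of every tagged phrase in the utterance."""
    padded = f" {utterance} "  # pad so phrases only match on word boundaries
    result = {}
    for phrase, pairs in TAGGED_PHRASES.items():
        if f" {phrase} " in padded:
            result.update(pairs)
    return result


print(interpret("fly to boston tomorrow"))  # {'destination': 'BOS', 'date': '+1d'}
```
</pre>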
|
|
<h4>4.3 Attributes (must specify)</h4>

<p>The grammar representation must include attributes that can be
attached to data returned from the speech recognizer. Such
attributes can have multiple values, be used to indicate the
context for interpretation of recognizer output, and generally
pass semantic information to later processing stages. The
specification must be consistent with the natural language
interpreter input format.</p>

<h4>4.4 Attribute Processing (must specify)</h4>

<p>A grammar referencing another grammar having attributes must
be capable of performing a [currently undefined] set of
operations upon the referenced attributes.</p>

<p class="extra">Examples of such processing include Boolean
operations, string manipulation and attribute renaming.</p>
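
<p class="extra">Since the set of operations is still undefined, the Python sketch below only illustrates the three kinds of processing named above, applied by a referencing grammar to attributes returned from a referenced grammar: renaming into the caller's namespace, string manipulation, and a Boolean test. The function <tt>import_attributes</tt>, the <tt>depart_</tt> prefix and the date attributes are all hypothetical.</p>

<pre>
```python
def import_attributes(referenced, prefix="", rename=None):
    """Copy a referenced grammar's attributes into the referencing grammar's namespace."""
    rename = rename or {}
    return {prefix + rename.get(name, name): value for name, value in referenced.items()}


# Attributes as they might be returned by a hypothetical referenced date grammar.
date_attrs = {"month": "january", "day": "5"}

trip = import_attributes(date_attrs, prefix="depart_", rename={"day": "day_of_month"})
trip["depart_month"] = trip["depart_month"][:3].upper()  # string manipulation
# Boolean operation: are both date fields present?
trip["complete"] = all(key in trip for key in ("depart_month", "depart_day_of_month"))
print(trip)
# {'depart_month': 'JAN', 'depart_day_of_month': '5', 'complete': True}
```
</pre>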
|
|
<h2>5. Post-Processing Issues</h2>

<h4>5.1 Confidence Scoring (must specify)</h4>

<p>The grammar representation must provide information for the
post-processing of recognition confidence scores with regard to
error rejection. Such information can include the language model
perplexity (high perplexity typically lowers confidence scores,
and hence calls for a lower rejection threshold) or direct cues to
tighten or relax the normal rejection constraints, providing
content-based control of performance.</p>
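
<p class="extra">Per-word perplexity is 2<sup>H</sup>, where H is the average negative log<sub>2</sub> probability the language model assigns to each word. The Python sketch below computes it and feeds it to a threshold policy. The function <tt>rejection_threshold</tt> and all of its constants (<tt>reference_pp</tt>, <tt>slope</tt>, <tt>floor</tt>) are hypothetical: the sketch shows one way a platform might lower its rejection threshold for high-perplexity grammars, not a recommended setting.</p>

<pre>
```python
import math


def perplexity(word_probs):
    """Per-word perplexity 2**H, with H the mean negative log2 probability."""
    entropy = -sum(math.log2(p) for p in word_probs) / len(word_probs)
    return 2 ** entropy


def rejection_threshold(base, pp, reference_pp=10.0, slope=0.05, floor=0.1):
    """Hypothetical policy: lower the threshold as perplexity rises above a reference."""
    return max(floor, base - slope * math.log2(pp / reference_pp))


# A grammar offering 10 equally likely words at each position has perplexity 10.
pp_small = perplexity([0.1, 0.1, 0.1])
# A looser grammar offering 40 equally likely words has perplexity 40.
pp_large = perplexity([0.025] * 4)
print(round(pp_small, 6), round(pp_large, 6))  # 10.0 40.0
print(round(rejection_threshold(0.5, pp_large), 3))  # 0.4
```
</pre>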
|
|
<h4>5.2 N-Best Hypotheses (must specify)</h4>

<p>The grammar representation must support the post-processing of
N-best output of recognition hypotheses. This requirement will be
coordinated with the Dialog subgroup.</p>
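
<p class="extra">A simple form of N-best post-processing is to discard hypotheses the grammar does not accept and keep the best-scoring remainder. The Python sketch below assumes the recognizer delivers (text, score) pairs and that the grammar is available as a finite set of sentences; the function <tt>rescore_nbest</tt> and the example scores are hypothetical.</p>

<pre>
```python
def rescore_nbest(nbest, grammar_sentences, top=3):
    """Drop out-of-grammar hypotheses from an N-best list and keep the best few."""
    kept = [(text, score) for text, score in nbest if text in grammar_sentences]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top]


grammar = {"call home", "call office", "call mom"}
nbest = [("call home", 0.62), ("tall home", 0.58), ("call office", 0.71), ("call mom", 0.40)]
print(rescore_nbest(nbest, grammar, top=2))
# [('call office', 0.71), ('call home', 0.62)]
```
</pre>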
|
|
<h2>6. Efficiency Issues (cf. section 2.2)</h2>

<h4>6.1 Native Grammar Formats (must specify)</h4>

<p>The grammar representation must support the downloading of
native grammar formats for efficiency purposes. A binary
reference format can be defined for this purpose. Native formats
will be useful when content is specifically written for
particular platforms.</p>

<h4>6.2 Grammar Libraries (should specify)</h4>

<p>The grammar representation should support the use of grammar
libraries, alternatively called grammar objects, that can contain
prepackaged collections of sub-grammars to be included in
higher-level grammar constructs. Such libraries will be accessible
via the grammar naming conventions (cf. section 8.4), may contain
symbol tables for efficient reference resolution, and may be
imported and designated to remain resident on a platform.</p>
|
|
<h2>7. XML Compatibility</h2>

<h4>7.1 XML Embedding (must specify)</h4>

<p>The grammar representation must support easy embedding of
grammars into XML.</p>

<h4>7.2 Pure XML Format (must specify)</h4>

<p>A pure XML format for specification of grammars, including
CFGs, must be supported. XML grammar specifications must be
capable of expressive power equivalent to BNF specifications.</p>
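
<p class="extra">As a small illustration of the BNF/XML correspondence, the Python sketch below rewrites a single flat BNF rule of the form <tt>name ::= alt | alt</tt> into the proposed <tt>&lt;GRAMMAR&gt;</tt>, <tt>&lt;RULE&gt;</tt> and <tt>&lt;ITEM&gt;</tt> tags using the standard <tt>xml.etree.ElementTree</tt> module. Treating each <tt>&lt;ITEM&gt;</tt> as one alternative and the value <tt>TYPE="CFG"</tt> are assumptions, since this document does not yet define those details; nested operators are not handled.</p>

<pre>
```python
import xml.etree.ElementTree as ET


def bnf_rule_to_xml(bnf_line):
    """Map 'name ::= alt1 | alt2 ...' onto GRAMMAR/RULE/ITEM elements (hypothetical layout)."""
    name, body = (part.strip() for part in bnf_line.split("::=", 1))
    grammar = ET.Element("GRAMMAR", TYPE="CFG")
    rule = ET.SubElement(grammar, "RULE", NAME=name)
    for alternative in body.split("|"):
        ET.SubElement(rule, "ITEM").text = alternative.strip()  # one ITEM per alternative
    return ET.tostring(grammar, encoding="unicode")


print(bnf_rule_to_xml("greeting ::= hello | good morning"))
```
</pre>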
|
|
<h2>8. Grammar Specification Language Syntax</h2>

<p>This is a general requirements section into which all other
requirements will eventually migrate as the representation syntax
is defined to satisfy those requirements.</p>

<h4>8.1 Understandability (must specify)</h4>

<p>The grammar representation must be easy to understand, using
well-known methods for specifying the various elements.</p>

<p class="extra">Backus-Naur Form is an example of a well-known
representation for finite-state and context-free grammars.</p>

<p class="extra">A modified form of the well-known MIT format is
an example format for the representation of N-gram grammars (cf.
Appendix).</p>
|
|
<h4>8.2 Mixed-Mode Grammars (should specify)</h4>

<p>The grammar representation should support the simultaneous
mixing of finite-state, context-free, and N-gram grammars.</p>

<h4>8.3 Language Extensions (should specify)</h4>

<p>The grammar representation should be extensible in a
straightforward and consistent manner.</p>
|
|
<h4>8.4 Grammar Naming (must specify)</h4>

<p>The grammar representation must support the naming of
grammars. References to full grammars and to rules within grammars
shall be supported by a suitable multi-part naming mechanism.
Easy name resolution and overloading shall be supported. A
namespace mechanism to avoid naming conflicts shall be supported.
Such references shall include reference by Uniform Resource
Identifier (URI). Attribute fields shall be included in the
naming format.</p>
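
<p class="extra">One possible multi-part name is a grammar URI plus a rule name carried in the URI fragment. The Python sketch below resolves such a reference, relative or absolute, against the URI of the referring grammar using the standard <tt>urllib.parse.urljoin</tt> and <tt>urldefrag</tt> functions. The <tt>#rule</tt> convention, the function name and the <tt>example.com</tt> URIs are assumptions for illustration; namespaces and attribute fields are not covered.</p>

<pre>
```python
from urllib.parse import urldefrag, urljoin


def resolve_rule_reference(reference, base_uri):
    """Split a possibly relative reference into (absolute grammar URI, rule name or None)."""
    grammar_uri, rule_name = urldefrag(urljoin(base_uri, reference))
    return grammar_uri, rule_name or None


base = "http://example.com/grammars/travel.gram"
print(resolve_rule_reference("cities.gram#usa", base))
# ('http://example.com/grammars/cities.gram', 'usa')
print(resolve_rule_reference("#date", base))  # a rule in the same grammar
# ('http://example.com/grammars/travel.gram', 'date')
```
</pre>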
|
|
<h4>8.5 Native Natural Language (must specify)</h4>

<p>The grammar representation must support the specification of a
native language or locale. This specification can be embedded
within a grammar rule to change the native language in
mid-sentence.</p>

<p class="extra">An example syntax for the specification of
English: <tt><b>xml:lang="en"</b></tt>.</p>
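
<p class="extra">With <tt>xml:lang</tt>, a language switch inside a rule is inherited by nested elements until another <tt>xml:lang</tt> overrides it. The Python sketch below uses the standard <tt>xml.etree.ElementTree</tt> module, which stores <tt>xml:lang</tt> under the key <tt>{http://www.w3.org/XML/1998/namespace}lang</tt>, to report the language in effect for each <tt>&lt;ITEM&gt;</tt>. The function name and the platform default of <tt>"en"</tt> are assumptions.</p>

<pre>
```python
import xml.etree.ElementTree as ET

# ElementTree stores xml:lang under the XML namespace in {uri}local form.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def item_languages(element, inherited="en"):
    """List (language, text) for each ITEM, inheriting xml:lang from enclosing elements."""
    lang = element.get(XML_LANG, inherited)
    found = []
    if element.tag == "ITEM" and element.text:
        found.append((lang, element.text.strip()))
    for child in element:
        found.extend(item_languages(child, lang))
    return found


# Build a rule whose second item switches to French mid-rule.
rule = ET.Element("RULE", NAME="greeting")
ET.SubElement(rule, "ITEM").text = "good morning"
ET.SubElement(rule, "ITEM", {XML_LANG: "fr"}).text = "bonjour"
print(item_languages(rule))  # [('en', 'good morning'), ('fr', 'bonjour')]
```
</pre>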
|
|
<h4>8.6 Rule Weighting (must specify)</h4>

<p>The grammar representation must support the weighting of
grammar rules in the CFG format. Weighting is implicit in the
N-Gram format.</p>
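
<p class="extra">No weighting scheme is fixed by this requirement. One common interpretation, assumed here, is that raw weights on a rule's alternatives are normalized into probabilities. The Python sketch below shows that normalization; the function name and the example weights are hypothetical.</p>

<pre>
```python
def normalise_weights(alternatives):
    """Convert raw weights on a rule's alternatives into probabilities summing to 1."""
    weights = list(alternatives.values())
    if not all(w >= 0 for w in weights) or sum(weights) == 0:
        raise ValueError("weights must be non-negative and not all zero")
    total = sum(weights)
    return {alternative: w / total for alternative, w in alternatives.items()}


# answer = yes (weight 3) | yeah (weight 1)
print(normalise_weights({"yes": 3, "yeah": 1}))  # {'yes': 0.75, 'yeah': 0.25}
```
</pre>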
|
|
<h4>8.7 Phonetic Pronunciation (must specify)</h4>

<p>The grammar representation must support the inclusion of
phonetic pronunciation rules. This information may override
default rules defined by the speech processing platform. A
thorough definition of this subject falls within the charter of
another subgroup.</p>

<p class="extra">An example format is:</p>

<blockquote class="extra">a tag identifying the phonetic alphabet
in use, e.g.<br>
<tt>&lt;alphabet=[arpabet|sampa|vendorspecific]&gt;</tt>.</blockquote>
|
|
<h4>8.8 Grammar File Inclusion (must specify)</h4>

<p>The grammar representation must support the inclusion of other
grammar files referenced by name via a Uniform Resource
Identifier (URI). This inclusion method is distinguished from
grammar reference by symbol.</p>
|
|
<h4>8.9 Comments (must specify)</h4>

<p>The grammar representation must include a commenting
mechanism. This mechanism can be provided by HTML or XML
commenting formats.</p>
|
|
<h4>8.10 Character Encodings (must specify)</h4>

<p>The grammar representation must support the specification of
character encodings for foreign-language support.</p>

<p class="extra">Example encodings include Unicode and JIS. XML
character encoding declarations can be used for XML grammar
specifications.</p>
|
|
<h4>8.11 Recognizer Timeout Periods (should specify)</h4>

<p>The grammar representation should support the specification of
time limits inherently related to grammar characteristics. Such
characteristics can include the expected (typically maximum) time
required to speak a sentence from the grammar at a normal rate.
Such time limits can directly reflect the maximum sentence length
in the grammar and may include, but are not limited to: maximum
initial silence waiting time, minimum spoken utterance time,
maximum spoken utterance time, and maximum intra-sentence silence
time (for ASR endpointing).</p>
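
<p class="extra">The link between grammar structure and time limits can be sketched as follows: the shortest and longest sentences bound the minimum and maximum utterance times. In the Python sketch below, the speaking rate (<tt>seconds_per_word</tt>), the <tt>slack</tt> factor and the two silence limits are hypothetical defaults, not recommended values, and the grammar is assumed to be available as a finite list of sentences.</p>

<pre>
```python
def timeout_limits(sentences, seconds_per_word=0.4, slack=1.5,
                   initial_silence=5.0, max_pause=0.8):
    """Derive recognizer time limits (seconds) from sentence lengths in a grammar.

    All rate, slack and silence constants are hypothetical defaults.
    """
    lengths = [len(sentence.split()) for sentence in sentences]
    return {
        "max_initial_silence": initial_silence,
        "min_utterance": round(min(lengths) * seconds_per_word, 2),
        "max_utterance": round(max(lengths) * seconds_per_word * slack, 2),
        "max_intra_sentence_silence": max_pause,
    }


print(timeout_limits(["yes", "no", "call my office"]))
# {'max_initial_silence': 5.0, 'min_utterance': 0.4, 'max_utterance': 1.8, 'max_intra_sentence_silence': 0.8}
```
</pre>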
|
|
<h2>Appendix - Additional Examples</h2>

<h4>N-Gram Grammar Format</h4>

<p>An example format, derived from the MIT N-gram format, is
described below.</p>

<p>Here is a brief description of the MIT bigram file format:</p>

<blockquote><tt>
...comments...<br>
ngram 1=A<br>
ngram 2=B<br>
...comments...<br>
P(w<sub>1</sub>) w<sub>1</sub> P<sub>bo</sub>(w<sub>1</sub>)<br>
&nbsp;&nbsp;&nbsp;&nbsp;...<br>
...comments...<br>
P(w<sub>2</sub>|w<sub>1</sub>) w<sub>1</sub> w<sub>2</sub><br>
&nbsp;&nbsp;&nbsp;&nbsp;...
</tt></blockquote>

<p>where A is the number of unigrams, B is the number of bigrams,
P(w<sub>1</sub>) is the unigram probability of word (symbol)
w<sub>1</sub>, P<sub>bo</sub>(w<sub>1</sub>) is the corresponding
back-off probability, and P(w<sub>2</sub>|w<sub>1</sub>) is the
probability of w<sub>2</sub> conditioned on the prior word
w<sub>1</sub>. The start and end of a sentence are indicated by
the '#' symbol: P(w<sub>i</sub>|#) is the probability that a
sentence begins with w<sub>i</sub>, and P(#|w<sub>j</sub>) is the
probability that the sentence ends after w<sub>j</sub>.</p>
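
<p class="extra">A minimal Python reader for the bigram layout above. Because the format leaves comment syntax open, the sketch assumes that any line whose first field is not a number is a comment, and that the first A numeric lines are unigrams and the next B are bigrams. For unlisted bigrams it applies a simple back-off, P(w<sub>2</sub>|w<sub>1</sub>) = P<sub>bo</sub>(w<sub>1</sub>) &times; P(w<sub>2</sub>), which is an assumption about how P<sub>bo</sub> is used. All function names and the toy model are hypothetical.</p>

<pre>
```python
def is_number(token):
    try:
        float(token)
        return True
    except ValueError:
        return False


def parse_bigram_model(text):
    """Parse the bigram layout into unigram and bigram tables.

    Assumption: lines whose first field is not a number are comments; the first
    A numeric lines are unigrams and the next B numeric lines are bigrams.
    """
    counts, rows = {}, []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "ngram":
            order, count = fields[1].split("=")
            counts[int(order)] = int(count)
        elif len(fields) == 3 and is_number(fields[0]):
            rows.append(fields)
    n1, n2 = counts[1], counts[2]
    unigrams = {w: (float(p), float(bo)) for p, w, bo in rows[:n1]}
    bigrams = {(w1, w2): float(p) for p, w1, w2 in rows[n1:n1 + n2]}
    return unigrams, bigrams


def bigram_prob(w1, w2, unigrams, bigrams):
    """P(w2|w1): the listed bigram if present, else Pbo(w1) * P(w2) (simple back-off)."""
    if (w1, w2) in bigrams:
        return bigrams[(w1, w2)]
    return unigrams[w1][1] * unigrams[w2][0]


MODEL = """
toy model for illustration only
ngram 1=3
ngram 2=2
0.4 # 0.5
0.3 call 0.4
0.3 home 0.6
0.9 # call
0.7 call home
"""

uni, bi = parse_bigram_model(MODEL)
print(bigram_prob("#", "call", uni, bi))  # listed: P(call | #) = 0.9
print(bigram_prob("home", "#", uni, bi))  # backed off: Pbo(home) * P(#) = 0.6 * 0.4
```
</pre>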
|
|
<p>To adapt this to arbitrary N-grams we need either to indicate
N or to define a section end marker. We can also provide data type
markers such as &lt;GRAMMAR&gt; and &lt;/GRAMMAR&gt;. One option
is a start marker &lt;GRAMMAR TYPE="N-GRAM" ARY=[number]&gt; to
indicate N, the arity of the grams. Comments can follow HTML style.
An alternative to specifying 1=A, 2=B, etc. is to use markers to
identify data. Examples:</p>

<p>(Feedback: N-gram span is implicit within the format.)</p>

<pre>
&lt;N-GRAM ARY="1" P="0.01" PBO="0.001"&gt;
"word"
&lt;/N-GRAM&gt;
&lt;N-GRAM ARY="2" P="0.01"&gt;
"word list"
&lt;/N-GRAM&gt;
</pre>

<p>(Feedback: This may be too verbose.)</p>

<p>If so, we could intermix the types, but this would make it
more difficult for system designers to allocate resources
automatically. Another issue is whether to allow the symbols to
be word phrases; if so, we could require quotation marks around
symbol strings (as shown).</p>
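
<p class="extra">To make the proposed markup concrete, the Python sketch below writes unigram and bigram tables as <tt>&lt;N-GRAM&gt;</tt> elements with the <tt>ARY</tt>, <tt>P</tt> and <tt>PBO</tt> attributes and quoted symbol strings, then reads them back, using the standard <tt>xml.etree.ElementTree</tt> module. The enclosing <tt>&lt;GRAMMAR TYPE="N-GRAM" ARY="2"&gt;</tt> element follows the start marker suggested above; the function names are hypothetical, and the reader assumes each quoted string holds whitespace-separated single-word symbols.</p>

<pre>
```python
import xml.etree.ElementTree as ET


def to_ngram_markup(unigrams, bigrams):
    """Write unigram/bigram tables as N-GRAM elements under a GRAMMAR root."""
    root = ET.Element("GRAMMAR", TYPE="N-GRAM", ARY="2")
    for word, (p, pbo) in unigrams.items():
        gram = ET.SubElement(root, "N-GRAM", ARY="1", P=str(p), PBO=str(pbo))
        gram.text = f'"{word}"'  # symbols are quoted so they may be phrases
    for (w1, w2), p in bigrams.items():
        gram = ET.SubElement(root, "N-GRAM", ARY="2", P=str(p))
        gram.text = f'"{w1} {w2}"'
    return root


def read_ngram_markup(root):
    """Read N-GRAM elements back as (order, words, P, PBO or None) tuples."""
    entries = []
    for gram in root.iter("N-GRAM"):
        words = tuple(gram.text.strip().strip('"').split())
        pbo = gram.get("PBO")
        entries.append((int(gram.get("ARY")), words, float(gram.get("P")),
                        float(pbo) if pbo is not None else None))
    return entries


root = to_ngram_markup({"word": (0.01, 0.001)}, {("word", "list"): 0.01})
print(ET.tostring(root, encoding="unicode"))
print(read_ngram_markup(root))
# [(1, ('word',), 0.01, 0.001), (2, ('word', 'list'), 0.01, None)]
```
</pre>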
|
|
<h2>Acknowledgments</h2>
<h3>Subgroup Members</h3>

<blockquote>Michael Brown (Bell Labs)<br>
Deborah Dahl (Unisys)<br>
Charles Hemphill (Conversa)<br>
Andrew Hunt (Sun Labs)<br>
Robert Keiller (Canon)<br>
Tetsuo Kosaka (Canon)<br>
James Larson (Intel)<br>
William Ledingham (SpeechWorks)<br>
Bruce Lucas (IBM)<br>
Jens Marschner (Philips)<br>
Scott McGlashan (PipeBeach)<br>
Michael Phillips (SpeechWorks)<br>
Stephen Potter (Entropic)<br>
David Raggett (W3C/HP)<br>
Ramesh Sarukkai (L&amp;H)<br>
Frank Scahill (BT Labs)<br>
Volker Steinbiss (Philips)<br>
George White (General Magic)</blockquote>

</body>
</html>
|