<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type" content=
"text/html; charset=iso-8859-1">
<title>Natural Language Processing Requirements for Voice Markup
Languages</title>
<style type="text/css">
body {
font-family: sans-serif;
margin-left: 10%;
margin-right: 5%;
color: black;
background-color: white;
background-attachment: fixed;
background-image: url(http://www.w3.org/StyleSheets/TR/WD.gif);
background-position: top left;
background-repeat: no-repeat;
font-family: Tahoma, Verdana, "Myriad Web", Syntax, sans-serif;
}
.unfinished { font-style: normal; background-color: #FFFF33}
.dtd-code { font-family: monospace;
background-color: #dfdfdf; white-space: pre;
border: #000000; border-style: solid;
border-top-width: 1px; border-right-width: 1px;
border-bottom-width: 1px; border-left-width: 1px; }
p.copyright {font-size: smaller}
h2,h3 {margin-top: 1em;}
.extra { font-style: italic; color: #338033 }
code {
color: green;
font-family: monospace;
font-weight: bold;
}
.example {
border: solid green;
border-width: 2px;
color: green;
font-weight: bold;
margin-right: 5%;
margin-left: 0;
}
.bad {
border: solid red;
border-width: 2px;
margin-left: 0;
margin-right: 5%;
color: rgb(192, 101, 101);
}
div.navbar { text-align: center; }
div.contents {
background-color: rgb(204,204,255);
padding: 0.5em;
border: none;
margin-right: 5%;
}
table {
margin-left: -4%;
margin-right: 4%;
font-family: sans-serif;
background: white;
border-width: 2px;
border-color: white;
}
th { font-family: sans-serif; background: rgb(204, 204, 153) }
td { font-family: sans-serif; background: rgb(255, 255, 153) }
.tocline { list-style: none; }
</style>
<link rel="stylesheet" type="text/css" href=
"http://www.w3.org/StyleSheets/TR/W3C-WD.css">
</head>
<body>
<div class="head">
<p><a href="http://www.w3.org/"><img class="head" src=
"http://www.w3.org/Icons/WWW/w3c_home.gif" alt="W3C"></a></p>
<h1 class="head">Natural Language Processing Requirements<br>
for Voice Markup Languages</h1>
<h3 class="notoc">W3C Working Draft <i>23 December 1999</i></h3>
<dl>
<dt>This version:</dt>
<dd><a href="http://www.w3.org/TR/1999/WD-voice-nlu-reqs-19991223">
http://www.w3.org/TR/1999/WD-voice-nlu-reqs-19991223</a></dd>
<dt>Latest version:</dt>
<dd><a href=
"http://www.w3.org/TR/voice-nlu-reqs">
http://www.w3.org/TR/voice-nlu-reqs</a></dd>
<dt>Editor:</dt>
<dd>Deborah Dahl</dd>
</dl>
<p class="copyright"><a href=
"http://www.w3.org/Consortium/Legal/ipr-notice#Copyright">
Copyright</a> &#169; 1999 <a href="http://www.w3.org/">
W3C</a><sup>&#174;</sup> (<a href=
"http://www.lcs.mit.edu/">MIT</a>, <a href=
"http://www.inria.fr/">INRIA</a>, <a href=
"http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. <abbr
title="World Wide Web Consortium">W3C</abbr> <a href=
"http://www.w3.org/Consortium/Legal/ipr-notice#Legal_Disclaimer">
liability</a>, <a href=
"http://www.w3.org/Consortium/Legal/ipr-notice#W3C_Trademarks">
trademark</a>, <a href=
"http://www.w3.org/Consortium/Legal/copyright-documents">document
use</a> and <a href=
"http://www.w3.org/Consortium/Legal/copyright-software">software
licensing</a> rules apply.</p>
<hr>
</div>
<h2 class="notoc">Abstract</h2>
<p>The W3C Voice Browser working group aims to develop
specifications to enable access to the Web using spoken
interaction. This document is part of a set of requirements
studies for voice browsers, and provides details of the
requirements for natural language processing.</p>
<h2>Status of this document</h2>
<p>This document describes the requirements for natural language
processing for voice browsers, as a precursor to starting work on
specifications. Related requirement drafts are linked from the <a
href="/TR/1999/WD-voice-intro-19991223">introduction</a>. The
requirements are being released as working drafts but are not
intended to become proposed recommendations.</p>
<p>This specification is a Working Draft of the Voice Browser working
group for review by W3C members and other interested parties. This is
the first public version of this document. It is a draft document and
may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use W3C Working Drafts as reference
material or to cite them as other than "work in progress".</p>
<p>Publication as a Working Draft does not imply endorsement by
the W3C membership or by members of the Voice Browser working
group.</p>
<p>This document has been produced as part of the <a href=
"http://www.w3.org/Voice/">W3C Voice Browser Activity</a>,
following the procedures set out for the <a href=
"http://www.w3.org/Consortium/Process/">W3C Process</a>. The
authors of this document are members of the <a href=
"http://www.w3.org/Voice/Group">Voice Browser Working Group</a>.
This document is for public review. Comments should be sent to
the public mailing list &lt;<a href=
"mailto:www-voice@w3.org">www-voice@w3.org</a>&gt; (<a href=
"http://www.w3.org/Archives/Public/www-voice/">archive</a>) by
14th January 2000.</p>
<p>A list of current W3C Recommendations and other technical
documents can be found at <a href="http://www.w3.org/TR">
http://www.w3.org/TR</a>.</p>
<h1>Introduction</h1>
<p>The main goal of this subgroup is to establish a prioritized
list of requirements for natural language processing in a
voice browser environment.</p>
<p>The process will consist of the following steps:</p>
<ol type="1">
<li>Collect requirements on natural language processing.</li>
<li>Prioritize these requirements.</li>
<li>Distribute requirements to, and take feedback from, relevant
groups working on natural language in spoken dialog systems.</li>
<li>Define specifications for natural language processing
components, based on the feedback received.</li>
</ol>
<h2>0.1 Scope</h2>
<p>This document specifies requirements that define the
capabilities of any component of a voice browser system which
performs natural language interpretation, that is, the task of
determining and representing the content of a natural language
input from a user. Interpretation components include both
stand-alone natural language understanding (NLU) components, which
receive text string results from a speech recognizer or keyboard,
and speech recognizers that incorporate natural language
understanding functionality by returning interpretations
rather than, or in addition to, text strings.</p>
<h2>0.2 Interaction with Other Groups</h2>
<p>The activities of the Natural Language Requirements
Subgroup will be coordinated with the activities of the Grammar
Representation Subgroup, the Synthesis Markup Subgroup, and the
Dialog Subgroup.</p>
<h1>General Requirements</h1>
<p>The NLU system should be able to:</p>
<ol type="1">
<li>Return a message stating that it cannot interpret an input at
all. (must specify)</li>
<li>Return multiple pieces of information from multi-functional
utterances (must specify)</li>
<li>Return partial information if it is unable to completely
process an input. (must specify)</li>
<li>If partial information is returned, indicate how much of the
input was left unanalyzed. (nice to specify)</li>
<li>Be extensible in the sense that it should be possible to add
new types of utterances to the NLU specification. Specifically,
the system should be able to incorporate modular subdialogs.
(must specify)</li>
<li>Return a score reflecting its confidence in the overall
interpretation. The exact format of confidence scores remains to
be determined. It could be a rough scale or it could be
probabilities, for example. (should specify)</li>
<li>Return a score for each attribute, reflecting its confidence
in the interpretation of that attribute. (should specify)</li>
<li>Return multiple analyses (n-best) (nice to specify)</li>
</ol>
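<p>The return values listed above can be pictured as a single result
structure. The following is a minimal sketch only, not a proposed
format: the names <code>Interpretation</code>, <code>NluResult</code>
and <code>rank</code> are hypothetical, and the 0 to 1 confidence
scale is an assumption, since the format of confidence scores remains
to be determined.</p>

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Interpretation:
    """One candidate analysis of a user utterance (hypothetical structure)."""
    slots: dict = field(default_factory=dict)            # attribute name to value
    slot_confidence: dict = field(default_factory=dict)  # attribute name to score
    confidence: Optional[float] = None                   # overall score, if any
    unanalyzed_fraction: float = 0.0                     # share of input left unparsed


@dataclass
class NluResult:
    """Everything the NLU component returns for one input."""
    analyses: list = field(default_factory=list)  # n-best list, best first

    @property
    def understood(self):
        # An empty n-best list stands for "cannot interpret the input at all".
        return len(self.analyses) != 0

    def best(self):
        return self.analyses[0] if self.analyses else None


def rank(analyses):
    """Order analyses by overall confidence, highest first (unscored last)."""
    return sorted(
        analyses,
        key=lambda a: a.confidence if a.confidence is not None else float("-inf"),
        reverse=True,
    )


partial = Interpretation(slots={"Action": "transfer"},
                         confidence=0.4, unanalyzed_fraction=0.5)
full = Interpretation(slots={"Action": "transfer", "Amount": 200},
                      slot_confidence={"Amount": 0.9}, confidence=0.8)
result = NluResult(analyses=rank([partial, full]))
```

<p>In this sketch a partial analysis is an ordinary
<code>Interpretation</code> whose <code>unanalyzed_fraction</code> is
greater than zero, so partial and complete results share one shape.</p>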
<h1>Input Requirements</h1>
<p>A standalone (i.e., not integrated with a speech recognizer)
NLU system should be able to:</p>
<ol type="1">
<li>Accept N-best ASR output with or without acoustic scores for
either the whole utterance, each word in an utterance or both.
(must specify)</li>
<li>Accept ASR output with out-of-vocabulary markers. (should
specify)</li>
<li>Accept an ASCII text representation of an utterance. (should
specify)</li>
<li>Accept a word lattice as input. (must specify)</li>
<li>Accept prosodic notations on the ASR output. (nice to
specify)</li>
</ol>
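<p>As an illustration of the first three input requirements, the
sketch below models N-best ASR output with optional acoustic scores
and an out-of-vocabulary marker, and wraps plain text in the same
shape. All names, including the <code>[oov]</code> marker string, are
assumptions made for this example. Word lattices and prosodic
notations are not covered.</p>

```python
from dataclasses import dataclass
from typing import Optional

OOV = "[oov]"  # hypothetical out-of-vocabulary marker


@dataclass
class Word:
    text: str
    score: Optional[float] = None  # per-word acoustic score, if supplied


@dataclass
class Hypothesis:
    words: list
    score: Optional[float] = None  # whole-utterance acoustic score, if supplied

    def text(self):
        return " ".join(w.text for w in self.words)

    def oov_count(self):
        return sum(1 for w in self.words if w.text == OOV)


def from_plain_text(utterance):
    """Wrap typed text as a one-element N-best list with no scores."""
    return [Hypothesis(words=[Word(t) for t in utterance.split()])]


nbest = [
    Hypothesis([Word("transfer", 0.9), Word(OOV, 0.2), Word("dollars", 0.8)],
               score=0.6),
    Hypothesis([Word("transfer"), Word("two"), Word("dollars")]),
]
typed = from_plain_text("I want five lines")
```

<p>Treating keyboard input as an unscored one-element N-best list lets
the rest of the NLU component ignore where the text came from.</p>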
<p>Any NLU system should be able to:</p>
<ol type="1">
<li>Dynamically switch to a different task model (must
specify)</li>
<li>Dynamically modify the task model; e.g. add or remove
possible choices from a slot (must specify)</li>
<li>Accept sequential multi-modal input (must specify)</li>
<li>Accept uncoordinated simultaneous multi-modal input (should
specify)</li>
<li>Accept coordinated simultaneous multi-modal input. For
example, the NLU system should be able to represent or
interpret a representation of the context so that
anaphoric expressions in the user's utterances which refer to
items in the context can be interpreted. The context can include
the speech context, including the system's utterances, as well as
the external context (e.g. <em>I'll take one of these</em>
(click)). (nice to specify)</li>
<li>Return ASR timing information to the dialog controller (nice
to specify).</li>
<li>Accept multi-modal time-stamped information (should
specify)</li>
</ol>
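<p>One way to picture coordinated, time-stamped multi-modal input is
to pair each deictic word with the pointing event closest to it in
time, as in <em>I'll take one of these</em> (click). The sketch below
is an assumption-laden illustration: the class names, the list of
deictic words and the 1.5 second threshold are all hypothetical.</p>

```python
from dataclasses import dataclass


@dataclass
class TimedWord:
    text: str
    start: float  # seconds from the start of the turn
    end: float


@dataclass
class Click:
    target: str   # identifier of the item that was clicked
    time: float


DEICTIC = {"this", "that", "these", "those"}


def resolve_deictics(words, clicks, max_gap=1.5):
    """Pair each deictic word with the click closest to it in time.

    Returns a list of (word text, click target or None). A click further
    than max_gap seconds from the word is ignored (assumed threshold).
    """
    resolved = []
    for word in words:
        if word.text.lower() not in DEICTIC:
            continue
        midpoint = (word.start + word.end) / 2
        nearest = min(clicks, key=lambda c: abs(c.time - midpoint), default=None)
        if nearest is not None and abs(nearest.time - midpoint) > max_gap:
            nearest = None
        resolved.append((word.text, nearest.target if nearest else None))
    return resolved


words = [
    TimedWord("I'll", 0.0, 0.2), TimedWord("take", 0.2, 0.5),
    TimedWord("one", 0.5, 0.7), TimedWord("of", 0.7, 0.8),
    TimedWord("these", 0.8, 1.1),
]
pairs = resolve_deictics(words, [Click("item-42", 1.3)])
```

<p>An unresolved deictic is returned with <code>None</code> rather than
dropped, so that the dialog controller can ask the user what was
meant.</p>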
<h1>Task-specific information</h1>
<p>These requirements are intended to ensure that the natural
language component is capable of representing the results of
processing task-specific utterances.</p>
<p>An NLU system should be able to:</p>
<p>Represent task information:</p>
<ol type="1">
<li>Represent values for slots in a task model: <em>I want five
lines.</em> (must specify)</li>
<li>Support hierarchical attributes in the task model: e.g. a
slot can itself be a frame. (must specify)</li>
<li>Represent interpretations of sentences with anaphora and
ellipsis. <em>I want two hamburgers, one with ketchup and one
without.</em> (must specify)</li>
<li>Represent deictic utterances, which require reference to the
non-linguistic context for their interpretation. <em>I want this.</em>
(must specify)</li>
</ol>
<p>Represent meta-task information (all nice to specify)</p>
<ol type="1">
<li>Represent a request for a definition: <em>What does 'access
code' mean?</em></li>
<li>Represent a request for the status of a filled slot: <em>How
many lines did I ask for?</em></li>
<li>Represent information about a slot: <em>How many lines am I
allowed to ask for?</em></li>
<li>Represent a request for the possible fillers of a slot: <em>
What are my choices? Can I schedule a call on Sunday?</em></li>
<li>Represent a request for the status of all slots. <em>What
have I ordered so far?</em></li>
<li>Represent questions about possible, desirable, necessary and
conditional situations. <em>Can you pay my electric bill? Should
I order the chicken? Do I have to get a drink with the special?
If I stay over Saturday night will I get a lower fare?</em></li>
<li>Represent requests for explanation of a system response. <em>
Why?</em></li>
<li>Represent request for the amount of task remaining.<em> What
else do you need to know? Am I almost finished?</em></li>
</ol>
<h1>Generic Information about the Communication Process</h1>
<p>An NLU system should be able to represent meta-dialog
information having to do with the communication process. (all nice
to specify except as noted)</p>
<ol type="1">
<li>Represent utterances about the dialog: <em>I want to revisit
my previous answer. That's what I just said.</em></li>
<li>Represent a request for help. (must specify)</li>
<li>Represent a request to have the last prompt repeated</li>
<li>Represent requests to make the output louder, quieter,
faster, slower, change languages.</li>
<li>Represent requests to pause the dialog.</li>
<li>Represent utterances indicating that the user:
<ul>
<li>Didn't hear the prompt (must have)</li>
<li>Didn't understand the prompt (must have)</li>
<li>Refuses to supply the answer to a prompt.</li>
<li>Doesn't know the answer to the prompt</li>
</ul>
</li>
</ol>
<h1>Dialog Control</h1>
<p>The NLU system should be able to represent: (all must specify
except as noted)</p>
<ol type="1">
<li>A confirmation</li>
<li>A request to change the value of a slot due to a speech
recognition error, a user error, or the user changing his/her
mind. <em>I meant p.m.</em></li>
<li>A request to empty a slot that's already been filled. <em>
(forget X)</em></li>
<li>A request to stop or end the dialog</li>
<li>A request to start over.</li>
<li>A request to transfer to an operator.</li>
<li>A request to suspend and resume a dialog</li>
<li>An explicit request to switch to another task.</li>
<li>An implicit request to switch to another task. (nice to
specify).</li>
</ol>
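<p>Several of these dialog-control items can be read as operations on
the set of filled slots. The following sketch shows one possible
reading, using a flat frame for brevity. The names <code>Act</code> and
<code>apply_act</code>, the status strings and the <code>Time</code>
slot are hypothetical, and only a subset of the items above is
covered.</p>

```python
from enum import Enum


class Act(Enum):
    CONFIRM = "confirm"
    CORRECT = "correct"        # change the value of a slot
    CLEAR = "clear"            # empty a slot that was already filled
    START_OVER = "start_over"
    STOP = "stop"


def apply_act(frame, act, slot=None, value=None):
    """Return (new_frame, status) after applying one dialog-control act.

    frame is a flat dict of slot name to value; it is never modified in place.
    """
    updated = dict(frame)
    if act is Act.CONFIRM:
        return updated, "confirmed"
    if act is Act.CORRECT:
        updated[slot] = value
        return updated, "open"
    if act is Act.CLEAR:
        updated.pop(slot, None)
        return updated, "open"
    if act is Act.START_OVER:
        return {}, "open"
    if act is Act.STOP:
        return updated, "ended"
    raise ValueError("unsupported act: " + str(act))


frame = {"Source_account": "savings", "Time": "9 a.m."}
# "I meant p.m." read as a correction of the Time slot.
corrected, status = apply_act(frame, Act.CORRECT, "Time", "9 p.m.")
```

<p>Returning a new frame instead of editing the old one keeps the
previous state available, which is convenient if the user later asks
to undo a change.</p>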
<h1>Appendix: Sample Dialog with examples of a possible approach
to NLU representation.</h1>
<p>This is an example of a banking application with the user's
utterances annotated with an example of a possible NLU
representation, based on the following task model.</p>
<h3>Task Model/Frame:</h3>
<pre>
Identification:
    Name:
    Address:
        Street:
        City:
        Zip Code:
    Phone:
Action:
    Transfer:
        Source_account:
        Destination_account:
        Amount:
            Value:
            Currency:
    Balance:
        Account:
</pre>
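<p>The task model above can be held as a nested mapping in which a
slot is either a leaf or a frame of further slots. This is a sketch
under assumptions: the nesting is inferred from the NLU outputs in the
sample dialog (for example, <code>Street</code> and <code>City</code>
under <code>Address</code>), and the helper names
<code>slot_paths</code> and <code>set_slot</code> are hypothetical.</p>

```python
TASK_MODEL = {
    "Identification": {
        "Name": None,
        "Address": {"Street": None, "City": None, "Zip Code": None},
        "Phone": None,
    },
    "Action": {
        "Transfer": {
            "Source_account": None,
            "Destination_account": None,
            "Amount": {"Value": None, "Currency": None},
        },
        "Balance": {"Account": None},
    },
}


def slot_paths(model, prefix=()):
    """Yield the path to every leaf slot in a nested task model."""
    for name, child in model.items():
        path = prefix + (name,)
        if isinstance(child, dict):
            yield from slot_paths(child, path)
        else:
            yield path


def set_slot(frame, path, value):
    """Fill one leaf slot in a frame, creating intermediate frames as needed."""
    if tuple(path) not in set(slot_paths(TASK_MODEL)):
        raise KeyError("unknown slot: " + ".".join(path))
    node = frame
    for name in path[:-1]:
        node = node.setdefault(name, {})
    node[path[-1]] = value
    return frame


frame = set_slot({}, ("Action", "Transfer", "Amount", "Value"), 200)
```

<p>Checking each path against the model before filling it means an
interpretation that falls outside the task model is rejected
explicitly instead of being stored silently.</p>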
<p class="c2">&#160;</p>
<table border="1" cellpadding="10" cellspacing="0" summary="3
columns, first with the dialog, second shows comments, the third
shows the output of the natural language processing">
<tbody>
<tr>
<td>
<p><strong>Dialog</strong></p>
</td>
<td>
<p><strong>Dialog Comments</strong></p>
</td>
<td>
<p><strong>NLU output</strong></p>
</td>
</tr>
<tr>
<td>
<p>1. System: Welcome to Ajax Bank. Please tell me your name.</p>
</td>
<td>
<p>&#160;</p>
</td>
<td>
<p>&#160;</p>
</td>
</tr>
<tr>
<td>
<p>2. User: I'm Jack Jones</p>
</td>
<td>&#160;</td>
<td>
<p><strong>Name: Jack Jones</strong></p>
</td>
</tr>
<tr>
<td>
<p>3. System: I know three people with the name Jack Jones. Will
you tell me your address, please?</p>
</td>
<td>&#160;</td>
<td>&#160;</td>
</tr>
<tr>
<td>
<p>4. User: I live at 18773 Union Street in Carlton</p>
</td>
<td>&#160;</td>
<td>
<p>Address:</p>
<p>Street: 18773 Union St</p>
<p>City: Carlton</p>
</td>
</tr>
<tr>
<td>
<p>5. System: Then your phone number is 773-8427?</p>
</td>
<td>&#160;</td>
<td>
<p>&#160;</p>
</td>
</tr>
<tr>
<td>
<p>6. User: Yep, that's me.</p>
</td>
<td>&#160;</td>
<td>
<p><strong>Yes</strong></p>
</td>
</tr>
<tr>
<td>
<p>7. System: What would you like me to do?</p>
</td>
<td>
<p>&#160;</p>
</td>
<td>
<p>&#160;</p>
</td>
</tr>
<tr>
<td>
<p>8. User: Transfer $200 from my savings account to my checking
account.</p>
</td>
<td>&#160;</td>
<td>
<p><strong>Action: transfer</strong></p>
<p><strong>Source_account: savings</strong></p>
<p><strong>Destination_account: checking</strong></p>
<p><strong>Amount:</strong></p>
<p><strong>Value: 200</strong></p>
<p><strong>Currency: USD</strong></p>
</td>
</tr>
<tr>
<td>
<p>9. System: You have two savings accounts, your individual
savings account and a joint savings account with June Jones. From
which savings account should I transfer the $200?</p>
</td>
<td>&#160;</td>
<td>&#160;</td>
</tr>
<tr>
<td>
<p>10. User: From my individual account, not the joint
account</p>
</td>
<td>
<p>User clarifies by answering the question twice (once
identifying the correct account, and again by identifying the
incorrect account)</p>
</td>
<td>
<p><strong>Correction:</strong></p>
<p><strong>Source_account: individual</strong></p>
</td>
</tr>
<tr>
<td>
<p>11. System: I'm sorry, I can't do that. You have only $150 in
your savings account.</p>
</td>
<td>
<p>System enforces a business rule that says a user can't
transfer more money than what's in his/her account.</p>
</td>
<td>&#160;</td>
</tr>
<tr>
<td>
<p>12. User: OK, just transfer the $150.</p>
</td>
<td>
<p>User modifies the request by editing the amount to be
transferred.</p>
</td>
<td>
<p><strong>Correction:</strong></p>
<p><strong>Amount: Value: 150</strong></p>
<p><strong>Currency: USD</strong></p>
</td>
</tr>
<tr>
<td>
<p>13. System: OK, I have transferred $150 from your individual
savings account to your checking account. Is there anything else I
can do for you?</p>
</td>
<td>
<p>System confirms that the transaction is completed and invites the
user to begin another transaction.</p>
</td>
<td>
<p>&#160;</p>
</td>
</tr>
<tr>
<td>
<p>14. User: Yes, tell me how much money is in our joint savings
account</p>
</td>
<td>
<p>User asks a new question</p>
</td>
<td>
<p><strong>balance: joint account</strong></p>
</td>
</tr>
<tr>
<td>
<p>15. System: You have $600 in your joint savings account with
June Jones. What else can I do for you?</p>
</td>
<td>
<p>System correctly identifies the source because of the phrase
"our joint" savings account. The system prompt message changes so
that the user does not get tired of hearing the same prompt
message.</p>
</td>
<td>
<p>&#160;</p>
</td>
</tr>
<tr>
<td>
<p>16. User: Yes, transfer $50 from that account.</p>
</td>
<td>
<p>&#160;</p>
</td>
<td>
<p><strong>Action: transfer</strong></p>
<p><strong>Source_account: savings</strong></p>
<p><strong>Amount:</strong></p>
<p><strong>Value: 50</strong></p>
<p><strong>Currency: USD</strong></p>
</td>
</tr>
<tr>
<td>
<p>17. System: Do you want me to transfer $50 from your joint
account with June Jones to your checking account?</p>
</td>
<td>&#160;</td>
<td>&#160;</td>
</tr>
<tr>
<td>
<p>18. User: No, to my individual savings account.</p>
</td>
<td>
<p>&#160;</p>
</td>
<td>
<p><strong>Destination_account: individual
savings</strong></p>
</td>
</tr>
<tr>
<td>
<p>19. System: Do you want me to transfer $50 from your joint
savings account with June Jones to your individual savings
account?</p>
</td>
<td>
<p>System assumes that the source is still joint savings account
and that the amount is still $50</p>
</td>
<td>
<p>&#160;</p>
</td>
</tr>
<tr>
<td>
<p>20: User: Yes</p>
</td>
<td>
<p>&#160;</p>
</td>
<td>
<p><strong>Yes</strong></p>
</td>
</tr>
<tr>
<td>
<p>21: System: I've transferred $50 from your joint savings
account with June Jones to your individual savings account.</p>
</td>
<td>
<p>System confirms transaction is completed.</p>
</td>
<td>
<p>&#160;</p>
</td>
</tr>
<tr>
<td>
<p>22: System: Is there anything else I can do for you?</p>
</td>
<td>
<p>System prompts the user for a new transaction</p>
</td>
<td>
<p>&#160;</p>
</td>
</tr>
<tr>
<td>
<p>23. User: Yes, can you pay my electric bill?</p>
</td>
<td>
<p>User isn't actually asking to have the bill paid, just whether
the system can do it.</p>
</td>
<td>
<p><strong>Capability_query:</strong></p>
<p><strong>Action: transfer</strong></p>
<p><strong>Destination_account: electric bill</strong></p>
</td>
</tr>
<tr>
<td>
<p>24. System: Yes, I can transfer money to General Utilities</p>
</td>
<td>&#160;</td>
<td>
<p>&#160;</p>
</td>
</tr>
<tr>
<td>
<p>25: User: OK, pay it</p>
</td>
<td>&#160;</td>
<td>
<p><strong>Action: transfer</strong></p>
<p><strong>Destination_account: electric bill</strong></p>
</td>
</tr>
<tr>
<td>
<p>26: System: How much money should I transfer to General
Utilities?</p>
</td>
<td>
<p>System solicits a value for the amount parameter from the
user</p>
</td>
<td>
<p>&#160;</p>
</td>
</tr>
<tr>
<td>
<p>27: User: transfer $80 from my joint checking account</p>
</td>
<td>
<p>User responds with the value for the requested parameter, and
volunteers a value for another parameter, the source.</p>
</td>
<td>
<p><strong>Action: transfer</strong></p>
<p><strong>Source_account: joint checking</strong></p>
<p><strong>Amount:</strong></p>
<p><strong>Value: 80</strong></p>
<p><strong>Currency: USD</strong></p>
</td>
</tr>
<tr>
<td>
<p>28: System: You want to transfer $80 from your joint checking
account to General Utilities?</p>
</td>
<td>
<p>System asks user to confirm entire transaction</p>
</td>
<td>
<p>&#160;</p>
</td>
</tr>
<tr>
<td>
<p>29: User: Make it so</p>
</td>
<td>
<p>The user watches Captain Picard on Star Trek and copies his
command phrasing.</p>
</td>
<td>
<p><strong>Yes</strong></p>
</td>
</tr>
<tr>
<td>
<p>30: System: Do you want me to do anything else?</p>
</td>
<td>
<p>System prompts user for another transaction</p>
</td>
<td>
<p>&#160;</p>
</td>
</tr>
<tr>
<td>
<p>31: User: Yes, transfer $800 from the IRS to my savings
account</p>
</td>
<td>
<p>User tries to do the impossible.</p>
</td>
<td>
<p><strong>Action: transfer</strong></p>
<p><strong>Source_account: IRS</strong></p>
<p><strong>Destination_account: savings</strong></p>
</td>
</tr>
<tr>
<td>
<p>32: System: I am not allowed to transfer money from the IRS
account</p>
</td>
<td>
<p>System enforces business rules about source accounts.</p>
</td>
<td>
<p>&#160;</p>
</td>
</tr>
<tr>
<td>
<p>33: User: What can I do to the IRS?</p>
</td>
<td>
<p>While the user may be joking, the system treats this as a help
request</p>
</td>
<td>
<p><strong>Capability_query: action</strong></p>
<p><strong>actor: user</strong></p>
<p><strong>theme: IRS</strong></p>
<p><strong>If the request is not representable within the task
model, it may be useful for the system to fall back to a less
fully analyzed representation of the utterance, rather than
simply to fail to understand it.</strong></p>
</td>
</tr>
<tr>
<td>
<p>34: System: You may pay your taxes by transferring money to
the IRS</p>
</td>
<td>
<p>System explains what operations the user can perform with the
IRS account</p>
</td>
<td>
<p>&#160;</p>
</td>
</tr>
<tr>
<td>
<p>35: User: I don't want to do that.</p>
</td>
<td>
<p>User cancels a transaction in progress</p>
</td>
<td>
<p><strong>Cancel</strong></p>
</td>
</tr>
<tr>
<td>
<p>36. System: OK. What else can I do for you?</p>
</td>
<td>
<p>&#160;</p>
</td>
<td>
<p>&#160;</p>
</td>
</tr>
<tr>
<td>
<p>37: User: You're cute, can I call you?</p>
</td>
<td>
<p>User tries to make fun of the system</p>
</td>
<td>
<p><strong>did not understand</strong></p>
</td>
</tr>
<tr>
<td>
<p>38: System: I'm sorry, I didn't understand your request. Can
you please rephrase it?</p>
</td>
<td>
<p>&#160;</p>
</td>
<td>
<p>&#160;</p>
</td>
</tr>
<tr>
<td>
<p>39: User: No. Goodbye</p>
</td>
<td>
<p>&#160;</p>
</td>
<td>
<p><strong>Goodbye</strong></p>
</td>
</tr>
<tr>
<td>
<p>40: System: Thanks for using Ajax Bank. Goodbye.</p>
</td>
<td>
<p>User is logged off.</p>
</td>
<td>
<p>&#160;</p>
</td>
</tr>
</tbody>
</table>
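<p>Turns 16 to 19 of the dialog show the system carrying the source
account and amount forward when the user supplies only a new
destination. One way to sketch that behaviour is to overlay each new
NLU output on the pending request. The function name
<code>merge</code> and the variable names are hypothetical, and
resolving <em>that account</em> to the joint savings account is
assumed to have happened in an earlier step.</p>

```python
import copy


def merge(context, update):
    """Return context overlaid with update, recursing into nested frames."""
    merged = copy.deepcopy(context)
    for name, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(name), dict):
            merged[name] = merge(merged[name], value)
        else:
            merged[name] = copy.deepcopy(value)
    return merged


# Pending request after turn 16, with "that account" already resolved
# to the joint savings account (assumed to be done elsewhere).
pending = {
    "Action": "transfer",
    "Source_account": "joint savings",
    "Amount": {"Value": 50, "Currency": "USD"},
}

# NLU output for turn 18: only the destination is mentioned.
turn_18 = {"Destination_account": "individual savings"}

request = merge(pending, turn_18)
```

<p>The same overlay covers turn 12, where only the amount value
changes: merging <code>{"Amount": {"Value": 150}}</code> replaces the
value and leaves the currency untouched.</p>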
<h2>Acknowledgments</h2>
<h3>Subgroup Members</h3>
<blockquote>Mike Brown, Lucent<br>
Carolina Di Cristo, Telecom Italia<br>
Deborah Dahl, Unisys<br>
Linda Dorrian, Productivity Works<br>
Robert Keiller, Canon<br>
Bill Ledingham, SpeechWorks<br>
Stephen Potter, Entropic<br>
Dave Raggett, HP and W3C<br>
Ramesh Sarukkai, Lernout and Hauspie<br>
Volker Steinbliss, Philips</blockquote>
</body>
</html>