<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">

<html lang=en>
<head>
<title>CSS Speech Module</title>
<meta content="text/html; charset=utf-8" http-equiv=Content-Type>
<link href=default.css rel=stylesheet type="text/css">

<style type="text/css">
p
{
padding-bottom : 1em;
}

p + p
{
text-indent : 0;
}

*:target
{
border : 1px dashed #66CC66;
}</style>
|
|
<!--
|
|
.prod
|
|
{
|
|
font-family : inherit;
|
|
font-size : inherit
|
|
}
|
|
|
|
pre.prod
|
|
{
|
|
white-space : pre-wrap;
|
|
margin : 1em 0 1em 2em
|
|
}
|
|
|
|
code
|
|
{
|
|
font-size : inherit;
|
|
}
|
|
|
|
#box-shadow-samples td
|
|
{
|
|
background : white;
|
|
color : black;
|
|
}
|
|
|
|
caption
|
|
{
|
|
text-align : left;
|
|
font-weight : bold
|
|
}
|
|
|
|
.note
|
|
{
|
|
font-style : italic
|
|
}
|
|
|
|
.issue
|
|
{
|
|
color : maroon;
|
|
font-style : italic
|
|
}
|
|
|
|
div.example pre
|
|
{
|
|
color : green;
|
|
margin-left : 2em
|
|
}
|
|
|
|
dl
|
|
{
|
|
margin-left : 2em
|
|
}
|
|
|
|
caption dfn
|
|
{
|
|
font-size : 120%
|
|
}
|
|
-->
|
|
<link href="http://www.w3.org/StyleSheets/TR/W3C-WD" rel=stylesheet
type="text/css">

<body>
<div class=head> <!--begin-logo-->
<p><a href="http://www.w3.org/"><img alt=W3C height=48
src="http://www.w3.org/Icons/w3c_home" width=72></a> <!--end-logo-->

<h1 id=top>CSS Speech Module</h1>

<h2 class="no-num no-toc" id=longstatus-date>W3C Working Draft 18 August
2011</h2>

<dl>
<dt>This version:
<dd> <a
href="http://www.w3.org/TR/2011/WD-css3-speech-20110818">http://www.w3.org/TR/2011/WD-css3-speech-20110818</a>

<dt>Latest version:
<dd> <a
href="http://www.w3.org/TR/css3-speech">http://www.w3.org/TR/css3-speech</a>

<dt>Previous versions:
<dd> <a
href="http://www.w3.org/TR/2011/WD-css3-speech-20110419">http://www.w3.org/TR/2011/WD-css3-speech-20110419</a>

<dt id=editors-list>Editor:
<dd><a href="mailto:dweck@daisy.org">Daniel Weck</a> (<a
href="http://www.daisy.org">DAISY Consortium</a>)

<dt>Former editors:
<dd><a href="mailto:dsr@w3.org">Dave Raggett</a> (<a
href="http://www.w3.org/">W3C</a>/<a
href="http://www.canon.com/">Canon</a>)

<dd><a href="mailto:daniel@glazman.org">Daniel Glazman</a> (<a
href="http://www.disruptive-innovations.com/">Disruptive
Innovations</a>)

<dd><a href="mailto:csant@opera.com">Claudio Santambrogio</a> (<a
href="http://www.opera.com/">Opera Software</a>)
</dl>
<!--begin-copyright-->
<p class=copyright><a
href="http://www.w3.org/Consortium/Legal/ipr-notice#Copyright"
rel=license>Copyright</a> © 2011 <a
href="http://www.w3.org/"><acronym title="World Wide Web
Consortium">W3C</acronym></a><sup>®</sup> (<a
href="http://www.csail.mit.edu/"><acronym title="Massachusetts Institute
of Technology">MIT</acronym></a>, <a href="http://www.ercim.eu/"><acronym
title="European Research Consortium for Informatics and
Mathematics">ERCIM</acronym></a>, <a
href="http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. W3C <a
href="http://www.w3.org/Consortium/Legal/ipr-notice#Legal_Disclaimer">liability</a>,
<a
href="http://www.w3.org/Consortium/Legal/ipr-notice#W3C_Trademarks">trademark</a>
and <a
href="http://www.w3.org/Consortium/Legal/copyright-documents">document
use</a> rules apply.</p>
<!--end-copyright-->
<hr title="Separator for header">
</div>
<h2 class="no-num no-toc" id=abstract>Abstract</h2>

<p>CSS (Cascading Style Sheets) is a language that describes the rendering
of markup documents (e.g. HTML, XML) on various media, such as screen,
paper, and speech. The Speech module defines aural CSS properties that
enable authors to declaratively control the rendering of documents via
speech synthesis and optional audio cues. The feature set exposed
by this specification is designed to match the model described by the
Speech Synthesis Markup Language (SSML) Version 1.1 <a href="#SSML"
rel=biblioentry>[SSML]<!--{{!SSML}}--></a>. Note that this specification was
developed in cooperation with the <a href="http://www.w3.org/Voice/">Voice
Browser Activity</a>.
|
|
|
|
<h2 class="no-num no-toc" id=status>Status of this document</h2>
|
|
<!--begin-status-->
|
|
|
|
<p><em>This section describes the status of this document at the time of
|
|
its publication. Other documents may supersede this document. A list of
|
|
current W3C publications and the latest revision of this technical report
|
|
can be found in the <a href="http://www.w3.org/TR/">W3C technical reports
|
|
index at http://www.w3.org/TR/.</a></em>
|
|
|
|
<p>Publication as a Working Draft does not imply endorsement by the W3C
|
|
Membership. This is a draft document and may be updated, replaced or
|
|
obsoleted by other documents at any time. It is inappropriate to cite this
|
|
document as other than work in progress.
|
|
|
|
<p>The (<a
|
|
href="http://lists.w3.org/Archives/Public/www-style/">archived</a>) public
|
|
mailing list <a href="mailto:www-style@w3.org">www-style@w3.org</a> (see
|
|
<a href="http://www.w3.org/Mail/Request">instructions</a>) is preferred
|
|
for discussion of this specification. When sending e-mail, please put the
|
|
text “css3-speech” in the subject, preferably like this:
|
|
“[<!---->css3-speech<!---->] <em>…summary of
|
|
comment…</em>”
|
|
|
|
<p>This document was produced by the <a
|
|
href="http://www.w3.org/Style/CSS/members">CSS Working Group</a> (part of
|
|
the <a href="http://www.w3.org/Style/">Style Activity</a>).
|
|
|
|
<p>This document was produced by a group operating under the <a
|
|
href="http://www.w3.org/Consortium/Patent-Policy-20040205/">5 February
|
|
2004 W3C Patent Policy</a>. W3C maintains a <a
|
|
href="http://www.w3.org/2004/01/pp-impl/32061/status"
|
|
rel=disclosure>public list of any patent disclosures</a> made in
|
|
connection with the deliverables of the group; that page also includes
|
|
instructions for disclosing a patent. An individual who has actual
|
|
knowledge of a patent which the individual believes contains <a
|
|
href="http://www.w3.org/Consortium/Patent-Policy-20040205/#def-essential">Essential
|
|
Claim(s)</a> must disclose the information in accordance with <a
|
|
href="http://www.w3.org/Consortium/Patent-Policy-20040205/#sec-Disclosure">section
|
|
6 of the W3C Patent Policy</a>.</p>
|
|
<!--end-status-->
|
|
|
|
<p>This is a “last call” working draft, i.e., the working group expects
|
|
this to be the last call for comments before the document becomes a W3C
|
|
Candidate Recommendation. The <strong>deadline for comments</strong> is
|
|
<strong>30 September 2011</strong>.</p>
|
|
<!-- div class="issue">
|
|
<p>The following issues need to be discussed and require working group resolutions:</p>
|
|
<ul>
|
|
<li>
|
|
<a href="#issue-xxx">xxx</a>
|
|
</li>
|
|
</ul>
|
|
<p>The CSS WG maintains a separate <a href="http://www.w3.org/Style/CSS/Tracker/products/29"
|
|
>list of issues</a> for this module.</p>
|
|
</div -->
|
|
|
|
<p>The following features are at-risk and may be dropped at the end of the
|
|
CR period if there has not been enough interest from implementers:
|
|
‘<a href="#voice-balance"><code
|
|
class=property>voice-balance</code></a>’, ‘<a
|
|
href="#voice-duration"><code
|
|
class=property>voice-duration</code></a>’, ‘<a
|
|
href="#voice-pitch"><code class=property>voice-pitch</code></a>’,
|
|
‘<a href="#voice-range"><code
|
|
class=property>voice-range</code></a>’, and ‘<a
|
|
href="#voice-stress"><code class=property>voice-stress</code></a>’.
|
|
|
|
<p>The CSS Speech module is a community effort: if you would like to
help with its implementation and drive the specification forward along the
W3C Recommendation track, please contact the editors.
|
|
|
|
<h2 class="no-num no-toc" id=contents>Table of contents</h2>
|
|
<!--begin-toc-->
|
|
|
|
<ul class=toc>
|
|
<li><a href="#introduction"><span class=secno>1. </span>Introduction</a>
|
|
<ul class=toc>
|
|
<li><a href="#design-goals"><span class=secno>1.1. </span>Design goals,
|
|
motivations</a>
|
|
|
|
<li><a href="#css21-rel"><span class=secno>1.2. </span>Relationship with
|
|
CSS2.1</a>
|
|
</ul>
|
|
|
|
<li><a href="#values"><span class=secno>2. </span>CSS values</a>
|
|
|
|
<li><a href="#example"><span class=secno>3. </span>Example</a>
|
|
|
|
<li><a href="#aural-model"><span class=secno>4. </span>The aural
|
|
formatting model</a>
|
|
|
|
<li><a href="#mixing-props"><span class=secno>5. </span>Mixing
|
|
properties</a>
|
|
<ul class=toc>
|
|
<li><a href="#mixing-props-voice-volume"><span class=secno>5.1.
|
|
</span>The ‘<code class=property>voice-volume</code>’
|
|
property</a>
|
|
|
|
<li><a href="#mixing-props-voice-balance"><span class=secno>5.2.
|
|
</span>The ‘<code class=property>voice-balance</code>’
|
|
property</a>
|
|
</ul>
|
|
|
|
<li><a href="#speaking-props"><span class=secno>6. </span>Speaking
|
|
properties</a>
|
|
<ul class=toc>
|
|
<li><a href="#speaking-props-speak"><span class=secno>6.1. </span>The
|
|
‘<code class=property>speak</code>’ property</a>
|
|
|
|
<li><a href="#speaking-props-speak-as"><span class=secno>6.2. </span>The
|
|
‘<code class=property>speak-as</code>’ property</a>
|
|
</ul>
|
|
|
|
<li><a href="#pause-props"><span class=secno>7. </span>Pause properties
|
|
</a>
|
|
<ul class=toc>
|
|
<li><a href="#pause-props-pause-before-after"><span class=secno>7.1.
|
|
</span>The ‘<code class=property>pause-before</code>’ and
|
|
‘<code class=property>pause-after</code>’ properties</a>
|
|
|
|
<li><a href="#pause-props-pause"><span class=secno>7.2. </span>The
|
|
‘<code class=property>pause</code>’ shorthand property</a>
|
|
|
|
<li><a href="#collapsing"><span class=secno>7.3. </span>Collapsing
|
|
pauses</a>
|
|
</ul>
|
|
|
|
<li><a href="#rest-props"><span class=secno>8. </span>Rest properties</a>
|
|
<ul class=toc>
|
|
<li><a href="#rest-props-rest-before-after"><span class=secno>8.1.
|
|
</span>The ‘<code class=property>rest-before</code>’ and
|
|
‘<code class=property>rest-after</code>’ properties</a>
|
|
|
|
<li><a href="#rest-props-rest"><span class=secno>8.2. </span>The
|
|
‘<code class=property>rest</code>’ shorthand property</a>
|
|
</ul>
|
|
|
|
<li><a href="#cue-props"><span class=secno>9. </span>Cue properties</a>
|
|
<ul class=toc>
|
|
<li><a href="#cue-props-cue-before-after"><span class=secno>9.1.
|
|
</span>The ‘<code class=property>cue-before</code>’ and
|
|
‘<code class=property>cue-after</code>’ properties</a>
|
|
|
|
<li><a href="#cue-props-cue"><span class=secno>9.2. </span>The
|
|
‘<code class=property>cue</code>’ shorthand property</a>
|
|
</ul>
|
|
|
|
<li><a href="#voice-char-props"><span class=secno>10. </span>Voice
|
|
characteristic properties</a>
|
|
<ul class=toc>
|
|
<li><a href="#voice-props-voice-family"><span class=secno>10.1.
|
|
</span>The ‘<code class=property>voice-family</code>’
|
|
property</a>
|
|
|
|
<li><a href="#voice-props-voice-rate"><span class=secno>10.2. </span>The
|
|
‘<code class=property>voice-rate</code>’ property</a>
|
|
|
|
<li><a href="#voice-props-voice-pitch"><span class=secno>10.3.
|
|
</span>The ‘<code class=property>voice-pitch</code>’
|
|
property</a>
|
|
|
|
<li><a href="#voice-props-voice-range"><span class=secno>10.4.
|
|
</span>The ‘<code class=property>voice-range</code>’
|
|
property</a>
|
|
|
|
<li><a href="#voice-props-voice-stress"><span class=secno>10.5.
|
|
</span>The ‘<code class=property>voice-stress</code>’
|
|
property</a>
|
|
</ul>
|
|
|
|
<li><a href="#duration-props"><span class=secno>11. </span>Voice duration
|
|
property</a>
|
|
<ul class=toc>
|
|
<li><a href="#mixing-props-voice-duration"><span class=secno>11.1.
|
|
</span>The ‘<code class=property>voice-duration</code>’
|
|
property</a>
|
|
</ul>
|
|
|
|
<li><a href="#lists"><span class=secno>12. </span>List items and counters
|
|
styles</a>
|
|
|
|
<li><a href="#content"><span class=secno>13. </span>Inserted and replaced
|
|
content</a>
|
|
|
|
<li><a href="#pronunciation"><span class=secno>14. </span> Pronunciation,
|
|
phonemes </a>
|
|
|
|
<li class=no-num><a href="#property-index">Appendix A — Property
|
|
index</a>
|
|
|
|
<li class=no-num><a href="#index">Appendix B — Index</a>
|
|
|
|
<li class=no-num><a href="#definitions">Appendix C — Definitions</a>
|
|
|
|
<ul class=toc>
|
|
<li class=no-num><a href="#glossary">Glossary</a>
|
|
|
|
<li class=no-num><a href="#conformance">Conformance</a>
|
|
|
|
<li class=no-num><a href="#exit">CR exit criteria</a>
|
|
</ul>
|
|
|
|
<li class=no-num><a href="#ack">Appendix D — Acknowledgements</a>
|
|
|
|
<li class=no-num><a href="#changes">Appendix E — Changes from
|
|
previous draft</a>
|
|
|
|
<li class=no-num><a href="#references">Appendix F — References</a>
|
|
<ul class=toc>
|
|
<li class=no-num><a href="#normative-references">Normative
|
|
references</a>
|
|
|
|
<li class=no-num><a href="#other-references">Other references</a>
|
|
</ul>
|
|
</ul>
|
|
<!--end-toc-->
|
|
|
|
<h2 id=introduction><span class=secno>1. </span>Introduction</h2>
|
|
|
|
<p class=note>Note that this entire section is non-normative.
|
|
|
|
<h3 id=design-goals><span class=secno>1.1. </span>Design goals, motivations</h3>
|
|
|
|
<p>The aural rendering of a document combines speech synthesis (also known
as "TTS", the acronym for "Text to Speech") and auditory icons (which we
refer to as "audio cues" in this specification). The aural presentation of
information is common amongst communities of users who are blind or
visually impaired. For instance, "screen readers" enable control of visual
user interfaces that would otherwise be inaccessible. There are other
cases in which listening to textual information (as opposed to reading it)
is a necessity. Typical examples include in-car use of an e-book reader,
industrial and medical documentation systems, home entertainment, helping
users learn to read, or supporting users who have reading difficulties
(print disabilities).
|
|
|
|
<p> When it comes to documents, the quality of the speech rendition depends
|
|
on the structure and semantics authored within the content itself. The CSS
|
|
Speech module provides properties that enable authors to declaratively
|
|
control presentational aspects of the aural dimension (e.g. TTS voice,
|
|
pitch, rate, and volume levels). These style sheet properties can be used
|
|
together with visual properties (mixed media), or as a complete aural
|
|
alternative to a visual presentation.
|
|
|
|
<p> Content creators can conditionally include CSS properties dedicated to
|
|
user-agents with text to speech synthesis capabilities, by specifying the
|
|
"speech" media type via the <code>media</code> attribute of the
|
|
<code>link</code> element, or with the <code>@media</code> at-rule, or
|
|
within an <code>@import</code> statement. When doing so, the styles
|
|
authored within the scope of such conditional statements are ignored by
|
|
user-agents that do not support this module.
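<div class=example>
<p>As a non-normative illustration, the following style sheet fragment
sketches two of the conditional mechanisms described above. The file name
"speech.css" and the rule inside the <code>@media</code> block are
hypothetical placeholders. An equivalent <code>link</code> element would
carry the attribute <code>media="speech"</code>.</p>

<pre>
/* Imported only by user agents that render the "speech" media type.
   "speech.css" is a hypothetical file name. As in any style sheet,
   @import rules must precede all other rules. */
@import url("speech.css") speech;

/* Rules in this block are ignored for other media types, e.g. screen. */
@media speech
{
  h1 { voice-family: male; }
}
</pre>
</div>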
|
|
|
|
<h3 id=css21-rel><span class=secno>1.2. </span>Relationship with CSS2.1</h3>
|
|
|
|
<p> The CSS Speech module is a re-work of the informative CSS2.1 Aural
|
|
appendix, within which the "aural" media type was described, but also
|
|
deprecated (in favor of the "speech" media type). Although the <a
|
|
href="#CSS21" rel=biblioentry>[CSS21]<!--{{!CSS21}}--></a> specification
|
|
reserves the "speech" media type, it doesn't actually define the
|
|
corresponding properties. This module describes the CSS properties that
|
|
apply to the "speech" media type, and defines a new "box" model
|
|
specifically for the aural dimension.
|
|
|
|
<h2 id=values><span class=secno>2. </span>CSS values</h2>
|
|
|
|
<p>This specification follows the <a
|
|
href="http://www.w3.org/TR/CSS21/about.html#property-defs">CSS property
|
|
definition conventions</a> from <a href="#CSS21"
|
|
rel=biblioentry>[CSS21]<!--{{!CSS21}}--></a>. Value types not defined in
|
|
this specification are defined in CSS Values and Units Module Level 3 <a
|
|
href="#CSS3VAL" rel=biblioentry>[CSS3VAL]<!--{{!CSS3VAL}}--></a>.
|
|
|
|
<p>In addition to the property-specific values listed in their definitions,
|
|
all properties defined in this specification also accept the <a
|
|
href="http://www.w3.org/TR/CSS21/cascade.html#value-def-inherit">inherit</a>
|
|
keyword as their property value. For readability it has not been repeated
|
|
explicitly.
|
|
|
|
<h2 id=example><span class=secno>3. </span>Example</h2>
|
|
|
|
<div class=example>
|
|
<p>This example shows how authors can tell the speech synthesizer to speak
|
|
HTML headings with a voice called "paul", using "moderate" emphasis
|
|
(which is more than normal) and how to insert an audio cue (pre-recorded
|
|
audio clip located at the given URL) before the start of TTS rendering
|
|
for each heading. In a stereo-capable sound system, paragraphs marked
|
|
with the CSS class "heidi" are rendered on the left audio channel (and
|
|
with a female voice, etc.), whilst the class "peter" corresponds to the
|
|
right channel (and to a male voice, etc.). The volume level of text spans
|
|
marked with the class "special" is lower than normal, and a prosodic
|
|
boundary is created by introducing a strong pause after it is spoken
|
|
(note how the <code>span</code> inherits the voice-family from its parent
|
|
paragraph).</p>
|
|
|
|
<pre>
h1, h2, h3, h4, h5, h6
{
  voice-family: paul;
  voice-stress: moderate;
  cue-before: url(../audio/ping.wav);
  voice-volume: medium 6dB;
}

p.heidi
{
  voice-family: female;
  voice-balance: left;
  voice-pitch: high;
  voice-volume: -6dB;
}

p.peter
{
  voice-family: male;
  voice-balance: right;
  voice-rate: fast;
}

span.special
{
  voice-volume: soft;
  pause-after: strong;
}

...

&lt;h1&gt;I am Paul, and I speak headings.&lt;/h1&gt;
&lt;p class="heidi"&gt;Hello, I am Heidi.&lt;/p&gt;
&lt;p class="peter"&gt;
  &lt;span class="special"&gt;Can you hear me?&lt;/span&gt;
  I am Peter.
&lt;/p&gt;</pre>
|
|
</div>
|
|
|
|
<h2 id=aural-model><span class=secno>4. </span>The aural formatting model</h2>
|
|
|
|
<p>The CSS formatting model for aural media is based on a sequence of
|
|
sounds and silences that occur within a nested context similar to the <a
|
|
href="#box-model-def">visual box model</a>, which we name the <dfn
|
|
id=aural-box-model>aural "box" model</dfn>. The aural "canvas" consists of
|
|
a two-channel (stereo) space and of a temporal dimension, within which
|
|
synthetic speech and audio cues coexist. The selected element is
|
|
surrounded by ‘<a href="#rest"><code
|
|
class=property>rest</code></a>’, ‘<a href="#cue"><code
|
|
class=property>cue</code></a>’ and ‘<a href="#pause"><code
|
|
class=property>pause</code></a>’ properties (from the innermost to
|
|
the outermost position). These can be seen as aural equivalents to
|
|
‘<code class=property>padding</code>’, ‘<code
|
|
class=property>border</code>’ and ‘<code
|
|
class=property>margin</code>’, respectively. When used, the
|
|
‘<code class=css>:before</code>’ and ‘<code
|
|
class=css>:after</code>’ pseudo-elements <a href="#CSS21"
|
|
rel=biblioentry>[CSS21]<!--{{!CSS21}}--></a> get inserted between the
|
|
element's contents and the ‘<a href="#rest"><code
|
|
class=property>rest</code></a>’.
|
|
|
|
<p> The following diagram illustrates the equivalence between properties of
the visual and aural box models, applied to the selected &lt;element&gt;:
|
|
|
|
<p> <img alt="A graph depicting the aural 'box' model." id=aural-box
|
|
src=aural-box.png>
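<div class=example>
<p>The following fragment is a non-normative sketch of how the three
layers of the aural "box" model nest around an element. The class name
"note", the audio file "chime.wav" and the chosen durations are
hypothetical; the ‘<code class=property>pause</code>’, ‘<code
class=property>cue</code>’ and ‘<code class=property>rest</code>’
shorthands are defined in later sections, and each is given a single value
here, which is assumed to apply to both its "before" and "after"
positions. The trailing comment lists the order in which the resulting
sounds and silences would be rendered.</p>

<pre>
p.note
{
  pause: 500ms;          /* outermost, comparable to margin */
  cue: url(chime.wav);   /* middle, comparable to border */
  rest: 100ms;           /* innermost, comparable to padding */
}

/* Rendering order for a p.note element:
   pause-before, cue-before, rest-before,
   :before content, element content, :after content,
   rest-after, cue-after, pause-after */
</pre>
</div>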
|
|
|
|
<h2 id=mixing-props><span class=secno>5. </span>Mixing properties</h2>
|
|
|
|
<h3 id=mixing-props-voice-volume><span class=secno>5.1. </span>The
|
|
‘<a href="#voice-volume"><code
|
|
class=property>voice-volume</code></a>’ property</h3>
|
|
|
|
<table class=propdef summary="name: syntax">
|
|
<tbody>
|
|
<tr>
|
|
<td>Name:
|
|
|
|
<td> <dfn id=voice-volume>voice-volume</dfn>
|
|
|
|
<tr>
|
|
<td> <em>Value:</em>
|
|
|
|
<td>silent | [[x-soft | soft | medium | loud | x-loud] ||
&lt;decibel&gt;]
|
|
|
|
<tr>
|
|
<td> <em>Initial:</em>
|
|
|
|
<td>medium
|
|
|
|
<tr>
|
|
<td> <em>Applies to:</em>
|
|
|
|
<td>all elements
|
|
|
|
<tr>
|
|
<td> <em>Inherited:</em>
|
|
|
|
<td>yes
|
|
|
|
<tr>
|
|
<td> <em>Percentages:</em>
|
|
|
|
<td>N/A
|
|
|
|
<tr>
|
|
<td> <em>Media:</em>
|
|
|
|
<td>speech
|
|
|
|
<tr>
|
|
<td> <em>Computed value:</em>
|
|
|
|
<td>a keyword value, and optionally also a decibel offset (if not zero)
|
|
</table>
|
|
|
|
<p>The ‘<a href="#voice-volume"><code
class=property>voice-volume</code></a>’ property allows authors to
control the amplitude of the audio waveform generated by the speech
synthesizer, and is also used to adjust the relative volume level of <a
href="#cue-props">audio cues</a> within the <a href="#aural-model">aural
"box" model</a>.
|
|
|
|
<p class=note> Note that the functionality provided by this property is
|
|
related to the <a
|
|
href="http://www.w3.org/TR/speech-synthesis11/#edef_prosody"><code>volume</code>
|
|
attribute of the <code>prosody</code> element</a> from the SSML markup
|
|
language <a href="#SSML" rel=biblioentry>[SSML]<!--{{!SSML}}--></a>.
|
|
|
|
<dl><!-- dt>
|
|
<strong>normal</strong>
|
|
</dt>
|
|
<dd>
|
|
<p> Corresponds to +0.0dB, which means that there is no modification of volume level. This
|
|
value overrides the inherited value.</p>
|
|
</dd -->
|
|
|
|
<dt> <strong>silent</strong>
|
|
|
|
<dd>
|
|
<p> Specifies that no sound is generated (the text is read "silently").
|
|
Corresponds to negative infinity in dB units.</p>
|
|
|
|
<p class=note> Note that there is a difference between an element whose
|
|
‘<a href="#voice-volume"><code
|
|
class=property>voice-volume</code></a>’ property has a value of
|
|
‘<code class=property>silent</code>’, and an element whose
|
|
‘<a href="#speak"><code class=property>speak</code></a>’
|
|
property has the value ‘<code class=property>none</code>’.
|
|
With the former, the selected element takes up the same time as if it
were spoken, including any pause before and after the element, but no
|
|
sound is generated (descendants can override the ‘<a
|
|
href="#voice-volume"><code class=property>voice-volume</code></a>’
|
|
value and may therefore generate audio output). With the latter, the
|
|
selected element is not rendered in the aural dimension and no time is
|
|
allocated for playback (descendants can override the ‘<a
|
|
href="#speak"><code class=property>speak</code></a>’ value and may
|
|
therefore generate audio output).</p>
|
|
|
|
<dt><strong>x-soft</strong>, <strong>soft</strong>,
|
|
<strong>medium</strong>, <strong>loud</strong>, <strong>x-loud</strong>
|
|
|
|
<dd>
|
|
<p> This sequence of keywords corresponds to monotonically non-decreasing
|
|
volume levels, mapped to implementation-dependent values (i.e. inferred
|
|
by the user-agent) that meet the user's requirements in terms of
perceived sound loudness. The keyword ‘<code
|
|
class=property>x-soft</code>’ maps to the user's <em>minimum
|
|
audible</em> volume level, ‘<code
|
|
class=property>x-loud</code>’ maps to the user's <em>maximum
|
|
tolerable</em> volume level, ‘<code
|
|
class=property>medium</code>’ maps to the user's
|
|
<em>preferred</em> volume level, ‘<code
|
|
class=property>soft</code>’ and ‘<code
|
|
class=property>loud</code>’ map to intermediary values.</p>
|
|
|
|
<dt> <strong>&lt;decibel&gt;</strong>
|
|
|
|
<dd>
|
|
<p>A <a href="#number-def">number</a> immediately followed by "dB"
|
|
(decibel unit). This represents a change (positive or negative) relative
|
|
to the given keyword value (see enumeration above), or to the default
|
|
value for the root element, or otherwise to the inherited volume level
|
|
(which may itself be a combination of a keyword value and of a decibel
|
|
offset, in which case the decibel values are combined additively). When
|
|
the inherited volume level is ‘<code
|
|
class=property>silent</code>’, this ‘<a
|
|
href="#voice-volume"><code class=property>voice-volume</code></a>’
|
|
resolves to ‘<code class=property>silent</code>’ too,
|
|
regardless of the specified &lt;decibel&gt; value. Decibels represent
the ratio of the squares of the new signal amplitude (a1) and the
current amplitude (a0), as per the following logarithmic equation:
volume(dB) = 20 log<sub>10</sub>(a1 / a0), which is equivalent to
10 log<sub>10</sub>(a1<sup>2</sup> / a0<sup>2</sup>).</p>
|
|
|
|
<p class=note> Note that -6.0dB is approximately half the amplitude of
|
|
the audio signal, and +6.0dB is approximately twice the amplitude.</p>
|
|
</dl>
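<div class=example>
<p>The following non-normative sketch illustrates how keyword values and
decibel offsets combine through inheritance, and how ‘<code
class=property>silent</code>’ differs from ‘<code
class=property>speak: none</code>’. All class names are hypothetical.
The amplitude ratios in the comments follow from the equation above,
rearranged as a1 / a0 = 10^(dB / 20), and are rounded to two decimals.</p>

<pre>
body          { voice-volume: loud; }        /* loud, no offset */
.aside        { voice-volume: -6dB; }        /* inherited loud, minus 6dB;
                                                ratio 10^(-6/20) = 0.50 */
.aside em     { voice-volume: +3dB; }        /* offsets add up:
                                                loud -6dB +3dB = loud -3dB */
.whisper      { voice-volume: x-soft 6dB; }  /* keyword plus offset;
                                                ratio 10^(6/20) = 2.00 */
.muted        { voice-volume: silent; }      /* takes up time, no sound */
.muted em     { voice-volume: +6dB; }        /* still silent: an offset from
                                                silent is ignored */
.muted strong { voice-volume: medium; }      /* a keyword overrides the
                                                inherited silent: audible */
.skipped      { speak: none; }               /* not rendered, and no time
                                                is allocated */
</pre>
</div>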
|
|
|
|
<p class=note>Note that the actual perceived volume levels depend on
|
|
various factors, such as the listening environment and personal user
|
|
preferences. The effective volume variation between ‘<code
|
|
class=property>x-soft</code>’ and ‘<code
|
|
class=property>x-loud</code>’ represents the dynamic range (in terms
|
|
of loudness) of the speech output. Typically, this range would be
|
|
compressed in a noisy context, i.e. the perceived loudness corresponding
|
|
to ‘<code class=property>x-soft</code>’ would effectively be
|
|
closer to ‘<code class=property>x-loud</code>’ than it would
|
|
be in a quiet environment. There may also be situations where both
|
|
‘<code class=property>x-soft</code>’ and ‘<code
|
|
class=property>x-loud</code>’ would map to low volume levels, such
|
|
as in listening environments requiring discretion (e.g. library,
|
|
night-reading).
|
|
|
|
<h3 id=mixing-props-voice-balance><span class=secno>5.2. </span>The
|
|
‘<a href="#voice-balance"><code
|
|
class=property>voice-balance</code></a>’ property</h3>
|
|
|
|
<table class=propdef summary="name: syntax">
|
|
<tbody>
|
|
<tr>
|
|
<td>Name:
|
|
|
|
<td> <dfn id=voice-balance>voice-balance</dfn>
|
|
|
|
<tr>
|
|
<td> <em>Value:</em>
|
|
|
|
<td>&lt;number&gt; | left | center | right | leftwards | rightwards
|
|
|
|
<tr>
|
|
<td> <em>Initial:</em>
|
|
|
|
<td>center
|
|
|
|
<tr>
|
|
<td> <em>Applies to:</em>
|
|
|
|
<td>all elements
|
|
|
|
<tr>
|
|
<td> <em>Inherited:</em>
|
|
|
|
<td>yes
|
|
|
|
<tr>
|
|
<td> <em>Percentages:</em>
|
|
|
|
<td>N/A
|
|
|
|
<tr>
|
|
<td> <em>Media:</em>
|
|
|
|
<td>speech
|
|
|
|
<tr>
|
|
<td> <em>Computed value:</em>
|
|
|
|
<td>the specified value resolved to a &lt;number&gt; between
|
|
‘<code class=css>-100</code>’ and ‘<code
|
|
class=css>100</code>’ (inclusive)
|
|
</table>
|
|
|
|
<p> The ‘<a href="#voice-balance"><code
|
|
class=property>voice-balance</code></a>’ property controls the
|
|
spatial distribution of audio output across a lateral sound stage: one
|
|
extremity is on the left, the other extremity is on the right hand side,
|
|
relative to the listener's position. Authors can specify intermediary
|
|
steps between left and right extremities, to represent the audio
|
|
separation along the resulting left-right axis.
|
|
|
|
<p class=note> Note that the functionality provided by this property has no
|
|
match in the SSML markup language <a href="#SSML"
|
|
rel=biblioentry>[SSML]<!--{{!SSML}}--></a>.
|
|
|
|
<dl>
|
|
<dt> <strong>&lt;number&gt;</strong>
|
|
|
|
<dd>
|
|
<p>A <a href="#number-def">number</a> between ‘<code
|
|
class=css>-100</code>’ and ‘<code
|
|
class=css>100</code>’ (inclusive). Values smaller than
|
|
‘<code class=css>-100</code>’ are clamped to ‘<code
|
|
class=css>-100</code>’. Values greater than ‘<code
|
|
class=css>100</code>’ are clamped to ‘<code
|
|
class=css>100</code>’. The value ‘<code
|
|
class=css>-100</code>’ represents the left side, and the value
|
|
‘<code class=css>100</code>’ represents the right side. The
|
|
value ‘<code class=css>0</code>’ represents the center point
|
|
whereby there is no discernible audio separation between left and right
|
|
sides (in a stereo sound system, this corresponds to equal distribution
|
|
of audio signals between left and right speakers).</p>
|
|
|
|
<dt> <strong>left</strong>
|
|
|
|
<dd>
|
|
<p>Same as ‘<code class=css>-100</code>’.</p>
|
|
|
|
<dt> <strong>center</strong>
|
|
|
|
<dd>
|
|
<p>Same as ‘<code class=css>0</code>’.</p>
|
|
|
|
<dt> <strong>right</strong>
|
|
|
|
<dd>
|
|
<p>Same as ‘<code class=css>100</code>’.</p>
|
|
|
|
<dt> <strong>leftwards</strong>
|
|
|
|
<dd>
|
|
<p>Moves the sound to the left, by subtracting 20 from the inherited
|
|
‘<a href="#voice-balance"><code
|
|
class=property>voice-balance</code></a>’ value, and by clamping
|
|
the resulting number to ‘<code class=css>-100</code>’.</p>
|
|
|
|
<dt> <strong>rightwards</strong>
|
|
|
|
<dd>
|
|
<p>Moves the sound to the right, by adding 20 to the inherited ‘<a
|
|
href="#voice-balance"><code
|
|
class=property>voice-balance</code></a>’ value, and by clamping
|
|
the resulting number to ‘<code class=css>100</code>’.</p>
|
|
</dl>
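<div class=example>
<p>A non-normative sketch of how numeric values, absolute keywords and the
relative keywords ‘<code class=css>leftwards</code>’ and ‘<code
class=css>rightwards</code>’ resolve when applied to nested elements.
The class names are hypothetical; each comment gives the resulting
computed value.</p>

<pre>
.dialog            { voice-balance: -50; }        /* -50 */
.dialog .reply     { voice-balance: rightwards; } /* -50 + 20 = -30 */
.dialog .reply em  { voice-balance: rightwards; } /* -30 + 20 = -10 */
.stage             { voice-balance: left; }       /* same as -100 */
.stage .aside      { voice-balance: leftwards; }  /* -100 - 20 = -120,
                                                     clamped to -100 */
.edge              { voice-balance: 250; }        /* clamped to 100 */
</pre>
</div>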
|
|
|
|
<p> User agents may be connected to different kinds of sound systems,
|
|
featuring varying audio mixing capabilities. The expected behavior for
|
|
mono, stereo, and surround sound systems is defined as follows:
|
|
|
|
<ul>
|
|
<li> When user-agents produce audio via a monaural sound system (i.e.
|
|
single-speaker setup), the ‘<a href="#voice-balance"><code
|
|
class=property>voice-balance</code></a>’ property has no effect.
|
|
|
|
<li> When user-agents produce audio through a stereo sound system (e.g.
|
|
two speakers, a pair of headphones), the left-right distribution of audio
|
|
signals can precisely match the authored values for the ‘<a
|
|
href="#voice-balance"><code
|
|
class=property>voice-balance</code></a>’ property.
|
|
|
|
<li> When user-agents are capable of mixing audio signals through more
than two channels (e.g. a five-speaker surround sound system, including a
dedicated center channel), the physical distribution of audio signals
resulting from the application of the ‘<a
href="#voice-balance"><code
class=property>voice-balance</code></a>’ property should be
performed so that the listener perceives sound as if it were coming from a
basic stereo layout. For example, the center channel as well as the
left/right speakers may be used together in order to emulate the
behavior of the ‘<code class=property>center</code>’ value.
|
|
</ul>
|
|
|
|
<p> Future revisions of the CSS Speech module may include support for
|
|
three-dimensional audio, which would effectively enable authors to specify
|
|
"azimuth" and "elevation" values. In the future, content authored using
|
|
the current specification may therefore be consumed by user-agents which
|
|
are compliant with the version of CSS Speech that supports
|
|
three-dimensional audio. In order to prepare for this possibility, the
|
|
values enabled by the current ‘<a href="#voice-balance"><code
|
|
class=property>voice-balance</code></a>’ property are designed to
|
|
remain compatible with "azimuth" angles. More precisely, the mapping
|
|
between the current left-right audio axis (lateral sound stage) and the
envisioned 360-degree plane around the listener's position is defined as
follows:
|
|
|
|
<ul>
|
|
<li>The value ‘<code class=css>0</code>’ maps to zero degrees
(‘<code class=property>center</code>’). This is in "front" of
the listener, not "behind".
|
|
|
|
<li>The value ‘<code class=css>-100</code>’ maps to -40
|
|
degrees (‘<code class=property>left</code>’). Negative angles
|
|
are in the counter-clockwise direction (the audio stage is seen from the
|
|
top).
|
|
|
|
<li>The value ‘<code class=css>100</code>’ maps to 40 degrees
|
|
(‘<code class=property>right</code>’). Positive angles are in
|
|
the clockwise direction (the audio stage is seen from the top).
|
|
|
|
<li>Intermediary values on the scale from ‘<code
|
|
class=css>-100</code>’ to ‘<code class=css>100</code>’
|
|
map to the angles between -40 and 40 degrees in a numerically
|
|
linearly-proportional manner. For example, ‘<code
|
|
class=css>-50</code>’ maps to -20 degrees.
|
|
</ul>
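<div class=example>
<p> The mapping above amounts to multiplying the numeric value by 0.4. The following minimal Python sketch uses a hypothetical <code>balance_to_azimuth</code> helper. Keywords are translated using the equivalences listed above. Numbers outside the ‘-100’ to ‘100’ range are simply rejected; this is an assumption made for illustration, not a rule stated in this section.</p>

```python
# Keyword equivalents taken from the mapping list above.
KEYWORD_BALANCE = {"left": -100.0, "center": 0.0, "right": 100.0}


def balance_to_azimuth(balance) -> float:
    """Map a voice-balance value to an azimuth angle in degrees.

    Negative angles are counter-clockwise (towards the left) and positive
    angles are clockwise (towards the right), seen from above the listener.
    """
    if isinstance(balance, str):
        balance = KEYWORD_BALANCE[balance]
    # Assumption for this sketch: out-of-range numbers are treated as errors.
    if min(max(balance, -100.0), 100.0) != balance:
        raise ValueError(f"voice-balance out of range: {balance}")
    # Linear proportion: -100 -> -40 degrees, 100 -> 40 degrees.
    return balance * 40.0 / 100.0
```

<p> For instance, ‘-50’ yields -20.0 degrees and ‘right’ yields 40.0 degrees, which matches the list above.</p>
</div>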
|
|
|
|
<p class=note> Note that users may configure sound systems in ways that
interfere with the left-right audio distribution
specified by document authors. Typically, the various "surround" modes
available in modern sound systems (including systems based on basic stereo
speakers) tend to greatly alter the perceived spatial arrangement of audio
signals. The illusion of a three-dimensional sound stage is often achieved
using a combination of phase shifting, digital delay, volume control
(channel mixing), and other techniques. Some users may even configure
their system to "downgrade" any rendered sound to a single mono channel,
in which case the effect of the ‘<a href="#voice-balance"><code
class=property>voice-balance</code></a>’ property would not be
perceptible at all. The rendering fidelity of authored content is
therefore dependent on such user customizations, and the ‘<a
href="#voice-balance"><code class=property>voice-balance</code></a>’
property merely specifies the desired end-result.
|
|
|
|
<p class=note> Note that many speech synthesizers only generate mono sound,
and therefore do not intrinsically support the ‘<a
href="#voice-balance"><code class=property>voice-balance</code></a>’
property. The sound distribution along the left-right axis consequently
occurs at the post-synthesis stage (when the speech-enabled user-agent
mixes the various audio sources authored within the document).
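<div class=example>
<p> As an illustration of such post-synthesis mixing, the following Python sketch pans a mono sample buffer into stereo. It assumes a constant-power (cosine/sine) pan law, which this specification does not mandate. The function names are hypothetical, and a user agent may use any distribution that produces the intended perceived position.</p>

```python
import math


def pan_gains(balance: float) -> tuple:
    """Constant-power left/right gains for a numeric voice-balance value.

    Assumes balance is a number from -100 (full left) to 100 (full right).
    """
    position = (balance + 100.0) / 200.0  # 0.0 = full left, 1.0 = full right
    angle = position * math.pi / 2.0
    return math.cos(angle), math.sin(angle)


def pan_mono(samples: list, balance: float) -> list:
    """Turn mono samples into (left, right) pairs using pan_gains()."""
    left_gain, right_gain = pan_gains(balance)
    return [(s * left_gain, s * right_gain) for s in samples]
```

<p> With this particular pan law, a ‘center’ value sends about 0.707 of the signal to each channel. The summed power therefore stays constant as a voice moves across the stage.</p>
</div>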
|
|
|
|
<h2 id=speaking-props><span class=secno>6. </span>Speaking properties</h2>
|
|
|
|
<h3 id=speaking-props-speak><span class=secno>6.1. </span>The ‘<a
|
|
href="#speak"><code class=property>speak</code></a>’ property</h3>
|
|
|
|
<table class=propdef summary="name: syntax">
|
|
<tbody>
|
|
<tr>
|
|
<td>Name:
|
|
|
|
<td> <dfn id=speak>speak</dfn>
|
|
|
|
<tr>
|
|
<td> <em>Value:</em>
|
|
|
|
<td>auto | none | normal
|
|
|
|
<tr>
|
|
<td> <em>Initial:</em>
|
|
|
|
<td>auto
|
|
|
|
<tr>
|
|
<td> <em>Applies to:</em>
|
|
|
|
<td>all elements
|
|
|
|
<tr>
|
|
<td> <em>Inherited:</em>
|
|
|
|
<td>yes
|
|
|
|
<tr>
|
|
<td> <em>Percentages:</em>
|
|
|
|
<td>N/A
|
|
|
|
<tr>
|
|
<td> <em>Media:</em>
|
|
|
|
<td>speech
|
|
|
|
<tr>
|
|
<td> <em>Computed value:</em>
|
|
|
|
<td>specified value
|
|
</table>
|
|
|
|
<p>The ‘<a href="#speak"><code class=property>speak</code></a>’
|
|
property determines whether or not to render text aurally.
|
|
|
|
<p class=note> Note that the functionality provided by this property has no
|
|
equivalent in the SSML markup language <a href="#SSML"
|
|
rel=biblioentry>[SSML]<!--{{!SSML}}--></a>.
|
|
|
|
<dl>
|
|
<dt> <strong>auto</strong>
|
|
|
|
<dd>
|
|
<p>Resolves to a computed value of ‘<code
|
|
class=property>none</code>’ when <a
|
|
href="#display-def">‘<code
|
|
class=property>display</code>’</a> is ‘<code
|
|
class=property>none</code>’, otherwise resolves to a computed
|
|
value of ‘<code class=property>auto</code>’ which yields a
|
|
used value of ‘<code class=property>normal</code>’.</p>
|
|
|
|
<p class=note> Note that the ‘<code
|
|
class=property>none</code>’ value of the <a
|
|
href="#display-def">‘<code
|
|
class=property>display</code>’</a> property cannot be overridden
|
|
by descendants of the selected element, but the ‘<code
|
|
class=property>auto</code>’ value of ‘<a href="#speak"><code
|
|
class=property>speak</code></a>’ can however be overridden using
|
|
either of ‘<code class=property>none</code>’ or ‘<code
|
|
class=property>normal</code>’.</p>
|
|
|
|
<dt> <strong>none</strong>
|
|
|
|
<dd>
|
|
<p> This value causes an element (including pauses, cues, rests and
|
|
actual content) to not be rendered (i.e., the element has no effect in
|
|
the aural dimension).</p>
|
|
|
|
<p class=note> Note that any of the descendants of the affected element
|
|
are allowed to override this value, so descendants can actually take
|
|
part in the aural rendering despite using ‘<code
|
|
class=property>none</code>’ at this level. However, the pauses,
|
|
cues, and rests of the ancestor element remain "deactivated" in the
|
|
aural dimension, and therefore do not contribute to the <a
|
|
href="#collapsing">collapsing of pauses</a> or additive behavior of
|
|
adjoining rests.</p>
|
|
|
|
<dt> <strong>normal</strong>
|
|
|
|
<dd>
|
|
<p> The element is rendered aurally (regardless of its <a
|
|
href="#display-def">‘<code
|
|
class=property>display</code>’</a> value and the <a
|
|
href="#display-def">‘<code
|
|
class=property>display</code>’</a> and ‘<a
|
|
href="#speak"><code class=property>speak</code></a>’ values of its
|
|
ancestors).</p>
|
|
|
|
<p class=note> Note that using this value can result in the element being
|
|
rendered in the aural dimension even though it would not be rendered on
|
|
the visual canvas.</p>
|
|
</dl>
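<div class=example>
<p> The interaction between ‘<code class=property>speak</code>’ and ‘<code class=property>display</code>’ described above can be summarized as a small computation. In this Python sketch, the function names are hypothetical, and <code>None</code> is an assumed stand-in for "no declared value". An undeclared value inherits the parent's computed value, since ‘<code class=property>speak</code>’ is inherited.</p>

```python
from typing import Optional


def computed_speak(specified: Optional[str], display: str,
                   parent_computed: str = "auto") -> str:
    """Return the computed 'speak' value of an element.

    specified is None when the element has no declared value, in which
    case the parent's computed value is inherited ('speak' is inherited).
    """
    value = parent_computed if specified is None else specified
    # 'auto' resolves to 'none' when the element has display: none.
    if value == "auto" and display == "none":
        return "none"
    return value


def used_speak(computed: str) -> str:
    """A computed value of 'auto' yields a used value of 'normal'."""
    return "normal" if computed == "auto" else computed
```

<p> In this model, a child of a ‘display: none’ element inherits a computed value of ‘none’. It stays silent unless it declares ‘speak: normal’, as described in the notes above.</p>
</div>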
|
|
|
|
<h3 id=speaking-props-speak-as><span class=secno>6.2. </span>The ‘<a
|
|
href="#speak-as"><code class=property>speak-as</code></a>’ property</h3>
|
|
|
|
<table class=propdef summary="name: syntax">
|
|
<tbody>
|
|
<tr>
|
|
<td>Name:
|
|
|
|
<td> <dfn id=speak-as>speak-as</dfn>
|
|
|
|
<tr>
|
|
<td> <em>Value:</em>
|
|
|
|
<td>normal | spell-out || digits || [ literal-punctuation |
|
|
no-punctuation ]
|
|
|
|
<tr>
|
|
<td> <em>Initial:</em>
|
|
|
|
<td>normal
|
|
|
|
<tr>
|
|
<td> <em>Applies to:</em>
|
|
|
|
<td>all elements
|
|
|
|
<tr>
|
|
<td> <em>Inherited:</em>
|
|
|
|
<td>yes
|
|
|
|
<tr>
|
|
<td> <em>Percentages:</em>
|
|
|
|
<td>N/A
|
|
|
|
<tr>
|
|
<td> <em>Media:</em>
|
|
|
|
<td>speech
|
|
|
|
<tr>
|
|
<td> <em>Computed value:</em>
|
|
|
|
<td>specified value
|
|
</table>
|
|
|
|
<p>The ‘<a href="#speak-as"><code
class=property>speak-as</code></a>’ property determines how text
is rendered aurally, based on a predefined list of possible values.
|
|
|
|
<p class=note> Note that the functionality provided by this property is
|
|
related to the <a
|
|
href="http://www.w3.org/TR/speech-synthesis11/#edef_say-as"><code>say-as</code>
|
|
element</a> from the SSML markup language <a href="#SSML"
|
|
rel=biblioentry>[SSML]<!--{{!SSML}}--></a>, whose values are described in
|
|
the <a href="#SSML-SAYAS"
|
|
rel=biblioentry>[SSML-SAYAS]<!--{{SSML-SAYAS}}--></a> W3C Note.
|
|
|
|
<dl>
|
|
<dt> <strong>normal</strong>
|
|
|
|
<dd>
|
|
<p>Uses language-dependent pronunciation rules for rendering the
|
|
element's content. For example, punctuation is not spoken as-is, but
|
|
instead rendered naturally as appropriate pauses.</p>
|
|
|
|
<dt> <strong>spell-out</strong>
|
|
|
|
<dd>
|
|
<p>Spells the text one letter at a time (useful for acronyms and
abbreviations). In languages where accented characters are rare, it is
permitted to drop accents in favor of alternative unaccented spellings.
As an example, in English, the word "rôle" can also be written as
"role". A conforming implementation would thus be able to spell out
"rôle" as "R O L E".</p>
|
|
|
|
<dt> <strong>digits</strong>
|
|
|
|
<dd>
|
|
<p>Speaks numbers one digit at a time; for instance, "12" would be
spoken as "one two", and "31" as "three one".</p>
|
|
|
|
<p class=note>Speech synthesizers are generally able to determine what is
and what is not a number. The ‘<a href="#speak-as"><code
class=property>speak-as</code></a>’ property enables authors to
control how the user-agent renders numbers, and may be implemented as a
preprocessing step before passing the text to the actual speech
synthesizer.</p>
|
|
|
|
<dt> <strong>literal-punctuation</strong>
|
|
|
|
<dd>
|
|
<p> Punctuation such as semicolons, braces, and so on is named aloud
|
|
(i.e. spoken literally) rather than rendered naturally as appropriate
|
|
pauses.</p>
|
|
|
|
<dt> <strong>no-punctuation</strong>
|
|
|
|
<dd>
|
|
<p>Punctuation is not rendered: neither spoken nor rendered as pauses.</p>
|
|
</dl>
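<div class=example>
<p> The note under ‘<code class=css>digits</code>’ observes that ‘<code class=property>speak-as</code>’ may be implemented as a preprocessing step. The following Python sketch shows one possible shape for such a step, and it is not normative. The punctuation-name table and all function names are hypothetical. Punctuation is detected with Unicode general categories, and the output is plain text rather than SSML.</p>

```python
import re
import unicodedata

# Hypothetical spoken names for a few punctuation marks; a real user agent
# would obtain language-dependent names from its speech processor.
PUNCTUATION_NAMES = {";": "semicolon", ",": "comma", ".": "period",
                     "{": "left brace", "}": "right brace"}


def strip_accents(text: str) -> str:
    """Drop combining marks, so that "rôle" becomes "role"."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def is_punctuation(token: str) -> bool:
    """Unicode general categories starting with "P" are punctuation."""
    return unicodedata.category(token[0]).startswith("P")


def preprocess(text: str, keywords: set) -> str:
    """Rewrite text for the synthesizer according to speak-as keywords."""
    spoken = []
    # Tokens are runs of word characters or single non-space symbols.
    for token in re.findall(r"\w+|[^\w\s]", text):
        if is_punctuation(token):
            if "literal-punctuation" in keywords:
                # Unknown marks fall back to the character itself.
                spoken.append(PUNCTUATION_NAMES.get(token, token))
            elif "no-punctuation" not in keywords:
                spoken.append(token)  # left for the synthesizer to render as a pause
            continue
        # Separate digit runs from letters so each can be handled on its own.
        for part in re.findall(r"\d+|\D+", token):
            if part.isdecimal():
                spoken.append(" ".join(part) if "digits" in keywords else part)
            elif "spell-out" in keywords:
                spoken.append(" ".join(strip_accents(part).upper()))
            else:
                spoken.append(part)
    return " ".join(spoken)
```

<p> For example, <code>preprocess("31", {"digits"})</code> produces "3 1", and <code>preprocess("rôle", {"spell-out"})</code> produces "R O L E".</p>
</div>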
|
|
|
|
<h2 id=pause-props><span class=secno>7. </span>Pause properties</h2>
|
|
|
|
<h3 id=pause-props-pause-before-after><span class=secno>7.1. </span>The
|
|
‘<a href="#pause-before"><code
|
|
class=property>pause-before</code></a>’ and ‘<a
|
|
href="#pause-after"><code class=property>pause-after</code></a>’
|
|
properties</h3>
|
|
|
|
<table class=propdef summary="name: syntax">
|
|
<tbody>
|
|
<tr>
|
|
<td>Name:
|
|
|
|
<td> <dfn id=pause-before>pause-before</dfn>
|
|
|
|
<tr>
|
|
<td> <em>Value:</em>
|
|
|
|
<td><time> | none | x-weak | weak | medium | strong | x-strong
|
|
|
|
<tr>
|
|
<td> <em>Initial:</em>
|
|
|
|
<td>none
|
|
|
|
<tr>
|
|
<td> <em>Applies to:</em>
|
|
|
|
<td>all elements
|
|
|
|
<tr>
|
|
<td> <em>Inherited:</em>
|
|
|
|
<td>no
|
|
|
|
<tr>
|
|
<td> <em>Percentages:</em>
|
|
|
|
<td>N/A
|
|
|
|
<tr>
|
|
<td> <em>Media:</em>
|
|
|
|
<td>speech
|
|
|
|
<tr>
|
|
<td> <em>Computed value:</em>
|
|
|
|
<td>specified value
|
|
</table>
|
|
|
|
<p>
|
|
|
|
<table class=propdef summary="name: syntax">
|
|
<tbody>
|
|
<tr>
|
|
<td>Name:
|
|
|
|
<td> <dfn id=pause-after>pause-after</dfn>
|
|
|
|
<tr>
|
|
<td> <em>Value:</em>
|
|
|
|
<td><time> | none | x-weak | weak | medium | strong | x-strong
|
|
|
|
<tr>
|
|
<td> <em>Initial:</em>
|
|
|
|
<td>none
|
|
|
|
<tr>
|
|
<td> <em>Applies to:</em>
|
|
|
|
<td>all elements
|
|
|
|
<tr>
|
|
<td> <em>Inherited:</em>
|
|
|
|
<td>no
|
|
|
|
<tr>
|
|
<td> <em>Percentages:</em>
|
|
|
|
<td>N/A
|
|
|
|
<tr>
|
|
<td> <em>Media:</em>
|
|
|
|
<td>speech
|
|
|
|
<tr>
|
|
<td> <em>Computed value:</em>
|
|
|
|
<td>specified value
|
|
</table>
|
|
|
|
<p>The ‘<a href="#pause-before"><code
class=property>pause-before</code></a>’ and ‘<a
href="#pause-after"><code class=property>pause-after</code></a>’
properties specify a prosodic boundary (silence with a specific duration)
that occurs before (or after) the speech synthesis rendition of the
selected element or, if any ‘<a href="#cue-before"><code
class=property>cue-before</code></a>’ (or ‘<a
href="#cue-after"><code class=property>cue-after</code></a>’) is
specified, before (or after) the cue within the <a
href="#aural-model">audio "box" model</a>.
|
|
|
|
<p class=note> Note that the functionality provided by these properties is
|
|
related to the <a
|
|
href="http://www.w3.org/TR/speech-synthesis11/#edef_break"><code>break</code>
|
|
element</a> from the SSML markup language <a href="#SSML"
|
|
rel=biblioentry>[SSML]<!--{{!SSML}}--></a>.
|
|
|
|
<dl>
|
|
<dt> <strong><time></strong>
|
|
|
|
<dd>
|
|
<p>Expresses the pause in absolute <a href="#time-def">time</a> units
|
|
(seconds and milliseconds, e.g. "+3s", "250ms"). Only non-negative
|
|
values are allowed.</p>
|
|
|
|
<dt> <strong>none</strong>
|
|
|
|
<dd>
|
|
<p> Equivalent to 0ms (no prosodic break is produced by the speech
|
|
processor).</p>
|
|
|
|
<dt> <strong>x-weak</strong>, <strong>weak</strong>,
|
|
<strong>medium</strong>, <strong>strong</strong>, and
|
|
<strong>x-strong</strong>
|
|
|
|
<dd>
|
|
<p> Expresses the pause by the strength of the prosodic break in speech
|
|
output. The exact time is implementation-dependent. The values indicate
|
|
monotonically non-decreasing (conceptually increasing) break strength
|
|
between elements.</p>
|
|
</dl>
|
|
|
|
<p class=note> Note that stronger content boundaries are typically
|
|
accompanied by pauses. For example, the breaks between paragraphs are
|
|
typically much more substantial than the breaks between words within a
|
|
sentence.
|
|
|
|
<div class=example>
|
|
<p> This example illustrates how the default strengths of prosodic breaks
|
|
for specific elements (which are defined by the user-agent stylesheet)
|
|
can be overridden by authored styles.</p>
|
|
|
|
<pre>
p { pause: none } /* pause-before: none; pause-after: none */</pre>
|
|
</div>
|
|
|
|
<h3 id=pause-props-pause><span class=secno>7.2. </span>The ‘<a
|
|
href="#pause"><code class=property>pause</code></a>’ shorthand
|
|
property</h3>
|
|
|
|
<table class=propdef summary="name: syntax">
|
|
<tbody>
|
|
<tr>
|
|
<td>Name:
|
|
|
|
<td> <dfn id=pause>pause</dfn>
|
|
|
|
<tr>
|
|
<td> <em>Value:</em>
|
|
|
|
<td><‘<a href="#pause-before"><code
|
|
class=property>pause-before</code></a>’> <‘<a
|
|
href="#pause-after"><code
|
|
class=property>pause-after</code></a>’>?
|
|
|
|
<tr>
|
|
<td> <em>Initial:</em>
|
|
|
|
<td>N/A (see individual properties)
|
|
|
|
<tr>
|
|
<td> <em>Applies to:</em>
|
|
|
|
<td>all elements
|
|
|
|
<tr>
|
|
<td> <em>Inherited:</em>
|
|
|
|
<td>no
|
|
|
|
<tr>
|
|
<td> <em>Percentages:</em>
|
|
|
|
<td>N/A
|
|
|
|
<tr>
|
|
<td> <em>Media:</em>
|
|
|
|
<td>speech
|
|
|
|
<tr>
|
|
<td> <em>Computed value:</em>
|
|
|
|
<td>N/A (see individual properties)
|
|
</table>
|
|
|
|
<p>The ‘<a href="#pause"><code class=property>pause</code></a>’
|
|
property is a shorthand property for ‘<a href="#pause-before"><code
|
|
class=property>pause-before</code></a>’ and ‘<a
|
|
href="#pause-after"><code class=property>pause-after</code></a>’. If
|
|
two values are given, the first value is ‘<a
|
|
href="#pause-before"><code class=property>pause-before</code></a>’
|
|
and the second is ‘<a href="#pause-after"><code
|
|
class=property>pause-after</code></a>’. If only one value is given,
|
|
it applies to both properties.
|
|
|
|
<div class=example>
|
|
<p> Examples of property values:</p>
|
|
|
|
<pre>
h1 { pause: 20ms; } /* pause-before: 20ms; pause-after: 20ms */
h2 { pause: 30ms 40ms; } /* pause-before: 30ms; pause-after: 40ms */
h3 { pause-after: 10ms; } /* pause-before: <i>unspecified</i>; pause-after: 10ms */</pre>
|
|
</div>
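<div class=example>
<p> The one-or-two-value rule for ‘<code class=property>pause</code>’ (which ‘<code class=property>rest</code>’ shares) can be expressed as a small helper. This Python sketch uses a hypothetical <code>expand_pair</code> function and does not validate the individual values.</p>

```python
def expand_pair(shorthand: str, value: str) -> dict:
    """Expand a 'pause' or 'rest' shorthand value into its two longhands."""
    parts = value.split()
    if len(parts) not in (1, 2):
        raise ValueError(f"{shorthand} takes one or two values, got {value!r}")
    # With a single value, parts[0] and parts[-1] are the same token.
    return {f"{shorthand}-before": parts[0], f"{shorthand}-after": parts[-1]}
```
</div>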
|
|
|
|
<h3 id=collapsing><span class=secno>7.3. </span>Collapsing pauses</h3>
|
|
|
|
<p> The pause defines the minimum distance between the aural "box" and the
aural "boxes" before and after it. Adjoining pauses are merged by selecting
the strongest named break and the longest absolute time interval. For
example, "strong" is selected when comparing "strong" and "weak", "1s" is
selected when comparing "1s" and "250ms", and "strong" and "250ms" take
effect additively when comparing "strong" and "250ms".
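<div class=example>
<p> The merging rule can be expressed as a fold over adjoining pauses. This Python sketch keeps the named strength and the absolute time separately, because the duration of a named break is implementation-dependent and cannot be converted into milliseconds here. The <code>Pause</code> type and the function names are assumptions made for illustration.</p>

```python
from dataclasses import dataclass
from typing import Optional

# Named break strengths in increasing order.
STRENGTHS = ("x-weak", "weak", "medium", "strong", "x-strong")


@dataclass(frozen=True)
class Pause:
    strength: Optional[str] = None  # strongest named break, if any
    time_ms: float = 0.0            # longest absolute interval


def parse_pause(value: str) -> Pause:
    """Parse a single pause-before / pause-after value."""
    if value == "none":
        return Pause()  # equivalent to 0ms
    if value in STRENGTHS:
        return Pause(strength=value)
    if value.startswith("-"):
        raise ValueError("negative pause durations are not allowed")
    if value.endswith("ms"):
        return Pause(time_ms=float(value[:-2]))
    if value.endswith("s"):
        return Pause(time_ms=float(value[:-1]) * 1000.0)
    raise ValueError(f"unrecognized pause value: {value!r}")


def collapse(*pauses: Pause) -> Pause:
    """Merge adjoining pauses: strongest named break, longest time.

    The result keeps both parts; a renderer applies them additively.
    """
    named = [p.strength for p in pauses if p.strength is not None]
    strongest = max(named, key=STRENGTHS.index) if named else None
    return Pause(strongest, max(p.time_ms for p in pauses))
```

<p> Collapsing "strong" with "250ms" gives <code>Pause("strong", 250.0)</code>, which represents the additive combination described above.</p>
</div>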
|
|
|
|
<p>The following pauses are adjoining:
|
|
|
|
<ol>
|
|
<li>The ‘<a href="#pause-after"><code
|
|
class=property>pause-after</code></a>’ of an aural "box" and the
|
|
‘<a href="#pause-after"><code
|
|
class=property>pause-after</code></a>’ of its last child, provided
|
|
the former has no ‘<a href="#rest-after"><code
|
|
class=property>rest-after</code></a>’ and no ‘<a
|
|
href="#cue-after"><code class=property>cue-after</code></a>’.
|
|
|
|
<li>The ‘<a href="#pause-before"><code
|
|
class=property>pause-before</code></a>’ of an aural "box" and the
|
|
‘<a href="#pause-before"><code
|
|
class=property>pause-before</code></a>’ of its first child,
|
|
provided the former has no ‘<a href="#rest-before"><code
|
|
class=property>rest-before</code></a>’ and no ‘<a
|
|
href="#cue-before"><code class=property>cue-before</code></a>’.
|
|
|
|
<li>The ‘<a href="#pause-after"><code
|
|
class=property>pause-after</code></a>’ of an aural "box" and the
|
|
‘<a href="#pause-before"><code
|
|
class=property>pause-before</code></a>’ of its next sibling.
|
|
|
|
<li>The ‘<a href="#pause-before"><code
class=property>pause-before</code></a>’ and ‘<a
href="#pause-after"><code class=property>pause-after</code></a>’ of
an aural "box", if the "box" has a ‘<a
href="#voice-duration"><code
class=property>voice-duration</code></a>’ of "0ms" and no ‘<a
href="#rest-before"><code class=property>rest-before</code></a>’ or
‘<a href="#rest-after"><code
class=property>rest-after</code></a>’ and no ‘<a
href="#cue-before"><code class=property>cue-before</code></a>’ or
‘<a href="#cue-after"><code
class=property>cue-after</code></a>’, or if the "box" has no
rendered content at all (see ‘<a href="#speak"><code
class=property>speak</code></a>’).
|
|
</ol>
|
|
|
|
<p>A collapsed pause is considered adjoining to another pause if any of its
|
|
component pauses is adjoining to that pause.
|
|
|
|
<p class=note> Note that ‘<a href="#pause"><code
|
|
class=property>pause</code></a>’ has been moved from between the
|
|
element's contents and any ‘<a href="#cue"><code
|
|
class=property>cue</code></a>’ to outside the ‘<a
|
|
href="#cue"><code class=property>cue</code></a>’. This is not
|
|
backwards compatible with the informative CSS2.1 Aural appendix <a
|
|
href="#CSS21" rel=biblioentry>[CSS21]<!--{{!CSS21}}--></a>.
|
|
|
|
<h2 id=rest-props><span class=secno>8. </span>Rest properties</h2>
|
|
|
|
<h3 id=rest-props-rest-before-after><span class=secno>8.1. </span>The
|
|
‘<a href="#rest-before"><code
|
|
class=property>rest-before</code></a>’ and ‘<a
|
|
href="#rest-after"><code class=property>rest-after</code></a>’
|
|
properties</h3>
|
|
|
|
<table class=propdef summary="name: syntax">
|
|
<tbody>
|
|
<tr>
|
|
<td>Name:
|
|
|
|
<td> <dfn id=rest-before>rest-before</dfn>
|
|
|
|
<tr>
|
|
<td> <em>Value:</em>
|
|
|
|
<td><time> | none | x-weak | weak | medium | strong | x-strong
|
|
|
|
<tr>
|
|
<td> <em>Initial:</em>
|
|
|
|
<td>none
|
|
|
|
<tr>
|
|
<td> <em>Applies to:</em>
|
|
|
|
<td>all elements
|
|
|
|
<tr>
|
|
<td> <em>Inherited:</em>
|
|
|
|
<td>no
|
|
|
|
<tr>
|
|
<td> <em>Percentages:</em>
|
|
|
|
<td>N/A
|
|
|
|
<tr>
|
|
<td> <em>Media:</em>
|
|
|
|
<td>speech
|
|
|
|
<tr>
|
|
<td> <em>Computed value:</em>
|
|
|
|
<td>specified value
|
|
</table>
|
|
|
|
<p>
|
|
|
|
<table class=propdef summary="name: syntax">
|
|
<tbody>
|
|
<tr>
|
|
<td>Name:
|
|
|
|
<td> <dfn id=rest-after>rest-after</dfn>
|
|
|
|
<tr>
|
|
<td> <em>Value:</em>
|
|
|
|
<td><time> | none | x-weak | weak | medium | strong | x-strong
|
|
|
|
<tr>
|
|
<td> <em>Initial:</em>
|
|
|
|
<td>none
|
|
|
|
<tr>
|
|
<td> <em>Applies to:</em>
|
|
|
|
<td>all elements
|
|
|
|
<tr>
|
|
<td> <em>Inherited:</em>
|
|
|
|
<td>no
|
|
|
|
<tr>
|
|
<td> <em>Percentages:</em>
|
|
|
|
<td>N/A
|
|
|
|
<tr>
|
|
<td> <em>Media:</em>
|
|
|
|
<td>speech
|
|
|
|
<tr>
|
|
<td> <em>Computed value:</em>
|
|
|
|
<td>specified value
|
|
</table>
|
|
|
|
<p>The ‘<a href="#rest-before"><code
|
|
class=property>rest-before</code></a>’ and ‘<a
|
|
href="#rest-after"><code class=property>rest-after</code></a>’
|
|
properties specify a prosodic boundary (silence with a specific duration)
|
|
that occurs before (or after) the speech synthesis rendition of an element
|
|
within the <a href="#aural-model">audio "box" model</a>.
|
|
|
|
<p class=note> Note that the functionality provided by these properties is
|
|
related to the <a
|
|
href="http://www.w3.org/TR/speech-synthesis11/#edef_break"><code>break</code>
|
|
element</a> from the SSML markup language <a href="#SSML"
|
|
rel=biblioentry>[SSML]<!--{{!SSML}}--></a>.
|
|
|
|
<dl>
|
|
<dt> <strong><time></strong>
|
|
|
|
<dd>
|
|
<p>Expresses the rest in absolute <a href="#time-def">time</a> units
|
|
(seconds and milliseconds, e.g. "+3s", "250ms"). Only non-negative
|
|
values are allowed.</p>
|
|
|
|
<dt> <strong>none</strong>
|
|
|
|
<dd>
|
|
<p> Equivalent to 0ms (no prosodic break is produced by the speech
|
|
processor).</p>
|
|
|
|
<dt> <strong>x-weak</strong>, <strong>weak</strong>,
|
|
<strong>medium</strong>, <strong>strong</strong>, and
|
|
<strong>x-strong</strong>
|
|
|
|
<dd>
|
|
<p> Expresses the rest by the strength of the prosodic break in speech
|
|
output. The exact time is implementation-dependent. The values indicate
|
|
monotonically non-decreasing (conceptually increasing) break strength
|
|
between elements.</p>
|
|
</dl>
|
|
|
|
<p>Unlike <a href="#pause-props">pause properties</a>, the rest is
inserted between the element's content and any ‘<a
href="#cue-before"><code class=property>cue-before</code></a>’ or
‘<a href="#cue-after"><code
class=property>cue-after</code></a>’ content. Adjoining rests are
treated additively and do not collapse.
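<div class=example>
<p> The following Python sketch shows where rests sit in the audio "box" model and how adjoining rests combine. It uses hypothetical <code>(kind, value)</code> segment tuples and expresses rests in milliseconds only. Pauses are left out because they follow the separate collapsing rules for pauses.</p>

```python
def aural_box(content: list, cue_before=None, cue_after=None,
              rest_before_ms=0, rest_after_ms=0) -> list:
    """Lay out one element's cues, rests and content (pauses omitted).

    content is a list of (kind, value) segments, e.g. from child boxes.
    """
    segments = []
    if cue_before:
        segments.append(("cue", cue_before))
    if rest_before_ms:
        segments.append(("rest", rest_before_ms))  # between cue and content
    segments.extend(content)
    if rest_after_ms:
        segments.append(("rest", rest_after_ms))  # between content and cue
    if cue_after:
        segments.append(("cue", cue_after))
    return segments


def merge_rests(segments: list) -> list:
    """Adjoining rests add up; unlike pauses, they never collapse."""
    merged = []
    for kind, value in segments:
        if kind == "rest" and merged and merged[-1][0] == "rest":
            merged[-1] = ("rest", merged[-1][1] + value)
        else:
            merged.append((kind, value))
    return merged
```

<p> In this sketch, a parent with a 250ms rest-before wrapping a child with a 100ms rest-before produces a single 350ms rest after the parent's cue.</p>
</div>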
|
|
|
|
<h3 id=rest-props-rest><span class=secno>8.2. </span>The ‘<a
|
|
href="#rest"><code class=property>rest</code></a>’ shorthand
|
|
property</h3>
|
|
|
|
<table class=propdef summary="name: syntax">
|
|
<tbody>
|
|
<tr>
|
|
<td>Name:
|
|
|
|
<td> <dfn id=rest>rest</dfn>
|
|
|
|
<tr>
|
|
<td> <em>Value:</em>
|
|
|
|
<td><‘<a href="#rest-before"><code
|
|
class=property>rest-before</code></a>’> <‘<a
|
|
href="#rest-after"><code
|
|
class=property>rest-after</code></a>’>?
|
|
|
|
<tr>
|
|
<td> <em>Initial:</em>
|
|
|
|
<td>N/A (see individual properties)
|
|
|
|
<tr>
|
|
<td> <em>Applies to:</em>
|
|
|
|
<td>all elements
|
|
|
|
<tr>
|
|
<td> <em>Inherited:</em>
|
|
|
|
<td>no
|
|
|
|
<tr>
|
|
<td> <em>Percentages:</em>
|
|
|
|
<td>N/A
|
|
|
|
<tr>
|
|
<td> <em>Media:</em>
|
|
|
|
<td>speech
|
|
|
|
<tr>
|
|
<td> <em>Computed value:</em>
|
|
|
|
<td>N/A (see individual properties)
|
|
</table>
|
|
|
|
<p>The ‘<a href="#rest"><code class=property>rest</code></a>’
|
|
property is a shorthand for ‘<a href="#rest-before"><code
|
|
class=property>rest-before</code></a>’ and ‘<a
|
|
href="#rest-after"><code class=property>rest-after</code></a>’. If
|
|
two values are given, the first value is ‘<a
|
|
href="#rest-before"><code class=property>rest-before</code></a>’ and
|
|
the second is ‘<a href="#rest-after"><code
|
|
class=property>rest-after</code></a>’. If only one value is given,
|
|
it applies to both properties.
|
|
|
|
<h2 id=cue-props><span class=secno>9. </span>Cue properties</h2>
|
|
|
|
<h3 id=cue-props-cue-before-after><span class=secno>9.1. </span>The
|
|
‘<a href="#cue-before"><code
|
|
class=property>cue-before</code></a>’ and ‘<a
|
|
href="#cue-after"><code class=property>cue-after</code></a>’
|
|
properties</h3>
|
|
|
|
<table class=propdef summary="name: syntax">
|
|
<tbody>
|
|
<tr>
|
|
<td>Name:
|
|
|
|
<td> <dfn id=cue-before>cue-before</dfn>
|
|
|
|
<tr>
|
|
<td> <em>Value:</em>
|
|
|
|
<td><uri> <decibel>? | none
|
|
|
|
<tr>
|
|
<td> <em>Initial:</em>
|
|
|
|
<td>none
|
|
|
|
<tr>
|
|
<td> <em>Applies to:</em>
|
|
|
|
<td>all elements
|
|
|
|
<tr>
|
|
<td> <em>Inherited:</em>
|
|
|
|
<td>no
|
|
|
|
<tr>
|
|
<td> <em>Percentages:</em>
|
|
|
|
<td>N/A
|
|
|
|
<tr>
|
|
<td> <em>Media:</em>
|
|
|
|
<td>speech
|
|
|
|
<tr>
|
|
<td> <em>Computed value:</em>
|
|
|
|
<td>specified value
|
|
</table>
|
|
|
|
<p>
|
|
|
|
<table class=propdef summary="name: syntax">
|
|
<tbody>
|
|
<tr>
|
|
<td>Name:
|
|
|
|
<td> <dfn id=cue-after>cue-after</dfn>
|
|
|
|
<tr>
|
|
<td> <em>Value:</em>
|
|
|
|
<td><uri> <decibel>? | none
|
|
|
|
<tr>
|
|
<td> <em>Initial:</em>
|
|
|
|
<td>none
|
|
|
|
<tr>
|
|
<td> <em>Applies to:</em>
|
|
|
|
<td>all elements
|
|
|
|
<tr>
|
|
<td> <em>Inherited:</em>
|
|
|
|
<td>no
|
|
|
|
<tr>
|
|
<td> <em>Percentages:</em>
|
|
|
|
<td>N/A
|
|
|
|
<tr>
|
|
<td> <em>Media:</em>
|
|
|
|
<td>speech
|
|
|
|
<tr>
|
|
<td> <em>Computed value:</em>
|
|
|
|
<td>specified value
|
|
</table>
|
|
|
|
<p>The ‘<a href="#cue-before"><code
|
|
class=property>cue-before</code></a>’ and ‘<a
|
|
href="#cue-after"><code class=property>cue-after</code></a>’
|
|
properties specify auditory icons (i.e. pre-recorded / pre-generated sound
|
|
clips) to be played before (or after) the selected element within the <a
|
|
href="#aural-model">audio "box" model</a>.
|
|
|
|
<p class=note> Note that the functionality provided by these properties is
|
|
related to the <a
|
|
href="http://www.w3.org/TR/speech-synthesis11/#edef_audio"><code>audio</code>
|
|
element</a> from the SSML markup language <a href="#SSML"
|
|
rel=biblioentry>[SSML]<!--{{!SSML}}--></a>.
|
|
|
|
<dl>
|
|
<dt> <strong><uri></strong>
|
|
|
|
<dd>
|
|
<p>The URI designates an auditory icon resource. When a user agent is not
|
|
able to render the specified auditory icon (e.g. missing file resource,
|
|
or unsupported audio codec), it is recommended to produce an alternative
|
|
cue, such as a bell sound.</p>
|
|
|
|
<dt> <strong>none</strong>
|
|
|
|
<dd>
|
|
<p>Specifies that no auditory icon is used.</p>
|
|
|
|
<dt> <strong><decibel></strong>
|
|
|
|
<dd>
|
|
<p>A <a href="#number-def">number</a> immediately followed by "dB"
|
|
(decibel unit). This represents a change (positive or negative) relative
|
|
to the computed value of the ‘<a href="#voice-volume"><code
|
|
class=property>voice-volume</code></a>’ property within the <a
|
|
href="#aural-model">aural "box" model</a> of the selected element.
|
|
Decibels express the ratio of the squares of the new signal amplitude
(a<sub>1</sub>) and the current amplitude (a<sub>0</sub>), as per the
following logarithmic equation: <code>volume(dB) = 20
log<sub>10</sub>(a<sub>1</sub> / a<sub>0</sub>)</code></p>
|
|
|
|
<p> When the ‘<a href="#voice-volume"><code
|
|
class=property>voice-volume</code></a>’ property is set to
|
|
‘<code class=property>silent</code>’, the audio cue is also
|
|
set to ‘<code class=property>silent</code>’ (regardless of
|
|
this specified <decibel> value). Otherwise (when not ‘<code
|
|
class=property>silent</code>’), ‘<a
|
|
href="#voice-volume"><code class=property>voice-volume</code></a>’
|
|
values are always specified relative to the volume level keywords,
|
|
which map to a user-configured scale of "preferred" loudness settings
|
|
(see the definition of ‘<a href="#voice-volume"><code
|
|
class=property>voice-volume</code></a>’). If the inherited
|
|
‘<a href="#voice-volume"><code
|
|
class=property>voice-volume</code></a>’ value already contains a
|
|
decibel offset, the dB offset specific to the audio cue is combined
|
|
additively.
|
|
|
|
<p> The desired effect of an audio cue set at +0dB is that the volume
level during playback of the pre-recorded / pre-generated audio signal
is effectively the same as the loudness of live (i.e. real-time) speech
synthesis rendition. Achieving this effect relies on three conditions:
speech processors are capable of directly controlling the waveform
amplitude of generated text-to-speech audio; user agents must be able to
adjust the volume output of audio cues (i.e. amplify or attenuate audio
signals based on the intrinsic waveform amplitude of digitized sound
clips); and, last but not least, authors must ensure that the "normal"
volume level of pre-recorded audio cues (on average, as there may be
discrete loudness variations due to changes in the audio stream, such as
intonation, stress, etc.) matches that of a "typical" TTS voice output
(based on the ‘<a href="#voice-family"><code
class=property>voice-family</code></a>’ intended for use), given
standard listening conditions (i.e. default system volume levels,
centered equalization across the frequency spectrum). This last
prerequisite sets a baseline that enables a user agent to align the
volume outputs of both TTS and cue audio streams within the same aural
"box" model. Due to the complex relationship between perceived audio
characteristics and the processing applied to the digitized audio
signal, this specification simplifies the definition of "normal" volume
levels by referring to a canonical recording scenario, in which the
attenuation is typically indicated in decibels, ranging from 0dB (maximum
audio input, near clipping threshold) to -60dB (total silence). In this
common context, a "standard" audio clip would oscillate between these
values: the loudest peak levels would be close to -3dB (to avoid
distortion), and the relevant audible passages would have average (RMS)
volume levels as high as possible (i.e. not too quiet, to avoid
background noise during amplification). This would roughly provide an
audio experience that could be seamlessly combined with text-to-speech
output (i.e. there would be no discernible difference in volume levels
when switching from pre-recorded audio to speech synthesis). Although
there is no industry-wide standard to support such a convention, TTS
engines usually generate comparably loud audio signals when no gain or
attenuation is specified. For voice and soft music, an average level of
around -15dB RMS appears to be typical.</p>
|
|
|
|
<p class=note> Note that -6.0dB is approximately half the amplitude of
|
|
the audio signal, and +6.0dB is approximately twice the amplitude.</p>
|
|
|
|
<p class=note> Note that there is a difference between an audio cue whose
volume is set to ‘<code class=property>silent</code>’ and
one whose value is ‘<code class=property>none</code>’. In
the former case, the audio cue takes up the same time as if it had been
played, but no sound is generated. In the latter case, there is no
manifestation of the audio cue at all (i.e. no time is allocated for the
cue in the aural dimension).</p>
|
|
</dl>
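<div class=example>
<p> The decibel arithmetic above can be checked numerically. The following Python sketch uses hypothetical function names. It also assumes that the computed ‘<code class=property>voice-volume</code>’ is available as a keyword plus a dB offset. Under these assumptions, -6dB gives an amplitude ratio of about 0.501 and +6dB gives about 1.995, which is consistent with the "half" and "twice" approximations.</p>

```python
import math


def db_to_amplitude_ratio(db: float) -> float:
    """Invert volume(dB) = 20 log10(a1 / a0) to obtain a1 / a0."""
    return 10.0 ** (db / 20.0)


def amplitude_ratio_to_db(ratio: float) -> float:
    """volume(dB) = 20 log10(a1 / a0)."""
    return 20.0 * math.log10(ratio)


def cue_gain(voice_volume: str, voice_volume_offset_db: float,
             cue_offset_db: float) -> float:
    """Amplitude gain for an audio cue, relative to the keyword level.

    voice_volume is the keyword part of the computed 'voice-volume';
    its dB offset and the cue's own offset are combined additively.
    """
    if voice_volume == "silent":
        return 0.0  # the cue still takes up time, but makes no sound
    return db_to_amplitude_ratio(voice_volume_offset_db + cue_offset_db)
```
</div>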
|
|
|
|
<div class=example>
|
|
<p> Examples of property values:</p>
|
|
|
|
<pre>
a
{
  cue-before: url(/audio/bell.aiff) -3dB;
  cue-after: url(dong.wav);
}

h1
{
  cue-before: url(../clips-1/pop.au) +6dB;
  cue-after: url(../clips-2/pop.au) 6dB;
}

div.caution { cue-before: url(./audio/caution.wav) +8dB; }</pre>
|
|
</div>
|
|
|
|
<h3 id=cue-props-cue><span class=secno>9.2. </span>The ‘<a
|
|
href="#cue"><code class=property>cue</code></a>’ shorthand property</h3>
|
|
|
|
<table class=propdef summary="name: syntax">
|
|
<tbody>
|
|
<tr>
|
|
<td>Name:
|
|
|
|
<td> <dfn id=cue>cue</dfn>
|
|
|
|
<tr>
|
|
<td> <em>Value:</em>
|
|
|
|
<td><‘<a href="#cue-before"><code
|
|
class=property>cue-before</code></a>’> <‘<a
|
|
href="#cue-after"><code class=property>cue-after</code></a>’>?
|
|
|
|
<tr>
|
|
<td> <em>Initial:</em>
|
|
|
|
<td>N/A (see individual properties)
|
|
|
|
<tr>
|
|
<td> <em>Applies to:</em>
|
|
|
|
<td>all elements
|
|
|
|
<tr>
|
|
<td> <em>Inherited:</em>
|
|
|
|
<td>no
|
|
|
|
<tr>
|
|
<td> <em>Percentages:</em>
|
|
|
|
<td>N/A
|
|
|
|
<tr>
|
|
<td> <em>Media:</em>
|
|
|
|
<td>speech
|
|
|
|
<tr>
|
|
<td> <em>Computed value:</em>
|
|
|
|
<td>N/A (see individual properties)
|
|
</table>
|
|
|
|
<p>The ‘<a href="#cue"><code class=property>cue</code></a>’
|
|
property is a shorthand for ‘<a href="#cue-before"><code
|
|
class=property>cue-before</code></a>’ and ‘<a
|
|
href="#cue-after"><code class=property>cue-after</code></a>’. If two
|
|
values are given, the first value is ‘<a href="#cue-before"><code
|
|
class=property>cue-before</code></a>’ and the second is ‘<a
|
|
href="#cue-after"><code class=property>cue-after</code></a>’. If
|
|
only one value is given, it applies to both properties.
|
|
|
|
<div class=example>
|
|
<p> Example of shorthand notation:</p>
|
|
|
|
<pre>
h1
{
  cue-before: url(pop.au);
  cue-after: url(pop.au);
}
/* ...is equivalent to: */
h1
{
  cue: url(pop.au);
}</pre>
|
|
</div>
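<div class=example>
<p> The ‘<code class=property>cue</code>’ shorthand needs slightly more care than ‘<code class=property>pause</code>’, because each component may itself consist of a URI followed by an optional decibel offset. The following Python sketch uses a simplified regular expression and a hypothetical <code>expand_cue</code> helper. As assumptions of the sketch, only <code>url(...)</code> values without a closing parenthesis inside are recognized, and ‘<code class=css>inherit</code>’ is not handled.</p>

```python
import re

# A single cue: "none", or url(...) optionally followed by a decibel offset.
CUE_TOKEN = re.compile(r"\s*(none|url\([^)]*\)(?:\s+[+-]?(?:\d+\.?\d*|\.\d+)dB)?)")


def expand_cue(value: str) -> dict:
    """Expand the 'cue' shorthand into cue-before and cue-after."""
    cues, pos = [], 0
    text = value.strip()
    while pos != len(text):
        match = CUE_TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"invalid cue value: {value!r}")
        cues.append(match.group(1))
        pos = match.end()
    if len(cues) not in (1, 2):
        raise ValueError(f"expected one or two cues, got {len(cues)}")
    # With a single cue, cues[0] and cues[-1] are the same value.
    return {"cue-before": cues[0], "cue-after": cues[-1]}
```
</div>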
|
|
|
|
<h2 id=voice-char-props><span class=secno>10. </span>Voice characteristic
|
|
properties</h2>
|
|
|
|
<h3 id=voice-props-voice-family><span class=secno>10.1. </span>The
|
|
‘<a href="#voice-family"><code
|
|
class=property>voice-family</code></a>’ property</h3>
|
|
|
|
<table class=propdef summary="name: syntax">
|
|
<tbody>
|
|
<tr>
|
|
<td>Name:
|
|
|
|
<td> <dfn id=voice-family>voice-family</dfn>
|
|
|
|
<tr>
|
|
<td> <em>Value:</em>
|
|
|
|
<td> [[<name> | <generic-voice>],]* [<name> |
|
|
<generic-voice>] | preserve
|
|
|
|
<tr>
|
|
<td> <em>Initial:</em>
|
|
|
|
<td>implementation-dependent
|
|
|
|
<tr>
|
|
<td> <em>Applies to:</em>
|
|
|
|
<td>all elements
|
|
|
|
<tr>
|
|
<td> <em>Inherited:</em>
|
|
|
|
<td>yes
|
|
|
|
<tr>
|
|
<td> <em>Percentages:</em>
|
|
|
|
<td>N/A
|
|
|
|
<tr>
|
|
<td> <em>Media:</em>
|
|
|
|
<td>speech
|
|
|
|
<tr>
|
|
<td> <em>Computed value:</em>
|
|
|
|
<td>specified value
|
|
</table>
|
|
|
|
<p>The ‘<a href="#voice-family"><code
|
|
class=property>voice-family</code></a>’ property specifies a
|
|
prioritized list of component values that are separated by commas to
|
|
indicate that they are alternatives (this is analogous to ‘<code
|
|
class=css><a href="#font-family-def"><code
|
|
class=property>font-family</code></a></code>’ in visual style
|
|
sheets). Each component value potentially designates a speech synthesis
|
|
voice instance, by specifying match criteria (see the <a
|
|
href="#voice-selection">voice selection</a> section on this topic).
|
|
|
|
<p> <strong><generic-voice></strong> = [<age>? <gender>
|
|
<integer>?]
|
|
|
|
<p class=note> Note that the functionality provided by this property is
|
|
related to the <a
|
|
href="http://www.w3.org/TR/speech-synthesis11/#edef_voice"><code>voice</code>
|
|
element</a> from the SSML markup language <a href="#SSML"
|
|
rel=biblioentry>[SSML]<!--{{!SSML}}--></a>.
|
|
|
|
<dl>
|
|
<dt> <strong><name></strong>
|
|
|
|
<dd>
|
|
<p>Values are specific voice instances (e.g., Mike, comedian, mary,
|
|
carlos2, "valley girl"). Voice names must either be given quoted as <a
|
|
href="#strings-def">strings</a>, or unquoted as a sequence of one or
|
|
more <a href="#identifier-def">identifiers</a>.</p>
|
|
|
|
<p class=note>Note that as a result, most punctuation characters, or
|
|
digits at the start of each token, must be escaped in unquoted voice
|
|
names.</p>
|
|
|
|
<p> If a sequence of identifiers is given as a voice name, the computed
|
|
value is the name converted to a string by joining all the identifiers
|
|
in the sequence by single spaces.</p>
|
|
|
|
<p> Voice names that happen to be the same as the gender keywords
(‘<code class=property>male</code>’, ‘<code
class=property>female</code>’ and ‘<code
class=property>neutral</code>’) or that happen to match the
keywords ‘<code class=property>inherit</code>’ or
‘<code class=property>preserve</code>’ must be quoted to
disambiguate them from these keywords. The keywords ‘<code
class=property>initial</code>’ and ‘<code
class=property>default</code>’ are reserved for future use and
must also be quoted when used as voice names.</p>
|
|
|
|
<p class=note> Note that in <a href="#SSML"
|
|
rel=biblioentry>[SSML]<!--{{!SSML}}--></a>, voice names are
|
|
space-separated and cannot contain whitespace characters.</p>
|
|
|
|
<p> It is recommended to quote voice names that contain white space,
digits, or punctuation characters other than hyphens (even if these
voice names are valid in unquoted form), in order to improve code
clarity. For example: <code>voice-family: "john doe", "Henry
the-8th";</code></p>
|
|
|
|
<dt> <strong><age></strong>
|
|
|
|
<dd>
|
|
<p> Possible values are ‘<code class=property>child</code>’,
|
|
‘<code class=property>young</code>’ and ‘<code
|
|
class=property>old</code>’, indicating the preferred age category
|
|
to match during voice selection. The mapping with <a href="#SSML"
|
|
rel=biblioentry>[SSML]<!--{{!SSML}}--></a> ages is defined as follows:
|
|
‘<code class=property>child</code>’ = 6 y/o, ‘<code
|
|
class=property>young</code>’ = 24 y/o, ‘<code
|
|
class=property>old</code>’ = 75 y/o (note that more flexible age
|
|
ranges may be used by the processor-dependent voice-matching algorithm).
|
|
</p>
|
|
|
|
<p class=note> Note that the interpretation of the relationship between a
|
|
person's age and a recognizable type of voice cannot realistically be
|
|
defined in a universal manner, as it effectively depends on numerous
|
|
criteria (cultural, linguistic, biological, etc.). The values provided
|
|
by this specification therefore represent a simplified model that can be
|
|
reasonably applied to a broad variety of speech contexts, albeit at the
|
|
cost of a certain degree of approximation. Future versions of this
|
|
specification may refine the level of precision of the voice-matching
|
|
algorithm, as speech processor implementations become more standardized.
|
|
</p>
|
|
|
|
<dt> <strong><gender></strong>
|
|
|
|
<dd>
|
|
<p> One of the keywords ‘<code class=property>male</code>’,
|
|
‘<code class=property>female</code>’, or ‘<code
|
|
class=property>neutral</code>’, specifying a male, female, or
|
|
neutral voice, respectively.</p>
|
|
|
|
<dt> <strong><integer></strong>
|
|
|
|
<dd>
|
|
<p>An <a href="#integer-def">integer</a> indicating the preferred variant
|
|
(e.g. "the second male child voice"). Only positive integers (i.e.
|
|
excluding zero) are allowed. The value "1" refers to the first of all
|
|
matching voices.</p>
|
|
|
|
<dt> <strong>preserve</strong>
|
|
|
|
<dd>
|
|
<p>Indicates that the ‘<a href="#voice-family"><code
|
|
class=property>voice-family</code></a>’ value gets inherited and
|
|
used regardless of any potential language change within the content
|
|
markup (see the section below about voice selection and language
|
|
handling). This value behaves as ‘<code
|
|
class=property>inherit</code>’ when applied to the root element.</p>
|
|
|
|
<p class=note> Note that descendants of the selected element
|
|
automatically inherit the ‘<code
|
|
class=property>preserve</code>’ value, unless it is explicitly
|
|
overridden by other ‘<a href="#voice-family"><code
|
|
class=property>voice-family</code></a>’ values (e.g. name, gender,
|
|
age).</p>
|
|
</dl>
|
|
|
|
<div class=example>
|
|
<p> Examples of invalid declarations:</p>
|
|
|
|
<pre>
|
|
voice-family: john/doe; /* forward slash character must be escaped */
voice-family: john "doe"; /* identifier sequence cannot contain strings */
voice-family: john!; /* exclamation mark must be escaped */
voice-family: john@doe; /* "at" character must be escaped */
|
|
voice-family: #john; /* identifier cannot start with hash character */
|
|
voice-family: john 1st; /* identifier cannot start with digit */</pre>
|
|
</div>
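<div class=example>

<p>For contrast, here is a sketch of declarations that should be valid under the rules above. All voice names are hypothetical:</p>

<pre>
voice-family: john doe;           /* identifier sequence, computed as the string "john doe" */
voice-family: "john doe";         /* equivalent quoted form */
voice-family: john\/doe;          /* forward slash escaped with a backslash */
voice-family: john \31 st;        /* leading digit escaped ("\31 " is "1"), computed as "john 1st" */
voice-family: "male", female;     /* a voice named "male", then the generic female voice */
voice-family: mary, old female 2; /* named voice first, then the second matching old female voice */</pre>

</div>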
|
|
|
|
<h4 class=no-toc id=voice-selection><span class=secno>10.1.1. </span>Voice
|
|
selection, content language</h4>
|
|
|
|
<p>The ‘<a href="#voice-family"><code
|
|
class=property>voice-family</code></a>’ property is used to guide
|
|
the selection of the speech synthesis voice instance. As part of this
|
|
selection process, speech-capable user agents must also take into account
|
|
the language of the selected element within the markup content. The
|
|
"name", "gender", "age", and preferred "variant" (index) are voice
|
|
selection hints that get carried down the content hierarchy as the
|
|
‘<a href="#voice-family"><code
|
|
class=property>voice-family</code></a>’ property value gets
|
|
inherited by descendant elements. At any point within the content
|
|
structure, the language takes precedence (i.e. has a higher priority) over
|
|
the specified CSS voice characteristics.
|
|
|
|
<p> The following list outlines the voice selection algorithm (note that
the definition of "language" is loose here, in order to cater for
dialectal variations):
|
|
|
|
<ol>
|
|
<li> If only a single voice instance is available for the language of the
|
|
selected content, then this voice must be used, regardless of the
|
|
specified CSS voice characteristics.
|
|
|
|
<li> If several voice instances are available for the language of the
selected content, then the chosen voice is the one that most closely
matches the specified name, or gender, age, and preferred voice variant.
The actual definition of "best match" is processor-dependent. For
example, in a system that only has male and female adult voices
available, a reasonable match for "voice-family: child male" may well be
a higher-pitched female voice, as this tone of voice would be close to
that of a young boy. If no voice instance matches the characteristics
provided by any of the ‘<a href="#voice-family"><code
class=property>voice-family</code></a>’ component values, the first
available voice instance (amongst those suitable for the language of the
selected content) must be used.
|
|
|
|
<li> If no voice is available for the language of the selected content, it
is recommended that user agents let the user know about the lack of an
appropriate TTS voice.
|
|
</ol>
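<div class=example>

<p>The sketch below shows how the steps above might apply to one hypothetical speech processor. It assumes two English voices (an adult male voice and an adult female voice) and a single French voice. The actual matching remains processor-dependent:</p>

<pre>
p { voice-family: "carlos2", child male, female; }

/* English paragraph: "carlos2" is not installed and no child voice exists,
   so the adult female voice is a plausible choice (it matches "female",
   and a higher-pitched voice is also a reasonable match for "child male"). */

/* French paragraph: only one French voice exists, so it must be used,
   regardless of the specified voice-family value. */

/* German paragraph: no German voice exists, so the user agent should
   let the user know that an appropriate TTS voice is missing. */</pre>

</div>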
|
|
|
|
<p>The speech synthesizer voice must be re-evaluated (i.e. the selection
|
|
process must take place once again) whenever any of the CSS voice
|
|
characteristics change within the content flow. The voice must also be
|
|
re-calculated whenever the content language changes, unless the
|
|
‘<code class=property>preserve</code>’ keyword is used (this
|
|
may be useful in cases where embedded foreign language text can be spoken
|
|
using a voice not designed for this language, as demonstrated by the
|
|
example below).
|
|
|
|
<p class=note>Note that dynamically computing a voice may lead to
unexpected lag, so user agents should try to resolve concrete voice
instances in the document tree before the playback starts.
|
|
|
|
<div class=example>
|
|
<p>Examples of property values:</p>
|
|
|
|
<pre>
|
|
h1 { voice-family: announcer, old male; }
|
|
p.romeo { voice-family: romeo, young male; }
|
|
p.juliet { voice-family: juliet, young female; }
|
|
p.mercutio { voice-family: young male; }
|
|
p.tybalt { voice-family: young male; }
|
|
p.nurse { voice-family: amelie; }
|
|
|
|
...
|
|
|
|
<p class="romeo" xml:lang="en-US">
|
|
The French text below will be spoken with an English voice:
|
|
<span style="voice-family: preserve;" xml:lang="fr-FR">Bonjour monsieur !</span>
|
|
|
|
The English text below will be spoken with a voice different
from that corresponding to the class "romeo"
|
|
(which is inherited from the "p" parent element):
|
|
<span style="voice-family: female;">Hello sir!</span>
|
|
</p></pre>
|
|
</div>
|
|
|
|
<h3 id=voice-props-voice-rate><span class=secno>10.2. </span>The ‘<a
|
|
href="#voice-rate"><code class=property>voice-rate</code></a>’
|
|
property</h3>
|
|
|
|
<table class=propdef summary="name: syntax">
|
|
<tbody>
|
|
<tr>
|
|
<td>Name:
|
|
|
|
<td> <dfn id=voice-rate>voice-rate</dfn>
|
|
|
|
<tr>
|
|
<td> <em>Value:</em>
|
|
|
|
<td>[normal | x-slow | slow | medium | fast | x-fast] ||
|
|
<percentage>
|
|
|
|
<tr>
|
|
<td> <em>Initial:</em>
|
|
|
|
<td>normal
|
|
|
|
<tr>
|
|
<td> <em>Applies to:</em>
|
|
|
|
<td>all elements
|
|
|
|
<tr>
|
|
<td> <em>Inherited:</em>
|
|
|
|
<td>yes
|
|
|
|
<tr>
|
|
<td> <em>Percentages:</em>
|
|
|
|
<td>refer to default value
|
|
|
|
<tr>
|
|
<td> <em>Media:</em>
|
|
|
|
<td>speech
|
|
|
|
<tr>
|
|
<td> <em>Computed value:</em>
|
|
|
|
<td>a keyword value, and optionally also a percentage relative to the
|
|
keyword (if not 100%)
|
|
</table>
|
|
|
|
<p>The ‘<a href="#voice-rate"><code
|
|
class=property>voice-rate</code></a>’ property manipulates the rate
|
|
of generated synthetic speech in terms of words per minute.
|
|
|
|
<p class=note> Note that the functionality provided by this property is
|
|
related to the <a
|
|
href="http://www.w3.org/TR/speech-synthesis11/#edef_prosody"><code>rate</code>
|
|
attribute of the <code>prosody</code> element</a> from the SSML markup
|
|
language <a href="#SSML" rel=biblioentry>[SSML]<!--{{!SSML}}--></a>.
|
|
|
|
<dl>
|
|
<dt> <strong>normal</strong>
|
|
|
|
<dd>
|
|
<p>Represents the default rate produced by the speech synthesizer for the
|
|
currently active voice. This is processor-specific and depends on the
|
|
language, dialect and on the "personality" of the voice.</p>
|
|
|
|
<dt><strong>x-slow</strong>, <strong>slow</strong>,
|
|
<strong>medium</strong>, <strong>fast</strong> and
|
|
<strong>x-fast</strong>
|
|
|
|
<dd>
|
|
<p>A sequence of monotonically non-decreasing speaking rates that are
implementation- and voice-specific. For example, typical values for the
English language are (in words per minute) x-slow = 80, slow = 120,
medium = between 180 and 200, fast = 500.</p>
|
|
|
|
<dt> <strong><percentage></strong>
|
|
|
|
<dd>
|
|
<p>Only non-negative <a href="#percentage-def">percentage</a> values are
|
|
allowed. This represents a change relative to the given keyword value
|
|
(see enumeration above), or to the default value for the root element,
|
|
or otherwise to the inherited speaking rate (which may itself be a
|
|
combination of a keyword value and of a percentage, in which case
|
|
percentages are combined multiplicatively). For example, 50% means that
|
|
the speaking rate gets multiplied by 0.5 (half the value).</p>
|
|
</dl>
|
|
|
|
<div class=example>
|
|
<p>Examples of inherited values:</p>
|
|
|
|
<pre>
|
|
<body>
|
|
<e1>
|
|
<e2>
|
|
<e3>
|
|
...
|
|
</e3>
|
|
</e2>
|
|
</e1>
|
|
</body>
|
|
|
|
|
|
|
|
|
|
body { voice-rate: inherit; } /* the initial value is 'normal'
|
|
(the actual speaking rate value
|
|
depends on the active voice) */
|
|
|
|
e1 { voice-rate: +50%; } /* the computed value is
|
|
['normal' and 50%], which will resolve
|
|
to the rate corresponding to 'normal'
|
|
multiplied by 0.5 (half the speaking rate) */
|
|
|
|
e2 { voice-rate: fast 120%; } /* the computed value is
|
|
['fast' and 120%], which will resolve
|
|
to the rate corresponding to 'fast'
|
|
multiplied by 1.2 (20% faster than the rate for 'fast') */
|
|
|
|
e3 { voice-rate: normal; /* "resets" the speaking rate to the intrinsic voice value,
|
|
the computed value is 'normal' (see comment below for actual value) */
|
|
|
|
voice-family: "another-voice"; } /* because the voice is different,
|
|
the calculated speaking rate may vary
|
|
compared to "body" (even though the computed
|
|
'voice-rate' value is the same) */
|
|
</pre>
|
|
</div>
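<div class=example>

<p>The following sketch shows how percentages combine multiplicatively through inheritance. The words-per-minute figures assume the typical English value of 120 for 'slow' given above. Actual rates depend on the active voice:</p>

<pre>
body { voice-rate: slow; }  /* computed value: 'slow' (about 120 words per minute) */
p    { voice-rate: 50%; }   /* computed value: ['slow' and 50%], about 60 words per minute */
em   { voice-rate: 200%; }  /* inside "p": 50% × 200% = 100%, so the computed value
                               is simply 'slow' again (about 120 words per minute) */</pre>

</div>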
|
|
|
|
<h3 id=voice-props-voice-pitch><span class=secno>10.3. </span>The ‘<a
|
|
href="#voice-pitch"><code class=property>voice-pitch</code></a>’
|
|
property</h3>
|
|
|
|
<table class=propdef summary="name: syntax">
|
|
<tbody>
|
|
<tr>
|
|
<td>Name:
|
|
|
|
<td> <dfn id=voice-pitch>voice-pitch</dfn>
|
|
|
|
<tr>
|
|
<td> <em>Value:</em>
|
|
|
|
<td><frequency> && absolute | [[x-low | low | medium |
|
|
high | x-high] || [<frequency> | <semitones> |
|
|
<percentage>]]
|
|
|
|
<tr>
|
|
<td> <em>Initial:</em>
|
|
|
|
<td>medium
|
|
|
|
<tr>
|
|
<td> <em>Applies to:</em>
|
|
|
|
<td>all elements
|
|
|
|
<tr>
|
|
<td> <em>Inherited:</em>
|
|
|
|
<td>yes
|
|
|
|
<tr>
|
|
<td> <em>Percentages:</em>
|
|
|
|
<td>refer to inherited value
|
|
|
|
<tr>
|
|
<td> <em>Media:</em>
|
|
|
|
<td>speech
|
|
|
|
<tr>
|
|
<td> <em>Computed value:</em>
|
|
|
|
<td> one of the predefined pitch keywords if only the keyword is
|
|
specified by itself, otherwise an absolute frequency calculated by
|
|
converting the keyword value (if any) to a fixed frequency based on the
|
|
current voice-family and by applying the specified relative offset (if
|
|
any)
|
|
</table>
|
|
|
|
<p>The ‘<a href="#voice-pitch"><code
class=property>voice-pitch</code></a>’ property specifies the
"baseline" pitch of the generated speech output, which depends on the
‘<a href="#voice-family"><code
class=property>voice-family</code></a>’ instance in use, and varies across
speech synthesis processors (it approximately corresponds to the average
pitch of the output). For example, the common pitch for a male voice is
around 120Hz, whereas it is around 210Hz for a female voice.
|
|
|
|
<p class=note> Note that the functionality provided by this property is
|
|
related to the <a
|
|
href="http://www.w3.org/TR/speech-synthesis11/#edef_prosody"><code>pitch</code>
|
|
attribute of the <code>prosody</code> element</a> from the SSML markup
|
|
language <a href="#SSML" rel=biblioentry>[SSML]<!--{{!SSML}}--></a>.
|
|
|
|
<dl>
|
|
<dt> <strong><frequency></strong>
|
|
|
|
<dd>
|
|
<p> A value in <a href="#frequency-def">frequency</a> units (Hertz or
kilohertz, e.g. "100Hz", "+2kHz"). Values are restricted to positive
|
|
numbers when the ‘<code class=property>absolute</code>’
|
|
keyword is specified. Otherwise (when the ‘<code
|
|
class=property>absolute</code>’ keyword is not specified), a
|
|
negative value represents a decrement, and a positive value represents
|
|
an increment, relative to the inherited value. For example, "2kHz" is a
|
|
positive offset (strictly equivalent to "+2kHz"), and "+2kHz absolute"
|
|
is an absolute frequency (strictly equivalent to "2kHz absolute").</p>
|
|
|
|
<dt> <strong>absolute</strong>
|
|
|
|
<dd>
|
|
<p> If specified, this keyword indicates that the specified frequency
|
|
represents an absolute value. If a negative frequency is specified, the
|
|
computed frequency will be zero.</p>
|
|
|
|
<dt> <strong><semitones></strong>
|
|
|
|
<dd>
|
|
<p> Specifies a relative change (decrement or increment) to the inherited
value. The syntax of allowed values is a <<a
href="#number-def">number</a>> followed immediately by "st"
(semitones). A semitone interval corresponds to the step between each
note on an equal temperament chromatic scale. A semitone can therefore
be quantified as the interval between two consecutive pitch
frequencies on such a scale. The ratio between two consecutive frequencies
separated by exactly one semitone is the twelfth root of two
(approximately 1.0594631; the fraction 11011/10393 is a close rational
approximation). As a result, the value in Hertz corresponding to a
semitone offset is relative to the initial frequency the offset is
applied to (in other words, a semitone doesn't correspond to a fixed
numerical value in Hertz).</p>
|
|
|
|
<dt> <strong><percentage></strong>
|
|
|
|
<dd>
|
|
<p> Positive and negative <a href="#percentage-def">percentage</a> values
|
|
are allowed, to represent an increment or decrement (respectively)
|
|
relative to the inherited value. Computed values are calculated by
|
|
adding (or subtracting) the specified fraction of the inherited value,
|
|
to (from) the inherited value. For example, 50% (which is equivalent to
|
|
+50%) with an inherited value of 200Hz results in <code>200 +
|
|
(200*0.5)</code> = 300Hz. Conversely, -50% results in
|
|
<code>200-(200*0.5)</code> = 100Hz.</p>
|
|
|
|
<dt><strong>x-low</strong>, <strong>low</strong>, <strong>medium</strong>,
|
|
<strong>high</strong>, <strong>x-high</strong>
|
|
|
|
<dd>
|
|
<p>A sequence of monotonically non-decreasing pitch levels that are
implementation- and voice-specific. When the computed value for a given
element is only a keyword (i.e. no relative offset is specified), then
the corresponding absolute frequency will be re-evaluated on a voice
change. Conversely, the application of a relative offset requires the
calculation of the resulting frequency based on the current voice at the
point at which the relative offset is specified, so the computed
frequency will inherit absolutely regardless of any voice change further
down the style cascade. Authors should therefore only use keyword values
in cases where they want voice changes to trigger the re-evaluation of
the conversion from a keyword to a concrete, voice-dependent frequency.</p>
|
|
</dl>
|
|
|
|
<p> Computed absolute frequencies that are negative are clamped to zero
|
|
Hertz. Speech-capable user agents are likely to support a specific range
|
|
of values rather than the full range of possible calculated numerical
|
|
values for frequencies. The actual values in user agents may therefore be
|
|
clamped to implementation-dependent minimum and maximum boundaries. For
|
|
example: although the 0Hz frequency can be legitimately calculated, it may
|
|
be clamped to a more meaningful value in the context of the speech
|
|
synthesizer.
|
|
|
|
<div class=example>
|
|
<p>Examples of property values:</p>
|
|
|
|
<pre>
|
|
h1 { voice-pitch: 250Hz; } /* positive offset relative to the inherited absolute frequency */
|
|
h1 { voice-pitch: +250Hz; } /* identical to the line above */
|
|
h2 { voice-pitch: +30Hz absolute; } /* not an increment */
|
|
h2 { voice-pitch: absolute 30Hz; } /* identical to the line above */
|
|
h3 { voice-pitch: -20Hz; } /* negative offset (decrement) relative to the inherited absolute frequency */
|
|
h4 { voice-pitch: -20Hz absolute; } /* illegal syntax => value ignored ("absolute" keyword not allowed with negative frequency) */
|
|
h5 { voice-pitch: -3.5st; } /* semitones, negative offset */
|
|
h6 { voice-pitch: 25%; } /* this means "add a quarter of the inherited value, to the inherited value" */
|
|
h6 { voice-pitch: +25%; } /* identical to the line above */
|
|
</pre>
|
|
</div>
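<div class=example>

<p>Below are worked values for the relative offsets. Each rule assumes that its element inherits an absolute frequency of 200Hz. The results are rounded and may be further clamped by the speech synthesizer:</p>

<pre>
/* Inherited value for each rule below: 200Hz */
em     { voice-pitch: +10Hz; }    /* 200Hz + 10Hz = 210Hz */
strong { voice-pitch: -25%; }     /* 200Hz - (200Hz × 0.25) = 150Hz */
q      { voice-pitch: 2st; }      /* 200Hz × 2^(2/12) ≈ 200Hz × 1.1225 ≈ 224.5Hz */
cite   { voice-pitch: -12st; }    /* 200Hz × 2^(-12/12) = 100Hz (one octave lower) */
dfn    { voice-pitch: high 2st; } /* 'high' is first converted to a frequency for the
                                     current voice, then raised by two semitones */</pre>

</div>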
|
|
|
|
<h3 id=voice-props-voice-range><span class=secno>10.4. </span>The ‘<a
|
|
href="#voice-range"><code class=property>voice-range</code></a>’
|
|
property</h3>
|
|
|
|
<table class=propdef summary="name: syntax">
|
|
<tbody>
|
|
<tr>
|
|
<td>Name:
|
|
|
|
<td> <dfn id=voice-range>voice-range</dfn>
|
|
|
|
<tr>
|
|
<td> <em>Value:</em>
|
|
|
|
<td><frequency> && absolute | [[x-low | low | medium |
|
|
high | x-high] || [<frequency> | <semitones> |
|
|
<percentage>]]
|
|
|
|
<tr>
|
|
<td> <em>Initial:</em>
|
|
|
|
<td>medium
|
|
|
|
<tr>
|
|
<td> <em>Applies to:</em>
|
|
|
|
<td>all elements
|
|
|
|
<tr>
|
|
<td> <em>Inherited:</em>
|
|
|
|
<td>yes
|
|
|
|
<tr>
|
|
<td> <em>Percentages:</em>
|
|
|
|
<td>refer to inherited value
|
|
|
|
<tr>
|
|
<td> <em>Media:</em>
|
|
|
|
<td>speech
|
|
|
|
<tr>
|
|
<td> <em>Computed value:</em>
|
|
|
|
<td> one of the predefined pitch keywords if only the keyword is
|
|
specified by itself, otherwise an absolute frequency calculated by
|
|
converting the keyword value (if any) to a fixed frequency based on the
|
|
current voice-family and by applying the specified relative offset (if
|
|
any)
|
|
</table>
|
|
|
|
<p> The ‘<a href="#voice-range"><code
|
|
class=property>voice-range</code></a>’ property specifies the
|
|
variability in the "baseline" pitch, i.e. how much the fundamental
|
|
frequency may deviate from the average pitch of the speech output. The
|
|
dynamic pitch range of the generated speech generally increases for a
|
|
highly animated voice, for example when variations in inflection are used
|
|
to convey meaning and emphasis in speech. Typically, a low range produces
|
|
a flat, monotone voice, whereas a high range produces an animated voice.
|
|
|
|
<p class=note> Note that the functionality provided by this property is
|
|
related to the <a
|
|
href="http://www.w3.org/TR/speech-synthesis11/#edef_prosody"><code>range</code>
|
|
attribute of the <code>prosody</code> element</a> from the SSML markup
|
|
language <a href="#SSML" rel=biblioentry>[SSML]<!--{{!SSML}}--></a>.
|
|
|
|
<dl>
|
|
<dt> <strong><frequency></strong>
|
|
|
|
<dd>
|
|
<p> A value in <a href="#frequency-def">frequency</a> units (Hertz or
kilohertz, e.g. "100Hz", "+2kHz"). Values are restricted to positive
|
|
numbers when the ‘<code class=property>absolute</code>’
|
|
keyword is specified. Otherwise (when the ‘<code
|
|
class=property>absolute</code>’ keyword is not specified), a
|
|
negative value represents a decrement, and a positive value represents
|
|
an increment, relative to the inherited value. For example, "2kHz" is a
|
|
positive offset (strictly equivalent to "+2kHz"), and "+2kHz absolute"
|
|
is an absolute frequency (strictly equivalent to "2kHz absolute").</p>
|
|
|
|
<dt> <strong>absolute</strong>
|
|
|
|
<dd>
|
|
<p> If specified, this keyword indicates that the specified frequency
|
|
represents an absolute value. If a negative frequency is specified, the
|
|
computed frequency will be zero.</p>
|
|
|
|
<dt> <strong><semitones></strong>
|
|
|
|
<dd>
|
|
<p> Specifies a relative change (decrement or increment) to the inherited
value. The syntax of allowed values is a <<a
href="#number-def">number</a>> followed immediately by "st"
(semitones). A semitone interval corresponds to the step between each
note on an equal temperament chromatic scale. A semitone can therefore
be quantified as the interval between two consecutive pitch
frequencies on such a scale. The ratio between two consecutive frequencies
separated by exactly one semitone is the twelfth root of two
(approximately 1.0594631; the fraction 11011/10393 is a close rational
approximation). As a result, the value in Hertz corresponding to a
semitone offset is relative to the initial frequency the offset is
applied to (in other words, a semitone doesn't correspond to a fixed
numerical value in Hertz).</p>
|
|
|
|
<dt> <strong><percentage></strong>
|
|
|
|
<dd>
|
|
<p> Positive and negative <a href="#percentage-def">percentage</a> values
|
|
are allowed, to represent an increment or decrement (respectively)
|
|
relative to the inherited value. Computed values are calculated by
|
|
adding (or subtracting) the specified fraction of the inherited value,
|
|
to (from) the inherited value. For example, 50% (which is equivalent to
|
|
+50%) with an inherited value of 200Hz results in <code>200 +
|
|
(200*0.5)</code> = 300Hz. Conversely, -50% results in
|
|
<code>200-(200*0.5)</code> = 100Hz.</p>
|
|
|
|
<dt><strong>x-low</strong>, <strong>low</strong>, <strong>medium</strong>,
|
|
<strong>high</strong>, <strong>x-high</strong>
|
|
|
|
<dd>
|
|
<p>A sequence of monotonically non-decreasing pitch levels that are
implementation- and voice-specific. When the computed value for a given
element is only a keyword (i.e. no relative offset is specified), then
the corresponding absolute frequency will be re-evaluated on a voice
change. Conversely, the application of a relative offset requires the
calculation of the resulting frequency based on the current voice at the
point at which the relative offset is specified, so the computed
frequency will inherit absolutely regardless of any voice change further
down the style cascade. Authors should therefore only use keyword values
in cases where they want voice changes to trigger the re-evaluation of
the conversion from a keyword to a concrete, voice-dependent frequency.</p>
|
|
</dl>
|
|
|
|
<p> Computed absolute frequencies that are negative are clamped to zero
|
|
Hertz. Speech-capable user agents are likely to support a specific range
|
|
of values rather than the full range of possible calculated numerical
|
|
values for frequencies. The actual values in user agents may therefore be
|
|
clamped to implementation-dependent minimum and maximum boundaries. For
|
|
example: although the 0Hz frequency can be legitimately calculated, it may
|
|
be clamped to a more meaningful value in the context of the speech
|
|
synthesizer.
|
|
|
|
<div class=example>
|
|
<p>Examples of inherited values:</p>
|
|
|
|
<pre>
|
|
<body>
|
|
<e1>
|
|
<e2>
|
|
<e3>
|
|
<e4>
|
|
<e5>
|
|
<e6>
|
|
...
|
|
</e6>
|
|
</e5>
|
|
</e4>
|
|
</e3>
|
|
</e2>
|
|
</e1>
|
|
</body>
|
|
|
|
|
|
|
|
|
|
body { voice-range: inherit; } /* the initial value is 'medium'
|
|
(the actual frequency value
|
|
depends on the current voice) */
|
|
|
|
e1 { voice-range: +25%; } /* the computed value is
|
|
['medium' + 25%] which resolves
|
|
to the frequency corresponding to 'medium'
|
|
plus 0.25 times the frequency
|
|
corresponding to 'medium' */
|
|
|
|
e2 { voice-range: +10Hz; } /* the computed value is
|
|
[FREQ + 10Hz] where "FREQ" is the absolute frequency
|
|
calculated in the "e1" rule above.
|
|
*/
|
|
|
|
e3 { voice-range: inherit; /* this could be omitted,
|
|
but we explicitly specify it for clarity purposes */
|
|
|
|
voice-family: "another-voice"; } /* this voice change would have resulted in
|
|
the re-evaluation of the initial 'medium' keyword
|
|
inherited by the "body" element
|
|
(i.e. conversion from a voice-dependent keyword value
|
|
to a concrete, absolute frequency),
|
|
but because relative offsets were applied down the style
|
|
cascade, the inherited value is actually the frequency
|
|
calculated at the "e2" rule above. */
|
|
|
|
e4 { voice-range: 200Hz absolute; } /* override with an absolute frequency
|
|
which doesn't depend on the current voice */
|
|
|
|
e5 { voice-range: 2st; } /* the computed value is an absolute frequency,
|
|
which is the result of the
|
|
calculation: 200Hz + two semitones
|
|
(reminder: the actual frequency corresponding to a semitone
|
|
depends on the base value to which it applies) */
|
|
|
|
e6 { voice-range: inherit; /* this could be omitted,
|
|
but we explicitly specify it for clarity purposes */
|
|
|
|
voice-family: "yet-another-voice"; } /* despite the voice change,
|
|
the computed value is the same as
|
|
for "e5" (i.e. an absolute frequency value,
|
|
independent from the current voice) */
|
|
</pre>
|
|
</div>
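<div class=example>

<p>The sketch below shows the two ends of the range in practice. The class names are hypothetical, and the exact effect of each keyword depends on the implementation and the voice:</p>

<pre>
.robot       { voice-range: x-low; }   /* little pitch variation: typically a flat, monotone delivery */
.storyteller { voice-range: x-high; }  /* wide pitch variation: typically a lively, animated delivery */</pre>

</div>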
|
|
|
|
<h3 id=voice-props-voice-stress><span class=secno>10.5. </span>The
|
|
‘<a href="#voice-stress"><code
|
|
class=property>voice-stress</code></a>’ property</h3>
|
|
|
|
<table class=propdef summary="name: syntax">
|
|
<tbody>
|
|
<tr>
|
|
<td>Name:
|
|
|
|
<td> <dfn id=voice-stress>voice-stress</dfn>
|
|
|
|
<tr>
|
|
<td> <em>Value:</em>
|
|
|
|
<td>normal | strong | moderate | none | reduced
|
|
|
|
<tr>
|
|
<td> <em>Initial:</em>
|
|
|
|
<td>normal
|
|
|
|
<tr>
|
|
<td> <em>Applies to:</em>
|
|
|
|
<td>all elements
|
|
|
|
<tr>
|
|
<td> <em>Inherited:</em>
|
|
|
|
<td>yes
|
|
|
|
<tr>
|
|
<td> <em>Percentages:</em>
|
|
|
|
<td>N/A
|
|
|
|
<tr>
|
|
<td> <em>Media:</em>
|
|
|
|
<td>speech
|
|
|
|
<tr>
|
|
<td> <em>Computed value:</em>
|
|
|
|
<td>specified value
|
|
</table>
|
|
|
|
<p>The ‘<a href="#voice-stress"><code
|
|
class=property>voice-stress</code></a>’ property manipulates the
|
|
strength of emphasis, which is normally applied using a combination of
|
|
pitch change, timing changes, loudness and other acoustic differences. The
|
|
precise meaning of the values therefore depends on the language being
|
|
spoken.
|
|
|
|
<p class=note> Note that the functionality provided by this property is
|
|
related to the <a
|
|
href="http://www.w3.org/TR/speech-synthesis11/#edef_emphasis"><code>emphasis</code>
|
|
element</a> from the SSML markup language <a href="#SSML"
|
|
rel=biblioentry>[SSML]<!--{{!SSML}}--></a>.
|
|
|
|
<dl>
|
|
<dt> <strong>normal</strong>
|
|
|
|
<dd>
|
|
<p>Represents the default emphasis produced by the speech synthesizer.</p>
|
|
|
|
<dt> <strong>none</strong>
|
|
|
|
<dd>
|
|
<p>Prevents the synthesizer from emphasizing text it would normally
|
|
emphasize.</p>
|
|
|
|
<dt><strong>moderate</strong> and <strong>strong</strong>
|
|
|
|
<dd>
|
|
<p>These values are monotonically non-decreasing in strength. Their
|
|
application results in more emphasis than what the speech synthesizer
|
|
would normally produce (i.e. more than the value corresponding to
|
|
‘<code class=property>normal</code>’).</p>
|
|
|
|
<dt> <strong>reduced</strong>
|
|
|
|
<dd>
|
|
<p>Effectively the opposite of emphasizing a word.</p>
|
|
</dl>
|
|
|
|
<div class=example>
|
|
<p>Examples of property values, with HTML sample:</p>
|
|
|
|
<pre>
|
|
span.default-emphasis { voice-stress: normal; }
|
|
span.lowered-emphasis { voice-stress: reduced; }
|
|
span.removed-emphasis { voice-stress: none; }
|
|
span.normal-emphasis { voice-stress: moderate; }
|
|
span.huge-emphasis { voice-stress: strong; }
|
|
|
|
...
|
|
|
|
<p>This is a big car.</p>
|
|
<!-- The speech output from the line above is identical to the line below: -->
|
|
<p>This is a <span class="default-emphasis">big</span> car.</p>
|
|
|
|
<p>This car is <span class="lowered-emphasis">massive</span>!</p>
|
|
<!-- The "span" below is totally de-emphasized, whereas the emphasis in the line above is only reduced: -->
|
|
<p>This car is <span class="removed-emphasis">massive</span>!</p>
|
|
|
|
<!-- The lines below demonstrate increasing levels of emphasis: -->
|
|
<p>This is a <span class="normal-emphasis">big</span> car!</p>
|
|
<p>This is a <span class="huge-emphasis">big</span> car!!!</p></pre>
|
|
</div>
|
|
|
|
<h2 id=duration-props><span class=secno>11. </span>Voice duration property</h2>

<h3 id=mixing-props-voice-duration><span class=secno>11.1. </span>The
‘<a href="#voice-duration"><code
class=property>voice-duration</code></a>’ property</h3>

<table class=propdef summary="name: syntax">
<tbody>
<tr>
<td>Name:

<td> <dfn id=voice-duration>voice-duration</dfn>

<tr>
<td> <em>Value:</em>

<td>auto | &lt;time&gt;

<tr>
<td> <em>Initial:</em>

<td> <em>auto</em>

<tr>
<td> <em>Applies to:</em>

<td>all elements

<tr>
<td> <em>Inherited:</em>

<td>no

<tr>
<td> <em>Percentages:</em>

<td>N/A

<tr>
<td> <em>Media:</em>

<td>speech

<tr>
<td> <em>Computed value:</em>

<td>specified value
</table>

<p> The ‘<a href="#voice-duration"><code
class=property>voice-duration</code></a>’ property specifies how
long it should take to render the selected element's content (not
including <a href="#cue-props">audio cues</a>, <a href="#pause-props">
pauses</a> and <a href="#rest-props">rests</a>). Unless the value
‘<code class=property>auto</code>’ is specified, this property
takes precedence over the ‘<a href="#voice-rate"><code
class=property>voice-rate</code></a>’ property and should be used
to determine a suitable speaking rate for the voice. An element whose
‘<a href="#voice-duration"><code
class=property>voice-duration</code></a>’ value is not
‘<code class=property>auto</code>’ may have descendants for
which the ‘<a href="#voice-duration"><code
class=property>voice-duration</code></a>’ and ‘<a
href="#voice-rate"><code class=property>voice-rate</code></a>’
properties are specified, but these must be ignored. In other words, when
a ‘<code>&lt;time&gt;</code>’ value is specified for the
‘<a href="#voice-duration"><code
class=property>voice-duration</code></a>’ of a selected element, it
applies to the entire element subtree (descendants cannot override the
property).

<p class=note> Note that the functionality provided by this property is
related to the <a
href="http://www.w3.org/TR/speech-synthesis11/#edef_prosody"><code>duration</code>
attribute of the <code>prosody</code> element</a> from the SSML markup
language <a href="#SSML" rel=biblioentry>[SSML]<!--{{!SSML}}--></a>.

<dl>
<dt> <strong>auto</strong>

<dd>
<p>Resolves to a used value corresponding to the duration of the speech
synthesis when using the inherited ‘<a href="#voice-rate"><code
class=property>voice-rate</code></a>’.</p>

<dt> <strong>&lt;time&gt;</strong>

<dd>
<p> Specifies a value in absolute <a href="#time-def">time</a> units
(seconds and milliseconds, e.g. "+3s", "250ms"). Only non-negative
values are allowed.</p>
</dl>
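
<div class=example>
<p>A minimal sketch of the rules above; the class names are hypothetical.
Because ‘voice-duration’ is not ‘auto’ on the heading, the ‘voice-rate’ and
‘voice-duration’ declarations on its descendant are ignored, and the
speaking rate should be chosen so that the whole heading takes about two
seconds to render. How closely the target duration is met depends on the
speech synthesizer.</p>

<pre>
/* The whole subtree of this heading should be spoken in about 2 seconds. */
h1.title { voice-duration: 2s; }

/* Ignored: an ancestor (h1.title) specifies a non-auto voice-duration. */
h1.title em { voice-rate: x-slow; voice-duration: 5s; }

/* 'auto' (the initial value): the speaking rate follows voice-rate as usual. */
p.disclaimer { voice-duration: auto; voice-rate: fast; }

/* Invalid: negative durations are not allowed, so this declaration is ignored. */
p.invalid { voice-duration: -1s; }

...

&lt;h1 class="title"&gt;Chapter &lt;em&gt;One&lt;/em&gt;&lt;/h1&gt;</pre>
</div>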

<h2 id=lists><span class=secno>12. </span>List items and counter styles</h2>

<p>The ‘<code class=css><a href="#list-style-type-def"> <code
class=property>list-style-type</code></a></code>’ property of <a
href="#CSS21" rel=biblioentry>[CSS21]<!--{{!CSS21}}--></a> specifies three
types of list item markers: glyphs, numbering systems, and alphabetic
systems. The values allowed for this property are also used for the
<code>counter()</code> function of the ‘<a href="#content-def"><code
class=property>content</code></a>’ property. The CSS Speech module
defines how to render these styles in the aural dimension, using speech
synthesis. The ‘<code class=css><a
href="#list-style-image-def"><code
class=property>list-style-image</code></a></code>’ property of <a
href="#CSS21" rel=biblioentry>[CSS21]<!--{{!CSS21}}--></a> is ignored, and
the ‘<code class=css><a href="#list-style-type-def"><code
class=property>list-style-type</code></a></code>’ is used instead.

<p class=note> Note that the speech rendering of new features from the CSS
Lists and Counters Module Level 3 <a href="#CSS3LIST"
rel=biblioentry>[CSS3LIST]<!--{{CSS3LIST}}--></a> is not covered in this
level of CSS Speech, but may be defined in a future specification.

<dl>
<dt> <strong>disc, circle, square</strong>

<dd>
<p> For these list item styles, the user agent defines (possibly based on
user preferences) what equivalent phrase is spoken or what audio cue is
played. List items with graphical bullets are therefore announced
appropriately, in an implementation-dependent manner.</p>

<dt> <strong>decimal, decimal-leading-zero, lower-roman, upper-roman,
georgian, armenian</strong>

<dd>
<p> For these list item styles, the corresponding numbers are spoken as-is
by the speech synthesizer, and may be complemented with additional audio
cues or speech phrases in the document's language (i.e. with the same
TTS voice used to speak the list item content) in order to indicate the
presence of list items. For example, when using the English language,
the list item counter could be prefixed with the word "Item", which
would result in list items being announced with "Item one", "Item two",
etc.</p>

<dt> <strong>lower-latin, lower-alpha, upper-latin, upper-alpha,
lower-greek</strong>

<dd>
<p> These list item styles are spelled out letter-by-letter by the speech
synthesizer, in the document language (i.e. with the same TTS voice used
to speak the list item content). For example, ‘<code
class=property>lower-greek</code>’ in English would be read out as
"alpha", "beta", "gamma", etc. Likewise, ‘<code
class=property>upper-latin</code>’ in French would be read out as
/a/, /be/, /se/, etc. (in phonetic notation).</p>
</dl>
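
<div class=example>
<p>The following sketch shows how a few list styles might be rendered
aurally. The class names and the image file are hypothetical, and the
exact phrases and audio cues are implementation-dependent:</p>

<pre>
/* Numbers spoken as-is, possibly prefixed, e.g. "Item one", "Item two". */
ol.steps { list-style-type: decimal; }

/* Letters spelled out in the document language, e.g. "A", "B", "C" in English. */
ol.choices { list-style-type: upper-latin; }

/* The image is ignored in speech; the UA picks a phrase or cue for 'square'. */
ul.features { list-style-type: square; list-style-image: url(bullet.png); }</pre>
</div>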

<p class=note>Note that it is common for user agents such as screen readers
to announce the nesting depth of list items, or more generally, to
indicate additional structural information pertaining to complex
hierarchical content. The verbosity of these additional audio cues and/or
speech output can usually be controlled by users, which contributes to
usability. These navigation aids are implementation-dependent,
but it is recommended that user agents supporting the CSS Speech module
ensure that these additional audio cues and speech output do not generate
redundancies or create inconsistencies (for example, duplicated or
conflicting list item numbering schemes).

<h2 id=content><span class=secno>13. </span>Inserted and replaced content</h2>

<p class=note>Note that this entire section is non-normative.

<p>Sometimes, authors will want to specify a mapping from the source text
into another string prior to the application of the regular pronunciation
rules. This may be used for uncommon abbreviations or acronyms which are
unlikely to be recognized by the synthesizer. The ‘<a
href="#content-def"><code class=property>content</code></a>’
property can be used to replace one string with another. The functionality
provided by this property is related to the <a
href="http://www.w3.org/TR/speech-synthesis11/#edef_sub"><code>alias</code>
attribute of the <code>sub</code> element</a> from the SSML markup
language <a href="#SSML" rel=biblioentry>[SSML]<!--{{!SSML}}--></a>.

<div class=example>
<p> In this example, the abbreviation is rendered using the content of the
<code>title</code> attribute instead of the element's content.</p>

<pre>
/* This replaces the content of the selected element
   by the string "World Wide Web Consortium". */
abbr { content: attr(title); }

...

&lt;abbr title="World Wide Web Consortium"&gt;W3C&lt;/abbr&gt;</pre>
</div>

<p>In a similar way, text strings in a document can be replaced by a
previously recorded version.

<div class=example>
<p>In this example (assuming the format is supported, the file is
available and the UA is configured to do so), a recording of Sir John
Gielgud's declamation of the famous monologue is played. Otherwise, the UA
falls back to rendering the text using synthesized speech.</p>

<pre>
.hamlet { content: url(./audio/gielgud.wav); }

...

&lt;div class="hamlet"&gt;
To be, or not to be: that is the question:
&lt;/div&gt;</pre>
</div>

<p>Furthermore, authors (or users via a user stylesheet) may add some
information to ease the understanding of structures during non-visual
interaction with the document. They can do so by using the ‘<code
class=css>::before</code>’ and ‘<code
class=css>::after</code>’ pseudo-elements. Note that different
stylesheets can be used to define the level of verbosity for additional
information spoken by screen readers.

<div class=example>
<p>This example inserts the string "Start list: " before a list and the
string "List item: " before the content of each list item. Likewise, the
string "List end. " gets inserted after the list to inform the user that
the list speech output is over.</p>

<pre>
ul::before { content: "Start list: "; }
ul::after { content: "List end. "; }
li::before { content: "List item: "; }</pre>
</div>
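
<div class=example>
<p>Because strings inserted with ‘::before’ and ‘::after’ are also
generated in visual renderings, authors may prefer to restrict such
announcements to aural output. A minimal sketch, assuming the user agent
applies the ‘speech’ media type when speaking the document:</p>

<pre>
@media speech {
  ul::before { content: "Start list: "; }
  ul::after  { content: "List end. "; }
  li::before { content: "List item: "; }
}</pre>
</div>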

<p>Detailed information can be found in the CSS3 Generated and Replaced
Content module <a href="#CSS3GENCON"
rel=biblioentry>[CSS3GENCON]<!--{{CSS3GENCON}}--></a>.

<h2 id=pronunciation><span class=secno>14. </span> Pronunciation, phonemes</h2>

<p class=note>Note that this entire section is non-normative.

<p> CSS does not specify how to define the pronunciation (expressed using a
well-defined phonetic alphabet) of a particular piece of text within the
markup document. A "phonemes" property was described in earlier drafts of
this specification, but objections were raised because it broke the
principle of separation between content and presentation (the "phonemes"
authored within aural CSS stylesheets would have needed to be updated each
time text changed within the markup document). The "phonemes"
functionality is therefore considered out of scope in CSS (the
presentation layer) and should be addressed in the markup / content layer.

<p> The <a
href="http://microformats.org/wiki/rel-pronunciation">"pronunciation"</a>
<code>rel</code> value allows importing pronunciation lexicons into HTML
documents using the <code>link</code> element (similar to how CSS
stylesheets can be included). The W3C PLS (Pronunciation Lexicon
Specification) <a href="#PRONUNCIATION-LEXICON"
rel=biblioentry>[PRONUNCIATION-LEXICON]<!--{{PRONUNCIATION-LEXICON}}--></a>
is one format that can be used to describe such a lexicon.
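
<div class=example>
<p>A minimal sketch of such a link, placed next to an ordinary style sheet
link for comparison. The file names <code>lexicon.pls</code> and
<code>speech.css</code> are hypothetical; <code>application/pls+xml</code>
is the media type registered for PLS documents.</p>

<pre>
&lt;head&gt;
  &lt;link rel="pronunciation" type="application/pls+xml" href="lexicon.pls"&gt;
  &lt;link rel="stylesheet" type="text/css" href="speech.css" media="speech"&gt;
&lt;/head&gt;</pre>
</div>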

<p> Additionally, an attribute-based mechanism can be used within the
markup to author text-pronunciation associations. At the time of writing,
such a mechanism is not formally defined in the W3C HTML standard(s).
However, the <a href="http://idpf.org/epub/30">EPUB 3.0 draft
specification</a> allows (X)HTML5 documents to contain attributes derived
from the <a href="#SSML" rel=biblioentry>[SSML]<!--{{!SSML}}--></a>
specification that describe how to pronounce text based on a particular
phonetic alphabet.</p>
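
<div class=example>
<p>A sketch of this attribute-based approach, modeled on the
<code>ssml:ph</code> and <code>ssml:alphabet</code> attributes that EPUB 3
borrows from SSML. The IPA transcription is illustrative only, and support
for these attributes depends on the reading system.</p>

<pre>
&lt;html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ssml="http://www.w3.org/2001/10/synthesis"&gt;

...

&lt;p&gt;I say &lt;span ssml:alphabet="ipa" ssml:ph="təˈmɑːtəʊ"&gt;tomato&lt;/span&gt;.&lt;/p&gt;</pre>
</div>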

<!-- p>
One avenue to explore is the use of CSS to "bind" HTML text with a
phoneme (also declared in the HTML document). This would maintain a
clear separation between content and presentation, and it would allow
authors to define different pronunciations for one given text token
(Media Queries could drive the switch of stylesheet to import). This
possibility has been mentioned several times by Working Group members
as well as people from the public mailing-list, so it cannot be
ignored. However, there are architectural considerations (e.g.
collision between CSS-defined and HTML-defined phonemes) which make this a
lot trickier to standardize than it sounds. The
whole "speech synthesis" issue should be tackled globally at the level
of the W3C ecosystem. For example, there are many cross-cutting
concerns with the work done by the HTML-Audio and HTML-Speech
Incubator Groups.
</p -->

<h2 class=no-num id=property-index>Appendix A — Property index</h2>
<!--begin-properties-->

<table class=proptable>
<thead>
<tr>
<th>Property
<th>Values
<th>Initial
<th>Applies to
<th>Inh.
<th>Percentages
<th>Media

<tbody>
<tr>
<th><a class=property href="#cue">cue</a>
<td>&lt;‘cue-before’&gt; &lt;‘cue-after’&gt;?
<td>N/A (see individual properties)
<td>all elements
<td>no
<td>N/A
<td>speech

<tr>
<th><a class=property href="#cue-after">cue-after</a>
<td>&lt;uri&gt; &lt;decibel&gt;? | none
<td>none
<td>all elements
<td>no
<td>N/A
<td>speech

<tr>
<th><a class=property href="#cue-before">cue-before</a>
<td>&lt;uri&gt; &lt;decibel&gt;? | none
<td>none
<td>all elements
<td>no
<td>N/A
<td>speech

<tr>
<th><a class=property href="#pause">pause</a>
<td>&lt;‘pause-before’&gt; &lt;‘pause-after’&gt;?
<td>N/A (see individual properties)
<td>all elements
<td>no
<td>N/A
<td>speech

<tr>
<th><a class=property href="#pause-after">pause-after</a>
<td>&lt;time&gt; | none | x-weak | weak | medium | strong | x-strong
<td>none
<td>all elements
<td>no
<td>N/A
<td>speech

<tr>
<th><a class=property href="#pause-before">pause-before</a>
<td>&lt;time&gt; | none | x-weak | weak | medium | strong | x-strong
<td>none
<td>all elements
<td>no
<td>N/A
<td>speech

<tr>
<th><a class=property href="#rest">rest</a>
<td>&lt;‘rest-before’&gt; &lt;‘rest-after’&gt;?
<td>N/A (see individual properties)
<td>all elements
<td>no
<td>N/A
<td>speech

<tr>
<th><a class=property href="#rest-after">rest-after</a>
<td>&lt;time&gt; | none | x-weak | weak | medium | strong | x-strong
<td>none
<td>all elements
<td>no
<td>N/A
<td>speech

<tr>
<th><a class=property href="#rest-before">rest-before</a>
<td>&lt;time&gt; | none | x-weak | weak | medium | strong | x-strong
<td>none
<td>all elements
<td>no
<td>N/A
<td>speech

<tr>
<th><a class=property href="#speak">speak</a>
<td>auto | none | normal
<td>auto
<td>all elements
<td>yes
<td>N/A
<td>speech

<tr>
<th><a class=property href="#speak-as">speak-as</a>
<td>normal | spell-out || digits || [ literal-punctuation |
no-punctuation ]
<td>normal
<td>all elements
<td>yes
<td>N/A
<td>speech

<tr>
<th><a class=property href="#voice-balance">voice-balance</a>
<td>&lt;number&gt; | left | center | right | leftwards | rightwards
<td>center
<td>all elements
<td>yes
<td>N/A
<td>speech

<tr>
<th><a class=property href="#voice-duration">voice-duration</a>
<td>auto | &lt;time&gt;
<td>auto
<td>all elements
<td>no
<td>N/A
<td>speech

<tr>
<th><a class=property href="#voice-family">voice-family</a>
<td>[[&lt;name&gt; | &lt;generic-voice&gt;],]* [&lt;name&gt; |
&lt;generic-voice&gt;] | preserve
<td>implementation-dependent
<td>all elements
<td>yes
<td>N/A
<td>speech

<tr>
<th><a class=property href="#voice-pitch">voice-pitch</a>
<td>&lt;frequency&gt; &amp;&amp; absolute | [[x-low | low | medium |
high | x-high] || [&lt;frequency&gt; | &lt;semitones&gt; |
&lt;percentage&gt;]]
<td>medium
<td>all elements
<td>yes
<td>refer to inherited value
<td>speech

<tr>
<th><a class=property href="#voice-range">voice-range</a>
<td>&lt;frequency&gt; &amp;&amp; absolute | [[x-low | low | medium |
high | x-high] || [&lt;frequency&gt; | &lt;semitones&gt; |
&lt;percentage&gt;]]
<td>medium
<td>all elements
<td>yes
<td>refer to inherited value
<td>speech

<tr>
<th><a class=property href="#voice-rate">voice-rate</a>
<td>[normal | x-slow | slow | medium | fast | x-fast] ||
&lt;percentage&gt;
<td>normal
<td>all elements
<td>yes
<td>refer to default value
<td>speech

<tr>
<th><a class=property href="#voice-stress">voice-stress</a>
<td>normal | strong | moderate | none | reduced
<td>normal
<td>all elements
<td>yes
<td>N/A
<td>speech

<tr>
<th><a class=property href="#voice-volume">voice-volume</a>
<td>silent | [[x-soft | soft | medium | loud | x-loud] ||
&lt;decibel&gt;]
<td>medium
<td>all elements
<td>yes
<td>N/A
<td>speech
</table>
<!--end-properties-->

<p>The following properties are defined in other modules or specifications:

<ul>
<li> <dfn id=display-def> <a
href="http://www.w3.org/TR/CSS21/visuren.html#display-prop"> display
</a></dfn> <a href="#CSS21"
rel=biblioentry>[CSS21]<!--{{!CSS21}}--></a>

<li> <dfn id=padding-def> <a
href="http://www.w3.org/TR/CSS21/box.html#padding-properties"> padding
</a></dfn> <a href="#CSS21"
rel=biblioentry>[CSS21]<!--{{!CSS21}}--></a>

<li> <dfn id=border-def> <a
href="http://www.w3.org/TR/CSS21/box.html#border-properties"> border
</a></dfn> <a href="#CSS21"
rel=biblioentry>[CSS21]<!--{{!CSS21}}--></a>

<li> <dfn id=margin-def> <a
href="http://www.w3.org/TR/CSS21/box.html#margin-properties"> margin
</a></dfn> <a href="#CSS21"
rel=biblioentry>[CSS21]<!--{{!CSS21}}--></a>

<li> <dfn id=font-family-def> <a
href="http://www.w3.org/TR/CSS21/fonts.html#font-family-prop">
font-family </a></dfn> <a href="#CSS21"
rel=biblioentry>[CSS21]<!--{{!CSS21}}--></a>

<li><dfn id=content-def>content</dfn> <a href="#CSS3GENCON"
rel=biblioentry>[CSS3GENCON]<!--{{CSS3GENCON}}--></a>

<li> <dfn id=list-style-type-def> <a
href="http://www.w3.org/TR/CSS21/generate.html#propdef-list-style-type">
list-style-type </a></dfn> <a href="#CSS21"
rel=biblioentry>[CSS21]<!--{{!CSS21}}--></a>

<li> <dfn id=list-style-image-def> <a
href="http://www.w3.org/TR/CSS21/generate.html#propdef-list-style-image">
list-style-image </a></dfn> <a href="#CSS21"
rel=biblioentry>[CSS21]<!--{{!CSS21}}--></a>
</ul>

<p>The following definitions are provided by other modules or
specifications:

<ul>
<li> <dfn id=cascade-def> <a
href="http://www.w3.org/TR/CSS21/cascade.html#cascade"> cascade
</a></dfn> <a href="#CSS21"
rel=biblioentry>[CSS21]<!--{{!CSS21}}--></a>

<li> <dfn id=box-model-def> <a href="http://www.w3.org/TR/CSS21/box.html">
visual box model </a></dfn> <a href="#CSS21"
rel=biblioentry>[CSS21]<!--{{!CSS21}}--></a>

<li> <dfn id=time-def> <a href="http://www.w3.org/TR/css3-values/#times">
time </a></dfn> <a href="#CSS3VAL"
rel=biblioentry>[CSS3VAL]<!--{{!CSS3VAL}}--></a>

<li> <dfn id=frequency-def> <a
href="http://www.w3.org/TR/css3-values/#frequencies"> frequency
</a></dfn> <a href="#CSS3VAL"
rel=biblioentry>[CSS3VAL]<!--{{!CSS3VAL}}--></a>

<li> <dfn id=number-def> <a
href="http://www.w3.org/TR/css3-values/#ltnumbergt"> number </a></dfn>
<a href="#CSS3VAL"
rel=biblioentry>[CSS3VAL]<!--{{!CSS3VAL}}--></a>

<li> <dfn id=integer-def> <a
href="http://www.w3.org/TR/css3-values/#ltintegergt"> integer </a></dfn>
<a href="#CSS3VAL"
rel=biblioentry>[CSS3VAL]<!--{{!CSS3VAL}}--></a>

<li> <dfn id=non-negative-number-def> <a
href="http://www.w3.org/TR/css3-values/#non-negative">
non-negative-number </a></dfn> <a href="#CSS3VAL"
rel=biblioentry>[CSS3VAL]<!--{{!CSS3VAL}}--></a>

<li> <dfn id=percentage-def> <a
href="http://www.w3.org/TR/css3-values/#percentages"> percentage
</a></dfn> <a href="#CSS3VAL"
rel=biblioentry>[CSS3VAL]<!--{{!CSS3VAL}}--></a>

<li> <dfn id=identifier-def> <a
href="http://www.w3.org/TR/CSS21/syndata.html#value-def-identifier">
identifier </a></dfn> <a href="#CSS21"
rel=biblioentry>[CSS21]<!--{{!CSS21}}--></a>

<li> <dfn id=strings-def> <a
href="http://www.w3.org/TR/CSS21/syndata.html#strings"> strings
</a></dfn> <a href="#CSS21"
rel=biblioentry>[CSS21]<!--{{!CSS21}}--></a>
</ul>

<h2 class=no-num id=index>Appendix B — Index</h2>
<!--begin-index-->

<ul class=indexlist>
<li>aural "box" model, <a href="#aural-box-model" title="aural
&quot;box&quot; model"><strong>4.</strong></a>

<li>authoring tool, <a href="#authoring-tool" title="authoring
tool"><strong>#</strong></a>

<li>border, <a href="#border-def" title=border><strong>#</strong></a>

<li>cascade, <a href="#cascade-def" title=cascade><strong>#</strong></a>

<li>content, <a href="#content-def" title=content><strong>#</strong></a>

<li>cue, <a href="#cue" title=cue><strong>9.2.</strong></a>

<li>cue-after, <a href="#cue-after"
title=cue-after><strong>9.1.</strong></a>

<li>cue-before, <a href="#cue-before"
title=cue-before><strong>9.1.</strong></a>

<li>display, <a href="#display-def" title=display><strong>#</strong></a>

<li>document, <a href="#document" title=document><strong>#</strong></a>

<li>documents, <a href="#document" title=documents><strong>#</strong></a>

<li>font-family, <a href="#font-family-def"
title=font-family><strong>#</strong></a>

<li>frequency, <a href="#frequency-def"
title=frequency><strong>#</strong></a>

<li>identifier, <a href="#identifier-def"
title=identifier><strong>#</strong></a>

<li>integer, <a href="#integer-def" title=integer><strong>#</strong></a>

<li>list-style-image, <a href="#list-style-image-def"
title=list-style-image><strong>#</strong></a>

<li>list-style-type, <a href="#list-style-type-def"
title=list-style-type><strong>#</strong></a>

<li>margin, <a href="#margin-def" title=margin><strong>#</strong></a>

<li>non-negative-number, <a href="#non-negative-number-def"
title=non-negative-number><strong>#</strong></a>

<li>number, <a href="#number-def" title=number><strong>#</strong></a>

<li>padding, <a href="#padding-def" title=padding><strong>#</strong></a>

<li>pause, <a href="#pause" title=pause><strong>7.2.</strong></a>

<li>pause-after, <a href="#pause-after"
title=pause-after><strong>7.1.</strong></a>

<li>pause-before, <a href="#pause-before"
title=pause-before><strong>7.1.</strong></a>

<li>percentage, <a href="#percentage-def"
title=percentage><strong>#</strong></a>

<li>renderer, <a href="#renderer" title=renderer><strong>#</strong></a>

<li>rest, <a href="#rest" title=rest><strong>8.2.</strong></a>

<li>rest-after, <a href="#rest-after"
title=rest-after><strong>8.1.</strong></a>

<li>rest-before, <a href="#rest-before"
title=rest-before><strong>8.1.</strong></a>

<li>speak, <a href="#speak" title=speak><strong>6.1.</strong></a>

<li>speak-as, <a href="#speak-as" title=speak-as><strong>6.2.</strong></a>

<li>strings, <a href="#strings-def" title=strings><strong>#</strong></a>

<li>style sheet, <a href="#style-sheet" title="style
sheet"><strong>#</strong></a>
<ul>
<li>as conformance class, <a href="#style-sheet0" title="style sheet, as
conformance class"><strong>#</strong></a>
</ul>

<li>time, <a href="#time-def" title=time><strong>#</strong></a>

<li>UA, <a href="#ua" title=UA><strong>#</strong></a>

<li>User Agent, <a href="#user-agent" title="User
Agent"><strong>#</strong></a>

<li>visual box model, <a href="#box-model-def" title="visual box
model"><strong>#</strong></a>

<li>voice-balance, <a href="#voice-balance"
title=voice-balance><strong>5.2.</strong></a>

<li>voice-duration, <a href="#voice-duration"
title=voice-duration><strong>11.1.</strong></a>

<li>voice-family, <a href="#voice-family"
title=voice-family><strong>10.1.</strong></a>

<li>voice-pitch, <a href="#voice-pitch"
title=voice-pitch><strong>10.3.</strong></a>

<li>voice-range, <a href="#voice-range"
title=voice-range><strong>10.4.</strong></a>

<li>voice-rate, <a href="#voice-rate"
title=voice-rate><strong>10.2.</strong></a>

<li>voice-stress, <a href="#voice-stress"
title=voice-stress><strong>10.5.</strong></a>

<li>voice-volume, <a href="#voice-volume"
title=voice-volume><strong>5.1.</strong></a>
</ul>
<!--end-index-->

<h2 class=no-num id=definitions>Appendix C — Definitions</h2>

<h3 class=no-num id=glossary>Glossary</h3>

<p>The following terms and abbreviations are used in this module.

<dl>
<dt> <dfn id=ua>UA</dfn>

<dt> <dfn id=user-agent>User Agent</dfn>

<dd>
<p>A program that reads and/or writes CSS style sheets on behalf of a
user in either or both of these categories: programs whose purpose is to
render <a href="#document">documents</a> (e.g., browsers) and programs
whose purpose is to create style sheets (e.g., editors). A UA may fall
into both categories. (There are other programs that read or write style
sheets, but this module gives no rules for them.)</p>

<dt> <dfn id=document title="document|documents">document</dfn>

<dd>
<p>A tree-structured document with elements and attributes, such as an
SGML or XML document <a href="#XML11"
rel=biblioentry>[XML11]<!--{{!XML11}}--></a>.</p>

<dt> <dfn id=style-sheet>style sheet</dfn>

<dd>
<p>A <a href="http://www.w3.org/TR/CSS21/conform.html#style-sheet">CSS
style sheet</a>.</p>
</dl>

<h3 class=no-num id=conformance>Conformance</h3>

<p>Conformance requirements are expressed with a combination of descriptive
assertions and RFC 2119 terminology. The key words "MUST", "MUST NOT",
"REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED",
"MAY", and "OPTIONAL" in the normative parts of this document are to be
interpreted as described in RFC 2119. However, for readability, these
words do not appear in all uppercase letters in this specification. All of
the text of this specification is normative except sections explicitly
marked as non-normative, examples, and notes. <a href="#RFC2119"
rel=biblioentry>[RFC2119]<!--{{!RFC2119}}--></a>

<p>Examples in this specification are introduced with the words "for
example" or are set apart from the normative text with
<code>class="example"</code>, like this:

<div class=example>
<p>This is an example of an informative example.</p>
</div>

<p>Informative notes begin with the word "Note" and are set apart from the
normative text with <code>class="note"</code>, like this:

<p class=note>Note, this is an informative note.

<p>Conformance to the CSS3 Speech module is defined for three classes:

<dl>
<dt> <dfn id=style-sheet0 title="style sheet!!as conformance class">style
sheet</dfn>

<dd>A <a href="http://www.w3.org/TR/CSS21/conform.html#style-sheet">CSS
style sheet</a>.

<dt> <dfn id=renderer>renderer</dfn>

<dd>A UA that interprets the semantics of a style sheet and renders <a
href="#document">documents</a> that use them.

<dt> <dfn id=authoring-tool>authoring tool</dfn>

<dd>A UA that writes a style sheet.
</dl>

<p>A style sheet is conformant to the CSS3 Speech module if all of its
declarations that use properties defined in this module have values that
are valid according to the generic CSS grammar and the individual grammars
of each property as given in this module.
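
<div class=example>
<p>For instance, the first two declarations below are valid according to
the grammars in this module, whereas the last two are not (a negative
‘voice-duration’ and an unknown ‘voice-stress’ keyword), so a style sheet
containing them would not be conformant. The selectors are arbitrary.</p>

<pre>
h1 { voice-duration: 2s; }
em { voice-stress: strong; }

h2 { voice-duration: -2s; }     /* invalid: negative &lt;time&gt; */
strong { voice-stress: loud; }  /* invalid: not a voice-stress keyword */</pre>
</div>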

<p>A renderer is conformant to the CSS3 Speech module if, in addition to
interpreting the style sheet as defined by the appropriate specifications,
it supports all the properties defined by the CSS3 Speech module by parsing
them correctly and rendering the document accordingly. However, the
inability of a UA to correctly render a document due to limitations of the
device does not make the UA non-conformant. (For example, a UA is not
required to render color on a monochrome monitor.)

<p>An authoring tool is conformant to the CSS3 Speech module if it writes
syntactically correct style sheets, according to the generic CSS grammar
and the individual grammars of each property in this module.</p>

<!-- h3 class="no-num" id="levels">Levels</h3>

<p><em>This section is informative.</em> CSS has different levels of
features, each a subset of the other. (See [[CSSBEIJING]] for a full
explanation.) The lists below describe which features from this
specification are in each level.

<h4 class="no-num" id="level-1">CSS Level 1</h4>
<ul>
<li>'background-color'
<li>'background-image' only one image (no layers)
<li>'background-repeat': only 'repeat' | 'repeat-x' | 'repeat-y' | 'no-repeat'
<li>'background-attachment': only 'scroll' | 'fixed'
<li>'background-position': only one or two values allowed
<li>'background' shorthand: only color, image, repeat, attachment and position
<li>'border-color' properties
<li>'border-style' properties
<li>'border-width' properties
<li>'border-top', 'border-bottom', 'border-right', 'border-left', and 'border' shorthands
</ul>

<h4 class="no-num" id="level-2">CSS Level 2</h4>
<ul>
<li>'background-color'
<li>'background-image': only one image (no layers)
<li>'background-repeat': only 'repeat' | 'repeat-x' | 'repeat-y' | 'no-repeat'
<li>'background-attachment': only 'scroll' | 'fixed'
<li>'background-position': only one or two values allowed
<li>'background': only color, image, repeat, attachment and position
<li>'border-color' properties
<li>'border-style' properties
<li>'border-width' properties
<li>'border-top', 'border-bottom', 'border-right', 'border-left', and 'border' shorthands
</ul>

<h4 class="no-num" id="level-3">CSS Level 3</h4>
<ul>
<li>All features described in the CSS3 Speech module
</ul -->

<h3 class=no-num id=exit>CR exit criteria</h3>
|
|
|
|
<p>As described in the W3C process document, a <a
|
|
href="http://www.w3.org/2005/10/Process-20051014/tr.html#cfi">Candidate
|
|
Recommendation</a> (CR) is a specification that W3C recommends for use on
|
|
the Web. The next stage is "Recommendation" when the specification is
|
|
sufficiently implemented.
|
|
|
|
<p>For this specification to be proposed as a W3C Recommendation, the
|
|
following conditions shall be met. There must be at least two independent,
|
|
interoperable implementations of each feature. Each feature may be
|
|
implemented by a different set of products, there is no requirement that
|
|
all features be implemented by a single product. For the purposes of this
|
|
criterion, we define the following terms:
|
|
|
|
<dl>
|
|
<dt>independent
|
|
|
|
<dd>each implementation must be developed by a different party and cannot
|
|
share, reuse, or derive from code used by another qualifying
|
|
implementation. Sections of code that have no bearing on the
|
|
implementation of this specification are exempt from this requirement.
|
|
|
|
<dt>interoperable

<dd>passing the respective test case(s) in the official CSS test suite,
or, if the implementation is not a Web browser, an equivalent test. Every
relevant test in the test suite should have an equivalent test created if
such a user agent (UA) is to be used to claim interoperability. In
addition, if such a UA is to be used to claim interoperability, then there
must be one or more additional UAs which can also pass those equivalent
tests in the same way for the purpose of interoperability. The equivalent
tests must be made publicly available for the purposes of peer review.

<dt>implementation

<dd>a user agent which:
<ol class=inline>
<li>implements the specification.

<li>is available to the general public. The implementation may be a
shipping product or other publicly available version (i.e., beta
version, preview release, or "nightly build"). Non-shipping product
releases must have implemented the feature(s) for a period of at least
one month in order to demonstrate stability.

<li>is not experimental (i.e., a version specifically designed to pass
the test suite and not intended for normal usage going forward).
</ol>
</dl>

<p>A minimum of six months of the CR period must have elapsed. This is to
ensure that enough time is given for any remaining major errors to be
caught.

<p>Features will be dropped if two or more interoperable implementations
are not found by the end of the CR period.

<p>Features may also be dropped if sufficient tests (as judged by the CSS
WG) have not been produced for those features by the end of the CR
period.

<h2 class=no-num id=ack>Appendix D — Acknowledgements</h2>

<p>The editors would like to thank the members of the W3C Voice Browser and
Cascading Style Sheets working groups for their assistance in preparing
this specification. Special thanks to Ellen Eide (IBM) for her detailed
comments, and to Elika Etemad (Fantasai) for her thorough reviews.

<h2 class=no-num id=changes>Appendix E — Changes from previous draft</h2>

<p>Note that the <a
href="http://www.w3.org/TR/2011/WD-css3-speech-20110419">previous Working
Draft</a> includes <a
href="http://www.w3.org/TR/2011/WD-css3-speech-20110419#changes">its own
list of changes</a>, which, for brevity, is not repeated here.

<ul>
<li>Renamed ‘<code class=property>voice-pitch-range</code>’ to
‘<a href="#voice-range"><code
class=property>voice-range</code></a>’, which is compatible with
SSML's notation and removes the possibility of interpreting this
property as a subset of ‘<a href="#voice-pitch"><code
class=property>voice-pitch</code></a>’.

<li>Fixed the "computed value" of the ‘<a href="#voice-pitch"><code
class=property>voice-pitch</code></a>’ and ‘<a
href="#voice-range"><code class=property>voice-range</code></a>’
properties, and added the ability to combine a keyword with a
relative change.

<li>Removed the "phonemes" property (and its associated "@alphabet"
at-rule).

<li>Renamed ‘<code class=property>speakability</code>’ to
‘<a href="#speak"><code class=property>speak</code></a>’, and
‘<a href="#speak"><code class=property>speak</code></a>’ to
‘<a href="#speak-as"><code
class=property>speak-as</code></a>’. Reorganized the ‘<a
href="#speak-as"><code class=property>speak-as</code></a>’ values
to allow mixing different types.

<li>Added support for lists and counters (item styles, numbering, etc.).

<li>Adjusted the [initial] value for shorthand properties to be
consistent with other CSS specifications (i.e., "see individual
properties"), and removed the erroneous "inherit" value.

<li>Fixed ‘<a href="#voice-volume"><code
class=property>voice-volume</code></a>’ to conform to SSML 1.1
(dB scale, etc.).

<li>Fixed the [initial] values for ‘<a href="#pause"><code
class=property>pause</code></a>’ and ‘<a href="#rest"><code
class=property>rest</code></a>’, which are now zero (previously
"implementation-dependent").

<li>Corrected the [initial] values for ‘<a href="#voice-range"><code
class=property>voice-range</code></a>’ and ‘<a
href="#voice-pitch"><code class=property>voice-pitch</code></a>’ to
"medium".

<li>Added an "auto" value to ‘<a href="#voice-duration"><code
class=property>voice-duration</code></a>’, which is also the [initial]
value of the property.

<li>Defined the handling of ‘<a href="#voice-balance"><code
class=property>voice-balance</code></a>’ values outside of the
allowed range (clamping).

<li>Fixed ‘<a href="#voice-balance"><code
class=property>voice-balance</code></a>’ prose to better explain
the relationship between author intent (stereo sound distribution) and
actual user sound system setup (mono, stereo, or surround speaker layout
/ mixing capabilities).

<li>Added prose for ‘<a href="#voice-balance"><code
class=property>voice-balance</code></a>’ to describe the mapping
between the stereo left-right sound axis and a three-dimensional sound
stage (azimuth support in future versions of CSS-Speech).

<li>Fixed the "computed value" for ‘<a href="#voice-balance"><code
class=property>voice-balance</code></a>’.

<li>Added the ‘<code class=property>normal</code>’ value for
voice-rate ("default" in SSML 1.1).

<li>Fixed the "computed value" for voice-rate, and added the possibility
to combine keywords and percentages (to be consistent with ‘<a
href="#voice-volume"><code
class=property>voice-volume</code></a>’). Added an example to
illustrate inheritance and value resolution.

<li>Renamed voice-family fields to be consistent with SSML.

<li>Improved the ‘<a href="#voice-family"><code
class=property>voice-family</code></a>’ selection algorithm to
cater for language changes.

<li>Separated the definition of semitones (pitch properties).

<li>Made the behavior more consistent when an audio cue URI fails (for
whatever reason).

<li>Allowed voice-family names to contain spaces, matching the ‘<code
class=property>font-family</code>’ syntax, which is based on quoted
strings and concatenated identifiers.

<li>Added a new section to define the relationship of this specification
with CSS2.1.

<li>Added the missing "Computed value" line to each property definition.

<li>Cleaned up the list of module dependencies, and removed the redundant
"module dependencies" section.

<li>Voice age keywords are now mapped to SSML ages.

<li>Improved the pause-collapsing prose and removed redundant paragraphs.

<li>Added the missing ‘<code class=property>normal</code>’
value for ‘<a href="#voice-stress"><code
class=property>voice-stress</code></a>’.

<li>Separated the ‘<code class=property>absolute</code>’
keyword for ‘<a href="#voice-pitch"><code
class=property>voice-pitch</code></a>’ and ‘<a
href="#voice-range"><code class=property>voice-range</code></a>’.

<li>Improved document structure by adding sub-sections.

<li>Removed the implicit ‘<code class=property>inherit</code>’
value for all properties.

<li>Fixed typos and made other minor edits.
</ul>
<!-- For reference only, changes in previous draft: -->
<!-- ul>
<li>Removed the "mark" property, see the <a href="http://lists.w3.org/Archives/Public/www-style/2011Feb/0029.html">Working Group resolution</a></li>
<li>Added the 'speakability' property and removed the 'none' value of the 'speak' property,
as per this <a href="http://lists.w3.org/Archives/Public/www-style/2011Jan/0483.html">discussion</a></li>
<li>Fixed 'voice-family' grammar as per <a href="http://lists.w3.org/Archives/Public/www-style/2010Dec/0231.html">this discussion</a></li>
<li>The volume level of audio cues can only be set relative to the inherited 'voice-volume' property (to avoid cues being spoken when the main element is silent, which contradicts the "aural box model").</li>
<li>Added "HTML" to "CSS defines aural properties that give control over rendering
XML to speech" in the abstract.</li>
<li>Removed unused normative links to CSS3 modules (actually moved to informative references); now the only dependency is CSS3 Values and Units.</li>
<li>Removed issue about the 'sub' SSML element given that the CSS "content" replacement functionality addresses the same requirement.</li>
<li>Added support for semitones in pitch alterations.</li>
<li>Added reference to "time" values syntax (s, ms) for 'voice-duration'.</li>
<li>Moved "content" outside of "phonetics", as the ::before and ::after use-cases do not relate to pronunciation rules (this is actually more similar to audio cues, only applied with text rather than audio files)</li>
<li>Added prose to explicitly support alphabets other than IPA, via the "x-" vendor-specific prefix.</li>
<li>Reworked HTML source code to work with the <a href="http://cgi.w3.org/member-bin/process.cgi">members-only W3C pre-processor/generator</a></li>
<li>Added note about the "speech" and "aural" media types.</li>
<li>Harmonized all hyperlinks so that CSS properties get auto-linked by the pre-processor</li>
<li>Clarified computation rules for positive percentages with "+" prefixes (i.e., they do not denote increments; the regular multiplicative behavior is used).</li>
<li>Fixed IPA URL reference</li>
<li>Reorganized appendixes</li>
<li>Fixed minor typos</li>
</ul -->

<h2 class=no-num id=references>Appendix F — References</h2>

<h3 class=no-num id=normative-references>Normative references</h3>
<!--begin-normative-->
<!-- Sorted by label -->

<dl class=bibliography>
<dt style="display: none"><!-- keeps the doc valid if the DL is empty -->
<!---->

<dt id=CSS21>[CSS21]

<dd>Bert Bos; et al. <a
href="http://www.w3.org/TR/2011/REC-CSS2-20110607"><cite>Cascading Style
Sheets Level 2 Revision 1 (CSS 2.1) Specification.</cite></a> 7 June
2011. W3C Recommendation. URL: <a
href="http://www.w3.org/TR/2011/REC-CSS2-20110607">http://www.w3.org/TR/2011/REC-CSS2-20110607</a>
</dd>
<!---->

<dt id=CSS3VAL>[CSS3VAL]

<dd>Håkon Wium Lie; Chris Lilley. <a
href="http://www.w3.org/TR/2006/WD-css3-values-20060919"><cite>CSS3
Values and Units.</cite></a> 19 September 2006. W3C Working Draft. (Work
in progress.) URL: <a
href="http://www.w3.org/TR/2006/WD-css3-values-20060919">http://www.w3.org/TR/2006/WD-css3-values-20060919</a>
</dd>
<!---->

<dt id=RFC2119>[RFC2119]

<dd>S. Bradner. <a href="http://www.ietf.org/rfc/rfc2119.txt"><cite>Key
words for use in RFCs to Indicate Requirement Levels.</cite></a> Internet
RFC 2119. URL: <a
href="http://www.ietf.org/rfc/rfc2119.txt">http://www.ietf.org/rfc/rfc2119.txt</a>
</dd>
<!---->

<dt id=SSML>[SSML]

<dd>Daniel C. Burnett; 双志伟 (Zhi Wei Shuang). <a
href="http://www.w3.org/TR/2010/REC-speech-synthesis11-20100907/"><cite>Speech
Synthesis Markup Language (SSML) Version 1.1.</cite></a> 7 September
2010. W3C Recommendation. URL: <a
href="http://www.w3.org/TR/2010/REC-speech-synthesis11-20100907/">http://www.w3.org/TR/2010/REC-speech-synthesis11-20100907/</a>
</dd>
<!---->

<dt id=XML11>[XML11]

<dd>Eve Maler; et al. <a
href="http://www.w3.org/TR/2006/REC-xml11-20060816"><cite>Extensible
Markup Language (XML) 1.1 (Second Edition).</cite></a> 16 August 2006.
W3C Recommendation. URL: <a
href="http://www.w3.org/TR/2006/REC-xml11-20060816">http://www.w3.org/TR/2006/REC-xml11-20060816</a>
</dd>
<!---->
</dl>
<!--end-normative-->

<h3 class=no-num id=other-references>Other references</h3>
<!--begin-informative-->
<!-- Sorted by label -->

<dl class=bibliography>
<dt style="display: none"><!-- keeps the doc valid if the DL is empty -->
<!---->

<dt id=CSS3GENCON>[CSS3GENCON]

<dd>Ian Hickson. <a
href="http://www.w3.org/TR/2003/WD-css3-content-20030514"><cite>CSS3
Generated and Replaced Content Module.</cite></a> 14 May 2003. W3C
Working Draft. (Work in progress.) URL: <a
href="http://www.w3.org/TR/2003/WD-css3-content-20030514">http://www.w3.org/TR/2003/WD-css3-content-20030514</a>
</dd>
<!---->

<dt id=CSS3LIST>[CSS3LIST]

<dd>Tab Atkins Jr. <a
href="http://www.w3.org/TR/2011/WD-css3-lists-20110524"><cite>CSS Lists
and Counters Module Level 3.</cite></a> 24 May 2011. W3C Working Draft.
(Work in progress.) URL: <a
href="http://www.w3.org/TR/2011/WD-css3-lists-20110524">http://www.w3.org/TR/2011/WD-css3-lists-20110524</a>
</dd>
<!---->

<dt id=PRONUNCIATION-LEXICON>[PRONUNCIATION-LEXICON]

<dd>Paolo Baggia. <a
href="http://www.w3.org/TR/2008/REC-pronunciation-lexicon-20081014/"><cite>Pronunciation
Lexicon Specification (PLS) Version 1.0.</cite></a> 14 October 2008. W3C
Recommendation. URL: <a
href="http://www.w3.org/TR/2008/REC-pronunciation-lexicon-20081014/">http://www.w3.org/TR/2008/REC-pronunciation-lexicon-20081014/</a>
</dd>
<!---->

<dt id=SSML-SAYAS>[SSML-SAYAS]

<dd>Daniel C. Burnett; et al. <a
href="http://www.w3.org/TR/2005/NOTE-ssml-sayas-20050526"><cite>SSML 1.0
say-as attribute values.</cite></a> 26 May 2005. W3C Working Group Note.
URL: <a
href="http://www.w3.org/TR/2005/NOTE-ssml-sayas-20050526">http://www.w3.org/TR/2005/NOTE-ssml-sayas-20050526</a>
</dd>
<!---->
</dl>
<!--end-informative-->
</html>
<!-- Keep this comment at the end of the file
Local variables:
mode: sgml
sgml-default-doctype-name:"html"
sgml-minimize-attributes:t
End:
-->