<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-us" lang="en-us">
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
<title>Use cases and requirements for Media Fragments</title>
<style type="text/css">
/**/
code { font-family: monospace; }

div.constraint,
div.issue,
div.note,
div.notice { margin-left: 2em; }

ol.enumar { list-style-type: decimal; }
ol.enumla { list-style-type: lower-alpha; }
ol.enumlr { list-style-type: lower-roman; }
ol.enumua { list-style-type: upper-alpha; }
ol.enumur { list-style-type: upper-roman; }

dt.label { display: run-in; }

li, p { margin-top: 0.3em;
  margin-bottom: 0.3em; }

.diff-chg { background-color: yellow; }
.diff-del { background-color: red; text-decoration: line-through;}
.diff-add { background-color: lime; }

table { empty-cells: show; }

table caption {
  font-weight: normal;
  font-style: italic;
  text-align: left;
  margin-bottom: .5em;
}

div.issue {
  color: red;
}
.rfc2119 {
  font-variant: small-caps;
}

div.exampleInner pre { margin-left: 1em;
  margin-top: 0em; margin-bottom: 0em}
div.exampleOuter {border: 4px double gray;
  margin: 0em; padding: 0em}
div.exampleInner { background-color: #d5dee3;
  border-top-width: 4px;
  border-top-style: double;
  border-top-color: #d3d3d3;
  border-bottom-width: 4px;
  border-bottom-style: double;
  border-bottom-color: #d3d3d3;
  padding: 4px; margin: 0em }
div.exampleWrapper { margin: 4px }
div.exampleHeader { font-weight: bold;
  margin: 4px}

div.boxedtext {
  border: solid #bebebe 1px;
  margin: 2em 1em 1em 2em;
}

span.practicelab {
  margin: 1.5em 0.5em 1em 1em;
  font-weight: bold;
  font-style: italic;
}

span.practicelab { background: #dfffff; }

span.practicelab {
  position: relative;
  padding: 0 0.5em;
  top: -1.5em;
}
p.practice
{
  margin: 1.5em 0.5em 1em 1em;
}

@media screen {
  p.practice {
    position: relative;
    top: -2em;
    padding: 0;
    margin: 1.5em 0.5em -1em 1em;
  }
}
/**/ </style>
<link type="text/css" rel="stylesheet"
href="http://www.w3.org/StyleSheets/TR/W3C-WD.css" />
</head>

<body>

<div class="head">
<p><a href="http://www.w3.org/"><img width="72" height="48" alt="W3C"
src="http://www.w3.org/Icons/w3c_home" /></a></p>

<h1><a id="title" name="title"></a>Use cases and requirements for Media
Fragments</h1>

<h2><a id="w3c-doctype" name="w3c-doctype"></a>W3C Working Draft 17 December
2009</h2>
<dl>
<dt>This version:</dt>
<dd><a
href="http://www.w3.org/TR/2009/WD-media-frags-reqs-20091217">http://www.w3.org/TR/2009/WD-media-frags-reqs-20091217</a>
</dd>
<dt>Latest version:</dt>
<dd><a
href="http://www.w3.org/TR/media-frags-reqs">http://www.w3.org/TR/media-frags-reqs</a>
</dd>
<dt>Previous version:</dt>
<dd><a
href="http://www.w3.org/TR/2009/WD-media-frags-reqs-20090430">http://www.w3.org/TR/2009/WD-media-frags-reqs-20090430</a>
</dd>
<dt>Editors:</dt>
<dd><a href="http://www.eurecom.fr/~troncy/">Raphaël Troncy </a>, Center
for Mathematics and Computer Science (CWI Amsterdam)</dd>
<dd><a href="mailto:erik.mannens@ugent.be">Erik Mannens </a>, IBBT
Multimedia Lab, University of Ghent</dd>
<dt>Contributors:</dt>
<dd><a href="http://www.cwi.nl/~jack/">Jack Jansen </a>, Center for
Mathematics and Computer Science (CWI Amsterdam)</dd>
<dd><a href="http://www.w3.org/People/Lafon/">Yves Lafon </a>, W3C</dd>
<dd><a href="http://blog.gingertech.net/">Silvia Pfeiffer </a>, W3C Invited
Expert</dd>
<dd><a href="mailto:davy.vandeursen@ugent.be">Davy van Deursen </a>, IBBT
Multimedia Lab, University of Ghent</dd>
</dl>

<p class="copyright"><a
href="http://www.w3.org/Consortium/Legal/ipr-notice#Copyright">Copyright</a> © 2009 <a
href="http://www.w3.org/"><acronym
title="World Wide Web Consortium">W3C</acronym></a><sup>®</sup> (<a
href="http://www.csail.mit.edu/"><acronym
title="Massachusetts Institute of Technology">MIT</acronym></a>, <a
href="http://www.ercim.org/"><acronym
title="European Research Consortium for Informatics and Mathematics">ERCIM</acronym></a>,
<a href="http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. W3C <a
href="http://www.w3.org/Consortium/Legal/ipr-notice#Legal_Disclaimer">liability</a>,
<a
href="http://www.w3.org/Consortium/Legal/ipr-notice#W3C_Trademarks">trademark</a>
and <a href="http://www.w3.org/Consortium/Legal/copyright-documents">document
use</a> rules apply.</p>
</div>
<hr />

<div>
<h2><a id="abstract" name="abstract"></a>Abstract</h2>

<p>This document describes use cases and requirements for the development of
the Media Fragments 1.0 specification. It includes a survey of existing
technologies for addressing fragments of multimedia documents. </p>
</div>

<div>
<h2><a id="status" name="status"></a>Status of this Document</h2>

<p><em>This section describes the status of this document at the time of its
publication. Other documents may supersede this document. A list of current W3C
publications and the latest revision of this technical report can be found in
the <a href="http://www.w3.org/TR/">W3C technical reports index</a> at
http://www.w3.org/TR/.</em></p>

<p>This is the <a
href="http://www.w3.org/2005/10/Process-20051014/tr.html#first-wd">First Public
Working Draft</a> of the Use cases and requirements for Media Fragments
specification. It has been produced by the <a
href="http://www.w3.org/2008/WebVideo/Fragments/">Media Fragments Working
Group</a>, which is part of the <a href="http://www.w3.org/2008/WebVideo/">W3C
Video on the Web Activity</a>.</p>

<p>A list of changes is available in <a href="#change-log"><b>E Change
Log</b></a>. </p>

<p>Please send comments about this document to the <a
href="mailto:public-media-fragment@w3.org">public-media-fragment@w3.org</a>
mailing list (<a
href="http://lists.w3.org/Archives/Public/public-media-fragment/">public
archive</a>).</p>

<p>Publication as a Working Draft does not imply endorsement by the W3C
Membership. This is a draft document and may be updated, replaced or obsoleted
by other documents at any time. It is inappropriate to cite this document as
other than work in progress. </p>

<p></p>

<p>This document was produced by a group operating under the <a
href="http://www.w3.org/Consortium/Patent-Policy-20040205/">5 February 2004 W3C
Patent Policy</a>. W3C maintains a <a rel="disclosure"
href="http://www.w3.org/2004/01/pp-impl/42785/status">public list of any patent
disclosures</a> made in connection with the deliverables of the group; that
page also includes instructions for disclosing a patent. An individual who has
actual knowledge of a patent which the individual believes contains <a
href="http://www.w3.org/Consortium/Patent-Policy-20040205/#def-essential">Essential
Claim(s)</a> must disclose the information in accordance with <a
href="http://www.w3.org/Consortium/Patent-Policy-20040205/#sec-Disclosure">section
6 of the W3C Patent Policy</a>. </p>
</div>

<div class="toc">
<h2><a id="contents" name="contents"></a>Table of Contents</h2>

<p class="toc">1 <a href="#introduction">Introduction</a><br />
2 <a href="#terminology">Terminology</a><br />
3 <a href="#side-conditions">Side Conditions</a><br />
3.1 <a href="#side1">Single Media Resource Definition</a><br />
3.2 <a href="#side2">Existing Standards</a><br />
3.3 <a href="#side3">Unique Resource</a><br />
3.4 <a href="#side4">Valid Resource</a><br />
3.5 <a href="#side5">Parent Resource</a><br />
3.6 <a href="#side6">Single Fragment</a><br />
3.7 <a href="#side7">Relevant Protocols</a><br />
3.8 <a href="#side8">No Recompression</a><br />
3.9 <a href="#side9">Minimize Impact on Existing Infrastructure</a><br />
3.10 <a href="#side10">Focus for Changes</a><br />
3.11 <a href="#side11">Browser Impact</a><br />
3.12 <a href="#side12">Fallback Action</a><br />
4 <a href="#use-cases">Use Cases</a><br />
4.1 <a href="#uc1">Linking to and Display of Media Fragments</a><br />
4.1.1 <a href="#scenario1.1">Scenario 1: Retrieve only segment of a video</a><br />
4.1.2 <a href="#scenario1.2">Scenario 2: Region of an Image</a><br />
4.1.3 <a href="#scenario1.3">Scenario 3: Portion of Music</a><br />
4.1.4 <a href="#scenario1.4">Scenario 4: Image Region of video over time</a><br />
4.2 <a href="#uc2">Browsing and Bookmarking Media Fragments</a><br />
4.2.1 <a href="#scenario2.1">Scenario 1: Temporal Video Pagination</a><br />
4.2.2 <a href="#scenario2.2">Scenario 2: Audio Passage Bookmark</a><br />
4.2.3 <a href="#scenario2.3">Scenario 3: Audio Navigation</a><br />
4.2.4 <a href="#scenario2.4">Scenario 4: Caption and chapter tracks for browsing Video</a><br />
4.2.5 <a href="#scenario2.5">Scenario 5: Jumping back in time during live streaming</a><br />
4.2.6 <a href="#scenario2.6">Scenario 6: Jumping to a particular event in a live stream</a><br />
4.3 <a href="#uc3">Recompositing Media Fragments </a><br />
4.3.1 <a href="#scenario3.1">Scenario 1: Reframing a photo in a slideshow</a><br />
4.3.2 <a href="#scenario3.2">Scenario 2: Mosaic</a><br />
4.3.3 <a href="#scenario3.3">Scenario 3: Video Mashup</a><br />
4.3.4 <a href="#scenario3.4">Scenario 4: Spatial Video Navigation</a><br />
4.3.5 <a href="#scenario3.5">Scenario 5: Selective previews</a><br />
4.3.6 <a href="#scenario3.6">Scenario 6: Music Samples</a><br />
4.3.7 <a href="#scenario3.7">Scenario 7: Highlighting regions (out-of-scope)</a><br />
4.4 <a href="#uc4">Annotating Media Fragments</a><br />
4.4.1 <a href="#scenario4.1">Scenario 1: Spatial Tagging of Images</a><br />
4.4.2 <a href="#scenario4.2">Scenario 2: Temporal Tagging of Audio and Video</a><br />
4.4.3 <a href="#scenario4.3">Scenario 3: Named Anchors</a><br />
4.4.4 <a href="#scenario4.4">Scenario 4: Spatial and Temporal Tagging</a><br />
4.4.5 <a href="#scenario4.5">Scenario 5: Search Engine</a><br />
4.5 <a href="#uc5">Adapting Media Resources</a><br />
4.5.1 <a href="#scenario5.1">Scenario 1: Changing Video quality (out-of-scope)</a><br />
4.5.2 <a href="#scenario5.2">Scenario 2: Selecting Regions in Images </a><br />
4.5.3 <a href="#scenario5.3">Scenario 3: Selecting an Image from a multi-part document (out-of-scope)</a><br />
4.5.4 <a href="#scenario5.4">Scenario 4: Retrieving an Image embedded thumbnail (out-of-scope)</a><br />
4.5.5 <a href="#scenario5.5">Scenario 5: Switching of Video Transmission</a><br />
4.5.6 <a href="#scenario5.6">Scenario 6: Toggle All Audio OFF</a><br />
4.5.7 <a href="#scenario5.7">Scenario 7: Toggle specific Audio tracks</a><br />
4.5.8 <a href="#scenario5.8">Scenario 8: Video aspect ratio (out-of-scope)</a><br />
5 <a href="#media-fragment-requirements">Requirements for Media Fragment URIs</a><br />
5.1 <a href="#req_temporal">Requirement r01: Temporal fragments</a><br />
5.2 <a href="#req_spatial">Requirement r02: Spatial fragments</a><br />
5.3 <a href="#req_tracks">Requirement r03: Track fragments</a><br />
5.4 <a href="#req_named">Requirement r04: Named fragments</a><br />
5.5 <a href="#fitness_req">Fitness Conditions on Media Containers/Resources</a><br />
</p>

<h3><a id="appendices" name="appendices"></a>Appendices</h3>

<p class="toc">A <a href="#references-normative">References</a><br />
B <a href="#fitness-table">Evaluation of fitness per media formats</a><br />
C <a href="#technologies-survey">Technologies Survey</a><br />
C.1 <a href="#ExistingSchemes">Existing URI fragment schemes</a><br />
C.1.1 <a href="#GeneralURISchemes">General specification of URI fragments</a><br />
C.1.2 <a href="#NonAudioVideoURISchemes">Fragment specifications not for audio/video</a><br />
C.1.3 <a href="#AudioVideoURISchemes">Fragment specifications for audio/video</a><br />
C.2 <a href="#ExistingApplications">Existing applications using proprietary temporal media fragment URI schemes</a><br />
C.3 <a href="#MediaFragmentApproaches">Media fragment specification approaches</a><br />
C.3.1 <a href="#URI-based">URI based</a><br />
C.3.1.1 <a href="#SVG_URI">SVG</a><br />
C.3.1.1.1 <a href="#Spatial_SVG_URI">Spatial</a><br />
C.3.1.2 <a href="#TemporalURI">Temporal URI/Ogg technologies</a><br />
C.3.1.2.1 <a href="#Temporal_TemporalURI">Temporal</a><br />
C.3.1.2.2 <a href="#Track_TemporalURI">Track</a><br />
C.3.1.2.3 <a href="#Named_TemporalURI">Named</a><br />
C.3.1.3 <a href="#MPEG-21">MPEG-21</a><br />
C.3.1.3.1 <a href="#Temporal_MPEG-21">Temporal</a><br />
C.3.1.3.2 <a href="#Spatial_MPEG-21">Spatial</a><br />
C.3.1.3.3 <a href="#Track_MPEG-21">Track</a><br />
C.3.1.3.4 <a href="#Named_MPEG-21">Named</a><br />
C.3.2 <a href="#Non-URI-based">Non-URI-based</a><br />
C.3.2.1 <a href="#SMIL">SMIL</a><br />
C.3.2.1.1 <a href="#Temporal_SMIL">Temporal</a><br />
C.3.2.1.2 <a href="#Spatial_SMIL">Spatial</a><br />
C.3.2.1.3 <a href="#Track_SMIL">Track</a><br />
C.3.2.1.4 <a href="#Named_SMIL">Named</a><br />
C.3.2.2 <a href="#MPEG-7">MPEG-7</a><br />
C.3.2.2.1 <a href="#Temporal_MPEG-7">Temporal</a><br />
C.3.2.2.2 <a href="#Spatial_MPEG-7">Spatial</a><br />
C.3.2.2.3 <a href="#Track_MPEG-7">Track</a><br />
C.3.2.2.4 <a href="#Named_MPEG-7">Named</a><br />
C.3.2.3 <a href="#SVG">SVG</a><br />
C.3.2.3.1 <a href="#Temporal_SVG">Temporal</a><br />
C.3.2.3.2 <a href="#Spatial_SVG">Spatial</a><br />
C.3.2.4 <a href="#TV-Anytime">TV-Anytime</a><br />
C.3.2.4.1 <a href="#Temporal_TV-Anytime">Temporal</a><br />
C.3.2.4.2 <a href="#Named_TV-Anytime">Named</a><br />
C.3.2.5 <a href="#ImageMaps">ImageMaps</a><br />
C.3.2.5.1 <a href="#Spatial_ImageMaps">Spatial</a><br />
C.3.2.6 <a href="#HTML5">HTML 5</a><br />
D <a href="#acknowledgments">Acknowledgements</a> (Non-Normative)<br />
E <a href="#change-log">Change Log</a> (Non-Normative)<br />
</p>
</div>
<hr />

<div class="body">

<div class="div1">
<h2><a id="introduction" name="introduction"></a>1 Introduction</h2>

<p>Audio and video resources on the World Wide Web are currently treated as
"foreign" objects, which can only be embedded using a plugin that is capable of
decoding and interacting with the media resource. Specific media servers are
generally required to provide for server-side features such as direct access to
time offsets into a video without the need to retrieve the entire resource.
Support for such media fragment access varies between different media formats
and inhibits standard means of dealing with such content on the Web. </p>

<p>This document collects background information for the Media Fragment URI
specification <cite><a href="#mf-spec">Media Fragments URI 1.0</a></cite>. It
contains a collection of side conditions under which the specification was
developed. It further contains a large collection of use cases that are
regarded either as relevant to this specification or as out-of-scope. From
these use cases, it deduces the different dimensions required for the Media
Fragment URI specification. Finally, this document closes with a survey of
existing media fragment addressing approaches. </p>
</div>

<div class="div1">
<h2><a id="terminology" name="terminology"></a>2 Terminology</h2>

<p>The keywords <strong>MUST</strong>, <strong>MUST NOT</strong>,
<strong>SHOULD</strong> and <strong>SHOULD NOT</strong> are to be interpreted
as defined in <cite><a href="#rfc2119">RFC 2119</a></cite>. </p>

<p>According to <cite><a href="#rfc3986">RFC 3986</a></cite>, URIs that contain
a fragment are actually not URIs, but URI references relative to the namespace
of another URI. In this document, when the term 'media fragment URIs' is used,
it actually means 'media fragment URI references'. </p>
</div>

<div class="div1">
<h2><a id="side-conditions" name="side-conditions"></a>3 Side Conditions</h2>

<p>This section lists a number of conditions which have directed the
development of this specification. These conditions help clarify some of the
decisions made, e.g. about which types of use cases are within the scope of
this specification and which are outside it. Spelling out these side conditions
should help increase the transparency of the specification. </p>

<div class="div2">
<h3><a id="side1" name="side1"></a>3.1 Single Media Resource Definition</h3>

<p>The following picture explains the generic composition of a media resource:
<img
src="http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-reqs/800px-Model_of_a_Video_Resource.png"
alt="Model of a media resource" /> </p>

<p>A media resource for the purposes of this Working Group is defined along a
single timeline. It can consist of multiple tracks of data that are parallel
along this timeline. These tracks can be audio, video, images, text or any
other time-aligned data. The main interest of this group is in audio and video.
A media resource also typically has some control information in data headers.
These may be located at a particular position in the resource, e.g. the
beginning or the end, or spread throughout the data tracks as headers for data
packets. There is possibly also a general header for the complete media
resource. The data tracks are typically encoded in an interleaved fashion,
which allows for progressive decoding. All of this is provided in a single
file. </p>
</div>

<div class="div2">
<h3><a id="side2" name="side2"></a>3.2 Existing Standards</h3>

<p>Media fragment URIs will work within the boundaries of existing standards as
much as possible, in particular within the URI specification <cite><a
href="#rfc3986">RFC 3986</a></cite>. </p>
</div>

<div class="div2">
<h3><a id="side3" name="side3"></a>3.3 Unique Resource</h3>

<p>Media fragments are a representation of the parent resource and should not
create a new resource, in particular not a new resource of a different Internet
media type (or MIME type). Note that there are use cases for creating a new
resource, such as the extraction of a thumbnail from a video. These are
currently outside the scope of this document. </p>
</div>

<div class="div2">
<h3><a id="side4" name="side4"></a>3.4 Valid Resource</h3>

<p>Resources delivered as a response to a media fragment URI request should be
valid media resources by themselves and thus be playable by existing media
players / image viewers. </p>
</div>

<div class="div2">
<h3><a id="side5" name="side5"></a>3.5 Parent Resource</h3>

<p>The entire resource should be accessible as the "context" of a fragment via
a simple change of the URI. The media fragment URI - as a selective view of the
resource - provides a mechanism to focus on a fragment whilst hinting at the
wider media context in which the fragment is included. </p>
</div>

<div class="div2">
<h3><a id="side6" name="side6"></a>3.6 Single Fragment</h3>

<table border="1" summary="Editorial note: Werner Bailer ">
<tbody>
<tr>
<td width="50%" valign="top" align="left"><b>Editorial note: Werner
Bailer </b></td>
<td width="50%" valign="top" align="right"> </td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">Not sure that the term 'mask'
is the best choice here, e.g. in MPEG-7 mask is used for the opposite,
i.e. not a single segment but a segment composed of several unconnected
parts. </td>
</tr>
</tbody>
</table>

<p>A media fragment URI should create only a single "mask" onto a media
resource and not a collection of potentially overlapping fragments. </p>
</div>

<div class="div2">
<h3><a id="side7" name="side7"></a>3.7 Relevant Protocols</h3>

<p>The main protocols we are concerned with are HTTP and RTSP, since they are
open protocols for media delivery. </p>
</div>

<div class="div2">
<h3><a id="side8" name="side8"></a>3.8 No Recompression</h3>

<p>Media fragments should preferably be delivered as byte-range subparts of the
media resource, so as to make the fragments actual subresources of the media
resource. The advantage is that such fragments are cacheable as byte ranges in
existing caching Web proxies. This implies that we should avoid decoding and
recompressing a media resource to create a fragment. </p>
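
<p>As a rough illustration of byte-range delivery, the sketch below builds an
HTTP request for a sub-range of a media resource using the standard
<code>Range</code> header. It is only a sketch under stated assumptions: the
helper name, the example URL and the byte offsets are hypothetical, and how a
fragment is mapped to byte offsets is not defined by this document. </p>

```python
from urllib.request import Request


def byte_range_request(url, first_byte, last_byte):
    """Build (but do not send) a GET request for an inclusive byte range.

    Hypothetical helper: the byte offsets are assumed to be known already,
    e.g. from an index of the media resource.
    """
    if first_byte > last_byte:
        raise ValueError("first_byte must not exceed last_byte")
    # HTTP byte ranges are inclusive on both ends: bytes=first-last
    return Request(url, headers={"Range": f"bytes={first_byte}-{last_byte}"})


# Illustrative values only: request bytes 1000..1999 of an example resource.
request = byte_range_request("http://www.example.com/video.ogv", 1000, 1999)
print(request.get_header("Range"))
```

<p>Because such a request asks only for a byte range of the original resource,
the response can in principle be handled by caching proxies like any other
partial response. </p>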
</div>

<div class="div2">
<h3><a id="side9" name="side9"></a>3.9 Minimize Impact on Existing
Infrastructure</h3>

<p>The necessary changes to all software in the media delivery chain (User
Agents, Proxies, Media Servers) should be kept to a minimum. </p>
</div>

<div class="div2">
<h3><a id="side10" name="side10"></a>3.10 Focus for Changes</h3>

<p>The necessary changes should focus as much as possible on the media servers,
because these have to implement fragmentation support for the media formats in
any case, as the most fundamental requirement for providing media fragment
addressing. </p>
</div>

<div class="div2">
<h3><a id="side11" name="side11"></a>3.11 Browser Impact</h3>

<p>Changes to the User Agent need to be a one-off and should not require
adaptation per media encapsulation/encoding format. </p>
</div>

<div class="div2">
<h3><a id="side12" name="side12"></a>3.12 Fallback Action</h3>

<p>If a User Agent connects with a media fragment URI to a media server that
does not support media fragments, the media server should reply with the full
resource. The User Agent will then have to either cancel the connection (e.g.
if the media resource is too long) or apply the fragment offset locally. </p>

<p>A User Agent that does not understand media fragment URIs will simply hand
on the URI (potentially with the fragment part stripped off) to the server and
receive the full resource in lieu of the fragment. This may lead to unexpected
behaviour with media fragment URIs in non-conformant User Agents, e.g. where a
mash-up of media fragments is requested, but a sequence of the full files is
played. This is acceptable during a transition phase. </p>
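
<p>The local fallback described above can be sketched roughly as follows. This
is a minimal sketch, not a definitive implementation: the function names are
hypothetical, and the fragment is assumed to be a list of name=value pairs with
a temporal dimension of the form <code>t=start,end</code> in plain seconds, as
in the Media Fragments URI 1.0 specification; this document itself does not
define a syntax. </p>

```python
from urllib.parse import parse_qs, urldefrag


def split_media_fragment(uri):
    """Split a URI into the part sent to the server and the fragment dimensions."""
    base, fragment = urldefrag(uri)
    # Assumption: the fragment is a list of name=value pairs such as "t=10,20".
    dimensions = {name: values[-1] for name, values in parse_qs(fragment).items()}
    return base, dimensions


def local_start_offset(dimensions):
    """Return the start offset in seconds to seek to locally (0.0 if none)."""
    # Assumption: "t" holds "start,end" in plain seconds; start may be empty.
    start = dimensions.get("t", "").split(",")[0]
    return float(start) if start else 0.0


# The server only ever sees `base`; the offset is applied by the User Agent.
base, dimensions = split_media_fragment("http://www.example.com/video.ogv#t=10,20")
print(base)
print(local_start_offset(dimensions))
```

<p>In this sketch the User Agent requests the URI without its fragment,
receives the full resource, and then seeks to the computed offset itself. </p>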

<table border="1" summary="Editorial note: David Singer">
<tbody>
<tr>
<td width="50%" valign="top" align="left"><b>Editorial note: David
Singer</b></td>
<td width="50%" valign="top" align="right"> </td>
</tr>
<tr>
<td valign="top" align="left" colspan="2">The fallback plan needs to be
clarified. We must be able to handle the way the # is already used,
e.g. in YouTube, without breaking what is already working. </td>
</tr>
</tbody>
</table>
</div>
</div>

<div class="div1">
<h2><a id="use-cases" name="use-cases"></a>4 Use Cases</h2>

<p>In which situations do users need media fragment URIs? This section explains
the types of user interactions with media resources that media fragment URIs
will enable. For each type it shows how media fragment URIs can improve the
usefulness, usability, and functionality of online audio and video. </p>

<div class="div2">
<h3><a id="uc1" name="uc1"></a>4.1 Linking to and Display of Media
Fragments</h3>

<p>In this use case, a user is only interested in consuming a fragment of a
media resource rather than the complete resource. A media fragment URI allows
addressing this part of the resource directly and thus enables the User Agent
to receive just the relevant fragment. </p>

<div class="div3">
<h4><a id="scenario1.1" name="scenario1.1"></a>4.1.1 Scenario 1: Retrieve only
segment of a video</h4>

<div class="exampleOuter">
<p>Tim does a keyword search on a video search service. That keyword is found
in several videos in the search service's collection and it relates to clips
inside the videos that appear at a time offset. Tim would like the search
result to point him to just these media fragments so he can watch the relevant
clips rather than having to watch the full videos and manually scroll for the
relevant clips. </p>
</div>
</div>

<div class="div3">
<h4><a id="scenario1.2" name="scenario1.2"></a>4.1.2 Scenario 2: Region of an
Image</h4>

<div class="exampleOuter">
<p>Tim has discovered on an image hosting service a photo of his third school
year class. He is keen to put a link to his own face inside this photo onto his
private Web site where he is collecting old photos of himself. He does not want
the full photo to be displayed and he does not want to have to download and
crop the original image since he wants to reference the original resource. </p>
</div>
</div>

<div class="div3">
<h4><a id="scenario1.3" name="scenario1.3"></a>4.1.3 Scenario 3: Portion of
Music</h4>

<div class="exampleOuter">
<p>Tim is a Last.fm user. He wants his friend Sue to listen to a cool song,
Gypsy Davy. However, Tim thinks that not all of the song is worth listening
to. He wants Sue to listen to the last 10 seconds only and sends her an email
with a link to just that subpart of the media resource. </p>
</div>
</div>

<div class="div3">
<h4><a id="scenario1.4" name="scenario1.4"></a>4.1.4 Scenario 4: Image Region
of video over time</h4>

<div class="exampleOuter">
<p>Tim is now creating an analysis of the movements of muscles of horses during
trotting and finds a few relevant videos online. His analysis is collected on a
Web page and he'd like to reference the relevant video sections, cropped both
in time and space to focus his viewers' attention on specific areas of interest
that he'd like to point out. </p>
</div>
</div>
</div>

<div class="div2">
<h3><a id="uc2" name="uc2"></a>4.2 Browsing and Bookmarking Media Fragments</h3>

<p>Media resources - audio, video and even images - are often very large
resources that users want to explore progressively. Progressive exploration of
text is well-known in the Web space under the term "pagination". Pagination in
the text space is realized by creating a series of Web pages and enabling
paging through them by scripts on a server, each page having its own URI. For
large media resources, such pagination can be provided by media fragment URIs,
which enable direct access to media fragments. </p>

<div class="div3">
<h4><a id="scenario2.1" name="scenario2.1"></a>4.2.1 Scenario 1: Temporal Video
Pagination</h4>

<div class="exampleOuter">
<p>Michael has a Website that collects recordings of the sittings of his
government's parliament. These recordings tend to be very long - generally on
the order of 7 hours in duration. Instead of splitting up the recordings into
short files by manual inspection of the change of topics or some other
segmentation approach, he prefers to provide many handles to a unique video
resource. As he publishes the files, however, he provides pagination on the
videos such that people can watch them 20 min at a time. </p>
</div>
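
<p>The pagination in this scenario amounts to simple arithmetic over the
timeline, which the sketch below illustrates. It assumes the temporal syntax
<code>#t=start,end</code> (in seconds) of the Media Fragments URI 1.0
specification, which this document does not itself define; the function name
and the example URL are hypothetical. </p>

```python
def paginate(uri, duration_s, page_s):
    """Return one temporal media fragment URI per page of page_s seconds."""
    pages = []
    for start in range(0, duration_s, page_s):
        end = min(start + page_s, duration_s)  # the last page may be shorter
        pages.append(f"{uri}#t={start},{end}")
    return pages


# A 7-hour recording offered in 20-minute pages (values from the scenario).
pages = paginate("http://www.example.com/sitting.ogv", 7 * 3600, 20 * 60)
print(len(pages))
print(pages[0])
print(pages[-1])
```

<p>All pages are handles onto the same video resource, so no recording has to
be split into separate files. </p>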
</div>

<div class="div3">
<h4><a id="scenario2.2" name="scenario2.2"></a>4.2.2 Scenario 2: Audio Passage
Bookmark</h4>

<p>Users not only want to receive links to highlights in media resources, but
also like to bookmark them in their browsers to be able to get back to them.
</p>

<div class="exampleOuter">
<p>Sue likes the song segment that Tim has sent her and decides to add this
specific segment to her bookmarks. </p>
</div>
</div>

<div class="div3">
<h4><a id="scenario2.3" name="scenario2.3"></a>4.2.3 Scenario 3: Audio
Navigation</h4>

<p>When media resources (in particular audio and video) are regarded as
monolithic blocks, they are very inaccessible. For example, it is difficult to
find out what they are about, where the highlights are, or what the logical
structure of a resource is. The lack of these features, in particular the lack
of captions and audio annotations, further makes the resources inaccessible to
disabled people. Introducing an ability to directly access highlights,
fragments, or the logical structure of a media resource will contribute
significantly towards making media resources more accessible. </p>

<div class="exampleOuter">
<p>Lena would like to browse the descriptive audio tracks of a video as she
does with DAISY audio books, by following the logical structure of the media.
Audio descriptions and captions generally come in blocks either timed or
separated by silences. Chapter by chapter and then section by section she
eventually jumps to a specific paragraph and down to the sentence level by
using the "tab" control as she would normally do in audio books. The
descriptive audio track is an extra spoken track that provides a description of
scenes happening in a video. When the descriptive audio track is not present,
Lena can similarly browse through captions and descriptive text tracks which
are either rendered through her braille reading device or through her
text-to-speech engine. </p>
</div>
</div>

<div class="div3">
<h4><a id="scenario2.4" name="scenario2.4"></a>4.2.4 Scenario 4: Caption and
chapter tracks for browsing Video</h4>

<div class="exampleOuter">
<p>Silvia has a deaf friend, Elaine, who would like to watch the holiday videos
that Silvia is publishing on her website. Silvia has created subtitle tracks
for her videos and also a segmentation (e.g. using CMML <cite><a
href="#cmml">CMML</a></cite>) with unique identifiers on the clips that she
describes. The clips were formed based on locations that Silvia has visited. In
this way, Elaine is able to watch the videos by going through the clips and
reading the subtitles for those clips that she is interested in. She watches
the sections on Korea, Australia, and France, but jumps over the ones of Great
Britain and Holland. </p>
</div>
</div>

<div class="div3">
|
|
<h4><a id="scenario2.5" name="scenario2.5"></a>4.2.5 Scenario 5: Jumping back
|
|
in time during live streaming</h4>
|
|
|
|
<p>A URL to a live video stream may look no different than a URL to a canned
|
|
video file, e.g. http://www.example.com/video.ogv . However, in contrast to
|
|
canned video file URLs, this URL always points to the live video data, i.e.
|
|
what is transmitted "now". This leads to different requirements on fragment
|
|
addressing than with canned video files. </p>
|
|
|
|
<div class="exampleOuter">
|
|
<p>Thomas is watching a live video stream, but has to take a business call
|
|
right in the middle. He stops his video player to take the call. As he
|
|
reconnects, he gets connected back with the live stream, but has missed the
|
|
last 5 min. He would like to rewind to 5 min ago. A URL scheme that can capture
|
|
the time at which a live video stream was transmitted and allow for direct
|
|
access to any time within that real-world clock time will allow such direct
|
|
access. </p>
|
|
</div>
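<p>The rewind request above can be sketched as follows. This is a minimal illustration only: the <code>t=clock:</code> fragment syntax and the helper name <code>rewind_uri</code> are assumptions made for this example and are not defined by this document.</p>

```python
from datetime import datetime, timedelta, timezone


def rewind_uri(base_uri: str, now: datetime, minutes_back: int) -> str:
    """Build a URI that addresses a live stream at an earlier wall-clock time.

    The "t=clock:" fragment form is an assumed syntax for illustration.
    """
    start = now - timedelta(minutes=minutes_back)
    # Normalise to UTC so the timestamp is unambiguous for the server.
    stamp = start.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"{base_uri}#t=clock:{stamp}"
```

<p>Thomas' player would call such a helper with the current time and a value of 5 minutes after he reconnects.</p>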
|
|
</div>
|
|
|
|
<div class="div3">
|
|
<h4><a id="scenario2.6" name="scenario2.6"></a>4.2.6 Scenario 6: Jumping to a
|
|
particular event in a live stream</h4>
|
|
|
|
<div class="exampleOuter">
|
|
<p>Thomas is watching a Formula 1 race on a Website that is streaming live
|
|
video with a real-time commentary and interactive textual descriptions of
|
|
particular events that are happening. Thomas wants to directly jump to the
|
|
'Alonso accident' listed next to the video as a section of interest in the
|
|
video. </p>
|
|
</div>
|
|
</div>
|
|
</div>
|
|
|
|
<div class="div2">
|
|
<h3><a id="uc3" name="uc3"></a>4.3 Recompositing Media Fragments </h3>
|
|
|
|
<p>As we enable direct linking to media fragments in a URI, we can also enable
|
|
simple recompositing of such media fragments. Note that because the media
|
|
fragments in a composition may possibly originate from different codecs and
|
|
very different files, we cannot realistically expect smooth playback between
|
|
the fragments. </p>
|
|
|
|
<div class="div3">
|
|
<h4><a id="scenario3.1" name="scenario3.1"></a>4.3.1 Scenario 1: Reframing a
|
|
photo in a slideshow</h4>
|
|
|
|
<div class="exampleOuter">
|
|
<p>Erik has a collection of photos and wants to create a slide show of some of
|
|
the photos and wants to highlight specific areas in each image. He uses XSPF to
|
|
define the slide show (playlist) using spatial fragment URIs to address the
|
|
photo fragments. </p>
|
|
</div>
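<p>A spatial fragment URI for one slide could be built as in the sketch below. The <code>xywh=x,y,w,h</code> fragment form (pixel units) and the helper name <code>spatial_fragment</code> are assumptions for illustration; this document does not fix a syntax.</p>

```python
def spatial_fragment(uri: str, x: int, y: int, w: int, h: int) -> str:
    """Return a URI addressing a rectangular region of an image.

    The "xywh=" form is an assumed syntax; only rectangles are addressed.
    """
    valid = x >= 0 and y >= 0 and w > 0 and h > 0
    if not valid:
        raise ValueError("region must have a non-negative origin and positive size")
    return f"{uri}#xywh={x},{y},{w},{h}"


# Hypothetical playlist entries for Erik's slide show.
slides = [
    spatial_fragment("http://www.example.com/photo1.jpg", 160, 120, 320, 240),
    spatial_fragment("http://www.example.com/photo2.jpg", 0, 0, 640, 480),
]
```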
|
|
</div>
|
|
|
|
<div class="div3">
|
|
<h4><a id="scenario3.2" name="scenario3.2"></a>4.3.2 Scenario 2: Mosaic</h4>
|
|
|
|
<div class="exampleOuter">
|
|
<p>Jack wants to create a mosaic for his website with all the image fragments
|
|
that Erik defined collated together. He uses SMIL 3.0 Tiny Profile and the
|
|
spatial fragment URIs to lay out the image fragments and stitch them together as
|
|
a new "image". </p>
|
|
</div>
|
|
</div>
|
|
|
|
<div class="div3">
|
|
<h4><a id="scenario3.3" name="scenario3.3"></a>4.3.3 Scenario 3: Video
|
|
Mashup</h4>
|
|
|
|
<div class="exampleOuter">
|
|
<p>Jack has a collection of videos and wants to create a mashup from segments
|
|
out of these videos without having to manually edit them together. He uses SMIL
|
|
3.0 Tiny Profile and temporal fragment URIs to address the clips out of the
|
|
videos and sequence them together. </p>
|
|
</div>
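<p>The sequencing step can be sketched independently of SMIL. The example below assumes a <code>t=start,end</code> fragment form in seconds and a hypothetical helper <code>sequence_clips</code>; it only computes where each clip would begin on the mashup timeline.</p>

```python
def sequence_clips(clips):
    """Place temporal fragments back to back on a single timeline.

    clips: list of (uri, start_seconds, end_seconds) tuples.
    Returns a list of (fragment_uri, offset_in_mashup_seconds) tuples.
    The "t=" form is an assumed syntax for illustration.
    """
    timeline = []
    offset = 0.0
    for uri, start, end in clips:
        if not (start >= 0 and end > start):
            raise ValueError("each clip needs 0 <= start and start before end")
        timeline.append((f"{uri}#t={start:g},{end:g}", offset))
        offset += end - start  # the next clip begins where this one ends
    return timeline
```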
|
|
</div>
|
|
|
|
<div class="div3">
|
|
<h4><a id="scenario3.4" name="scenario3.4"></a>4.3.4 Scenario 4: Spatial Video
|
|
Navigation</h4>
|
|
|
|
<div class="exampleOuter">
|
|
<p>Elaine has recorded a video mosaic of all her 4 TV channels of an
|
|
international election day in a single video. She wants to keep the original
|
|
synchronised file, but now she wants to be able to play back each of the four
|
|
channels' recordings separately and in sequence. She creates a playlist of
|
|
media fragment URIs that each select a specific channel in the mosaic to play
the channels one after another. </p>
|
|
</div>
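<p>Elaine's playlist could be generated as in the following sketch. It assumes that the four channels are arranged in a 2x2 grid (the scenario does not state the layout) and that a percentage-based <code>xywh=percent:</code> fragment form is available; both are assumptions for illustration.</p>

```python
def quadrant_playlist(uri: str) -> list[str]:
    """Return one spatial fragment URI per quadrant of a 2x2 video mosaic.

    Quadrants are listed row by row, left to right. The layout and the
    "xywh=percent:" form are assumptions, not taken from this document.
    """
    playlist = []
    for row in range(2):
        for col in range(2):
            playlist.append(f"{uri}#xywh=percent:{col * 50},{row * 50},50,50")
    return playlist
```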
|
|
</div>
|
|
|
|
<div class="div3">
|
|
<h4><a id="scenario3.5" name="scenario3.5"></a>4.3.5 Scenario 5: Selective
|
|
previews</h4>
|
|
|
|
<p>Given an ability to link to media fragments through URIs, people will want
|
|
to decide whether they receive the full resource or just the data that relates
|
|
to the media fragment. This is particularly the case where the resource is
|
|
large, where the bandwidth is scarce or expensive, and/or where people have
|
|
limited time/patience to wait until the full resource is loaded. </p>
|
|
|
|
<div class="exampleOuter">
|
|
<p>Yves is a busy person. He doesn't have time to attend all meetings that he
|
|
is supposed to attend. He also uses his mobile device for accessing Web
|
|
resources while traveling, to make the most of his time. Some of the recent
|
|
meetings that Yves was supposed to attend have been recorded and published on
|
|
the Web. A colleague points out to Yves in an email which sections of the
|
|
meetings he should watch. While on his next trip, Yves goes back to this email
|
|
and watches the highlighted sections by simply clicking on them. The media
|
|
server of his company dynamically composes a valid media resource from the URIs
|
|
that Yves is sending it such that Yves' video player can play just the right
|
|
fragments. </p>
|
|
</div>
|
|
</div>
|
|
|
|
<div class="div3">
|
|
<h4><a id="scenario3.6" name="scenario3.6"></a>4.3.6 Scenario 6: Music
|
|
Samples</h4>
|
|
|
|
<div class="exampleOuter">
|
|
<p>Erik has a music collection. He creates an "audio podcast" in the form of an
|
|
RSS feed with URIs that link to samples from his music files. His friends can
|
|
play back the samples in their Web-attached music players. </p>
|
|
</div>
|
|
</div>
|
|
|
|
<div class="div3">
|
|
<h4><a id="scenario3.7" name="scenario3.7"></a>4.3.7 Scenario 7: Highlighting
|
|
regions (out-of-scope)</h4>
|
|
|
|
<div class="exampleOuter">
|
|
<p>Tim has discovered yet another alumni photo of his third school year class.
|
|
This time he doesn't want to crop his face but he wants to keep the photo in
|
|
the context of his classmates. He wants his region of the photo highlighted and
|
|
the rest grey scaled. </p>
|
|
</div>
|
|
|
|
<p>This scenario is out of scope for this Working Group because the display of
|
|
the highlighted region is up to the user agent and is not relevant to the
|
|
network interaction. This particular scenario is already possible with image
|
|
maps in HTML. </p>
|
|
</div>
|
|
</div>
|
|
|
|
<div class="div2">
|
|
<h3><a id="uc4" name="uc4"></a>4.4 Annotating Media Fragments</h3>
|
|
|
|
<p>Media resources typically don't just consist of the binary data. There is
|
|
often a lot of textual information available that relates to the media
|
|
resource. Enabling the addressing of media fragments ultimately creates a means
|
|
to attach annotations to media fragments, for example, using the <cite><a
|
|
href="#mediaont-10">Ontology for Media Resource 1.0</a></cite>. </p>
|
|
|
|
<div class="div3">
|
|
<h4><a id="scenario4.1" name="scenario4.1"></a>4.4.1 Scenario 1: Spatial
|
|
Tagging of Images</h4>
|
|
|
|
<div class="exampleOuter">
|
|
<p>Raphael systematically annotates some highlighted regions in his photos that
|
|
depict his friends, families, or the monuments he finds impressive. This
|
|
knowledge is represented by RDF descriptions that use spatial fragment URIs to
|
|
relate to the image fragments in his annotated collection. It makes it possible
|
|
later to search and retrieve all these media fragment URIs that relate to one
|
|
particular friend or monument. </p>
|
|
</div>
|
|
</div>
|
|
|
|
<div class="div3">
|
|
<h4><a id="scenario4.2" name="scenario4.2"></a>4.4.2 Scenario 2: Temporal
|
|
Tagging of Audio and Video</h4>
|
|
|
|
<div class="exampleOuter">
|
|
<p>Raphael also has a collection of audio and video files of all the
|
|
presentations he ever made. His RDF description collection extends to
|
|
describing all the temporal segments where he gave a demo of a software system
|
|
with structured details on the demo. </p>
|
|
</div>
|
|
|
|
<table border="1" summary="Editorial note: Silvia">
|
|
<tbody>
|
|
<tr>
|
|
<td width="50%" valign="top" align="left"><b>Editorial note:
|
|
Silvia</b></td>
|
|
<td width="50%" valign="top" align="right"> </td>
|
|
</tr>
|
|
<tr>
|
|
<td valign="top" align="left" colspan="2">Time-aligned text such as
|
|
captions, subtitles in multiple languages, and audio descriptions for
|
|
audio and video don't have to be created as separate documents and link
|
|
to each segment through a temporal URI. Such text can be made part of
|
|
the media resource by the media author or delivered as a separate, but
|
|
synchronised data stream to the media player. In either case, when it
|
|
comes to using these with the HTML5 &lt;video&gt; tag, they should be
|
|
made accessible to the Web page through a JavaScript API for the
|
|
video/audio/image element. This needs to be addressed in the HTML5
|
|
working group. </td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
</div>
|
|
|
|
<div class="div3">
|
|
<h4><a id="scenario4.3" name="scenario4.3"></a>4.4.3 Scenario 3: Named
|
|
Anchors</h4>
|
|
|
|
<p>Annotating media resources at the level of a complete resource is in certain
|
|
circumstances not enough. Support for annotating multimedia on the level of
|
|
fragments is often desired. The definition of "anchors" (or id tags) for
|
|
fragments of media resources will allow us to identify fragments by name. It
|
|
allows the creation of an author-defined segmentation of the resource - an
|
|
author-provided structure. </p>
|
|
|
|
<div class="exampleOuter">
|
|
<p>Raphael would like to attach an RDF-based annotation to a video fragment
|
|
that is specified through an "anchor". Identifying the media fragment by name
|
|
instead of through a temporal video fragment URI allows him to create a more
|
|
memorable URI than having to remember the time offsets. </p>
|
|
</div>
|
|
</div>
|
|
|
|
<div class="div3">
|
|
<h4><a id="scenario4.4" name="scenario4.4"></a>4.4.4 Scenario 4: Spatial and
|
|
Temporal Tagging</h4>
|
|
|
|
<div class="exampleOuter">
|
|
<p>Guillaume uses video fragment URIs in an MPEG-7 sign language profile to
|
|
describe a moving point of interest: he wants the focus region to be the
|
|
dominant hand in a Sign Language video. The series of video fragment URIs gives
|
|
the coordinates and timing of the trajectory followed by the hand, and by
|
|
naming them, can also describe the areas of changing hand-shapes. </p>
|
|
</div>
|
|
</div>
|
|
|
|
<div class="div3">
|
|
<h4><a id="scenario4.5" name="scenario4.5"></a>4.4.5 Scenario 5: Search
|
|
Engine</h4>
|
|
|
|
<div class="exampleOuter">
|
|
<p>Guillaume wants to retrieve the images of each bike present at a recent
|
|
cycling event. Group photos and general shots of the event have been published
|
|
online together with detailed RDF annotations. Thanks to a query in a search
|
|
engine that is able to parse the RDF annotations, Guillaume can now retrieve
|
|
multiple individual shots of each bike in the collection, where the URI is
|
|
created based on the RDF annotations. </p>
|
|
</div>
|
|
</div>
|
|
</div>
|
|
|
|
<div class="div2">
|
|
<h3><a id="uc5" name="uc5"></a>4.5 Adapting Media Resources</h3>
|
|
|
|
<p>When addressing a media resource as a user, one often has the desire not to
|
|
retrieve the full resource, but only a subpart of interest. This may be a
|
|
temporally or spatially consecutive subpart, but could also be e.g. a smaller
|
|
bandwidth version of the same resource, a lower framerate video, an image with
|
|
less colour depth or an audio file with a lower sampling rate. Media adaptation
|
|
is the general term used for such server-side created versions of media
|
|
resources. </p>
|
|
|
|
<div class="div3">
|
|
<h4><a id="scenario5.1" name="scenario5.1"></a>4.5.1 Scenario 1: Changing Video
|
|
quality (out-of-scope)</h4>
|
|
|
|
<div class="exampleOuter">
|
|
<p>Davy is looking for videos about allergies and would like to get previews at
|
|
a lower frame rate to decide whether to download and save them in his
|
|
collection. He would like to be able to specify in the URI a means of telling
|
|
the media server the adaptation that he is after. For video he would like to
|
|
adapt width, height, frame rate, colour depth, and temporal subpart selection.
|
|
Alternatively, he may want to get just a thumbnail of the video. </p>
|
|
</div>
|
|
|
|
<p>This scenario is out of scope for this Working Group because it requires
|
|
changes to be made to the actual encoded data to retrieve a "fragment". URI based
|
|
media fragments should basically be achieved through cropping of one or more
|
|
byte sections. It is possible to develop in the future a scheme for such transcoded
|
|
resources using a URI query (?) specification. </p>
|
|
</div>
|
|
|
|
<div class="div3">
|
|
<h4><a id="scenario5.2" name="scenario5.2"></a>4.5.2 Scenario 2: Selecting
|
|
Regions in Images </h4>
|
|
|
|
<div class="exampleOuter">
|
|
<p>Davy is interested to have precise coordinates on his browser address bar to
|
|
see and pan over large image maps. Through the same URI scheme he can now
|
|
generically address and locate different image subparts on his User Agent for
|
|
all image types. </p>
|
|
</div>
|
|
</div>
|
|
|
|
<div class="div3">
|
|
<h4><a id="scenario5.3" name="scenario5.3"></a>4.5.3 Scenario 3: Selecting an
|
|
Image from a multi-part document (out-of-scope)</h4>
|
|
|
|
<div class="exampleOuter">
|
|
<p>Davy is now interested in multi-resolution, multi-page medical images. He
|
|
wants to select the detailed image of the toe X-rays which appear on page 7 of
|
|
the TIFF document. </p>
|
|
</div>
|
|
|
|
<p>The support of particular media formats such as TIFF is out of scope - the
|
|
Working Group only deals with the specification of generic addressing
|
|
approaches, but support of particular file formats needs to be implemented by
|
|
the format developers. A spatial fragment URI to an image is, however, in
|
|
scope. </p>
|
|
</div>
|
|
|
|
<div class="div3">
|
|
<h4><a id="scenario5.4" name="scenario5.4"></a>4.5.4 Scenario 4: Retrieving an
|
|
Image embedded thumbnail (out-of-scope)</h4>
|
|
|
|
<div class="exampleOuter">
|
|
<p>Davy is also interested to have the kind of preview functionality for
|
|
pictures, in particular these large 10 mega-pixel JPEG files that have embedded
|
|
thumbnails in them. He can now provide a fast preview by selecting the embedded
|
|
thumbnail in the original image without even having to resize or create a new
|
|
separate file! </p>
|
|
</div>
|
|
|
|
<p>This particular scenario is out of scope for a media fragment URI, since it
|
|
creates a resource of a different MIME type to the original resource. This
|
|
cannot be done using the URI fragment specifier, but only using the query
|
|
specifier. This is left as a future exercise. </p>
|
|
</div>
|
|
|
|
<div class="div3">
|
|
<h4><a id="scenario5.5" name="scenario5.5"></a>4.5.5 Scenario 5: Switching of
|
|
Video Transmission</h4>
|
|
|
|
<div class="exampleOuter">
|
|
<p>Davy has a blind friend called Katrina. Katrina would also like to watch the
|
|
videos that Davy has found, and is lucky that the videos have additional
|
|
alternative audio tracks, which describe to blind users what is happening in
|
|
the videos. Her Internet connection is of lower bandwidth and she would like to
|
|
switch off the video track, but receive the two audio tracks (original audio
|
|
plus audio annotations). She would like to do this track selection through
|
|
simple changes to the URI. </p>
|
|
</div>
|
|
</div>
|
|
|
|
<div class="div3">
|
|
<h4><a id="scenario5.6" name="scenario5.6"></a>4.5.6 Scenario 6: Toggle All
|
|
Audio OFF</h4>
|
|
|
|
<div class="exampleOuter">
|
|
<p>Sebo is Deaf and enjoys watching videos on the Web. Her friend sent her a
|
|
link to a new music video but she doesn't want to waste time and bandwidth
|
|
receiving any sounds. So when she enters the URI in her browser's address bar,
|
|
she also adds an extra parameter to select the video track only. </p>
|
|
</div>
|
|
</div>
|
|
|
|
<div class="div3">
|
|
<h4><a id="scenario5.7" name="scenario5.7"></a>4.5.7 Scenario 7: Toggle
|
|
specific Audio tracks</h4>
|
|
|
|
<div class="exampleOuter">
|
|
<p>Davy's girlfriend is a fan of Karaoke. She loves to be able to play back
|
|
videos from the Web that have a karaoke text, and two audio tracks, one each
|
|
for the music and for the singer. She practices the songs by playing back the
|
|
complete video with all tracks, but uses the video in Karaoke parties with
|
|
friends where she turns off the singer's audio track through a simple selection
|
|
of tracks in the User Agent. </p>
|
|
</div>
|
|
</div>
|
|
|
|
<div class="div3">
|
|
<h4><a id="scenario5.8" name="scenario5.8"></a>4.5.8 Scenario 8: Video aspect
|
|
ratio (out-of-scope)</h4>
|
|
|
|
<div class="exampleOuter">
|
|
<p>Silvia's television is brand new and has a 16:9 display; however, she has
videos on her media server that are in 3:2 format. To avoid paying a premium in
|
|
network fees, she would like the television to request only what can be
|
|
displayed to avoid wasting bandwidth. </p>
|
|
</div>
|
|
|
|
<p>This particular scenario is out of scope for a media fragment URI, since it
|
|
is unclear what a server should do with a request that has a different aspect
|
|
ratio. It is a display issue rather than a bandwidth or clipping issue. In
|
|
general, a user agent would create black borders around the video with a
|
|
diverging aspect ratio. However, it is up to the user agent what to do in such
|
|
a situation of diverging aspect ratio between what the server supplies and what
|
|
the user agent is requested to display. For example, HTML5 has specifications
|
|
for what to do in such a situation. </p>
|
|
</div>
|
|
</div>
|
|
</div>
|
|
|
|
<div class="div1">
|
|
<h2><a id="media-fragment-requirements"
|
|
name="media-fragment-requirements"></a>5 Requirements for Media Fragment
|
|
URIs</h2>
|
|
|
|
<p>This section describes the list of required media fragment addressing
|
|
dimensions that have resulted from the use case analysis.</p>
|
|
|
|
<p>It further analyses what format requirements the media resource has to
|
|
adhere to in order to allow the extraction of the data that relates to that
|
|
kind of addressing.</p>
|
|
|
|
<div class="div2">
|
|
<h3><a id="req_temporal" name="req_temporal"></a>5.1 Requirement r01: Temporal
|
|
fragments</h3>
|
|
|
|
<p>A temporal fragment of a media resource is a clipping along the time
|
|
dimension from a start to an end time that are within the duration of the media
|
|
resource. </p>
|
|
|
|
<p>Whether a media resource supports temporal fragment extraction is in the
|
|
first place dependent on the coding format and more specifically how encoding
|
|
parameters were set. For video coding formats, temporal fragments can be
|
|
extracted if the video stream provides random access points (i.e., a point that
|
|
is not dependent on previously encoded video data, typically corresponding to
|
|
an intra-coded frame) on a regular basis. The same holds true for audio coding
|
|
formats, i.e., the audio stream needs to be accessible at a point where the
|
|
decoder can start decoding without needing previously coded data. </p>
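<p>The dependence on random access points can be illustrated with a short sketch. It assumes that a server knows the timestamps of the random access points in seconds; the function name <code>snap_to_random_access_point</code> is hypothetical. The requested start time is moved back to the latest point from which decoding can begin.</p>

```python
from bisect import bisect_right


def snap_to_random_access_point(access_points: list[float], start: float) -> float:
    """Return the latest random access point at or before `start`.

    access_points must be sorted in ascending order (seconds).
    """
    index = bisect_right(access_points, start)
    if index == 0:
        raise ValueError("no random access point at or before the requested start")
    return access_points[index - 1]
```

<p>With access points every two seconds, a request starting at 5.3 seconds would have to be served from 4.0 seconds onwards.</p>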
|
|
</div>
|
|
|
|
<div class="div2">
|
|
<h3><a id="req_spatial" name="req_spatial"></a>5.2 Requirement r02: Spatial
|
|
fragments</h3>
|
|
|
|
<p>A spatial fragment of a media resource is a clipping of an image region. For
|
|
media fragment addressing we only regard rectangular regions. </p>
|
|
|
|
<p>Support for extraction of spatial fragments from a media resource in the
|
|
compressed domain depends on the coding format. The coding format must allow
spatial regions to be encoded independently from each other in order to support the
|
|
extraction of these regions in the compressed domain. Note that there are
|
|
currently two variants: region extraction and interactive region extraction. In
|
|
the first case, the regions (i.e., Regions Of Interest, ROI) are known at
|
|
encoding time and coded independently from each other. In the second case, ROIs
|
|
are not known at encoding time and can be chosen by a user agent. In this case,
|
|
the media resource is divided into a number of tiles, each encoded independently
|
|
from each other. Subsequently, the tiles covering the desired region are
|
|
extracted from the media resource. </p>
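<p>The tile selection for interactive region extraction can be sketched as follows, assuming a regular grid of equally sized tiles addressed in pixels. The function name <code>covering_tiles</code> and the grid model are assumptions for illustration.</p>

```python
def covering_tiles(x: int, y: int, w: int, h: int, tile_w: int, tile_h: int):
    """Return the (column, row) indices of all tiles overlapping a rectangle.

    The rectangle starts at (x, y) and is w by h pixels; tiles are
    tile_w by tile_h pixels and start at the top-left corner of the image.
    """
    first_col, first_row = x // tile_w, y // tile_h
    # The last covered pixel is at x + w - 1 (and y + h - 1), inclusive.
    last_col, last_row = (x + w - 1) // tile_w, (y + h - 1) // tile_h
    return [
        (col, row)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]
```

<p>Note that the extracted tiles usually cover more than the requested region, so a user agent would still have to crop the decoded result.</p>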
|
|
</div>
|
|
|
|
<div class="div2">
|
|
<h3><a id="req_tracks" name="req_tracks"></a>5.3 Requirement r03: Track
|
|
fragments</h3>
|
|
|
|
<p>A typical media resource consists of multiple tracks of data multiplexed
|
|
together into the media resource. A media resource could for example consist of
|
|
several audio, several video, and several textual annotation or metadata
|
|
tracks. Their individual extraction / addressing is desirable in particular
|
|
from a media adaptation point of view. </p>
|
|
|
|
<p>Whether the extraction of tracks from a media resource is supported or not
|
|
depends on the container format of the media resource. Since a container format
|
|
only defines a syntax and does not introduce any compression, it is always
|
|
possible to describe the structures of a container format. Hence, if a
|
|
container format allows the encapsulation of multiple tracks, then it is
|
|
possible to describe the tracks in terms of byte ranges. Examples of such
|
|
container formats are Ogg <cite><a href="#ogg">RFC 3533</a></cite> and MP4.
|
|
Note that it is possible that the tracks are multiplexed, implying that a
|
|
description of one track consists of a list of byte ranges. Also note that the
|
|
extraction of tracks (and fragments in general) from container formats often
|
|
introduces the necessity of syntax element modifications in the headers. </p>
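<p>Describing a multiplexed track as a list of byte ranges can be sketched as follows. The packet index used here, a list of <code>(track, offset, length)</code> entries in file order, is a simplified assumption and does not correspond to any particular container format.</p>

```python
def track_byte_ranges(packets, track_id):
    """Collect the inclusive byte ranges that belong to one track.

    packets: list of (track_id, byte_offset, byte_length) in file order.
    Adjacent packets of the same track are merged into a single range.
    """
    ranges = []
    for owner, offset, length in packets:
        if owner != track_id:
            continue
        last = offset + length - 1
        if ranges and ranges[-1][1] + 1 == offset:
            # This packet directly follows the previous range: extend it.
            ranges[-1] = (ranges[-1][0], last)
        else:
            ranges.append((offset, last))
    return ranges
```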
|
|
</div>
|
|
|
|
<div class="div2">
|
|
<h3><a id="req_named" name="req_named"></a>5.4 Requirement r04: Named
|
|
fragments</h3>
|
|
|
|
<p>A named fragment of a media resource is a media fragment - either a track, a
|
|
time section, or a spatial region - that has been given a name through some
|
|
sort of annotation mechanism. Through this name, the media fragment can be
|
|
addressed in a more human-readable form. </p>
|
|
|
|
<p>No coding format provides support for named fragments, since naming is not
|
|
part of the encoding/decoding process. Hence, we have to consider container
|
|
formats for this feature. In general, if a container format allows the
|
|
insertion of metadata describing the named fragments, then the container format
|
|
supports named fragments, if the fragment class is also supported. For example,
|
|
you can include a CMML <cite><a href="#cmml">CMML</a></cite> or TimedText
|
|
description in an MP4 or Ogg <cite><a href="#ogg">RFC 3533</a></cite> container
|
|
and interpret this description to extract temporal fragments based on a name
|
|
given to them in the description. </p>
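<p>Resolving a name to a temporal fragment can be sketched as a simple lookup. The clip index below stands in for a parsed description such as CMML; its structure, the <code>t=start,end</code> form, and the function name <code>resolve_named_fragment</code> are assumptions for illustration.</p>

```python
def resolve_named_fragment(uri: str, name: str, clip_index: dict) -> str:
    """Translate a fragment name into a temporal fragment URI.

    clip_index maps a name to a (start_seconds, end_seconds) pair.
    A KeyError is raised for names that the description does not define.
    """
    start, end = clip_index[name]
    return f"{uri}#t={start:g},{end:g}"


# Hypothetical index extracted from the description of a holiday video.
holiday_clips = {"korea": (0, 95), "australia": (95, 240.5)}
```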
|
|
</div>
|
|
|
|
<div class="div2">
|
|
<h3><a id="fitness_req" name="fitness_req"></a>5.5 Fitness Conditions on Media
|
|
Containers/Resources</h3>
|
|
|
|
<p>There is a large number of media codecs and encapsulation formats that we
|
|
need to take into account as potential media resources on the Web. This section
|
|
analyses the general conditions for media formats that make them fit for
|
|
supporting the different types of fragment URIs. </p>
|
|
|
|
<p>Media resources should fulfill the following conditions to allow extraction
|
|
of fragments: </p>
|
|
<ul>
|
|
<li><p>The media fragments can be extracted in the compressed domain.</p>
|
|
</li>
|
|
<li><p>No syntax element modifications in the bitstream are needed to perform
|
|
the extraction.</p>
|
|
</li>
|
|
</ul>
|
|
|
|
<p>Not all media formats will be compliant with these two conditions. Hence, we
|
|
distinguish the following categories: </p>
|
|
<ul>
|
|
<li><p><b>Fit</b>: The media resource meets the two conditions (i.e.,
|
|
fragments can be extracted in the compressed domain and no syntax element
|
|
modifications are necessary). In this case, caching media fragments of such
|
|
media resources on the byte level is possible.</p>
|
|
</li>
|
|
<li><p><b>Conditionally fit</b>: Media fragments can be extracted in the
|
|
compressed domain, but syntax element modifications are required. These
|
|
media fragments provide cacheable byte ranges for the data, but syntax
|
|
element modifications are needed in headers applying to the whole media
|
|
resource/fragment. In this case, these headers could be sent to the client
|
|
in the first response of the server.</p>
|
|
</li>
|
|
<li><p><b>Unfit</b>: Media fragments cannot be extracted in the compressed
|
|
domain as byte ranges. In this case, transcoding operations are necessary
|
|
to extract media fragments. Since these media fragments do not create
|
|
reproducible bytes, it is not possible to cache these media fragments. Note
|
|
that media formats which enable extracting fragments in the compressed
|
|
domain, but do not meet the conditions of the conditionally fit category (i.e., syntax element
|
|
modifications are not only applicable to the whole media resource), also
|
|
belong to this category.</p>
|
|
</li>
|
|
</ul>
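<p>The three categories can be restated as a small decision function. This is a sketch of the classification described above; the parameter names are chosen for this example only.</p>

```python
from enum import Enum


class Fitness(Enum):
    FIT = "fit"
    CONDITIONALLY_FIT = "conditionally fit"
    UNFIT = "unfit"


def classify(compressed_domain: bool,
             modifications_needed: bool,
             modifications_header_only: bool = False) -> Fitness:
    """Classify a media format for one fragment dimension.

    compressed_domain: fragments can be extracted without transcoding.
    modifications_needed: syntax elements must be modified on extraction.
    modifications_header_only: the modifications are confined to headers
    that apply to the whole media resource or fragment.
    """
    if not compressed_domain:
        return Fitness.UNFIT
    if not modifications_needed:
        return Fitness.FIT
    if modifications_header_only:
        return Fitness.CONDITIONALLY_FIT
    return Fitness.UNFIT
```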
|
|
|
|
<p>Those media types that are capable of doing what server-side media fragments
|
|
require are of interest to us. For those that aren't, the fall-back case
|
|
applies (i.e. full download and then offsetting). Appendix <a
|
|
href="#fitness-table"><b>B Evaluation of fitness per media formats</b></a>
|
|
lists a large number of typical formats and determines which we see fit,
|
|
conditionally fit, or currently unfit for supporting the different types of
|
|
media fragment URIs. </p>
|
|
|
|
<table border="1" summary="Editorial note: Silvia">
|
|
<tbody>
|
|
<tr>
|
|
<td width="50%" valign="top" align="left"><b>Editorial note:
|
|
Silvia</b></td>
|
|
<td width="50%" valign="top" align="right"> </td>
|
|
</tr>
|
|
<tr>
|
|
<td valign="top" align="left" colspan="2"><p>We ask for further input
|
|
into the table in the attachment, in particular where there are
|
|
question marks.</p>
|
|
</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
</div>
|
|
</div>
|
|
</div>
|
|
|
|
<div class="back">
|
|
|
|
<div class="div1">
|
|
<h2><a id="references-normative" name="references-normative"></a>A
|
|
References</h2>
|
|
<dl>
|
|
<dt class="label"><a name="cmml"></a>[CMML] </dt>
|
|
<dd><a
|
|
href="http://www.annodex.net/TR/draft-pfeiffer-cmml-03.txt"><cite>Continuous
|
|
Media Markup Language (CMML), Version 2.1</cite></a>. IETF
|
|
Internet-Draft, March 2006. </dd>
|
|
<dt class="label"><a name="html40"></a>[HTML 4.0] </dt>
|
|
<dd>D. Raggett, A. Le Hors and I. Jacobs. <a
|
|
href="http://www.w3.org/TR/REC-html40/intro/intro.html#fragment-uri"><cite>HTML
|
|
Fragment identifiers</cite></a>. W3C Recommendation, December 1999.
|
|
Available at <a
|
|
href="http://www.w3.org/TR/REC-html40/intro/intro.html#fragment-uri">http://www.w3.org/TR/REC-html40/intro/intro.html#fragment-uri</a>.
|
|
</dd>
|
|
<dt class="label"><a name="isoBaseMediaFF"></a>[ISO Base Media File Format]
|
|
</dt>
|
|
<dd><a
|
|
href="http://standards.iso.org/ittf/PubliclyAvailableStandards/c051533_ISO_IEC_14496-12_2008.zip"><cite>Information
|
|
technology - Coding of audio-visual objects - Part 12: ISO base media
|
|
file format</cite></a>. April 2009. </dd>
|
|
<dt class="label"><a name="mf-spec"></a>[Media Fragments URI 1.0] </dt>
|
|
<dd><a
|
|
href="http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/"><cite>Media
|
|
Fragments URI 1.0</cite></a>. W3C Working Draft, October 2009. </dd>
|
|
<dt class="label"><a name="mpeg-7"></a>[MPEG-7] </dt>
|
|
<dd><cite>Information Technology - Multimedia Content Description Interface
|
|
(MPEG-7)</cite>. Standard No. ISO/IEC 15938:2001, International
|
|
Organization for Standardization (ISO), 2001. </dd>
|
|
<dt class="label"><a name="mpeg21"></a>[MPEG-21] </dt>
|
|
<dd><cite>Information Technology - Multimedia Framework (MPEG-21)</cite>.
|
|
Standard No. ISO/IEC 21000:2002, International Organization for
|
|
Standardization (ISO), 2002. Available at <a
|
|
href="http://www.chiariglione.org/mpeg/working_documents/mpeg-21/fid/fid-is.zip">http://www.chiariglione.org/mpeg/working_documents/mpeg-21/fid/fid-is.zip</a>.
|
|
</dd>
|
|
<dt class="label"><a name="mediaont-10"></a>[Ontology for Media Resource 1.0]
|
|
</dt>
|
|
<dd>W. Lee, T. Bürger, F. Sasaki, V. Malaisé, F. Stegmaier and Joakim
|
|
Söderberg. <a
|
|
href="http://www.w3.org/TR/2009/WD-mediaont-10-20090618/"><cite>Ontology
|
|
for Media Resource 1.0</cite></a>. W3C Working Draft, June 2009.
|
|
Available at <a
|
|
href="http://www.w3.org/TR/mediaont-10">http://www.w3.org/TR/mediaont-10</a>.
|
|
</dd>
|
|
<dt class="label"><a name="rfc2119"></a>[RFC 2119] </dt>
|
|
<dd>S. Bradner. <a href="http://www.ietf.org/rfc/rfc2119.txt"><cite>Key
|
|
Words for use in RFCs to Indicate Requirement Levels</cite></a>. IETF RFC
|
|
2119, March 1997. Available at <a
|
|
href="http://www.ietf.org/rfc/rfc2119.txt">http://www.ietf.org/rfc/rfc2119.txt</a>.
|
|
</dd>
|
|
<dt class="label"><a name="ogg"></a>[RFC 3533] </dt>
|
|
<dd>S. Pfeiffer. <cite>The Ogg Encapsulation Format Version 0</cite>. IETF
|
|
RFC 3533, May 2003. Available at <a
|
|
href="http://www.ietf.org/rfc/rfc3533.txt">http://www.ietf.org/rfc/rfc3533.txt</a>.
|
|
</dd>
|
|
<dt class="label"><a name="rfc3986"></a>[RFC 3986] </dt>
|
|
<dd>T. Berners-Lee, R. Fielding and L. Masinter. <a
|
|
href="http://www.ietf.org/rfc/rfc3986.txt"><cite>Uniform Resource
|
|
Identifier (URI): Generic Syntax</cite></a>. IETF RFC 3986, January 2005.
|
|
Available at <a
|
|
href="http://www.ietf.org/rfc/rfc3986.txt">http://www.ietf.org/rfc/rfc3986.txt</a>.
|
|
</dd>
|
|
<dt class="label"><a name="rfc5147"></a>[RFC 5147] </dt>
|
|
<dd>E. Wilde and M. Duerst. <a
|
|
href="http://tools.ietf.org/html/rfc5147"><cite>URI Fragment Identifiers
|
|
for the text/plain Media Type</cite></a>. IETF RFC 5147, April 2008.
|
|
Available at <a
|
|
href="http://tools.ietf.org/html/rfc5147">http://tools.ietf.org/html/rfc5147</a>.
|
|
</dd>
|
|
<dt class="label"><a name="abnf"></a>[RFC 5234] </dt>
|
|
<dd>D. Crocker. <a
|
|
href="http://tools.ietf.org/html/rfc5234"><cite>Augmented BNF for Syntax
|
|
Specifications: ABNF</cite></a>. IETF RFC 5234, January 2008. Available
|
|
at <a
|
|
href="http://tools.ietf.org/html/rfc5234">http://tools.ietf.org/html/rfc5234</a>.
|
|
</dd>
|
|
<dt class="label"><a name="roe"></a>[ROE] </dt>
|
|
<dd><a href="http://wiki.xiph.org/index.php/ROE"><cite>Rich Open multitrack
|
|
media Exposition (ROE)</cite></a>. Xiph.org Wiki, April 2009. </dd>
|
|
<dt class="label"><a name="skeleton"></a>[Skeleton] </dt>
|
|
<dd><a href="http://wiki.xiph.org/OggSkeleton"><cite>Ogg
|
|
Skeleton</cite></a>. Xiph.org Wiki, April 2009. </dd>
|
|
<dt class="label"><a name="smpte"></a>[SMPTE] </dt>
|
|
<dd><cite>SMPTE RP 136 Time and Control Codes for 24, 25 or 30
|
|
Frame-Per-Second Motion-Picture Systems</cite> </dd>
|
|
<dt class="label"><a name="svg"></a>[SVG] </dt>
|
|
<dd>J. Ferraiolo. <a
|
|
href="http://www.w3.org/TR/2001/REC-SVG-20010904/linking#FragmentIdentifiersSVG"><cite>SVG
|
|
Fragment identifiers</cite></a>. W3C Recommendation, September 2001.
|
|
Available at <a
|
|
href="http://www.w3.org/TR/2001/REC-SVG-20010904/linking#FragmentIdentifiersSVG">http://www.w3.org/TR/2001/REC-SVG-20010904/linking#FragmentIdentifiersSVG</a>.
|
|
</dd>
|
|
<dt class="label"><a name="temporalURI"></a>[Temporal URI] </dt>
|
|
<dd>S. Pfeiffer, C. Parker and A. Pang. <a
|
|
href="http://annodex.net/TR/draft-pfeiffer-temporal-fragments-03.html"><cite>Specifying
|
|
time intervals in URI queries and fragments of time-based Web
|
|
resources</cite></a>. IETF Internet-Draft, March 2005. Available at <a
href="http://annodex.net/TR/draft-pfeiffer-temporal-fragments-03.html">http://annodex.net/TR/draft-pfeiffer-temporal-fragments-03.html</a>.
</dd>
|
|
<dt class="label"><a name="xpointer"></a>[XPointer Framework] </dt>
|
|
<dd>P. Grosso, E. Maler, J. Marsh and N. Walsh. <a
|
|
href="http://www.w3.org/TR/xptr-framework/"><cite>XPointer
|
|
Framework</cite></a>. W3C Recommendation, March 2003. Available at <a
|
|
href="http://www.w3.org/TR/xptr-framework/">http://www.w3.org/TR/xptr-framework/</a>.
|
|
</dd>
|
|
</dl>
|
|
</div>
|
|
|
|
<div class="div1">
|
|
<h2><a id="fitness-table" name="fitness-table"></a>B Evaluation of fitness per
media format</h2>
|
|
|
|
<p>To show which media formats belong to which fitness category, this appendix
gives an overview for key media formats. In the following tables, the 'X'
symbol indicates that the media format does not support a particular fragment
axis. The tables are grouped into video, audio and image codecs, and container
formats. </p>
|
|
|
|
<table border="1">
|
|
<tbody>
|
|
<tr>
|
|
<th>Video Codec</th>
|
|
<th>Track</th>
|
|
<th>Temporal</th>
|
|
<th>Spatial</th>
|
|
<th>Name</th>
|
|
<th>Remark</th>
|
|
</tr>
|
|
<tr>
|
|
<td>H.261</td>
|
|
<td>n/a</td>
|
|
<td>fit</td>
|
|
<td>unfit</td>
|
|
<td>n/a</td>
|
|
<td></td>
|
|
</tr>
|
|
<tr>
|
|
<td>MPEG-1 Video</td>
|
|
<td>n/a</td>
|
|
<td>fit</td>
|
|
<td>unfit</td>
|
|
<td>n/a</td>
|
|
<td></td>
|
|
</tr>
|
|
<tr>
|
|
<td>H.262/MPEG-2 Video</td>
|
|
<td>n/a</td>
|
|
<td>fit</td>
|
|
<td>unfit</td>
|
|
<td>n/a</td>
|
|
<td></td>
|
|
</tr>
|
|
<tr>
|
|
<td>H.263</td>
|
|
<td>n/a</td>
|
|
<td>fit</td>
|
|
<td>unfit</td>
|
|
<td>n/a</td>
|
|
<td></td>
|
|
</tr>
|
|
<tr>
|
|
<td>MPEG-4 Visual</td>
|
|
<td>n/a</td>
|
|
<td>fit</td>
|
|
<td>unfit</td>
|
|
<td>n/a</td>
|
|
<td></td>
|
|
</tr>
|
|
<tr>
|
|
<td>H.264/MPEG-4 AVC</td>
|
|
<td>n/a</td>
|
|
<td>fit</td>
|
|
<td>conditionally fit</td>
|
|
<td>n/a</td>
|
|
<td>Spatial fragment extraction is possible with Flexible Macroblock
|
|
Ordering (FMO)</td>
|
|
</tr>
|
|
<tr>
|
|
<td>AVS</td>
|
|
<td>n/a</td>
|
|
<td>fit</td>
|
|
<td>unfit</td>
|
|
<td>n/a</td>
|
|
<td></td>
|
|
</tr>
|
|
<tr>
|
|
<td>Motion JPEG</td>
|
|
<td>n/a</td>
|
|
<td>fit</td>
|
|
<td>unfit</td>
|
|
<td>n/a</td>
|
|
<td></td>
|
|
</tr>
|
|
<tr>
|
|
<td>Motion JPEG2000</td>
|
|
<td>n/a</td>
|
|
<td>fit</td>
|
|
<td>unfit</td>
|
|
<td>n/a</td>
|
|
<td>Spatial fragment extraction is possible in the compressed domain, but
|
|
syntax element modifications are needed for every frame.</td>
|
|
</tr>
|
|
<tr>
|
|
<td>VC-1</td>
|
|
<td>n/a</td>
|
|
<td>fit</td>
|
|
<td>unfit</td>
|
|
<td>n/a</td>
|
|
<td></td>
|
|
</tr>
|
|
<tr>
|
|
<td>Dirac</td>
|
|
<td>n/a</td>
|
|
<td>fit</td>
|
|
<td>unfit</td>
|
|
<td>n/a</td>
|
|
<td>When Dirac is stored in the Ogg <cite><a href="#ogg">RFC
|
|
3533</a></cite> container using Skeleton <cite><a
|
|
href="#skeleton">Skeleton</a></cite>, ROE <cite><a
|
|
href="#roe">ROE</a></cite> and CMML <cite><a
|
|
href="#cmml">CMML</a></cite>, track, temporal and named fragments are
|
|
supported. </td>
|
|
</tr>
|
|
<tr>
|
|
<td>Theora</td>
|
|
<td>n/a</td>
|
|
<td>fit</td>
|
|
<td>unfit</td>
|
|
<td>n/a</td>
|
|
<td>When Theora is stored in the Ogg <cite><a href="#ogg">RFC
|
|
3533</a></cite> container using Skeleton <cite><a
|
|
href="#skeleton">Skeleton</a></cite>, ROE <cite><a
|
|
href="#roe">ROE</a></cite> and CMML <cite><a
|
|
href="#cmml">CMML</a></cite>, track, temporal and named fragments are
|
|
supported. </td>
|
|
</tr>
|
|
<tr>
|
|
<td>RealVideo</td>
|
|
<td>n/a</td>
|
|
<td>fit(?)</td>
|
|
<td>unfit(?)</td>
|
|
<td>n/a</td>
|
|
<td></td>
|
|
</tr>
|
|
<tr>
|
|
<td>DV</td>
|
|
<td>n/a</td>
|
|
<td>fit</td>
|
|
<td>unfit</td>
|
|
<td>n/a</td>
|
|
<td></td>
|
|
</tr>
|
|
<tr>
|
|
<td>Betacam</td>
|
|
<td>n/a</td>
|
|
<td>fit</td>
|
|
<td>unfit</td>
|
|
<td>n/a</td>
|
|
<td></td>
|
|
</tr>
|
|
<tr>
|
|
<td>OMS</td>
|
|
<td>n/a</td>
|
|
<td>fit</td>
|
|
<td>unfit</td>
|
|
<td>n/a</td>
|
|
<td></td>
|
|
</tr>
|
|
<tr>
|
|
<td>SNOW</td>
|
|
<td>n/a</td>
|
|
<td>fit</td>
|
|
<td>unfit</td>
|
|
<td>n/a</td>
|
|
<td></td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
|
|
<p></p>
|
|
|
|
<table border="1">
|
|
<tbody>
|
|
<tr>
|
|
<th>Audio Codec</th>
|
|
<th>Track</th>
|
|
<th>Temporal</th>
|
|
<th>Spatial</th>
|
|
<th>Name</th>
|
|
<th>Remark</th>
|
|
</tr>
|
|
<tr>
|
|
<td>MPEG-1 Audio</td>
|
|
<td>n/a</td>
|
|
<td>fit</td>
|
|
<td>n/a</td>
|
|
<td>n/a</td>
|
|
<td></td>
|
|
</tr>
|
|
<tr>
|
|
<td>AAC</td>
|
|
<td>n/a</td>
|
|
<td>fit</td>
|
|
<td>n/a</td>
|
|
<td>n/a</td>
|
|
<td></td>
|
|
</tr>
|
|
<tr>
|
|
<td>Vorbis</td>
|
|
<td>n/a</td>
|
|
<td>fit</td>
|
|
<td>n/a</td>
|
|
<td>n/a</td>
|
|
<td>When Vorbis is stored in the Ogg <cite><a href="#ogg">RFC
|
|
3533</a></cite> container using Skeleton <cite><a
|
|
href="#skeleton">Skeleton</a></cite>, ROE <cite><a
|
|
href="#roe">ROE</a></cite> and CMML <cite><a
|
|
href="#cmml">CMML</a></cite>, track, temporal and named fragments are
|
|
supported. </td>
|
|
</tr>
|
|
<tr>
|
|
<td>FLAC</td>
|
|
<td>n/a</td>
|
|
<td>fit</td>
|
|
<td>n/a</td>
|
|
<td>n/a</td>
|
|
<td>When FLAC is stored in the Ogg <cite><a href="#ogg">RFC
|
|
3533</a></cite> container using Skeleton <cite><a
|
|
href="#skeleton">Skeleton</a></cite>, ROE <cite><a
|
|
href="#roe">ROE</a></cite> and CMML <cite><a
|
|
href="#cmml">CMML</a></cite>, track, temporal and named fragments are
|
|
supported. </td>
|
|
</tr>
|
|
<tr>
|
|
<td>Speex</td>
|
|
<td>n/a</td>
|
|
<td>fit</td>
|
|
<td>n/a</td>
|
|
<td>n/a</td>
|
|
<td>When Speex is stored in the Ogg <cite><a href="#ogg">RFC
|
|
3533</a></cite> container using Skeleton <cite><a
|
|
href="#skeleton">Skeleton</a></cite>, ROE <cite><a
|
|
href="#roe">ROE</a></cite> and CMML <cite><a
|
|
href="#cmml">CMML</a></cite>, track, temporal and named fragments are
|
|
supported. </td>
|
|
</tr>
|
|
<tr>
|
|
<td>AC-3/Dolby Digital</td>
|
|
<td>n/a</td>
|
|
<td>fit</td>
|
|
<td>n/a</td>
|
|
<td>n/a</td>
|
|
<td></td>
|
|
</tr>
|
|
<tr>
|
|
<td>TTA</td>
|
|
<td>n/a</td>
|
|
<td>fit</td>
|
|
<td>n/a</td>
|
|
<td>n/a</td>
|
|
<td></td>
|
|
</tr>
|
|
<tr>
|
|
<td>WMA</td>
|
|
<td>n/a</td>
|
|
<td>fit</td>
|
|
<td>n/a</td>
|
|
<td>n/a</td>
|
|
<td></td>
|
|
</tr>
|
|
<tr>
|
|
<td>MLP</td>
|
|
<td>n/a</td>
|
|
<td>fit</td>
|
|
<td>n/a</td>
|
|
<td>n/a</td>
|
|
<td></td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
|
|
<p></p>
|
|
|
|
<table border="1">
|
|
<tbody>
|
|
<tr>
|
|
<th>Image Codec</th>
|
|
<th>Track</th>
|
|
<th>Temporal</th>
|
|
<th>Spatial</th>
|
|
<th>Name</th>
|
|
<th>Remark</th>
|
|
</tr>
|
|
<tr>
|
|
<td>JPEG</td>
|
|
<td>n/a</td>
|
|
<td>n/a</td>
|
|
<td>unfit</td>
|
|
<td>n/a</td>
|
|
<td></td>
|
|
</tr>
|
|
<tr>
|
|
<td>JPEG2000</td>
|
|
<td>n/a</td>
|
|
<td>n/a</td>
|
|
<td>conditionally fit</td>
|
|
<td>n/a</td>
|
|
<td></td>
|
|
</tr>
|
|
<tr>
|
|
<td>JPEG LS</td>
|
|
<td>n/a</td>
|
|
<td>n/a</td>
|
|
<td>unfit</td>
|
|
<td>n/a</td>
|
|
<td></td>
|
|
</tr>
|
|
<tr>
|
|
<td>HD Photo</td>
|
|
<td>n/a</td>
|
|
<td>n/a</td>
|
|
<td>conditionally fit</td>
|
|
<td>n/a</td>
|
|
<td></td>
|
|
</tr>
|
|
<tr>
|
|
<td>GIF</td>
|
|
<td>n/a</td>
|
|
<td>n/a</td>
|
|
<td>unfit</td>
|
|
<td>n/a</td>
|
|
<td></td>
|
|
</tr>
|
|
<tr>
|
|
<td>PNG</td>
|
|
<td>n/a</td>
|
|
<td>n/a</td>
|
|
<td>unfit</td>
|
|
<td>n/a</td>
|
|
<td></td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
|
|
<p></p>
|
|
|
|
<table border="1">
|
|
<tbody>
|
|
<tr>
|
|
<th>Container Formats</th>
|
|
<th>Track</th>
|
|
<th>Temporal</th>
|
|
<th>Spatial</th>
|
|
<th>Name</th>
|
|
<th>Remark</th>
|
|
</tr>
|
|
<tr>
|
|
<td>MOV</td>
|
|
<td>conditionally fit</td>
|
|
<td>n/a</td>
|
|
<td>n/a</td>
|
|
<td>conditionally fit</td>
|
|
<td><a title="http://www.apple.com/quicktime/tutorials/texttracks.html"
|
|
class="external text"
|
|
href="http://www.apple.com/quicktime/tutorials/texttracks.html">QTText</a>
|
|
provides named chapters </td>
|
|
</tr>
|
|
<tr>
|
|
<td>MP4</td>
|
|
<td>conditionally fit</td>
|
|
<td>n/a</td>
|
|
<td>n/a</td>
|
|
<td>conditionally fit</td>
|
|
<td><a title="http://en.wikipedia.org/wiki/MPEG-4_Part_17"
|
|
class="external text"
|
|
href="http://en.wikipedia.org/wiki/MPEG-4_Part_17">MPEG-4 TimedText</a>
|
|
provides named sections </td>
|
|
</tr>
|
|
<tr>
|
|
<td>3GP</td>
|
|
<td>conditionally fit</td>
|
|
<td>n/a</td>
|
|
<td>n/a</td>
|
|
<td>conditionally fit</td>
|
|
<td><a title="http://en.wikipedia.org/wiki/MPEG-4_Part_17"
|
|
class="external text"
|
|
href="http://en.wikipedia.org/wiki/MPEG-4_Part_17">3GPP TimedText</a>
|
|
provides named sections </td>
|
|
</tr>
|
|
<tr>
|
|
<td>MPEG-21 FF</td>
|
|
<td>conditionally fit</td>
|
|
<td>n/a</td>
|
|
<td>n/a</td>
|
|
<td>conditionally fit</td>
|
|
<td><a
|
|
title="http://www.chiariglione.org/mpeg/technologies/mpeg-21/mp21-did/index.htm"
|
|
class="external text"
|
|
href="http://www.chiariglione.org/mpeg/technologies/mpeg-21/mp21-did/index.htm">MPEG-21
|
|
Digital Item Declaration</a> provides named sections </td>
|
|
</tr>
|
|
<tr>
|
|
<td>OGG</td>
|
|
<td>conditionally fit (1)</td>
|
|
<td>fit</td>
|
|
<td>n/a</td>
|
|
<td>conditionally fit (2)</td>
|
|
<td>(1) Using ROE <cite><a href="#roe">ROE</a></cite> and Skeleton
|
|
<cite><a href="#skeleton">Skeleton</a></cite>, track selection is
|
|
possible; (2) Using ROE, CMML <cite><a href="#cmml">CMML</a></cite> and
|
|
Skeleton, named addressing of temporal and track fragments is possible
|
|
</td>
|
|
</tr>
|
|
<tr>
|
|
<td>Matroska</td>
|
|
<td>conditionally fit</td>
|
|
<td>n/a</td>
|
|
<td>n/a</td>
|
|
<td>conditionally fit</td>
|
|
<td></td>
|
|
</tr>
|
|
<tr>
|
|
<td>MXF</td>
|
|
<td>conditionally fit</td>
|
|
<td>n/a</td>
|
|
<td>n/a</td>
|
|
<td>conditionally fit</td>
|
|
<td></td>
|
|
</tr>
|
|
<tr>
|
|
<td>ASF</td>
|
|
<td>conditionally fit</td>
|
|
<td>n/a</td>
|
|
<td>n/a</td>
|
|
<td>conditionally fit</td>
|
|
<td>Marker objects provide named anchor points</td>
|
|
</tr>
|
|
<tr>
|
|
<td>AVI</td>
|
|
<td>conditionally fit</td>
|
|
<td>n/a</td>
|
|
<td>n/a</td>
|
|
<td>X</td>
|
|
<td></td>
|
|
</tr>
|
|
<tr>
|
|
<td>FLV</td>
|
|
<td>conditionally fit</td>
|
|
<td>n/a</td>
|
|
<td>n/a</td>
|
|
<td>conditionally fit</td>
|
|
<td><a
|
|
title="http://help.adobe.com/en_US/Soundbooth/2.0/WSA5A1DDFB-6BE2-4486-BE0C-A10CEEF119ADa.html"
|
|
class="external text"
|
|
href="http://help.adobe.com/en_US/Soundbooth/2.0/WSA5A1DDFB-6BE2-4486-BE0C-A10CEEF119ADa.html">cue
|
|
points</a> provide named anchor points </td>
|
|
</tr>
|
|
<tr>
|
|
<td>RMFF</td>
|
|
<td>fit or conditionally fit(?)</td>
|
|
<td>n/a</td>
|
|
<td>n/a</td>
|
|
<td>?</td>
|
|
<td></td>
|
|
</tr>
|
|
<tr>
|
|
<td>WAV</td>
|
|
<td>X</td>
|
|
<td>n/a</td>
|
|
<td>n/a</td>
|
|
<td>X</td>
|
|
<td></td>
|
|
</tr>
|
|
<tr>
|
|
<td>AIFF</td>
|
|
<td>X</td>
|
|
<td>n/a</td>
|
|
<td>n/a</td>
|
|
<td>X</td>
|
|
<td></td>
|
|
</tr>
|
|
<tr>
|
|
<td>XMF</td>
|
|
<td>?</td>
|
|
<td>n/a</td>
|
|
<td>n/a</td>
|
|
<td>?</td>
|
|
<td></td>
|
|
</tr>
|
|
<tr>
|
|
<td>AU</td>
|
|
<td>X</td>
|
|
<td>n/a</td>
|
|
<td>n/a</td>
|
|
<td>X</td>
|
|
<td></td>
|
|
</tr>
|
|
<tr>
|
|
<td>TIFF</td>
|
|
<td>conditionally fit</td>
|
|
<td>n/a</td>
|
|
<td>n/a</td>
|
|
<td>conditionally fit</td>
|
|
<td>Can store multiple images (i.e., tracks) in one file; "private tags"
(i.e., proprietary information) can also be inserted</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
</div>
|
|
|
|
<div class="div1">
|
|
<h2><a id="technologies-survey" name="technologies-survey"></a>C Technologies
|
|
Survey</h2>
|
|
|
|
<div class="div2">
|
|
<h3><a id="ExistingSchemes" name="ExistingSchemes"></a>C.1 Existing URI
|
|
fragment schemes</h3>
|
|
|
|
<p>Some existing URI schemes define semantics for fragment identifiers. In this
|
|
section, we list these URI schemes and provide examples of their fragment
|
|
identifiers. </p>
|
|
|
|
<div class="div3">
|
|
<h4><a id="GeneralURISchemes" name="GeneralURISchemes"></a>C.1.1 General
|
|
specification of URI fragments</h4>
|
|
<ul>
|
|
<li>
|
|
<div>
|
|
<em>URI Fragment </em><cite><a href="#rfc3986">RFC 3986</a></cite>
|
|
|
|
<div class="exampleInner">
|
|
<pre>http://www.w3.org/2008/WebVideo/Fragments/wiki/Main_Page#Preparation_of_Working_Draft</pre>
|
|
</div>
|
|
Quoted from RFC 3986: "The fragment identifier component of a URI allows
|
|
indirect identification of a secondary resource by reference to a primary
|
|
resource and additional identifying information. The identified secondary
|
|
resource may be some portion or subset of the primary resource, some view
|
|
on representations of the primary resource, or some other resource defined
|
|
or described by those representations. A fragment identifier component is
|
|
indicated by the presence of a number sign ("#") character and terminated
|
|
by the end of the URI." </div>
|
|
</li>
|
|
</ul>
|
|
</div>
|
|
|
|
<div class="div3">
|
|
<h4><a id="NonAudioVideoURISchemes" name="NonAudioVideoURISchemes"></a>C.1.2
|
|
Fragment specifications not for audio/video</h4>
|
|
<ul>
|
|
<li>
|
|
<div>
|
|
<em>HTML named anchors </em><cite><a href="#html40">HTML 4.0</a></cite>
|
|
|
|
<div class="exampleInner">
|
|
<pre>http://www.w3.org/2008/WebVideo/Fragments/wiki/Main_Page#Preparation_of_Working_Draft</pre>
|
|
</div>
|
|
refers to a specific named anchor within the resource
|
|
http://www.w3.org/2008/WebVideo/Fragments/wiki/Main_Page </div>
|
|
</li>
|
|
<li>
|
|
<div>
|
|
<em>XPointer named elements </em><cite><a href="#xpointer">XPointer
|
|
Framework</a></cite>
|
|
|
|
<div class="exampleInner">
|
|
<pre>http://www.w3schools.com/xlink/dogbreeds.xml#xpointer(id("Rottweiler"))</pre>
|
|
</div>
|
|
refers to the element with id equal to 'Rottweiler' in the target XML
|
|
document http://www.w3schools.com/xlink/dogbreeds.xml </div>
|
|
</li>
|
|
<li>
|
|
<div>
|
|
<em>text (plain) </em><cite><a href="#rfc5147">RFC 5147</a></cite>
|
|
|
|
<div class="exampleInner">
|
|
<pre>http://example.com/text.txt#line=10,20</pre>
|
|
</div>
|
|
identifies lines 11 to 20 of the text.txt MIME entity </div>
|
|
</li>
|
|
<li>
|
|
<div>
|
|
<em>SVG </em><cite><a href="#svg">SVG</a></cite>
|
|
|
|
<div class="exampleInner">
|
|
<pre>http://example.com/wiki/File:Yalta_summit_1945_with_Churchill,_Roosevelt,_Stalin.svg#svgView(14.64,15.73,146.98,147.48)
|
|
</pre>
|
|
</div>
|
|
specifies the region to be viewed of the SVG image
|
|
http://example.com/wiki/File:Yalta_summit_1945_with_Churchill,_Roosevelt,_Stalin.svg
|
|
</div>
|
|
</li>
|
|
</ul>
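<p>The off-by-one in the text/plain example above (<code>line=10,20</code>
selects lines 11 to 20) follows from RFC 5147 counting positions between lines,
starting at 0 before the first line. The following Python sketch illustrates
that reading; the function name <code>select_lines</code> is hypothetical, and
the sketch handles only <code>line=</code> ranges and ignores the integrity
checks that RFC 5147 also defines.</p>

```python
def select_lines(text, fragment):
    """Return the lines addressed by an RFC 5147 'line=' range."""
    scheme, _, value = fragment.partition("=")
    if scheme != "line" or "," not in value:
        raise ValueError("this sketch only handles line=start,end ranges")
    start_text, _, end_text = value.partition(",")
    lines = text.splitlines()
    # Positions count the gaps between lines: 0 is before line 1,
    # 10 is between line 10 and line 11, and so on.
    start = int(start_text) if start_text else 0
    end = int(end_text) if end_text else len(lines)
    return lines[start:end]


sample = "\n".join("line {}".format(n) for n in range(1, 31))
picked = select_lines(sample, "line=10,20")
print(picked[0], "...", picked[-1])  # line 11 ... line 20
```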
|
|
</div>
|
|
|
|
<div class="div3">
|
|
<h4><a id="AudioVideoURISchemes" name="AudioVideoURISchemes"></a>C.1.3 Fragment
|
|
specifications for audio/video</h4>
|
|
<ul>
|
|
<li>
|
|
<div>
|
|
<em>Temporal URI/Ogg technologies </em><cite><a
|
|
href="#temporalURI">Temporal URI</a></cite>
|
|
|
|
<div class="exampleInner">
|
|
<pre>http://example.com/video.ogv#t=12.3/21.16</pre>
|
|
</div>
|
|
specifies a temporal fragment of the Ogg Theora video
http://example.com/video.ogv starting at 12.3 s and ending at 21.16 s
|
|
</div>
|
|
</li>
|
|
<li>
|
|
<div>
|
|
<em>MPEG-21 </em><cite><a href="#mpeg21">MPEG-21</a></cite>
|
|
|
|
<div class="exampleInner">
|
|
<pre>http://www.example.com/myfile.mp4#mp(/~time('npt','10','30'))</pre>
|
|
</div>
|
|
specifies a temporal fragment of the MP4 resource
http://www.example.com/myfile.mp4 starting at 10 s and ending at 30 s
|
|
</div>
|
|
</li>
|
|
</ul>
|
|
</div>
|
|
</div>
|
|
|
|
<div class="div2">
|
|
<h3><a id="ExistingApplications" name="ExistingApplications"></a>C.2 Existing
|
|
applications using proprietary temporal media fragment URI schemes</h3>
|
|
|
|
<p>In this section, we list a number of proprietary URI schemes which are able
|
|
to identify media fragments. Note that all of these schemes only provide
|
|
support for addressing temporal media fragments.</p>
|
|
<ul>
|
|
<li>
|
|
<div>
|
|
<em>Google Video</em> (<a
|
|
href="http://lists.w3.org/Archives/Public/public-media-fragment/2008Oct/0067.html">announcement</a>)
|
|
|
|
<div class="exampleInner">
|
|
<pre>http://video.google.com/videoplay?docid=4776181634656145640#&amp;ei=MCH-SNfJD5HS2gKirMD2Dg&amp;q=%22that%27s+a+tremendous+gift%22#50m16s</pre>
|
|
</div>
|
|
Syntax: <code>#50m16s</code></div>
|
|
</li>
|
|
<li>
|
|
<div>
|
|
<em>YouTube</em> (<a
|
|
href="http://www.techcrunch.com/2008/10/25/youtube-enables-deep-linking-within-videos/">announcement</a>)
|
|
|
|
<div class="exampleInner">
|
|
<pre>http://www.youtube.com/watch?v=1bibCui3lFM#t=1m45s</pre>
|
|
</div>
|
|
Syntax: <code>#t=1m45s</code>
|
|
<p>YouTube also supports click-throughs from embedded videos to time
offsets (see <a
href="http://googlesystem.blogspot.com/2009/08/clicking-on-youtube-video.html">announcement</a>),
and resumes playback at the last time offset when people come back to a video
they only partly watched (see <a
href="http://youtube-global.blogspot.com/2009/09/release-notes-91709.html">http://youtube-global.blogspot.com/2009/09/release-notes-91709.html</a>).
</p>
|
|
</div>
|
|
</li>
|
|
<li>
|
|
<div>
|
|
<em>Archive.org</em> (uses the temporalURI specification <cite><a
|
|
href="#temporalURI">Temporal URI</a></cite>)
|
|
|
|
<div class="exampleInner">
|
|
<pre>http://www.archive.org/download/to-SF/toSF_512kb.mp4?t=74.5</pre>
|
|
</div>
|
|
Syntax: <code>?t=74.5</code></div>
|
|
</li>
|
|
<li>
|
|
<div>
|
|
<em>Videosurf</em> (<a
|
|
href="http://solution.allthingsd.com/20081118/a-search-engine-with-a-real-eye-for-videos/">announcement</a>)
|
|
|
|
<div class="exampleInner">
|
|
<pre>http://www.videosurf.com/video/michael-jordan-1989-playoffs-gm-5-vs-cavs-the-shot-904591?t=140&amp;e=184</pre>
</div>
Syntax: <code>?t=140&amp;e=184</code> (with t=start, e=end) </div>
|
|
</li>
|
|
</ul>
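<p>To make the fragment-style offsets above concrete, the following Python
sketch converts offsets written like the Google Video and YouTube examples
(<code>#50m16s</code>, <code>#t=1m45s</code>) into seconds. It is an
illustration only: the function name <code>offset_seconds</code> is
hypothetical, and the accepted grammar (minutes and seconds, with an optional
<code>t=</code> prefix) is inferred from the two examples rather than taken
from any published specification of these services.</p>

```python
import re

# Matches offsets such as "1m45s", "50m16s" or "45s",
# with an optional "t=" prefix.
OFFSET = re.compile(r"(?:t=)?(?:(\d+)m)?(?:(\d+)s)?")


def offset_seconds(url):
    """Return the time offset in seconds found after the last '#' of url."""
    if "#" not in url:
        raise ValueError("no fragment in URL")
    fragment = url.rsplit("#", 1)[1]
    match = OFFSET.fullmatch(fragment)
    if match is None or not any(match.groups()):
        raise ValueError("unrecognised offset: " + fragment)
    minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return minutes * 60 + seconds


print(offset_seconds("http://www.youtube.com/watch?v=1bibCui3lFM#t=1m45s"))  # 105
```

<p>Taking the text after the last "#" keeps the sketch usable for the Google
Video example, whose URL contains two "#" characters.</p>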
|
|
</div>
|
|
|
|
<div class="div2">
|
|
<h3><a id="MediaFragmentApproaches" name="MediaFragmentApproaches"></a>C.3
|
|
Media fragment specification approaches</h3>
|
|
|
|
<table border="1">
|
|
<tbody>
|
|
<tr>
|
|
<th>Media fragment approach</th>
|
|
<th>Temporal</th>
|
|
<th>Spatial</th>
|
|
<th>Track</th>
|
|
<th>Name</th>
|
|
</tr>
|
|
<tr>
|
|
<td colspan="5"><em>URI based</em> </td>
|
|
</tr>
|
|
<tr>
|
|
<td>SVG</td>
|
|
<td>No</td>
|
|
<td>Yes</td>
|
|
<td>No</td>
|
|
<td>No</td>
|
|
</tr>
|
|
<tr>
|
|
<td>Temporal URI/Ogg technologies</td>
|
|
<td>Yes</td>
|
|
<td>No</td>
|
|
<td>Yes</td>
|
|
<td>Yes</td>
|
|
</tr>
|
|
<tr>
|
|
<td>MPEG-21</td>
|
|
<td>Yes</td>
|
|
<td>Yes</td>
|
|
<td>Yes</td>
|
|
<td>Yes</td>
|
|
</tr>
|
|
<tr>
|
|
<td colspan="5"><em>Non-URI-based</em> </td>
|
|
</tr>
|
|
<tr>
|
|
<td>SMIL</td>
|
|
<td>Yes</td>
|
|
<td>Yes</td>
|
|
<td>No?</td>
|
|
<td>No?</td>
|
|
</tr>
|
|
<tr>
|
|
<td>MPEG-7</td>
|
|
<td>Yes</td>
|
|
<td>Yes</td>
|
|
<td>Yes</td>
|
|
<td>Yes</td>
|
|
</tr>
|
|
<tr>
|
|
<td>SVG</td>
|
|
<td>No</td>
|
|
<td>Yes</td>
|
|
<td>No</td>
|
|
<td>?</td>
|
|
</tr>
|
|
<tr>
|
|
<td>TV-Anytime</td>
|
|
<td>Yes</td>
|
|
<td>No</td>
|
|
<td>No</td>
|
|
<td>Yes</td>
|
|
</tr>
|
|
<tr>
|
|
<td>ImageMaps</td>
|
|
<td>No</td>
|
|
<td>Yes</td>
|
|
<td>No</td>
|
|
<td>?</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
|
|
<div class="div3">
|
|
<h4><a id="URI-based" name="URI-based"></a>C.3.1 URI based</h4>
|
|
|
|
<div class="div4">
|
|
<h5><a id="SVG_URI" name="SVG_URI"></a>C.3.1.1 SVG</h5>
|
|
|
|
<div class="div5">
|
|
<h6><a id="Spatial_SVG_URI" name="Spatial_SVG_URI"></a>C.3.1.1.1 Spatial</h6>
|
|
|
|
<p>Possible via SVG 1.1 Fragment Identifiers. Only rectangular spatial regions
|
|
are supported:</p>
|
|
|
|
<div>
|
|
|
|
<div class="exampleInner">
|
|
<pre>http://example.com/wiki/File:Yalta_summit_1945_with_Churchill,_Roosevelt,_Stalin.svg#svgView(14.64,15.73,146.98,147.48)</pre>
|
|
</div>
|
|
</div>
|
|
</div>
|
|
</div>
|
|
|
|
<div class="div4">
|
|
<h5><a id="TemporalURI" name="TemporalURI"></a>C.3.1.2 Temporal URI/Ogg
|
|
technologies</h5>
|
|
|
|
<div class="div5">
|
|
<h6><a id="Temporal_TemporalURI" name="Temporal_TemporalURI"></a>C.3.1.2.1
|
|
Temporal</h6>
|
|
|
|
<p>A Temporal URI <cite><a href="#temporalURI">Temporal URI</a></cite> is
used to play back temporal fragments in Annodex. The begin and end of the clip
are specified directly in the URI. With the URI fragment identifier ("#"), the
media fragment is expected to be played after the complete resource has been
downloaded; with URI query parameters ("?"), the media fragment is expected to
be extracted on the server and delivered to the client as a new resource.
Linking to such a resource looks as follows: </p>
|
|
|
|
<div class="exampleInner">
|
|
<pre> &lt;a href="http://example.com/video.ogv#t=12.3/21.16" /&gt;
 &lt;a href="http://example.com/video.ogv?t=12.3/21.16" /&gt;
</pre>
|
|
</div>
|
|
|
|
<p>It is possible to use different temporal schemes, which give frame-accurate
clipping when used correctly:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre> &lt;a href="http://example.com/video.ogv?t=npt:12.3/21.16" /&gt;
 &lt;a href="http://example.com/video.ogv?t=smpte-25:00:12:33:06/00:21:16:00" /&gt;
 &lt;a href="http://example.com/audio.ogv?t=clock:20021107T173045.25Z" /&gt;
</pre>
|
|
</div>
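<p>The distinction between "#" and "?" and the optional time-scheme prefix can
be sketched in Python as follows. This is a minimal illustration, not a
conforming parser: the function name <code>temporal_interval</code> and the
"client"/"server" labels are hypothetical, and treating a missing scheme prefix
as <code>npt</code> is an assumption of this sketch.</p>

```python
from urllib.parse import urlsplit, parse_qs


def temporal_interval(uri):
    """Split a Temporal URI into (mode, scheme, start, end).

    mode is "client" for '#' fragments and "server" for '?' queries,
    mirroring the expectation described in the text.
    """
    parts = urlsplit(uri)
    query = parse_qs(parts.query)
    if parts.fragment.startswith("t="):
        mode, spec = "client", parts.fragment[2:]
    elif "t" in query:
        mode, spec = "server", query["t"][0]
    else:
        raise ValueError("no t= time specification found")
    # The scheme prefix is optional; this sketch assumes npt when it is
    # missing or when the text before the first ':' is a number.
    scheme, sep, interval = spec.partition(":")
    if not sep or scheme[:1].isdigit():
        scheme, interval = "npt", spec
    start, _, end = interval.partition("/")
    return mode, scheme, start, end or None


print(temporal_interval("http://example.com/video.ogv#t=12.3/21.16"))
# ('client', 'npt', '12.3', '21.16')
```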
|
|
</div>
|
|
|
|
<div class="div5">
|
|
<h6><a id="Track_TemporalURI" name="Track_TemporalURI"></a>C.3.1.2.2 Track</h6>
|
|
|
|
<p>Tracks are a concept orthogonal to time-aligned annotations. Xiph/Annodex
have therefore defined a separate way of describing them: ROE (Rich Open
multitrack media Exposition) <cite><a href="#roe">ROE</a></cite>, which has
been available since January 2008. A ROE file describes the composition of a
media resource on the server. It can also be downloaded by a client to find
out about the "capabilities" of the resource, but it is mainly used for
authoring-on-the-fly: depending on what a client requires, the ROE file is
used to find the different tracks and multiplex them together. Here is an
example file: </p>
|
|
|
|
<div class="exampleInner">
|
|
<pre> &lt;ROE&gt;
   &lt;head&gt;
     &lt;link rel="alternate" type="text/html" href="http://example.com/complete_video.html" /&gt;
   &lt;/head&gt;
   &lt;body&gt;
     &lt;track id="v" provides="video"&gt;
       &lt;seq&gt;
         &lt;mediaSource id="v0" src="http://example.com/video.ogv" content-type="video/ogg" /&gt;
         &lt;mediaSource id="v1" src="http://example.com/theora.ogv?track=v1" content-type="video/theora" /&gt;
       &lt;/seq&gt;
     &lt;/track&gt;
     &lt;track id="a" provides="audio"&gt;
       &lt;mediaSource id="a1" src="http://example.com/theora.ogv?track=a1" content-type="audio/vorbis" /&gt;
     &lt;/track&gt;
     &lt;track id="c1" provides="caption"&gt;
       &lt;mediaSource src="http://example.com/cmml1.cmml" content-type="text/cmml" /&gt;
     &lt;/track&gt;
     &lt;track id="c2" provides="ticker"&gt;
       &lt;mediaSource src="http://example.com/cmml2.cmml" content-type="text/cmml" /&gt;
     &lt;/track&gt;
   &lt;/body&gt;
 &lt;/ROE&gt;
</pre>
|
|
</div>
|
|
|
|
<p>This has not been completely worked through and implemented, but Metavid
uses ROE as an export format to describe the different resources available as
subparts of one media resource. Note that ROE is also used to create an Ogg
Skeleton <cite><a href="#skeleton">Skeleton</a></cite> in the final multiplexed
file. Thus, the information in the ROE file goes into the media file (at least
in theory) and can be used to extract tracks via a URI: </p>
|
|
|
|
<div class="exampleInner">
|
|
<pre>&lt;video src="http://example.com/video.ogv?track=a/v/c1"/&gt;</pre>
|
|
</div>
|
|
</div>
|
|
|
|
<div class="div5">
|
|
<h6><a id="Named_TemporalURI" name="Named_TemporalURI"></a>C.3.1.2.3 Named</h6>
|
|
|
|
<p>To include outgoing hyperlinks in a video, you have to define the
time-aligned markup of your video (or audio) stream. For this purpose, Annodex
uses CMML <cite><a href="#cmml">CMML</a></cite>. A CMML file can be used to
include outgoing hyperlinks next to or into Ogg <cite><a
href="#ogg">RFC 3533</a></cite> streams. Here, "next to" means that the CMML
file is kept separate from the Ogg file and the client-side player knows how to
synchronise the two; "into" means that the CMML is multiplexed as a timed text
codec into the Ogg physical bitstream, so that only one file has to be
exchanged. The following defines a CMML clip that has an outgoing hyperlink
(this is a partial document extracted from a CMML file): </p>
|
|
|
|
<div class="exampleInner">
|
|
<pre> &lt;clip id="tic1" start="npt:12.3" end="npt:21.16" title="Introduction"&gt;
   &lt;a href="http://example.com/fish.ogv?t=5" &gt;Watch another fish video.&lt;/a&gt;
   &lt;meta name="author" content="Frank"/&gt;
   &lt;img src="fish.png"/&gt;
   &lt;body&gt;This is the introduction to the film Joe made about fish.&lt;/body&gt;
 &lt;/clip&gt;
</pre>
|
|
</div>
|
|
|
|
<p>Note how there is also the possibility of naming a thumbnail, providing
|
|
metadata, and giving a full description of the clip in the body tag.
|
|
Interestingly, you can also address into temporal fragments of a CMML <cite><a
|
|
href="#cmml">CMML</a></cite> file, since it is a representation of a
|
|
time-continuous data resource: </p>
|
|
|
|
<div class="exampleInner">
|
|
<pre>&lt;a href="http://example.com/sample.cmml?t=npt:4" /&gt;</pre>
|
|
</div>
|
|
|
|
<p>With CMML and ROE you can address into named temporal regions of a CMML file
|
|
itself: </p>
|
|
|
|
<div class="exampleInner">
|
|
<pre>&lt;a href="http://example.com/sample.cmml?id=tic1" /&gt;</pre>
|
|
</div>
|
|
</div>
|
|
</div>
|
|
|
|
<div class="div4">
|
|
<h5><a id="MPEG-21" name="MPEG-21"></a>C.3.1.3 MPEG-21</h5>
|
|
|
|
<p>Four different schemes are specified in MPEG-21 Part 17 <cite><a
|
|
href="#mpeg21">MPEG-21</a></cite> to address parts of media resources: ffp(),
|
|
offset(), mp(), and mask(): </p>
|
|
<ul>
|
|
<li><p><em>ffp()</em> is applicable to file formats conforming to the ISO
Base Media File Format (aka MPEG-4 Part 12 or ISO/IEC 14496-12) and is able
to identify items via item_ID and tracks via track_ID, located in the iloc
and tkhd boxes respectively </p>
|
|
</li>
|
|
<li><p><em>offset()</em> is applicable to any digital resource and identifies
|
|
a range of bytes in a data stream (similar functionality as the HTTP byte
|
|
range mechanism). </p>
|
|
</li>
|
|
<li><p><em>mp()</em> is applicable for media resources whose Internet media
|
|
type (or MIME type) is equal to audio/mpeg, video/mpeg, video/mp4,
|
|
audio/mp4, or application/mp4 and provides two complementary mechanisms for
|
|
identifying fragments in a multimedia resource via: </p>
|
|
<ul>
|
|
<li><p>a set of so-called dimensions (i.e., temporal, spatial or
|
|
spatiotemporal) which are independent of the coding/container format:
|
|
for the temporal dimension, the following time schemes are supported:
|
|
NPT, SMPTE, MPEG-7, and UTC.</p>
|
|
</li>
|
|
<li><p>a hierarchical logical model of the resource. Such a logical model
|
|
is dependent on the underlying container format (e.g., an audio CD
|
|
contains a list of tracks). The structures defined in these logical
|
|
models are accessed with a syntax based on XPath.</p>
|
|
</li>
|
|
</ul>
|
|
</li>
|
|
<li><p><em>mask()</em> is applicable for media resources whose Internet media
|
|
type (or MIME type) is equal to video/mp4 or video/mpeg and addresses a
|
|
binary mask defined in a resource (binary masks can be achieved through
|
|
MPEG-4 shape coding). Note that this mask is meant to be applied to a video
|
|
resource and that the video resource may itself be the resource that
|
|
contains the mask. </p>
|
|
</li>
|
|
</ul>
|
|
|
|
<p>Note that hierarchical combinations of addressing schemes are also possible.
|
|
The '*' operator is used for this purpose. When two consecutive pointer parts
|
|
are separated by the '*' operator, the fragments located by the first pointer
|
|
part (to the left of the '*' operator) are used as a context for evaluating the
|
|
second pointer part (to the right of the '*' operator). </p>
|
|
|
|
<div class="div5">
|
|
<h6><a id="Temporal_MPEG-21" name="Temporal_MPEG-21"></a>C.3.1.3.1 Temporal</h6>
|
|
|
|
<p>Supported by the mp() scheme: </p>
|
|
<ul>
|
|
<li>
|
|
<div>
|
|
Select the first sixty seconds:
|
|
|
|
<div class="exampleInner">
|
|
<pre>http://www.example.com/myfile.mp4#mp(~time('npt','0','60'))</pre>
|
|
</div>
|
|
</div>
|
|
</li>
|
|
</ul>
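<p>As an illustration of how such a temporal pointer could be read back, the
following Python sketch extracts the time scheme, start and end from an
<code>mp()</code> fragment. It is based solely on the example URIs in this
document, which appear both with and without a leading "/" before
<code>~time</code>; the pattern and the function name <code>mp_time</code> are
hypothetical and do not implement the full MPEG-21 Part 17 grammar.</p>

```python
import re

# Accepts both spellings used in this document:
# mp(~time(...)) and mp(/~time(...)).
MP_TIME = re.compile(r"mp\(/?~time\('([^']*)','([^']*)','([^']*)'\)\)")


def mp_time(uri):
    """Return (scheme, start, end) from an mp() temporal pointer, or None."""
    fragment = uri.partition("#")[2]
    match = MP_TIME.fullmatch(fragment)
    if match is None:
        return None
    return match.groups()


print(mp_time("http://www.example.com/myfile.mp4#mp(~time('npt','0','60'))"))
# ('npt', '0', '60')
```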
|
|
</div>
|
|
|
|
<div class="div5">
|
|
<h6><a id="Spatial_MPEG-21" name="Spatial_MPEG-21"></a>C.3.1.3.2 Spatial</h6>
|
|
|
|
<p>Supported by the mp() scheme: </p>
|
|
<ul>
|
|
<li>
|
|
<div>
|
|
Locate a 20x20 square region of an image:
|
|
|
|
<div class="exampleInner">
|
|
<pre>http://www.example.com/myfile.mp4#mp(~region(rect(20,20,40,40)))</pre>
|
|
</div>
|
|
</div>
|
|
</li>
|
|
<li>
|
|
<div>
|
|
Address a moving region that is restricted to the time interval between 10
and 30 seconds NPT:
|
|
|
|
<div class="exampleInner">
|
|
<pre>http://www.example.com/myfile.mp4#mp(/~time('npt','10','30')/~moving-region(rect(0,0,5,5),pt(10,10,t(5)),pt(20,20)))</pre>
|
|
</div>
|
|
</div>
|
|
</li>
|
|
</ul>
|
|
</div>
|
|
|
|
<div class="div5">
|
|
<h6><a id="Track_MPEG-21" name="Track_MPEG-21"></a>C.3.1.3.3 Track</h6>
|
|
|
|
<p>Supported by the mp() and ffp() schemes: </p>
|
|
<ul>
|
|
<li>
|
|
<div>
|
|
Select the first track of a CD:
|
|
|
|
<div class="exampleInner">
|
|
<pre>http://www.example.com/myfile.mp4#mp(/CD/track[position()=1])</pre>
|
|
</div>
|
|
</div>
|
|
</li>
|
|
<li>
|
|
<div>
|
|
Select a track based on its id:
|
|
|
|
<div class="exampleInner">
|
|
<pre>http://www.example.com/myfile.mp4#ffp(track_ID=101)</pre>
|
|
</div>
|
|
</div>
|
|
</li>
|
|
</ul>
|
|
</div>
|
|
|
|
<div class="div5">
|
|
<h6><a id="Named_MPEG-21" name="Named_MPEG-21"></a>C.3.1.3.4 Named</h6>
|
|
|
|
<p>Supported by the ffp() and mask() schemes: </p>
|
|
<ul>
|
|
<li>
|
|
<div>
|
|
Select the fragment with item ID equal to 1:
|
|
|
|
<div class="exampleInner">
|
|
<pre>http://www.example.com/myfile.mp4#ffp(item_ID=1)</pre>
|
|
</div>
|
|
</div>
|
|
</li>
|
|
<li>
|
|
<div>
|
|
Select a portion of an MPEG-4 video, myVideo.mp4, using a mask defined in
the first track of the same MPEG-4 video:
|
|
|
|
<div class="exampleInner">
|
|
<pre>http://www.example.com/myVideo.mp4#mask(%23ffp(item_ID=1),mpeg)</pre>
|
|
</div>
|
|
</div>
|
|
</li>
|
|
</ul>
|
|
</div>
|
|
</div>
|
|
</div>
|
|
|
|
<div class="div3">
|
|
<h4><a id="Non-URI-based" name="Non-URI-based"></a>C.3.2 Non-URI-based</h4>
|
|
|
|
<div class="div4">
|
|
<h5><a id="SMIL" name="SMIL"></a>C.3.2.1 SMIL</h5>
|
|
|
|
<div class="div5">
|
|
<h6><a id="Temporal_SMIL" name="Temporal_SMIL"></a>C.3.2.1.1 Temporal</h6>
|
|
|
|
<p><em>Playing temporal fragments out-of-context</em> </p>
|
|
|
|
<p>SMIL allows you to play only a fragment of the video by using the clipBegin
and clipEnd attributes. How this is implemented, though, is out of scope for the
SMIL specification (for HTTP-based URLs, it may well mean that implementations
fetch the whole media item and cut it up locally): </p>
|
|
|
|
<div class="exampleInner">
|
|
<pre>&lt;video xml:id="toc1" src="http://www.w3.org/2008/WebVideo/Fragments/media/fragf2f.mp4" clipBegin="12.3s" clipEnd="21.16s" /&gt;</pre>
|
|
</div>
|
|
|
|
<p>It is possible to use different time schemes, which give frame-accurate
|
|
clipping when used correctly:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre> &lt;video xml:id="toc2" src="http://www.w3.org/2008/WebVideo/Fragments/media/fragf2f.mp4" clipBegin="npt=12.3s" clipEnd="npt=21.16s" /&gt;
 &lt;video xml:id="toc3" src="http://www.w3.org/2008/WebVideo/Fragments/media/fragf2f.mp4" clipBegin="smpte=00:00:12:09" clipEnd="smpte=00:00:21:05" /&gt;
</pre>
|
|
</div>
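<p>The SMPTE values in the example above are consistent with the NPT values
12.3 s and 21.16 s at 30 frames per second. The following Python sketch shows
that conversion; the frame rate of 30, non-drop-frame counting, rounding to the
nearest frame and the function name <code>seconds_to_smpte</code> are
assumptions of this sketch and are not stated in the example itself.</p>

```python
def seconds_to_smpte(seconds, fps=30):
    """Convert a time in seconds to a non-drop-frame hh:mm:ss:ff timecode."""
    # Round to the nearest whole frame first, then split into fields.
    total_frames = round(seconds * fps)
    frames = total_frames % fps
    total_seconds = total_frames // fps
    hours, remainder = divmod(total_seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return "{:02d}:{:02d}:{:02d}:{:02d}".format(hours, minutes, secs, frames)


print(seconds_to_smpte(12.3))   # 00:00:12:09
print(seconds_to_smpte(21.16))  # 00:00:21:05
```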
|
|
|
|
<p>Adding metadata to such a fragment is supported since SMIL 3.0: </p>
|
|
|
|
<div class="exampleInner">
|
|
<pre> &lt;video xml:id="toc4" src="http://www.w3.org/2008/WebVideo/Fragments/media/fragf2f.mp4" clipBegin="12.3s" clipEnd="21.16s"&gt;
   &lt;metadata&gt;
     &lt;rdf:.... xmlns:rdf="...."&gt;
       ....
     &lt;/rdf:...&gt;
   &lt;/metadata&gt;
 &lt;/video&gt;
</pre>
|
|
</div>
|
|
|
|
<p><em>Referring to temporal fragments in-context</em> </p>
|
|
|
|
<p>The following piece of code will play back the whole video, and during the
|
|
interesting section of the video allow clicking on it to follow a link:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre> &lt;video xml:id="tic1" src="http://www.w3.org/2008/WebVideo/Fragments/media/fragf2f.mp4" &gt;
   &lt;area begin="12.3s" end="21.16s" href="http://www.example.com" /&gt;
 &lt;/video&gt;
</pre>
|
|
</div>
|
|
|
|
<p>It is also possible to have a link to the relevant section of the video.
|
|
Suppose the following SMIL code is located in
|
|
http://www.example.com/smilpresentation:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre> <video xml:id="tic2" src="http://www.w3.org/2008/WebVideo/Fragments/media/fragf2f.mp4" >
|
|
<area xml:id="tic2area" begin="12.3s" end="21.16s"/>
|
|
</video>
|
|
</pre>
|
|
</div>
|
|
|
|
<p>Now, we can link to the media fragment using the following URI:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre>http://www.example.com/smilpresentation#tic2area</pre>
|
|
</div>
|
|
|
|
<p>Jumping to #tic2area will start the video at the beginning of the
interesting section. However, the presentation will not stop at the end of that
section; it will continue playing. </p>
|
|
</div>
|
|
|
|
<div class="div5">
|
|
<h6><a id="Spatial_SMIL" name="Spatial_SMIL"></a>C.3.2.1.2 Spatial</h6>
|
|
|
|
<p><em>Playing spatial fragments out-of-context</em> </p>
|
|
|
|
<p>SMIL 3.0 allows playing back only a specific rectangle of the media. The
|
|
following construct will play back the center quarter of the video: </p>
|
|
|
|
<div class="exampleInner">
|
|
<pre><video xml:id="soc1" src="http://www.w3.org/2008/WebVideo/Fragments/media/fragf2f.mp4" panZoom="25%, 25%, 50%, 50%"/></pre>
|
|
</div>
|
|
|
|
<p>Assuming the source video is 640x480, the following line plays back the
same region: </p>
|
|
|
|
<div class="exampleInner">
|
|
<pre><video xml:id="soc2" src="http://www.w3.org/2008/WebVideo/Fragments/media/fragf2f.mp4" panZoom="160, 120, 320, 240" /></pre>
|
|
</div>
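<p>The equivalence of the two panZoom values above can be checked with a short
Python sketch. It assumes, as the two examples imply, that percentages are
resolved against the size of the source video; the function name
panzoom_to_pixels is hypothetical.</p>

```python
def panzoom_to_pixels(pan_zoom: str, width: int, height: int) -> tuple:
    """Resolve a panZoom value (left, top, width, height) to pixel values."""
    parts = [p.strip() for p in pan_zoom.split(",")]
    if len(parts) != 4:
        raise ValueError("panZoom needs four comma-separated values")
    # Left and width scale with the media width, top and height with its height.
    bases = (width, height, width, height)
    resolved = []
    for part, base in zip(parts, bases):
        if part.endswith("%"):
            resolved.append(float(part[:-1]) * base / 100)
        else:
            resolved.append(float(part))
    return tuple(resolved)


percent_form = panzoom_to_pixels("25%, 25%, 50%, 50%", 640, 480)
pixel_form = panzoom_to_pixels("160, 120, 320, 240", 640, 480)
print(percent_form == pixel_form)
```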
|
|
|
|
<p>This construct can be combined with the temporal clipping.</p>
|
|
|
|
<p>It is possible to change the panZoom rectangle over time. The following code
|
|
fragment will show the full video for 10 seconds, then zoom in on the center
|
|
quarter over 5 seconds, then show that for the rest of the duration. The video
|
|
may be scaled up or centered, or something else, depending on SMIL layout, but
|
|
this is out of scope for the purpose of this investigation. </p>
|
|
|
|
<div class="exampleInner">
|
|
<pre> <video xml:id="soc3" src="http://www.w3.org/2008/WebVideo/Fragments/media/fragf2f.mp4" panZoom="0, 0, 640, 480">
<animate begin="10s" dur="5s" fill="freeze" attributeName="panZoom" to="160, 120, 320, 240" />
|
|
</video>
|
|
</pre>
|
|
</div>
|
|
|
|
<p><em>Referring to spatial fragments in-context</em> </p>
|
|
|
|
<p>The following bit of code will enable the top-right quarter of the video to
|
|
be clicked to follow a link. Note the difference in the way the rectangle is
|
|
specified (left, top, right, bottom) when compared to panZoom (left, top,
|
|
width, height). This is an unfortunate side-effect of this attribute being
|
|
compatible with HTML and panZoom being compatible with SVG. </p>
|
|
|
|
<div class="exampleInner">
|
|
<pre> <video xml:id="sic1" src="http://www.w3.org/2008/WebVideo/Fragments/media/fragf2f.mp4" >
|
|
<area shape="rect" coords="50%, 0%, 100%, 50%" href="http://www.example.com" />
|
|
</video>
|
|
</pre>
|
|
</div>
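<p>Because the two attributes order their values differently, a conversion
step is needed when the same rectangle is used for both. The following Python
sketch turns a rect coords value (left, top, right, bottom) into panZoom order
(left, top, width, height). The function name coords_to_panzoom is
hypothetical, and the sketch only handles values that are all percentages or
all plain numbers.</p>

```python
def coords_to_panzoom(coords: str) -> str:
    """Reorder rect coords (left, top, right, bottom) as (left, top, width, height)."""
    parts = [p.strip() for p in coords.split(",")]
    if len(parts) != 4:
        raise ValueError("a rect needs four comma-separated values")
    is_percent = [p.endswith("%") for p in parts]
    if any(is_percent) and not all(is_percent):
        raise ValueError("mixed units are not handled in this sketch")
    suffix = "%" if all(is_percent) else ""
    left, top, right, bottom = (float(p.rstrip("%")) for p in parts)
    # Width and height are the differences between opposite edges.
    values = (left, top, right - left, bottom - top)
    return ", ".join(f"{v:g}{suffix}" for v in values)


# Top-right quarter, as in the area example above.
print(coords_to_panzoom("50%, 0%, 100%, 50%"))
```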
|
|
|
|
<p>Other shapes are possible, as in HTML and CSS. The spatial and temporal
|
|
constructs can be combined. The spatial coordinates can be animated, as for
|
|
panZoom. </p>
|
|
</div>
|
|
|
|
<div class="div5">
|
|
<h6><a id="Track_SMIL" name="Track_SMIL"></a>C.3.2.1.3 Track</h6>
|
|
|
|
<p>SMIL has no way to selectively enable or disable tracks in the video. It
only provides a general parameter mechanism which could conceivably be used to
communicate this information to a renderer, but this would make the document
non-portable. Moreover, no such implementations are known. </p>
|
|
|
|
<div class="exampleInner">
|
|
<pre> <video xml:id="st1" src="http://www.w3.org/2008/WebVideo/Fragments/media/fragf2f.mp4" >
|
|
<param name="jacks-remove-track" value="audio" />
|
|
</video>
|
|
</pre>
|
|
</div>
|
|
</div>
|
|
|
|
<div class="div5">
|
|
<h6><a id="Named_SMIL" name="Named_SMIL"></a>C.3.2.1.4 Named</h6>
|
|
|
|
<p>SMIL has no way to show named fragments in the base material out-of-context.
It has no support for referring to named fragments in-context either, but it
does have support for referring to "media markers" (named points in time in the
media) if the underlying media format supports them. However, no such
implementations are known: </p>
|
|
|
|
<div class="exampleInner">
|
|
<pre> <video xml:id="nf1" src="http://www.w3.org/2008/WebVideo/Fragments/media/fragf2f.mp4" >
|
|
<area begin="nf1.marker(jack-frag-begin)" end="nf1.marker(jack-frag-end)" href="http://www.example.com" />
|
|
</video>
|
|
</pre>
|
|
</div>
|
|
</div>
|
|
</div>
|
|
|
|
<div class="div4">
|
|
<h5><a id="MPEG-7" name="MPEG-7"></a>C.3.2.2 MPEG-7</h5>
|
|
|
|
<div class="div5">
|
|
<h6><a id="Temporal_MPEG-7" name="Temporal_MPEG-7"></a>C.3.2.2.1 Temporal</h6>
|
|
|
|
<table border="1" summary="Editorial note: Raphaël">
|
|
<tbody>
|
|
<tr>
|
|
<td width="50%" valign="top" align="left"><b>Editorial note:
|
|
Raphaël</b></td>
|
|
<td width="50%" valign="top" align="right"> </td>
|
|
</tr>
|
|
<tr>
|
|
<td valign="top" align="left" colspan="2">For all dimensions covered by
MPEG-7, the use of indirection should not be forgotten.
http://www.example.com/mpeg7file.mp7#speaker refers to the "speaker"
XML element of this resource. The UA needs to parse this element in
order to actually point to this fragment. </td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
|
|
<p>A video is divided into VideoSegments that can be described by a timestamp
(a MediaTime). A MediaTime is described using a MediaTimePoint and a
MediaDuration, which are the starting time and the shot duration respectively.
The MediaTimePoint is defined as follows: YYYY-MM-DDThh:mm:ss:nnnFNNN (Y: year,
M: month, D: day, T: a separation sign between date and time, h: hours, m:
minutes, s: seconds, n: number of fractions, F: separation sign between n and
N, N: number of fractions in a second). The MediaDuration is defined as
follows: PnDTnHnMnSnNnF with nD number of days, nH number of hours, nM number
of minutes, nS number of seconds, nN number of fractions and nF fractions per
second. Temporal fragments can also be defined in Time Units or relative to a
defined time. This MPEG-7 example describes a segment 'shot1' that starts at 6
sec 2002/25000 sec and lasts for 9 sec 13389/25000 sec.</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre> <VideoSegment id="video">
|
|
<MediaLocator>
|
|
<MediaUri>http://www.w3.org/2008/WebVideo/Fragments/media/fragf2f.mp4</MediaUri>
|
|
</MediaLocator>
|
|
<TemporalDecomposition>
|
|
<VideoSegment id="shot1">
|
|
<TextAnnotation>
|
|
<KeywordAnnotation>
|
|
<Keyword>…</Keyword>
|
|
</KeywordAnnotation>
|
|
</TextAnnotation>
|
|
<MediaTime>
|
|
<MediaTimePoint>T00:00:06:2002F25000</MediaTimePoint>
|
|
<MediaDuration>PT9S13389N25000F</MediaDuration>
|
|
</MediaTime>
|
|
</VideoSegment>
|
|
</TemporalDecomposition>
|
|
</VideoSegment>
|
|
</pre>
|
|
</div>
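<p>The two notations can be turned into seconds with a short Python sketch.
It is a minimal illustration under stated assumptions: only the time-only form
of MediaTimePoint used in the example above is handled (no date part), and the
function names time_point_seconds and duration_seconds are hypothetical.</p>

```python
import re
from fractions import Fraction

# Time-only MediaTimePoint: Thh:mm:ss, optionally followed by :nnnFNNN.
TIME_POINT = re.compile(r"^T(\d{2}):(\d{2}):(\d{2})(?::(\d+)F(\d+))?$")
# MediaDuration: PnDTnHnMnSnNnF, where every component is optional.
DURATION = re.compile(
    r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?(?:(\d+)N)?(?:(\d+)F)?)?$"
)


def time_point_seconds(value: str) -> Fraction:
    """Offset in seconds for a time-only MediaTimePoint."""
    match = TIME_POINT.match(value)
    if match is None:
        raise ValueError(f"unsupported MediaTimePoint: {value!r}")
    hours, minutes, seconds, n, f = match.groups()
    total = Fraction(int(hours) * 3600 + int(minutes) * 60 + int(seconds))
    if n is not None:
        total += Fraction(int(n), int(f))  # n fractions of 1/f second
    return total


def duration_seconds(value: str) -> Fraction:
    """Length in seconds of a MediaDuration."""
    match = DURATION.match(value)
    if match is None:
        raise ValueError(f"unsupported MediaDuration: {value!r}")
    days, hours, minutes, seconds, n, f = (int(g) if g else 0 for g in match.groups())
    total = Fraction(days * 86400 + hours * 3600 + minutes * 60 + seconds)
    if n:
        if not f:
            raise ValueError("fraction count given without fractions per second")
        total += Fraction(n, f)
    return total


start = time_point_seconds("T00:00:06:2002F25000")
length = duration_seconds("PT9S13389N25000F")
print(float(start), float(length), float(start + length))
```

<p>Exact fractions are used so that the start and end of 'shot1' are not
affected by floating-point rounding before the final conversion.</p>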
|
|
</div>
|
|
|
|
<div class="div5">
|
|
<h6><a id="Spatial_MPEG-7" name="Spatial_MPEG-7"></a>C.3.2.2.2 Spatial</h6>
|
|
|
|
<p>Selecting a spatial fragment of the video is also possible, using a
SpatialDecomposition element. This MPEG-7 example describes a spatial
(polygonal) mask called "speaker" which is given by the coordinates of the
polygon: (40, 300), (40, 210), ..., (320, 300).</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre> <VideoSegment>
|
|
<SpatialDecomposition>
|
|
<StillRegion id="speaker">
|
|
<TextAnnotation>
|
|
<FreeTextAnnotation> Journalist</FreeTextAnnotation>
|
|
</TextAnnotation>
|
|
<Mask xsi:type="SpatialMaskType">
|
|
<SubRegion>
|
|
<Poly>
|
|
<Coords> 40 300, 40 210, ..., 320 300</Coords>
|
|
</Poly>
|
|
</SubRegion>
|
|
</Mask>
|
|
</StillRegion>
|
|
</SpatialDecomposition>
|
|
</VideoSegment>
|
|
</pre>
|
|
</div>
|
|
|
|
<p>The spatial video fragment can be combined with temporal information, thus
creating a SpatialTemporalDecomposition element. </p>
|
|
|
|
<div class="exampleInner">
|
|
<pre> <VideoSegment>
|
|
<SpatialTemporalDecomposition>
|
|
<MovingRegion>
|
|
<SpatioTemporalLocator>
|
|
<MediaTime>
|
|
<MediaTimePoint>T00:00:06:2002F25000</MediaTimePoint>
|
|
<MediaDuration>PT9S13389N25000F</MediaDuration>
|
|
</MediaTime>
|
|
</SpatioTemporalLocator>
|
|
<SpatioTemporalMask>
|
|
<Mask xsi:type="SpatialMaskType">
|
|
<SubRegion>
|
|
<Poly>
|
|
<Coords> 40 300, 105 210, ..., 320 240</Coords>
|
|
</Poly>
|
|
</SubRegion>
|
|
</Mask>
|
|
</SpatioTemporalMask>
|
|
</MovingRegion>
|
|
</SpatialTemporalDecomposition>
|
|
</VideoSegment>
|
|
</pre>
|
|
</div>
|
|
|
|
<p>A region of an image can also be described in MPEG-7:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre> <Image id="image_yalta"> <!-- whole image -->
|
|
<MediaLocator>
|
|
<MediaUri>http://example.com/wiki/File:Yalta_summit_1945_with_Churchill,_Roosevelt,_Stalin.jpg</MediaUri>
|
|
</MediaLocator>
|
|
[...]
|
|
<SpatialDecomposition>
|
|
<StillRegion id="SR1"> <!-- still region -->
|
|
<SpatialMask>
|
|
<SubRegion>
|
|
<Box>14.64 15.73 161.62 163.21</Box>
|
|
</SubRegion>
|
|
</SpatialMask>
|
|
</StillRegion>
|
|
</SpatialDecomposition>
|
|
</Image>
|
|
</pre>
|
|
</div>
|
|
</div>
|
|
|
|
<div class="div5">
|
|
<h6><a id="Track_MPEG-7" name="Track_MPEG-7"></a>C.3.2.2.3 Track</h6>
|
|
|
|
<p>Tracks can be described using the <em>MediaSourceDecompositionType</em>: </p>
|
|
|
|
<div class="exampleInner">
|
|
<pre> <Mpeg7>
|
|
<Description xsi:type="ContentEntityType">
|
|
<MultimediaContent xsi:type="MultimediaType">
|
|
<Multimedia>
|
|
<MediaSourceDecomposition gap="false" overlap="false">
|
|
<Segment xsi:type="VideoSegmentType">
|
|
<TextAnnotation>
|
|
<FreeTextAnnotation>video</FreeTextAnnotation>
|
|
</TextAnnotation>
|
|
<MediaTime>
|
|
<MediaTimePoint>T00:00:00</MediaTimePoint>
|
|
<MediaDuration>PT0M15S</MediaDuration>
|
|
</MediaTime>
|
|
</Segment>
|
|
<Segment xsi:type="AudioSegmentType">
|
|
<TextAnnotation>
|
|
<FreeTextAnnotation>audio</FreeTextAnnotation>
|
|
</TextAnnotation>
|
|
</Segment>
|
|
</MediaSourceDecomposition>
|
|
</Multimedia>
|
|
</MultimediaContent>
|
|
</Description>
|
|
</Mpeg7>
|
|
</pre>
|
|
</div>
|
|
</div>
|
|
|
|
<div class="div5">
|
|
<h6><a id="Named_MPEG-7" name="Named_MPEG-7"></a>C.3.2.2.4 Named</h6>
|
|
|
|
<p>Media fragments can be identified by their id.</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre> <StillRegion id="speaker">
|
|
...
|
|
</StillRegion>
|
|
</pre>
|
|
</div>
|
|
|
|
<p>The description of this media fragment can then be retrieved using the
|
|
following URI:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre>http://www.example.com/mpeg7file.mp7#speaker</pre>
|
|
</div>
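<p>The indirection involved (the user agent has to parse the description to
find the element that the fragment names) can be sketched in Python. The sample
document below is a trimmed, namespace-free copy of the spatial example and is
an assumption of this sketch, as is the function name find_fragment; a real
MPEG-7 description would be fetched from the URI and would declare
namespaces.</p>

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

# Trimmed, namespace-free stand-in for the content of mpeg7file.mp7.
SAMPLE = """<VideoSegment>
  <SpatialDecomposition>
    <StillRegion id="speaker">
      <TextAnnotation>
        <FreeTextAnnotation>Journalist</FreeTextAnnotation>
      </TextAnnotation>
    </StillRegion>
  </SpatialDecomposition>
</VideoSegment>"""


def find_fragment(document: str, uri: str):
    """Return the element whose id equals the URI's fragment, or None."""
    _, fragment = urldefrag(uri)
    if not fragment:
        return None
    root = ET.fromstring(document)
    for element in root.iter():
        if element.get("id") == fragment:
            return element
    return None


region = find_fragment(SAMPLE, "http://www.example.com/mpeg7file.mp7#speaker")
print(region.tag)
```

<p>What is found is the description of the fragment; addressing the media
itself still requires interpreting the mask or time information inside that
element.</p>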
|
|
</div>
|
|
</div>
|
|
|
|
<div class="div4">
|
|
<h5><a id="SVG" name="SVG"></a>C.3.2.3 SVG</h5>
|
|
|
|
<div class="div5">
|
|
<h6><a id="Temporal_SVG" name="Temporal_SVG"></a>C.3.2.3.1 Temporal</h6>
|
|
|
|
<p>SVG relies on either SMIL or HTML5 (embedded as a foreign object) to
introduce temporal media fragmentation; it has no temporal fragmentation of its
own. One can add a video to a scene (as can be seen in the second example
below). Although it is possible to add a foreign object within SVG that
contains HTML5 video elements, this is (at the moment) not a solution for
temporal segmentation, as HTML does not support it either. </p>
|
|
|
|
<div class="exampleInner">
|
|
<pre> <foreignObject>
|
|
<div xmlns="http://www.w3.org/1999/xhtml">
|
|
<video src="myvideo.ogg"/>
|
|
</div>
|
|
</foreignObject>
|
|
</pre>
|
|
</div>
|
|
|
|
<p>Here's an example of a video that starts at second 5 and has a duration of
|
|
20 seconds:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre> <svg xmlns="http://www.w3.org/2000/svg" version="1.2" xmlns:xlink="http://www.w3.org/1999/xlink" width="320" height="240" viewBox="0 0 320 240">
|
|
<desc>SVG 1.2 video example</desc>
|
|
<g>
|
|
<video xlink:href="test.avi" volume=".8" type="video/x-msvideo" width="320" height="240" x="50" y="50" begin="5s" dur="20.0s" repeatCount="indefinite"/>
|
|
</g>
|
|
</svg>
|
|
</pre>
|
|
</div>
|
|
</div>
|
|
|
|
<div class="div5">
|
|
<h6><a id="Spatial_SVG" name="Spatial_SVG"></a>C.3.2.3.2 Spatial</h6>
|
|
|
|
<p>XML snippet specifying a region of an image within SVG: </p>
|
|
|
|
<div class="exampleInner">
|
|
<pre> <svg xmlns:svg="http://www.w3.org/2000/svg" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
|
|
<g id="layer1">
<image id="image_yalta" x="-0.34" y="0.20" width="400" height="167"
xlink:href="http://example.com/wiki/File:Yalta_summit_1945_with_Churchill,_Roosevelt,_Stalin.jpg"/>
<rect id="SR1" x="14.64" y="15.73" width="146.98" height="147.48"
style="opacity:1;stroke:#ff0000;stroke-opacity:1"/>
|
|
</g>
|
|
</svg>
|
|
</pre>
|
|
</div>
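<p>The rect above covers the same region as the Box in the MPEG-7 image
example. A small Python sketch of the conversion follows; it assumes that the
Box lists two opposite corners (x1 y1 x2 y2), which is inferred from the
numbers in the two examples rather than taken from the MPEG-7 specification,
and the function name box_to_rect is hypothetical.</p>

```python
from decimal import Decimal


def box_to_rect(box: str) -> dict:
    """Convert corner coordinates 'x1 y1 x2 y2' to x, y, width and height."""
    x1, y1, x2, y2 = (Decimal(v) for v in box.split())
    # Decimal keeps the two-digit precision of the input exact.
    return {"x": x1, "y": y1, "width": x2 - x1, "height": y2 - y1}


rect = box_to_rect("14.64 15.73 161.62 163.21")
print(" ".join(f'{name}="{value}"' for name, value in rect.items()))
```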
|
|
</div>
|
|
</div>
|
|
|
|
<div class="div4">
|
|
<h5><a id="TV-Anytime" name="TV-Anytime"></a>C.3.2.4 TV-Anytime</h5>
|
|
|
|
<div class="div5">
|
|
<h6><a id="Temporal_TV-Anytime" name="Temporal_TV-Anytime"></a>C.3.2.4.1
|
|
Temporal</h6>
|
|
|
|
<p>Within TV-Anytime, programmes can be divided into segments. Segmentation
refers to the ability to define, access, and manipulate temporal intervals
(i.e. segments) within an AV stream. The following excerpt of a TV-Anytime
description illustrates the use of segments:</p>
|
|
|
|
<div class="exampleInner">
|
|
<pre> <TVAMain>
|
|
<ProgramDescription>
|
|
<ProgramInformationTable>
|
|
...
|
|
</ProgramInformationTable>
|
|
<ProgramLocationTable>
|
|
...
|
|
</ProgramLocationTable>
|
|
<SegmentInformationTable>
|
|
<SegmentList>
|
|
<SegmentInformation segmentId="segment_2">
|
|
...
|
|
<SegmentLocator>
|
|
<MediaRelTimePoint>T00:00:06:2002F25000</MediaRelTimePoint>
|
|
<MediaDuration>PT9S13389N25000F</MediaDuration>
|
|
</SegmentLocator>
|
|
...
|
|
</SegmentInformation>
|
|
</SegmentList>
|
|
</SegmentInformationTable>
|
|
</ProgramDescription>
|
|
</TVAMain>
|
|
</pre>
|
|
</div>
|
|
|
|
<p><em>SegmentLocator</em> locates the segment within a programme (instance) in
|
|
terms of start time and duration (optional). If the duration is not specified,
|
|
the segment ends at the end of the programme. Note that the types of
|
|
<em>MediaRelTimePoint</em> and <em>MediaDuration</em> correspond to the MPEG-7
|
|
types <em>MediaRelTimePointType</em> and <em>MediaDurationType</em>
|
|
respectively. </p>
|
|
</div>
|
|
|
|
<div class="div5">
|
|
<h6><a id="Named_TV-Anytime" name="Named_TV-Anytime"></a>C.3.2.4.2 Named</h6>
|
|
|
|
<p>Named fragments are supported by the <em>segmentId</em> attribute of the
<em>SegmentInformationType</em>. The description of the media fragment can be
retrieved as follows: </p>
|
|
|
|
<div class="exampleInner">
|
|
<pre>http://www.example.com/tv_anytime_description.tva#xpointer(//*[@segmentId="segment_2"])</pre>
|
|
</div>
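<p>What the XPointer expression above selects can be approximated in Python
with the limited XPath support of the standard library. The sample document is
a trimmed, namespace-free copy of the temporal example and is an assumption of
this sketch, as is the function name find_segment.</p>

```python
import xml.etree.ElementTree as ET

# Trimmed, namespace-free stand-in for tv_anytime_description.tva.
SAMPLE = """<TVAMain>
  <ProgramDescription>
    <SegmentInformationTable>
      <SegmentList>
        <SegmentInformation segmentId="segment_2">
          <SegmentLocator>
            <MediaRelTimePoint>T00:00:06:2002F25000</MediaRelTimePoint>
            <MediaDuration>PT9S13389N25000F</MediaDuration>
          </SegmentLocator>
        </SegmentInformation>
      </SegmentList>
    </SegmentInformationTable>
  </ProgramDescription>
</TVAMain>"""


def find_segment(document: str, segment_id: str):
    """Return the first element carrying the given segmentId, or None."""
    root = ET.fromstring(document)
    # Sketch only: segment ids containing quote characters are not handled.
    return root.find(f".//*[@segmentId='{segment_id}']")


segment = find_segment(SAMPLE, "segment_2")
print(segment.findtext("SegmentLocator/MediaRelTimePoint"))
print(segment.findtext("SegmentLocator/MediaDuration"))
```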
|
|
</div>
|
|
</div>
|
|
|
|
<div class="div4">
|
|
<h5><a id="ImageMaps" name="ImageMaps"></a>C.3.2.5 ImageMaps</h5>
|
|
|
|
<div class="div5">
|
|
<h6><a id="Spatial_ImageMaps" name="Spatial_ImageMaps"></a>C.3.2.5.1
|
|
Spatial</h6>
|
|
|
|
<p><em>Client-side image maps</em>: The MAP element specifies a client-side
image map. An image map is associated with an element via the element's usemap
attribute. The content model of the MAP element then includes either AREA
elements or A elements for specifying the geometric regions and the links
associated with them. Possible shapes are: rectangle (rect), circle (circle) or
arbitrary polygon (poly). </p>
|
|
|
|
<div class="exampleInner">
|
|
<pre> <img src="image.gif" usemap="#my_map"/>
|
|
<map name="my_map">
|
|
<a href="guide.html" shape="rect" coords="0,0,118,28">Access Guide</a> |
|
|
<a href="shortcut.html" shape="rect" coords="118,0,184,28">Go</a> |
<a href="search.html" shape="circle" coords="184,200,60">Search</a> |
<a href="top10.html" shape="poly" coords="276,0,276,28,100,200,50,50,276,0">Top Ten</a>
|
|
</map>
|
|
</pre>
|
|
</div>
|
|
|
|
<div class="exampleInner">
|
|
<pre> <map name="my_map">
|
|
<area href="guide.html" alt="Access Guide" shape="rect" coords="0,0,118,28">
|
|
<area href="search.html" alt="Search" shape="rect" coords="184,0,276,28">
|
|
<area href="shortcut.html" alt="Go" shape="circle" coords="184,200,60">
|
|
<area href="top10.html" alt="Top Ten" shape="poly" coords="276,0,276,28,100,200,50,50,276,0">
|
|
</map>
|
|
</pre>
|
|
</div>
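<p>How a user agent might map a click to one of these regions can be sketched
in Python. This is an illustration only, not the algorithm of any particular
browser: the names hit, find_link and AREAS are hypothetical, region borders
are treated as inside for rect and circle, and the first matching area
wins.</p>

```python
# The areas of the second example above: (shape, coords, href).
AREAS = [
    ("rect", "0,0,118,28", "guide.html"),
    ("rect", "184,0,276,28", "search.html"),
    ("circle", "184,200,60", "shortcut.html"),
    ("poly", "276,0,276,28,100,200,50,50,276,0", "top10.html"),
]


def hit(shape: str, coords: str, x: float, y: float) -> bool:
    """Tell whether the point (x, y) lies inside the given region."""
    c = [float(v) for v in coords.split(",")]
    if shape == "rect":
        left, top, right, bottom = c
        return left <= x <= right and top <= y <= bottom
    if shape == "circle":
        cx, cy, radius = c
        return (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2
    if shape == "poly":
        points = list(zip(c[0::2], c[1::2]))
        inside = False
        # Ray casting: count the edges crossed by a ray going right from (x, y).
        for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
            if (y1 > y) != (y2 > y):
                cross_x = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
                if x < cross_x:
                    inside = not inside
        return inside
    raise ValueError(f"unknown shape: {shape!r}")


def find_link(x: float, y: float):
    """Return the href of the first area containing the point, or None."""
    for shape, coords, href in AREAS:
        if hit(shape, coords, x, y):
            return href
    return None


print(find_link(10, 10))
print(find_link(200, 50))
```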
|
|
|
|
<p><em>Server-side image maps</em>: When the user activates the link by
|
|
clicking on the image, the screen coordinates are sent directly to the server
|
|
where the document resides. Screen coordinates are expressed as screen pixel
|
|
values relative to the image. The user agent derives a new URI from the URI
|
|
specified by the href attribute of the A element, by appending ? followed by
|
|
the x and y coordinates, separated by a comma. For instance, if the user clicks
|
|
at the location x=10, y=27 then the derived URI is:
|
|
http://www.example.com/images?10,27 </p>
|
|
|
|
<div class="exampleInner">
|
|
<pre> <a href="http://www.example.com/images" >
|
|
<img src="image.gif" ismap="ismap" alt="target"/>
|
|
</a>
|
|
</pre>
|
|
</div>
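<p>The URI derivation described above amounts to a simple string operation,
sketched here in Python. The function name ismap_uri is hypothetical, and the
sketch assumes that the href carries no query string or fragment of its
own.</p>

```python
def ismap_uri(href: str, x: int, y: int) -> str:
    """Append '?' and the click coordinates 'x,y' to the link target."""
    return f"{href}?{x},{y}"


print(ismap_uri("http://www.example.com/images", 10, 27))
```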
|
|
</div>
|
|
</div>
|
|
|
|
<div class="div4">
|
|
<h5><a id="HTML5" name="HTML5"></a>C.3.2.6 HTML 5</h5>
|
|
|
|
<table border="1" summary="Editorial note: Silvia">
|
|
<tbody>
|
|
<tr>
|
|
<td width="50%" valign="top" align="left"><b>Editorial note:
|
|
Silvia</b></td>
|
|
<td width="50%" valign="top" align="right"> </td>
|
|
</tr>
|
|
<tr>
|
|
<td valign="top" align="left" colspan="2">Currently, HTML5 relies on the
capabilities of the media format in use for providing media fragment
addressing. In the future, HTML5 is planning to adopt the fragment URI
specifications of this document for providing fragment addressing.
Input from the WHAT and HTML working groups is requested. </td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
</div>
|
|
</div>
|
|
</div>
|
|
</div>
|
|
|
|
<div class="div1">
|
|
<h2><a id="acknowledgments" name="acknowledgments"></a>D Acknowledgements
|
|
(Non-Normative)</h2>
|
|
|
|
<p>This document is the work of the <a
|
|
href="http://www.w3.org/2008/WebVideo/Fragments/">W3C Media Fragments Working
|
|
Group</a>. Members of the Working Group are (at the time of writing, and in
|
|
alphabetical order): Eric Carlson (Apple, Inc.), Michael Hausenblas (DERI
|
|
Galway at the National University of Ireland, Galway, Ireland), Jack Jansen
|
|
(CWI), Yves Lafon (W3C), Wonsuk Lee (Electronics and Telecommunications
|
|
Research Institute), Erik Mannens (IBBT), Thierry Michel (W3C), Guillaume
|
|
(Jean-Louis) Olivrin (Meraka Institute), Soohong Daniel Park (Samsung
|
|
Electronics Co., Ltd.), Conrad Parker (W3C Invited Experts), Silvia Pfeiffer
|
|
(W3C Invited Experts), David Singer (Apple, Inc.), Raphaël Troncy (EURECOM),
|
|
Vassilis Tzouvaras (K-Space), Davy Van Deursen (IBBT). </p>
|
|
|
|
<p>The people who have contributed to <a
|
|
href="http://lists.w3.org/Archives/Public/public-media-fragment/">discussions
|
|
on public-media-fragment@w3.org</a> are also gratefully acknowledged. In
|
|
particular: Olivier Aubert, Werner Bailer, Pierre-Antoine Champin, Cyril
|
|
Concolato, Franck Denoual, Martin J. Dürst, Jean Pierre Evain, Ken
|
|
Harrenstien, Kilroy Hughes, Philip Jägenstedt, Ryo Kawaguchi, Véronique
|
|
Malaisé, Henrik Nordstrom, Yannick Prié, Yves Raimond, Julian Reschke,
|
|
Geoffrey Sneddon, Felix Sasaki, Philip Taylor, Christian Timmerer, Jorrit
|
|
Vermeiren and Munjo Yu. </p>
|
|
</div>
|
|
|
|
<div class="div1">
|
|
<h2><a id="change-log" name="change-log"></a>E Change Log (Non-Normative)</h2>
|
|
|
|
<p>@@This paragraph will be replaced by the change log.@@</p>
|
|
</div>
|
|
</div>
|
|
</body>
|
|
</html>
|