<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Transitional//EN' 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'>
<html dir="ltr" xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>PROV Model Primer</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<!-- PM -->
<style type="text/css">
.note { font-size:small; margin-left:50px }
</style>
<style type="text/css">
/*****************************************************************
* ReSpec CSS
* Robin Berjon (robin at berjon dot com)
* v0.05 - 2009-07-31
*****************************************************************/
/* --- INLINES --- */
em.rfc2119 {
text-transform: lowercase;
font-variant: small-caps;
font-style: normal;
color: #900;
}
h1 acronym, h2 acronym, h3 acronym, h4 acronym, h5 acronym, h6 acronym, a acronym,
h1 abbr, h2 abbr, h3 abbr, h4 abbr, h5 abbr, h6 abbr, a abbr {
border: none;
}
dfn {
font-weight: bold;
}
a.internalDFN {
color: inherit;
border-bottom: 1px solid #99c;
text-decoration: none;
}
a.externalDFN {
color: inherit;
border-bottom: 1px dotted #ccc;
text-decoration: none;
}
a.bibref {
text-decoration: none;
}
code {
color: #ff4500;
}
/* --- WEB IDL --- */
pre.idl {
border-top: 1px solid #90b8de;
border-bottom: 1px solid #90b8de;
padding: 1em;
line-height: 120%;
}
pre.idl:before {
content: "WebIDL";
display: block;
width: 150px;
background: #90b8de;
color: #fff;
font-family: initial;
padding: 3px;
font-weight: bold;
margin: -1em 0 1em -1em;
}
.idlType {
color: #ff4500;
font-weight: bold;
text-decoration: none;
}
/*.idlModule*/
/*.idlModuleID*/
/*.idlInterface*/
.idlInterfaceID, .idlDictionaryID {
font-weight: bold;
color: #005a9c;
}
.idlSuperclass {
font-style: italic;
color: #005a9c;
}
/*.idlAttribute*/
.idlAttrType, .idlFieldType, .idlMemberType {
color: #005a9c;
}
.idlAttrName, .idlFieldName, .idlMemberName {
color: #ff4500;
}
.idlAttrName a, .idlFieldName a, .idlMemberName a {
color: #ff4500;
border-bottom: 1px dotted #ff4500;
text-decoration: none;
}
/*.idlMethod*/
.idlMethType {
color: #005a9c;
}
.idlMethName {
color: #ff4500;
}
.idlMethName a {
color: #ff4500;
border-bottom: 1px dotted #ff4500;
text-decoration: none;
}
/*.idlParam*/
.idlParamType {
color: #005a9c;
}
.idlParamName {
font-style: italic;
}
.extAttr {
color: #666;
}
/*.idlConst*/
.idlConstType {
color: #005a9c;
}
.idlConstName {
color: #ff4500;
}
.idlConstName a {
color: #ff4500;
border-bottom: 1px dotted #ff4500;
text-decoration: none;
}
/*.idlException*/
.idlExceptionID {
font-weight: bold;
color: #c00;
}
.idlTypedefID, .idlTypedefType {
color: #005a9c;
}
.idlRaises, .idlRaises a.idlType, .idlRaises a.idlType code, .excName a, .excName a code {
color: #c00;
font-weight: normal;
}
.excName a {
font-family: monospace;
}
.idlRaises a.idlType, .excName a.idlType {
border-bottom: 1px dotted #c00;
}
.excGetSetTrue, .excGetSetFalse, .prmNullTrue, .prmNullFalse, .prmOptTrue, .prmOptFalse {
width: 45px;
text-align: center;
}
.excGetSetTrue, .prmNullTrue, .prmOptTrue { color: #0c0; }
.excGetSetFalse, .prmNullFalse, .prmOptFalse { color: #c00; }
.idlImplements a {
font-weight: bold;
}
dl.attributes, dl.methods, dl.constants, dl.fields, dl.dictionary-members {
margin-left: 2em;
}
.attributes dt, .methods dt, .constants dt, .fields dt, .dictionary-members dt {
font-weight: normal;
}
.attributes dt code, .methods dt code, .constants dt code, .fields dt code, .dictionary-members dt code {
font-weight: bold;
color: #000;
font-family: monospace;
}
.attributes dt code, .fields dt code, .dictionary-members dt code {
background: #ffffd2;
}
.attributes dt .idlAttrType code, .fields dt .idlFieldType code, .dictionary-members dt .idlMemberType code {
color: #005a9c;
background: transparent;
font-family: inherit;
font-weight: normal;
font-style: italic;
}
.methods dt code {
background: #d9e6f8;
}
.constants dt code {
background: #ddffd2;
}
.attributes dd, .methods dd, .constants dd, .fields dd, .dictionary-members dd {
margin-bottom: 1em;
}
table.parameters, table.exceptions {
border-spacing: 0;
border-collapse: collapse;
margin: 0.5em 0;
width: 100%;
}
table.parameters { border-bottom: 1px solid #90b8de; }
table.exceptions { border-bottom: 1px solid #deb890; }
.parameters th, .exceptions th {
color: #fff;
padding: 3px 5px;
text-align: left;
font-family: initial;
font-weight: normal;
}
.parameters th { background: #90b8de; }
.exceptions th { background: #deb890; }
.parameters td, .exceptions td {
padding: 3px 10px;
border-top: 1px solid #ddd;
vertical-align: top;
}
.parameters tr:first-child td, .exceptions tr:first-child td {
border-top: none;
}
.parameters td.prmName, .exceptions td.excName, .exceptions td.excCodeName {
width: 100px;
}
.parameters td.prmType {
width: 120px;
}
table.exceptions table {
border-spacing: 0;
border-collapse: collapse;
width: 100%;
}
/* --- TOC --- */
.toc a {
text-decoration: none;
}
a .secno {
color: #000;
}
/* --- TABLE --- */
table.simple {
border-spacing: 0;
border-collapse: collapse;
border-bottom: 3px solid #005a9c;
}
.simple th {
background: #005a9c;
color: #fff;
padding: 3px 5px;
text-align: left;
}
.simple th[scope="row"] {
background: inherit;
color: inherit;
border-top: 1px solid #ddd;
}
.simple td {
padding: 3px 10px;
border-top: 1px solid #ddd;
}
/*.simple tr:nth-child(even) {
background: #f0f6ff;
}*/
/* --- DL --- */
.section dd > p:first-child {
margin-top: 0;
}
/*.section dd > p:last-child {
margin-bottom: 0;
}*/
.section dd {
margin-bottom: 1em;
}
.section dl.attrs dd, .section dl.eldef dd {
margin-bottom: 0;
}
/* --- EXAMPLES --- */
pre.example {
border-top: 1px solid #ff4500;
border-bottom: 1px solid #ff4500;
padding: 1em;
margin-top: 1em;
}
pre.example:before {
content: "Example";
display: block;
width: 150px;
background: #ff4500;
color: #fff;
font-family: initial;
padding: 3px;
font-weight: bold;
margin: -1em 0 1em -1em;
}
/* --- EDITORIAL NOTES --- */
.issue {
padding: 1em;
margin: 1em 0em 0em;
border: 1px solid #f00;
background: #ffc;
}
.issue:before {
content: "Issue";
display: block;
width: 150px;
margin: -1.5em 0 0.5em 0;
font-weight: bold;
border: 1px solid #f00;
background: #fff;
padding: 3px 1em;
}
.note {
margin: 1em 0em 0em;
padding: 1em;
border: 2px solid #cff6d9;
background: #e2fff0;
}
.note:before {
content: "Note";
display: block;
width: 150px;
margin: -1.5em 0 0.5em 0;
font-weight: bold;
border: 1px solid #cff6d9;
background: #fff;
padding: 3px 1em;
}
/* --- Best Practices --- */
div.practice {
border: solid #bebebe 1px;
margin: 2em 1em 1em 2em;
}
span.practicelab {
margin: 1.5em 0.5em 1em 1em;
font-weight: bold;
font-style: italic;
}
span.practicelab { background: #dfffff; }
span.practicelab {
position: relative;
padding: 0 0.5em;
top: -1.5em;
}
p.practicedesc {
margin: 1.5em 0.5em 1em 1em;
}
@media screen {
p.practicedesc {
position: relative;
top: -2em;
padding: 0;
margin: 1.5em 0.5em -1em 1em;
}
}
/* --- SYNTAX HIGHLIGHTING --- */
pre.sh_sourceCode {
background-color: white;
color: black;
font-style: normal;
font-weight: normal;
}
pre.sh_sourceCode .sh_keyword { color: #005a9c; font-weight: bold; } /* language keywords */
pre.sh_sourceCode .sh_type { color: #666; } /* basic types */
pre.sh_sourceCode .sh_usertype { color: teal; } /* user defined types */
pre.sh_sourceCode .sh_string { color: red; font-family: monospace; } /* strings and chars */
pre.sh_sourceCode .sh_regexp { color: orange; font-family: monospace; } /* regular expressions */
pre.sh_sourceCode .sh_specialchar { color: #ffc0cb; font-family: monospace; } /* e.g., \n, \t, \\ */
pre.sh_sourceCode .sh_comment { color: #A52A2A; font-style: italic; } /* comments */
pre.sh_sourceCode .sh_number { color: purple; } /* literal numbers */
pre.sh_sourceCode .sh_preproc { color: #00008B; font-weight: bold; } /* e.g., #include, import */
pre.sh_sourceCode .sh_symbol { color: blue; } /* e.g., *, + */
pre.sh_sourceCode .sh_function { color: black; font-weight: bold; } /* function calls and declarations */
pre.sh_sourceCode .sh_cbracket { color: red; } /* block brackets (e.g., {, }) */
pre.sh_sourceCode .sh_todo { font-weight: bold; background-color: #00FFFF; } /* TODO and FIXME */
/* Predefined variables and functions (for instance glsl) */
pre.sh_sourceCode .sh_predef_var { color: #00008B; }
pre.sh_sourceCode .sh_predef_func { color: #00008B; font-weight: bold; }
/* for OOP */
pre.sh_sourceCode .sh_classname { color: teal; }
/* line numbers (not yet implemented) */
pre.sh_sourceCode .sh_linenum { display: none; }
/* Internet related */
pre.sh_sourceCode .sh_url { color: blue; text-decoration: underline; font-family: monospace; }
/* for ChangeLog and Log files */
pre.sh_sourceCode .sh_date { color: blue; font-weight: bold; }
pre.sh_sourceCode .sh_time, pre.sh_sourceCode .sh_file { color: #00008B; font-weight: bold; }
pre.sh_sourceCode .sh_ip, pre.sh_sourceCode .sh_name { color: #006400; }
/* for Prolog, Perl... */
pre.sh_sourceCode .sh_variable { color: #006400; }
/* for LaTeX */
pre.sh_sourceCode .sh_italics { color: #006400; font-style: italic; }
pre.sh_sourceCode .sh_bold { color: #006400; font-weight: bold; }
pre.sh_sourceCode .sh_underline { color: #006400; text-decoration: underline; }
pre.sh_sourceCode .sh_fixed { color: green; font-family: monospace; }
pre.sh_sourceCode .sh_argument { color: #006400; }
pre.sh_sourceCode .sh_optionalargument { color: purple; }
pre.sh_sourceCode .sh_math { color: orange; }
pre.sh_sourceCode .sh_bibtex { color: blue; }
/* for diffs */
pre.sh_sourceCode .sh_oldfile { color: orange; }
pre.sh_sourceCode .sh_newfile { color: #006400; }
pre.sh_sourceCode .sh_difflines { color: blue; }
/* for css */
pre.sh_sourceCode .sh_selector { color: purple; }
pre.sh_sourceCode .sh_property { color: blue; }
pre.sh_sourceCode .sh_value { color: #006400; font-style: italic; }
/* other */
pre.sh_sourceCode .sh_section { color: black; font-weight: bold; }
pre.sh_sourceCode .sh_paren { color: red; }
pre.sh_sourceCode .sh_attribute { color: #006400; }
</style><link href="http://www.w3.org/StyleSheets/TR/W3C-WD" rel="stylesheet" type="text/css" charset="utf-8" /></head>
<body style="display: inherit; "><div class="head"><p><a href="http://www.w3.org/"><img width="72" height="48" src="http://www.w3.org/Icons/w3c_home" alt="W3C" /></a></p><h1 class="title" id="title">PROV Model Primer</h1><h2 id="w3c-working-draft-10-january-2012"><acronym title="World Wide Web Consortium">W3C</acronym> Working Draft 10 January 2012</h2><dl><dt>This version:</dt><dd><a href="http://www.w3.org/TR/2012/WD-prov-primer-20120110/">http://www.w3.org/TR/2012/WD-prov-primer-20120110/</a></dd><dt>Latest published version:</dt><dd><a href="http://www.w3.org/TR/prov-primer/">http://www.w3.org/TR/prov-primer/</a></dd><dt>Latest editor's draft:</dt><dd><a href="http://dvcs.w3.org/hg/prov/raw-file/default/primer/Primer.html">http://dvcs.w3.org/hg/prov/raw-file/default/primer/Primer.html</a></dd><dt>Editors:</dt><dd><a href="http://www.isi.edu/~gil/">Yolanda Gil</a>, Information Sciences Institute, University of Southern California, US</dd>
<dd><a href="http://www.inf.kcl.ac.uk/staff/simonm">Simon Miles</a>, King's College London, UK</dd>
<dt>Authors:</dt><dd><span><a href="http://semanticweb.org/wiki/Khalid_Belhajjame">Khalid Belhajjame</a></span>, University of Manchester</dd>
<dd><span>Helena Deus</span>, Digital Enterprise Research Institute (DERI), NUI Galway</dd>
<dd><span><a href="http://www.oeg-upm.net/index.php/en/phdstudents/28-dgarijo">Daniel Garijo</a></span>, Universidad Politécnica de Madrid</dd>
<dd><span>Graham Klyne</span>, University of Oxford</dd>
<dd><span><a href="http://www.cs.ncl.ac.uk/people/Paolo.Missier">Paolo Missier</a></span>, Newcastle University</dd>
<dd><span><a href="http://soiland-reyes.com/stian/">Stian Soiland-Reyes</a></span>, University of Manchester</dd>
<dd><span><a href="http://tw.rpi.edu/web/person/StephanZednik">Stephan Zednik</a></span>, Rensselaer Polytechnic Institute</dd>
</dl><p class="copyright"><a href="http://www.w3.org/Consortium/Legal/ipr-notice#Copyright">Copyright</a> © 2012 <a href="http://www.w3.org/"><acronym title="World Wide Web Consortium">W3C</acronym></a><sup>®</sup> (<a href="http://www.csail.mit.edu/"><acronym title="Massachusetts Institute of Technology">MIT</acronym></a>, <a href="http://www.ercim.eu/"><acronym title="European Research Consortium for Informatics and Mathematics">ERCIM</acronym></a>, <a href="http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. <acronym title="World Wide Web Consortium">W3C</acronym> <a href="http://www.w3.org/Consortium/Legal/ipr-notice#Legal_Disclaimer">liability</a>, <a href="http://www.w3.org/Consortium/Legal/ipr-notice#W3C_Trademarks">trademark</a> and <a href="http://www.w3.org/Consortium/Legal/copyright-documents">document use</a> rules apply.</p><hr /></div>
<div id="abstract" class="introductory section"><h2>Abstract</h2>
<p>
This document provides an intuitive introduction and guide to the
PROV data model for provenance [<cite><a class="bibref" rel="biblioentry" href="#bib-PROV-DM">PROV-DM</a></cite>]. PROV-DM is a core data model for
provenance for building representations of the entities, people and
processes involved in producing a piece of data or thing in the world.
This primer explains the fundamental PROV-DM concepts in non-normative
terms, and provides worked examples applying the PROV-O OWL2
ontology [<cite><a class="bibref" rel="biblioentry" href="#bib-PROV-O">PROV-O</a></cite>]. The primer is intended as a starting point for those wishing
to create or make use of PROV-DM data.
</p>
<!-- p>
This is a document for internal discussion, which will ultimately
evolve in the first Public Working Draft of the Primer.</p -->
</div><div id="sotd" class="introductory section"><h2>Status of This Document</h2><p><em>This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current <acronym title="World Wide Web Consortium">W3C</acronym> publications and the latest revision of this technical report can be found in the <a href="http://www.w3.org/TR/"><acronym title="World Wide Web Consortium">W3C</acronym> technical reports index</a> at http://www.w3.org/TR/.</em></p>
<p>This document is part of a set of specifications aiming to define the
various aspects that are necessary to achieve the vision of
interoperable interchange of provenance information in heterogeneous
environments such as the Web. This document is a non-normative,
intuitive introduction and guide to the [<cite><a class="bibref" rel="biblioentry" href="#bib-PROV-DM">PROV-DM</a></cite>] data model for
provenance. It includes simple worked examples applying the [<cite><a class="bibref" rel="biblioentry" href="#bib-PROV-O">PROV-O</a></cite>]
OWL2 ontology. The document is expected to become a Note once it is stable.</p>
<p>This document was published by the <a href="http://www.w3.org/2011/prov/">Provenance Working Group</a> as a First Public Working Draft.
If you wish to make comments regarding this document, please send them to <a href="mailto:public-prov-wg@w3.org">public-prov-wg@w3.org</a> (<a href="mailto:public-prov-wg-request@w3.org?subject=subscribe">subscribe</a>, <a href="http://lists.w3.org/Archives/Public/public-prov-wg/">archives</a>). All feedback is welcome.</p><p>Publication as a Working Draft does not imply endorsement by the <acronym title="World Wide Web Consortium">W3C</acronym> Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.</p><p>This document was produced by a group operating under the <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/">5 February 2004 <acronym title="World Wide Web Consortium">W3C</acronym> Patent Policy</a>. The group does not expect this document to become a <acronym title="World Wide Web Consortium">W3C</acronym> Recommendation. <acronym title="World Wide Web Consortium">W3C</acronym> maintains a <a href="http://www.w3.org/2004/01/pp-impl/46974/status" rel="disclosure">public list of any patent disclosures</a> made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#def-essential">Essential Claim(s)</a> must disclose the information in accordance with <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#sec-Disclosure">section 6 of the <acronym title="World Wide Web Consortium">W3C</acronym> Patent Policy</a>.</p></div><div id="toc" class="section"><h2 class="introductory">Table of Contents</h2><ul class="toc"><li class="tocline"><a href="#introduction" class="tocxref"><span class="secno">1. 
</span>Introduction</a></li><li class="tocline"><a href="#intuitive-overview-of-prov-dm" class="tocxref"><span class="secno">2. </span>Intuitive overview of PROV-DM</a><ul class="toc"><li class="tocline"><a href="#entities" class="tocxref"><span class="secno">2.1 </span>Entities</a></li><li class="tocline"><a href="#activities" class="tocxref"><span class="secno">2.2 </span>Activities</a></li><li class="tocline"><a href="#use-and-generation" class="tocxref"><span class="secno">2.3 </span>Use and Generation</a></li><li class="tocline"><a href="#agents" class="tocxref"><span class="secno">2.4 </span>Agents</a></li><li class="tocline"><a href="#roles" class="tocxref"><span class="secno">2.5 </span>Roles</a></li><li class="tocline"><a href="#revisions-and-derivation" class="tocxref"><span class="secno">2.6 </span>Revisions and Derivation</a></li></ul></li><li class="tocline"><a href="#examples-of-use-of-the-prov-o-ontology" class="tocxref"><span class="secno">3. </span>Examples of Use of the PROV-O Ontology</a><ul class="toc"><li class="tocline"><a href="#entities-1" class="tocxref"><span class="secno">3.1 </span>Entities</a></li><li class="tocline"><a href="#activities-1" class="tocxref"><span class="secno">3.2 </span>Activities</a></li><li class="tocline"><a href="#use-and-generation-1" class="tocxref"><span class="secno">3.3 </span>Use and Generation</a></li><li class="tocline"><a href="#agents-1" class="tocxref"><span class="secno">3.4 </span>Agents</a></li><li class="tocline"><a href="#roles-1" class="tocxref"><span class="secno">3.5 </span>Roles</a></li><li class="tocline"><a href="#revision-and-derivation" class="tocxref"><span class="secno">3.6 </span>Revision and Derivation</a></li></ul></li><li class="tocline"><a href="#frequently-asked-questions" class="tocxref"><span class="secno">4. </span>Frequently asked questions</a></li><li class="tocline"><a href="#abstract-syntax-notation-for-examples" class="tocxref"><span class="secno">A. 
</span>Abstract Syntax Notation for Examples</a><ul class="toc"><li class="tocline"><a href="#entities-2" class="tocxref"><span class="secno">A.1 </span>Entities</a></li><li class="tocline"><a href="#activities-2" class="tocxref"><span class="secno">A.2 </span>Activities</a></li><li class="tocline"><a href="#use-and-generation-2" class="tocxref"><span class="secno">A.3 </span>Use and Generation</a></li><li class="tocline"><a href="#agents-2" class="tocxref"><span class="secno">A.4 </span>Agents</a></li><li class="tocline"><a href="#roles-2" class="tocxref"><span class="secno">A.5 </span>Roles</a></li><li class="tocline"><a href="#revision-and-derivation-1" class="tocxref"><span class="secno">A.6 </span>Revision and Derivation</a></li></ul></li><li class="tocline"><a href="#acknowledgements" class="tocxref"><span class="secno">B. </span>Acknowledgements</a></li><li class="tocline"><a href="#references" class="tocxref"><span class="secno">C. </span>References</a><ul class="toc"><li class="tocline"><a href="#normative-references" class="tocxref"><span class="secno">C.1 </span>Normative references</a></li><li class="tocline"><a href="#informative-references" class="tocxref"><span class="secno">C.2 </span>Informative references</a></li></ul></li></ul></div>
<div id="introduction" class="section">
<!-- OddPage -->
<h2><span class="secno">1. </span>Introduction</h2>
<p>
This primer document provides an accessible introduction to the PROV Data Model
([<cite><a class="bibref" rel="biblioentry" href="#bib-PROV-DM">PROV-DM</a></cite>]) standard for representing provenance on the Web, and its representation
in the PROV Ontology ([<cite><a class="bibref" rel="biblioentry" href="#bib-PROV-O">PROV-O</a></cite>]). Provenance describes
the origins of things, so PROV-DM data consists of assertions about the past.
</p>
<p>
This primer document aims to ease the adoption of the standard by providing:
</p>
<ul>
<li>An intuitive explanation of how PROV-DM models provenance.</li>
<li>Worked examples that can be followed to produce your own PROV-DM data.</li>
<li>Answers to frequently asked questions regarding how the model should be applied.</li>
</ul>
<p>
The <i>provenance</i> of digital objects represents their origins. The PROV-DM is a
proposed standard to represent provenance records, which contain <i>assertions</i> about the entities
and activities involved in producing and delivering or otherwise influencing a
given object. By knowing the provenance of an object, we can make determinations
about how to use it. Provenance records can be used for many purposes, such as
understanding how data was collected so it can be meaningfully used, determining
ownership and rights over an object, making judgments about information to
determine whether to trust it, verifying that the process and steps used to obtain a
result comply with given requirements, and reproducing how something was generated.
</p>
<p>
As a standard for provenance, PROV-DM accommodates all those different uses
of provenance. Different people may have different perspectives on provenance,
and as a result different types of information might be captured in provenance records.
One perspective might focus on <i>agent-centered provenance</i>, that is, which agents (such as people or organizations)
were involved in generating or manipulating the information in question. For example,
in the provenance of a picture in a news article we might capture the photographer who
took it, the person who edited it, and the newspaper that published it. A second perspective
might focus on <i>object-centered provenance</i>, by tracing the origins of portions of a
document to other documents. An example is having a web page that was assembled from content
from a news article, quotes of interviews with experts, and a chart that plots data from a
government agency. A third perspective one might take is on <i>process-centered provenance</i>,
capturing the actions and steps taken to generate the information in question. For example, a
chart may have been generated by invoking a service to retrieve data from a database, then
extracting certain statistics from the data using some statistics package, and finally
processing these results with a graphing tool.
</p>
<p>
Provenance records are metadata. There are other kinds of metadata that are
not provenance. For example, the size of an image is a metadata property of
that image but it is not provenance.
</p>
<p>
For general background on provenance, a
comprehensive overview of requirements, use cases, prior research, and proposed
vocabularies for provenance is available from the
<a href="http://www.w3.org/2005/Incubator/prov/XGR-prov/">Final Report of the <acronym title="World Wide Web Consortium">W3C</acronym> Provenance Incubator Group</a>.
That document contains three general scenarios
that may help identify the provenance aspects of your planned applications and
help plan the design of your provenance system.
</p>
<p>
The next section gives an introductory overview of PROV-DM using simple examples.
The following section shows how the formal ontology PROV-O can be used to represent the PROV-DM assertions
as RDF triples. The document also contains frequently asked questions, and an appendix giving example
snippets of the PROV-DM Abstract Syntax Notation (ASN).
For a detailed description of [<cite><a class="bibref" rel="biblioentry" href="#bib-PROV-DM">PROV-DM</a></cite>] and [<cite><a class="bibref" rel="biblioentry" href="#bib-PROV-O">PROV-O</a></cite>], please refer to the respective documents.
</p>
</div>
<div id="intuitive-overview-of-prov-dm" class="section">
<!-- OddPage -->
<h2><span class="secno">2. </span>Intuitive overview of PROV-DM</h2>
<p>This section provides an intuitive explanation of the concepts in PROV-DM.
As with the rest of this document, it should be treated as a starting point for
understanding the model, and not normative in itself. The PROV-DM model specification
provides precise definitions and constraints to be used.</p>
<div class="note">
Please note that, as they
are being developed in parallel, there will be points at which this document
does not yet exactly match the current data model or ontology.
</div>
<p>
The following ER diagram provides a high level overview of the <strong>structure of PROV-DM records</strong>.
The diagram is the same one that appears in [<cite><a class="bibref" rel="biblioentry" href="#bib-PROV-DM">PROV-DM</a></cite>],
but note that this primer document only describes some of the terms shown in the diagram.
</p>
<div style="text-align: center;">
<img src="overview.png" alt="PROV-DM overview" />
</div>
<div id="entities" class="section">
<h3><span class="secno">2.1 </span>Entities</h3>
<p>
In PROV-DM, the things that one may ask the provenance of are called <i>entities</i>.
Examples of such entities are a web page, a chart, and a spellchecker.
</p>
<p>
An entity’s provenance may refer to many other entities. For example, a document D is
an entity whose provenance refers to other entities such as a chart inserted into D,
the dataset that was used to create that chart, or the author of the document.
</p>
<p>
Entities may be described from different perspectives that may be more or less specific. For example,
document D as stored in my file system, the second version of document D after someone edited it,
and D as an evolving document,
are three distinct entities for which we may describe the provenance. They
may all be perspectives on the same thing in the world (document D may exist only
in its second version and on my file system), but are <i>characterized</i> in
different ways by being described using different <i>attributes</i> (version, location, and
so on).
</p>
<p>
The characterization of an entity means that the provenance assertions
about the entity are only about the thing when it has those attributes.
For example, the second version of document D is characterized by being the
second version, and so assertions about who reviewed that entity apply only
to the document as it is in its second version. When the document becomes
the third version, a new entity exists (the third version of D) and the
provenance assertions about who reviewed the second version do not apply.
</p>
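<p>
As a rough sketch of the three perspectives on document D described above, the Turtle below
declares three distinct entities. The <code>ex</code> namespace, the resource names and the
attribute properties <code>ex:version</code> and <code>ex:location</code> are hypothetical, and the
<code>prov</code> namespace IRI is assumed from the later PROV-O publication, so it may differ
from the one used by this draft.
</p>
<pre class="turtle example">@prefix prov: &lt;http://www.w3.org/ns/prov#&gt; .   # assumed namespace IRI
@prefix ex:   &lt;http://example.org/&gt; .          # hypothetical example namespace

# D as an evolving document
ex:docD        a prov:Entity .
# the second version of D, characterized by its version attribute
ex:docD-v2     a prov:Entity ; ex:version 2 .
# D as stored on a file system, characterized by its location attribute
ex:docD-onDisk a prov:Entity ; ex:location "file:///home/me/D.html" .</pre>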
</div>
<div id="activities" class="section">
<h3><span class="secno">2.2 </span>Activities</h3>
<p>
Activities are how entities come into
existence and how their attributes change to become new entities,
often making use of previously existing entities to achieve this.
For example, if the second version of document D was generated
by a translation from the first version of the document in another language,
then this translation is an activity.
An activity may have already ended or may still be
taking place when a new entity is generated.
While entities are static aspects of the world (things), <i>activities</i> are
dynamic aspects (actions, processes, etc.).
</p>
</div>
<div id="use-and-generation" class="section">
<h3><span class="secno">2.3 </span>Use and Generation</h3>
<p>
Activities <i>generate</i> new entities.
For example, writing a document brings the document into existence, while
revising the document brings a new version into existence.
</p>
<p>
Activities also make <i>use</i> of entities. For example, revising a document
to fix spelling mistakes uses the original version of the document as well
as a list of corrections.
</p>
<p>
Assertions can be made in a provenance record to state that
particular activities used or generated particular entities.
</p>
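<p>
The spelling-correction example above might be asserted along the following lines. This is only a
sketch: the property names <code>prov:used</code> and <code>prov:wasGeneratedBy</code> and the
<code>prov</code> namespace IRI are assumed from the later PROV-O publication and may not match this
draft exactly, and every <code>ex</code> name is hypothetical.
</p>
<pre class="turtle example">@prefix prov: &lt;http://www.w3.org/ns/prov#&gt; .   # assumed namespace IRI
@prefix ex:   &lt;http://example.org/&gt; .          # hypothetical example namespace

# the revision activity uses the original version and the list of corrections
ex:fixSpelling a prov:Activity ;
    prov:used ex:docD-v1, ex:corrections .

# the revision activity brings the new version into existence
ex:docD-v2 a prov:Entity ;
    prov:wasGeneratedBy ex:fixSpelling .</pre>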
</div>
<div id="agents" class="section">
<h3><span class="secno">2.4 </span>Agents</h3>
<p>
An agent is a type of entity that takes an active role in an activity such
that it can be assigned some degree of responsibility for the activity taking
place. An agent can be a person, a piece of software, or an inanimate object.
Several agents can be associated with an activity.
Consider a chart displaying statistics on crime rates over time,
fitted with a linear regression. To represent the
provenance of that chart, we could state that the person who created the
chart was an agent involved in its creation, and that the software used to
create the chart was also an agent involved in that activity.
</p>
<p>
Since agents are a kind of entity, it is possible to
associate provenance records with the agents themselves.
In the running example, we
can also represent the provenance of the software used to create the chart, and specify the agents involved in
producing that software, such as the vendor.
</p>
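<p>
One possible way to write down the chart example is sketched below. The properties
<code>prov:wasAssociatedWith</code> and <code>prov:wasAttributedTo</code> are assumed from the later
PROV-O publication (this draft may name the relation between an agent and an activity differently),
the <code>prov</code> namespace IRI is likewise assumed, and all <code>ex</code> names are hypothetical.
</p>
<pre class="turtle example">@prefix prov: &lt;http://www.w3.org/ns/prov#&gt; .   # assumed namespace IRI
@prefix ex:   &lt;http://example.org/&gt; .          # hypothetical example namespace

ex:analyst       a prov:Agent .    # the person who created the chart
ex:chartSoftware a prov:Agent .    # the software used to create it

# both agents bear some responsibility for the chart-making activity
ex:makeChart a prov:Activity ;
    prov:wasAssociatedWith ex:analyst, ex:chartSoftware .

# provenance of an agent itself: the software is attributed to its vendor
ex:vendor a prov:Agent .
ex:chartSoftware prov:wasAttributedTo ex:vendor .</pre>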
</div>
<div id="roles" class="section">
<h3><span class="secno">2.5 </span>Roles</h3>
<p>
A <i>role</i> is a description of the function or the part that an entity
played in an activity. Roles specify
the relationship between an entity and an activity, that is,
how the activity used or generated the entity. Roles also specify how agents are
involved in an activity, qualifying their participation in the activity or
specifying which agents controlled it.
For example, an agent may play the role of &quot;editor&quot; in an activity that uses
one entity in the role of &quot;document to be edited&quot; and another in the role of
&quot;addition to be made to the document&quot;, to generate a further entity in the role of &quot;edited document&quot;.
Roles are application specific.
</p>
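<p>
The editing example above could be sketched as follows, attaching a role to each use and to the
agent's involvement. The qualified terms used here (<code>prov:qualifiedUsage</code>,
<code>prov:qualifiedAssociation</code>, <code>prov:entity</code>, <code>prov:agent</code>,
<code>prov:hadRole</code>) come from the later PROV-O publication and are an assumption with respect
to this draft, as is the namespace IRI; the role names and other <code>ex</code> terms are hypothetical
and application specific.
</p>
<pre class="turtle example">@prefix prov: &lt;http://www.w3.org/ns/prov#&gt; .   # assumed namespace IRI
@prefix ex:   &lt;http://example.org/&gt; .          # hypothetical example namespace

ex:editing a prov:Activity ;
    # one entity used in the role "document to be edited"
    prov:qualifiedUsage [
        a prov:Usage ;
        prov:entity  ex:draft ;
        prov:hadRole ex:documentToBeEdited
    ] ;
    # another entity used in the role "addition to be made to the document"
    prov:qualifiedUsage [
        a prov:Usage ;
        prov:entity  ex:newParagraph ;
        prov:hadRole ex:additionToBeMade
    ] ;
    # the agent taking part in the role "editor"
    prov:qualifiedAssociation [
        a prov:Association ;
        prov:agent   ex:edith ;
        prov:hadRole ex:editor
    ] .</pre>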
<!-- p>Roles are intended as an extension point in the model; it is expected users will define and use custom role taxonomies. Role interpretation is application specific.</p -->
</div>
<div id="revisions-and-derivation" class="section">
<h3><span class="secno">2.6 </span>Revisions and Derivation</h3>
<p>
A given entity, such as a document, may go through multiple <i>revisions</i>
(also called versions and other comparable terms) over time. Between revisions,
one or more attributes of the entity may change.
The result of each revision is a new entity,
and PROV-DM allows one to relate those entities by making an assertion that
one is a revision of another.
</p>
<p>
When one entity's existence, content, characteristics and so on are
at least partly due to another entity, then we say that the former is
<i>derived</i> from the latter. For example, one document may contain
material copied from another,
and a chart is derived from the data that is used to create it.
</p>
<p>
There are different kinds of derivation expressible in PROV-DM. For
example, the data may be normalized before creating the chart.
In PROV-DM terms, we say that the chart <i>was derived from</i>
the normalized data and <i>was eventually derived from</i> the original data.
</p>
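<p>
A minimal sketch of the revision and derivation statements discussed in this section follows. The
property names <code>prov:wasRevisionOf</code> and <code>prov:wasDerivedFrom</code> and the namespace
IRI are assumed from the later PROV-O publication, and the <code>ex</code> names are hypothetical.
Only the direct derivations are asserted; no property name is assumed for the indirect
("eventual") derivation of the chart from the original data, which follows from chaining the two
direct steps.
</p>
<pre class="turtle example">@prefix prov: &lt;http://www.w3.org/ns/prov#&gt; .   # assumed namespace IRI
@prefix ex:   &lt;http://example.org/&gt; .          # hypothetical example namespace

# the second version of the document is a revision of the first
ex:docD-v2 prov:wasRevisionOf ex:docD-v1 .

# the chart was derived from the normalized data ...
ex:chart prov:wasDerivedFrom ex:normalizedData .
# ... which was itself derived from the original data
ex:normalizedData prov:wasDerivedFrom ex:originalData .</pre>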
</div>
</div>
<div id="examples-of-use-of-the-prov-o-ontology" class="section">
<!-- OddPage -->
<h2><span class="secno">3. </span>Examples of Use of the PROV-O Ontology</h2>
<p>In the following sections, we show how PROV-DM can be used to model
provenance in specific examples.</p>
<p>We include examples of how the formal ontology PROV-O
can be used to represent the PROV-DM assertions as RDF triples.
These are shown using the Turtle notation. In
these depictions, the namespace prefix <b>prov</b> denotes
terms from the PROV ontology, while <b>ex1</b>, <b>ex2</b>, etc.
denote terms specific to the example.</p>
<p>We also provide a representation of the examples in the Abstract
Syntax Notation (ASN) used in the conceptual model document. The full ASN data
for the examples in this section is
included in the appendix.</p>
<div id="entities-1" class="section">
<h3><span class="secno">3.1 </span>Entities</h3>
<p>
An online newspaper publishes an article about crime statistics, making use of data (GovData) provided through a government portal.
The article includes a chart based on the data, with data values aggregated by
geographical regions.
</p>
<p>
A blogger, Betty, looking at the article, spots what she thinks to be an error in the chart.
Betty retrieves the provenance record of the article, which describes how it was created.
</p>
<p>Betty would find the following assertions about entities in the provenance record:</p>
<pre class="turtle example">ex1:newspaper1 a prov:Entity .
ex1:article1 a prov:Entity .
ex1:dataSet1 a prov:Entity .
ex1:regionList1 a prov:Entity .
ex1:aggregate1 a prov:Entity .
ex1:chart1 a prov:Entity .</pre>
<p>
These statements, in order, assert that there is a newspaper (<code>ex1:newspaper1</code>) and an article (<code>ex1:article1</code>),
that the original data set (<code>ex1:dataSet1</code>) is an entity,
that the list of regions
(<code>ex1:regionList1</code>) is an entity, that the data aggregated by region (<code>ex1:aggregate1</code>) is an entity,
and that the chart (<code>ex1:chart1</code>) is an entity.
</p>
</div>
<div id="activities-1" class="section">
<h3><span class="secno">3.2 </span>Activities</h3>
<p>
Further, the provenance record asserts that there was
an activity (<code>ex1:compiled</code>) denoting the compilation of the
chart from the data set.
</p>
<pre class="turtle example">ex1:compiled a prov:Activity .</pre>
<p>
The provenance record also includes reference to the more specific steps involved in this compilation,
which are first aggregating the data by region and then generating the chart graphic.
</p>
<pre class="turtle example">ex1:aggregated a prov:Activity .
ex1:illustrated a prov:Activity .</pre>
</div>
<div id="use-and-generation-1" class="section">
<h3><span class="secno">3.3 </span>Use and Generation</h3>
<p>
Finally, the provenance record asserts the key relations among the above
entities and activities, i.e. the use of an entity by an activity,
or the generation of an entity by an activity.
</p>
<p>
For example, the assertions below state that the aggregation activity
(<code>ex1:aggregated</code>) used the original data set, that it used the list of
regions, and that the aggregated data was generated by this activity.
</p>
<pre class="turtle example">ex1:aggregated prov:used ex1:dataSet1 ;
prov:used ex1:regionList1 .
ex1:aggregate1 prov:wasGeneratedBy ex1:aggregated .</pre>
<p>
Similarly, the chart graphic creation activity (<code>ex1:illustrated</code>)
used the aggregated data, and the chart was generated by this activity.
</p>
<pre class="turtle example">ex1:illustrated prov:used ex1:aggregate1 .
ex1:chart1 prov:wasGeneratedBy ex1:illustrated .</pre>
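<p>
To show how such assertions might be followed in practice, the sketch below holds the five
statements above as plain triples and walks backwards from the chart. The helper
<code>trace_back</code> and the tuple layout are illustrative assumptions and are not part of PROV-O.
</p>
<pre class="example">
```python
# The five assertions above as (subject, property, object) triples.
TRIPLES = [
    ("ex1:aggregated", "prov:used", "ex1:dataSet1"),
    ("ex1:aggregated", "prov:used", "ex1:regionList1"),
    ("ex1:aggregate1", "prov:wasGeneratedBy", "ex1:aggregated"),
    ("ex1:illustrated", "prov:used", "ex1:aggregate1"),
    ("ex1:chart1", "prov:wasGeneratedBy", "ex1:illustrated"),
]


def trace_back(entity, triples):
    """Walk generation and use links backwards from `entity`.

    Returns the activities and the entities that contributed to it.
    The helper name and data layout are illustrative, not PROV-O terms.
    """
    activities, entities = set(), set()
    pending = [entity]
    while pending:
        current = pending.pop()
        # Activities that generated the entity under inspection.
        generators = [o for s, p, o in triples
                      if s == current and p == "prov:wasGeneratedBy"]
        for activity in generators:
            activities.add(activity)
            # Everything the generating activity used is an input.
            for s, p, o in triples:
                if s == activity and p == "prov:used" and o not in entities:
                    entities.add(o)
                    pending.append(o)
    return activities, entities


activities, inputs = trace_back("ex1:chart1", TRIPLES)
print(sorted(activities))
print(sorted(inputs))
```
</pre>
<p>
For <code>ex1:chart1</code> this yields the two activities <code>ex1:aggregated</code> and
<code>ex1:illustrated</code>, and the three entities <code>ex1:aggregate1</code>,
<code>ex1:dataSet1</code> and <code>ex1:regionList1</code>.
</p>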
<!-- p>
For example, the provenance declares the event (of type <code>prov:Usage</code>)
where the aggregation activity used the GovData data set, and the event
(of type <code>prov:Generation</code>) where the same activity generated
the data aggregated by region.
</p>
<pre class="turtle example">
ex1:dataSet1Usage a prov:Usage .
ex1:aggregate1Generation a prov:Generation .
</pre>
<p>
To describe these events, the provenance says within which activity
they occur and what entity is used or generated.
</p>
<pre class="turtle example">
ex1:aggregated prov:qualifiedUsage ex1:dataSet1Usage .
ex1:aggregated prov:qualifiedGeneration ex1:aggregate1Generation .
ex1:dataSet1Usage prov:entity ex1:dataSet1 .
ex1:aggregate1Generation prov:entity ex1:aggregate1 .
</pre>
<p>
Comparable events are described for the activity of generating the chart image
from the aggregated data.
</p>
<pre class="turtle example">
ex1:aggregate1Usage a prov:Usage .
ex1:chart1Generation a prov:Generation .
ex1:illustrated prov:qualifiedUsage ex1:aggregate1Usage .
ex1:illustrated prov:qualifiedGeneration ex1:chart1Generation .
ex1:aggregate1Usage prov:entity ex1:aggregate1 .
ex1:chart1Generation prov:entity ex1:chart1 .
</pre>
<p>
From this information Betty can see that
the mistake could have been in the original data set or else was introduced
in the compilation activity, and sets out to discover which.
</p>
</p -->
</div>
<div id="agents-1" class="section">
<h3><span class="secno">3.4 </span>Agents</h3>
<p>
Digging deeper, Betty wants to know who compiled the chart.
Betty sees that Derek was involved in both the aggregation and
chart creation activities:
</p>
<pre class="turtle example">ex1:aggregated prov:wasControlledBy ex1:derek .
ex1:illustrated prov:wasControlledBy ex1:derek .</pre>
<p>
The record for Derek provides the
following information, of which the first line is a PROV-O statement that
Derek is an agent, followed by statements about general properties of Derek.
</p>
<pre class="turtle example">ex1:derek a prov:Agent ;
a foaf:Person ;
foaf:givenName &quot;Derek&quot;^^xsd:string ;
foaf:mbox &lt;mailto:derek@example.org&gt; .</pre>
</div>
<div id="roles-1" class="section">
<h3><span class="secno">3.5 </span>Roles</h3>
<p>
For Betty to understand where the error lies, she needs to have more detailed
information on how entities have been used in, participated in, and generated
by activities. Betty has determined that <code>ex1:aggregated</code> used
entities <code>ex1:regionList1</code> and <code>ex1:dataSet1</code>, but she does not
know what function these entities played in the processing. Betty
also knows that <code>ex1:derek</code> controlled the activities, but she does
not know if Derek was the analyst responsible for determining how the data
should be aggregated.
</p>
<p>
This information is described as roles in the provenance record. The aggregation
activity involved entities and an agent in four roles: the data to be aggregated (<code>ex1:dataToAggregate</code>),
the regions to aggregate by (<code>ex1:regionsToAggregateBy</code>), the
resulting aggregated data (<code>ex1:aggregatedData</code>), and the
analyst doing the aggregation (<code>ex1:analyst</code>).
</p>
<pre class="turtle example">ex1:dataToAggregate a prov:Role .
ex1:regionsToAggregateBy a prov:Role .
ex1:aggregatedData a prov:Role .
ex1:analyst a prov:Role .</pre>
<p>
In addition to the simple facts that the aggregation activity used, generated or
was controlled by entities/agents as described in the sections above, the
provenance record contains more details of <i>how</i> these entities and agents
were involved, i.e. the roles they played. For example, the assertions below state
that the aggregation activity (<code>ex1:aggregated</code>) included the usage
of the government data set (<code>ex1:dataSet1</code>) in the role of the data
to be aggregated (<code>ex1:dataToAggregate</code>).
</p>
<pre class="turtle example">ex1:aggregated prov:hadQualifiedUsage [ a prov:Usage ;
prov:hadQualifiedEntity ex1:dataSet1 ;
prov:hadRole ex1:dataToAggregate ] .</pre>
<p>
This can then be distinguished from the same activity's usage of the list of
regions because the roles played are different.
</p>
<pre class="turtle example">ex1:aggregated prov:hadQualifiedUsage [ a prov:Usage ;
prov:hadQualifiedEntity ex1:regionList1 ;
prov:hadRole ex1:regionsToAggregateBy ] .</pre>
<p>
Similarly, the provenance includes assertions that the same activity was
controlled in a particular way (<code>ex1:analyst</code>) by Derek, and that
the entity <code>ex1:aggregate1</code> took the role of the aggregated
data in what the activity generated.
</p>
<pre class="turtle example">ex1:aggregated
prov:hadQualifiedControl [ a prov:Control ;
prov:hadQualifiedEntity ex1:derek ;
prov:hadRole ex1:analyst
] ;
prov:hadQualifiedGeneration [ a prov:Generation ;
prov:hadQualifiedEntity ex1:aggregate1 ;
prov:hadRole ex1:aggregatedData
] .</pre>
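<p>
The value of the qualified assertions is that each involvement can be looked up by role.
A minimal sketch, assuming the four qualified involvements above are held as simple records
(the record layout and the helper <code>in_role</code> are hypothetical):
</p>
<pre class="example">
```python
# Each qualified involvement in ex1:aggregated as a small record: the kind
# of involvement, the entity or agent involved, and the role it played.
QUALIFIED = [
    {"kind": "Usage", "involved": "ex1:dataSet1",
     "role": "ex1:dataToAggregate"},
    {"kind": "Usage", "involved": "ex1:regionList1",
     "role": "ex1:regionsToAggregateBy"},
    {"kind": "Control", "involved": "ex1:derek",
     "role": "ex1:analyst"},
    {"kind": "Generation", "involved": "ex1:aggregate1",
     "role": "ex1:aggregatedData"},
]


def in_role(records, role):
    """Return the entities or agents recorded as playing `role`."""
    return [r["involved"] for r in records if r["role"] == role]


print(in_role(QUALIFIED, "ex1:dataToAggregate"))
print(in_role(QUALIFIED, "ex1:analyst"))
```
</pre>
<p>
The two usages of <code>ex1:aggregated</code> can now be told apart: only <code>ex1:dataSet1</code>
is returned for the role <code>ex1:dataToAggregate</code>, and <code>ex1:derek</code> is returned
for the role <code>ex1:analyst</code>.
</p>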
</div>
<div id="revision-and-derivation" class="section">
<h3><span class="secno">3.6 </span>Revision and Derivation</h3>
<p>
After looking at the detail of the compilation activity, there appears
to be nothing wrong, so Betty concludes the error is in the government dataset.
She looks at the characterization of the dataset <code>ex1:dataSet1</code>,
and sees that it is missing data from one of the zipcodes in the area. She contacts
the government, and a new version of GovData is created, declared to be the
next revision of the data by Edith. The provenance record of this new dataset,
<code>ex1:dataSet2</code>, states that it is a revision of the
old data set, <code>ex1:dataSet1</code>.
</p>
<pre class="turtle example">ex1:dataSet2 prov:wasRevisionOf ex1:dataSet1 .</pre>
<p>
Derek notices that there is a new dataset available and creates a new chart based on the revised data,
using the same compilation activity as before. Betty checks the article again at a
later point, and wants to know if it is based on the old or new GovData.
She sees two new assertions about derivation in the provenance data, plus
an assertion about how the new chart was generated.
</p>
<pre class="example turtle">ex1:chart2 prov:wasEventuallyDerivedFrom ex1:dataSet2 .
ex1:chart2 prov:wasDerivedFrom ex1:dataSet2 .
ex1:chart2 prov:wasGeneratedBy ex1:compiled2 .</pre>
<p>
She interprets these assertions as follows. The first says that the new chart
is as it is at least partly because of the revised
data set, i.e. there is an explicit influence of the data on the chart.
The second says, more specifically, that the new chart was derived from the revised data set.
Finally, the second and third assertions together say further that it was
the activity <code>ex1:compiled2</code> that derived the new chart
from the revised data set.
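<p>
Betty's question, whether a chart is based on the old or the new GovData, can be answered
mechanically from the revision and derivation assertions. The sketch below is one possible
approach, not a prescribed one. It assumes that each data set has at most one newer revision,
and it adds a derivation entry for <code>ex1:chart1</code> purely for contrast; that entry is
not stated in the provenance record above.
</p>
<pre class="example">
```python
# newer revision : older revision, as asserted by prov:wasRevisionOf.
REVISION_OF = {"ex1:dataSet2": "ex1:dataSet1"}

# chart : data set it was derived from. The ex1:chart1 entry is an
# assumption added for contrast; the record only states chart2's source.
DERIVED_FROM = {
    "ex1:chart1": "ex1:dataSet1",
    "ex1:chart2": "ex1:dataSet2",
}


def latest_revision(entity, revision_of):
    """Follow revision links forward until no newer revision is recorded."""
    newer = {old: new for new, old in revision_of.items()}
    seen = {entity}
    while entity in newer and newer[entity] not in seen:
        entity = newer[entity]
        seen.add(entity)
    return entity


def uses_latest_data(chart, derived_from, revision_of):
    """True if the chart's source data set has no newer revision."""
    source = derived_from[chart]
    return latest_revision(source, revision_of) == source


print(uses_latest_data("ex1:chart2", DERIVED_FROM, REVISION_OF))
print(uses_latest_data("ex1:chart1", DERIVED_FROM, REVISION_OF))
```
</pre>
<p>
With these inputs the check succeeds for <code>ex1:chart2</code> and fails for <code>ex1:chart1</code>.
</p>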
</div>
</div>
<div id="frequently-asked-questions" class="section">
<!-- OddPage -->
<h2><span class="secno">4. </span>Frequently asked questions</h2>
</div>
<div class="appendix section" id="abstract-syntax-notation-for-examples">
<!-- OddPage -->
<h2><span class="secno">A. </span>Abstract Syntax Notation for Examples</h2>
<p>
Below we give translations of the working example snippets into the PROV-DM
abstract syntax notation (ASN).
</p>
<div id="entities-2" class="section">
<h3><span class="secno">A.1 </span>Entities</h3>
<pre class="example asn">entity(ex1:newspaper1).
entity(ex1:article1).
entity(ex1:dataSet1).
entity(ex1:regionList1).
entity(ex1:aggregate1).
entity(ex1:chart1).</pre>
</div>
<div id="activities-2" class="section">
<h3><span class="secno">A.2 </span>Activities</h3>
<pre class="example asn">activity(ex1:compiled).
activity(ex1:aggregated).
activity(ex1:illustrated).</pre>
<!--
<p>
In the first assertion above, 'compilation_step' is an optional reference to the 'recipe' that describes
what the 'compiled' activity did. The interpretation of its name,
'compilation_step', is left to applications (it is not further resolved within PROV-DM).
</p>
<p>
In the second assertion, optional 'recipe' has been omitted.
</p>
-->
<!-- PM comment: here readers will be confused by the processExecutiion / activity disconnect!
also this does not show start/end times, optional attributes. At least one example would be useful -->
</div>
<div id="use-and-generation-2" class="section">
<h3><span class="secno">A.3 </span>Use and Generation</h3>
<pre class="example asn">used(ex1:aggregated, ex1:dataSet1).
used(ex1:aggregated, ex1:regionList1).
wasGeneratedBy(ex1:aggregate1, ex1:aggregated).
used(ex1:illustrated, ex1:aggregate1).
wasGeneratedBy(ex1:chart1, ex1:illustrated).</pre>
</div>
<div id="agents-2" class="section">
<h3><span class="secno">A.4 </span>Agents</h3>
<pre class="example asn">entity(ex1:derek, [ type=&quot;foaf:Person&quot;, foaf:givenName = &quot;Derek&quot;,
foaf:mbox= &quot;&lt;mailto:derek@example.org&gt;&quot;]).
agent(ex1:derek).
wasControlledBy(ex1:aggregated, ex1:derek).
wasControlledBy(ex1:illustrated, ex1:derek).</pre>
</div>
<div id="roles-2" class="section">
<h3><span class="secno">A.5 </span>Roles</h3>
<p>
Roles are not declared directly in PROV-DM; rather, they are attributes of
relations. Thus, the usage assertions of the Turtle example in sec. 3.5 are rendered as follows:
</p>
<pre class="example asn">used(ex1:aggregated, ex1:dataSet1, [ prov:role = &quot;dataToAggregate&quot;]).
used(ex1:aggregated, ex1:regionList1, [ prov:role = &quot;regionsToAggregateBy&quot;]).</pre>
<p>
The first assertion above adds a &quot;role&quot; attribute to the first 'used' assertion of sec. A.3.
Similarly, the second assertion adds a &quot;role&quot; attribute to the second 'used' assertion of sec. A.3.
</p>
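<p>
For readers who want to produce such assertions programmatically, the following sketch formats
a relation with an optional attribute list in the style shown in this appendix. The function
<code>asn</code> is a hypothetical helper, and the output layout simply imitates the examples
here; it is not a complete or normative ASN serializer.
</p>
<pre class="example">
```python
def asn(relation, first, second, attributes=None):
    """Format one assertion in the ASN style used in this appendix.

    `attributes` is an optional dict of attribute names to string values,
    for example {"prov:role": "dataToAggregate"}.
    """
    args = [first, second]
    if attributes:
        pairs = ", ".join(
            '{} = "{}"'.format(name, value)
            for name, value in attributes.items()
        )
        args.append("[ " + pairs + "]")
    return "{}({}).".format(relation, ", ".join(args))


print(asn("used", "ex1:aggregated", "ex1:dataSet1"))
print(asn("used", "ex1:aggregated", "ex1:dataSet1",
          {"prov:role": "dataToAggregate"}))
```
</pre>
<p>
The first call reproduces the plain 'used' assertion; the second adds the role attribute.
</p>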
</div>
<div id="revision-and-derivation-1" class="section">
<h3><span class="secno">A.6 </span>Revision and Derivation</h3>
<pre class="example asn">wasRevisionOf(ex1:dataSet2, ex1:dataSet1).</pre>
<pre class="example asn">wasEventuallyDerivedFrom(ex1:chart2, ex1:dataSet2).
wasDerivedFrom(ex1:chart2, ex1:dataSet2).
wasGeneratedBy(ex1:chart2, ex1:compiled2).</pre>
</div>
</div>
<div class="appendix section" id="acknowledgements">
<!-- OddPage -->
<h2><span class="secno">B. </span>Acknowledgements</h2>
<p>
The Provenance Working Group members.
</p>
</div>
<div id="references" class="appendix section">
<!-- OddPage -->
<h2><span class="secno">C. </span>References</h2><div id="normative-references" class="section"><h3><span class="secno">C.1 </span>Normative references</h3><p>No normative references.</p></div><div id="informative-references" class="section"><h3><span class="secno">C.2 </span>Informative references</h3><dl class="bibliography"><dt id="bib-PROV-DM">[PROV-DM]</dt><dd>Luc Moreau, Paolo Missier. <a href="http://www.w3.org/TR/2011/WD-prov-dm-20111215/"><cite>The PROV Data Model and Abstract Syntax Notation</cite></a>. 15 December 2011. W3C Working Draft. (Work in progress.) URL: <a href="http://www.w3.org/TR/2011/WD-prov-dm-20111215/">http://www.w3.org/TR/2011/WD-prov-dm-20111215/</a>
</dd><dt id="bib-PROV-O">[PROV-O]</dt><dd>Satya Sahoo, Deborah McGuinness. <a href="http://www.w3.org/TR/2011/WD-prov-o-20111213/"><cite>The PROV Ontology: Model and Formal Semantics</cite></a>. 13 December 2011. W3C Working Draft. (Work in progress.) URL: <a href="http://www.w3.org/TR/2011/WD-prov-o-20111213/">http://www.w3.org/TR/2011/WD-prov-o-20111213/</a>
</dd></dl></div></div></body></html>