<?xml version="1.0" encoding="UTF-8"?>
|
|
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
|
|
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
|
|
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
|
|
<head>
|
|
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
|
|
<meta name="RCS-Id"
|
|
content="$Id: Overview.html,v 1.7 2008/06/07 08:29:30 eric Exp $" />
|
|
<title>Experiences with the conversion of SenseLab databases to
|
|
RDF/OWL</title>
|
|
<style type="text/css">
|
|
|
|
|
|
/*<![CDATA[*/
|
|
.mesh { background-color: #ffc }
|
|
.goa { background-color: #fcf }
|
|
.glbl { background-color: #ccf }
|
|
.plbl { background-color: #cfc }
|
|
.var { font-weight: bold }
|
|
.db { font-weight: bold }
|
|
.gene { color: blue }
|
|
.process{ color: red }
|
|
.senselab { background-color: #0ff }
|
|
.identifier { font-weight: bold }
|
|
.comment{ color: orange; font-size: 1.3em; }
|
|
.schema th { text-align: left }
|
|
table, td, th { border-style: solid;
|
|
border-width: 1px;
|
|
border-color: black;
|
|
border-bottom-color: gray;
|
|
border-right-color: gray; }
|
|
table.dbsTable { border-collapse: collapse; border-color: #000000; }
|
|
table.dbsTable td:first-child { vertical-align: top; }
|
|
table.dbsTable td { padding: 2px 5px 2px 5px; }
|
|
.at-issue {text-decoration: underline;}
|
|
.issue {background-color: #fcc;}
|
|
/*]]>*/
|
|
</style>
|
|
<link rel="stylesheet" type="text/css" href="http://www.w3.org/StyleSheets/TR/W3C-IG-NOTE" />
|
|
</head>
|
|
|
|
<body>
|
|
|
|
<div class="head">
|
|
<p><a href="http://www.w3.org/"><img src="http://www.w3.org/Icons/w3c_home"
|
|
alt="W3C" height="48" width="72" /></a></p>
|
|
|
|
<h1 id="main">Experiences with the conversion of SenseLab databases to
|
|
RDF/OWL</h1>
|
|
<h2 class="no-num no-toc" id="w3c-doctype">W3C Interest Group Note 4 June 2008</h2>
|
|
<dl>
|
|
<!-- dt>Editors working draft.</dt>
|
|
<dd><span class="cvs-id">$Revision: 1.7 $ of
|
|
$Date: 2008/06/07 08:29:30 $</span></dd -->
|
|
<dt>This version:</dt>
|
|
<dd><a href="http://www.w3.org/TR/2008/NOTE-hcls-senselab-20080604/">http://www.w3.org/TR/2008/NOTE-hcls-senselab-20080604/</a></dd>
|
|
<dt>Latest version:</dt>
|
|
<dd><a href="http://www.w3.org/TR/hcls-senselab/">http://www.w3.org/TR/hcls-senselab/</a></dd>
|
|
<dt>Previous version:</dt>
|
|
<dd><a href="http://www.w3.org/TR/2008/WD-hcls-senselab-20080404/">http://www.w3.org/TR/2008/WD-hcls-senselab-20080404/</a></dd>
|
|
<dt>Editors:</dt>
|
|
<dd>Matthias Samwald, <a href="http://ycmi.med.yale.edu/">Yale Center for
|
|
Medical Informatics</a> / <a href="http://www.deri.ie/">DERI Galway</a>
|
|
/ <a href="http://www.semantic-web.at/">Semantic Web Company</a> &lt;<a
href="mailto:samwald@gmx.at">samwald@gmx.at</a>&gt;</dd>
<dd>Kei-Hoi Cheung, Yale Center for Medical Informatics &lt;<a
href="mailto:kei.cheung@yale.edu">kei.cheung@yale.edu</a>&gt;</dd>
|
|
<dt>Contributors:</dt>
|
|
<dd>Alan Ruttenberg, <a href="http://sciencecommons.org/">Science
|
|
Commons</a> &lt;<a
href="mailto:alanruttenberg@gmail.com">alanruttenberg@gmail.com</a>&gt;</dd>
<dd>Huajun Chen, Yale Center for Medical Informatics / <a
href="http://www.zju.edu.cn/english/">Zhejiang University</a> &lt;<a
href="mailto:huajunsir@zju.edu.cn">huajunsir@zju.edu.cn</a>&gt;</dd>
|
|
</dl>
|
|
|
|
<p class="copyright"><a href="http://www.w3.org/Consortium/Legal/ipr-notice#Copyright">Copyright</a> © 2008 <a href="http://www.w3.org/"><acronym title="World Wide Web Consortium">W3C</acronym></a><sup>®</sup> (<a href="http://www.csail.mit.edu/"><acronym title="Massachusetts Institute of Technology">MIT</acronym></a>, <a href="http://www.ercim.org/"><acronym title="European Research Consortium for Informatics and Mathematics">ERCIM</acronym></a>, <a href="http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. W3C <a href="http://www.w3.org/Consortium/Legal/ipr-notice#Legal_Disclaimer">liability</a>, <a href="http://www.w3.org/Consortium/Legal/ipr-notice#W3C_Trademarks">trademark</a> and <a href="http://www.w3.org/Consortium/Legal/copyright-documents">document use</a> rules apply.</p>
|
|
</div>
|
|
<hr title="Separator for header" />
|
|
|
|
<div>
|
|
<h2 class="notoc" id="abstract">Abstract</h2>
|
|
|
|
<p>One of the challenges facing Semantic Web for Health Care and Life
|
|
Sciences is that of converting relational databases into Semantic Web format.
|
|
The issues and the steps involved in such a conversion have not been well
|
|
documented. To this end, we have created this document to describe the
|
|
process of converting SenseLab databases into OWL. SenseLab is a collection
|
|
of relational (Oracle) databases for neuroscientific research. The conversion
|
|
of these databases into RDF/OWL format is an important step towards realizing
|
|
the benefits of Semantic Web in integrative neuroscience research. This
|
|
document describes how we represented some of the SenseLab databases in
|
|
Resource Description Framework (RDF) and Web Ontology Language (OWL), and
|
|
discusses the advantages and disadvantages of these representations. Our OWL
|
|
representation is based on the reuse and extension of existing standard OWL
|
|
ontologies developed in the biomedical ontology communities. The purpose of
|
|
this document is to share our implementation experience with the
|
|
community.</p>
|
|
</div>
|
|
|
|
<div>
|
|
<h2 id="status">Status of This Document</h2>
|
|
|
|
<p><em>This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the <a href="http://www.w3.org/TR/">W3C technical reports index</a> at http://www.w3.org/TR/.</em></p>
|
|
|
|
<p>This is
|
|
an Interest Group Note
|
|
<!-- an editor's draft -->
|
|
of the <a href="http://www.w3.org/2001/sw/hcls/">Semantic Web in Health Care and Life Sciences Interest Group (HCLS)</a>, part of the <a href="http://www.w3.org/2001/sw/">W3C Semantic Web Activity</a>. It is considered stable and expected to be published as an Interest Group Note in May 2008.
|
|
|
|
This document serves as a companion to
|
|
<a href="http://www.w3.org/TR/2008/NOTE-hcls-kb-20080604/">A Prototype Knowledge Base for the Life Sciences</a>
|
|
and describes the process for integrating new data into an existing biological database. We hope other groups who plan to convert their databases into RDF/OWL format will benefit from this document.</p>
|
|
|
|
<p>The document was produced by the <a href="http://www.w3.org/2001/sw/hcls/">Semantic Web in Health Care and Life Sciences Interest Group (HCLS)</a>, part of the <a href="http://www.w3.org/2001/sw/">W3C Semantic Web Activity</a> (<a href="http://www.w3.org/2001/sw/hcls/charter">see charter</a>). Comments may be sent to the <a href="http://lists.w3.org/Archives/Public/public-semweb-lifesci/">publicly archived</a> <a href="mailto:public-semweb-lifesci@w3.org">public-semweb-lifesci@w3.org</a> mailing list. Feedback is encouraged, as is participation in the recently <a href="http://www.w3.org/2008/05/HCLSIGCharter">re-chartered</a> HCLSIG. A <a href="WD2NOTE">list of changes since the last publication</a> is available.</p>
|
|
|
|
<p>Publication as an Interest Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.</p>
|
|
|
|
<p>This document was produced by a group operating under the disclosure
|
|
obligations of the <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/">5 February 2004 W3C Patent Policy</a>. The group does
|
|
not expect this document to become a W3C Recommendation. An
|
|
individual who has actual knowledge of a patent which the individual
|
|
believes contains <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#def-essential">Essential Claim(s)</a> must disclose the information to
|
|
<a href="mailto:public-semweb-lifesci@w3.org">public-semweb-lifesci@w3.org</a> [<a href="http://lists.w3.org/Archives/Public/public-semweb-lifesci/">public archive</a>] in accordance with
<a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#sec-Disclosure">section 6 of the W3C Patent Policy</a>.</p>
|
|
|
|
</div>
|
|
<hr />
|
|
|
|
<div class="toc">
|
|
<h2 id="TOC">Table of Contents</h2>
|
|
<ul class="toc">
|
|
<li><a href="#process">Conversion process</a>
|
|
<ul>
|
|
<li><a href="#sources">Original data sources</a></li>
|
|
<li><a href="#first">Initial RDF and OWL conversions</a>
|
|
<ul>
|
|
<li><a href="#Motivation">Motivation</a></li>
|
|
<li><a href="#Process">Process</a></li>
|
|
<li><a href="#Outcome">Outcome </a></li>
|
|
</ul>
|
|
</li>
|
|
<li><a href="#revised">Revised OWL conversions</a>
|
|
<ul>
|
|
<li><a href="#Motivation1">Motivation</a></li>
|
|
<li><a href="#Process1">Process</a></li>
|
|
<li><a href="#Outcome1">Outcome</a></li>
|
|
</ul>
|
|
</li>
|
|
</ul>
|
|
</li>
|
|
<li><a href="#advantages">Advantages</a></li>
|
|
<li><a href="#disadvantages">Disadvantages</a></li>
|
|
<li><a href="#future">Future directions and plans</a></li>
|
|
<li><a href="#suggestions">Suggestions based on our experiences</a></li>
|
|
<li><a href="#conclusion">Conclusion</a></li>
|
|
<li><a href="#references">References</a></li>
|
|
<li><a href="#Acknowledg">Acknowledgements (Informative)</a></li>
|
|
</ul>
|
|
</div>
|
|
|
|
<hr />
|
|
|
|
<h2 id="process">Conversion process</h2>
|
|
|
|
<h3 id="sources">Original data sources</h3>
|
|
|
|
<p>The SenseLab databases can be accessed through a web interface at the
|
|
SenseLab web site [<a href="#ref-SENSELAB-WEB">SENSELAB-WEB</a>]. SenseLab is
|
|
divided into a number of specialised databases, of which we have converted
|
|
three to Semantic Web formats. These databases are NeuronDB, BrainPharm and
|
|
ModelDB. All databases are based on compartmental models of neurons. NeuronDB
|
|
contains descriptions of anatomic locations, cell architecture and
|
|
physiologic parameters of neuronal cells. The pilot BrainPharm database is
|
|
intended to support research on drugs for the treatment of neurological
|
|
disorders. It enhances the descriptions in a portion of NeuronDB with
|
|
descriptions of the actions of pathological and pharmacological agents.
|
|
ModelDB is a large repository of computational neuroscience models and
|
|
simulations. The mathematical models in ModelDB are annotated with references
|
|
to NeuronDB. Taken together, these databases allow the researcher to query
|
|
information and to run simulations pertaining to the function of neurons in
|
|
healthy and disease states. All databases contain extensive literature
|
|
references and excerpts from texts that have been used to curate the database
|
|
entries.</p>
|
|
|
|
<p>The databases are based on the "entity-attribute-value with classes and
|
|
relationships" (EAV/CR) schema [<a href="#ref-EAV-CR">EAV-CR</a>]. The data
|
|
can also be downloaded from the SenseLab Semantic Web development portal [<a
|
|
href="#ref-SENSELAB-SW">SENSELAB-SW</a>] as a database dump in Microsoft
|
|
Access format and as text.</p>
|
|
|
|
<h3 id="first">Initial RDF and OWL conversions</h3>
|
|
|
|
<h4 id="Motivation">Motivation</h4>
|
|
|
|
<p>Our motivation was to make the SenseLab databases available in RDF(S) [<a
|
|
href="#ref-RDF">RDFS</a>] (without OWL) and in OWL DL [<a
|
|
href="#ref-OWL-Overview">OWL Overview</a>]. The two versions were developed
|
|
in parallel in order to compare the conversion
processes and their outcomes. We wanted to explore the issues in mapping
|
|
relational databases to RDF/OWL structure. In addition, we wanted to explore
|
|
the possibility of automatic translation from EAV/CR to RDF.</p>
|
|
|
|
<h4 id="Process">Process</h4>
|
|
|
|
<p>We developed a converter application in Java that queried the SenseLab
|
|
database and wrote RDF/XML files. The conversion was fully automatic for the
|
|
RDF version, but required some manual editing for the OWL version.</p>
|
|
|
|
<h4 id="Outcome">Outcome </h4>
|
|
|
|
<p>These conversions were too closely tied to the original database structure, which
|
|
resulted in inconsistent OWL ontologies. Some shortcomings of the first
|
|
conversion to OWL were: </p>
|
|
<ul>
|
|
<li>'Part of' relations were incorrectly represented as subclass relations.
|
|
This seems to be one of the most common mistakes in ontology development
|
|
in general. </li>
|
|
<li>Class disjoints<a id="disjoint-ref" href="#disjoint">¹</a> were
missing, which made it hard to find inconsistencies and data entry errors.
|
|
</li>
|
|
<li>After disjoints were introduced, we found some previously unidentified
|
|
inconsistencies with the help of OWL reasoners: some classes (e.g.
|
|
'GABA') were subclasses of both 'neurotransmitter' and 'receptor', which
|
|
was wrong. This was an artifact caused by the automated conversion --
|
|
both GABA transmitters and GABA receptors were simply labeled with 'GABA'
|
|
in the source database. The conversion algorithm generated URIs based on
|
|
these labels, so they were represented with identical URIs
|
|
(<code>http://neuroweb.med.yale.edu/senselab/neuron_ontology.owl#<strong>GABA</strong></code>).
|
|
This grave mistake would not have been noticed without the use of OWL
|
|
reasoning. </li>
|
|
<li>Some of the labels of entities generated by the Java converter were
|
|
very terse and not understandable outside the user interface of the
|
|
original database. For example, "Ded" was the label of the "distal part
|
|
of the dendrite". </li>
|
|
</ul>
|
|
|
|
<p id="disjoint"><a href="#disjoint-ref">¹</a> Declaring classes disjoint in
OWL asserts that they have no members in common. A reasoner can use these
assertions to flag inconsistent models.</p>
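<p>The URI clash described above, and the way a disjointness assertion exposes
it, can be illustrated with a minimal Python sketch. The example rows, the
<code>qualified_uri</code> naming scheme and all function names are
assumptions made for illustration. They are not taken from the original
converter, which was written in Java, and the hand-written check only stands
in for what an OWL reasoner does far more generally.</p>

```python
from collections import defaultdict

NAMESPACE = "http://neuroweb.med.yale.edu/senselab/neuron_ontology.owl#"

# (label, superclass) pairs as they might come out of the source database.
# Both GABA rows carry the same bare label, as described in the text.
ROWS = [
    ("GABA", "neurotransmitter"),
    ("GABA", "receptor"),
    ("Dopamine", "neurotransmitter"),
]

# Pairs of classes asserted to have no members in common.
DISJOINT = [("neurotransmitter", "receptor")]


def naive_uri(label, superclass):
    """Mint a URI from the label alone (the approach that caused the clash)."""
    return NAMESPACE + label.replace(" ", "_")


def qualified_uri(label, superclass):
    """Hypothetical fix: make the kind of entity part of the local name."""
    return NAMESPACE + (label + " " + superclass).replace(" ", "_")


def disjointness_violations(rows, mint):
    """Return URIs that end up as subclasses of two disjoint classes."""
    superclasses = defaultdict(set)
    for label, superclass in rows:
        superclasses[mint(label, superclass)].add(superclass)
    return sorted(
        uri
        for uri, supers in superclasses.items()
        for a, b in DISJOINT
        if a in supers and b in supers
    )
```

<p>Calling <code>disjointness_violations(ROWS, naive_uri)</code> reports the
shared GABA URI, while the same call with <code>qualified_uri</code> reports
no violation because the two entities no longer share a URI.</p>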
|
|
|
|
<h3 id="revised">Revised OWL conversions</h3>
|
|
|
|
<p>The revised OWL conversion was based on the first OWL conversions
|
|
described above. The design of the revised SenseLab ontologies follows the
|
|
"ontological realism" approach [<a href="#ref-SMITH-2004">SMITH-2004</a>].
|
|
This means that the revised ontologies are focused on direct representations
|
|
of physical objects and processes (e.g., neuronal cells, ionic currents), and
|
|
not on their abstractions (e.g., concepts or database entries). </p>
|
|
|
|
<h4 id="Motivation1">Motivation</h4>
|
|
|
|
<p>Our motivation was to manually correct the logical inconsistencies in the
first version of the OWL ontology, to make use of foundational ontologies
(BFO, Relation Ontology) where possible, and to map the ontology to other
neuroscience ontologies. </p>
|
|
|
|
<h4 id="Process1">Process</h4>
|
|
|
|
<p>An ontology containing basic class hierarchies and relations was manually
|
|
created, based on the structure of existing SenseLab databases. This basic
|
|
ontology could not be created from the database structure in an automated
|
|
process because this would not have resulted in a logically consistent
|
|
ontology. This ontology was edited by a domain expert, based on inspection
|
|
and manual editing with Protege 3.2 [<a href="#ref-PROTEGE">PROTEGE</a>] and
|
|
Topbraid Composer [<a href="#ref-TOPBRAID">TOPBRAID</a>]. The ontologies were
|
|
built upon established foundational ontologies in order to maximize the
|
|
interoperability with other existing and forthcoming biomedical Semantic Web
|
|
resources. The foundational ontologies used were:</p>
|
|
<ul>
|
|
<li>the Relation Ontology [<a href="#ref-RO">RO</a>] from the Open
|
|
Biomedical Ontologies repository [<a href="#ref-OBO">OBO</a>], which
|
|
defines basic relations such as 'part of', 'participant of' or 'contained
|
|
in'. </li>
|
|
<li>the Basic Formal Ontology [<a href="#ref-BFO">BFO</a>], which defines
|
|
basic classes such as 'process', 'object', 'quality' or 'function'. </li>
|
|
</ul>
|
|
|
|
<p>Based on this manually created basic ontology, the data from the SenseLab
|
|
databases were then automatically converted to OWL using programs written in
|
|
Java and Python. The automated export scripts extended the manually created
|
|
basic ontology through the creation of subclasses, OWL property restrictions
|
|
and individuals. The resulting ontologies show no clearly distinguishable
|
|
divide between the 'schema' and 'data'. </p>
|
|
|
|
<p>The OWL export of NeuronDB was based on a transformation from the EAV/CR
|
|
model of the SenseLab database to files in RDF/XML syntax by a Java program.
|
|
The export from ModelDB and BrainPharm was based on a simple flat text file
|
|
export of the databases. The text file exports were converted to RDF/XML
|
|
files with a Python script. </p>
|
|
|
|
<p>For mappings to external bioinformatics databases that did not yet offer
|
|
stable URIs for reference on the Semantic Web, we used the URI scheme for
|
|
database record identifiers established by Science Commons [<a
|
|
href="#ref-SC-URI">SC-URI</a>]. URIs for database records could simply be
|
|
generated by concatenating the record identifier to a predefined namespace.
|
|
For example, the Entrez Gene record with ID '3579' was identified by the URI
|
|
<a
|
|
href="http://purl.org/commons/record/ncbi_gene/3579"><code>http://purl.org/commons/record/ncbi_gene/<strong>3579</strong></code></a>,
|
|
the Uniprot record 'P46663' was identified by <a
|
|
href="http://purl.org/commons/record/uniprotkb/P46663"><code>http://purl.org/commons/record/uniprotkb/<strong>P46663</strong></code></a>
|
|
and the Pubmed record with ID '11160518' was identified by <a
|
|
href="http://purl.org/commons/record/pmid/11160518"><code>http://purl.org/commons/record/pmid/<strong>11160518</strong></code></a>.
|
|
The database entries were connected to the ontological representations of
|
|
real-world entities through relations such as
|
|
<code>has_nucleotide_sequence_described_by</code>. For example, the gene of
|
|
the Dopamine Receptor D1 (DRD1) is defined through a reference to NCBI record
|
|
1812, which contains a description of the sequence of this specific gene: </p>
|
|
|
|
<p><tt>&lt;http://purl.org/ycmi/senselab/neuron_ontology.owl#DRD1_Gene&gt;
owl:equivalentClass _:property_restriction1 .</tt><br />
<tt>_:property_restriction1 owl:onProperty
senselab:has_nucleotide_sequence_described_by .</tt><br />
<tt>_:property_restriction1 owl:hasValue
&lt;http://purl.org/commons/record/ncbi_gene/1812&gt; .</tt></p>
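<p>The concatenation scheme and the three statements above can be sketched in
Python as follows. The namespace and the database keys
(<code>ncbi_gene</code>, <code>uniprotkb</code>, <code>pmid</code>) come from
the URIs quoted in this section. The function names, the default blank node
label and the representation of triples as tuples of strings are assumptions
made for illustration and do not describe the actual export scripts.</p>

```python
RECORD_NAMESPACE = "http://purl.org/commons/record/"


def record_uri(database, record_id):
    """Build a record URI by appending the database key and record ID."""
    return RECORD_NAMESPACE + database + "/" + str(record_id)


def has_value_restriction(class_uri, property_name, value_uri,
                          bnode="_:property_restriction1"):
    """Return the three triples that tie a class to a database record.

    Terms are plain strings: full URIs for the class and the record,
    prefixed names for OWL vocabulary and the SenseLab property.
    """
    return [
        (class_uri, "owl:equivalentClass", bnode),
        (bnode, "owl:onProperty", property_name),
        (bnode, "owl:hasValue", value_uri),
    ]


drd1_triples = has_value_restriction(
    "http://purl.org/ycmi/senselab/neuron_ontology.owl#DRD1_Gene",
    "senselab:has_nucleotide_sequence_described_by",
    record_uri("ncbi_gene", 1812),
)
```

<p>Keeping URI construction in a single function means that a change of
namespace only has to be made in one place.</p>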
|
|
|
|
<p>Mappings were made to the following ontologies: </p>
|
|
<ul>
|
|
<li>the BAMS ontology which was derived from the Brain Architecture
|
|
Management System [<a href="#ref-BAMS">BAMS</a>]</li>
|
|
<li>the Subcellular Anatomy Ontology (SAO) created by the Cell Centered
|
|
Database project. [<a href="#ref-SAO">SAO</a>]</li>
|
|
<li>the BirnLex ontology developed by members of the Biomedical Informatics
|
|
Research Network [<a href="#ref-BIRNLEX">BIRNLEX</a>]</li>
|
|
<li>the Common Anatomy Reference Ontology (CARO)<!-- [<a
|
|
href="#ref-CARO">CARO</a>] --></li>
|
|
<li>the Gene Ontology [<a href="#ref-GO">GO</a>]</li>
|
|
<li>the Ontology of Biomedical Investigation (OBI) [<a
|
|
href="#ref-OBI">OBI</a>]</li>
|
|
</ul>
|
|
|
|
<p>The mappings were made with the following cross-ontology relations: <a
|
|
href="http://www.w3.org/TR/owl-ref/#equivalentClass-def">owl:equivalentClass</a>,
|
|
<a href="http://www.w3.org/TR/rdf-schema/#ch_subclassof">rdfs:subClassOf</a>
|
|
and the <a href="http://www.obofoundry.org/ro/#details">"has part" relation
|
|
from the OBO relation ontology</a>. </p>
|
|
|
|
<p><img alt="Ontology import hierarchy" src="ontology_import_hierarchy.png"
|
|
/></p>
|
|
|
|
<p>Figure 1: Import hierarchy of OWL ontologies. Ontologies printed in bold
|
|
have been created by the SenseLab team; other ontologies have been created by
|
|
other groups. The arrows point from the imported ontology to the importing
|
|
ontology, e.g., the NeuronDB Ontology imports the Relation Ontology. Import
|
|
statements are transitive, e.g., the ModelDB Ontology imports both the
|
|
NeuronDB ontology and the Relation ontology.</p>
|
|
|
|
<p></p>
|
|
|
|
<p><img alt="Examples of ontology mappings" src="examples-of-mappings.png"
|
|
/></p>
|
|
|
|
<p>Figure 2: Examples of relations ('mappings') spanning between classes from
|
|
the NeuronDB ontology (in the middle) and classes from external
|
|
ontologies.</p>
|
|
|
|
<p></p>
|
|
|
|
<p>Terse rdfs:labels were replaced by more descriptive ones that could be
|
|
better understood without knowledge about context. For example, the
|
|
rdfs:label "Ded" was changed to "Distal part of equivalent dendrite (Ded)".
|
|
Note that, in this case, the original label was also preserved (in brackets),
|
|
because it might still be useful for people that <em>do</em> know about the
|
|
context. </p>
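<p>The relabeling rule can be written down in a few lines of Python. Only the
"Ded" entry is taken from this document. The lookup table and the function
name are hypothetical, and a real table would have to be curated by a domain
expert.</p>

```python
# Terse database label -> descriptive label. Only the 'Ded' entry comes
# from the text; a real table would be curated by a domain expert.
DESCRIPTIVE_LABELS = {
    "Ded": "Distal part of equivalent dendrite",
}


def expand_label(terse):
    """Return a descriptive label that keeps the terse one in brackets."""
    descriptive = DESCRIPTIVE_LABELS.get(terse)
    if descriptive is None:
        # No curated replacement is known, so leave the label unchanged.
        return terse
    return "{} ({})".format(descriptive, terse)
```

<p>Labels without a curated replacement are passed through unchanged, so the
function can be applied to every label during export.</p>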
|
|
|
|
<p>The ontology development was moved to a Subversion (SVN) system on a
central web server. For most of the development period before this move, the
ontologies were edited on the client side and periodically uploaded via FTP.
This led to conflicts when more than one person was working on the
ontologies at a time. Users of the ontology also could not access previous
versions, since only the most recent version was available on the web
site. </p>
|
|
|
|
<p>The namespaces / ontology locations were changed to PURL-based URIs. For
|
|
example, the URI
|
|
<code>http://<strong>neuroweb.med.yale.edu</strong>/senselab/neuron_ontology.owl#Dopamine</code>
|
|
was changed to <a
|
|
href="http://purl.org/ycmi/senselab/neuron_ontology.owl#Dopamine"><code>http://<strong>purl.org/ycmi</strong>/senselab/neuron_ontology.owl#Dopamine</code></a>
|
|
('ycmi' stands for 'Yale Center for Medical Informatics'). PURL-based URIs
|
|
are easier to maintain when server configurations change or (in the worst
|
|
case) the original server is unavailable and the ontologies need to be served
|
|
from a different location. The increased stability of PURLs encourages the
|
|
re-use of entities in ontologies developed by other groups -- which is a key
|
|
factor in the creation of a coherent Semantic Web. </p>
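<p>A namespace change of this kind amounts to a prefix rewrite, sketched below
in Python. The two base URIs are the ones quoted above. The function name is
hypothetical, and the sketch assumes that URIs are available as plain
strings.</p>

```python
OLD_BASE = "http://neuroweb.med.yale.edu/senselab/"
NEW_BASE = "http://purl.org/ycmi/senselab/"


def to_purl(uri):
    """Rewrite a URI in the old namespace to the PURL-based namespace."""
    if uri.startswith(OLD_BASE):
        return NEW_BASE + uri[len(OLD_BASE):]
    # URIs from other namespaces are left untouched.
    return uri
```

<p>Matching on the prefix only, rather than replacing the host name anywhere
in the string, avoids rewriting URIs that merely mention the old server in a
fragment or path.</p>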
|
|
|
|
<p>A SPARQL endpoint for the SenseLab ontologies was set up using the open
|
|
source version of the Openlink Virtuoso server [<a
|
|
href="#ref-VIRTUOSO">VIRTUOSO</a>]. A SPARQL endpoint is a service that
|
|
allows clients to query an RDF store with the SPARQL query language through
simple HTTP GET requests. The ontologies were loaded into the triple store of
the server to make them accessible to SPARQL queries. Each ontology file was
put into a separate labeled graph; the label of each graph was identical to
|
|
the URL of the ontology file. For example, the ontology located at <a
|
|
href="http://purl.org/ycmi/senselab/neuron_ontology.owl">http://purl.org/ycmi/senselab/neuron_ontology.owl</a>
|
|
was loaded into a graph labeled
|
|
<code>http://purl.org/ycmi/senselab/neuron_ontology.owl</code>. Loading each
|
|
ontology into a separate graph makes it possible to restrict SPARQL queries
|
|
to certain graphs and hence, certain ontologies. This has the advantage that
|
|
queries can be more selective and can be executed with better performance.</p>
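<p>The following Python sketch shows how a client might build such a GET
request and restrict it to a single graph. It uses the <code>query</code> and
<code>default-graph-uri</code> parameters defined by the SPARQL protocol. The
function name and the example query are assumptions, and the code only
constructs the request URL, since the endpoint may no longer be reachable at
the address given in this document.</p>

```python
from urllib.parse import urlencode

ENDPOINT = "http://hcls.deri.ie/sparql"
NEURON_GRAPH = "http://purl.org/ycmi/senselab/neuron_ontology.owl"

# A deliberately simple query that lists a few triples.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def sparql_get_url(endpoint, query, graph=None):
    """Return the URL of an HTTP GET request for a SPARQL query.

    If a graph URI is given, it is passed as the default graph so that
    the query is evaluated against that ontology only.
    """
    params = [("query", query)]
    if graph is not None:
        params.append(("default-graph-uri", graph))
    return endpoint + "?" + urlencode(params)


url = sparql_get_url(ENDPOINT, QUERY, NEURON_GRAPH)
```

<p>The resulting URL can be fetched with any HTTP client. Omitting the
<code>graph</code> argument leaves the choice of default graph to the
server.</p>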
|
|
|
|
<h4 id="Outcome1">Outcome</h4>
|
|
|
|
<p>The final products of the project are accessible at <a
|
|
href="http://neuroweb.med.yale.edu/senselab/">http://neuroweb.med.yale.edu/senselab/</a>.
|
|
A SVN repository can be accessed through a web interface at <a
|
|
href="http://neuroweb.med.yale.edu/svn/trunk/ontology/senselab/">http://neuroweb.med.yale.edu/svn/trunk/ontology/senselab/</a>.
|
|
The SPARQL endpoint can be accessed at <a
|
|
href="http://hcls.deri.ie/sparql">http://hcls.deri.ie/sparql</a>. The
|
|
SenseLab OWL ontologies are mentioned as an example of the application of OBO
|
|
ontologies in the article <em>The OBO Foundry: coordinated evolution of
|
|
ontologies to support biomedical data integration</em> [<a
|
|
href="#ref-OBO-ARTICLE">OBO-ARTICLE</a>]. </p>
|
|
|
|
<h2 id="advantages">Advantages</h2>
|
|
|
|
<p>We experienced the following advantages from using RDF/OWL:</p>
|
|
<ul>
|
|
<li>The use of OWL significantly eased the integration of SenseLab data
|
|
with ontologies developed by other projects. OWL-based data integration
|
|
does not require the development and maintenance of central mediators,
|
|
reducing development and maintenance costs. The ontology integration can
|
|
be accomplished by creating meaningful relations between entities in
|
|
distributed ontologies.<br />
|
|
</li>
|
|
<li>Ontologies can be modularized; dependencies between ontologies can be
|
|
made explicit through 'owl:imports' statements. This makes distributed
|
|
development of ontology modules feasible and encourages the re-use of
|
|
selected ontology modules by other groups.<br />
|
|
</li>
|
|
<li>Good OWL ontologies are self-descriptive because every entity can be
|
|
annotated with text.</li>
|
|
<li>Reasoners can be used to identify errors and real (i.e., conscious)
|
|
contradictions in submitted data sets. You might find more errors and
|
|
contradictions than you expected.</li>
|
|
<li>Ontologies can be used to directly represent biological reality without
|
|
introducing unnecessary abstractions such as database tables, data
|
|
dictionaries, and documents.</li>
|
|
</ul>
|
|
|
|
<h2 id="disadvantages">Disadvantages</h2>
|
|
|
|
<p>We experienced the following problems while using RDF/OWL:</p>
|
|
<ul>
|
|
<li>The open-source ontology editors used for this project (conducted in
|
|
2007) were relatively unreliable. Much time was spent working around
software bugs that made the editors unstable and introduced errors
|
|
in the generated RDF/OWL. Future versions of freely available editors or
|
|
currently available commercial ontology editors might be preferable. </li>
|
|
<li>Descriptions of OWL classes and their relations (i.e., OWL property
|
|
restrictions) result in very complex and unintuitive RDF graphs. This
|
|
makes it hard to generate them automatically, or use SPARQL to query such
|
|
ontologies. </li>
|
|
<li>Current reasoners can still have performance problems when checking /
|
|
classifying complex OWL ontologies. </li>
|
|
<li>The RDF/XML serialisation of RDF is not very easy to work with. It is
|
|
often a source of errors. </li>
|
|
</ul>
|
|
|
|
<h2 id="future">Future directions and plans</h2>
|
|
|
|
<p>The SenseLab ontologies will be further integrated with other
|
|
neuroscientific and biomedical ontologies. User friendly applications will be
|
|
developed to query a multitude of interrelated ontologies in a scientifically
|
|
meaningful way. To this end, we have implemented a prototype Web application
|
|
called 'Entrez Neuron' that allows the user to query data across multiple
|
|
sources based on key words. The user can browse the query results and
|
|
retrieve more detailed information about neurons based on a
|
|
'brain-anatomy/neuron' view. A paper describing this application was
|
|
published in the <a href="http://esw.w3.org/topic/HCLS/WWW2008">WWW/HCLS2008
|
|
workshop</a>. Currently, we are expanding this application to include more
|
|
views and features.</p>
|
|
|
|
<h2 id="suggestions">Suggestions based on our experiences</h2>
|
|
|
|
<p>Based on our experiences we can make the following suggestions for other
|
|
projects that have similar goals:</p>
|
|
<ul>
|
|
<li>Try to create consistent OWL DL ontologies. Pure RDF(S) without OWL
|
|
constructs is not much simpler than OWL DL and often leads to the
|
|
creation of too many properties because pure RDF(S) does not support
|
|
property restrictions. </li>
|
|
<li>Try to re-use entities and properties from existing ontologies where
|
|
possible. </li>
|
|
<li>If you do not want to import another ontology in its entirety (e.g.
|
|
because it would be too large, too buggy or would introduce unnecessary
|
|
constructs), you can still 'copy &amp; paste' portions of the ontology
|
|
into your own. </li>
|
|
<li>Try to base your ontology on a foundational ontology like BFO, OBO
|
|
Relation Ontology or DOLCE [<a href="#ref-DOL">DOL</a>]. </li>
|
|
<li>Where possible use the rdfs:label property to give clear,
|
|
understandable labels to each entity and property in the ontology. Try to
|
|
formulate labels in a way that makes them understandable without too much
|
|
additional context (e.g. a certain user interface). </li>
|
|
<li>Where possible, give concise rdfs:comments. </li>
|
|
<li>Make a habit out of running your ontology through the RDF validator [<a
|
|
href="#ref-RDF-VALID">RDF-VALID</a>] periodically, especially when you
|
|
create RDF/XML with scripts that you wrote yourself. Keep in mind that
|
|
the RDF validator does not throw an error message when URIs contain blank
|
|
spaces. Blank spaces in URIs are problematic for many Semantic Web
|
|
applications, so try to make sure that your URIs do not contain blank
|
|
spaces. </li>
|
|
<li>Check the consistency of your OWL ontology periodically. We used the
|
|
Pellet reasoner [<a href="#ref-PELLET">PELLET</a>], which seems to be the
|
|
best choice at the moment. </li>
|
|
<li>Use purl.org URIs for your ontologies. You can easily register a
|
|
sub-domain at purl.org free of charge. </li>
|
|
<li>If you write a program that generates RDF/OWL, do <strong>not</strong>
|
|
try to write RDF/XML code directly. RDF/XML is relatively complicated and
messy, which makes it very easy to produce syntactic or even semantic
errors. Instead, use an RDF
|
|
or OWL API for writing triples. If that is not possible, generate your
|
|
RDF in the much simpler TURTLE syntax instead of RDF/XML. The TURTLE
|
|
syntax is a subset of the N3 syntax [<a href="#ref-n3">N3</a>]. You can
|
|
save the resulting RDF in TURTLE format to a text file. If you need
|
|
RDF/XML for another application, you can convert the TURTLE to RDF/XML in
|
|
a second step. </li>
|
|
</ul>
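<p>Because the RDF validator does not report blank spaces in URIs, a small
check in the export script can catch them before the data is published. The
Python sketch below is one possible way to do this. Both function names are
hypothetical, and replacing whitespace with underscores is an assumed
convention, not a rule prescribed by this document.</p>

```python
import re

_WHITESPACE = re.compile(r"\s")


def uris_with_whitespace(uris):
    """Return the URIs that contain any whitespace character, in input order."""
    return [uri for uri in uris if _WHITESPACE.search(uri)]


def local_name_from_label(label):
    """Hypothetical helper: turn a label into a whitespace-free local name."""
    return re.sub(r"\s+", "_", label.strip())
```

<p>Running <code>uris_with_whitespace</code> over all generated URIs at the
end of an export gives a short list of entries to fix, which is easier to
review than parser errors raised later by downstream tools.</p>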
|
|
|
|
<h2 id="conclusion">Conclusion</h2>
|
|
|
|
<p>We experienced clear benefits from using Semantic Web technologies for the
|
|
integration of SenseLab data with other neuroscientific data in a consistent,
|
|
flexible and decentralised manner. The main obstacle in our work was the lack
|
|
of mature and scalable open source software for editing the complex,
|
|
expressive ontologies we were dealing with. Since the quality of these tools
|
|
is rapidly improving, this may cease to be an issue in the near future. The
|
|
detailed analysis of the experiences with the SenseLab ontologies and other
|
|
complex biomedical ontologies may help drive the improvement of current
|
|
ontology editors.</p>
|
|
|
|
<h2 id="references">References</h2>
|
|
<dl>
|
|
<dt><a name="ref-EAV-CR" id="ref-EAV-CR"></a>[EAV-CR]</dt>
|
|
<dd><i>L. Marenco, N. Tosches, C. Crasto, G. Shepherd, P.L. Miller and
|
|
P.M. Nadkarni, Achieving evolvable Web-database bioscience applications
|
|
using the EAV/CR framework: recent advances, J Am Med Inform Assoc.
|
|
(2003) 10(5):444-53</i> </dd>
|
|
<dt><a name="ref-SENSELAB-WEB" id="ref-SENSELAB-WEB"></a>[SENSELAB-WEB]</dt>
|
|
<dd><i><a href="http://senselab.med.yale.edu/">SenseLab database</a></i>,
|
|
http://senselab.med.yale.edu/</dd>
|
|
<dt><a name="ref-SENSELAB-SW" id="ref-SENSELAB-SW"></a>[SENSELAB-SW]</dt>
<dd><i><a href="http://neuroweb.med.yale.edu/senselab/">SenseLab Semantic
Web Development</a></i>, http://neuroweb.med.yale.edu/senselab/</dd>
<dt><a name="ref-PROTEGE" id="ref-PROTEGE"></a>[PROTEGE]</dt>
<dd><i><a href="http://protege.stanford.edu/">The Protege Ontology Editor
and Knowledge Acquisition System</a></i>, http://protege.stanford.edu/</dd>
<dt><a name="ref-TOPBRAID" id="ref-TOPBRAID"></a>[TOPBRAID]</dt>
<dd><i><a href="http://www.topbraidcomposer.org/">TopBraid Composer</a></i>,
http://www.topbraidcomposer.org/</dd>
<dt><a name="ref-RO" id="ref-RO"></a>[RO]</dt>
<dd><i><a href="http://www.obofoundry.org/ro/">Relation Ontology</a></i>,
http://www.obofoundry.org/ro/</dd>
<dt><a name="ref-OBO" id="ref-OBO"></a>[OBO]</dt>
<dd><i><a href="http://obofoundry.org">The Open Biomedical
Ontologies</a></i>, http://obofoundry.org/</dd>
<dt><a name="ref-BFO" id="ref-BFO"></a>[BFO]</dt>
<dd><i><a href="http://www.ifomis.uni-saarland.de/bfo/">Basic Formal
Ontology (BFO)</a></i>, http://www.ifomis.uni-saarland.de/bfo/</dd>
<dt><a name="ref-SC-URI" id="ref-SC-URI"></a>[SC-URI]</dt>
<dd><i><a
href="http://sw.neurocommons.org/2007/uri-explanation.html">Explanation
of HCLS and Science Commons URIs</a></i>,
http://sw.neurocommons.org/2007/uri-explanation.html</dd>
<dt><a name="ref-BAMS" id="ref-BAMS"></a>[BAMS]</dt>
<dd><i><a href="http://brancusi.usc.edu/bkms/">The Brain Architecture
Management System</a></i>, http://brancusi.usc.edu/bkms/</dd>
<dt><a name="ref-SAO" id="ref-SAO"></a>[SAO]</dt>
<dd><i><a href="http://ccdb.ucsd.edu/CCDBWebSite/sao.html">CCDB
Subcellular Anatomy Ontology</a></i>,
http://ccdb.ucsd.edu/CCDBWebSite/sao.html</dd>
<dt>[CARO]</dt>
<dd><i><a
href="http://www.obofoundry.org/cgi-bin/detail.cgi?id=caro">Common
Anatomy Reference Ontology</a></i>,
http://www.obofoundry.org/cgi-bin/detail.cgi?id=caro</dd>
<dt><a name="ref-BIRNLEX" id="ref-BIRNLEX"></a>[BIRNLEX]</dt>
<dd><i><a href="http://fireball.drexelmed.edu/birnlex/OWLdocs/">BIRNLex Ontology Documentation</a></i>,
http://fireball.drexelmed.edu/birnlex/OWLdocs/</dd>
<dt><a name="ref-GO" id="ref-GO"></a>[GO]</dt>
<dd><i><a href="http://geneontology.org/">Gene Ontology</a></i>,
http://geneontology.org/</dd>
<dt><a name="ref-OBI" id="ref-OBI"></a>[OBI]</dt>
<dd><i><a href="http://obi.sourceforge.net/">Ontology of Biomedical
Investigation</a></i>, http://obi.sourceforge.net/</dd>
<dt><a name="ref-VIRTUOSO" id="ref-VIRTUOSO"></a>[VIRTUOSO]</dt>
<dd><i><a href="http://virtuoso.openlinksw.com/">OpenLink Universal
Integration Middleware - Virtuoso Product Family</a></i>,
http://virtuoso.openlinksw.com/</dd>
<dt><a name="ref-OBO-ARTICLE" id="ref-OBO-ARTICLE"></a>[OBO-ARTICLE]</dt>
<dd><i>The OBO Foundry: coordinated evolution of ontologies to support
biomedical data integration</i>, Barry Smith, Michael Ashburner,
Cornelius Rosse, Jonathan Bard, William Bug, Werner Ceusters <em>et
al.</em>, Nature Biotechnology 25, 1251 - 1255, 2007,
http://dx.doi.org/10.1038/nbt1346</dd>
<dt><a name="ref-DOL" id="ref-DOL"></a>[DOL]</dt>
<dd><i><a href="http://www.loa-cnr.it/DOLCE.html">DOLCE Ontology</a></i>,
http://www.loa-cnr.it/DOLCE.html</dd>
<dt><a name="ref-RDF-VALID" id="ref-RDF-VALID"></a>[RDF-VALID]</dt>
<dd><i><a href="http://www.w3.org/RDF/Validator/">RDF Validator</a></i>,
http://www.w3.org/RDF/Validator/</dd>
<dt><a name="ref-PELLET" id="ref-PELLET"></a>[PELLET]</dt>
<dd><i><a href="http://pellet.owldl.org/">The PELLET Open Source OWL DL
Reasoner</a></i>, http://pellet.owldl.org/</dd>
<dt><a name="ref-SMITH-2004" id="ref-SMITH-2004"></a>[SMITH-2004]</dt>
<dd><em>Beyond Concepts: Ontology as Reality Representation</em>, Barry
Smith, in A. Varzi, L. Vieu, eds., Proceedings of FOIS (IOS Press,
Amsterdam, 2004) 319-330. <a
href="http://ontology.buffalo.edu/bfo/BeyondConcepts.pdf">http://ontology.buffalo.edu/bfo/BeyondConcepts.pdf</a></dd>
<dt><a name="ref-kb" id="ref-kb"></a>[KB]</dt>
<dd><i><a href="../NOTE-hcls-kb-20080604/">A Prototype Knowledge Base for the Life Sciences</a></i>,
http://www.w3.org/TR/2008/NOTE-hcls-kb-20080604/</dd>
<dt><a name="ref-n3" id="ref-n3"></a>[N3]</dt>
<dd><i><a href="http://www.w3.org/2000/10/swap/Primer">Primer: Getting
into RDF and Semantic Web using N3</a></i>,
http://www.w3.org/2000/10/swap/Primer</dd>
<dt><a name="ref-OWL-Overview" id="ref-OWL-Overview"></a>[OWL Overview]</dt>
<dd><i><a href="http://www.w3.org/TR/2004/REC-owl-features-20040210/">OWL
Web Ontology Language Overview</a></i>, Deborah L. McGuinness and Frank
van Harmelen, Editors, W3C Recommendation, 10 February 2004,
http://www.w3.org/TR/2004/REC-owl-features-20040210/ . <a
href="http://www.w3.org/TR/owl-features/">Latest version</a> available
at http://www.w3.org/TR/owl-features/</dd>
<dt><a id="ref-RDF" name="ref-RDF">[RDFS]</a></dt>
<dd><a href="http://www.w3.org/TR/2004/REC-rdf-schema-20040210/">RDF
Vocabulary Description Language 1.0: RDF Schema</a>, Dan Brickley and
R.V. Guha, Editors. W3C Recommendation, 10 February 2004,<br />
http://www.w3.org/TR/2004/REC-rdf-schema-20040210/ .<br />
<a href="http://www.w3.org/TR/rdf-schema/">Latest version</a> available
at http://www.w3.org/TR/rdf-schema/.</dd>
</dl>
<h2 id="Acknowledg">Acknowledgements (Informative)</h2>
<p>Thanks to Huajun Chen and Ernest Lim, who contributed to the SenseLab
conversion. Thanks to Gordon Shepherd, Perry Miller, Luis Marenco and Tom
Morse for their input, suggestions and support. Thanks to Susie Stephens for
her detailed suggestions for improving this document. Thanks to Alan
Ruttenberg for his technical suggestions during the conversion process.
Thanks to Eric Prud'hommeaux for technical advice and assistance on the
creation of this document.</p>
</body>
</html>