Updated for Sesame release 1.2.6
Copyright © 2002-2006 Aduna B.V., Sirma AI Ltd.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in GNU Free Documentation License.
Table of Contents
List of Figures
List of Tables
When they were out of sight Ali Baba came down, and, going up to the rock, said, "Open, Sesame." The door at once opened, and Ali Baba, entering, found himself in a large cave, lighted from a hole in the top, and full of all kinds of treasure--rich silks and carpets, gold and silver ware, and great bags of money. He loaded his three asses with as many of the bags of gold as they could carry; and, after closing the door by saying, "Shut, Sesame," made his way home.
--Tales of 1001 Nights
In February 2000 the European IST project On-To-Knowledge kicked off. The goal of this project was to provide tools and a methodology for “content-driven knowledge management through evolving ontologies”.
In this project, the Dutch company Aduna (then known as Aidministrator Nederland b.v.) developed Sesame. Sesame fulfills the role of storage and retrieval middleware for ontologies and metadata expressed in RDF and RDF Schema. Another tool developed in On-To-Knowledge is OMM, the Ontology Middleware Module, which was developed by OntoText. OMM is an extension of Sesame that adds features such as change tracking and improved security.
Currently, Sesame and OMM are being further developed as an open source software product, by Aduna in cooperation with and partially funded by the NLNet Foundation, and by OntoText. The goal is to provide a stable, efficient and scalable middleware platform for storing, retrieving, manipulating and managing ontologies and metadata stored in RDF, RDF Schema and more expressive languages like OWL.
We aren't there yet, but it's looking good, we hope. This document is here to provide you, the Sesame user, with helpful information on how to deploy Sesame in various contexts, such as a database add-on in a client-server setting, or as a Java library to add functionality to stand-alone applications.
We hope this document will get you started, and of course we hope that you find Sesame easy to use and, well, good. Being an open source product in development also means that we are very keen on receiving feedback from our users. If you have questions, comments, if you think something is wrong with Sesame, or you have a good idea on how to improve it, please let us know. Contact us through the forums and/or issue tracker that are available on the Sesame website: www.openrdf.org.
We wish to conclude with a big thank you to all of you who have been (and indeed still are) supportive of this project, in particular Teus Hagen, Wytze van der Raay, Frank van Harmelen, Andy Seaborne, Peter Mika and Jacco van Ossenbruggen. Special thanks go to Holger Lausen for providing the Oracle implementation of the RDF Sail.
The Sesame and OMM development teams.
Sesame is an open source Java framework for storing, querying and reasoning with RDF and RDF Schema. It can be used as a database for RDF and RDF Schema, or as a Java library for applications that need to work with RDF internally. For example, suppose you need to read a big RDF file, find the relevant information for your application, and use that information. Sesame provides you with the necessary tools to parse, interpret, query and store all this information, embedded in your own application if you want, or, if you prefer, in a separate database or even on a remote server. More generally: Sesame provides application developers with a toolbox that contains useful hammers, screwdrivers etc. for doing 'Do-It-Yourself' with RDF.
In the next sections, we will take a closer look at Sesame.
The Sesame library consists of a set of Java archives:
These archives (which are located in the lib/ directory) contain Java classes ready for use in your own application. In Chapter 7, The Sesame API, you can find instructions and examples on how to use Sesame in your own code: how to do queries, how to add and remove data, etc.
Sesame can be used as a server with which client applications (or human users) can communicate over HTTP (see Figure 1.1, “Sesame Server”). Sesame can be deployed as a Java Servlet Application in Apache Tomcat, a web server that supports Java Servlets and JSP technology.
In Chapter 2, Installing Sesame, you will find detailed information on how to install Sesame as a server.
A central concept in the Sesame framework is the repository. A repository is a storage container for RDF. This can simply mean a Java object (or set of Java objects) in memory, or it can mean a relational database. Whichever storage method is chosen, however, it is important to realize that almost every operation in Sesame happens with respect to a repository: when you add RDF data, you add it to a repository, and when you do a query, you query a particular repository.
Sesame, as mentioned, supports RDF Schema inferencing. This means that given a set of RDF and/or RDF Schema, Sesame can find the implicit information in the data. Sesame supports this by simply adding all implicit information to the repository as well when data is being added.
It is important to realize that inferencing in Sesame is associated with the type of repository that you use. Sesame supports several different types of repositories (see Chapter 4, Advanced repository configuration for details). Some of these support inferencing, others do not. Whether you want Sesame to do inferencing for you is a choice that depends very much on your application.
In Figure 1.2, “The Sesame architecture” an overview of Sesame's overall architecture is given.
Starting at the bottom, the Storage And Inference Layer, or SAIL API, is an internal Sesame API that abstracts from the storage format used (i.e. whether the data is stored in an RDBMS, in memory, or in files, for example), and provides reasoning support. SAIL implementations can also be stacked on top of each other, to provide functionality such as caching or concurrent access handling. Each Sesame repository has its own SAIL object to represent it.
On top of the SAIL, we find Sesame's functional modules, such as the SeRQL, RQL and RDQL query engines, the admin module, and RDF export. Access to these functional modules is available through Sesame's Access APIs, consisting of two separate parts: the Repository API and the Graph API. The Repository API provides high-level access to Sesame repositories, such as querying, storing of RDF files, extracting RDF, etc. The Graph API provides more fine-grained support for RDF manipulation, such as adding and removing individual statements, and creation of small RDF models directly from code. The two APIs complement each other in functionality, and are in practice often used together.
The Access APIs provide direct access to Sesame's functional modules, either to a client program (for example, a desktop application that uses Sesame as a library), or to the next component of Sesame's architecture, the Sesame server. This is a component that provides HTTP-based access to Sesame's APIs. Then, on the remote HTTP client side, we again find the access APIs, which can again be used for communicating with Sesame, this time not as a library, but as a server running on a remote location.
While each part of the Sesame code is publicly available and extensible, most developers will be primarily interested in the Access APIs, for communicating with a Sesame RDF model or a Sesame repository from their application. In Chapter 7, The Sesame API, these APIs are described in more detail, through several code examples.
Sesame can be deployed in several ways. The two most common scenarios are deployment as a Java library and deployment as a server. In this chapter, both installation scenarios are explained.
To use Sesame as a library in a Java application, one needs the Sesame jar files. These are:
These files can be found in the lib directory of the binary download. Simply including them in your classpath will allow you to use the functionality of Sesame in your own Java application.
Sesame requires Java 2, version 1.4 or newer, to function properly.
If you intend to use client/server communication over HTTP, then you will additionally need the following third-party libraries:
These third-party libraries can be found in the ext/ directory of the source distribution of Sesame, and are also included in the library directory of the Sesame Web Archive (sesame.war). More information about these libraries, including license information, can be found in the file doc/thirdparty.txt.
For more information on how to use Sesame as a library, see Chapter 7, The Sesame API.
The Sesame server requires the following software:
Sesame should be able to run on any Java servlet container that supports the Servlet 2.2 and JSP 1.1 specifications, or newer. So far, it has been tested with Tomcat and, in the case of Oracle, with OC4J. It has also been reported to work without problems on BEA WebLogic 8.1 SP2 and Jetty. Please post any reports about compatibility with other servlet containers to our forums.
Sesame has several options for storage of RDF data: it can store data in main-memory (optionally with a dump to file for persistence), it can store data on disk in a dedicated file structure, or it can store data in a relational database. See section More on the RDBMS for more info about the last option.
The following steps describe the easiest procedure to install Sesame on Tomcat 4.x or 5.x. The described procedure doesn't require any reconfiguration of Tomcat itself, but it might not be the best option for you. Please see the documentation that came with your copy of Tomcat if you want more fine-grained control over where and how Sesame should be installed.
Installation of Sesame under Tomcat 3 is almost identical to the procedure described above. It requires one additional step:
Sesame has been tested with OC4J v9.0.3.0. The installation procedure differs from the standard installation procedure.
<web-module id="sesame" path="../../home/applications/sesame"/>
<web-app application="default" name="sesame" root="/sesame"/>
If you have followed the installation instructions described in the previous section, the Sesame server should now be up and running. Pointing a browser to the location where you have installed Sesame (e.g. http://[MACHINE_NAME]:8080/sesame/ if you have installed Sesame under Tomcat as described in this document) should now display the Sesame web interface.
You should also test whether the Sesame servlets are running correctly, and whether Sesame can talk to the RDBMS (if applicable). Select one of the repositories that you have configured and press 'Go>>'. Click on the 'Add (www)' link in the toolbar at the top of the window, and try to upload the test.rdf file from Sesame's admin directory (e.g. http://[MACHINE_NAME]:8080/sesame/admin/test.rdf). You can do this by typing this URL in the first text field and by clicking on the 'Add data' button after that.
If the data-upload was successful, you should also be able to extract the uploaded data. Click on the 'Extract' link in the toolbar and press the 'Extract' button. This should yield an RDF document describing all classes and properties in the repository.
Sesame has been successfully installed if all of this works. You can remove or reconfigure the test user account and repository if you want. If you haven't done so already, you can take a look at the next chapter for information on how to add and remove user accounts and repositories to/from Sesame.
Sesame has an RDBMS Sail that uses an RDBMS for storing RDF data. Currently, this Sail supports PostgreSQL, MySQL, Microsoft SQL Server and Oracle databases. The first two RDBMSs are open source and are freely available on the Internet. Please check the documentation delivered with these databases for any questions on how to get them installed.
Sesame's RDBMS Sail has been extensively tested in combination with MySQL and PostgreSQL. The RDBMS Sail used to perform much better in combination with MySQL than with PostgreSQL, but recent versions of PostgreSQL have shown major performance increases. These two RDBMSs now have comparable performance, with PostgreSQL having an edge over MySQL where it concerns the evaluation of complex queries. The SQL Server and Oracle support are third-party contributions and we have no comparative data on their performance.
Note: We do not have first-hand experience with using the RDBMS Sail on Oracle. The comments below are based on feedback that we have received from users.
Sesame's configuration is specified in the file [SESAME_DIR]/WEB-INF/system.conf. You can edit this configuration file locally on your Sesame server using the Configure Sesame! tool available in [SESAME_DIR]/WEB-INF/bin/. You can start the tool with the command configSesame.bat (on Windows) or configSesame.sh (on UNIX; note that you may have to give the file executable permissions first). Alternatively, you can download Configure Sesame! as a standalone tool from the Sesame project website, and install it on a remote client machine to configure your Sesame server over HTTP.
It is possible to edit the system configuration manually with an XML or text editor, but we do not recommend this.
When you have started Configure Sesame! (Figure 3.1, “Configure Sesame!”), load the file system.conf. If you want to configure a running Sesame server, you can do this by loading the file directly from the server (menu option [File]-->[Load from server...], see Figure 3.2, “Loading the configuration from a running server”). If the server is not running, you can open the file from disk (menu option [File]-->[Open file...]).
If this is a fresh installation of Sesame, the admin password is still the default password. In this case, you will need to enter "admin" as the Current admin password to load the configuration. If this is not a fresh installation of Sesame you will need to enter the admin password that you have specified earlier.
When you have modified the configuration, you can store it on disk (menu option [File]-->[Save file as...]), or you can send it directly to a running Sesame server (menu option [File]-->[Send to server...], see Figure 3.3, “Send a configuration to a running server”). Notice that if you use the "Send to Server" option, the changes to the configuration will be applied to the running server immediately. You do not need to restart or refresh the server separately.
If you have not changed the admin password before, it's probably a good idea to set one right now. To set the admin password, go to the server tab (Figure 3.4, “The Server tab”) and fill in an admin password.
To add a user account to Sesame, perform the following steps:
Open the "Users" tab (Figure 3.5, “The "Users" configuration tab”). You see an overview of the currently configured users.
Click the "Add user" icon at the bottom of the window. A new entry is added to the overview.
Enter the credentials for the new user, hitting Enter to go to the next column.
To remove a user account, simply select the user you wish to remove and click the "Remove user" button in the bottom right of the window.
You can configure existing repositories, add new ones and of course remove repository configuration using Configure Sesame!.
In this section, we introduce the basic use of Configure Sesame! In Chapter 4, Advanced repository configuration we go into the details of repository config parameters.
Open the "Repositories" tab (Figure 3.6, “The "Repository" tab”). Select the repository you wish to configure and click the "Repository details" button on the bottom left.
In the Repository details window (Figure 3.7, “The "Repository details" window”), you can edit several parameters of your repository. The top part of the window shows the Sail stack, the bottom part shows the parameters of the selected Sail in the stack. In most cases, you will only want to edit the parameters of the bottom Sail in the stack (for more information about the Sail stack and configuration parameters, see Chapter 4, Advanced repository configuration).
The Repository details window also allows you to change the access rights of users to a repository. To change the access rights, click the "Access rights" tab, where you can edit the rights of existing users or add new users (Figure 3.8, “The "access rights" tab”). The entry anonymous is a special entry that represents users who do not log in. Giving this entry access rights means that anybody can access the repository.
The easiest way to add a new repository is to "clone" an existing repository. Use the "Clone" button on the repository tab (Figure 3.6, “The "Repository" tab”) to do this. This creates a copy of the currently selected repository configuration, which you can then edit. Take special care to change details such as the jdbcUrl or file attributes of the Sail (see Chapter 4, Advanced repository configuration).
Alternatively, you can add a new repository by using the "Add" button and filling in all the information by yourself.
To remove a repository, simply open the repository tab (Figure 3.6, “The "Repository" tab”), select a repository and click the "Remove" button on the bottom right.
When setting up a repository in Sesame, you can make a number of choices: should the repository support versioning or security, or should it be as fast as possible? What database will it use, or will it be in-memory?
In this chapter, we look at several of these configuration options in more detail.
The setup for each Sesame repository is configured using Configure Sesame!. As we have already seen in Server administration, this configuration tool allows tweaking of numerous parameters, which we will discuss in more detail here.
In the repository tab (Figure 3.6, “The "Repository" tab”), the repository id and title are declared. The id is how the repository will be known by Sesame: all client access will need to use this identifier.
The title is for human convenience and can be used to give a short description of the repository's purpose. Clients such as the web interface use it to represent the repository to the end user.
The most important part of the repository configuration is the sail stack, which can be found in the "repository details" screen (Figure 3.7, “The "Repository details" window”). Here, you configure where the actual repository storage resides, whether or not inferencing, security and versioning, etc. should be used, and what additional options are needed.
The sail stack is represented top-to-bottom. In the example, we see two sail declarations: org.openrdf.sesame.sailimpl.sync.SyncRdfSchemaRepository and org.openrdf.sesame.sailimpl.rdbms.RdfSchemaRepository. The first sail is stacked on top of the second one (which means that it operates by calling methods on the Sail underneath it). The second sail is the base sail: it is the lowest of the stack and does not operate on another sail, but directly on the actual data source. In this example, the base sail is an RDF Schema-aware driver for a relational database that supports (currently) MySQL (3.23.47 and higher), PostgreSQL (7.0.2 and higher) and Oracle 9i.
The SyncRdfSchemaRepository is optional, but we strongly recommend using it. This Sail handles concurrent access issues. Without it, Sesame would behave unpredictably when several users access the repository simultaneously.
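The stacking idea can be illustrated with a small, self-contained sketch. The TripleStore interface and the MemoryStore and SyncStore classes below are hypothetical stand-ins invented for this illustration; they are not Sesame's Sail interfaces, and the code uses current Java syntax. The sketch only shows how an upper layer can add synchronization by delegating every call to the layer underneath it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a storage layer; NOT the real Sesame Sail interface.
interface TripleStore {
    void add(String subject, String predicate, String object);
    int size();
}

// Base layer: operates directly on the data source (here, a plain in-memory list).
class MemoryStore implements TripleStore {
    private final List<String[]> triples = new ArrayList<>();

    public void add(String subject, String predicate, String object) {
        triples.add(new String[] {subject, predicate, object});
    }

    public int size() {
        return triples.size();
    }
}

// Stacked layer: serializes access, then calls the layer underneath it.
class SyncStore implements TripleStore {
    private final TripleStore delegate;

    SyncStore(TripleStore delegate) {
        this.delegate = delegate;
    }

    public synchronized void add(String subject, String predicate, String object) {
        delegate.add(subject, predicate, object);
    }

    public synchronized int size() {
        return delegate.size();
    }
}

class SailStackSketch {
    // Adds triples from several threads through the synchronized layer
    // and returns the number of triples stored afterwards.
    static int concurrentAdds(int threads, int addsPerThread) {
        TripleStore store = new SyncStore(new MemoryStore());
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread worker = new Thread(() -> {
                for (int i = 0; i < addsPerThread; i++) {
                    store.add("s" + id, "p", "o" + i);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return store.size();
    }

    public static void main(String[] args) {
        System.out.println(concurrentAdds(4, 1000));
    }
}
```

Because every call passes through the synchronized layer, no additions are lost even though the base layer itself is not thread-safe. Removing the SyncStore wrapper from this sketch would make the result unpredictable.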
Other base sails to choose from include:
All base sails that work on relational databases need a number of parameters to function:
The RDBMS-based sails also take some optional parameters:
The memory-based sails take four optional parameters:
The native sail has one required parameter:
The native sail also has an optional triple-indexes parameter, with which one can specify the indexing strategy the native sail should take. We will explain this in more detail in the next section.
The native store uses B-Trees for indexing statements, where the index key consists of three fields: subject (s), predicate (p) and object (o). The order in which these fields are used in the key determines the usability of an index for a specific triple query pattern: searching triples with a specific subject in an index that has the subject as the first field is significantly faster than searching these same triples in an index where the subject field is second or third. In the worst case, the 'wrong' triple pattern will result in a sequential scan over the entire set of triples.
By default, the native store only uses a single index, with a subject-predicate-object key pattern. However, it is possible to define different indexes for the native store, using the triple-indexes parameter. This can be used to optimize performance for query patterns that occur frequently.
The subject, predicate and object fields are represented by the characters 's', 'p' and 'o', respectively. Indexes can be specified by creating 3-letter words from these three characters. Multiple indexes can be specified by separating these words with commas, spaces and/or tabs. For example, the string "spo, pos" specifies two indexes: a subject-predicate-object index and a predicate-object-subject index.
Of course, creating multiple indexes speeds up querying, but there is a cost factor to take into account as well: adding and removing data will become more expensive, because each index will have to be updated. Also, each index takes up additional disk space.
The native store automatically creates and drops indexes upon (re)initialization. The parameter can therefore be adjusted at any time: upon the first refresh of the configuration, the native store changes its indexing strategy without loss of data.
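To make the effect of key order concrete, here is a minimal sketch that parses a triple-indexes value and picks an index for a query pattern. The class and method names are hypothetical, and the selection rule (prefer the index with the longest run of bound fields at the start of its key) is an assumption made for illustration; how the native store actually chooses an index is not described in this document.

```java
import java.util.ArrayList;
import java.util.List;

class TripleIndexSketch {
    // Splits a value such as "spo, pos" on commas, spaces and tabs.
    static List<String> parseIndexSpec(String spec) {
        List<String> indexes = new ArrayList<>();
        for (String word : spec.trim().split("[,\\s]+")) {
            if (word.isEmpty()) {
                continue;
            }
            boolean valid = word.length() == 3
                    && word.indexOf('s') >= 0
                    && word.indexOf('p') >= 0
                    && word.indexOf('o') >= 0;
            if (!valid) {
                throw new IllegalArgumentException("not a 3-letter s/p/o word: " + word);
            }
            indexes.add(word);
        }
        return indexes;
    }

    // Counts how many leading key fields of an index are bound in the query.
    // boundFields lists the bound positions, e.g. "p" or "po".
    static int boundPrefixLength(String index, String boundFields) {
        int n = 0;
        while (n < index.length() && boundFields.indexOf(index.charAt(n)) >= 0) {
            n++;
        }
        return n;
    }

    // Assumed rule: pick the index with the longest bound prefix; the first wins ties.
    static String chooseIndex(List<String> indexes, String boundFields) {
        String best = indexes.get(0);
        for (String index : indexes) {
            if (boundPrefixLength(index, boundFields) > boundPrefixLength(best, boundFields)) {
                best = index;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> indexes = parseIndexSpec("spo, pos");
        // A query with only the predicate bound benefits from the pos index.
        System.out.println(chooseIndex(indexes, "p"));
    }
}
```

With only the default spo index, a pattern that binds just the object has a bound prefix of length zero, which corresponds to the sequential scan mentioned above.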
The basic set of RDFS inference rules (as defined in the RDF(S) MT semantics) is sometimes insufficient for building custom applications. For example, some applications need to define their own transitive, symmetric or inverse properties. An infrastructure for defining such custom inference rules helps developers tune the Sesame inferencer to suit their application better.
Since Sesame release 0.95, we provide an alternative inferencer that works with the org.openrdf.sesame.sailimpl.rdbms.RdfSchemaRepository Sail. This custom inferencer can be initialized with a set of axiomatic triples and inference rules defined in an external file. The format of these definitions is simple and is explained in greater detail in the next section.
The customizable inferencer also supports inter-rule dependencies: you can state explicitly which rules are triggered when a rule infers a new statement. This information is given in an additional 'triggers_rule' tag nested within the 'rule' tag. It consists of several 'rule' tags, each with a name attribute specifying an affected rule.
The definition file is in XML and should conform to the following DTD:
<!DOCTYPE InferenceRules [
  <!ENTITY rdf 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
  <!ENTITY rdfs 'http://www.w3.org/2000/01/rdf-schema#'>
  <!ENTITY daml 'http://www.daml.org/2001/03/daml+oil#'>
  <!ELEMENT InferenceRules (axiom | rule)*>
  <!ELEMENT axiom (subject, predicate, object)>
  <!ELEMENT rule ((premise+, consequent, triggers_rule?) | EMPTY)>
  <!ATTLIST rule
    name CDATA #REQUIRED>
  <!ELEMENT premise (subject, predicate, object)>
  <!ELEMENT consequent (subject, predicate, object)>
  <!ELEMENT triggers_rule (rule)*>
  <!ELEMENT subject EMPTY>
  <!ATTLIST subject
    var CDATA #IMPLIED
    uri CDATA #IMPLIED
    pattern CDATA #IMPLIED
    escape CDATA #IMPLIED
    type (resource) #IMPLIED>
  <!ELEMENT predicate EMPTY>
  <!ATTLIST predicate
    var CDATA #IMPLIED
    uri CDATA #IMPLIED
    pattern CDATA #IMPLIED
    escape CDATA #IMPLIED
    type (resource) #IMPLIED>
  <!ELEMENT object EMPTY>
  <!ATTLIST object
    var CDATA #IMPLIED
    uri CDATA #IMPLIED
    pattern CDATA #IMPLIED
    escape CDATA #IMPLIED
    type (resource) #IMPLIED>
]>
If a 'uri' attribute is present within the 'subject', 'predicate' or 'object' tags, its value is assumed to be a name of a resource.
The value of the 'var' attribute of the above tags gives the name of that variable. This attribute cannot be used within an 'axiom' tag.
For example, here are two of the axiomatic triples, as they are defined in the RDF(S) MT semantics. They appear in the configuration file like this:
<axiom>
  <subject uri="&rdfs;subPropertyOf"/>
  <predicate uri="&rdfs;domain"/>
  <object uri="&rdf;Property"/>
</axiom>
<axiom>
  <subject uri="&rdfs;subPropertyOf"/>
  <predicate uri="&rdfs;range"/>
  <object uri="&rdf;Property"/>
</axiom>
An example of an inference rule (one stating that if a resource is used as a predicate, then it is of 'type' 'Property') looks like this:
<rule name="rdfs1">
  <premise>
    <subject var="xxx"/>
    <predicate var="aaa"/>
    <object var="yyy"/>
  </premise>
  <consequent>
    <subject var="aaa"/>
    <predicate uri="&rdf;type"/>
    <object uri="&rdf;Property"/>
  </consequent>
  <triggers_rule>
    <rule name="rdfs2" />
    <rule name="rdfs3" />
    <rule name="rdfs4a" />
    <rule name="rdfs5b" />
    <rule name="rdfs6" />
    <rule name="rdfs9" />
  </triggers_rule>
</rule>
In the above example 'xxx', 'aaa' and 'yyy' are variables and 'rdf:type' and 'rdf:Property' are exact resource URIs.
A 'pattern' attribute, in conjunction with an 'escape' attribute, is used to define a pattern for matching resource names. Both can appear only in a triple component denoting a variable, i.e. one with the 'var' attribute specified. Use '?' to denote any single character and '*' to match any character combination with a length greater than 0.
Use the character declared in the 'escape' attribute to escape '?' or '*' characters within the pattern. You need to specify the 'pattern' and 'escape' attributes for a given variable only once per rule (note that in the example below they are specified only once for the variable 'id').
An example of a rule using pattern matching:
<rule name="rdfsXI">
  <premise>
    <subject var="xxx"/>
    <predicate var="id" pattern="&rdf;_*" escape="\"/>
    <object var="yyy"/>
  </premise>
  <consequent>
    <subject var="id"/>
    <predicate uri="&rdf;type"/>
    <object uri="&rdfs;ContainerMembershipProperty"/>
  </consequent>
  <triggers_rule>
    <rule name="rdfs2" />
    <rule name="rdfs3" />
    <rule name="rdfs6" />
    <rule name="rdfs9" />
    <rule name="rdfs10" />
  </triggers_rule>
</rule>
Note that a triple template is matched both on the values bound to the variables used in it and on the resources specified explicitly as its subject, predicate or object.
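The wildcard semantics described above ('?' for exactly one character, '*' for one or more characters, and an escape character for literal '?' and '*') can be sketched by translating a pattern into a regular expression. This is an illustration only: the class name is hypothetical, it is not taken from the Sesame code base, and treating every escaped character as a literal is an assumption that goes slightly beyond what the text specifies.

```java
import java.util.regex.Pattern;

class ResourcePatternSketch {
    // Translates a rule-file style pattern into an equivalent regular expression.
    static Pattern compile(String pattern, char escape) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == escape && i + 1 < pattern.length()) {
                // Assumption: the character after the escape is matched literally.
                i++;
                regex.append(Pattern.quote(String.valueOf(pattern.charAt(i))));
            } else if (c == '?') {
                regex.append('.');      // any single character
            } else if (c == '*') {
                regex.append(".+");     // any combination with length greater than 0
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    static boolean matches(String pattern, char escape, String resourceName) {
        return compile(pattern, escape).matcher(resourceName).matches();
    }

    public static void main(String[] args) {
        String rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
        // Mirrors pattern="&rdf;_*" escape="\" from the rdfsXI rule.
        System.out.println(matches(rdf + "_*", '\\', rdf + "_1"));
        System.out.println(matches(rdf + "_*", '\\', rdf + "type"));
    }
}
```

Under these assumptions, the pattern from the rdfsXI rule accepts container membership properties such as rdf:_1, but rejects both rdf:type and the bare rdf:_ (because '*' requires at least one character).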
Consider a property with the URI http://somewhere.org#partOf. In our example domain, we wish to ensure that this resource is always inserted in the repository, so we add the axiomatic triple stating that it is a property:
<axiom>
  <subject uri="http://somewhere.org#partOf"/>
  <predicate uri="&rdf;type"/>
  <object uri="&rdf;Property"/>
</axiom>
We also wish to define that the property is transitive. To this end, we add a single inference rule:
<rule name="userPartOf">
  <premise>
    <subject var="xxx"/>
    <predicate uri="http://somewhere.org#partOf"/>
    <object var="yyy"/>
  </premise>
  <premise>
    <subject var="yyy"/>
    <predicate uri="http://somewhere.org#partOf"/>
    <object var="zzz"/>
  </premise>
  <consequent>
    <subject var="xxx"/>
    <predicate uri="http://somewhere.org#partOf"/>
    <object var="zzz"/>
  </consequent>
  <triggers_rule>
    <rule name="rdfs2" />
    <rule name="rdfs3" />
    <rule name="rdfs6" />
    <rule name="userPartOf" />
  </triggers_rule>
</rule>
Suppose the repository contains these two triples: T1 = (Finger.1, partOf, Hand.Left) and T2 = (Hand.Left, partOf, Human.1). Because the same variable 'yyy' is used in both 'premise' tags, the rule matches when T1.object = T2.subject. A triple corresponding to the 'consequent' tag is then added to the repository, using the current variable bindings. It has the form TInfer = (T1.subject, partOf, T2.object), i.e. TInfer = (Finger.1, partOf, Human.1).
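The behaviour of the userPartOf rule can be sketched in a few lines of Java. This is a naive illustration and not the inferencer's actual algorithm: the class name is hypothetical, triples are modelled as three-element string lists, and the rule is simply re-applied until nothing new is inferred (which mirrors the fact that userPartOf lists itself under 'triggers_rule').

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class PartOfRuleSketch {
    static final String PART_OF = "http://somewhere.org#partOf";

    // A triple is modelled as a list of subject, predicate and object.
    static List<String> triple(String subject, String predicate, String object) {
        return Arrays.asList(subject, predicate, object);
    }

    // Applies the transitivity rule repeatedly until no new statement is inferred.
    static Set<List<String>> closure(Set<List<String>> explicit) {
        Set<List<String>> all = new LinkedHashSet<>(explicit);
        boolean changed = true;
        while (changed) {
            changed = false;
            List<List<String>> snapshot = new ArrayList<>(all);
            for (List<String> t1 : snapshot) {
                for (List<String> t2 : snapshot) {
                    boolean bothPartOf = t1.get(1).equals(PART_OF) && t2.get(1).equals(PART_OF);
                    // Premise match: T1.object = T2.subject (the shared variable 'yyy').
                    if (bothPartOf && t1.get(2).equals(t2.get(0))) {
                        // Consequent: (T1.subject, partOf, T2.object).
                        changed |= all.add(triple(t1.get(0), PART_OF, t2.get(2)));
                    }
                }
            }
        }
        return all;
    }

    public static void main(String[] args) {
        Set<List<String>> data = new LinkedHashSet<>();
        data.add(triple("Finger.1", PART_OF, "Hand.Left"));
        data.add(triple("Hand.Left", PART_OF, "Human.1"));
        for (List<String> t : closure(data)) {
            System.out.println(t);
        }
    }
}
```

For the two explicit triples T1 and T2, the closure contains exactly one additional statement, (Finger.1, partOf, Human.1).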
The inferencer used by a repository based on org.openrdf.sesame.sailimpl.rdbms.RdfSchemaRepository sail is defined by a parameter passed to it during the initialization. To start using the custom inferencer on a repository, add the following extra parameter to the configuration of that repository:
An example rules file, containing the axioms and entailment rules as specified by the January 23, 2003 Working Draft of the RDF Model Theory, can be found in the Sesame source tree, specifically in src/org/openrdf/sesame/sailimpl/rdbms/entailment-rdf-mt-20030123.xml. This file is used by default by the custom inferencer if the rule-file parameter is not specified.
Changes to the rules file do not lead to automatic reapplication of the rules over the existing data in the repository. So clean the repository first to avoid inconsistency problems.
The dependency information used by the TMS system is also affected by the rules. The default inferencer uses a dependency database table that can handle cases where up to two triples lead to the inference of a new one. Since inference rules in the configuration file can involve an arbitrary number of 'premise' tags, the structure of the default dependency table cannot handle them. To avoid loss of data, the structure of that table is not altered, and the table is created only if it does not exist. This check is performed during the repository initialization phase. It is therefore better to apply new or modified inference rules to a completely clean data store (database).
[This section not yet available. See the documentation at http://www.ontotext.com/omm/ for details.]
Sesame comes with a Web interface to allow access to repositories through a normal Web browser. In this chapter, we will briefly describe the user interface.
If you installed Sesame according to the guidelines in Chapter 2, Installing Sesame, the Sesame entry page is located at http://[MACHINE_NAME]:8080/sesame/. If you point your browser to this address, it should display the Sesame entry page shown in Figure 5.1, “The Sesame entry page”.
The screen provides you with a choice of repositories to work on. Since you are not yet logged in, and no publicly accessible repositories are available, you get a notification that no repositories can be accessed.
To log in to Sesame, click the "log in" link and provide user name and password (see Figure 5.2, “The Sesame login page”).
After you have logged in, you are returned to the entry page, which now shows that you are logged in and gives you a choice of repositories that you have access to.
After selecting a repository, you come to the actual Sesame function interface (see Figure 5.4, “The repository function screen”).
The toolbar at the top of the screen shows user and repository information, and allows the selection of different actions on the repository. The actions are divided into read actions (such as queries) and write actions (adding and removing data).
The web interface offers three options for adding data to a Sesame repository: Add file, Add (www) and Add (copy-paste).
The Add file and Add (www) options are fairly straightforward. The first option allows you to select an RDF document from your local disk to add to the Sesame repository. The second option allows you to add RDF documents that are accessible through a URL to the repository.
Optionally, you can supply a base URL. The base URL is used by the RDF parser to resolve any relative resource references in the RDF document. By default, Sesame uses foo:bar as a base URL, but this may not always be desirable, for example when the file is being read from a temporary location, or when resources defined in the document are referenced by other RDF sources with different URLs.
The Add (copy-paste) option allows you to upload data to Sesame by typing (or copying and pasting) it in the text area. The pasted text should be a valid RDF/XML document.
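The effect of the base URL can be illustrated with Java's standard java.net.URI class. This is only a sketch of relative reference resolution in general: Sesame's RDF parser has its own resolution code, and the example.org addresses are made-up placeholders.

```java
import java.net.URI;

class BaseUrlSketch {
    // Resolves a (possibly relative) reference against a base URL.
    static String resolve(String baseUrl, String reference) {
        return URI.create(baseUrl).resolve(reference).toString();
    }

    public static void main(String[] args) {
        // The same relative reference yields different resources
        // depending on the base URL that is supplied.
        System.out.println(resolve("http://example.org/data/doc.rdf", "#Thing"));
        System.out.println(resolve("http://tmp.example.org/upload/1234.rdf", "#Thing"));
        System.out.println(resolve("http://example.org/data/doc.rdf", "other.rdf"));
    }
}
```

A document uploaded from a temporary location would thus define different resources than the same document at its published address, unless the published address is supplied as the base URL.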
SeRQL revision 1.1 is a syntax revision (see issue tracker item SES-75). This document describes the revised syntax. From Sesame release 1.2-RC1 onwards, the old syntax is no longer supported.
SeRQL revision 1.2 covers a set of new functions and operators:
New operations have been marked with (R1.2) where appropriate in this document.
SeRQL ("Sesame RDF Query Language", pronounced "circle") is a new RDF/RDFS query language that is currently being developed by Aduna as part of Sesame. It combines the best features of other (query) languages (RQL, RDQL, N-Triples, N3) and adds some of its own. This document briefly shows all of these features. After reading through this document one should be able to write SeRQL queries.
Some of SeRQL's most important features are:
URIs and literals are the basic building blocks of RDF. For a query language like SeRQL, variables are added to this list. The following sections will show how to write these down in SeRQL.
Variables are identified by names. These names must start with a letter or an underscore ('_') and can be followed by zero or more letters, numbers, underscores, dashes ('-') or dots ('.'). Example variable names are:
SeRQL keywords are not allowed to be used as variable names. Currently, the following keywords are used or reserved for future use in SeRQL: select, construct, from, where, using, namespace, true, false, not, and, or, like, label, lang, datatype, null, isresource, isliteral, sort, in, union, intersect, minus, exists, forall, distinct, limit, offset.
Keywords in SeRQL are all case-insensitive, in contrast to variable names, which are case-sensitive.
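The naming rule and the keyword restriction can be combined into a small check. The following Python sketch is illustrative only and is not part of Sesame: the function name is hypothetical, the keyword set is copied from the list above, and "letter" is assumed to mean an ASCII letter.

```python
import re

# Keywords as listed in the text above; SeRQL keywords are case-insensitive.
SERQL_KEYWORDS = {
    "select", "construct", "from", "where", "using", "namespace", "true",
    "false", "not", "and", "or", "like", "label", "lang", "datatype", "null",
    "isresource", "isliteral", "sort", "in", "union", "intersect", "minus",
    "exists", "forall", "distinct", "limit", "offset",
}

# A letter or underscore, followed by zero or more letters, digits,
# underscores, dashes or dots. Assumption: letters are ASCII only.
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def is_valid_variable_name(name):
    """Return True if `name` follows the naming rule and is not a keyword."""
    if _NAME.fullmatch(name) is None:
        return False
    # Keywords are matched regardless of case, so "SELECT" is rejected too.
    return name.lower() not in SERQL_KEYWORDS
```

With this sketch, names such as Var1, _var and my-var.2 are accepted, while 1var (starts with a digit) and Select (a keyword) are rejected.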
There are two ways to write down URIs in SeRQL: either as full URIs or as abbreviated URIs. Full URIs must be surrounded with "<" and ">". Examples of this are:
As URIs tend to be long strings with the first part being shared by several of them (i.e. the namespace), SeRQL allows one to use abbreviated URIs (or QNames) by defining (short) names for these namespaces which are called "prefixes". A QName always starts with one of the defined prefixes and a colon (":"). After this colon, the part of the URI that is not part of the namespace follows. The first part, consisting of the prefix and the colon, is replaced by the full namespace by the query engine. Some example QNames are:
RDF literals consist of three parts: a label, a language tag, and a datatype. The language tag and the datatype are optional and at most one of these two can accompany a label (a literal can not have both a language tag and a datatype). The notation of literals in SeRQL has been modelled after their notation in N-Triples; literals start with the label, which is surrounded by double quotes, optionally followed by a language tag with a "@" prefix or by a datatype URI with a "^^" prefix. Example literals are:
The SeRQL notation for abbreviated URIs can also be used. When the prefix rdf is mapped to the namespace http://www.w3.org/1999/02/22-rdf-syntax-ns#, the last example literal could also have been written down like:
SeRQL has also adopted the character escapes from N-Triples; special characters can be escaped by prefixing them with a backslash. One of the special characters is the double quote. Normally, a double quote would signal the end of a literal's label. If the double quote is part of the label, it needs to be escaped. For example, the sentence John said: "Hi!" can be encoded in a SeRQL literal as: "John said: \"Hi!\"".
As the backslash is a special character itself, it also needs to be escaped. To encode a single backslash in a literal's label, two backslashes need to be written in the label. For example, a Windows directory would be encoded as: "C:\\Program Files\\Apache Tomcat\\".
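When queries are generated by a program, these two escapes can be applied mechanically. The Python helper below is a minimal sketch (the function name is hypothetical) that handles only the backslash and the double quote discussed above; other N-Triples escapes, such as those for line breaks, are left out.

```python
def serql_literal(label):
    """Quote `label` as a SeRQL literal, escaping backslashes and quotes."""
    # Escape backslashes first, so that the backslashes added in front of
    # double quotes are not escaped a second time.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


print(serql_literal('John said: "Hi!"'))
print(serql_literal("C:\\Program Files\\Apache Tomcat\\"))
```

The two calls print the encoded forms of the two examples given in the text.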
SeRQL has functions for extracting each of the three parts of a literal. These functions are label, lang, and datatype. label("foo"@en) extracts the label "foo", lang("foo"@en) extracts the language tag "en", and datatype("foo"^^rdf:XMLLiteral) extracts the datatype rdf:XMLLiteral. The use of these functions is explained later.
RDF has a notion of blank nodes. These are nodes in the RDF graph that are not labeled with a URI or a literal. The interpretation of such blank nodes is as a form of existential quantification: it allows one to assert that "there exists a node such that..." without specifying what that particular node is. Blank nodes do in fact often have identifiers, but these identifiers are assigned internally by whatever processor is processing the graph and they are only valid in the local context, not as global identifiers (unlike URIs).
Strictly speaking, blank nodes are only addressable indirectly, by querying for one or more properties of the node. However, SeRQL, as a practical shortcut, allows blank node identifiers to be used in queries. The syntax for blank nodes is adopted from N-Triples, using a QName-like syntax with "_" as the namespace prefix, and the internal blank node identifier as the local name. For example:
_:bnode1
This identifies the blank node with internal identifier "bnode1". These blank node identifiers can be used in the same way that normal URIs or QNames can be used.
Caution: It is important to realize that addressing blank nodes in this way makes SeRQL queries non-portable across repositories. There is no guarantee that in two repositories, even if they contain identical datasets, the blank node identifiers will be identical. It may well be that "bnode1" in repository A is a completely different blank node than "bnode1" in repository B. Even in the same repository, it is not guaranteed that blank node identifiers are stable over updates: if certain statements are added to or removed from a repository, it is not guaranteed "bnode1" still identifies the same blank node that it did before the update operation.
Path expressions are one of the most prominent parts of SeRQL. They are expressions that match specific paths through an RDF graph. Most current RDF query languages allow you to define path expressions of length 1, which can be used to find (combinations of) triples in an RDF graph. SeRQL, like RQL, allows you to define path expressions of arbitrary length.
Imagine that we want to query an RDF graph for persons who work for companies that are IT companies. Querying for this information comes down to finding the following pattern in the RDF graph (gray nodes denote variables):
The SeRQL notation for path expressions resembles the picture above; it is written down as:
{Person} ex:worksFor {Company} rdf:type {ex:ITCompany}
The parts surrounded by curly brackets represent the nodes in the RDF graph, the parts between these nodes represent the edges in the graph. The direction of the arcs (properties) in SeRQL path expressions is always from left to right.
In SeRQL queries, multiple path expressions can be specified by separating them with commas. For example, the path expression shown before can also be written down as two smaller path expressions:
{Person} ex:worksFor {Company},
{Company} rdf:type {ex:ITCompany}
The nodes and edges in the path expressions can be variables, URIs and literals. Also, a node can be left empty in case one is not interested in the value of that node. Here are some more example path expressions to illustrate this:
Any path can be constructed using a set of basic path expressions. Sometimes, however, it is more convenient to use one of the available short cuts. There are three types of short cuts, all of which are explained below.
In situations where one wants to query for two or more triples with identical subject and predicate, the subject and predicate do not have to be repeated over and over again. Instead, a multi-value node can be used:
{subj1} pred1 {obj1, obj2, obj3}
A built-in constraint on this construction is that each value for the variables in the multi-value node is unique (i.e. they are pairwise disjoint). Therefore, this path expression is equivalent to the following combination of path expressions and boolean constraints:
FROM
{subj1} pred1 {obj1},
{subj1} pred1 {obj2},
{subj1} pred1 {obj3}
WHERE obj1 != obj2 AND obj1 != obj3 AND obj2 != obj3
Or graphically:
Multi-value nodes can also be used when statements share the predicate and object, e.g.:
{subj1, subj2, subj3} pred1 {obj1}
When used in a longer path expression, multi-value nodes apply to both the part left of the node and the part right of the node. The following path expression:
{first} pred1 {middle1, middle2} pred2 {last}
matches the following graph:
When using variables in multi-value nodes, a constraint on its values is implicitly added: the variable's value is not allowed to be equal to any other value in the multi-value node. So, in the first example, the variables obj1, obj2 and obj3 will not match identical values at the same time. This prevents the path from matching a single triple three times.
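The rewriting of a multi-value node into basic path expressions plus pairwise inequality constraints can be sketched in a few lines of Python. This is an illustration only, not how the Sesame query engine is implemented; the function name is hypothetical, and all values in the node are assumed to be variables.

```python
from itertools import combinations


def expand_multi_value(subject, predicate, objects):
    """Expand {subject} predicate {o1, o2, ...} into basic path expressions
    and the pairwise inequality constraints that the short cut implies."""
    paths = ["{%s} %s {%s}" % (subject, predicate, obj) for obj in objects]
    # One "a != b" constraint for every unordered pair of variables.
    constraints = ["%s != %s" % (a, b) for a, b in combinations(objects, 2)]
    return paths, constraints


paths, constraints = expand_multi_value("subj1", "pred1", ["obj1", "obj2", "obj3"])
print("FROM\n" + ",\n".join(paths))
print("WHERE " + " AND ".join(constraints))
```

For three variables the sketch produces three path expressions and three constraints. The number of constraints grows quadratically with the number of values in the node.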
One of the short cuts that is likely to be used most is the notation for branches in path expressions. There are lots of situations where one wants to query multiple properties of a single subject. Instead of repeating the subject over and over again, one can use a semi-colon to attach a predicate-object combination to the subject of the last part of a path expression, e.g.:
{subj1} pred1 {obj1};
pred2 {obj2}
Which is equivalent to:
{subj1} pred1 {obj1},
{subj1} pred2 {obj2}
Or graphically:
Or a slightly more complicated example:
{first} pred {} pred1 {obj1};
pred2 {obj2} pred3 {obj3}
Which matches the following graph:
Note that an anonymous variable is used in the middle of the path expressions.
The last short cut is a short cut for reified statements. A path expression representing a single statement (i.e. {node} edge {node}) can be written between the curly brackets of a node, e.g.:
{ {reifSubj} reifPred {reifObj} } pred {obj}
This would be equivalent to querying (using "rdf:" as a prefix for the RDF namespace, and "_Statement" as a variable for storing the statement's URI):
{_Statement} rdf:type {rdf:Statement},
{_Statement} rdf:subject {reifSubj},
{_Statement} rdf:predicate {reifPred},
{_Statement} rdf:object {reifObj},
{_Statement} pred {obj}
Again, graphically:
Optional path expressions differ from 'normal' path expressions in that they do not have to be matched to find query results. The SeRQL query engine will try to find paths in the RDF graph matching the path expression, but when it cannot find any paths it will skip the expression and leave any variables in it uninstantiated (they will have the value null).
Consider an RDF graph that contains information about people that have names, ages, and optionally e-mail addresses. This is a situation that is likely to be very common in RDF data. A logical query on this data is a query that yields all names, ages and, when available, e-mail addresses of people, e.g.:
{Person} ex:name {Name};
ex:age {Age};
ex:email {EmailAddress}
However, using normal path expressions like in the query above, people without an e-mail address will not be returned by the SeRQL query engine. With optional path expressions, one can indicate that a specific (part of a) path expression is optional. This is done using square brackets, i.e.:
{Person} ex:name {Name};
ex:age {Age};
[ex:email {EmailAddress}]
Or alternatively:
{Person} ex:name {Name};
ex:age {Age},
[{Person} ex:email {EmailAddress}]
In contrast to the first path expression, this expression will also match people without an e-mail address. For these people, the variable EmailAddress will not be assigned a value.
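The effect of the optional part can be imitated over a small in-memory list of triples. The Python sketch below is not Sesame code: the data and the function names are made up for illustration, and None stands in for the null value of an uninstantiated variable.

```python
# Triples as (subject, predicate, object); all values are made-up examples.
TRIPLES = [
    ("ex:alice", "ex:name", "Alice"),
    ("ex:alice", "ex:age", 30),
    ("ex:alice", "ex:email", "alice@example.org"),
    ("ex:bob", "ex:name", "Bob"),
    ("ex:bob", "ex:age", 25),
]


def objects(subject, predicate):
    """Return all objects of triples with the given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def people_with_optional_email():
    """Imitate {Person} ex:name {Name}; ex:age {Age}; [ex:email {EmailAddress}]."""
    rows = []
    persons = sorted({s for s, p, _ in TRIPLES if p == "ex:name"})
    for person in persons:
        for name in objects(person, "ex:name"):
            for age in objects(person, "ex:age"):
                # Optional part: when no ex:email triple exists, keep the row
                # and bind the e-mail variable to None instead of dropping it.
                for email in objects(person, "ex:email") or [None]:
                    rows.append((person, name, age, email))
    return rows


for row in people_with_optional_email():
    print(row)
```

In this data set ex:bob has no e-mail address, so his row is still returned, with None in the last position. Without the fallback to [None], the inner loop would produce no rows for him, which corresponds to the behaviour of the non-optional path expression.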
Optional path expressions can also be nested. This is useful in situations where the existence of a specific path is dependent on the existence of another path. For example, the following path expression queries for the titles of all known documents and, if the author of the document is known, the name of the author (if it is known) and his e-mail address (if it is known):
{Document} ex:title {Title};
[ex:author {Author} [ex:name {Name}];
[ex:email {Email}]]
With this path expression, the SeRQL query engine will not try to find the name and e-mail address of an author when it cannot even find the resource representing the author.
The SeRQL query language supports two querying concepts. The first one can be characterized as returning a table of values, or a set of variable-value bindings. The second one returns a true RDF graph, which can be a subgraph of the graph being queried, or a graph containing information that is derived from it. The first type of queries are called "select queries", the second type of queries are called "construct queries".
A SeRQL query is typically built up from one to six clauses. For select queries these clauses are: SELECT, FROM, WHERE, LIMIT, OFFSET and USING NAMESPACE. One might recognize the first five clauses from SQL, but their usage is slightly different. For construct queries the clauses are the same with the exception of the first; construct queries start with a CONSTRUCT clause instead of a SELECT clause.
The first clause (i.e. SELECT or CONSTRUCT) determines what is done with the results that are found. In a SELECT clause, one can specify which variable values should be returned and in what order. In a CONSTRUCT clause, one can specify which triples should be returned.
The FROM clause is optional and always contains path expressions, which were explained in the previous section. It defines the paths in an RDF graph that are relevant to the query. Note that when the FROM clause is not specified, the query will simply return the constants specified in the SELECT or CONSTRUCT clause.
The WHERE clause is optional and can contain additional (Boolean) constraints on the values in the path expressions. These are constraints on the nodes and edges of the paths, which cannot be expressed in the path expressions themselves.
The LIMIT and OFFSET clauses are also optional. These clauses can be used separately or combined in order to get a subset of all query answers. Their usage is very similar to the LIMIT and OFFSET clauses in SQL queries. The LIMIT clause determines the (maximum) amount of query answers that will be returned. The OFFSET clause determines which query answer will be returned as the first result, skipping as many query results as specified in this clause.
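In terms of a list of query answers, the combined effect of these two clauses corresponds to taking a slice. A minimal Python sketch, with a hypothetical function name and with a missing LIMIT modelled as None:

```python
def apply_limit_offset(answers, limit=None, offset=0):
    """Skip the first `offset` answers, then return at most `limit` answers."""
    end = None if limit is None else offset + limit
    return answers[offset:end]


answers = ["a0", "a1", "a2", "a3", "a4", "a5"]
print(apply_limit_offset(answers, limit=2, offset=3))
```

Here the first three answers are skipped and the next two are returned. An offset beyond the end of the list simply yields an empty result.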
Finally, the USING NAMESPACE clause is also optional and it can contain namespace declarations; these are the mappings from prefixes to namespaces that were referred to in the earlier section about (abbreviated) URIs.
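The substitution that the query engine performs for abbreviated URIs can be pictured as a simple lookup in these mappings. The Python sketch below is illustrative only; the function name is hypothetical, and the two prefix declarations are the ones used in the examples in this document.

```python
# Prefix-to-namespace mappings, as they would be declared in USING NAMESPACE.
NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "ex": "http://example.org/things#",
}


def expand_qname(qname, namespaces=NAMESPACES):
    """Replace the prefix and colon of a QName by the full namespace."""
    prefix, colon, local_name = qname.partition(":")
    if not colon or prefix not in namespaces:
        raise ValueError("unknown or missing prefix in %r" % qname)
    # Full URIs are written between "<" and ">" in SeRQL.
    return "<" + namespaces[prefix] + local_name + ">"


print(expand_qname("rdf:type"))
print(expand_qname("ex:ITCompany"))
```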
The WHERE, LIMIT, OFFSET and USING NAMESPACE clauses will be explained in later sections. The following section will explain the SELECT and FROM clauses.
As said before, select queries return tables of values, or sets of variable-value bindings. Which values are returned can be specified in the select clause. One can specify variables and/or values in the select clause, separated by commas. The following example query returns all URIs of classes:
SELECT C
FROM {C} rdf:type {rdfs:Class}
It is also possible to use a '*' in the SELECT clause. In that case, all variable values will be returned in the order in which they appear in the query, e.g.:
SELECT *
FROM {S} rdfs:label {O}
This query will return the values of the variables S and O, in that order. If a different order is preferred, one needs to specify the variables in the select clause, e.g.:
SELECT O, S
FROM {S} rdfs:label {O}
By default, the results of a select query are not filtered for duplicate rows. Because of the nature of the above queries, these queries will never return duplicates. However, more complex queries might result in duplicate result rows. These duplicates can be filtered out by the SeRQL query engine. To enable this functionality, one needs to specify the DISTINCT keyword after the select keyword. For example:
SELECT DISTINCT *
FROM {Country1} ex:borders {} ex:borders {Country2}
USING NAMESPACE
ex = <http://example.org/things#>
Construct queries return RDF graphs as a set of triples. The triples that a query should return can be specified in the construct clause using the previously explained path expressions. The following is an example construct query:
CONSTRUCT {Parent} ex:hasChild {Child}
FROM {Child} ex:hasParent {Parent}
USING NAMESPACE
ex = <http://example.org/things#>
This query defines the inverse of the property ex:hasParent to be ex:hasChild. This is just one example of a query that produces information that is derived from the original information. Here is one more example:
CONSTRUCT
{Artist} rdf:type {ex:Painter};
ex:hasPainted {Painting}
FROM
{Artist} rdf:type {ex:Artist};
ex:hasCreated {Painting} rdf:type {ex:Painting}
USING NAMESPACE
ex = <http://example.org/things#>
This query derives that an artist who has created a painting is a painter. The relation between the painter and the painting is modelled to be ex:hasPainted.
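To make the derivation concrete, the Python sketch below applies the same pattern to a small, made-up set of triples. It only illustrates what the query computes and says nothing about how Sesame evaluates it; the resource names and the function name are hypothetical.

```python
RDF_TYPE = "rdf:type"

# Made-up example data: one artist created a painting, the other a sculpture.
TRIPLES = {
    ("ex:picasso", RDF_TYPE, "ex:Artist"),
    ("ex:picasso", "ex:hasCreated", "ex:guernica"),
    ("ex:guernica", RDF_TYPE, "ex:Painting"),
    ("ex:rodin", RDF_TYPE, "ex:Artist"),
    ("ex:rodin", "ex:hasCreated", "ex:thinker"),
    ("ex:thinker", RDF_TYPE, "ex:Sculpture"),
}


def construct_painters(triples):
    """Derive ex:Painter and ex:hasPainted triples as in the query above."""
    derived = set()
    for artist, pred, obj in triples:
        # FROM part 1: {Artist} rdf:type {ex:Artist}
        if (pred, obj) != (RDF_TYPE, "ex:Artist"):
            continue
        for subj, pred2, work in triples:
            # FROM part 2: ; ex:hasCreated {Painting} rdf:type {ex:Painting}
            if (subj == artist and pred2 == "ex:hasCreated"
                    and (work, RDF_TYPE, "ex:Painting") in triples):
                # CONSTRUCT part: the two triples to return for this match.
                derived.add((artist, RDF_TYPE, "ex:Painter"))
                derived.add((artist, "ex:hasPainted", work))
    return derived


for triple in sorted(construct_painters(TRIPLES)):
    print(triple)
```

Only ex:picasso is derived to be a painter, because ex:rodin has created nothing of type ex:Painting. Note that the sketch collects its results in a set, so duplicate triples never appear in its output.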
Instead of specifying a path expression in the CONSTRUCT clause, one can also use a '*'. In that case, the CONSTRUCT clause is identical to the FROM clause. This allows one to extract a subgraph from a larger graph, e.g.:
CONSTRUCT *
FROM {SUB} rdfs:subClassOf {SUPER}
This query extracts all rdfs:subClassOf relations from an RDF graph.
Just like with select queries, the results of a construct query are not filtered for duplicate triples by default. Again, these duplicates are filtered out by the SeRQL query engine if the DISTINCT keyword is specified after the construct keyword, for example:
CONSTRUCT DISTINCT
{Artist} rdf:type {ex:Painter}
FROM
{Artist} rdf:type {ex:Artist};
ex:hasCreated {} rdf:type {ex:Painting}
USING NAMESPACE
ex = <http://example.org/things#>
The third clause in a query is the WHERE clause. This is an optional clause in which one can specify Boolean constraints on variables.
The following sections will explain the available Boolean expressions for use in the WHERE clause. Section 6.8.8, “Nested WHERE clauses (R1.2)” will explain how WHERE clauses can be nested inside optional path expressions.
There are two Boolean constants, TRUE and FALSE. The first one is simply always true, the last one is always false. The following query will never produce any results because the constraint in the where clause will never evaluate to true:
SELECT *
FROM {X} Y {Z}
WHERE FALSE
The most common boolean constraint is equality or inequality of values. Values can be compared using the operators "=" (equality) and "!=" (inequality). The expression
Var = <foo:bar>
is true if the variable Var contains the URI <foo:bar>, and the expression
Var1 != Var2
checks whether two variables are not equal.
Numbers can be compared to each other using the operators "<" (lower than), "<=" (lower than or equal to), ">" (greater than) and ">=" (greater than or equal to). SeRQL uses a literal's datatype to determine whether its value is numerical. All XML Schema built-in numerical datatypes are supported, i.e.: xsd:float, xsd:double, xsd:decimal and all subtypes of xsd:decimal (xsd:long, xsd:nonPositiveInteger, xsd:byte, etc.), where the prefix xsd is used to reference the XML Schema namespace.
In the following query, a comparison between values of type xsd:positiveInteger is used to retrieve all countries that have a population of less than 1 million:
SELECT Country
FROM {Country} ex:population {Population}
WHERE Population < "1000000"^^xsd:positiveInteger
USING NAMESPACE
ex = <http://example.org/things#>
SeRQL is currently restricted to numerical comparisons between values with identical datatypes. This means that e.g. xsd:int values cannot (yet) be compared to xsd:byte values.
If only one of the parameters of a comparison has a datatype, SeRQL will try to assign the other parameter the same datatype. This means that the above query can still be used when the population literals don't have any datatype. SeRQL will try to interpret the literal as a positive integer and compare it to the one million constant.
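The two rules above (identical datatypes only, and borrowing the datatype when one side has none) can be summarised in a short Python sketch. It is an illustration under assumptions, not Sesame's implementation: literals are modelled as (label, datatype) pairs, the function name is hypothetical, all numerical types are parsed as decimals, and an unsupported comparison raises an error because the text does not say how the engine reports that case.

```python
from decimal import Decimal


def less_than(left, right):
    """Compare two literals given as (label, datatype) pairs.

    The datatype is a string such as "xsd:positiveInteger", or None when
    the literal has no datatype.
    """
    (left_label, left_type), (right_label, right_type) = left, right
    # If only one side has a datatype, assign the same datatype to the other.
    left_type = left_type or right_type
    right_type = right_type or left_type
    if left_type is None:
        raise ValueError("neither operand has a numerical datatype")
    if left_type != right_type:
        # Assumption: signalled as an error in this sketch.
        raise ValueError("operands must have identical datatypes")
    # Simplification: range checks of the specific datatype are omitted.
    return Decimal(left_label) < Decimal(right_label)


million = ("1000000", "xsd:positiveInteger")
print(less_than(("999999", None), million))
print(less_than(("2000000", "xsd:positiveInteger"), million))
```

The first call compares an untyped population literal with the typed constant, which works because the untyped side borrows xsd:positiveInteger. Comparing an xsd:int value with an xsd:byte value raises an error in this sketch.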
The LIKE operator can check whether a value matches a specified pattern of characters. '*' characters can be used as wildcards, matching with zero or more characters. The rest of the characters are compared lexically. The pattern is surrounded with double quotes, just like a literal's label.
SELECT Country
FROM {Country} ex:name {Name}
WHERE Name LIKE "Belgium"
USING NAMESPACE
ex = <http://example.org/things#>
By default, the LIKE operator does a case-sensitive comparison: in the above query, the operator fails if the variable Name is bound to the value "belgium" instead of "Belgium". Optionally, one can specify that the operator should perform a case-insensitive comparison:
SELECT Country
FROM {Country} ex:name {Name}
WHERE Name LIKE "belgium" IGNORE CASE
USING NAMESPACE
ex = <http://example.org/things#>
In this query, the operator will succeed for "Belgium", "belgium", "BELGIUM", etc.
The '*' character can be used as a wildcard to indicate substring matches, for example:
SELECT Country
FROM {Country} ex:name {Name}
WHERE Name LIKE "*Netherlands"
USING NAMESPACE
ex = <http://example.org/things#>
This query will match any country names that end with the string "Netherlands", for example "The Netherlands".
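The matching behaviour described in this section can be approximated by translating a LIKE pattern into a regular expression. The Python function below is a sketch with a hypothetical name; it assumes that '*' is the only special character in a pattern and that the whole value must match.

```python
import re


def like(value, pattern, ignore_case=False):
    """Return True if `value` matches a LIKE pattern with '*' wildcards."""
    # Escape the literal fragments between wildcards and join them with
    # ".*", which matches zero or more arbitrary characters.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    return re.fullmatch(regex, value, flags) is not None


print(like("Belgium", "Belgium"))
print(like("belgium", "Belgium"))
print(like("BELGIUM", "belgium", ignore_case=True))
print(like("The Netherlands", "*Netherlands"))
```

The second call fails because the comparison is case-sensitive by default. The other three calls succeed.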
The isResource() and isLiteral() boolean functions check whether a variable contains a resource or a literal, respectively. For example:
SELECT *
FROM {R} rdfs:label {L}
WHERE isLiteral(L)
The isURI() and isBNode() boolean functions are more specific versions of isResource(). They check whether a variable is bound to a URI value or a BNode value, respectively. For example, the following query returns only URIs (and filters out all bNodes and literals):
SELECT V
FROM {R} prop {V}
WHERE isURI(V)
Boolean constraints and functions can be combined using the AND and OR operators, and negated using the NOT operator. The NOT operator has the highest precedence, then the AND operator, and finally the OR operator. Parentheses can be used to override the default precedence of these operators. The following query is a (somewhat artificial) example of this:
SELECT *
FROM {X} Prop {Y} rdfs:label {L}
WHERE NOT L LIKE "*FooBar*" AND
(Y = <foo:bar> OR Y = <bar:foo>) AND
isLiteral(L)
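Python's not, and and or operators happen to follow the same relative precedence as described above for NOT, AND and OR, so a few lines of Python can illustrate how an expression without parentheses is grouped. This sketch demonstrates Python's own operators only; it does not run SeRQL.

```python
from itertools import product


def implicit(a, b, c):
    # No parentheses: "not" binds tightest, then "and", then "or".
    return not a and b or c


def explicit(a, b, c):
    # The same expression with the default grouping written out.
    return ((not a) and b) or c


def regrouped(a, b, c):
    # Parentheses override the default grouping.
    return not a and (b or c)


# The implicit and explicit forms agree for every combination of values.
print(all(implicit(*v) == explicit(*v)
          for v in product([False, True], repeat=3)))
# The regrouped form can give a different answer.
print(implicit(True, False, True), regrouped(True, False, True))
```

The parentheses around the two comparisons of Y in the query above play the same role as those in regrouped: without them, AND would bind more tightly than OR and the constraint would be grouped differently.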