User Guide for Sesame

Updated for Sesame release 1.2.6

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in GNU Free Documentation License.


Table of Contents

Preface: Open, Sesame
1. Introduction: what is Sesame?
1.1. The Sesame library
1.2. The Sesame Server
1.3. Repositories and Inferencing
1.4. An Overview of the Sesame Architecture
2. Installing Sesame
2.1. Library installation
2.2. Server installation
2.2.1. Required software
2.2.2. Installation under Tomcat 4 or 5
2.2.3. Installation under Tomcat 3
2.2.4. Installation under Oracle Container for Java (OC4J)
2.3. Testing Your Installation
2.4. More on the RDBMS
2.4.1. Notes on PostgreSQL
2.4.2. Notes on MySQL
2.4.3. Notes on Oracle
3. Server administration
3.1. Changing the system configuration
3.2. Loading a system configuration
3.3. Storing a system configuration
3.4. Setting the admin password
3.5. Adding and removing user accounts
3.6. Configuring repositories
3.6.1. Editing an existing repository configuration
3.6.2. Adding new repositories
3.6.3. Removing repositories
4. Advanced repository configuration
4.1. Basic setup
4.1.1. The repository id and title
4.1.2. The Sail stack
4.2. Native Sail Indexing
4.3. Custom inferencing
4.3.1. XML syntax
4.3.2. Example
4.3.3. Configuration
4.3.4. Notes and Hints
4.4. Change Tracking
5. The web interface
5.1. Logging in
5.2. Adding data to a repository
6. The SeRQL query language (revision 1.2)
6.1. Revisions
6.1.1. revision 1.1
6.1.2. revision 1.2
6.2. Introduction
6.3. URIs, literals and variables
6.3.1. Variables
6.3.2. URIs
6.3.3. Literals
6.3.4. Blank Nodes (R1.2)
6.4. Path expressions
6.4.1. Basic path expressions
6.4.2. Path expression short cuts
6.4.3. Optional path expressions
6.5. Select- and construct queries
6.6. Select queries
6.7. Construct queries
6.8. The WHERE clause
6.8.1. Boolean constants
6.8.2. Value (in)equality
6.8.3. Numerical comparisons
6.8.4. The LIKE operator (R1.2)
6.8.5. isResource() and isLiteral()
6.8.6. isURI() and isBNode() (R1.2)
6.8.7. AND, OR, NOT
6.8.8. Nested WHERE clauses (R1.2)
6.9. Other functions
6.9.1. label(), lang() and datatype()
6.9.2. namespace() and localName() (R1.2)
6.10. The LIMIT and OFFSET clauses
6.11. The USING NAMESPACE clause
6.12. Built-in predicates
6.13. Set combinatory operations
6.13.1. UNION (R1.2)
6.13.2. INTERSECT (R1.2)
6.13.3. MINUS (R1.2)
6.14. NULL values
6.15. Query Nesting
6.15.1. The IN operator (R1.2)
6.15.2. ANY and ALL (R1.2)
6.15.3. EXISTS (R1.2)
6.16. Example SeRQL queries
6.16.1. Query 1
6.16.2. Query 2
6.16.3. Query 3
6.17. Comments/feedback
6.18. References
6.19. SeRQL grammar
7. The Sesame API
7.1. An Overview of the Sesame Architecture
7.2. The Repository API
7.2.1. Accessing a repository
7.2.2. Querying a repository
7.2.3. Adding RDF data to a repository
7.3. The Graph API
7.3.1. Creating an empty Graph and adding statements to it
7.3.2. Adding/removing a Graph to/from a repository
7.3.3. Creating a Graph for an existing repository
7.3.4. Creating a graph using graph queries
7.3.5. Using graphs and graph queries for updates
8. Communication protocols
8.1. Communicating over HTTP
8.1.1. Logging in
8.1.2. Logging out
8.1.3. Requesting a list of available repositories
8.1.4. Evaluating a SeRQL-select, RQL or RDQL query
8.1.5. Evaluating a SeRQL-construct query
8.1.6. Extracting RDF from a repository
8.1.7. Uploading data to a repository
8.1.8. Adding data from the web to a repository
8.1.9. Clearing a repository
8.1.10. Removing statements
9. Frequently Asked Questions
9.1. General Questions
9.1.1. I've got a Sesame-related question, where can I get an answer?
9.1.2. Something goes wrong when I use Sesame, what do I do?
9.1.3. How do I report a bug?
9.1.4. Why doesn't Sesame support $FEATURE?
9.1.5. I need $FEATURE right now!
9.1.6. Can you keep me informed of any Sesame-related news?
9.1.7. Is this user guide the only documentation for Sesame?
9.2. Troubleshooting
9.2.1. I get a "HTTP error 500" message in the toolbar frame
9.2.2. I get a "HTTP error 500" message in the main frame
9.2.3. I get "error while adding new triples: Invalid byte 2 of 3-byte UTF-8 sequence"
9.2.4. I get a warning: "Unable to set namespace prefix 'foo' for namespace ..."
9.2.5. My in-memory repository is very slow and/or runs out of memory
9.2.6. On upload I get an error "java.lang.IllegalStateException: Post too large"
9.2.7. Can not evaluate directSubClassOf on a non-inferencing repository
A. GNU Free Documentation License
A.1. PREAMBLE
A.2. APPLICABILITY AND DEFINITIONS
A.3. VERBATIM COPYING
A.4. COPYING IN QUANTITY
A.5. MODIFICATIONS
A.6. COMBINING DOCUMENTS
A.7. COLLECTIONS OF DOCUMENTS
A.8. AGGREGATION WITH INDEPENDENT WORKS
A.9. TRANSLATION
A.10. TERMINATION
A.11. FUTURE REVISIONS OF THIS LICENSE
A.12. How to use this License for your documents

List of Figures

1.1. Sesame Server
1.2. The Sesame architecture
3.1. Configure Sesame!
3.2. Loading the configuration from a running server
3.3. Send a configuration to a running server
3.4. The Server tab
3.5. The "Users" configuration tab
3.6. The "Repository" tab
3.7. The "Repository details" window
3.8. The "access rights" tab
5.1. The Sesame entry page
5.2. The Sesame login page
5.3. The Sesame entry page when logged in
5.4. The repository function screen
6.1. A basic path expression
6.2. Multi-value nodes
6.3. Multi-value nodes in a longer path expression
6.4. Branches in a path expression
6.5. Branches in a longer path expression
6.6. A reification path expression
6.7. Path expression for query 1
6.8. Path expression for query 2
6.9. Path expression for query 3
7.1. The Sesame architecture

List of Tables

6.1. Default namespaces
8.1. Parameters for login
8.2. Parameters for evaluateTableQuery
8.3. Response formats for SeRQL-select, RQL and RDQL queries
8.4. Parameters for evaluateGraphQuery
8.5. RDF encodings for SeRQL-construct queries
8.6. Parameters for extractRDF
8.7. Parameters for uploadData
8.8. Parameters for uploadData
8.9. Parameters for clearRepository
8.10. Parameters for removeStatements

Preface: Open, Sesame

When they were out of sight Ali Baba came down, and, going up to the rock, said, "Open, Sesame." The door at once opened, and Ali Baba, entering, found himself in a large cave, lighted from a hole in the top, and full of all kinds of treasure--rich silks and carpets, gold and silver ware, and great bags of money. He loaded his three asses with as many of the bags of gold as they could carry; and, after closing the door by saying, "Shut, Sesame," made his way home.

--Tales of 1001 Nights

In February 2000 the European IST project On-To-Knowledge kicked off. The goal of this project was to provide tools and a methodology for “content-driven knowledge management through evolving ontologies”.

In this project, the Dutch company Aduna (then known as Aidministrator Nederland b.v.) developed Sesame. Sesame fulfills the role of storage and retrieval middleware for ontologies and metadata expressed in RDF and RDF Schema. Another tool developed in On-To-Knowledge is OMM, the Ontology Middleware Module, which was developed by OntoText. OMM is an extension of Sesame that adds features such as change tracking and improved security.

Currently, Sesame and OMM are being further developed as an open source software product, by Aduna in cooperation with and partially funded by the NLNet Foundation, and by OntoText. The goal is to provide a stable, efficient and scalable middleware platform for storing, retrieving, manipulating and managing ontologies and metadata stored in RDF, RDF Schema and more expressive languages like OWL.

We aren't there yet, but it's looking good, we hope. This document is here to provide you, the Sesame user, with helpful information on how to deploy Sesame in various contexts, such as a database add-on in a client-server setting, or as a Java library to add functionality to stand-alone applications.

We hope this document will get you started, and of course we hope that you find Sesame easy to use and, well, good. Being an open source product in development also means that we are very keen on receiving feedback from our users. If you have questions, comments, if you think something is wrong with Sesame, or you have a good idea on how to improve it, please let us know. Contact us through the forums and/or issue tracker that are available on the Sesame website: www.openrdf.org.

We wish to conclude with a big thank you to all of you who have been (and indeed still are) supportive of this project, in particular Teus Hagen, Wytze van der Raay, Frank van Harmelen, Andy Seaborne, Peter Mika and Jacco van Ossenbruggen. Special thanks go to Holger Lausen for providing the Oracle implementation of the RDF Sail.

The Sesame and OMM development teams.

Chapter 1. Introduction: what is Sesame?

Sesame is an open source Java framework for storing, querying and reasoning with RDF and RDF Schema. It can be used as a database for RDF and RDF Schema, or as a Java library for applications that need to work with RDF internally. For example, suppose you need to read a big RDF file, find the relevant information for your application, and use that information. Sesame provides you with the necessary tools to parse, interpret, query and store all this information, embedded in your own application if you want, or, if you prefer, in a separate database or even on a remote server. More generally: Sesame provides application developers with a toolbox that contains useful hammers, screwdrivers etc. for doing 'Do-It-Yourself' with RDF.

In the next sections, we will take a closer look at Sesame.

1.1. The Sesame library

The Sesame library consists of a set of Java archives:

  • sesame.jar. The Sesame core classes.
  • rio.jar. Rio (RDF I/O) is a set of parsers and writers for different RDF serialization formats (RDF/XML, Turtle, N-Triples).
  • openrdf-model.jar. Shared interfaces and classes for the RDF model.
  • openrdf-util.jar. Shared utility classes.

These archives (which are located in the lib/ directory) contain Java classes ready for use in your own application. In Chapter 7, The Sesame API, you can find instructions and examples on how to use Sesame in your own code: how to do queries, how to add and remove data, etc.

1.2. The Sesame Server

Sesame can be used as a Server with which client applications (or human users) can communicate over HTTP (see Figure 1.1, “Sesame Server”). Sesame can be deployed as a Java Servlet Application in Apache Tomcat, a webserver that supports Java Servlets and JSP technology.

Figure 1.1. Sesame Server

In Chapter 2, Installing Sesame, you will find detailed information on how to install Sesame as a server.

1.3. Repositories and Inferencing

A central concept in the Sesame framework is the repository. A repository is a storage container for RDF. This can simply mean a Java object (or set of Java objects) in memory, or it can mean a relational database. Whatever way of storage is chosen however, it is important to realize that almost every operation in Sesame happens with respect to a repository: when you add RDF data, you add it to a repository. When you do a query, you query a particular repository.

Sesame, as mentioned, supports RDF Schema inferencing. This means that given a set of RDF and/or RDF Schema, Sesame can find the implicit information in the data. Sesame supports this by simply adding all implicit information to the repository as well when data is being added.
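The idea of adding implicit information at the moment data is added can be illustrated with a toy example. The sketch below is not Sesame's implementation and does not use Sesame's API: the class ToyInferencingStore and its methods are hypothetical names, statements are plain string triples, and only two RDF Schema entailments are covered (transitivity of rdfs:subClassOf, and propagation of rdf:type along rdfs:subClassOf).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy store (hypothetical, not part of Sesame) that materializes inferred statements on add. */
final class ToyInferencingStore {
    static final String TYPE = "rdf:type";
    static final String SUBCLASS_OF = "rdfs:subClassOf";

    // Each statement is a [subject, predicate, object] list.
    private final Set<List<String>> statements = new LinkedHashSet<>();

    /** Adds one explicit statement and then all statements that follow from it. */
    void add(String subject, String predicate, String object) {
        statements.add(Arrays.asList(subject, predicate, object));
        materialize();
    }

    boolean contains(String subject, String predicate, String object) {
        return statements.contains(Arrays.asList(subject, predicate, object));
    }

    int size() {
        return statements.size();
    }

    /** Applies the two rules repeatedly until no new statement is produced. */
    private void materialize() {
        boolean changed = true;
        while (changed) {
            changed = false;
            List<List<String>> snapshot = new ArrayList<>(statements);
            for (List<String> a : snapshot) {
                boolean relevant = a.get(1).equals(TYPE) || a.get(1).equals(SUBCLASS_OF);
                if (!relevant) {
                    continue;
                }
                for (List<String> b : snapshot) {
                    // a = (x, p, C) and b = (C, rdfs:subClassOf, D) imply (x, p, D)
                    if (b.get(1).equals(SUBCLASS_OF) && a.get(2).equals(b.get(0))) {
                        changed |= statements.add(Arrays.asList(a.get(0), a.get(1), b.get(2)));
                    }
                }
            }
        }
    }

    public static void main(String[] args) {
        ToyInferencingStore store = new ToyInferencingStore();
        store.add("ex:Painter", SUBCLASS_OF, "ex:Artist");
        store.add("ex:Artist", SUBCLASS_OF, "ex:Person");
        store.add("ex:picasso", TYPE, "ex:Painter");
        // The implicit statement is now stored alongside the explicit ones.
        System.out.println(store.contains("ex:picasso", TYPE, "ex:Person"));
    }
}
```

Because the implicit statements are stored when data is added, a later query only has to look up statements; the price is extra work and storage at add time.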

It is important to realize that inferencing in Sesame is associated with the type of repository that you use. Sesame supports several different types of repositories (see Chapter 4, Advanced repository configuration for details). Some of these support inferencing, others do not. Whether you want Sesame to do inferencing for you is a choice that depends very much on your application.

1.4. An Overview of the Sesame Architecture

In Figure 1.2, “The Sesame architecture” an overview of Sesame's overall architecture is given.

Figure 1.2. The Sesame architecture

Starting at the bottom, the Storage And Inference Layer, or SAIL API, is an internal Sesame API that abstracts from the storage format used (i.e. whether the data is stored in an RDBMS, in memory, or in files, for example), and provides reasoning support. SAIL implementations can also be stacked on top of each other, to provide functionality such as caching or concurrent access handling. Each Sesame repository has its own SAIL object to represent it.

On top of the SAIL, we find Sesame's functional modules, such as the SeRQL, RQL and RDQL query engines, the admin module, and RDF export. Access to these functional modules is available through Sesame's Access APIs, consisting of two separate parts: the Repository API and the Graph API. The Repository API provides high-level access to Sesame repositories, such as querying, storing of RDF files, extracting RDF, etc. The Graph API provides more fine-grained support for RDF manipulation, such as adding and removing individual statements, and creation of small RDF models directly from code. The two APIs complement each other in functionality, and are in practice often used together.

The Access APIs provide direct access to Sesame's functional modules, either to a client program (for example, a desktop application that uses Sesame as a library), or to the next component of Sesame's architecture, the Sesame server. This is a component that provides HTTP-based access to Sesame's APIs. Then, on the remote HTTP client side, we again find the access APIs, which can again be used for communicating with Sesame, this time not as a library, but as a server running on a remote location.

While each part of the Sesame code is publicly available and extensible, most developers will be primarily interested in the Access APIs, for communicating with a Sesame RDF model or a Sesame repository from their application. In Chapter 7, The Sesame API, these APIs are described in more detail, through several code examples.

Chapter 2. Installing Sesame

Sesame can be deployed in several ways. The two most common scenarios are deployment as a Java library and deployment as a server. In this chapter, both installation scenarios are explained.

2.1. Library installation

To use Sesame as a library in a Java application, you need the Sesame jar files. These are:

  • sesame.jar. The Sesame core classes.
  • rio.jar. Rio (RDF I/O) is a set of parsers and writers for different RDF serialization formats (RDF/XML, Turtle, N-Triples).
  • openrdf-model.jar. Shared interfaces and classes for the RDF model.
  • openrdf-util.jar. Shared utility classes.

These files can be found in the lib directory of the binary download. Simply including them in your classpath will allow you to use the functionality of Sesame in your own Java application.
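As a small illustration of what "including them in your classpath" amounts to, the following sketch assembles a classpath string from the four jar files listed above. The class name SesameClasspath and the example directory name are hypothetical; only the jar names come from this guide. The sketch assumes a recent JDK for convenience.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that builds a classpath for the Sesame library jars. */
final class SesameClasspath {
    // The jar files named in this section.
    static final List<String> JARS = Arrays.asList(
            "sesame.jar", "rio.jar", "openrdf-model.jar", "openrdf-util.jar");

    /** Joins the jar paths with the platform's path separator (':' or ';'). */
    static String build(String libDir) {
        List<String> entries = new ArrayList<>();
        for (String jar : JARS) {
            entries.add(libDir + File.separator + jar);
        }
        return String.join(File.pathSeparator, entries);
    }

    /** Lists the jars that are not present in the given directory. */
    static List<String> missing(String libDir) {
        List<String> result = new ArrayList<>();
        for (String jar : JARS) {
            if (!new File(libDir, jar).isFile()) {
                result.add(jar);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String libDir = args.length > 0 ? args[0] : "lib";
        System.out.println("Missing jars: " + missing(libDir));
        System.out.println("java -cp " + build(libDir) + File.pathSeparator + ". YourApplication");
    }
}
```

The printed command line is only a template: YourApplication stands for the main class of your own program.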

Sesame requires Java 2, version 1.4 or newer, to function properly.

If you intend to use client/server communication over HTTP, then you will additionally need the following third-party libraries:

These third-party libraries can be found in the ext/ directory of the source distribution of Sesame, and are also included in the library directory of the Sesame Web Archive (sesame.war). More information about these libraries, including license information, can be found in the file doc/thirdparty.txt.

For more information on how to use Sesame as a library, see Chapter 7, The Sesame API.

2.2. Server installation

2.2.1. Required software

The Sesame server requires the following software:

  • Sesame itself
  • A Java servlet container (running Java 2, version 1.4 or newer) for running the Sesame servlets

Sesame should be able to run on any Java servlet container that supports the Servlet 2.2 and JSP 1.1 specifications, or newer. So far, it has been tested with Tomcat and, in the case of Oracle, with OC4J. It has also been reported to work without problems on BEA WebLogic 8.1 SP2 and Jetty. Please post any reports about compatibility with other servlet containers to our forums.

Sesame has several options for storage of RDF data: it can store data in main-memory (optionally with a dump to file for persistence), it can store data on disk in a dedicated file structure, or it can store data in a relational database. See section More on the RDBMS for more info about the last option.

2.2.2. Installation under Tomcat 4 or 5

The following steps describe the easiest procedure to install Sesame on Tomcat 4.x or 5.x. The described procedure doesn't require any reconfiguration of Tomcat itself, but it might not be the best option for you. Please see the documentation that came with your copy of Tomcat if you want more fine-grained control over where and how Sesame should be installed.

  1. Install Tomcat. This usually consists of downloading the binary Tomcat distribution from http://jakarta.apache.org/tomcat and installing it in an appropriate location on your disk (we will refer to this location as [TOMCAT_DIR] here). Please see the Tomcat documentation for more information on how to get it up and running.
  2. Go to the web applications directory ([TOMCAT_DIR]/webapps/ by default) and create a directory 'sesame' there.
  3. Extract the sesame.war file (which can be found in the lib directory of the binary Sesame distribution) to the newly created 'sesame' directory. You can do this on the command line by changing directories to the new 'sesame' directory and executing the command jar -xf [PATH/TO/]sesame.war. You can also use a program like WinZip or unzip to extract the archive. We will refer to the directory where you have extracted the sesame.war file as [SESAME_DIR] in the rest of this document.
  4. In case you are planning to use a database with Sesame, copy the appropriate JDBC-driver file(s) to the directory [SESAME_DIR]/WEB-INF/lib/.
  5. Copy the file [SESAME_DIR]/WEB-INF/system.conf.example to [SESAME_DIR]/WEB-INF/system.conf. NOTE: only do this for a fresh install of Sesame. If you are upgrading, you will already have a config file, and copying the example over it will destroy your existing configuration. The example file contains some repository entries for different Sails and databases, and one user account. The file may require some modifications in order to work on your machine. Please check out Server administration if you want to learn how to do this.
  6. (Re)start your Tomcat server and Sesame should now be up and running. You can access the Sesame web interface at http://[MACHINE_NAME]:8080/sesame.
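For those who prefer to script steps 3 and 5, the following is a minimal sketch using only the standard Java library. The class SesameWarInstaller and its method names are hypothetical and not part of the Sesame distribution. It unpacks the web archive much like jar -xf would, and copies the example configuration only when no system.conf exists yet, so that an existing configuration is not destroyed on upgrade.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical installer sketch; not shipped with Sesame. */
final class SesameWarInstaller {

    /** Unpacks a WAR file (a ZIP archive) into targetDir. */
    static void extractWar(Path war, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        Files.createDirectories(root);
        try (ZipInputStream zip = new ZipInputStream(Files.newInputStream(war))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path out = root.resolve(entry.getName()).normalize();
                // Refuse entries that would be written outside the target directory.
                if (!out.startsWith(root)) {
                    throw new IOException("Entry escapes target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    Files.copy(zip, out, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    /** Copies system.conf.example to system.conf; returns false if a config already exists. */
    static boolean installExampleConfig(Path sesameDir) throws IOException {
        Path webInf = sesameDir.resolve("WEB-INF");
        Path conf = webInf.resolve("system.conf");
        if (Files.exists(conf)) {
            return false; // upgrade: keep the existing configuration
        }
        Files.copy(webInf.resolve("system.conf.example"), conf);
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: SesameWarInstaller <path/to/sesame.war> <SESAME_DIR>");
            return;
        }
        Path sesameDir = Paths.get(args[1]);
        extractWar(Paths.get(args[0]), sesameDir);
        boolean copied = installExampleConfig(sesameDir);
        System.out.println(copied ? "Installed example system.conf" : "Kept existing system.conf");
    }
}
```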

2.2.3. Installation under Tomcat 3

Installation of Sesame under Tomcat 3 is almost identical to the procedure described above. It requires one additional step:

  1. Perform the steps described in Installation under Tomcat 4 or 5.
  2. Copy the file [SESAME_DIR]/WEB-INF/lib/web.xml-2.2 over the existing web.xml file and (re)start your Tomcat server.

2.2.4. Installation under Oracle Container for Java (OC4J)

Sesame has been tested with OC4J v9.0.3.0. The installation procedure differs from the standard installation procedure.

  • Create a directory [OC4J_HOME]/j2ee/home/applications/sesame and extract the sesame.war file there. We will refer to the directory where you have installed Sesame as [SESAME_DIR] in the rest of this document.
  • Add the following line to [OC4J_HOME]/j2ee/home/config/application.xml:
    <web-module id="sesame" path="../../home/applications/sesame"/>
  • Add the following line to [OC4J_HOME]/j2ee/home/config/http-web-site.xml:
    <web-app application="default" name="sesame" root="/sesame"/>

2.3. Testing Your Installation

If you have followed the installation instructions described in the previous section, the Sesame server should now be up and running. Pointing a browser to the location where you have installed Sesame (e.g. http://[MACHINE_NAME]:8080/sesame/ if you have installed Sesame under Tomcat as described in this document) should now display the Sesame web interface.
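The same check can be made without a browser. The sketch below, which uses only the standard Java library, requests the web interface URL and reports the HTTP status code. The class name SesameSmokeTest is hypothetical, and the default host name and port are assumptions that match the Tomcat setup described in this document.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical smoke test for a freshly installed Sesame server. */
final class SesameSmokeTest {

    /** Builds the web interface URL used in this guide for a Tomcat installation. */
    static String webInterfaceUrl(String machineName, int port) {
        return "http://" + machineName + ":" + port + "/sesame/";
    }

    /** Sends a GET request and returns the HTTP status code. */
    static int httpStatus(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        String machine = args.length > 0 ? args[0] : "localhost"; // assumed default
        String url = webInterfaceUrl(machine, 8080);
        try {
            System.out.println(url + " answered with HTTP status " + httpStatus(url));
        } catch (IOException e) {
            System.out.println("Could not reach " + url + ": " + e.getMessage());
        }
    }
}
```

A status of 200 only shows that the servlet container serves the web interface; the upload and extract steps described in this section are still needed to verify the servlets and the RDBMS connection.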

You should also test whether the Sesame servlets are running correctly, and whether Sesame can talk to the RDBMS (if applicable). Select one of the repositories that you have configured and press 'Go>>'. Click on the 'Add (www)' link in the toolbar at the top of the window, and try to upload the test.rdf file from Sesame's admin directory (e.g. http://[MACHINE_NAME]:8080/sesame/admin/test.rdf). You can do this by typing this URL in the first text field and by clicking on the 'Add data' button after that.

If the data-upload was successful, you should also be able to extract the uploaded data. Click on the 'Extract' link in the toolbar and press the 'Extract' button. This should yield an RDF document describing all classes and properties in the repository.

Sesame has been successfully installed if all of this works. You can remove or reconfigure the test user account and repository if you want. If you haven't done so already, you can take a look at the next chapter for information on how to add and remove user accounts and repositories to/from Sesame.

2.4. More on the RDBMS

Sesame has an RDBMS-Sail that uses an RDBMS for storing RDF data. Currently, this Sail supports PostgreSQL, MySQL, Microsoft SQL Server and Oracle databases. The first two RDBMS's are open source and are freely available on the Internet. Please check the documentation delivered with these databases for any questions on how to get them installed.

Sesame's RDBMS Sail has been extensively tested in combination with MySQL and PostgreSQL. The RDBMS Sail used to perform much better in combination with MySQL than with PostgreSQL, but recent versions of PostgreSQL have shown major performance increases. These two RDBMS's now have comparable performance, with PostgreSQL having an edge over MySQL where it concerns the evaluation of complex queries. The SQL Server and Oracle support are third-party contributions and we have no comparative data on their performance.

2.4.1. Notes on PostgreSQL

  • Recent versions of PostgreSQL are a lot faster than the 7.1 version that Sesame was originally developed against. The most recent tests have been done using version 8.1.4. If possible, you should use the newest stable version of PostgreSQL that is available.
  • JDBC-drivers for PostgreSQL can be found at http://jdbc.postgresql.org, but these also come bundled with (some of) the PostgreSQL installation packages.
  • Make sure that the PostgreSQL server is running with the TCP/IP connections enabled. If TCP/IP is not enabled, the JDBC-driver will not be able to talk to the server.
  • pgAdmin is an excellent tool to administer a PostgreSQL server (for those who don't mind using GUIs).
  • The RDBMS Sail will need an (empty) database on the PostgreSQL server, as well as a user account that has access to (or owns) that database. You need to create these manually, for example by using the pgAdmin tool mentioned above. In most cases, the encoding of the new database should be set to 'UTF8'. The RDBMS Sail will take care of creating the required tables and indexes when first run. We'll assume that the user account is called 'sesame' in the rest of this document.
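Before pointing Sesame at the new database, it can help to verify that the JDBC driver, the TCP/IP setting and the user account work together. The following is a minimal sketch with the standard java.sql API. The class name, the database name 'sesame' and the password are assumptions for illustration; the user account 'sesame' follows the convention used in this document, and the PostgreSQL JDBC driver must be on the classpath for the connection attempt to succeed.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

/** Hypothetical connectivity check; not part of Sesame. */
final class PostgresConnectionCheck {

    /** Builds a PostgreSQL JDBC URL of the form jdbc:postgresql://host:port/database. */
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:postgresql://" + host + ":" + port + "/" + database;
    }

    /** Returns true if a connection can be opened and validated within five seconds. */
    static boolean canConnect(String url, String user, String password) {
        try (Connection conn = DriverManager.getConnection(url, user, password)) {
            return conn.isValid(5);
        } catch (SQLException e) {
            // Typical causes: driver not on the classpath, TCP/IP disabled, wrong credentials.
            System.err.println("Connection failed: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // Database name and password are placeholders; 5432 is PostgreSQL's default port.
        String url = jdbcUrl("localhost", 5432, "sesame");
        System.out.println(url + " reachable: " + canConnect(url, "sesame", "secret"));
    }
}
```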

2.4.2. Notes on MySQL

  • Sesame tries to use the character set features that were introduced in MySQL 4.1 to properly handle non-ASCII characters. Sesame will automatically detect whether these features are available and will fall back to using BLOBs when an older version of MySQL is used. In that case, Sesame will not be able to properly handle non-ASCII characters in literals and namespace names.
  • Sesame's RDBMS Sail makes extensive use of the TRUNCATE command. Unfortunately, this command is mapped to the much slower DELETE command for InnoDB tables in pre-MySQL 5.0.3 releases. To achieve reasonable performance one should use MySQL 5.0.3 or newer, or configure MySQL to use the older MyISAM tables instead of InnoDB tables.
  • Several people have reported problems with 4.x and early 5.0.x versions of MySQL, especially with versions that came pre-installed with Linux distributions. So far, all these problems were resolved by installing the most recent stable version of MySQL and its JDBC-driver. Please try this before reporting problems when using unstable/beta releases.
  • MySQL Administrator is a great tool to administer a MySQL server (for those who don't mind using GUIs).
  • The RDBMS Sail will need an (empty) database on the MySQL server, as well as a user account that has access to that database. You need to create these manually, for example by using the MySQL Administrator tool mentioned above. In most cases, the encoding of the new database should be set to 'UTF8'. The RDBMS Sail will take care of creating the required tables and indexes when first run. We'll assume that the user account is called 'sesame' in the rest of this document.
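The version thresholds mentioned in these notes (4.1 for the character set features, 5.0.3 for a fast TRUNCATE on InnoDB tables) can be checked with a few lines of code. The sketch below is a hypothetical helper, not something Sesame provides. It assumes you have obtained the server's version string yourself, for example from the SQL statement SELECT VERSION().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that compares a MySQL version string against the thresholds above. */
final class MySqlVersionCheck {
    // Matches the leading "major.minor[.patch]" part, ignoring suffixes such as "-log".
    private static final Pattern VERSION = Pattern.compile("^(\\d+)\\.(\\d+)(?:\\.(\\d+))?");

    static int[] parse(String version) {
        Matcher m = VERSION.matcher(version.trim());
        if (!m.find()) {
            throw new IllegalArgumentException("Unrecognized version: " + version);
        }
        int patch = m.group(3) == null ? 0 : Integer.parseInt(m.group(3));
        return new int[] {Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)), patch};
    }

    static boolean isAtLeast(String version, int major, int minor, int patch) {
        int[] v = parse(version);
        if (v[0] != major) {
            return v[0] > major;
        }
        if (v[1] != minor) {
            return v[1] > minor;
        }
        return v[2] >= patch;
    }

    /** Character set features were introduced in MySQL 4.1. */
    static boolean hasCharsetSupport(String version) {
        return isAtLeast(version, 4, 1, 0);
    }

    /** From 5.0.3 on, TRUNCATE on InnoDB tables is no longer mapped to DELETE. */
    static boolean hasFastInnoDbTruncate(String version) {
        return isAtLeast(version, 5, 0, 3);
    }

    public static void main(String[] args) {
        String version = args.length > 0 ? args[0] : "4.1.22-log";
        System.out.println("Character set support: " + hasCharsetSupport(version));
        System.out.println("Fast InnoDB TRUNCATE:  " + hasFastInnoDbTruncate(version));
    }
}
```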

2.4.3. Notes on Oracle

Note: We do not have first-hand experience with using the RDBMS Sail on Oracle. The comments below are based on feedback that we have received from users.

  • The RDBMS Sail has been tested with Oracle 9i and newer. Oracle 8i does not work due to lack of support for left (outer) joins. Oracle 9.2.0.1.0 has a bug affecting ANSI-style left (outer) joins which makes it incompatible with Sesame. This bug has been fixed in version 9.2.0.4.0.
  • The Oracle JDBC driver can be found at [ORACLE_HOME]/jdbc/lib/classes12.jar or at http://otn.oracle.com/.
  • You will have to create a user with appropriate rights (resource + connect).
  • If ORA-01450 (maximum key length (...) exceeded) is raised during creation of your DB schema, the cause might be your DB configuration. You have the following options:
    • Increase db_block_size in your init.ora and create a new database to allow larger index columns (refer to Oracle Note:136158.1).
    • Edit Oracle.java in package org.openrdf.sesame.sailimpl.rdbms and alter the size of the Datatype NAME and LABEL. The new size can be calculated from the error message, e.g.: "ORA-01450 maximum key length (3166) exceeded at varchar(3157)" (length-9 for block overhead).
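The calculation in the second option can be written out as a short sketch. The helper below is hypothetical and not part of Sesame: it extracts the maximum key length from the error message and subtracts the 9 bytes of block overhead mentioned above. The message format is assumed to match the example given in this section.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that derives a usable column size from an ORA-01450 message. */
final class OracleKeyLength {
    // Block overhead as stated in this guide.
    static final int BLOCK_OVERHEAD = 9;

    // Captures the first parenthesized number after the error code.
    private static final Pattern MAX_KEY = Pattern.compile("ORA-01450.*?\\((\\d+)\\)");

    /** Returns the maximum key length from the message minus the block overhead. */
    static int maxColumnSize(String errorMessage) {
        Matcher m = MAX_KEY.matcher(errorMessage);
        if (!m.find()) {
            throw new IllegalArgumentException("Not an ORA-01450 message: " + errorMessage);
        }
        return Integer.parseInt(m.group(1)) - BLOCK_OVERHEAD;
    }

    public static void main(String[] args) {
        String message = "ORA-01450 maximum key length (3166) exceeded";
        System.out.println(maxColumnSize(message)); // prints 3157
    }
}
```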

Chapter 3. Server administration

3.1. Changing the system configuration

Sesame's configuration is specified in the file [SESAME_DIR]/WEB-INF/system.conf. You can edit this configuration file locally on your Sesame server using the Configure Sesame! tool available in [SESAME_DIR]/WEB-INF/bin/. You can start the tool with the command configSesame.bat (on Windows) or configSesame.sh (on UNIX; note that on UNIX you may have to give the file executable permissions first). Alternatively, you can download Configure Sesame! as a standalone tool from the Sesame project website, and install it on a remote client machine to configure your Sesame server over HTTP.

It is possible to edit the system configuration manually with an XML or text editor, but we do not recommend this.

3.2. Loading a system configuration

Figure 3.1. Configure Sesame!

When you have started Configure Sesame! (Figure 3.1, “Configure Sesame!”), load the file system.conf. If you want to configure a running Sesame server, you can do this by loading the file directly from the server (menu option [File]-->[Load from server...], see Figure 3.2, “Loading the configuration from a running server”). If the server is not running, you can open the file from disk (menu option [File]-->[Open file...]).

Figure 3.2. Loading the configuration from a running server

If this is a fresh installation of Sesame, the admin password is still the default password. In this case, you will need to enter "admin" as the Current admin password to load the configuration. If this is not a fresh installation of Sesame you will need to enter the admin password that you have specified earlier.

3.3. Storing a system configuration

When you have modified the configuration, you can store it on disk (menu option [File]-->[Save file as...]), or you can send it directly to a running Sesame server (menu option [File]-->[Send to server...], see Figure 3.3, “Send a configuration to a running server”). Notice that if you use the "Send to Server" option, the changes to the configuration will be applied to the running server immediately. You do not need to restart or refresh the server separately.

Figure 3.3. Send a configuration to a running server

3.4. Setting the admin password

If you have not changed the admin password before, it's probably a good idea to set one right now. To set the admin password, go to the server tab (Figure 3.4, “The Server tab”) and fill in an admin password.

Figure 3.4. The Server tab

3.5. Adding and removing user accounts

To add a user account to Sesame, perform the following steps:

  1. Open the Tab "Users" (Figure 3.5, “The "Users" configuration tab”). You see an overview of currently configured users.

    Figure 3.5. The "Users" configuration tab

  2. Click the "Add user" icon on the bottom of the window. A new entry is added to the overview.

  3. Enter the credentials for the new user, hitting Enter to go to the next column.

To remove a user account, simply select the user you wish to remove and click the "Remove user" button in the bottom right of the window.

3.6. Configuring repositories

You can configure existing repositories, add new ones and of course remove repository configurations using Configure Sesame!.

In this section, we introduce the basic use of Configure Sesame! In Chapter 4, Advanced repository configuration we go into the details of repository config parameters.

3.6.1. Editing an existing repository configuration

  1. Open the "Repositories" tab (Figure 3.6, “The "Repository" tab”). Select the repository you wish to configure and click the "Repository details" button on the bottom left.

    Figure 3.6. The "Repository" tab

  2. In the Repository details window (Figure 3.7, “The "Repository details" window”), you can edit several parameters of your repository. The top part of the window shows the Sail stack, the bottom part shows the parameters of the selected Sail in the stack. In most cases, you will only want to edit the parameters of the bottom Sail in the stack (for more information about the Sail stack and configuration parameters, see Chapter 4, Advanced repository configuration).

    Figure 3.7. The "Repository details" window

The Repository details window also allows you to change the access rights of users to a repository. To change the access rights, click the "Access rights" tab, where you can edit the rights of existing users or add new users (Figure 3.8, “The "access rights" tab”). The entry anonymous is a special entry that represents users who do not log in. Giving this entry access rights means that anybody can access the repository.

Figure 3.8. The "access rights" tab

3.6.2. Adding new repositories

The easiest way to add a new repository is to "clone" an existing repository. Use the "Clone" button on the repository tab (Figure 3.6, “The "Repository" tab”) to do this. This creates a copy of the currently selected repository configuration, which you can then edit. Take special care to change details such as the jdbcUrl or file attributes of the Sail (see Chapter 4, Advanced repository configuration).

Alternatively, you can add a new repository by using the "Add" button and filling in all the information by yourself.

3.6.3. Removing repositories

To remove a repository, simply open the repository tab (Figure 3.6, “The "Repository" tab”), select a repository and click the "Remove" button on the bottom right.

Chapter 4. Advanced repository configuration

When setting up a repository in Sesame, you can make a number of choices: should the repository support versioning or security, or should it be as fast as possible? What database will it use, or will it be in-memory?

In this chapter, we look at several of these configuration options in more detail.

4.1. Basic setup

The setup for each Sesame repository is configured using Configure Sesame!. As we have already seen in Server administration, this configuration tool allows tweaking of numerous parameters, which we will discuss in more detail here.

4.1.1. The repository id and title

In the repository tab (Figure 3.6, “The "Repository" tab”), the repository id and title are declared. The id is how the repository will be known by Sesame: all client access will need to use this identifier.

The title is for human convenience and can be used to give a short description of the repository's purpose. Clients such as the web interface use it to represent the repository to the end user.

4.1.2. The Sail stack

The most important part of the repository configuration is the sail stack, which can be found in the "repository details" screen (Figure 3.7, “The "Repository details" window”). Here, you configure where the actual repository storage resides, whether or not inferencing, security and versioning, etc. should be used, and what additional options are needed.

The sail stack is represented top-to-bottom. In the example, we see two sail declarations: org.openrdf.sesame.sailimpl.sync.SyncRdfSchemaRepository and org.openrdf.sesame.sailimpl.rdbms.RdfSchemaRepository. The first sail is stacked on top of the second one (which means that it operates by calling methods on the Sail underneath it). The second sail is the base sail: it is the lowest of the stack and does not operate on another sail, but directly on the actual data source. In this example, the base sail is an RDF Schema-aware driver for a relational database that supports (currently) MySQL (3.23.47 and higher), PostgreSQL (7.0.2 and higher) and Oracle 9i.

The SyncRdfSchemaRepository is optional, but we strongly recommend using it. This Sail handles concurrent access issues; without it, Sesame would behave unpredictably when several users access the repository simultaneously.
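
The stacking idea can be illustrated with a small sketch. The classes below are hypothetical and written in Python for brevity; they are not Sesame's Java Sail API. The top layer serializes access with a lock and delegates every call to the layer underneath, which is the only one that touches the data.

```python
import threading


class MemoryBaseSail:
    """Hypothetical base layer: operates directly on the data (here, a set)."""

    def __init__(self):
        self._statements = set()

    def add_statement(self, subj, pred, obj):
        self._statements.add((subj, pred, obj))

    def get_statements(self):
        return set(self._statements)


class SyncSail:
    """Hypothetical stacked layer: guards every call with a lock, then
    delegates to the Sail underneath it."""

    def __init__(self, delegate):
        self._delegate = delegate
        self._lock = threading.Lock()

    def add_statement(self, subj, pred, obj):
        with self._lock:
            self._delegate.add_statement(subj, pred, obj)

    def get_statements(self):
        with self._lock:
            return self._delegate.get_statements()


# Build the stack top-to-bottom: the sync layer sits on top of the base layer.
stack = SyncSail(MemoryBaseSail())
stack.add_statement("ex:a", "rdf:type", "ex:B")
```

Clients only ever talk to the top of the stack, so a layer can be added or removed without changing the layers below it.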

Other base sails to choose from include:

  • org.openrdf.sesame.sailimpl.rdbms.RdfRepository: a non-inferencing driver for relational database storage.
  • org.openrdf.sesame.sailimpl.omm.versioning.VersioningRdbmsSail: an inferencing driver for relational database storage that supports change tracking.
  • org.openrdf.sesame.sailimpl.memory.RdfRepository: a non-inferencing driver for storage in main memory.
  • org.openrdf.sesame.sailimpl.memory.RdfSchemaRepository: an inferencing driver for storage in main memory that supports RDF and RDF Schema entailment.
  • org.openrdf.sesame.sailimpl.nativerdf.NativeRdfRepository: a non-inferencing driver for storage directly on disk.

All base sails that work on relational databases need a number of parameters to function:

  • jdbcDriver identifies the JDBC (Java Data Base Connectivity) driver that is to be used to access the database. In the example, com.mysql.jdbc.Driver, the standard MySQL JDBC driver, is used.
  • jdbcUrl identifies the location of the database through a URL. The precise syntax of this URL is DBMS-dependent. An example URL for a MySQL database would be jdbc:mysql://localhost:3306/testdb. This specifies a database named testdb on a MySQL server running on localhost, which uses port 3306 for communication. The last part of the URL identifies the name of the database (in this case testdb). Note that this is the name of the database as it is known to the DBMS, and that it is not related to the Sesame repository id (though it might be convenient to assign them identical names).
  • user identifies a username with which Sesame can access the database. This must therefore be a user which is known to the DBMS, and which has been granted access rights (see also Server administration).
  • password identifies a password with which Sesame can access the database. This must therefore be a password that matches the username configured in the user parameter.
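
To make the structure of the example jdbcUrl concrete, the following sketch takes it apart. It is written in Python purely for illustration and only covers the MySQL-style form shown above; the function name is hypothetical and other DBMSs use different URL syntaxes.

```python
from urllib.parse import urlsplit


def parse_mysql_jdbc_url(jdbc_url):
    """Split a MySQL-style JDBC URL into DBMS, host, port and database name."""
    prefix = "jdbc:"
    if not jdbc_url.startswith(prefix):
        raise ValueError("not a JDBC URL: " + jdbc_url)
    # What follows "jdbc:" has the shape of an ordinary URL.
    parts = urlsplit(jdbc_url[len(prefix):])
    return {
        "dbms": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


info = parse_mysql_jdbc_url("jdbc:mysql://localhost:3306/testdb")
```

Here the "database" entry is the name known to the DBMS (testdb), which is independent of the Sesame repository id.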

The RDBMS-based sails also take some optional parameters:

  • dependency-inferencing indicates whether the dependency-based truth maintenance should be used (possible values are 'yes' and 'no', the default is 'yes'). Dependency-based truth maintenance speeds up removal operations, but performance of uploads is slowed down.
  • commitInterval indicates the number of triples to be added before the sail performs an intermediate commit during the upload of large datasets. The default is '1000'. This figure can be tweaked to improve upload performance.

The memory-based sails take four optional parameters:

  • file specifies a file in which the in-memory repository stores its contents on local disk. This file is automatically saved and reloaded on (re)start of the server.
  • dataFormat specifies the format of the data in the file. Legal values are 'rdfxml' (the default), 'ntriples' and 'turtle'.
  • compressFile specifies whether the file used for storage should be compressed with gzip.
  • syncDelay specifies the time (in milliseconds) to wait after a transaction was committed before writing the changed data to file. Setting this variable to '0' (the default value) will force a file sync immediately after each commit. A negative value will deactivate file synchronization until the Sail is shut down. A positive value will postpone the synchronization for at least that number of milliseconds. If in the meantime a new transaction is started, the file synchronization will be rescheduled to wait for another syncDelay ms. This way, bursts of transaction events can be combined in one file sync, improving performance.
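
The effect of syncDelay on a burst of commits can be modelled with a short sketch. This is a simplified Python model, not Sesame code: it assumes that each transaction starts and commits at the same instant and that a pending sync is rescheduled relative to that commit. The sync performed at shutdown is not modelled.

```python
def file_sync_times(commit_times, sync_delay):
    """Return the times (in ms) at which the file would be written.

    commit_times: ascending commit timestamps in milliseconds.
    """
    if sync_delay < 0:
        return []                      # no sync until the Sail is shut down
    if sync_delay == 0:
        return list(commit_times)      # sync immediately after each commit
    syncs = []
    for i, t in enumerate(commit_times):
        due = t + sync_delay
        next_commit = commit_times[i + 1] if i + 1 < len(commit_times) else None
        # The pending sync fires only if no later commit arrives before it is due.
        if next_commit is None or next_commit >= due:
            syncs.append(due)
    return syncs
```

Under these assumptions, three commits at 0, 100 and 200 ms with a syncDelay of 1000 are combined into a single file sync at 1200 ms.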

The native sail has one required parameter:

  • dir specifies the directory that can be used by the native sail to store its files.

The native sail also has an optional triple-indexes parameter, with which one can specify the indexing strategy the native sail should take. We will explain this in more detail in the next section.

4.2. Native Sail Indexing

The native store uses B-Trees for indexing statements, where the index key consists of three fields: subject (s), predicate (p) and object (o). The order in which each of these fields is used in the key determines the usability of an index for a specific triple query pattern: searching triples with a specific subject in an index that has the subject as the first field is significantly faster than searching these same triples in an index where the subject field is second or third. In the worst case, the 'wrong' triple pattern will result in a sequential scan over the entire set of triples.

By default, the native store only uses a single index, with a subject-predicate-object key pattern. However, it is possible to define different indexes for the native store, using the triple-indexes parameter. This can be used to optimize performance for query patterns that occur frequently.

The subject, predicate and object fields are represented by the characters 's', 'p' and 'o', respectively. Indexes can be specified by creating 3-letter words from these three characters. Multiple indexes can be specified by separating these words with commas, spaces and/or tabs. For example, the string "spo, pos" specifies two indexes: a subject-predicate-object index and a predicate-object-subject index.

Of course, creating multiple indexes speeds up querying, but there is a cost factor to take into account as well: adding and removing data will become more expensive, because each index will have to be updated. Also, each index takes up additional disk space.

The native store automatically creates/drops indexes upon (re)initialization, so the parameter can be adjusted and upon the first refresh of the configuration the native store will change its indexing strategy, without loss of data.
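
When choosing a value for triple-indexes, it can help to reason about which index would serve a given query pattern. The Python sketch below parses a triple-indexes string and ranks indexes by how many leading key fields are bound in the pattern. The function names are hypothetical, and the ranking is only a plausible heuristic; it is not necessarily how the native store selects an index.

```python
import re


def parse_triple_indexes(value):
    """Split a triple-indexes value on commas, spaces and/or tabs."""
    indexes = [word for word in re.split(r"[, \t]+", value.strip()) if word]
    for word in indexes:
        if sorted(word) != ["o", "p", "s"]:
            raise ValueError("not a permutation of 's', 'p', 'o': " + word)
    return indexes


def bound_prefix_length(index, bound_fields):
    """Count how many leading key fields of the index are bound in the pattern."""
    count = 0
    for field in index:
        if field not in bound_fields:
            break
        count += 1
    return count


def best_index(indexes, bound_fields):
    """Pick the index whose key starts with the most bound fields."""
    return max(indexes, key=lambda index: bound_prefix_length(index, bound_fields))


indexes = parse_triple_indexes("spo, pos")
```

A prefix length of 0 corresponds to the worst case described above: no leading field of the key is known, so the whole index has to be scanned.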

4.3. Custom inferencing

The basic set of RDFS inference rules (as defined in the RDF(S) MT semantics) can sometimes be insufficient for building custom applications. For example, some applications need to define their own transitive, symmetric or inverse properties. An infrastructure for defining such custom inference rules helps developers tune the Sesame inferencer to better suit their application.

Since Sesame release 0.95, we provide an alternative inferencer that works with the org.openrdf.sesame.sailimpl.rdbms.RdfSchemaRepository Sail. This custom inferencer can be initialized with a set of axiomatic triples and inference rules defined in an external file. The format of these definitions is simple and is explained in greater detail in the next section.

The customizable inferencer also supports inter-rule dependencies: you can state explicitly which rules are triggered when a rule infers a new statement. This information is given in an additional 'triggers_rule' tag nested within the 'rule' tag. It consists of several 'rule' tags, each with a name attribute specifying an affected rule.

4.3.1. XML syntax

The definition file is in XML and should conform to the following DTD:

<!DOCTYPE InferenceRules [
  <!ENTITY rdf 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
  <!ENTITY rdfs 'http://www.w3.org/2000/01/rdf-schema#'>
  <!ENTITY daml 'http://www.daml.org/2001/03/daml+oil#'>

  <!ELEMENT InferenceRules (axiom | rule)*>

  <!ELEMENT axiom (subject, predicate, object)>

  <!ELEMENT rule ((premise+, consequent, triggers_rule?) | EMPTY)>
  <!ATTLIST rule
            name CDATA #REQUIRED>

  <!ELEMENT premise (subject, predicate, object)>
  <!ELEMENT consequent (subject, predicate, object)>
  <!ELEMENT triggers_rule (rule)*>

  <!ELEMENT subject EMPTY>
  <!ATTLIST subject
            var     CDATA      #IMPLIED
            uri     CDATA      #IMPLIED
            pattern CDATA      #IMPLIED
            escape  CDATA      #IMPLIED
            type    (resource) #IMPLIED>

  <!ELEMENT predicate EMPTY>
  <!ATTLIST predicate
            var     CDATA      #IMPLIED
            uri     CDATA      #IMPLIED
            pattern CDATA      #IMPLIED
            escape  CDATA      #IMPLIED
            type    (resource) #IMPLIED>

  <!ELEMENT object EMPTY>
  <!ATTLIST object
            var     CDATA      #IMPLIED
            uri     CDATA      #IMPLIED
            pattern CDATA      #IMPLIED
            escape  CDATA      #IMPLIED
            type    (resource) #IMPLIED>
]>

If a 'uri' attribute is present within the 'subject', 'predicate' or 'object' tags, its value is assumed to be the name of a resource.

The value of the 'var' attribute of the above tags gives the name of a variable. This attribute cannot be used within an 'axiom' tag.

For example, here are two of the axiomatic triples, as they are defined in the RDF(S) MT semantics. They appear in the configuration file like this:

<axiom>
	<subject   uri="&rdfs;subPropertyOf"/> 
	<predicate uri="&rdfs;domain"/> 
	<object    uri="&rdf;Property"/>
</axiom>
<axiom>
	<subject   uri="&rdfs;subPropertyOf"/>
	<predicate uri="&rdfs;range"/>
	<object    uri="&rdf;Property"/>
</axiom>

An example of an inference rule (one stating that if a resource is used as a predicate, then it is of 'type' 'Property') looks like:

<rule name="rdfs1">
    <premise>
        <subject   var="xxx"/>
        <predicate var="aaa"/>
        <object    var="yyy"/>
    </premise>

    <consequent>
        <subject   var="aaa"/>
        <predicate uri="&rdf;type"/>
        <object    uri="&rdf;Property"/>
    </consequent>

    <triggers_rule>
        <rule name="rdfs2" />
        <rule name="rdfs3" />
        <rule name="rdfs4a" />
        <rule name="rdfs5b" />
        <rule name="rdfs6" />
        <rule name="rdfs9" />
    </triggers_rule>
</rule>

In the above example 'xxx', 'aaa' and 'yyy' are variables and 'rdf:type' and 'rdf:Property' are exact resource URIs.

A 'pattern' attribute in conjunction with an 'escape' attribute is used to define a pattern for matching resource names. Both can appear only in a triple component denoting a variable, i.e. one with the 'var' attribute specified. Use '?' to denote any single character and '*' to match any character combination with length greater than 0.

Use the character declared in the 'escape' attribute to escape '?' or '*' characters within the pattern. You need to specify the 'pattern' and 'escape' attributes for a given variable only once per rule (in the example below, pattern and escape are specified only once, for the variable 'id').

An example of a rule using pattern matching:

<rule name="rdfsXI">
    <premise>
        <subject   var="xxx"/>
        <predicate var="id" pattern="&rdf;_*" escape="\"/>
        <object    var="yyy"/>
    </premise>

    <consequent>
        <subject   var="id"/>
        <predicate uri="&rdf;type"/>
        <object    uri="&rdfs;ContainerMembershipProperty"/>
    </consequent>

    <triggers_rule>
        <rule name="rdfs2" />
        <rule name="rdfs3" />
        <rule name="rdfs6" />
        <rule name="rdfs9" />
        <rule name="rdfs10" />
    </triggers_rule>
</rule>

Note that these triple templates are matched against both the values bound to the variables used in them and the resources specified as subject, predicate or object of a triple.
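
The wildcard semantics of the 'pattern' and 'escape' attributes can be sketched by translating a pattern into a regular expression. The Python code below is an illustration of the rules as described in this section ('?' for one character, '*' for one or more characters, the escape character making the next character literal); the function names are hypothetical and the custom inferencer's actual matching code may differ in details.

```python
import re


def pattern_to_regex(pattern, escape=None):
    """Translate a rule 'pattern' into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if escape and ch == escape and i + 1 < len(pattern):
            # The escaped character is matched literally.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "?":
            parts.append(".")      # exactly one character
        elif ch == "*":
            parts.append(".+")     # one or more characters
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.DOTALL)


def matches(pattern, value, escape=None):
    """Return True if the whole value matches the pattern."""
    return pattern_to_regex(pattern, escape).fullmatch(value) is not None


RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
```

With these definitions, the pattern from the rule above (the RDF namespace followed by "_*") matches container membership properties such as rdf:_1, but not rdf:type and not the bare "rdf:_".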

4.3.2. Example

Consider a property with the URI http://somewhere.org#partOf. In our example domain, we wish to ensure that this resource is always present in the repository, so we add an axiomatic triple stating that it is a property:

<axiom>
    <subject   uri="http://somewhere.org#partOf"/> 
    <predicate uri="&rdf;type"/> 
    <object    uri="&rdf;Property"/>
</axiom>

We also wish to define that the property is transitive. To this end, we add a single inference rule:

<rule name="userPartOf">
    <premise>
        <subject   var="xxx"/>
        <predicate uri="http://somewhere.org#partOf"/>
        <object    var="yyy"/>
    </premise>
    <premise>
        <subject   var="yyy"/>
        <predicate uri="http://somewhere.org#partOf"/>
        <object    var="zzz"/>
    </premise>

    <consequent>
        <subject   var="xxx"/>
        <predicate uri="http://somewhere.org#partOf"/>
        <object    var="zzz"/>
    </consequent>

    <triggers_rule>
        <rule name="rdfs2" />
        <rule name="rdfs3" />
        <rule name="rdfs6" />
        <rule name="userPartOf" />
    </triggers_rule>
</rule>

Suppose the repository contains these two triples: T1 = (Finger.1, partOf, Hand.Left) and T2 = (Hand.Left, partOf, Human.1). Because the same 'yyy' variable is used in both 'premise' tags, the rule matches only if T1.object = T2.subject, which is the case here. A triple corresponding to the 'consequent' tag is then added to the repository using the current variable bindings. It has the form TInfer = (T1.subject, partOf, T2.object), i.e. TInfer = (Finger.1, partOf, Human.1).
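
The effect of the userPartOf rule can be reproduced with a few lines of forward chaining. The following Python sketch is an illustration only; the function name is hypothetical, and the real inferencer works on database tables and also processes the rules listed in 'triggers_rule'. The sketch applies the single rule repeatedly until no new triples are produced.

```python
PART_OF = "http://somewhere.org#partOf"


def apply_transitive_rule(triples, predicate):
    """Apply (x p y), (y p z) => (x p z) until no new triples appear."""
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        matching = [(s, o) for (s, p, o) in closure if p == predicate]
        for (x, y1) in matching:
            for (y2, z) in matching:
                inferred = (x, predicate, z)
                # Both premises must share the same binding for 'yyy'.
                if y1 == y2 and inferred not in closure:
                    closure.add(inferred)
                    changed = True
    return closure


explicit = {
    ("Finger.1", PART_OF, "Hand.Left"),
    ("Hand.Left", PART_OF, "Human.1"),
}
inferred = apply_transitive_rule(explicit, PART_OF) - explicit
```

The outer loop plays the role of the rule triggering itself: a newly inferred partOf triple may enable further inferences in the next pass.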

4.3.3. Configuration

The inferencer used by a repository based on the org.openrdf.sesame.sailimpl.rdbms.RdfSchemaRepository Sail is defined by a parameter passed to it during initialization. To start using the custom inferencer on a repository, add the following extra parameters to the configuration of that repository:

  • use-inferencer specifies the full classname of the inferencer. To use the custom inferencer, use the value org.openrdf.sesame.sailimpl.rdbms.CustomInferenceServices.
  • rule-file specifies the location of the XML file in which the inference rules for the custom inferencer are specified. Make sure that you specify the full path name.

4.3.4. Notes and Hints

An example rules file, containing the axioms and entailment rules as specified by the 23 January 2003 Working Draft of the RDF Model Theory, can be found in the Sesame source tree, specifically in src/org/openrdf/sesame/sailimpl/rdbms/entailment-rdf-mt-20030123.xml. This file is used by default by the custom inferencer if the rule-file parameter is not specified.

Changes to the rules file do not lead to automatic reapplication of the rules to the existing data in the repository. Clear the repository first to avoid inconsistency problems.

The dependency information used by the truth maintenance system (TMS) is also affected by the rules. The default inferencer uses a dependency database table that can handle cases where up to two triples lead to the inference of a new one. Since the configuration file can contain inference rules with an arbitrary number of 'premise' tags, the structure of the default dependency table cannot handle them. To avoid loss of data, the structure of that table is not altered, and the table is created only if it does not exist. This check is performed during the repository initialization phase. It is therefore better to apply new or modified inference rules to a completely clean data store (database).

4.4. Change Tracking

[This section not yet available. See the documentation at http://www.ontotext.com/omm/ for details.]

Chapter 5. The web interface

Sesame comes with a Web interface to allow access to repositories through a normal Web browser. In this chapter, we will briefly describe the user interface.

5.1. Logging in

If you installed Sesame according to the guidelines in Chapter 2, Installing Sesame, the Sesame entry page is located at http://[MACHINE_NAME]:8080/sesame/. If you point your browser to this address, it should display the Sesame entry page shown in Figure 5.1, “The Sesame entry page”.

Figure 5.1. The Sesame entry page

The screen provides you with a choice of repositories to work on. Since you are not yet logged in, and no publicly accessible repositories are available, you get a notification that no repositories can be accessed.

To log in to Sesame, click the "log in" link and provide user name and password (see Figure 5.2, “The Sesame login page”).

Figure 5.2. The Sesame login page

After you have logged in, you are returned to the entry page, which now shows that you are logged in and gives you a choice of the repositories that you have access to.

Figure 5.3. The Sesame entry page when logged in

After selecting a repository, you come to the actual Sesame function interface (see Figure 5.4, “The repository function screen”).

Figure 5.4. The repository function screen

The toolbar at the top of the screen shows user and repository information, and allows the selection of different actions on the repository. The actions are divided into read actions (such as queries) and write actions (adding and removing data).

5.2. Adding data to a repository

The web interface offers three options for adding data to a Sesame repository: Add file, Add (www) and Add (copy-paste).

The Add file and Add (www) options are fairly straightforward. The first option allows you to select an RDF document from your local disk to add to the Sesame repository. The second option allows you to add RDF documents that are accessible through a URL to the repository.

Optionally, you can supply a base URL. The base URL is used by the RDF parser to resolve any relative resource references in the RDF document. By default, Sesame uses foo:bar as a base URL, but this may not always be desirable, for example when the file is being read from a temporary location, or when resources defined in the document are referenced by other RDF sources with different URLs.
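
How a base URL turns relative references into absolute ones can be shown with standard URL resolution. The Python sketch below uses a hypothetical base URL (http://example.org/data/people.rdf) and hypothetical relative references; it illustrates the general resolution mechanism, not Sesame's parser code.

```python
from urllib.parse import urljoin

base_url = "http://example.org/data/people.rdf"   # hypothetical base URL

# A fragment-only reference resolves against the document itself.
same_document = urljoin(base_url, "#alice")
# A relative path resolves against the directory of the base URL.
sibling_document = urljoin(base_url, "companies.rdf#acme")
```

If the same document were loaded with a different base URL, the same relative references would yield different resource URIs, which is why the choice of base URL matters when other RDF sources refer to these resources.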

The Add (copy-paste) option allows you to upload data to Sesame by typing (or copying and pasting) it into the text area. The pasted text should be a valid RDF/XML document.

Chapter 6. The SeRQL query language (revision 1.2)

6.1. Revisions

6.1.1. revision 1.1

SeRQL revision 1.1 is a syntax revision (see issue tracker item SES-75). This document describes the revised syntax. From Sesame release 1.2-RC1 onwards, the old syntax is no longer supported.

6.1.2. revision 1.2

SeRQL revision 1.2 covers a set of new functions and operators. New operations have been marked with (R1.2) where appropriate in this document.

6.2. Introduction

SeRQL ("Sesame RDF Query Language", pronounced "circle") is a new RDF/RDFS query language that is currently being developed by Aduna as part of Sesame. It combines the best features of other (query) languages (RQL, RDQL, N-Triples, N3) and adds some of its own. This document briefly shows all of these features. After reading through this document one should be able to write SeRQL queries.

Some of SeRQL's most important features are:

  • Graph transformation.
  • RDF Schema support.
  • XML Schema datatype support.
  • Expressive path expression syntax.
  • Optional path matching.

6.3. URIs, literals and variables

URIs and literals are the basic building blocks of RDF. For a query language like SeRQL, variables are added to this list. The following sections will show how to write these down in SeRQL.

6.3.1. Variables

Variables are identified by names. These names must start with a letter or an underscore ('_') and can be followed by zero or more letters, numbers, underscores, dashes ('-') or dots ('.'). Example variable names are:

  • Var1
  • _var2
  • unwise.var-name_isnt-it

SeRQL keywords are not allowed to be used as variable names. Currently, the following keywords are used or reserved for future use in SeRQL: select, construct, from, where, using, namespace, true, false, not, and, or, like, label, lang, datatype, null, isresource, isliteral, sort, in, union, intersect, minus, exists, forall, distinct, limit, offset.

Keywords in SeRQL are all case-insensitive. Variable names, in contrast, are case-sensitive.
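
The naming rules can be summarized in a small validity check. The Python sketch below is an illustration, not part of Sesame; it assumes that "letters" and "numbers" mean ASCII letters and digits, which the text above does not state explicitly.

```python
import re

SERQL_KEYWORDS = {
    "select", "construct", "from", "where", "using", "namespace", "true",
    "false", "not", "and", "or", "like", "label", "lang", "datatype", "null",
    "isresource", "isliteral", "sort", "in", "union", "intersect", "minus",
    "exists", "forall", "distinct", "limit", "offset",
}

# Assumption: "letters" and "numbers" are read as ASCII letters and digits.
VARIABLE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def is_valid_variable_name(name):
    """Check the naming rule and reject keywords in any letter case."""
    if VARIABLE_NAME.fullmatch(name) is None:
        return False
    return name.lower() not in SERQL_KEYWORDS
```

Because keywords are case-insensitive, the check lowercases the name before the keyword lookup, so "Select" is rejected just like "select".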

6.3.2. URIs

There are two ways to write down URIs in SeRQL: either as full URIs or as abbreviated URIs. Full URIs must be surrounded with "<" and ">". Examples of this are:

  • <http://www.openrdf.org/index.html>
  • <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
  • <mailto:sesame@openrdf.org>
  • <file:///C:\rdffiles\test.rdf>

As URIs tend to be long strings with the first part being shared by several of them (i.e. the namespace), SeRQL allows one to use abbreviated URIs (or QNames) by defining (short) names for these namespaces which are called "prefixes". A QName always starts with one of the defined prefixes and a colon (":"). After this colon, the part of the URI that is not part of the namespace follows. The first part, consisting of the prefix and the colon, is replaced by the full namespace by the query engine. Some example QNames are:

  • sesame:index.html
  • rdf:type
  • foaf:Person
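
The expansion that the query engine performs on QNames amounts to a simple string substitution, sketched below in Python. The prefix table is an assumption made for this example: the rdf namespace is the one used elsewhere in this document, while the mappings for foaf and sesame are chosen only for illustration.

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",      # assumed mapping
    "sesame": "http://www.openrdf.org/",       # assumed mapping, for illustration only
}


def expand_qname(qname, prefixes=PREFIXES):
    """Replace 'prefix:' with the namespace it stands for."""
    prefix, separator, local_name = qname.partition(":")
    if not separator or prefix not in prefixes:
        raise ValueError("unknown or missing prefix in: " + qname)
    return prefixes[prefix] + local_name
```

With this table, rdf:type expands to the second full URI listed earlier in this section, and sesame:index.html expands to the first.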

6.3.3. Literals

RDF literals consist of three parts: a label, a language tag, and a datatype. The language tag and the datatype are optional and at most one of these two can accompany a label (a literal cannot have both a language tag and a datatype). The notation of literals in SeRQL has been modelled after their notation in N-Triples; literals start with the label, which is surrounded by double quotes, optionally followed by a language tag with a "@" prefix or by a datatype URI with a "^^" prefix. Example literals are:

  • "foo"
  • "foo"@en
  • "<foo/>"^^<http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral>

The SeRQL notation for abbreviated URIs can also be used. When the prefix rdf is mapped to the namespace http://www.w3.org/1999/02/22-rdf-syntax-ns#, the last example literal could also have been written down like:

  • "<foo/>"^^rdf:XMLLiteral

SeRQL has also adopted the character escapes from N-Triples; special characters can be escaped by prefixing them with a backslash. One of the special characters is the double quote. Normally, a double quote would signal the end of a literal's label. If the double quote is part of the label, it needs to be escaped. For example, the sentence John said: "Hi!" can be encoded as the SeRQL literal "John said: \"Hi!\"".

As the backslash is a special character itself, it also needs to be escaped. To encode a single backslash in a literal's label, two backslashes need to be written in the label. For example, a Windows directory would be encoded as: "C:\\Program Files\\Apache Tomcat\\".
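
When queries are generated by a program, the escaping has to be applied in code. The Python helper below is a minimal sketch with a hypothetical name; it handles only the two escapes discussed above (backslash and double quote) and none of the other N-Triples escapes.

```python
def serql_literal(label, lang=None, datatype=None):
    """Format a literal: quoted label plus optional @lang or ^^datatype."""
    if lang is not None and datatype is not None:
        raise ValueError("a literal cannot have both a language tag and a datatype")
    # Escape backslashes first so the backslashes added for quotes are not doubled.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    text = '"' + escaped + '"'
    if lang is not None:
        text += "@" + lang
    if datatype is not None:
        text += "^^" + datatype
    return text
```

The order of the two replacements matters: escaping the double quotes first would introduce backslashes that the second step would then double.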

SeRQL has functions for extracting each of the three parts of a literal. These functions are label, lang, and datatype. label("foo"@en) extracts the label "foo", lang("foo"@en) extracts the language tag "en", and datatype("foo"^^rdf:XMLLiteral) extracts the datatype rdf:XMLLiteral. The use of these functions is explained later.

6.3.4. Blank Nodes (R1.2)

RDF has a notion of blank nodes. These are nodes in the RDF graph that are not labeled with a URI or a literal. The interpretation of such blank nodes is as a form of existential quantification: it allows one to assert that "there exists a node such that..." without specifying what that particular node is. Blank nodes do in fact often have identifiers, but these identifiers are assigned internally by whatever processor is processing the graph and they are only valid in the local context, not as global identifiers (unlike URIs).

Strictly speaking blank nodes are only addressable indirectly, by querying for one or more properties of the node. However, SeRQL, as a practical shortcut, allows blank node identifiers to be used in queries. The syntax for blank nodes is adopted from N-Triples, using a QName-like syntax with "_" as the namespace prefix, and the internal blank node identifier as the local name. For example:

  • _:bnode1

This identifies the blank node with internal identifier "bnode1". These blank node identifiers can be used in the same way that normal URIs or QNames can be used.

Caution: It is important to realize that addressing blank nodes in this way makes SeRQL queries non-portable across repositories. There is no guarantee that in two repositories, even if they contain identical datasets, the blank node identifiers will be identical. It may well be that "bnode1" in repository A is a completely different blank node than "bnode1" in repository B. Even in the same repository, it is not guaranteed that blank node identifiers are stable over updates: if certain statements are added to or removed from a repository, it is not guaranteed "bnode1" still identifies the same blank node that it did before the update operation.

6.4. Path expressions

Path expressions are one of the most prominent parts of SeRQL. They are expressions that match specific paths through an RDF graph. Most current RDF query languages allow you to define path expressions of length 1, which can be used to find (combinations of) triples in an RDF graph. SeRQL, like RQL, allows you to define path expressions of arbitrary length.

6.4.1. Basic path expressions

Imagine that we want to query an RDF graph for persons who work for companies that are IT companies. Querying for this information comes down to finding the following pattern in the RDF graph (gray nodes denote variables):

Figure 6.1. A basic path expression

The SeRQL notation for path expressions resembles the picture above; it is written down as:

{Person} ex:worksFor {Company} rdf:type {ex:ITCompany}

The parts surrounded by curly brackets represent the nodes in the RDF graph, the parts between these nodes represent the edges in the graph. The direction of the arcs (properties) in SeRQL path expressions is always from left to right.

In SeRQL queries, multiple path expressions can be specified by separating them with commas. For example, the path expression shown before can also be written down as two smaller path expressions:

{Person} ex:worksFor {Company},
{Company} rdf:type {ex:ITCompany}

The nodes and edges in the path expressions can be variables, URIs and literals. Also, a node can be left empty in case one is not interested in the value of that node. Here are some more example path expressions to illustrate this:

  • {Person} ex:worksFor {} rdf:type {ex:ITCompany}
  • {Painting} ex:painted_by {} ex:name {"Picasso"}
  • {comic:RoadRunner} SomeRelation {foo:WillyECoyote}
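
The relation between a longer path expression and the triples it matches can be made explicit with a small sketch. The Python function below splits a basic path expression into one (node, edge, node) pattern per edge and gives empty nodes a fresh placeholder name. It is a toy tokenizer written for this illustration, not the SeRQL parser: it does not handle the short cuts described in the next section or literals that contain curly brackets.

```python
import re

# A token is either a node in curly brackets or an edge between two nodes.
TOKEN = re.compile(r"\{([^{}]*)\}|([^\s{}]+)")


def path_to_triples(path):
    """Split '{n1} e1 {n2} e2 {n3}' into (n1, e1, n2), (n2, e2, n3)."""
    nodes, edges = [], []
    anonymous = 0
    for node, edge in TOKEN.findall(path):
        if edge:
            edges.append(edge)
        else:
            name = node.strip()
            if not name:
                # An empty node gets a fresh placeholder name.
                anonymous += 1
                name = "_anon%d" % anonymous
            nodes.append(name)
    if len(nodes) != len(edges) + 1:
        raise ValueError("expected alternating nodes and edges: " + path)
    return [(nodes[i], edges[i], nodes[i + 1]) for i in range(len(edges))]
```

The node shared by two consecutive patterns (Company in the first example) is what ties the individual triples together into a path.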

6.4.2. Path expression short cuts

Each and every path can be constructed using a set of basic path expressions. Sometimes, however, it is nicer to use one of the available short cuts. There are three types of short cuts, all of which are explained below.

6.4.2.1. Multi-value nodes

In situations where one wants to query for two or more triples with identical subject and predicate, the subject and predicate do not have to be repeated over and over again. Instead, a multi-value node can be used:

{subj1} pred1 {obj1, obj2, obj3}

A built-in constraint on this construction is that each value for the variables in the multi-value node is unique (i.e. they are pairwise disjoint). Therefore, this path expression is equivalent to the following combination of path expressions and boolean constraints:

FROM
  {subj1} pred1 {obj1},
  {subj1} pred1 {obj2},
  {subj1} pred1 {obj3}
WHERE obj1 != obj2 AND obj1 != obj3 AND obj2 != obj3

Or graphically:

Figure 6.2. Multi-value nodes

Multi-value nodes can also be used when statements share the predicate and object, e.g.:

{subj1, subj2, subj3} pred1 {obj1}

When used in a longer path expression, multi-value nodes apply to both the part left of the node and the part right of the node. The following path expression:

{first} pred1 {middle1, middle2} pred2 {last}

matches the following graph:

Figure 6.3. Multi-value nodes in a longer path expression

When using variables in multi-value nodes, a constraint on their values is implicitly added: a variable's value is not allowed to be equal to any other value in the multi-value node. So, in the first example, the variables obj1, obj2 and obj3 will not match identical values at the same time. This prevents the path from matching a single triple three times.
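
The expansion of a multi-value node, including its implicit pairwise inequality constraints, can be generated mechanically. The Python sketch below uses a hypothetical function name and covers only a single edge with one multi-valued side, as in the examples of this section.

```python
from itertools import combinations


def expand_multi_value(subjects, predicate, objects):
    """Expand a multi-value node into basic paths plus inequality constraints."""
    paths = [(s, predicate, o) for s in subjects for o in objects]
    # Only one side is multi-valued in the examples; constrain whichever is.
    multi = objects if len(objects) > 1 else subjects
    constraints = ["%s != %s" % pair for pair in combinations(multi, 2)]
    return paths, constraints


paths, constraints = expand_multi_value(["subj1"], "pred1", ["obj1", "obj2", "obj3"])
```

For n values in the node, the expansion yields n basic path expressions and n(n-1)/2 inequality constraints, which is why the short cut becomes more attractive as the number of values grows.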

6.4.2.2. Branches

One of the short cuts that is likely to be used most is the notation for branches in path expressions. There are many situations where one wants to query multiple properties of a single subject. Instead of repeating the subject over and over again, one can use a semi-colon to attach a predicate-object combination to the subject of the last part of a path expression, e.g.:

{subj1} pred1 {obj1};
        pred2 {obj2}

Which is equivalent to:

{subj1} pred1 {obj1},
{subj1} pred2 {obj2}

Or graphically:

Figure 6.4. Branches in a path expression

Or a slightly more complicated example:

{first} pred {} pred1 {obj1};
                pred2 {obj2} pred3 {obj3}

Which matches the following graph:

Figure 6.5. Branches in a longer path expression

Note that an anonymous variable is used in the middle of the path expression.

6.4.2.3. Reified statements

The last short cut is a short cut for reified statements. A path expression representing a single statement (i.e. {node} edge {node}) can be written between the curly brackets of a node, e.g.:

{ {reifSubj} reifPred {reifObj} } pred {obj}

This would be equivalent to querying (using "rdf:" as a prefix for the RDF namespace, and "_Statement" as a variable for storing the statement's URI):

{_Statement} rdf:type {rdf:Statement},
{_Statement} rdf:subject {reifSubj},
{_Statement} rdf:predicate {reifPred},
{_Statement} rdf:object {reifObj},
{_Statement} pred {obj}

Again, graphically:

Figure 6.6. A reification path expression

6.4.3. Optional path expressions

Optional path expressions differ from 'normal' path expressions in that they do not have to be matched to find query results. The SeRQL query engine will try to find paths in the RDF graph matching the path expression, but when it cannot find any paths it will skip the expression and leave any variables in it uninstantiated (they will have the value null).

Consider an RDF graph that contains information about people that have names, ages, and optionally e-mail addresses. This is a situation that is likely to be very common in RDF data. A logical query on this data is a query that yields all names, ages and, when available, e-mail addresses of people, e.g.:

{Person} ex:name {Name};
         ex:age  {Age};
         ex:email {EmailAddress}

However, using normal path expressions like the one above, people without an e-mail address will not be returned by the SeRQL query engine. With optional path expressions, one can indicate that a specific (part of a) path expression is optional. This is done using square brackets, i.e.:

{Person} ex:name {Name};
         ex:age  {Age};
        [ex:email {EmailAddress}]

Or alternatively:

 {Person} ex:name {Name};
          ex:age  {Age},
[{Person} ex:email {EmailAddress}]

In contrast to the first path expression, both of these expressions will also match people without an e-mail address. For these people, the variable EmailAddress will not be assigned a value.
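
The difference between the normal and the optional form can be modelled in a few lines of Python. This is only a sketch of the matching behaviour, not the way the SeRQL engine is implemented; the sample triples and the function names are made up for the example.

```python
# Hypothetical sample data: (subject, predicate, object) triples.
triples = [
    ("ex:alice", "ex:name", "Alice"),
    ("ex:alice", "ex:age", 30),
    ("ex:alice", "ex:email", "alice@example.org"),
    ("ex:bob", "ex:name", "Bob"),
    ("ex:bob", "ex:age", 25),
]


def objects(subject, predicate):
    """All objects of the triples with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def people(email_optional):
    """Rows of (Person, Name, Age, EmailAddress)."""
    rows = []
    for person in sorted({s for s, _, _ in triples}):
        for name in objects(person, "ex:name"):
            for age in objects(person, "ex:age"):
                emails = objects(person, "ex:email")
                if not emails and email_optional:
                    emails = [None]  # leave EmailAddress without a value
                for email in emails:
                    rows.append((person, name, age, email))
    return rows


print(people(email_optional=False))  # only ex:alice
print(people(email_optional=True))   # ex:alice, and ex:bob with None
```

With the optional form, ex:bob is still returned and the missing e-mail address shows up as None, which plays the role of the null value mentioned earlier.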

Optional path expressions can also be nested. This is useful in situations where the existence of a specific path is dependent on the existence of another path. For example, the following path expression queries for the titles of all known documents and, if the author of the document is known, the name of the author (if it is known) and his e-mail address (if it is known):

{Document} ex:title {Title};
          [ex:author {Author} [ex:name {Name}];
                              [ex:email {Email}]]

With this path expression, the SeRQL query engine will not try to find the name and e-mail address of an author when it cannot even find the resource representing the author.

6.5. Select- and construct queries

The SeRQL query language supports two querying concepts. The first one can be characterized as returning a table of values, or a set of variable-value bindings. The second one returns a true RDF graph, which can be a subgraph of the graph being queried, or a graph containing information that is derived from it. Queries of the first type are called "select queries", queries of the second type are called "construct queries".

A SeRQL query is typically built up from one to six clauses. For select queries these clauses are: SELECT, FROM, WHERE, LIMIT, OFFSET and USING NAMESPACE. One might recognize the first five clauses from SQL, but their usage is slightly different. For construct queries the clauses are the same with the exception of the first; construct queries start with a CONSTRUCT clause instead of a SELECT clause.

The first clause (i.e. SELECT or CONSTRUCT) determines what is done with the results that are found. In a SELECT clause, one can specify which variable values should be returned and in what order. In a CONSTRUCT clause, one can specify which triples should be returned.

The FROM clause is optional and always contains path expressions, which were explained in the previous section. It defines the paths in an RDF graph that are relevant to the query. Note that when the FROM clause is not specified, the query will simply return the constants specified in the SELECT or CONSTRUCT clause.

The WHERE clause is optional and can contain additional (Boolean) constraints on the values in the path expressions. These are constraints on the nodes and edges of the paths, which cannot be expressed in the path expressions themselves.

The LIMIT and OFFSET clauses are also optional. These clauses can be used separately or combined in order to get a subset of all query answers. Their usage is very similar to the LIMIT and OFFSET clauses in SQL queries. The LIMIT clause determines the (maximum) number of query answers that will be returned. The OFFSET clause determines which query answer will be returned as the first result, skipping as many query results as specified in this clause.
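
The combined effect of the two clauses can be pictured as slicing a list of answers. The following Python sketch assumes, as in SQL, that the offset is applied before the limit; the function name and the sample answers are illustrative only.

```python
def apply_limit_offset(answers, limit=None, offset=0):
    """Skip `offset` answers, then keep at most `limit` of the rest."""
    remaining = answers[offset:]
    return remaining if limit is None else remaining[:limit]


answers = ["a1", "a2", "a3", "a4", "a5"]

print(apply_limit_offset(answers, limit=2, offset=1))   # ['a2', 'a3']
print(apply_limit_offset(answers, limit=10, offset=3))  # ['a4', 'a5']
print(apply_limit_offset(answers, offset=7))            # []
```

The second call shows why the limit is a maximum: fewer answers are returned when fewer remain after the offset.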

Finally, the USING NAMESPACE clause is also optional and it can contain namespace declarations; these are the mappings from prefixes to namespaces that were referred to in one of the previous sections about (abbreviated) URIs.

The WHERE, LIMIT, OFFSET and USING NAMESPACE clauses will be explained in later sections. The following section will explain the SELECT and FROM clauses.

6.6. Select queries

As said before, select queries return tables of values, or sets of variable-value bindings. Which values are returned can be specified in the select clause. One can specify variables and/or values in the select clause, separated by commas. The following example query returns all URIs of classes:

SELECT C
FROM {C} rdf:type {rdfs:Class}

It is also possible to use a '*' in the SELECT clause. In that case, all variable values will be returned in the order in which they appear in the query, e.g.:

SELECT *
FROM {S} rdfs:label {O}

This query will return the values of the variables S and O, in that order. If a different order is preferred, one needs to specify the variables in the select clause, e.g.:

SELECT O, S
FROM {S} rdfs:label {O}

By default, the results of a select query are not filtered for duplicate rows. Because of the nature of the above queries, these queries will never return duplicates. However, more complex queries might result in duplicate result rows. These duplicates can be filtered out by the SeRQL query engine. To enable this functionality, one needs to specify the DISTINCT keyword after the select keyword. For example:

SELECT DISTINCT *
FROM {Country1} ex:borders {} ex:borders {Country2}
USING NAMESPACE
    ex = <http://example.org/things#>
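
To see where duplicate rows in such a query come from, consider the following Python sketch. The border data is hypothetical and the code only mimics the matching of the two-step path expression; it does not use any Sesame API.

```python
# Hypothetical ex:borders statements as (country, neighbour) pairs.
borders = [
    ("NL", "BE"), ("NL", "DE"),
    ("BE", "FR"), ("DE", "FR"),
]


def two_step_neighbours(distinct=False):
    """Model of {Country1} ex:borders {} ex:borders {Country2}."""
    rows = [(c1, c2)
            for c1, middle in borders
            for start, c2 in borders
            if middle == start]
    if distinct:
        # Drop duplicate rows while keeping the first-seen order.
        rows = list(dict.fromkeys(rows))
    return rows


print(two_step_neighbours())               # [('NL', 'FR'), ('NL', 'FR')]
print(two_step_neighbours(distinct=True))  # [('NL', 'FR')]
```

The pair (NL, FR) is found twice, once via BE and once via DE. Because the intermediate node is anonymous, the two rows are indistinguishable in the result, which is exactly the case DISTINCT is meant for.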

6.7. Construct queries

Construct queries return RDF graphs as sets of triples. The triples that a query should return can be specified in the construct clause using the previously explained path expressions. The following is an example construct query:

CONSTRUCT {Parent} ex:hasChild {Child}
FROM {Child} ex:hasParent {Parent}
USING NAMESPACE
    ex = <http://example.org/things#>

This query defines the inverse of the property ex:hasParent to be ex:hasChild. This is just one example of a query that produces information that is derived from the original information. Here is one more example:

CONSTRUCT
    {Artist} rdf:type {ex:Painter};
             ex:hasPainted {Painting}
FROM
    {Artist} rdf:type {ex:Artist};
             ex:hasCreated {Painting} rdf:type {ex:Painting}
USING NAMESPACE
    ex = <http://example.org/things#>

This query derives that an artist who has created a painting is a painter. The relation between the painter and the painting is modelled to be ex:hasPainted.
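
The idea behind a construct query, matching a pattern and emitting new triples for every match, can be sketched in Python for the ex:hasParent example given earlier. The function name and the sample graph are assumptions for illustration only.

```python
def construct_inverse(triples, source="ex:hasParent", target="ex:hasChild"):
    """For every {Child} source {Parent} match, emit {Parent} target {Child}."""
    return [(obj, target, subj)
            for subj, pred, obj in triples
            if pred == source]


# Hypothetical input graph.
graph = [
    ("ex:anna", "ex:hasParent", "ex:maria"),
    ("ex:anna", "ex:name", "Anna"),
]

print(construct_inverse(graph))
# [('ex:maria', 'ex:hasChild', 'ex:anna')]
```

Only the derived triples are returned; statements that do not match the pattern, such as the ex:name statement, do not appear in the result.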

Instead of specifying a path expression in the CONSTRUCT clause, one can also use a '*'. In that case, the CONSTRUCT clause is identical to the FROM clause. This allows one to extract a subgraph from a larger graph, e.g.:

CONSTRUCT *
FROM {SUB} rdfs:subClassOf {SUPER}

This query extracts all rdfs:subClassOf relations from an RDF graph.

Just like with select queries, the results of a construct query are not filtered for duplicate triples by default. Again, these duplicates are filtered out by the SeRQL query engine if the DISTINCT keyword is specified after the construct keyword, for example:

CONSTRUCT DISTINCT
    {Artist} rdf:type {ex:Painter}
FROM
    {Artist} rdf:type {ex:Artist};
             ex:hasCreated {} rdf:type {ex:Painting}
USING NAMESPACE
    ex = <http://example.org/things#>

6.8. The WHERE clause

The third clause in a query is the WHERE clause. This is an optional clause in which one can specify Boolean constraints on variables.

The following sections will explain the available Boolean expressions for use in the WHERE clause. Section 6.8.8, “Nested WHERE clauses (R1.2)” will explain how WHERE clauses can be nested inside optional path expressions.

6.8.1. Boolean constants

There are two Boolean constants, TRUE and FALSE. The first one is simply always true, the last one is always false. The following query will never produce any results because the constraint in the where clause will never evaluate to true:

SELECT *
FROM {X} Y {Z}
WHERE FALSE

6.8.2. Value (in)equality

The most common boolean constraint is equality or inequality of values. Values can be compared using the operators "=" (equality) and "!=" (inequality). The expression

Var = <foo:bar>

is true if the variable Var contains the URI <foo:bar>, and the expression

Var1 != Var2

checks whether two variables are not equal.

6.8.3. Numerical comparisons

Numbers can be compared to each other using the operators "<" (less than), "<=" (less than or equal to), ">" (greater than) and ">=" (greater than or equal to). SeRQL uses a literal's datatype to determine whether its value is numerical. All XML Schema built-in numerical datatypes are supported, i.e.: xsd:float, xsd:double, xsd:decimal and all subtypes of xsd:decimal (xsd:long, xsd:nonPositiveInteger, xsd:byte, etc.), where the prefix xsd is used to reference the XML Schema namespace.

In the following query, a comparison between values of type xsd:positiveInteger is used to retrieve all countries that have a population of less than 1 million:

SELECT Country
FROM {Country} ex:population {Population}
WHERE Population < "1000000"^^xsd:positiveInteger
USING NAMESPACE
    ex = <http://example.org/things#>

SeRQL is currently restricted to numerical comparisons between values with identical datatypes. This means that e.g. xsd:int values cannot (yet) be compared to xsd:byte values.

If only one of the parameters of a comparison has a datatype, SeRQL will try to assign the other parameter the same datatype. This means that the above query can still be used when the population literals don't have any datatype. SeRQL will try to interpret the literal as a positive integer and compare it to the one million constant.
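
The two rules above (identical datatypes only, and borrowing the datatype when one side has none) can be summarised in a rough Python model. Several details are assumptions that the text does not settle: that a failed comparison simply yields false, that two untyped literals are not compared numerically, and that every numeric label can be parsed as a decimal regardless of its datatype.

```python
from decimal import Decimal, InvalidOperation


def less_than(left, right):
    """Compare two (label, datatype) literals; datatype may be None."""
    (l_label, l_type), (r_label, r_type) = left, right
    # If only one side has a datatype, the other side borrows it.
    l_type = l_type or r_type
    r_type = r_type or l_type
    if l_type is None or l_type != r_type:
        # Assumption: untyped pairs and mixed datatypes do not match.
        return False
    try:
        return Decimal(l_label) < Decimal(r_label)
    except InvalidOperation:
        # Assumption: a label that is not a number does not match.
        return False


one_million = ("1000000", "xsd:positiveInteger")
print(less_than(("500000", None), one_million))          # True
print(less_than(("5", "xsd:int"), ("7", "xsd:byte")))    # False
print(less_than(("many", None), one_million))            # False
```

The second call returns False even though 5 is smaller than 7, reflecting the restriction to identical datatypes.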

6.8.4. The LIKE operator (R1.2)

The LIKE operator can check whether a value matches a specified pattern of characters. '*' characters can be used as wildcards, matching with zero or more characters. The rest of the characters are compared lexically. The pattern is surrounded with double quotes, just like a literal's label.

SELECT Country
FROM {Country} ex:name {Name}
WHERE Name LIKE "Belgium"
USING NAMESPACE
    ex = <http://example.org/things#>

By default, the LIKE operator does a case-sensitive comparison: in the above query, the operator fails if the variable Name is bound to the value "belgium" instead of "Belgium". Optionally, one can specify that the operator should perform a case-insensitive comparison:

SELECT Country
FROM {Country} ex:name {Name}
WHERE Name LIKE "belgium" IGNORE CASE
USING NAMESPACE
    ex = <http://example.org/things#>

In this query, the operator will succeed for "Belgium", "belgium", "BELGIUM", etc.

The '*' character can be used as a wildcard to indicate substring matches, for example:

SELECT Country
FROM {Country} ex:name {Name}
WHERE Name LIKE "*Netherlands"
USING NAMESPACE
    ex = <http://example.org/things#>

This query will match any country names that end with the string "Netherlands", for example "The Netherlands".
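
The matching rules of the LIKE operator can be approximated in Python by translating the pattern into a regular expression. This is a sketch of the described behaviour only, not Sesame's implementation; the function name like and its ignore_case parameter are made up for the example.

```python
import re


def like(value, pattern, ignore_case=False):
    """Return True if value matches a pattern in which '*' is a wildcard."""
    # Escape the literal parts and let each '*' match zero or more characters.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    return re.fullmatch(regex, value, flags) is not None


print(like("Belgium", "Belgium"))                    # True
print(like("belgium", "Belgium"))                    # False
print(like("BELGIUM", "belgium", ignore_case=True))  # True
print(like("The Netherlands", "*Netherlands"))       # True
print(like("Netherlands Antilles", "*Netherlands"))  # False
```

Because the whole value has to match the pattern, a pattern without any wildcard behaves like an equality test on the literal's label.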

6.8.5. isResource() and isLiteral()

The isResource() and isLiteral() boolean functions check whether a variable contains a resource or a literal, respectively. For example:

SELECT *
FROM {R} rdfs:label {L}
WHERE isLiteral(L)

6.8.6. isURI() and isBNode() (R1.2)

The isURI() and isBNode() boolean functions are more specific versions of isResource(). They check whether a variable is bound to a URI value or a bNode value, respectively. For example, the following query returns only URIs (and filters out all bNodes and literals):

SELECT V
FROM {R} prop {V}
WHERE isURI(V)

6.8.7. AND, OR, NOT

Boolean constraints and functions can be combined using the AND and OR operators, and negated using the NOT operator. The NOT operator has the highest precedence, then the AND operator, and finally the OR operator. Parentheses can be used to override the default precedence of these operators. The following query is a (somewhat artificial) example of this:

SELECT *
FROM {X} Prop {Y} rdfs:label {L}
WHERE NOT L LIKE "*FooBar*" AND
      (Y = <foo:bar> OR Y = <bar:foo>) AND
      isLiteral(L)
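
Python's not, and and or operators follow the same relative precedence as described here for NOT, AND and OR, so the grouping rules can be checked with a short Python sketch. The function names are illustrative only.

```python
from itertools import product


def implicit(a, b, c):
    # No parentheses: "not" binds tightest, then "and", then "or".
    return not a and b or c


def explicit(a, b, c):
    # The same expression with the default grouping spelled out.
    return ((not a) and b) or c


def overridden(a, b, c):
    # Parentheses change the grouping, and with it the result.
    return not (a and b or c)


# The first two forms agree for every combination of truth values.
print(all(implicit(*v) == explicit(*v)
          for v in product([True, False], repeat=3)))   # True

print(implicit(True, False, True))    # True
print(overridden(True, False, True))  # False
```

In the query above, this means that NOT applies only to the LIKE comparison, while the parentheses make sure that the OR is evaluated before the surrounding AND operators.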

6.8.8. Nested WHERE clauses (R1.2)

In order to be able to express boolean constraints on variables in optional path expres