DRAFT Updated for Sesame release 2.0-beta4
Copyright © 2002-2007 Aduna B.V.
For those who have previously worked with Sesame 1.x: Sesame 2 is an important step forward from the Sesame 1.x series, building on the feedback that has been gathered during the past years. A lot of improvements have been made to various APIs as well as the framework as a whole.
Unfortunately, because of all the changes to APIs and the like, Sesame 2 is not backwards compatible with earlier releases. Therefore, we suggest you read through this user guide before starting to port any applications to Sesame 2. Please also note that Sesame 2 is currently in beta stage, meaning that although the core APIs are now stable, many features are not yet implemented (for example, the custom inferencer and the RDBMS backend are not yet available).
So what's new in Sesame 2? Well, in short:
A lot of these new features will be covered in the next chapters.
Before diving into the internals of Sesame, we will start with a short introduction to Sesame, giving a high-level overview of its components. It's important to have some basic knowledge about this as the rest of this document will often refer to various components that are touched upon here. It is assumed that the reader has at least some basic knowledge about RDF, RDF Schema, OWL, etc. If this is not the case, some introductory articles can be found at the following locations:
We will try to explain the Sesame framework using the following figure, which shows the most prominent components and APIs in Sesame and how they are built on top of each other. Each component/API depends on the components/APIs that are directly beneath them.
All the way at the bottom of the diagram is the RDF Model, the foundation of the Sesame framework. Being an RDF-oriented framework, all parts of Sesame are to some extent dependent on this RDF model, which defines interfaces and implementation for all basic RDF entities: URI, blank node, literal and statement.
Rio, which stands for "RDF I/O", consists of a set of parsers and writers for various RDF file formats. The parsers can be used to translate RDF files to lists of statements, and the writers for the reverse operation. Rio can also be used independent of the rest of Sesame.
The Storage And Inference Layer (Sail) API is a low level System API
(SPI) for RDF stores and inferencers. Its purpose is to abstract from
the storage and inference details, allowing various types of storage and
inference to be used. The Sail API is mainly of interest for those who
are developing Sail implementations; for all others it suffices to know
how to create and configure one. There are several implementations of
the Sail API, for example the MemoryStore which
stores RDF data in main memory, and the
NativeStore which uses dedicated on-disk data
structures for storage.
The Repository API is a higher-level API that offers a large number of
developer-oriented methods for handling RDF data. The main goal of this
API is to make the life of application developers as easy as possible.
It offers various methods for uploading data files, querying, and
extracting and manipulating data. There are several implementations of
this API; the ones shown in this figure are the
SailRepository and the
HTTPRepository. The former translates calls to a
Sail implementation of choice, the latter offers transparent
client-server communication with a Sesame server over HTTP.
The top-most component in the diagram is the HTTP Server. The HTTP
Server consists of a number of Java Servlets that implement a protocol
for accessing Sesame repositories over HTTP. The details of this
protocol can be found in Sesame's system documentation, but most people
can simply use a client library to handle the communication. The
HTTPClient that is used by the
HTTPRepository is one such library.
While each part of the Sesame code is publicly available and extensible, most application developers will be primarily interested in the Repository API. This API is described in more detail in one of the following chapters.
In this chapter, we explain how you can install a Sesame 2.0 Server. You can skip this chapter if you are not planning to run a Sesame server but intend to use Sesame as a library to program against.
The Sesame 2.0 HTTP Server requires the following software:
The Sesame 2.0 server software comes in the form of two Java Web Applications: the Sesame HTTP server and the OpenRDF Workbench. The former provides HTTP access to Sesame repositories and is meant to be accessed by other applications. OpenRDF Workbench is a web application that provides a user interface to Sesame servers. Both webapps can be installed independently of one another.
If you haven't done so already, you will first need to download the
Sesame 2.0 SDK.
Both the Sesame server webapp and the Workbench webapp are included
in this SDK; they can be found in the war
directory. The war-files in this directory need to be deployed in a
Java Servlet Container, for example in
Apache Tomcat. The
deployment process is container-specific; please consult the
documentation for your container on how to deploy a web application.
After you have deployed the Sesame server webapp, you should have a
running Sesame 2.0 server that can be accessed at path
/openrdf-sesame. You can point your browser at
this location to verify that the deployment succeeded. Your browser
should show the Sesame welcome screen as well as some options to
view the server logs, among other things. Similarly, the OpenRDF
Workbench should be available at path
/openrdf-workbench.
A Sesame server stores all its configuration files and repository
data in a single directory (with subdirectories). On Windows
machines, this directory is
%APPDATA%\Aduna\openrdf\server\
by default, where %APPDATA% is the application data directory of the
user that runs the server. For example, in case the server runs
under the 'LocalService' user account on Windows XP, the directory is
C:\Documents and Settings\LocalService\Application Data\Aduna\OpenRDF Sesame.
On Linux/UNIX, the default location is
$HOME/.aduna/openrdf/server/, where $HOME is
the home directory of the user that runs the server, for example
/home/tomcat/.aduna/openrdf/server/.
The location of this data directory can be reconfigured using the
Java system property
info.aduna.platform.appdata.basedir. When you
are using Tomcat as the servlet container, you can set this
property using the JAVA_OPTS parameter, for
example:
set JAVA_OPTS=-Dinfo.aduna.platform.appdata.basedir=\path\to\other\dir\
(on Windows)
export JAVA_OPTS='-Dinfo.aduna.platform.appdata.basedir=/path/to/other/dir/'
(on Linux/UNIX)
If you are using Apache Tomcat as a Windows Service you should use the Windows Service Configure tool to set this property. Other users can either edit the Tomcat startup script or set the property some other way.
In the rest of this manual, we will refer to the Sesame Server's
data directory as [SESAME_DATA].
A clean installation of a Sesame server has a single repository by default: the SYSTEM repository. This SYSTEM repository contains all configuration data for the server, including what other repositories exist and (in future releases) the access rights on these repositories. This SYSTEM repository should not be used to store data that is not related to the server configuration.
The easiest way to create/manage repositories in a SYSTEM
repository is to use the Sesame Console. The Sesame Console can be
started using the start-console.bat/.sh scripts
that can be found in the bin directory of the
Sesame SDK.
On startup, the Sesame console will show you the repositories that are available. It's important to realize that these repositories are not related to the server's repositories: the console has its own set of repositories and it can be used independently of the Sesame server for various purposes. We can, however, access the server's SYSTEM repository from the console by creating a "remote repository" for it. You can accomplish this by entering the following command in the console:
create remote.
The console will then ask you to fill in a number of parameters for the remote repository. The default values will create a remote repository that connects to the SYSTEM repository on localhost, assuming a default installation. After entering the correct values for the parameters, you should now have a new repository that represents the SYSTEM repository on the server. Open this repository by executing the 'open' command with the repository's ID ('SYSTEM@localhost' by default):
open SYSTEM@localhost.
You can now use the same 'create' command to create repositories on the server. Next to 'remote', you can also use 'memory', 'memory-rdfs' and 'native' as arguments for the create command, which will respectively create a memory store, a memory store with RDF Schema inferencing, and a native store. The following sections describe the various parameters that can be specified for these repository types.
A memory store is an RDF repository that stores its data in main
memory. Apart from the standard ID and
title parameters, this type of repository
has a Persist and
Sync delay parameter.
The Persist parameter controls
whether the memory store will use a data file for
persistence over sessions. Persistent memory stores write
their data to disk before being shut down and read this data
back in the next time they are initialized. Non-persistent
memory stores are always empty upon initialization.
By default, the memory store persistence mechanism synchronizes the disk backup directly upon any change to the contents of the store. That means that directly after any change (upload, removal) completes, the disk backup is updated. It is possible to configure a synchronization delay however. This can be useful if your application performs several transactions in sequence and you want to prevent disk synchronization in the middle of this sequence.
The synchronization delay is specified by a number, indicating the time in milliseconds that the store will wait before it synchronizes changes to disk. The value 0 indicates that there should be no delay. Negative values can be used to postpone the synchronization indefinitely, i.e. until the store is shut down.
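The three ranges of the sync delay value described above can be summarised in a small sketch. This is only an illustration of the documented semantics; the class, enum and method names are hypothetical and are not part of Sesame.

```java
// Illustrative only: classifies a sync delay value according to the rules
// described in the text. None of these names exist in Sesame.
final class SyncDelaySketch {

    enum Policy {
        IMMEDIATE,   // value 0: sync directly after every change
        DELAYED,     // positive value: wait that many milliseconds first
        ON_SHUTDOWN  // negative value: postpone until the store shuts down
    }

    static Policy policyFor(long syncDelayMillis) {
        if (syncDelayMillis == 0L) {
            return Policy.IMMEDIATE;
        }
        if (syncDelayMillis > 0L) {
            return Policy.DELAYED;
        }
        return Policy.ON_SHUTDOWN;
    }
}
```

For instance, a delay of 1000 falls in the DELAYED range, while -1 means the data file is only written when the store is shut down.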
A native store stores and retrieves its data directly to/from disk. The advantage of this over the memory store is that it scales much better as it isn't limited to the size of available memory. Of course, since it has to access the disk, it is also slower than the in-memory store, but it is a good solution for larger data sets.
The native store uses on-disk indexes to speed up querying. It uses B-Trees for indexing statements, where the index key consists of four fields: subject (s), predicate (p), object (o) and context (c). The order in which each of these fields is used in the key determines the usability of an index on a specific statement query pattern: searching statements with a specific subject in an index that has the subject as the first field is significantly faster than searching these same statements in an index where the subject field is second or third. In the worst case, the 'wrong' statement pattern will result in a sequential scan over the entire set of statements.
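To make the effect of field order more concrete, the following sketch scores an index by the number of leading key fields that are bound in a query pattern. It is a simplified model for illustration, under the assumption that only a bound prefix of the key can narrow a B-Tree search; it is not the native store's actual index selection code, and all names are hypothetical.

```java
import java.util.List;

// Illustrative model only, not the native store's implementation.
final class IndexChoiceSketch {

    /**
     * Counts how many leading fields of the index key (e.g. "spoc") are
     * bound in the query pattern. boundFields lists the bound fields,
     * e.g. "s" for a pattern with only a known subject.
     */
    static int boundPrefixLength(String index, String boundFields) {
        int n = 0;
        while (n < index.length() && boundFields.indexOf(index.charAt(n)) >= 0) {
            n++;
        }
        return n;
    }

    /**
     * Returns the index with the longest bound prefix. If even the best
     * index has a prefix length of 0, the lookup degrades to a full scan.
     */
    static String bestIndex(List<String> indexes, String boundFields) {
        String best = null;
        int bestScore = -1;
        for (String index : indexes) {
            int score = boundPrefixLength(index, boundFields);
            if (score > bestScore) {
                best = index;
                bestScore = score;
            }
        }
        return best;
    }
}
```

In this model, a pattern with only the predicate bound scores 1 on a posc index but 0 on an spoc index, and a pattern with only the object bound scores 0 on both, which corresponds to the sequential scan mentioned above.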
By default, the native repository only uses two indexes, one
with a subject-predicate-object-context (spoc) key pattern
and one with a predicate-object-subject-context (posc) key
pattern. However, it is possible to define more or other
indexes for the native repository, using the
Triple indexes parameter. This can be
used to optimize performance for query patterns that occur
frequently.
The subject, predicate, object and context fields are represented by the characters 's', 'p', 'o' and 'c' respectively. Indexes can be specified by creating 4-letter words from these four characters. Multiple indexes can be specified by separating these words with commas, spaces and/or tabs. For example, the string "spoc, posc" specifies two indexes; a subject-predicate-object-context index and a predicate-object-subject-context index.
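As a rough illustration of this syntax, the sketch below splits such a string on commas, spaces and tabs and checks each word. It assumes that every word must contain each of the four characters exactly once; that rule, and all names in the code, are assumptions made for the example rather than a description of Sesame's own parser.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative parser for strings such as "spoc, posc". Not Sesame code.
final class IndexSpecSketch {

    static List<String> parse(String spec) {
        List<String> indexes = new ArrayList<String>();
        // Words are separated by commas, spaces and/or tabs.
        for (String word : spec.trim().split("[,\\s]+")) {
            if (word.isEmpty()) {
                continue; // produced by leading separators or an empty spec
            }
            if (!isPermutationOfSpoc(word)) {
                throw new IllegalArgumentException("Invalid index: " + word);
            }
            indexes.add(word);
        }
        return indexes;
    }

    // Assumption: a valid word uses each of 's', 'p', 'o', 'c' exactly once.
    private static boolean isPermutationOfSpoc(String word) {
        return word.length() == 4
                && word.indexOf('s') >= 0
                && word.indexOf('p') >= 0
                && word.indexOf('o') >= 0
                && word.indexOf('c') >= 0;
    }
}
```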
Creating more indexes potentially speeds up querying (a lot), but also adds overhead for maintaining the indexes. Also, every added index takes up additional disk space.
The native store automatically creates/drops indexes upon (re)initialization, so the parameter can be adjusted and upon the first refresh of the configuration the native store will change its indexing strategy, without loss of data.
The Repository API is the central access point for Sesame repositories. Its purpose is to give a developer-friendly access point to RDF repositories, offering various methods for querying and updating the data, while hiding a lot of the nitty gritty details of the underlying machinery.
In this chapter, we will try to explain the basics of how to program
against the Repository API. The interfaces for the Repository API
can be found in package
org.openrdf.repository. Several
implementations for these interfaces exist in various sub-packages.
The Javadoc reference for the API is available online and
can also be found in the doc directory of the
download.
The first step in any action that involves Sesame repositories
is to create a Repository for it.
Repository objects operate on (stacks of) Sail object(s) for
storage and retrieval of RDF data. An important thing to
remember is that the behaviour of a repository is determined by
the Sail(s) that it operates on; for example, the repository
will only support RDF Schema or OWL semantics if the Sail stack
includes an inferencer for this.
The central interface of the repository API is the
Repository interface. There are several
implementations available of this interface:
org.openrdf.repository.sail.SailRepository
is a Repository that operates directly on
top of a Sail. This is the class most
commonly used when accessing a local Sesame repository.
org.openrdf.repository.http.HTTPRepository
is, as the name implies, a Repository
implementation that acts as a proxy to a Sesame repository
available on a remote Sesame server, accessible through
HTTP.
In the following section, we will first take a look at the use of
the SailRepository class in order to
create and use a local Sesame repository.
One of the simplest configurations is a repository that just stores RDF data in main memory without applying any inferencing whatsoever. This is also by far the fastest type of repository that can be used. The following code creates and initializes a non-inferencing main-memory repository:
import org.openrdf.repository.Repository;
import org.openrdf.repository.sail.SailRepository;
import org.openrdf.sail.memory.MemoryStore;
...
Repository myRepository = new SailRepository(new MemoryStore());
myRepository.initialize();
The constructor of the SailRepository
class accepts any object of type Sail,
so we simply pass it a new main-memory store object (which is,
of course, a Sail implementation).
Following this, the repository needs to be initialized to
prepare the Sail(s) that it operates on, which includes
operations such as restoring previously stored data, setting
up connections to a relational database, etc.
The repository that is created by the above code is volatile: its contents are lost when the object is garbage collected or when the program is shut down. This is fine for cases where, for example, the repository is used as a means for manipulating an RDF model in memory.
Different types of Sail objects take parameters in their
constructor that change their behaviour. The
MemoryStore for example takes a data
directory parameter that specifies a data directory for
persistent storage. If specified, the MemoryStore will write its
contents to this directory so that it can restore it when it is
initialized in a future session:
File dataDir = new File("c:\\temp\\myRepository\\");
Repository myRepository = new SailRepository( new MemoryStore(dataDir) );
myRepository.initialize();
As you can see, we can fine-tune the configuration of our
repository by passing parameters to the constructor of the
Sail object. Some Sail types may offer additional
configuration methods, all of which need to be called before
the repository is initialized. The
MemoryStore currently has one such
method: setSyncDelay(long), which can
be used to control the strategy that is used for writing to
the data file, e.g.:
File dataDir = new File("c:\\temp\\myRepository\\");
MemoryStore memStore = new MemoryStore(dataDir);
memStore.setSyncDelay(1000L);
Repository myRepository = new SailRepository(memStore);
myRepository.initialize();
As we have seen, we can create Repository
objects for any kind of back-end store by passing them a
reference to the appropriate Sail object. We can pass any
stack of Sails this way, allowing all kinds of different
repository configurations to be created quite easily. For
example, to stack an RDF Schema inferencer on top of a
memory store, we simply create a repository like so:
import org.openrdf.repository.Repository;
import org.openrdf.repository.sail.SailRepository;
import org.openrdf.sail.memory.MemoryStore;
import org.openrdf.sail.memory.MemoryStoreRDFSInferencer;
...
Repository myRepository = new SailRepository(
new MemoryStoreRDFSInferencer(
new MemoryStore()));
myRepository.initialize();
Each layer in the Sail stack is created by a constructor
that takes the underlying Sail as a parameter. Finally, we
create the SailRepository object as a
functional wrapper around the Sail stack.
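The stacking idea itself is an instance of the decorator pattern. The sketch below shows it with a deliberately tiny, hypothetical store interface (it is not the Sail API): an inferencing layer wraps a plain in-memory store and adds one RDFS-style rule on top of whatever the wrapped store holds. Statements are kept as plain strings to keep the example short, and the rule is applied in a single pass without computing a transitive closure.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical mini-interface for illustration; this is NOT the Sail API.
interface TripleStore {
    void add(String subject, String predicate, String object);
    Set<String> statements();
}

// Bottom of the stack: stores statements as "s p o" strings in memory.
final class InMemoryStore implements TripleStore {
    private final Set<String> data = new LinkedHashSet<String>();

    public void add(String subject, String predicate, String object) {
        data.add(subject + " " + predicate + " " + object);
    }

    public Set<String> statements() {
        return data;
    }
}

// A layer that wraps another store, as each layer in a stack wraps the one below.
final class SubClassInferencer implements TripleStore {
    private final TripleStore base;

    SubClassInferencer(TripleStore base) {
        this.base = base;
    }

    public void add(String subject, String predicate, String object) {
        base.add(subject, predicate, object); // storage is delegated downwards
    }

    // Adds "x rdf:type D" for every "x rdf:type C" and "C rdfs:subClassOf D".
    public Set<String> statements() {
        Set<String> result = new LinkedHashSet<String>(base.statements());
        for (String typeStatement : base.statements()) {
            String[] t = typeStatement.split(" ");
            if (!t[1].equals("rdf:type")) {
                continue;
            }
            for (String subClassStatement : base.statements()) {
                String[] s = subClassStatement.split(" ");
                if (s[1].equals("rdfs:subClassOf") && s[0].equals(t[2])) {
                    result.add(t[0] + " rdf:type " + s[2]);
                }
            }
        }
        return result;
    }
}
```

A stack is then built from the inside out, for example new SubClassInferencer(new InMemoryStore()), which has the same shape as the constructor nesting in the Sesame code above.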
Working with remote repositories is just as easy as working
with local ones. We can simply use a different
Repository object, the
HTTPRepository, instead of the
SailRepository class.
A requirement is of course that there is a Sesame 2 server
running on some remote system, which is accessible over HTTP.
For example, suppose that at
http://example.org/sesame2/ a
Sesame server is running, which has a repository with the
identification 'example-db'. We can access this repository in
our code as follows:
import org.openrdf.repository.Repository;
import org.openrdf.repository.http.HTTPRepository;
...
String sesameServer = "http://example.org/sesame2";
String repositoryID = "example-db";
Repository myRepository = new HTTPRepository(sesameServer, repositoryID);
myRepository.initialize();
Now that we have created a Repository, we
want to do something with it. In Sesame 2, this is achieved through
the use of RepositoryConnection objects, which can be
created by the Repository.
A RepositoryConnection represents - as the name
suggests - an open connection to the actual store. We can issue
operations over this connection, and close it when we are done to
make sure we are not keeping resources unnecessarily occupied.
In the following sections, we will show some examples of basic operations.
The Repository API offers various methods for adding data to a repository. Data can be added by specifying the location of a file that contains RDF data, and statements can be added individually or in collections.
We perform operations on a repository by requesting a
RepositoryConnection from the repository. On this
RepositoryConnection object we can perform the various
operations, such as query evaluation, getting, adding, or
removing statements, etc.
The following example code adds two files, one local and one available through HTTP, to a repository:
import org.openrdf.OpenRDFException;
import org.openrdf.repository.Repository;
import org.openrdf.repository.RepositoryConnection;
import org.openrdf.rio.RDFFormat;
import java.io.File;
import java.net.URL;
...
File file = new File("/path/to/example.rdf");
String baseURI = "http://example.org/example/local";
try {
    RepositoryConnection con = myRepository.getConnection();
    try {
        con.add(file, baseURI, RDFFormat.RDFXML);
        URL url = new URL("http://example.org/example/remote");
        con.add(url, url.toString(), RDFFormat.RDFXML);
    }
    finally {
        // close the connection even if adding the data fails
        con.close();
    }
}
catch (OpenRDFException e) {
    // handle exception
}
catch (java.io.IOException e) {
    // handle io exception
}
More information on other available methods can be found in the
javadoc reference of the RepositoryConnection interface.
The Repository API has a number of methods for creating and evaluating queries. Two types of queries are distinguished: tuple queries and graph queries. The query types differ in the type of results that they produce.
The result of a tuple query is a set of tuples (or variable bindings), where each tuple represents a solution of a query. This type of query is commonly used to get specific values (URIs, blank nodes, literals) from the stored RDF data.
The result of Graph queries is an RDF graph (or set of statements). This type of query is very useful for extracting sub-graphs from the stored RDF data, which can then be queried further, serialized to an RDF document, etc.
Note: Sesame 2 currently supports two query languages: SeRQL and SPARQL. The former is explained in Chapter 5, The SeRQL query language (revision 2.0), the specification for the latter is available online.
To evaluate a tuple query we simply do the following:
import java.util.List;
import org.openrdf.OpenRDFException;
import org.openrdf.repository.RepositoryConnection;
import org.openrdf.query.QueryLanguage;
import org.openrdf.query.TupleQuery;
import org.openrdf.query.TupleQueryResult;
import org.openrdf.query.BindingSet;
...
try {
    RepositoryConnection con = myRepository.getConnection();
    try {
        String queryString = "SELECT x, y FROM {x} p {y}";
        TupleQuery tupleQuery = con.prepareTupleQuery(QueryLanguage.SERQL, queryString);
        TupleQueryResult result = tupleQuery.evaluate();
        try {
            .... // do something with the result
        }
        finally {
            result.close();
        }
    }
    finally {
        con.close();
    }
}
catch (OpenRDFException e) {
    // handle exception
}
This evaluates a SeRQL query and returns a
TupleQueryResult, which consists of a
sequence of BindingSet objects. Each
binding set is a set of Binding
objects. A binding is a pair relating a name (as used in the
projection) with a value.
We can use the TupleQueryResult to iterate
over all results and get each individual result for
x and y:
while (result.hasNext()) {
BindingSet bindingSet = result.next();
Value valueOfX = bindingSet.getValue("x");
Value valueOfY = bindingSet.getValue("y");
// do something interesting with the values here...
}
As you can see, we retrieve values by name rather than by an
index. The names used should be the names as used in the
projection of your query. The
TupleQueryResult.getBindingNames() method
returns a list of binding names, in the order in which they were
specified in the query. To process the bindings in each binding
set in the order specified by the projection, you can do the
following:
List<String> bindingNames = result.getBindingNames();
while (result.hasNext()) {
BindingSet bindingSet = result.next();
Value firstValue = bindingSet.getValue(bindingNames.get(0));
Value secondValue = bindingSet.getValue(bindingNames.get(1));
// do something interesting with the values here...
}
It is important to invoke the close()
operation on the TupleQueryResult,
after we are done with it. A
TupleQueryResult evaluates lazily and
keeps resources (such as connections to the underlying
database) open. Closing the
TupleQueryResult frees up these
resources. Do not forget that iterating over a result may
cause exceptions! The best way to make sure no connections are
kept open unnecessarily is to invoke
close() in the
finally clause.
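Why the finally clause matters can be shown without a repository at all. The sketch below uses a hypothetical stand-in for a lazily evaluated result (it is not a Sesame class) that can be told to fail partway through iteration; the helper method still closes it, because close() sits in the finally clause.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a lazily evaluated result that holds a resource.
final class LazyResultSketch implements Iterator<Integer>, AutoCloseable {
    private final int size;
    private final int failAt; // index at which next() fails, or -1 for never
    private int position = 0;
    private boolean closed = false;

    LazyResultSketch(int size, int failAt) {
        this.size = size;
        this.failAt = failAt;
    }

    public boolean hasNext() {
        return position < size;
    }

    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        if (position == failAt) {
            throw new IllegalStateException("simulated evaluation failure");
        }
        return position++;
    }

    public void close() {
        closed = true; // a real result would release its resources here
    }

    boolean isClosed() {
        return closed;
    }

    // Sums all items. The result is closed whether or not iteration succeeds.
    static int sumAndClose(LazyResultSketch result) {
        int sum = 0;
        try {
            while (result.hasNext()) {
                sum += result.next();
            }
        }
        finally {
            result.close();
        }
        return sum;
    }
}
```

If the close() call were placed after the loop instead of in the finally clause, the simulated failure would leave the result open.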
An alternative to producing a
TupleQueryResult is to supply an object
that implements the
TupleQueryResultHandler interface to
the query's evaluate() method. The main
difference is that when using a return object, the client has
control over when the next answer is retrieved, whereas with
the use of a handler, the server side simply pushes answers to
the handler object as soon as it has them available.
As an example we will use
SPARQLResultsXMLWriter, which is a
TupleQueryResultHandler implementation that writes
SPARQL Results XML documents to an outputstream or to a writer:
import org.openrdf.query.resultio.sparqlxml.SPARQLResultsXMLWriter;
...
FileOutputStream out = new FileOutputStream("/path/to/result.srx");
try {
SPARQLResultsXMLWriter sparqlWriter = new SPARQLResultsXMLWriter(out);
RepositoryConnection con = myRepository.getConnection();
try {
String queryString = "SELECT * FROM {x} p {y}";
TupleQuery tupleQuery = con.prepareTupleQuery(QueryLanguage.SERQL, queryString);
tupleQuery.evaluate(sparqlWriter);
}
finally {
con.close();
}
}
finally {
out.close();
}
You can just as easily supply your own application-specific
implementation of TupleQueryResultHandler though.
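The difference between the two evaluation styles described above (a result object that the client pulls from, versus a handler that answers are pushed to) can be sketched in plain Java. The RowHandler interface and the PushPullSketch class are hypothetical and much simpler than the real TupleQueryResultHandler; they only illustrate who controls the pace of iteration.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified handler; not the Sesame TupleQueryResultHandler.
interface RowHandler {
    void handleRow(String row);
}

final class PushPullSketch {
    private final List<String> rows;

    PushPullSketch(List<String> rows) {
        this.rows = rows;
    }

    // Pull style: the caller decides when to fetch the next row.
    Iterator<String> evaluate() {
        return rows.iterator();
    }

    // Push style: the producer drives and hands every row to the handler.
    void evaluate(RowHandler handler) {
        for (String row : rows) {
            handler.handleRow(row);
        }
    }
}
```

With the pull variant a client can stop after the first row; with the push variant the handler simply receives every row as it becomes available.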
Lastly, an important warning: as soon as you are done with the
RepositoryConnection object, you should close it.
Notice that during processing of the
TupleQueryResult object (for example,
when iterating over its contents), the
RepositoryConnection should still be open. We can
invoke con.close() after we have finished
with the result.
The following code evaluates a graph query on a repository:
import org.openrdf.query.GraphQueryResult;
GraphQueryResult graphResult = con.prepareGraphQuery(
QueryLanguage.SERQL, "CONSTRUCT * FROM {x} p {y}").evaluate();
A GraphQueryResult is similar to
TupleQueryResult in that it is an object
that iterates over the query results. However, for graph queries
the query results are RDF statements, so a
GraphQueryResult iterates over
statements:
while (graphResult.hasNext()) {
Statement st = graphResult.next();
// ... do something with the resulting statement here.
}
The TupleQueryResultHandler equivalent
for graph queries is
org.openrdf.rio.RDFHandler. Again, this
is a generic interface: each object implementing it can
process the reported RDF statements in any way it wants.
All writers from Rio (such as the
N3Writer,
RDFXMLWriter,
TurtleWriter, etc.) implement the
RDFHandler interface. This allows them to
be used in combination with querying quite easily. In the
following example, we use a TurtleWriter
to write the result of a SeRQL graph query to standard output
in Turtle format:
import org.openrdf.rio.turtle.TurtleWriter;
...
RepositoryConnection con = myRepository.getConnection();
try {
TurtleWriter turtleWriter = new TurtleWriter(System.out);
con.prepareGraphQuery(QueryLanguage.SERQL,
"CONSTRUCT * FROM {x} p {y}").evaluate(turtleWriter);
}
finally {
con.close();
}
Again, note that as soon as we are done with the result of the query
(either after iterating over the contents of the
GraphQueryResult or after invoking the
RDFHandler), we invoke
con.close() to close the connection and free
resources.
In the previous sections we have simply created a query from a
string and immediately evaluated it. However, the
prepareTupleQuery and
prepareGraphQuery methods return objects of
type Query, specifically
TupleQuery and
GraphQuery.
A Query object, once created, can be
(re)used. For example, we can evaluate a Query object, then add
some data to our repository, and evaluate the same query
again.
The Query object also has a
setBinding method, which can be used to
fill in certain prepared values in the query. As a simple
example, suppose we have a repository containing names and
e-mail addresses of people, and we want to retrieve the e-mail
address of each person, but using a separate query for each
person. This can be achieved using the setBinding
functionality, as follows:
RepositoryConnection con = myRepository.getConnection();
try {
    // First, prepare a query that retrieves all names of persons
    TupleQuery nameQuery = con.prepareTupleQuery(QueryLanguage.SERQL,
            "SELECT name FROM {person} ex:name {name}");
    // Then, prepare another query that retrieves all e-mail addresses of persons:
    TupleQuery mailQuery = con.prepareTupleQuery(QueryLanguage.SERQL,
            "SELECT mail FROM {person} ex:mail {mail}; ex:name {name}");
    // Evaluate the first query to get all names
    TupleQueryResult nameResult = nameQuery.evaluate();
    try {
        // Loop over all names, and retrieve the corresponding e-mail address.
        while (nameResult.hasNext()) {
            BindingSet bindingSet = nameResult.next();
            Value name = bindingSet.getValue("name");
            // Retrieve the matching mailbox, by setting the binding for
            // the variable 'name' to the retrieved value:
            mailQuery.setBinding("name", name);
            TupleQueryResult mailResult = mailQuery.evaluate();
            // mailResult now contains the e-mail addresses for one particular person
            try {
                ....
            }
            finally {
                // after we are done, close the result
                mailResult.close();
            }
        }
    }
    finally {
        nameResult.close();
    }
}
finally {
    // close the connection even if one of the queries fails
    con.close();
}
The values with which you perform the
setBinding operation of course do not
necessarily have to come from a previous query result (as they do
in the above example). Using a ValueFactory
we can create our own value objects from string values. Thus, we
can very easily use this functionality to, for example, query for a
particular keyword that is given by user input:
ValueFactory factory = myRepository.getValueFactory();
// In this example, we specify the keyword string. Of course, this
// could just as easily be obtained by user input, or by reading from
// a file, or...
String keyword = "foobar";
// We prepare a query that retrieves all documents for a keyword.
// Notice that in this query the 'keyword' variable is not bound to
// any specific value yet.
TupleQuery keywordQuery = con.prepareTupleQuery(QueryLanguage.SERQL,
"SELECT document FROM {document} ex:keyword {keyword}");
// Then we set the binding to a literal representation of our keyword.
// Evaluation of the query object will now effectively be the same as
// if we had specified the query as follows:
// SELECT document FROM {document} ex:keyword {"foobar"}
keywordQuery.setBinding("keyword", factory.createLiteral(keyword));
// we then evaluate the prepared query and can process the result.
TupleQueryResult keywordQueryResult = keywordQuery.evaluate();
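The effect of setting a binding before evaluation can be modelled with a toy pattern matcher. The class below is hypothetical and has nothing to do with Sesame's query engine; it only shows that a variable with a preset value behaves like a constant in the pattern, and that the same prepared object can be re-evaluated with a different value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of a prepared query with one triple pattern. Not Sesame code.
final class PreparedPatternSketch {
    private final String subjectVar;
    private final String predicate;
    private final String objectVar;
    private final Map<String, String> bindings = new HashMap<String, String>();

    PreparedPatternSketch(String subjectVar, String predicate, String objectVar) {
        this.subjectVar = subjectVar;
        this.predicate = predicate;
        this.objectVar = objectVar;
    }

    // Fixes the value of a variable for subsequent evaluations.
    void setBinding(String name, String value) {
        bindings.put(name, value);
    }

    // Returns the subjects of all {subject, predicate, object} triples that match.
    List<String> evaluate(List<String[]> triples) {
        List<String> subjects = new ArrayList<String>();
        for (String[] triple : triples) {
            if (triple[1].equals(predicate)
                    && matches(subjectVar, triple[0])
                    && matches(objectVar, triple[2])) {
                subjects.add(triple[0]);
            }
        }
        return subjects;
    }

    // An unbound variable matches anything; a bound one only its preset value.
    private boolean matches(String variable, String value) {
        String bound = bindings.get(variable);
        return bound == null || bound.equals(value);
    }
}
```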
The RepositoryConnection can also be used
for adding, retrieving, removing or otherwise manipulating
individual statements, or sets of statements.
To be able to add new statements, we can use a
ValueFactory to create the
Values out of which the statements consist.
For example, we want to add a few statements about two resources,
Alice and Bob:
import org.openrdf.model.vocabulary.RDF;
import org.openrdf.model.vocabulary.RDFS;
...
ValueFactory f = myRepository.getValueFactory();
// create some resources and literals to make statements out of
URI alice = f.createURI("http://example.org/people/alice");
URI bob = f.createURI("http://example.org/people/bob");
URI name = f.createURI("http://example.org/ontology/name");
URI person = f.createURI("http://example.org/ontology/Person");
Literal bobsName = f.createLiteral("Bob");
Literal alicesName = f.createLiteral("Alice");
try {
RepositoryConnection con = myRepository.getConnection();
try {
// alice is a person
con.add(alice, RDF.TYPE, person);
// alice's name is "Alice"
con.add(alice, name, alicesName);
// bob is a person
con.add(bob, RDF.TYPE, person);
// bob's name is "Bob"
con.add(bob, name, bobsName);
}
finally {
con.close();
}
}
catch (OpenRDFException e) {
// handle exception
}
Of course, it will not always be necessary to use a
ValueFactory to create URIs. In practice,
you will find that you quite often retrieve existing URIs from the
repository (for example, by evaluating a query) and then reusing
those values to add new statements.
As you can see in the above code, for the default RDF and RDF Schema
properties (such as 'rdf:type' and 'rdfs:subClassOf') it is not
necessary to create new URI objects. Instead,
you can import the vocabulary classes
org.openrdf.model.vocabulary.RDF and
RDFS which provide you static references to
the vocabulary primitives.
Retrieving statements works in a very similar way. One way of
retrieving statements we have already seen actually: we can get a
GraphQueryResult containing statements by
evaluating a graph query.
However, we can also use direct method calls to retrieve (sets of)
statements. For example, to retrieve all statements about
Alice, we could do:
RepositoryResult<Statement> statements = con.getStatements(alice, null, null, true);
The additional boolean parameter at the end (set to 'true' in this example) indicates whether inferred triples should be included in the result. Of course, this parameter only makes a difference if your repository uses an inferencer.
The RepositoryResult is an iterator-like
object that lazily retrieves each matching statement from the
repository when its next() method is called.
Note that, as is the case with
QueryResult objects, iterating over a
RepositoryResult may result in exceptions
which you should catch to make sure that the
RepositoryResult is always properly closed
after use:
RepositoryResult<Statement> statements = con.getStatements(alice, null, null, true);
try {
   while (statements.hasNext()) {
      Statement st = statements.next();
      ... // do something with the statement
   }
}
finally {
   statements.close(); // make sure the result object is closed properly
}
In the above method invocation, we see four parameters being
passed. The first three represent the subject, predicate and object
of the RDF statements which should be retrieved. A
null value indicates a wildcard, so the
above method call retrieves all statements which have as their
subject Alice, and have any kind of predicate and object. The
fourth parameter indicates whether inferred statements
should be included.
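The wildcard behaviour described above can be illustrated without a repository. The following is a minimal sketch in plain Java, not the Sesame API: the StatementPatternSketch class, its Triple type and the sample data are all hypothetical, and statements are reduced to three strings. It only shows how a null component in a pattern matches any value in that position.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only: this is NOT the Sesame API, just plain Java
// mimicking how a null argument acts as a wildcard in a statement pattern.
class StatementPatternSketch {

    // A statement reduced to three strings (hypothetical helper type).
    static final class Triple {
        final String subj, pred, obj;

        Triple(String subj, String pred, String obj) {
            this.subj = subj;
            this.pred = pred;
            this.obj = obj;
        }
    }

    // A null pattern component matches any value in that position.
    static boolean matches(String pattern, String value) {
        return pattern == null || pattern.equals(value);
    }

    // Returns all triples matching the (subject, predicate, object) pattern.
    static List<Triple> select(List<Triple> data, String s, String p, String o) {
        List<Triple> result = new ArrayList<Triple>();
        for (Triple t : data) {
            if (matches(s, t.subj) && matches(p, t.pred) && matches(o, t.obj)) {
                result.add(t);
            }
        }
        return result;
    }

    // The four statements about Alice and Bob, in simplified form.
    static List<Triple> sampleData() {
        List<Triple> data = new ArrayList<Triple>();
        data.add(new Triple("alice", "rdf:type", "Person"));
        data.add(new Triple("alice", "name", "Alice"));
        data.add(new Triple("bob", "rdf:type", "Person"));
        data.add(new Triple("bob", "name", "Bob"));
        return data;
    }

    public static void main(String[] args) {
        // Subject fixed to alice, predicate and object left as wildcards
        System.out.println(select(sampleData(), "alice", null, null).size());
    }
}
```

With this sample data, the call in main prints 2, since two of the four statements have alice as their subject.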
Removing statements again works in a very similar fashion. Suppose we want to retract the statement that the name of Alice is "Alice":
con.remove(alice, name, alicesName);
Or, if we want to erase all statements about Alice completely, we can do:
con.remove(alice, null, null);
Most of these examples have been on the level of individual
statements. However, the Repository API offers several methods
that work with Collections of statements,
allowing more batch-like update operations.
For example, in the following bit of code, we first retrieve all
statements about Alice, put them in a
Collection and then remove them:
import info.aduna.iteration.Iterations;
// Retrieve all statements about Alice and put them in a list
RepositoryResult<Statement> statements = con.getStatements(alice, null, null, true);
List<Statement> aboutAlice = Iterations.addAll(statements, new ArrayList<Statement>());
// Then, remove them from the repository
con.remove(aboutAlice);
As you can see, the
info.aduna.iteration.Iterations
class provides a convenient method that takes an
Iteration (of which
RepositoryResult is a subclass) and a
Collection as input, and returns the Collection with the contents
of the iterator added to it. It also automatically closes the
Iteration for you.
In the above code, you first retrieve all statements, put them in a list, and then remove them. Although this works fine, it can be done in an easier fashion, by simply supplying the resulting object directly:
con.remove(con.getStatements(alice, null, null, true));
The RepositoryConnection interface has several variations of
add, retrieve and remove operations. See the Javadoc API
documentation for a full overview of the options.
Sesame 2 supports the notion of context, which you can think of as a way to group sets of statements together through a single group identifier (this identifier can be a blank node or a URI).
A very typical way to use context is tracking provenance of the statements in a repository, that is, which file these statements originate from. For example, consider an application where you add RDF data from different files to a repository, and then one of those files is updated. You would then like to replace the data from that single file in the repository, and to be able to do this you need a way to figure out which statements need to be removed. The context mechanism gives you a way to do that.
In the following example, we add an RDF document from the Web to our repository, in a context. In the example, we make the context identifier equal to the Web location of the file being uploaded.
String location = "http://example.org/example/example.rdf";
String baseURI = location;
URL url = new URL(location);
URI context = f.createURI(location);
con.add(url, baseURI, RDFFormat.RDFXML, context);
We can now use the context mechanism to specifically address these statements in the repository for retrieve and remove operations:
// Get all statements in the context
RepositoryResult<Statement> result =
   con.getStatements(null, null, null, true, context);
try {
   while (result.hasNext()) {
      Statement st = result.next();
      ... // do something interesting with the result
   }
}
finally {
   result.close();
}
// Export all statements in the context to System.out, in RDF/XML format
RDFHandler rdfxmlWriter = new RDFXMLWriter(System.out);
con.export(context, rdfxmlWriter);
// Remove all statements in the context from the repository
con.clear(context);
In most methods in the Repository API, the context parameter is a vararg, meaning that you can specify an arbitrary number (zero, one, or more) of context identifiers. This way, you can very flexibly combine different contexts together. For example, we can very easily retrieve statements that appear in either 'context1' or 'context2'.
In the following example we add information about Bob and Alice again, but this time each has their own context. We also create a new property called 'creator' that has as its value the name of the person who is the creator of a particular context. However, we do not add the statements about the creators of contexts to any particular context:
URI context1 = f.createURI("http://example.org/context1");
URI context2 = f.createURI("http://example.org/context2");
URI creator = f.createURI("http://example.org/ontology/creator");
// Add stuff about Alice to context1
con.add(alice, RDF.TYPE, person, context1);
con.add(alice, name, alicesName, context1);
// Alice is the creator of context1
con.add(context1, creator, alicesName);
// Add stuff about Bob to context2
con.add(bob, RDF.TYPE, person, context2);
con.add(bob, name, bobsName, context2);
// Bob is the creator of context2
con.add(context2, creator, bobsName);
Once we have this information in our repository, we can retrieve all statements about either Alice or Bob by using the context vararg:
// Get all statements in either context1 or context2
RepositoryResult<Statement> result =
   con.getStatements(null, null, null, true, context1, context2);
You should observe that the above RepositoryResult will not contain the information that context1 was created by Alice and context2 by Bob. This is because those statements were added without any context, and thus do not appear in context1 or context2 themselves.
To explicitly retrieve statements that do not have an associated context, we do the following:
// Get all statements that do not have an associated context
RepositoryResult<Statement> result =
con.getStatements(null, null, null, true, (Resource)null);
This will give us only the statements about the
creators of the contexts, because those are the only statements that do
not have an associated context. Note that we have to explicitly cast the
null argument to Resource, because otherwise it
is ambiguous whether we are specifying a single value or an entire array
that is null (a vararg is internally treated as an array). Simply
invoking getStatements(s, p, o, true, null) without
an explicit cast will result in an IllegalArgumentException.
We can also get everything that either has no context or is in context1:
// Get all statements that do not have an associated context, or that are in context1
RepositoryResult<Statement> result =
   con.getStatements(null, null, null, true, (Resource)null, context1);
So as you can see, you can freely combine contexts in this fashion.
Important:
getStatements(null, null, null, true);
is not the same as:
getStatements(null, null, null, true, (Resource)null);
The former (without any context id parameter) retrieves all statements in the repository, ignoring any context information. The latter, however, only retrieves statements that explicitly do not have any associated context.
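The three cases (no context argument, a single null context, and several contexts) can be mimicked in plain Java. The sketch below is an illustration only and does not use the Sesame API: ContextVarargSketch, its Stmt type and the sample labels are hypothetical. It also shows the Java-level reason for the cast: a cast null becomes a one-element array, while an uncast null is passed as a null array, which this model rejects in the same way the text describes.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only, not the Sesame API: a statement is reduced to a
// label plus an optional context, where a null context means "no context".
class ContextVarargSketch {

    static final class Stmt {
        final String label, context;

        Stmt(String label, String context) {
            this.label = label;
            this.context = context;
        }
    }

    // No contexts given: match everything, ignoring context information.
    // Otherwise: match statements whose context equals one of the arguments,
    // where a null argument stands for "has no associated context".
    static List<String> select(List<Stmt> data, String... contexts) {
        if (contexts == null) {
            // This is what an uncast null argument produces: a null array.
            throw new IllegalArgumentException("contexts array must not be null");
        }
        List<String> result = new ArrayList<String>();
        for (Stmt st : data) {
            boolean hit = contexts.length == 0;
            for (String c : contexts) {
                if (c == null ? st.context == null : c.equals(st.context)) {
                    hit = true;
                }
            }
            if (hit) {
                result.add(st.label);
            }
        }
        return result;
    }

    static List<Stmt> sampleData() {
        List<Stmt> data = new ArrayList<Stmt>();
        data.add(new Stmt("alice-name", "context1"));
        data.add(new Stmt("bob-name", "context2"));
        data.add(new Stmt("context1-creator", null));
        data.add(new Stmt("context2-creator", null));
        return data;
    }

    public static void main(String[] args) {
        List<Stmt> data = sampleData();
        System.out.println(select(data));                             // all statements
        System.out.println(select(data, (String) null));              // only context-less ones
        System.out.println(select(data, (String) null, "context1"));  // context-less or context1
    }
}
```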
So far, we have shown individual operations on repositories:
adding statements, removing them, etc. By default, a
RepositoryConnection
runs in autoCommit mode, meaning that each
operation on a RepositoryConnection is immediately sent
to the store and committed.
The RepositoryConnection interface supports a full
transactional mechanism that allows one to group modification
operations together and treat them as a single update: before the
transaction is committed, none of the operations in the transaction
has taken effect, and after, they all take effect. If something goes
wrong at any point during a transaction, it can be
rolled back so that the state of the repository
is the same as before the transaction started. Bundling update
operations in a single transaction often also improves update
performance compared to multiple smaller transactions.
We can achieve this behaviour by switching off the
RepositoryConnection's autoCommit mode. In
the following example, we use a non-autocommit connection to
bundle two file addition operations in a single transaction:
File inputFile1 = new File("/path/to/example1.rdf");
String baseURI1 = "http://example.org/example1/";
File inputFile2 = new File("/path/to/example2.rdf");
String baseURI2 = "http://example.org/example2/";
RepositoryConnection con = myRepository.getConnection();
try {
   con.setAutoCommit(false);
   // Add the first file
   con.add(inputFile1, baseURI1, RDFFormat.RDFXML);
   // Add the second file
   con.add(inputFile2, baseURI2, RDFFormat.RDFXML);
   // If everything went as planned, we can commit the result
   con.commit();
}
catch (RepositoryException e) {
   // Something went wrong during the transaction, so we roll it back
   con.rollback();
}
finally {
   // Whatever happens, we want to close the connection when we are done.
   con.close();
}
In the above example, we use a transaction to add two files to the repository. Only if both files can be successfully added will the repository change. If one of the files cannot be added (for example because it cannot be read), then the entire transaction is cancelled and none of the files is added to the repository.
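The all-or-nothing behaviour of a transaction can be pictured with a small stand-alone model. The following sketch is plain Java and does not use the Sesame API: the TransactionSketch class and its string "statements" are hypothetical, and a real store may implement transactions quite differently. It only shows that, with autoCommit switched off, additions stay staged until commit() and are discarded by rollback().

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only, not the Sesame API: additions are staged while
// autoCommit is off and only take effect on commit().
class TransactionSketch {
    private final List<String> committed = new ArrayList<String>();
    private final List<String> pending = new ArrayList<String>();
    private boolean autoCommit = true;

    void setAutoCommit(boolean autoCommit) {
        this.autoCommit = autoCommit;
    }

    void add(String statement) {
        if (autoCommit) {
            committed.add(statement);   // takes effect immediately
        } else {
            pending.add(statement);     // staged until commit()
        }
    }

    void commit() {
        committed.addAll(pending);      // all staged additions take effect together
        pending.clear();
    }

    void rollback() {
        pending.clear();                // discard everything staged so far
    }

    // Number of statements that have actually taken effect.
    int size() {
        return committed.size();
    }

    public static void main(String[] args) {
        TransactionSketch store = new TransactionSketch();
        store.setAutoCommit(false);
        store.add("statement from file 1");
        store.add("statement from file 2");
        store.rollback();
        System.out.println(store.size());
    }
}
```

Because the transaction is rolled back before it is committed, main prints 0: neither addition has taken effect.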
SeRQL revision 1.1 is a syntax revision (see issue tracker item SES-75). This document describes the revised syntax. From Sesame release 1.2-RC1 onwards, the old syntax is no longer supported.
SeRQL revision 1.2 covers a set of new functions and operators:
New operations have been marked with (R1.2) where appropriate in this document.
SeRQL revision 2.0 is an extension of SeRQL that offers functionality for querying contexts. See Section 5.16, “Querying context (R2.0)” for details.
SeRQL ("Sesame RDF Query Language", pronounced "circle") is a new RDF/RDFS query language that is currently being developed by Aduna as part of Sesame. It combines the best features of other (query) languages (RQL, RDQL, N-Triples, N3) and adds some of its own. This document briefly shows all of these features. After reading through this document one should be able to write SeRQL queries.
Some of SeRQL's most important features are:
URIs and literals are the basic building blocks of RDF. For a query language like SeRQL, variables are added to this list. The following sections will show how to write these down in SeRQL.
Variables are identified by names. These names must start with a letter or an underscore ('_') and can be followed by zero or more letters, numbers, underscores, dashes ('-') or dots ('.'). Example variable names are:
SeRQL keywords are not allowed to be used as variable names. Currently, the following keywords are used or reserved for future use in SeRQL: select, construct, from, where, using, namespace, true, false, not, and, or, like, label, lang, datatype, null, isresource, isliteral, sort, in, union, intersect, minus, exists, forall, distinct, limit, offset.
Keywords in SeRQL are all case-insensitive; variable names, in contrast, are case-sensitive.
There are two ways to write down URIs in SeRQL: either as full URIs or as abbreviated URIs. Full URIs must be surrounded with "<" and ">". Examples of this are:
As URIs tend to be long strings with the first part being shared by several of them (i.e. the namespace), SeRQL allows one to use abbreviated URIs (or QNames) by defining (short) names for these namespaces which are called "prefixes". A QName always starts with one of the defined prefixes and a colon (":"). After this colon, the part of the URI that is not part of the namespace follows. The first part, consisting of the prefix and the colon, is replaced by the full namespace by the query engine. Some example QNames are:
RDF literals consist of three parts: a label, a language tag, and a datatype. The language tag and the datatype are optional and at most one of these two can accompany a label (a literal can not have both a language tag and a datatype). The notation of literals in SeRQL has been modelled after their notation in N-Triples; literals start with the label, which is surrounded by double quotes, optionally followed by a language tag with a "@" prefix or by a datatype URI with a "^^" prefix. Example literals are:
The SeRQL notation for abbreviated URIs can also be used. When the prefix rdf is mapped to the namespace http://www.w3.org/1999/02/22-rdf-syntax-ns#, the last example literal could also have been written down like:
SeRQL has also adopted the character escapes from N-Triples; special characters can be escaped by prefixing them with a backslash. One of the special characters is the double quote. Normally, a double quote would signal the end of a literal's label. If the double quote is part of the label, it needs to be escaped. For example, the sentence John said: "Hi!" can be encoded in a SeRQL literal as: "John said: \"Hi!\"".
As the backslash is a special character itself, it also needs to be escaped. To encode a single backslash in a literal's label, two backslashes need to be written in the label. For example, a Windows directory would be encoded as: "C:\\Program Files\\Apache Tomcat\\".
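When queries are assembled in application code, the two escapes described above can be applied mechanically. The following is a minimal sketch in plain Java; the SerqlLiteralEscapeSketch class and its quote method are hypothetical helpers and not part of Sesame. It handles only the backslash and the double quote, and leaves the other N-Triples escapes out.

```java
// Hypothetical helper, not part of Sesame: turns a label into a quoted
// SeRQL literal by escaping backslashes and double quotes.
class SerqlLiteralEscapeSketch {

    // Prefixes every backslash and double quote with a backslash, then wraps
    // the label in double quotes. Other N-Triples escapes are left out here.
    static String quote(String label) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < label.length(); i++) {
            char c = label.charAt(i);
            if (c == '\\' || c == '"') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        // The Java string literals below are themselves escaped for Java
        System.out.println(quote("John said: \"Hi!\""));
        System.out.println(quote("C:\\Program Files\\Apache Tomcat\\"));
    }
}
```

Run as is, main prints the two encoded literals from the text: "John said: \"Hi!\"" and "C:\\Program Files\\Apache Tomcat\\".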
SeRQL has functions for extracting each of the three parts of a literal. These functions are label, lang, and datatype. label("foo"@en) extracts the label "foo", lang("foo"@en) extracts the language tag "en", and datatype("foo"^^rdf:XMLLiteral) extracts the datatype rdf:XMLLiteral. The use of these functions is explained later.
RDF has a notion of blank nodes. These are nodes in the RDF graph that are not labeled with a URI or a literal. The interpretation of such blank nodes is as a form of existential quantification: it allows one to assert that "there exists a node such that..." without specifying what that particular node is. Blank nodes do in fact often have identifiers, but these identifiers are assigned internally by whatever processor is processing the graph and they are only valid in the local context, not as global identifiers (unlike URIs).
Strictly speaking blank nodes are only addressable indirectly, by querying for one or more properties of the node. However, SeRQL, as a practical shortcut, allows blank node identifiers to be used in queries. The syntax for blank nodes is adopted from N-Triples, using a QName-like syntax with "_" as the namespace prefix, and the internal blank node identifier as the local name. For example:
_:bnode1
This identifies the blank node with internal identifier "bnode1". These blank node identifiers can be used in the same way that normal URIs or QNames can be used.
Caution: It is important to realize that addressing blank nodes in this way makes SeRQL queries non-portable across repositories. There is no guarantee that in two repositories, even if they contain identical datasets, the blank node identifiers will be identical. It may well be that "bnode1" in repository A is a completely different blank node than "bnode1" in repository B. Even in the same repository, it is not guaranteed that blank node identifiers are stable over updates: if certain statements are added to or removed from a repository, it is not guaranteed "bnode1" still identifies the same blank node that it did before the update operation.
One of the most prominent parts of SeRQL are path expressions. Path expressions are expressions that match specific paths through an RDF graph. Most current RDF query languages allow you to define path expressions of length 1, which can be used to find (combinations of) triples in an RDF graph. SeRQL, like RQL, allows you to define path expressions of arbitrary length.
Imagine that we want to query an RDF graph for persons who work for companies that are IT companies. Querying for this information comes down to finding the following pattern in the RDF graph (gray nodes denote variables):
The SeRQL notation for path expressions resembles the picture above; it is written down as:
{Person} ex:worksFor {Company} rdf:type {ex:ITCompany}
The parts surrounded by curly brackets represent the nodes in the RDF graph, the parts between these nodes represent the edges in the graph. The direction of the arcs (properties) in SeRQL path expressions is always from left to right.
In SeRQL queries, multiple path expressions can be specified by separating them with commas. For example, the path expression shown before can also be written down as two smaller path expressions:
{Person} ex:worksFor {Company},
{Company} rdf:type {ex:ITCompany}
The nodes and edges in the path expressions can be variables, URIs and literals. Also, a node can be left empty in case one is not interested in the value of that node. Here are some more example path expressions to illustrate this:
Each and every path can be constructed using a set of basic path expressions. Sometimes, however, it is nicer to use one of the available short cuts. There are three types of short cuts, all of them are explained below.
In situations where one wants to query for two or more triples with identical subject and predicate, the subject and predicate do not have to be repeated over and over again. Instead, a multi-value node can be used:
{subj1} pred1 {obj1, obj2, obj3}
A built-in constraint on this construction is that each value for the variables in the multi-value node is unique (i.e. the values are pairwise distinct). Therefore, this path expression is equivalent to the following combination of path expressions and boolean constraints:
FROM
   {subj1} pred1 {obj1},
   {subj1} pred1 {obj2},
   {subj1} pred1 {obj3}
WHERE obj1 != obj2 AND obj1 != obj3 AND obj2 != obj3
Or graphically:
Multi-value nodes can also be used when statements share the predicate and object, e.g.:
{subj1, subj2, subj3} pred1 {obj1}
When used in a longer path expression, multi-value nodes apply to both the part left of the node and the part right of the node. The following path expression:
{first} pred1 {middle1, middle2} pred2 {last}
matches the following graph:
When using variables in multi-value nodes, a constraint on its values is implicitly added: the variable's value is not allowed to be equal to any other value in the multi-value node. So, in the first example, the variables obj1, obj2 and obj3 will not match identical values at the same time. This prevents the path from matching a single triple three times.
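To make the implicit constraint concrete, the expansion of a multi-value node into basic path expressions can be written out programmatically. The sketch below is plain Java and purely illustrative: MultiValueNodeSketch and its expand method are hypothetical and not part of Sesame, and the sketch assumes that all values in the multi-value node are variables. It produces one basic path expression per value and one inequality per pair of values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Sesame: writes out the basic path
// expressions and inequalities that a multi-value object node stands for.
class MultiValueNodeSketch {

    // Builds the FROM/WHERE text equivalent to {subj} pred {obj1, obj2, ...},
    // assuming every object in the list is a variable name.
    static String expand(String subj, String pred, List<String> objs) {
        List<String> paths = new ArrayList<String>();
        List<String> constraints = new ArrayList<String>();
        for (int i = 0; i < objs.size(); i++) {
            paths.add("{" + subj + "} " + pred + " {" + objs.get(i) + "}");
            // one inequality for each unordered pair of variables
            for (int j = i + 1; j < objs.size(); j++) {
                constraints.add(objs.get(i) + " != " + objs.get(j));
            }
        }
        String query = "FROM\n" + String.join(",\n", paths);
        if (!constraints.isEmpty()) {
            query += "\nWHERE " + String.join(" AND ", constraints);
        }
        return query;
    }

    public static void main(String[] args) {
        System.out.println(expand("subj1", "pred1", Arrays.asList("obj1", "obj2", "obj3")));
    }
}
```

For three variables this yields three path expressions and three inequalities; in general, n variables give n path expressions and n(n-1)/2 inequalities.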
One of the short cuts that is likely to be used most is the notation for branches in path expressions. There are lots of situations where one wants to query multiple properties of a single subject. Instead of repeating the subject over and over again, one can use a semi-colon to attach a predicate-object combination to the subject of the last part of a path expression, e.g.:
{subj1} pred1 {obj1};
        pred2 {obj2}
Which is equivalent to:
{subj1} pred1 {obj1},
{subj1} pred2 {obj2}
Or graphically:
Or a slightly more complicated example:
{first} pred {} pred1 {obj1};
                pred2 {obj2} pred3 {obj3}
Which matches the following graph:
Note that an anonymous variable is used in the middle of the path expressions.
The last short cut is a short cut for reified statements. A path expression representing a single statement (i.e. {node} edge {node}) can be written between the curly brackets of a node, e.g.:
{ {reifSubj} reifPred {reifObj} } pred {obj}
This would be equivalent to querying (using "rdf:" as a prefix for the RDF namespace, and "_Statement" as a variable for storing the statement's URI):
{_Statement} rdf:type {rdf:Statement},
{_Statement} rdf:subject {reifSubj},
{_Statement} rdf:predicate {reifPred},
{_Statement} rdf:object {reifObj},
{_Statement} pred {obj}
Again, graphically:
Optional path expressions differ from 'normal' path expressions in that they do not have to be matched to find query results. The SeRQL query engine will try to find paths in the RDF graph matching the path expression, but when it cannot find any paths it will skip the expression and leave any variables in it uninstantiated (they will have the value null).
Consider an RDF graph that contains information about people that have names, ages, and optionally e-mail addresses. This is a situation that is likely to be very common in RDF data. A logical query on this data is a query that yields all names, ages and, when available, e-mail addresses of people, e.g.:
{Person} ex:name {Name};
         ex:age {Age};
         ex:email {EmailAddress}
However, using normal path expressions like in the query above, people without an e-mail address will not be returned by the SeRQL query engine. With optional path expressions, one can indicate that a specific (part of a) path expression is optional. This is done using square brackets, i.e.:
{Person} ex:name {Name};
         ex:age {Age};
         [ex:email {EmailAddress}]
Or alternatively:
{Person} ex:name {Name};
         ex:age {Age},
[{Person} ex:email {EmailAddress}]
In contrast to the first path expression, these expressions will also match people without an e-mail address. For these people, the variable EmailAddress will not be assigned a value.
Optional path expressions can also be nested. This is useful in situations where the existence of a specific path is dependent on the existence of another path. For example, the following path expression queries for the titles of all known documents and, if the author of the document is known, the name of the author (if it is known) and his e-mail address (if it is known):
{Document} ex:title {Title};
           [ex:author {Author} [ex:name {Name}];
                               [ex:email {Email}]]
With this path expression, the SeRQL query engine will not try to find the name and e-mail address of an author when it cannot even find the resource representing the author.
The SeRQL query language supports two querying concepts. The first one can be characterized as returning a table of values, or a set of variable-value bindings. The second one returns a true RDF graph, which can be a subgraph of the graph being queried, or a graph containing information that is derived from it. The first type of queries are called "select queries", the second type of queries are called "construct queries".
A SeRQL query is typically built up from one to seven clauses. For select queries these clauses are: SELECT, FROM, FROM CONTEXT, WHERE, LIMIT, OFFSET and USING NAMESPACE. One might recognize some of these clauses from SQL, but their usage is slightly different. For construct queries the clauses are the same with the exception of the first; construct queries start with a CONSTRUCT clause instead of a SELECT clause.
The first clause (i.e. SELECT or CONSTRUCT) determines what is done with the results that are found. In a SELECT clause, one can specify which variable values should be returned and in what order. In a CONSTRUCT clause, one can specify which triples should be returned.
The FROM clause is optional and always contains path expressions, which were explained in the previous section. It defines the paths in an RDF graph that are relevant to the query. Note that when the FROM clause is not specified, the query will simply return the constants specified in the SELECT or CONSTRUCT clause.
The FROM CONTEXT clause is new in SeRQL revision 2.0. It is a variant of the FROM clause that allows one to constrain the path expressions in the clause to a context. Using context in querying will be explained in more detail in Section 5.16, “Querying context (R2.0)”.
The WHERE clause is optional and can contain additional (Boolean) constraints on the values in the path expressions. These are constraints on the nodes and edges of the paths, which cannot be expressed in the path expressions themselves.
The LIMIT and OFFSET clauses are also optional. These clauses can be used separately or combined in order to get a subset of all query answers. Their usage is very similar to the LIMIT and OFFSET clauses in SQL queries. The LIMIT clause determines the (maximum) number of query answers that will be returned. The OFFSET clause determines which query answer will be returned as the first result, skipping as many query results as specified in this clause.
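The combined effect of LIMIT and OFFSET on a list of answers can be sketched in a few lines of plain Java. The LimitOffsetSketch class below is a hypothetical illustration, not part of Sesame, and it assumes that both values are non-negative: it first skips as many answers as the offset specifies and then returns at most as many answers as the limit allows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not part of Sesame: applies an offset and a
// limit to an already computed list of query answers.
class LimitOffsetSketch {

    // Skips 'offset' answers, then returns at most 'limit' of the remaining
    // ones. Both arguments are assumed to be non-negative.
    static <T> List<T> page(List<T> answers, int limit, int offset) {
        int from = Math.min(offset, answers.size());
        // computed as long so that a very large limit cannot overflow
        int to = (int) Math.min((long) from + limit, (long) answers.size());
        return new ArrayList<T>(answers.subList(from, to));
    }

    public static void main(String[] args) {
        List<String> answers = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(page(answers, 2, 1));   // skip one answer, return two
    }
}
```

For the five answers in main, a limit of 2 with an offset of 1 returns the second and third answers.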
Finally, the USING NAMESPACE clause is also optional and it can contain namespace declarations; these are the mappings from prefixes to namespaces that were referred to in one of previous sections about (abbreviated) URIs.
The WHERE, LIMIT, OFFSET and USING NAMESPACE clauses will be explained in one of the next sections. The following section will explain the SELECT and FROM clause.
As said before, select queries return tables of values, or sets of variable-value bindings. Which values are returned can be specified in the select clause. One can specify variables and/or values in the select clause, separated by commas. The following example query returns all URIs of classes:
SELECT C
FROM {C} rdf:type {rdfs:Class}
It is also possible to use a '*' in the SELECT clause. In that case, all variable values will be returned in the order in which they appear in the query, e.g.:
SELECT *
FROM {S} rdfs:label {O}
This query will return the values of the variables S and O, in that order. If a different order is preferred, one needs to specify the variables in the select clause, e.g.:
SELECT O, S
FROM {S} rdfs:label {O}
By default, the results of a select query are not filtered for duplicate rows. Because of the nature of the above queries, these queries will never return duplicates. However, more complex queries might result in duplicate result rows. These duplicates can be filtered out by the SeRQL query engine. To enable this functionality, one needs to specify the DISTINCT keyword after the select keyword. For example:
SELECT DISTINCT *
FROM {Country1} ex:borders {} ex:borders {Country2}
USING NAMESPACE
   ex = <http://example.org/things#>
Construct queries return RDF graphs as sets of triples. The triples that a query should return can be specified in the construct clause using the previously explained path expressions. The following is an example construct query:
CONSTRUCT {Parent} ex:hasChild {Child}
FROM {Child} ex:hasParent {Parent}
USING NAMESPACE
   ex = <http://example.org/things#>
This query defines the inverse of the property ex:hasParent to be ex:hasChild. This is just one example of a query that produces information that is derived from the original information. Here is one more example:
CONSTRUCT
{Artist} rdf:type {ex:Painter};
ex:hasPainted {Painting}
FROM
{Artist} rdf:type {ex:Artist};
ex:hasCreated {Painting} rdf:type {ex:Painting}
USING NAMESPACE
   ex = <http://example.org/things#>
This query derives that an artist who has created a painting is a painter. The relation between the painter and the painting is modelled to be ex:hasPainted.
Instead of specifying a path expression in the CONSTRUCT clause, one can also use a '*'. In that case, the CONSTRUCT clause is identical to the FROM clause. This allows one to extract a subgraph from a larger graph, e.g.:
CONSTRUCT *
FROM {SUB} rdfs:subClassOf {SUPER}
This query extracts all rdfs:subClassOf relations from an RDF graph.
Just like with select queries, the results of a construct query are not filtered for duplicate triples by default. Again, these duplicates are filtered out by the SeRQL query engine if the DISTINCT keyword is specified after the construct keyword, for example:
CONSTRUCT DISTINCT
{Artist} rdf:type {ex:Painter}
FROM
{Artist} rdf:type {ex:Artist};
ex:hasCreated {} rdf:type {ex:Painting}
USING NAMESPACE
   ex = <http://example.org/things#>
The third clause in a query is the WHERE clause. This is an optional clause in which one can specify Boolean constraints on variables.
The following sections will explain the available Boolean expressions for use in the WHERE clause. Section 5.8.8, “Nested WHERE clauses (R1.2)” will explain how WHERE clauses can be nested inside optional path expressions.
There are two Boolean constants, TRUE and FALSE. The first one is simply always true, the last one is always false. The following query will never produce any results because the constraint in the where clause will never evaluate to true:
SELECT *
FROM {X} Y {Z}
WHERE FALSE
The most common boolean constraint is equality or inequality of values. Values can be compared using the operators "=" (equality) and "!=" (inequality). The expression
Var = <foo:bar>
is true if the variable Var contains the URI <foo:bar>, and the expression
Var1 != Var2
checks whether two variables are not equal.
Numbers can be compared to each other using the operators "<" (lower than), "<=" (lower than or equal to), ">" (greater than) and ">=" (greater than or equal to). SeRQL uses a literal's datatype to determine whether its value is numerical. All XML Schema built-in numerical datatypes are supported, i.e.: xsd:float, xsd:double, xsd:decimal and all subtypes of xsd:decimal (xsd:long, xsd:nonPositiveInteger, xsd:byte, etc.), where the prefix xsd is used to reference the XML Schema namespace.
In the following query, a comparison between values of type xsd:positiveInteger is used to retrieve all countries that have a population of less than 1 million:
SELECT Country
FROM {Country} ex:population {Population}
WHERE Population < "1000000"^^xsd:positiveInteger
USING NAMESPACE
   ex = <http://example.org/things#>
SeRQL is currently restricted to numerical comparisons between values with identical datatypes. This means that e.g. xsd:int values cannot (yet) be compared to xsd:byte values.
If only one of the parameters of a comparison has a datatype, SeRQL will try to assign the other parameter the same datatype. This means that the above query can still be used when the population literals don't have any datatype. SeRQL will try to interpret the literal as a positive integer and compare it to the one million constant.
The LIKE operator can check whether a value matches a specified pattern of characters. '*' characters can be used as wildcards, matching with zero or more characters. The rest of the characters are compared lexically. The pattern is surrounded with double quotes, just like a literal's label.
SELECT Country
FROM {Country} ex:name {Name}
WHERE Name LIKE "Belgium"
USING NAMESPACE
   ex = <http://example.org/things#>
By default, the LIKE operator does a case-sensitive comparison: in the above query, the operator fails if the variable Name is bound to the value "belgium" instead of "Belgium". Optionally, one can specify that the operator should perform a case-insensitive comparison:
SELECT Country
FROM {Country} ex:name {Name}
WHERE Name LIKE "belgium" IGNORE CASE
USING NAMESPACE
   ex = <http://example.org/things#>
In this query, the operator will succeed for "Belgium", "belgium", "BELGIUM", etc.
The '*' character can be used as a wildcard to indicate substring matches, for example:
SELECT Country
FROM {Country} ex:name {Name}
WHERE Name LIKE "*Netherlands"
USING NAMESPACE
   ex = <http://example.org/things#>
This query will match any country names that end with the string "Netherlands", for example "The Netherlands".
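The matching rules of the LIKE operator, as described in this section, can be approximated in plain Java by translating the pattern into a regular expression. This is a sketch under stated assumptions and not the way Sesame itself evaluates LIKE: the LikeOperatorSketch class is hypothetical, every '*' is treated as a wildcard (no escaping of '*' is modelled), and all other characters are compared literally.

```java
import java.util.regex.Pattern;

// Hypothetical approximation, not Sesame's implementation: evaluates a
// LIKE-style pattern in which '*' matches zero or more characters.
class LikeOperatorSketch {

    static boolean like(String value, String pattern, boolean ignoreCase) {
        // Split on '*' and keep empty parts, so that leading and trailing
        // wildcards are preserved.
        String[] parts = pattern.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");                 // one wildcard between parts
            }
            regex.append(Pattern.quote(parts[i]));  // literal comparison
        }
        int flags = Pattern.DOTALL;
        if (ignoreCase) {
            flags |= Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
        }
        return Pattern.compile(regex.toString(), flags).matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(like("The Netherlands", "*Netherlands", false));
        System.out.println(like("belgium", "Belgium", false));
        System.out.println(like("BELGIUM", "belgium", true));
    }
}
```

The three calls in main mirror the examples above: the first and third match, while the second fails because the comparison is case-sensitive by default.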
The isResource() and isLiteral() boolean functions check whether a variable contains a resource or a literal, respectively. For example:
SELECT *
FROM {R} rdfs:label {L}
WHERE isLiteral(L)
The isURI() and isBNode() boolean functions are more specific versions of isResource(). They check whether a variable is bound to a URI value or a BNode value, respectively. For example, the following query returns only URIs (and filters out all bNodes and literals):
SELECT V
FROM {R} prop {V}
WHERE isURI(V)
Boolean constraints and functions can be combined using the AND and OR operators, and negated using the NOT operator. The NOT operator has the highest precedence, then the AND operator, and finally the OR operator. Parentheses can be used to override the default precedence of these operators. The following query is a (somewhat artificial) example of this:
SELECT *
FROM {X} Prop {Y} rdfs:label {L}
WHERE NOT L LIKE "*FooBar*" AND
(Y = <foo:bar> OR Y = <bar:foo>) AND
isLiteral(L)
In order to be able to express boolean constraints on variables in optional path expressions, it is possible to use a nested WHERE clause. The constraints in such a nested WHERE clause restrict the potential matches of the optional path expressions, without causing the entire query to fail if the boolean constraint fails.
To illustrate the difference between a nested WHERE clause and a 'normal' WHERE clause, consider the following two queries on the same data:
Data (using Turtle format):
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex: <http://example.org/> .
_:a foaf:name "Michael" .
_:b foaf:name "Rubens" .
_:b ex:email "rubinho@example.work" .
_:c foaf:name "Giancarlo" .
_:c ex:email "giancarlo@example.work" .
Query 1 (normal WHERE-clause):
SELECT
Name, EmailAddress
FROM
{Person} foaf:name {Name};
[ex:email {EmailAddress}]
WHERE EmailAddress LIKE "g*"
Query 2 (nested WHERE-clause):
SELECT
Name, EmailAddress
FROM
{Person} foaf:name {Name};
[ex:email {EmailAddress} WHERE EmailAddress LIKE "g*"]
In query 1, a normal WHERE clause specifies that the EmailAddress found by the optional expression must begin with the letter "g". The result of this query will be:
| Name | EmailAddress |
|---|---|
| Giancarlo | "giancarlo@example.work" |
Despite the fact that the match on EmailAddress is defined as optional, the persons named "Michael" and "Rubens" are not returned. The reason is that the WHERE clause explicitly says that the value bound to the optional variable must start with the letter "g". For Michael, no value is found, hence the variable is equal to NULL, and the comparison operator fails on this. For Rubens, a value is found, but it does not start with the letter "g".
In query 2, however, a nested WHERE-clause is used. This specifies that any binding made by the optional expression must begin with the letter "g"; otherwise NULL is returned. The result of this query is:
| Name | EmailAddress |
|---|---|
| Michael | |
| Rubens | |
| Giancarlo | "giancarlo@example.work" |
The person "Michael" is returned without a result for his email address because there is no email address known for him at all. The person "Rubens" is returned without a result for his email address because, although he does have an email address, it does not start with the letter "g".
A query can contain at most one nested WHERE-clause per optional path expression, and at most one 'normal' WHERE-clause.
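The difference between the two queries can be mimicked in Python. The sketch below is a simplified model, not how Sesame evaluates queries: it assumes each person has at most one email address, represents NULL as None, and the helper names starts_with_g, query1 and query2 are invented for this illustration.

```python
# (name, email) pairs from the example data; None models a person without ex:email.
people = [
    ("Michael", None),
    ("Rubens", "rubinho@example.work"),
    ("Giancarlo", "giancarlo@example.work"),
]

def starts_with_g(email):
    # Models EmailAddress LIKE "g*"; the comparison fails on NULL (None).
    return email is not None and email.startswith("g")

def query1(rows):
    # Normal WHERE: the constraint filters out whole result rows.
    return [(name, email) for name, email in rows if starts_with_g(email)]

def query2(rows):
    # Nested WHERE: the constraint only decides whether the optional part binds;
    # when it fails, the row is kept and EmailAddress stays unbound (None).
    return [(name, email if starts_with_g(email) else None) for name, email in rows]

print(query1(people))  # [('Giancarlo', 'giancarlo@example.work')]
print(query2(people))  # all three persons, only Giancarlo with an address
```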
Apart from the boolean functions and operators introduced in the previous section, SeRQL supports several other functions that return RDF terms rather than boolean values. These functions can be used in both the SELECT and the WHERE clause.
The three functions label(), lang() and datatype() all operate on literals. The result of the label() function is the lexical form of the supplied literal. The lang() function returns the language attribute. Both functions return their result as an untyped literal, which can again be compared with other literals using the (in)equality, comparison and LIKE operators. The result of the datatype() function is a URI, which can be compared to other URIs. These functions can also be used in SELECT clauses, but not in path expressions.
An example query:
SELECT label(L)
FROM {R} rdfs:label {L}
WHERE isLiteral(L) AND lang(L) LIKE "en*"
The functions namespace() and localName() operate on URIs. The namespace() function returns the namespace of the supplied URI, as a URI object. The localName() function returns the local name part of the supplied URI, as a literal. These functions can also be used in SELECT clauses, but not in path expressions.
The following query retrieves all properties of foaf:Person instances that are in the FOAF namespace. Notice that as a shorthand for the full URI, we can use a namespace prefix (followed by a colon) as an argument.
Data:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex: <http://example.org/> .
_:a rdf:type foaf:Person .
_:a ex:nick "Schumi" .
_:a foaf:firstName "Michael" .
_:a foaf:knows _:b .
_:b rdf:type foaf:Person .
_:b foaf:firstName "Rubens" .
_:b foaf:nick "Rubinho" .
Query:
SELECT foafProp, Value
FROM {} foafProp {Value}
WHERE namespace(foafProp) = foaf:
USING NAMESPACE
foaf = <http://xmlns.com/foaf/0.1/>
Result:
| foafProp | Value |
|---|---|
| <http://xmlns.com/foaf/0.1/firstName> | "Michael" |
| <http://xmlns.com/foaf/0.1/knows> | _:b |
| <http://xmlns.com/foaf/0.1/firstName> | "Rubens" |
| <http://xmlns.com/foaf/0.1/nick> | "Rubinho" |
In the following example, the localName() function is used to match two equivalent properties from different namespaces (using the above data).
Query:
SELECT nick
FROM {} rdf:type {foaf:Person};
nickProp {nick}
WHERE localName(nickProp) LIKE "nick"
USING NAMESPACE
foaf = <http://xmlns.com/foaf/0.1/>
Result:
| nick |
|---|
| "Schumi" |
| "Rubinho" |
LIMIT and OFFSET allow you to retrieve just a portion of the results that are generated by the query. If a limit count is given, no more than that many results will be returned (but possibly fewer, if the query itself yields fewer results).
OFFSET says to skip that many results before beginning to return results. OFFSET 0 is the same as omitting the OFFSET clause. If both OFFSET and LIMIT appear, then OFFSET rows are skipped before starting to count the LIMIT results that are returned.
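The interplay of the two clauses corresponds to a plain list slice. The following Python sketch models this on an in-memory list of results; the function limit_offset is hypothetical and only illustrates the order in which skipping and counting happen.

```python
def limit_offset(results, limit=None, offset=0):
    """Skip `offset` results, then return at most `limit` of the remaining ones."""
    end = None if limit is None else offset + limit
    return results[offset:end]

rows = ["r1", "r2", "r3", "r4", "r5"]

print(limit_offset(rows, limit=2))             # ['r1', 'r2']
print(limit_offset(rows, limit=2, offset=3))   # ['r4', 'r5']
print(limit_offset(rows, limit=10, offset=3))  # ['r4', 'r5'] (fewer than the limit)
```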
The USING NAMESPACE clause can be used to define short prefixes for namespaces, which can then be used in abbreviated URIs. Multiple prefixes can be defined, but each declaration must have a unique prefix. The following query shows the use of namespace prefixes:
CONSTRUCT
{Artist} rdf:type {art:Painter};
art:hasPainted {Painting}
FROM
{Artist} rdf:type {art:Artist};
art:hasCreated {Painting} rdf:type {art:Painting}
USING NAMESPACE
rdf = <http://www.w3.org/1999/02/22-rdf-syntax-ns#>,
art = <http://example.org/arts/>
The query engine will replace every occurrence of rdf: in an abbreviated URI with http://www.w3.org/1999/02/22-rdf-syntax-ns#, and art: with http://example.org/arts/. So art:hasPainted will be resolved to the URI http://example.org/arts/hasPainted.
Five namespaces that are used very frequently have been assigned prefixes by default:
Table 5.1. Default namespaces
| Prefix | Namespace |
|---|---|
| rdf | http://www.w3.org/1999/02/22-rdf-syntax-ns# |
| rdfs | http://www.w3.org/2000/01/rdf-schema# |
| xsd | http://www.w3.org/2001/XMLSchema# |
| owl | http://www.w3.org/2002/07/owl# |
| serql | http://www.openrdf.org/schema/serql# |
These prefixes can be used without declaring them. If any of these prefixes is declared explicitly in a query, this declaration will override the default mapping.
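Prefix resolution, including the override rule for default prefixes, can be sketched in Python as follows. The dictionary mirrors Table 5.1; the function expand and the replacement namespace http://example.org/my-rdf# are invented for this example and are not part of Sesame.

```python
# Default prefixes as listed in Table 5.1.
DEFAULT_NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "serql": "http://www.openrdf.org/schema/serql#",
}

def expand(abbreviated, declared=None):
    """Resolve 'prefix:localName' to a full URI.
    Prefixes declared in the query take priority over the defaults."""
    namespaces = {**DEFAULT_NAMESPACES, **(declared or {})}
    prefix, local_name = abbreviated.split(":", 1)
    return namespaces[prefix] + local_name

print(expand("art:hasPainted", {"art": "http://example.org/arts/"}))
# http://example.org/arts/hasPainted
print(expand("rdf:type"))
# http://www.w3.org/1999/02/22-rdf-syntax-ns#type
print(expand("rdf:type", {"rdf": "http://example.org/my-rdf#"}))
# http://example.org/my-rdf#type
```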
SeRQL contains a number of built-in predicates. These built-ins can be used like any other predicate, as part of a path expression. The difference with normal predicates is that the built-ins act as operators on the underlying RDF graph: they can be used to query for relations between RDF resources that are not explicitly modeled, nor immediately apparent from the RDF Semantics, but which are nevertheless very useful.
Currently, the following built-in predicates are supported:
{X} sesame:directSubClassOf {Y}
This relation holds for every X and Y where:
{X} sesame:directSubPropertyOf {Y}
This relation holds for every X and Y where:
{X} sesame:directType {Y}
This relation holds for every X and Y where:
Note: the above definition takes class/property equivalence through cyclic subClassOf/subPropertyOf relations into account. This means that if A rdfs:subClassOf B, and B rdfs:subClassOf A, it holds that A = B.
The namespace prefix 'sesame' is built-in and does not have to be defined in the query.
SeRQL offers three combinatory operations that can be used to combine sets of query results.
UNION is a combinatory operation the result of which is the set of query answers of both its operands. This allows one to specify alternatives in a query solution.
By default, UNION filters out duplicate answers from its operands. Specifying the ALL keyword ("UNION ALL") disables this filter.
The following example query retrieves the titles of books in the data, where the property used to describe the title can be either from the DC 1.0 or DC 1.1 specification.
Data:
@prefix dc10: <http://purl.org/dc/elements/1.0/> .
@prefix dc11: <http://purl.org/dc/elements/1.1/> .
_:a dc10:title "The SeRQL Query Language" .
_:b dc11:title "The SeRQL Query Language (revision 1.2)" .
_:c dc10:title "SeRQL" .
_:c dc11:title "SeRQL (updated)" .
Query:
SELECT title
FROM {book} dc10:title {title}
UNION
SELECT title
FROM {book} dc11:title {title}
USING NAMESPACE
dc10 = <http://purl.org/dc/elements/1.0/>,
dc11 = <http://purl.org/dc/elements/1.1/>
Result:
| title |
|---|
| "The SeRQL Query Language" |
| "The SeRQL Query Language (revision 1.2)" |
| "SeRQL" |
| "SeRQL (updated)" |
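The effect of the ALL keyword becomes visible once the operands share an answer, which the example data above does not. The Python sketch below models both variants on plain lists; the function union and its keep_duplicates flag are hypothetical, and the order of the combined rows is incidental to this model rather than something SeRQL is stated to guarantee.

```python
def union(left, right, keep_duplicates=False):
    """Model of UNION (default) and UNION ALL (keep_duplicates=True)."""
    combined = list(left) + list(right)
    if keep_duplicates:
        return combined
    # dict.fromkeys drops repeated rows while keeping the first occurrence.
    return list(dict.fromkeys(combined))

dc10_titles = ["The SeRQL Query Language", "SeRQL"]
dc11_titles = ["The SeRQL Query Language (revision 1.2)", "SeRQL (updated)"]

print(len(union(dc10_titles, dc11_titles)))              # 4
print(union(["SeRQL"], ["SeRQL"]))                       # ['SeRQL']
print(union(["SeRQL"], ["SeRQL"], keep_duplicates=True)) # ['SeRQL', 'SeRQL']
```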
The union operator matches the projection items in order without taking the name of the projection item into account:
SELECT title, "1.0" AS "version"
FROM {book} dc10:title {title}
UNION
SELECT y, NULL
FROM {x} dc11:title {y}
USING NAMESPACE
dc10 = <http://purl.org/dc/elements/1.0/>,
dc11 = <http://purl.org/dc/elements/1.1/>
Result:
| title | version |
|---|---|
| "The SeRQL Query Language" | "1.0" |
| "The SeRQL Query Language (revision 1.2)" | |
| "SeRQL" | "1.0" |
| "SeRQL (updated)" | |
SeRQL will use the names of the variables in the first operand of the union in the query result.
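Positional matching can be pictured by treating each answer as a tuple and attaching column names only at the end. The following Python sketch does this for the query above; union_rows is a hypothetical helper, None stands for NULL, and the row order is again incidental to the model.

```python
def union_rows(first_columns, first_rows, second_rows):
    """Combine two operands column by column; names come from the first operand."""
    rows = list(dict.fromkeys(list(first_rows) + list(second_rows)))
    return [dict(zip(first_columns, row)) for row in rows]

result = union_rows(
    ("title", "version"),
    [("The SeRQL Query Language", "1.0"), ("SeRQL", "1.0")],
    # The second operand selects (y, NULL); the variable name y plays no role.
    [("The SeRQL Query Language (revision 1.2)", None), ("SeRQL (updated)", None)],
)

for row in result:
    print(row["title"], row["version"])
```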
The INTERSECT operation retrieves query results that occur in both its operands.
The following query only retrieves those album creators for which the name is specified identically in both DC 1.0 and DC 1.1.
Data:
@prefix dc10: <http://purl.org/dc/elements/1.0/> .
@prefix dc11: <http://purl.org/dc/elements/1.1/> .
_:a dc10:creator "George" .
_:a dc10:creator "Ringo" .
_:b dc11:creator "George" .
_:b dc11:creator "Ringo" .
_:c dc10:creator "Paul" .
_:c dc11:creator "Paul C." .
Query:
SELECT creator
FROM {album} dc10:creator {creator}
INTERSECT
SELECT creator
FROM {album} dc11:creator {creator}
USING NAMESPACE
dc10 = <http://purl.org/dc/elements/1.0/>,
dc11 = <http://purl.org/dc/elements/1.1/>
Result:
| creator |
|---|
| "George" |
| "Ringo" |
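In Python terms, the query above amounts to keeping the answers of the first operand that also appear in the second. This is a minimal model using the creator names from the example data; the function intersect is hypothetical and does not try to model how duplicates are treated.

```python
# Answers of the two operands for the example data.
dc10_creators = ["George", "Ringo", "Paul"]
dc11_creators = ["George", "Ringo", "Paul C."]

def intersect(left, right):
    """Keep the rows of `left` that also occur in `right`."""
    right_set = set(right)
    return [row for row in left if row in right_set]

print(intersect(dc10_creators, dc11_creators))  # ['George', 'Ringo']
```

"Paul" and "Paul C." are different literals, so neither of them survives the intersection.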
The MINUS operation returns query results from its first operand which do not occur in the results from its second operand.
The following query returns the titles of all albums of which "Paul" is not a creator.
Data:
@prefix dc10: <http://purl.org/dc/elements/1.0/> .
_:a dc10:creator "George" .
_:a dc10:title "Sergeant Pepper" .
_:b dc10:creator "Paul" .
_:b dc10:title "Yellow Submarine" .
_:c dc10:creator "Paul" .
_:c dc10:creator "Ringo" .
_:c dc10:title "Let it Be" .
Query:
SELECT title
FROM {album} dc10:title {title}
MINUS
SELECT title
FROM {album} dc10:title {title};
dc10:creator {creator}
WHERE creator LIKE "Paul"
USING NAMESPACE
dc10 = <http://purl.org/dc/elements/1.0/>,
dc11 = <http://purl.org/dc/elements/1.1/>
Result:
| title |
|---|
| "Sergeant Pepper" |
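The same result can be derived step by step in Python: first compute the answers of each operand, then remove the second set from the first. The dictionary layout and the function minus are assumptions made for this sketch only.

```python
# The example data, grouped per album.
albums = {
    "_:a": {"title": "Sergeant Pepper", "creators": ["George"]},
    "_:b": {"title": "Yellow Submarine", "creators": ["Paul"]},
    "_:c": {"title": "Let it Be", "creators": ["Paul", "Ringo"]},
}

# First operand: all titles. Second operand: titles of albums created by "Paul".
all_titles = [album["title"] for album in albums.values()]
paul_titles = [album["title"] for album in albums.values() if "Paul" in album["creators"]]

def minus(left, right):
    """Keep the rows of `left` that do not occur in `right`."""
    right_set = set(right)
    return [row for row in left if row not in right_set]

print(minus(all_titles, paul_titles))  # ['Sergeant Pepper']
```

"Let it Be" is removed even though Ringo is also one of its creators, because the second operand returns the title as soon as any creator matches "Paul".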
Just like SQL and most programming languages, SeRQL also has a NULL value. This value can be used just like any other value in SeRQL. For example, it can be used in the WHERE clause to check that a literal doesn't have a datatype:
SELECT *
FROM {X} Y {Z}
WHERE isLiteral(Z) AND datatype(Z) = NULL
SeRQL has several constructs for nested queries. Nested queries can occur as operands for several boolean operators, which are explained in more detail in the following sections.
SeRQL applies variable scoping for nested queries. This means that when a variable is assigned in the outer query, its value will be carried over to the inner query when that variable is reused there.