The Repository API is the central access point for Sesame repositories. Its purpose is to give developers convenient access to RDF repositories, offering various methods for querying and updating the data while hiding many of the nitty-gritty details of the underlying machinery.
In this chapter, we will try to explain the basics of how to program
against the Repository API. The interfaces for the Repository API
can be found in package
org.openrdf.repository. Several
implementations for these interfaces exist in various sub-packages.
The Javadoc reference for the API is available online and
can also be found in the doc directory of the
download.
The first step in any action that involves Sesame repositories
is to create a Repository for it.
Repository objects operate on (stacks of) Sail object(s) for
storage and retrieval of RDF data. An important thing to
remember is that the behaviour of a repository is determined by
the Sail(s) that it operates on; for example, the repository
will only support RDF Schema or OWL semantics if the Sail stack
includes an inferencer for this.
The central interface of the repository API is the
Repository interface. There are several
implementations available of this interface:
org.openrdf.repository.sail.SailRepository
is a Repository that operates directly on
top of a Sail. This is the class most
commonly used when accessing a local Sesame repository.
org.openrdf.repository.http.HTTPRepository
is, as the name implies, a Repository
implementation that acts as a proxy to a Sesame repository
available on a remote Sesame server, accessible through
HTTP.
In the following section, we will first take a look at the use of
the SailRepository class in order to
create and use a local Sesame repository.
One of the simplest configurations is a repository that just stores RDF data in main memory without applying any inferencing whatsoever. This is also by far the fastest type of repository that can be used. The following code creates and initializes a non-inferencing main-memory repository:
import org.openrdf.sail.memory.MemoryStore;
import org.openrdf.repository.Repository;
import org.openrdf.repository.sail.SailRepository;
...
Repository myRepository = new SailRepository(new MemoryStore());
myRepository.initialize();
The constructor of the SailRepository
class accepts any object of type Sail,
so we simply pass it a new main-memory store object (which is,
of course, a Sail implementation).
Following this, the repository needs to be initialized to
prepare the Sail(s) that it operates on, which includes
operations such as restoring previously stored data, setting
up connections to a relational database, etc.
The repository that is created by the above code is volatile: its contents are lost when the object is garbage collected or when the program is shut down. This is fine for cases where, for example, the repository is used as a means for manipulating an RDF model in memory.
Different types of Sail objects take parameters in their
constructor that change their behaviour. The
MemoryStore for example takes a file
parameter that specifies a data file for persistent storage.
If specified, the MemoryStore will write its contents to
this file so that it can restore it when it is initialized
in a future session:
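import java.io.File;
...
// backslashes must be escaped in Java string literals ("\t" would be a tab)
File dataFile = new File("c:\\temp\\myRepository.dat");
Repository myRepository = new SailRepository( new MemoryStore(dataFile) );
myRepository.initialize();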
As you can see, we can fine-tune the configuration of our
repository by passing parameters to the constructor of the
Sail object. Some Sail types may offer additional
configuration methods, all of which need to be called before
the repository is initialized. The
MemoryStore currently has one such
method: setSyncDelay(long), which can
be used to control the strategy that is used for writing to
the data file, e.g.:
File dataFile = new File("c:\\temp\\myRepository.dat");
MemoryStore memStore = new MemoryStore(dataFile);
// wait 1000 ms after a commit before writing changes to the data file
memStore.setSyncDelay(1000L);
Repository myRepository = new SailRepository(memStore);
myRepository.initialize();
As we have seen, we can create Repository
objects for any kind of back-end store by passing them a
reference to the appropriate Sail object. We can pass any
stack of Sails this way, allowing all kinds of different
repository configurations to be created quite easily. For
example, to stack an RDF Schema inferencer on top of a
memory store, we simply create a repository like so:
import org.openrdf.sail.memory.MemoryStore;
import org.openrdf.sail.inferencer.MemoryStoreRDFSInferencer;
import org.openrdf.repository.Repository;
import org.openrdf.repository.sail.SailRepository;
...
Repository myRepository = new SailRepository(
new MemoryStoreRDFSInferencer(
new MemoryStore()));
myRepository.initialize();
Each layer in the Sail stack is created by a constructor
that takes the underlying Sail as a parameter. Finally, we
create the SailRepository object as a
functional wrapper around the Sail stack.
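Each layer wraps the one below it, which is essentially the decorator pattern. The stdlib-only sketch below illustrates the idea. ToyStore, MemoryToyStore and SubClassInferencer are hypothetical names invented for this example and are not Sesame's Sail API. The single rule the toy inferencer applies (propagating rdf:type along rdfs:subClassOf) is a drastic simplification of real RDFS inferencing.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical storage-layer interface for this sketch only; it is NOT org.openrdf.sail.Sail.
interface ToyStore {
    void add(String s, String p, String o);
    boolean contains(String s, String p, String o);
}

// Bottom of the stack: keeps triples in memory and does no inferencing.
class MemoryToyStore implements ToyStore {
    private final Set<List<String>> triples = new HashSet<>();

    public void add(String s, String p, String o) {
        triples.add(List.of(s, p, o));
    }

    public boolean contains(String s, String p, String o) {
        return triples.contains(List.of(s, p, o));
    }
}

// A layer stacked on top of another store. Its constructor takes the underlying
// store, mirroring how each Sail in a stack wraps the one below it.
class SubClassInferencer implements ToyStore {
    static final String TYPE = "rdf:type";
    static final String SUBCLASS_OF = "rdfs:subClassOf";

    private final ToyStore base;
    private final List<List<String>> known = new ArrayList<>();

    SubClassInferencer(ToyStore base) {
        this.base = base;
    }

    public void add(String s, String p, String o) {
        if (base.contains(s, p, o)) {
            return; // already stored: nothing new to derive
        }
        base.add(s, p, o);
        known.add(List.of(s, p, o));
        // Naive forward chaining: (x rdf:type C) + (C rdfs:subClassOf D) => (x rdf:type D).
        // Iterating over snapshots lets the recursive add() grow 'known' safely.
        for (List<String> a : new ArrayList<>(known)) {
            for (List<String> b : new ArrayList<>(known)) {
                if (a.get(1).equals(TYPE) && b.get(1).equals(SUBCLASS_OF)
                        && a.get(2).equals(b.get(0))) {
                    add(a.get(0), TYPE, b.get(2));
                }
            }
        }
    }

    public boolean contains(String s, String p, String o) {
        return base.contains(s, p, o); // reads go straight to the layer below
    }
}
```

Given the same two statements, the plain store only reports what was explicitly added. The stacked store also reports the derived statement. This is the sense in which a repository's behaviour is determined by its Sail stack.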
Working with remote repositories is just as easy as working
with local ones. We can simply use a different
Repository object, the
HTTPRepository, instead of the
SailRepository class.
A requirement is of course that there is a Sesame 2 server
running on some remote system, which is accessible over HTTP.
For example, suppose that at
http://example.org/sesame2/ a
Sesame server is running, which has a repository with the
identification 'example-db'. We can access this repository in
our code as follows:
import org.openrdf.repository.Repository;
import org.openrdf.repository.http.HTTPRepository;
...
String sesameServer = "http://example.org/sesame2";
String repositoryID = "example-db";
Repository myRepository = new HTTPRepository(sesameServer, repositoryID);
myRepository.initialize();
Now that we have created a Repository, we
want to do something with it. In Sesame 2, this is achieved through
the use of RepositoryConnection objects, which can be
created by the Repository.
A RepositoryConnection represents, as the name
suggests, an open connection to the actual store. We can issue
operations over this connection, and close it when we are done to
make sure we are not keeping resources unnecessarily occupied.
In the following sections, we will show some examples of basic operations.
The Repository API offers various methods for adding data to a repository. Data can be added by specifying the location of a file that contains RDF data, and statements can be added individually or in collections.
We perform operations on a repository by requesting a
RepositoryConnection from the repository. On this
RepositoryConnection object we can perform the various
operations, such as query evaluation, getting, adding, or
removing statements, etc.
The following example code adds two files, one local and one available through HTTP, to a repository:
import org.openrdf.OpenRDFException;
import org.openrdf.rio.RDFFormat;
import org.openrdf.repository.Repository;
import org.openrdf.repository.RepositoryConnection;
import java.io.File;
import java.net.URL;
...
File file = new File("/path/to/example.rdf");
String baseURI = "http://example.org/example/local";
try {
   RepositoryConnection con = myRepository.getConnection();
   try {
      con.add(file, baseURI, RDFFormat.RDFXML);
      URL url = new URL("http://example.org/example/remote");
      con.add(url, url.toString(), RDFFormat.RDFXML);
   }
   finally {
      // always release the connection, even if adding data failed
      con.close();
   }
}
catch (OpenRDFException e) {
   // handle exception
}
catch (java.io.IOException e) {
   // handle io exception
}
More information on other available methods can be found in the
javadoc reference of the RepositoryConnection interface.
The Repository API has a number of methods for creating and evaluating queries. Two types of queries are distinguished: tuple queries and graph queries. The query types differ in the type of results that they produce.
The result of a tuple query is a set of tuples (or variable bindings), where each tuple represents a solution of a query. This type of query is commonly used to get specific values (URIs, blank nodes, literals) from the stored RDF data.
The result of a graph query is an RDF graph (or set of statements). This type of query is very useful for extracting sub-graphs from the stored RDF data, which can then be queried further, serialized to an RDF document, etc.
Note: Sesame 2 currently supports two query languages: SeRQL and SPARQL. The former is explained in Chapter 5, The SeRQL query language (revision 2.0); the specification for the latter is available online.
To evaluate a tuple query we simply do the following:
import java.util.List;
import org.openrdf.OpenRDFException;
import org.openrdf.model.Value;
import org.openrdf.query.BindingSet;
import org.openrdf.query.QueryLanguage;
import org.openrdf.query.TupleQueryResult;
import org.openrdf.repository.RepositoryConnection;
...
try {
   RepositoryConnection con = myRepository.getConnection();
   try {
      String query = "SELECT x, y FROM {x} p {y}";
      TupleQueryResult result = con.prepareTupleQuery(QueryLanguage.SERQL, query).evaluate();
      try {
         // do something with the result
      }
      finally {
         result.close();
      }
   }
   finally {
      con.close();
   }
}
catch (OpenRDFException e) {
   // handle exception
}
This evaluates a SeRQL query and returns a
TupleQueryResult, which consists of a
sequence of BindingSet objects. Each
binding set is a set of Binding
objects. A binding is a pair relating a name (as used in the
projection) with a value.
We can use the TupleQueryResult to iterate
over all results and get each individual result for
x and y:
while (result.hasNext()) {
BindingSet bindingSet = result.next();
Value valueOfX = bindingSet.getValue("x");
Value valueOfY = bindingSet.getValue("y");
// do something interesting with the values here...
}
As you can see, we retrieve values by name rather than by an
index. The names used should be the names as used in the
projection of your query. The
TupleQueryResult.getBindingNames() method
returns a list of binding names, in the order in which they were
specified in the query. To process the bindings in each binding
set in the order specified by the projection, you can do the
following:
List<String> bindingNames = result.getBindingNames();
while (result.hasNext()) {
BindingSet bindingSet = result.next();
Value firstValue = bindingSet.getValue(bindingNames.get(0));
Value secondValue = bindingSet.getValue(bindingNames.get(1));
// do something interesting with the values here...
}
As a shortcut for this, the BindingSet
also provides a method for retrieving values based on an index.
The following code is functionally equivalent to the above code:
while (result.hasNext()) {
BindingSet bindingSet = result.next();
Value firstValue = bindingSet.getValue(0);
Value secondValue = bindingSet.getValue(1);
// do something interesting with the values here...
}
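To make the difference between name-based and position-based access concrete, here is a hypothetical stdlib model. ToyTupleResult is invented for illustration and is not Sesame's TupleQueryResult. It keeps the binding names in projection order on the result and stores each row as a name-to-value map, so a positional lookup is simply a name lookup through the projection order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a tuple query result; NOT org.openrdf.query.TupleQueryResult.
class ToyTupleResult {
    private final List<String> bindingNames;                 // projection order, e.g. [x, y]
    private final List<Map<String, String>> rows = new ArrayList<>();

    ToyTupleResult(String... names) {
        this.bindingNames = List.of(names);
    }

    // Adds one solution; a null value means the variable is unbound in that row.
    void addRow(String... values) {
        Map<String, String> row = new HashMap<>();
        for (int i = 0; i < values.length; i++) {
            if (values[i] != null) {
                row.put(bindingNames.get(i), values[i]);
            }
        }
        rows.add(row);
    }

    List<String> getBindingNames() {
        return bindingNames;
    }

    Map<String, String> row(int i) {
        return rows.get(i);
    }

    // Positional access expressed via the projection order kept on the result.
    String valueAt(int rowIndex, int position) {
        return rows.get(rowIndex).get(bindingNames.get(position));
    }
}
```

Taking the names from the result rather than from an individual row matters. A row may leave a variable unbound (for example, with an optional pattern), and such a row has no entry from which to count positions.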
It is important to invoke the close()
operation on the TupleQueryResult,
after we are done with it. A
TupleQueryResult evaluates lazily and
keeps resources (such as connections to the underlying
database) open. Closing the
TupleQueryResult frees up these
resources. Do not forget that iterating over a result may
cause exceptions! The best way to make sure no connections are
kept open unnecessarily is to invoke
close() in the
finally clause.
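The close-in-finally rule can be demonstrated without Sesame. In this hypothetical sketch, ToyLazyResult stands in for a lazily evaluated result that holds a resource. Its failAt parameter is an invented knob that makes next() throw part-way through, simulating a read error. Because close() sits in the finally clause, consume() still closes the result.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical lazily evaluated result that must be closed; NOT a Sesame class.
class ToyLazyResult implements Iterator<String> {
    private final Iterator<String> source;
    private final int failAt;      // index of the next() call that throws (-1 = never)
    private int served = 0;
    boolean closed = false;

    ToyLazyResult(List<String> data, int failAt) {
        this.source = data.iterator();
        this.failAt = failAt;
    }

    public boolean hasNext() {
        return source.hasNext();
    }

    public String next() {
        if (served++ == failAt) {
            throw new IllegalStateException("simulated failure while reading the store");
        }
        return source.next();
    }

    void close() {
        closed = true;             // releases the (imaginary) underlying resource
    }
}

class CloseInFinallyDemo {
    // Consumes a result and guarantees close() is called even if iteration throws.
    static int consume(ToyLazyResult result) {
        int count = 0;
        try {
            while (result.hasNext()) {
                result.next();
                count++;
            }
        } finally {
            result.close();
        }
        return count;
    }
}
```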
An alternative to producing a
TupleQueryResult is to supply an object
that implements the
TupleQueryResultHandler interface to
the query's evaluate() method. The main
difference is that when using a return object, the client has
control over when the next answer is retrieved, whereas with
the use of a handler, the server side simply pushes answers to
the handler object as soon as it has them available.
As an example we will use
SPARQLResultsXMLWriter, which is a
TupleQueryResultHandler implementation that writes
SPARQL Results XML documents to an output stream or to a writer:
import java.io.FileOutputStream;
import org.openrdf.query.QueryLanguage;
import org.openrdf.query.resultio.sparqlxml.SPARQLResultsXMLWriter;
import org.openrdf.repository.RepositoryConnection;
...
FileOutputStream out = new FileOutputStream("/path/to/result.srx");
try {
   SPARQLResultsXMLWriter sparqlWriter = new SPARQLResultsXMLWriter();
   sparqlWriter.setOutputStream(out);
   RepositoryConnection con = myRepository.getConnection();
   try {
      con.prepareTupleQuery(QueryLanguage.SERQL,
         "SELECT * FROM {x} p {y}").evaluate(sparqlWriter);
   }
   finally {
      con.close();
   }
}
finally {
   out.close();
}
You can just as easily supply your own application-specific
implementation of TupleQueryResultHandler though.
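The sketch below contrasts pulling results from a returned object with having them pushed to a handler. It uses an invented ToyResultHandler interface; its method names (start, handleRow and end) are made up for this sketch and are not those of TupleQueryResultHandler. In the push style the producer drives the loop, while in the pull style the caller does.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical callback interface for this sketch; the method names are invented
// and are NOT those of org.openrdf.query.TupleQueryResultHandler.
interface ToyResultHandler {
    void start(List<String> bindingNames);
    void handleRow(List<String> row);
    void end();
}

class PushVsPull {
    static final List<String> NAMES = List.of("x", "y");
    static final List<List<String>> ROWS =
            List.of(List.of("ex:a", "ex:b"), List.of("ex:c", "ex:d"));

    // Pull style: the caller decides when to fetch the next row.
    static Iterator<List<String>> evaluate() {
        return ROWS.iterator();
    }

    // Push style: the producer loops and hands every row to the handler.
    static void evaluate(ToyResultHandler handler) {
        handler.start(NAMES);
        for (List<String> row : ROWS) {
            handler.handleRow(row);
        }
        handler.end();
    }
}

// A trivial application-specific handler that records what it receives.
class RecordingHandler implements ToyResultHandler {
    final List<String> events = new ArrayList<>();

    public void start(List<String> names) { events.add("start " + names); }
    public void handleRow(List<String> row) { events.add("row " + row); }
    public void end() { events.add("end"); }
}
```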
Lastly, an important warning: as soon as you are done with the
RepositoryConnection object, you should close it.
Notice that during processing of the
TupleQueryResult object (for example,
when iterating over its contents), the
RepositoryConnection should still be open. We can
invoke con.close() after we have finished
with the result.
The following code evaluates a graph query on a repository:
import org.openrdf.query.GraphQueryResult;
GraphQueryResult graphResult = con.prepareGraphQuery(
QueryLanguage.SERQL, "CONSTRUCT * FROM {x} p {y}").evaluate();
A GraphQueryResult is similar to
TupleQueryResult in that it is an object
that iterates over the query results. However, for graph queries
the query results are RDF statements, so a
GraphQueryResult iterates over
statements:
try {
   while (graphResult.hasNext()) {
      Statement st = graphResult.next();
      // ... do something with the resulting statement here.
   }
}
finally { graphResult.close(); }
The TupleQueryResultHandler equivalent
for graph queries is
org.openrdf.rio.RDFHandler. Again, this
is a generic interface: each object implementing it can
process the reported RDF statements in any way it wants.
All writers from Rio (such as the
N3Writer,
RDFXMLWriter,
TurtleWriter, etc.) implement the
RDFHandler interface. This allows them to
be used in combination with querying quite easily. In the
following example, we use a TurtleWriter
to write the result of a SeRQL graph query to standard output
in Turtle format:
import org.openrdf.rio.turtle.TurtleWriter;
...
TurtleWriter turtleWriter = new TurtleWriter(System.out);
con.prepareGraphQuery(QueryLanguage.SERQL,
"CONSTRUCT * FROM {x} p {y}").evaluate(turtleWriter);
con.close();
Again, note that as soon as we are done with the result of the query
(either after iterating over the contents of the
GraphQueryResult or after invoking the
RDFHandler), we invoke
con.close() to close the connection and free
resources.
In the previous sections we have simply created a query from a
string and immediately evaluated it. However, the
prepareTupleQuery and
prepareGraphQuery methods return objects of
type Query, specifically
TupleQuery and
GraphQuery.
A Query object, once created, can be
(re)used. For example, we can evaluate a Query object, then add
some data to our repository, and evaluate the same query
again.
The Query object also has a
setBinding method, which can be used to
fill in certain prepared values in the query. As a simple
example, suppose we have a repository containing names and
e-mail addresses of people, and we want to retrieve the e-mail
address of each person, using a separate query per person. This
can be achieved using the setBinding
functionality, as follows:
RepositoryConnection con = myRepository.getConnection();
try {
   // first, prepare a query that retrieves all names of persons
   TupleQuery nameQuery = con.prepareTupleQuery(QueryLanguage.SERQL,
      "SELECT name FROM {person} ex:name {name}");
   // then, we prepare another query that retrieves all e-mails of
   // persons:
   TupleQuery mailQuery = con.prepareTupleQuery(QueryLanguage.SERQL,
      "SELECT mail FROM {person} ex:mail {mail}; ex:name {name}");
   // we evaluate the first query to get all names
   TupleQueryResult nameResult = nameQuery.evaluate();
   try {
      // now we loop over all names in this result, and for each name we
      // retrieve a corresponding mail address.
      while (nameResult.hasNext()) {
         BindingSet bindingSet = nameResult.next();
         Value name = bindingSet.getValue("name");
         // now for each name we retrieve the matching mailbox, by setting
         // the binding for the variable 'name' to the retrieved value:
         mailQuery.setBinding("name", name);
         TupleQueryResult mailResult = mailQuery.evaluate();
         // the mailResult now contains the mail for one particular person
         try {
            // ... process mailResult here
         }
         finally {
            // after we are done, close the result
            mailResult.close();
         }
      }
   }
   finally {
      nameResult.close();
   }
}
finally {
   con.close();
}
The values with which you perform the
setBinding operation of course do not
necessarily have to come from a previous query result (as they do
in the above example). Using a ValueFactory
we can create our own value objects from string values. This makes
it easy, for example, to query for a particular keyword that is
given by user input:
ValueFactory factory = myRepository.getValueFactory();
// In this example, we specify the keyword string. Of course, this
// could just as easily be obtained by user input, or by reading from
// a file, or...
String keyword = "foobar";
// We prepare a query that retrieves all documents for a keyword.
// Notice that in this query the 'keyword' variable is not bound to
// any specific value yet.
TupleQuery keywordQuery = con.prepareTupleQuery(QueryLanguage.SERQL,
"SELECT document FROM {document} ex:keyword {keyword}");
// then we set the binding to a literal representation of our keyword.
// Evaluation of the query object will now effectively be the same as
// if we had specified the query as follows:
// SELECT document FROM {document} ex:keyword {"foobar"}
keywordQuery.setBinding("keyword", factory.createLiteral(keyword));
// we then evaluate the prepared query and can process the result
// (remember to close keywordQueryResult when done).
TupleQueryResult keywordQueryResult = keywordQuery.evaluate();
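What pre-binding a variable does can be shown with a hypothetical single-pattern matcher. ToyPreparedQuery and its setBinding method are invented for this sketch and only imitate the idea. Once bound, a variable behaves as if its value had been written into the query, as the comment in the keyword example describes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-pattern query over string triples; NOT Sesame's TupleQuery.
// Terms starting with '?' are variables, everything else must match exactly.
class ToyPreparedQuery {
    private final String[] pattern;                     // {subject, predicate, object}
    private final Map<String, String> bindings = new HashMap<>();

    ToyPreparedQuery(String s, String p, String o) {
        this.pattern = new String[] {s, p, o};
    }

    // Fixes a variable to a value before evaluation, in the spirit of setBinding.
    void setBinding(String var, String value) {
        bindings.put(var, value);
    }

    List<Map<String, String>> evaluate(List<String[]> triples) {
        List<Map<String, String>> solutions = new ArrayList<>();
        for (String[] t : triples) {
            Map<String, String> solution = new HashMap<>(bindings); // start from pre-bound values
            boolean matches = true;
            for (int i = 0; i < 3 && matches; i++) {
                String term = pattern[i];
                if (term.startsWith("?")) {
                    String var = term.substring(1);
                    String bound = solution.get(var);
                    if (bound == null) {
                        solution.put(var, t[i]);          // free variable: bind it
                    } else {
                        matches = bound.equals(t[i]);     // bound variable: acts like a constant
                    }
                } else {
                    matches = term.equals(t[i]);
                }
            }
            if (matches) {
                solutions.add(solution);
            }
        }
        return solutions;
    }
}
```

The same prepared object can be evaluated again after changing the binding, which mirrors the reuse of Query objects described earlier in this section.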
The RepositoryConnection can also be used
for adding, retrieving, removing or otherwise manipulating
individual statements, or sets of statements.
To be able to add new statements, we can use a
ValueFactory to create the
Values that the statements consist of.
For example, suppose we want to add a few statements about two resources,
Alice and Bob:
import org.openrdf.OpenRDFException;
import org.openrdf.model.Literal;
import org.openrdf.model.URI;
import org.openrdf.model.ValueFactory;
import org.openrdf.model.vocabulary.RDF;
import org.openrdf.model.vocabulary.RDFS;
import org.openrdf.repository.RepositoryConnection;
...
ValueFactory f = myRepository.getValueFactory();
// create some resources and literals to make statements out of
URI alice = f.createURI("http://example.org/people/alice");
URI bob = f.createURI("http://example.org/people/bob");
URI name = f.createURI("http://example.org/ontology/name");
URI person = f.createURI("http://example.org/ontology/Person");
Literal bobsName = f.createLiteral("Bob");
Literal alicesName = f.createLiteral("Alice");
try {
   RepositoryConnection con = myRepository.getConnection();
   try {
      // alice is a person
      con.add(alice, RDF.TYPE, person);
      // alice's name is "Alice"
      con.add(alice, name, alicesName);
      // bob is a person
      con.add(bob, RDF.TYPE, person);
      // bob's name is "Bob"
      con.add(bob, name, bobsName);
   }
   finally {
      con.close();
   }
}
catch (OpenRDFException e) {
   // handle exception
}
Of course, it will not always be necessary to use a
ValueFactory to create URIs. In practice,
you will find that you quite often retrieve existing URIs from the
repository (for example, by evaluating a query) and then reuse
those values to add new statements.
As you can see in the above code, for the default RDF and RDF Schema
properties (such as 'rdf:type' and 'rdfs:subClassOf') it is not
necessary to create new URI objects. Instead,
you can import the vocabulary classes
org.openrdf.model.vocabulary.RDF and
RDFS which provide you static references to
the vocabulary primitives.
Retrieving statements works in a very similar way. We have actually
already seen one way of retrieving statements: we can get a
GraphQueryResult containing statements by
evaluating a graph query.
However, we can also use direct method calls to retrieve (sets of)
statements. For example, to retrieve all statements about
Alice, we could do:
RepositoryResult<Statement> statements =
con.getStatements(alice, null, null, true);
The additional boolean parameter at the end (set to 'true' in this example) indicates whether inferred triples should be included in the result. Of course, this parameter only makes a difference if your repository uses an inferencer.
The RepositoryResult is an iterator-like
object that lazily retrieves each matching statement from the
repository when its next() method is called.
Note that, as is the case with
QueryResult objects, iterating over a
RepositoryResult may result in exceptions
which you should catch to make sure that the
RepositoryResult is always properly closed
after use:
RepositoryResult<Statement> statements = con.getStatements(alice, null, null, true);
try {
while (statements.hasNext()) {
Statement st = statements.next();
// ... do something with the statement
}
}
finally {
statements.close(); // make sure the result object is closed properly
}
In the above method invocation, we see four parameters being
passed. The first three represent the subject, predicate and object
of the RDF statements which should be retrieved. A
null value indicates a wildcard, so the
above method call retrieves all statements which have as their
subject Alice, and have any kind of predicate and object. The
fourth parameter indicates whether inferred statements
should be included.
Removing statements again works in a very similar fashion. Suppose we want to retract the statement that the name of Alice is "Alice":
con.remove(alice, name, alicesName);
Or, if we want to erase all statements about Alice completely, we can do:
con.remove(alice, null, null);
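The wildcard convention shared by getStatements and remove can be mimicked with a hypothetical in-memory triple list. ToyTripleStore is invented for this sketch: it uses plain strings instead of Sesame Values and has no contexts and no inferencing. It only demonstrates that null matches anything in its position.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical triple list illustrating "null means wildcard"; NOT the Sesame implementation.
class ToyTripleStore {
    static final class Triple {
        final String s, p, o;

        Triple(String s, String p, String o) {
            this.s = s;
            this.p = p;
            this.o = o;
        }
    }

    private final List<Triple> triples = new ArrayList<>();

    void add(String s, String p, String o) {
        triples.add(new Triple(s, p, o));
    }

    // A null pattern value matches any value in that position.
    private static boolean matches(String pattern, String value) {
        return pattern == null || pattern.equals(value);
    }

    List<Triple> getStatements(String s, String p, String o) {
        List<Triple> result = new ArrayList<>();
        for (Triple t : triples) {
            if (matches(s, t.s) && matches(p, t.p) && matches(o, t.o)) {
                result.add(t);
            }
        }
        return result;
    }

    // Same pattern semantics applied to removal; returns how many triples were removed.
    int remove(String s, String p, String o) {
        int before = triples.size();
        triples.removeIf(t -> matches(s, t.s) && matches(p, t.p) && matches(o, t.o));
        return before - triples.size();
    }
}
```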
Most of these examples have been on the level of individual
statements. However, the Repository API offers several methods
that work with Collections of statements,
allowing more batch-like update operations.
For example, in the following bit of code, we first retrieve all
statements about Alice, put them in a
Collection and then remove them:
import info.aduna.iteration.Iterations;
...
// retrieve all statements about Alice and put them in a list.
RepositoryResult<Statement> statements = con.getStatements(alice, null, null, true);
List<Statement> aboutAlice = Iterations.addAll(statements, new ArrayList<Statement>());
// then, remove them from the repository.
con.remove(aboutAlice);
As you can see, the
info.aduna.iteration.Iterations
class provides a convenient method that takes an
Iteration (of which
RepositoryResult is a subclass) and a
Collection as input, and returns the Collection with the contents
of the iteration added to it. It also automatically closes the
Iteration for you.
In the above code, you first retrieve all statements, put them in a list, and then remove them. Although this works fine, it can be done in an easier fashion, by simply supplying the resulting object directly:
con.remove(con.getStatements(alice, null, null, true));
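The helper behaviour described for Iterations.addAll (copy all elements into a collection, then close the source) can be sketched in plain Java. DrainHelper, CloseableIterator and ListIter are hypothetical names used only here; Sesame's own Iteration types are not involved.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

// Hypothetical iterator that owns a resource; NOT info.aduna.iteration.Iteration.
interface CloseableIterator<E> extends Iterator<E> {
    void close(); // releases the underlying resource (no checked exception in this sketch)
}

// Simple list-backed implementation that records whether it was closed.
class ListIter<E> implements CloseableIterator<E> {
    private final Iterator<E> inner;
    boolean closed = false;

    ListIter(List<E> items) {
        this.inner = items.iterator();
    }

    public boolean hasNext() { return inner.hasNext(); }
    public E next() { return inner.next(); }
    public void close() { closed = true; }
}

class DrainHelper {
    // Copies every remaining element into target and always closes the source,
    // returning the same collection so the call can be used as an expression.
    static <E, C extends Collection<E>> C addAll(CloseableIterator<? extends E> source, C target) {
        try {
            while (source.hasNext()) {
                target.add(source.next());
            }
        } finally {
            source.close();
        }
        return target;
    }
}
```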
The RepositoryConnection interface has several variations of
add, retrieve and remove operations. See the Javadoc API
documentation for a full overview of the options.
Sesame 2 supports the notion of context, which you can think of as a way to group sets of statements together through a single group identifier (this identifier can be a blank node or a URI).
A very typical way to use context is tracking provenance of the statements in a repository, that is, which file these statements originate from. For example, consider an application where you add RDF data from different files to a repository, and then one of those files is updated. You would then like to replace the data from that single file in the repository, and to be able to do this you need a way to figure out which statements need to be removed. The context mechanism gives you a way to do that.
In the following example, we add an RDF document from the Web to our repository, in a context. In the example, we make the context identifier equal to the Web location of the file being uploaded.
String location = "http://example.org/example/example.rdf";
String baseURI = location;
URL url = new URL(location);
URI context = f.createURI(location);
con.add(url, baseURI, RDFFormat.RDFXML, context);
We can now use the context mechanism to specifically address these statements in the repository for retrieve and remove operations:
import org.openrdf.rio.RDFHandler;
import org.openrdf.rio.rdfxml.RDFXMLWriter;
...
// get all statements in the context
RepositoryResult<Statement> result =
   con.getStatements(null, null, null, true, context);
try {
   while (result.hasNext()) {
      Statement st = result.next();
      // ... do something interesting with the result
   }
}
finally {
   result.close();
}
// export all statements in the context to System.out, in RDF/XML format
// (the context vararg comes last)
RDFHandler rdfxmlWriter = new RDFXMLWriter(System.out);
con.export(rdfxmlWriter, context);
// remove all statements in the context from the repository
con.clear(context);
In most methods in the Repository API, the context parameter is a vararg, meaning that you can specify an arbitrary number (zero, one, or more) of context identifiers. This way, you can very flexibly combine different contexts together. For example, we can very easily retrieve statements that appear in either 'context1' or 'context2'.
In the following example we add information about Bob and Alice again, but this time each has their own context. We also create a new property called 'creator' whose value is the name of the person who created a particular context. However, we do not add the statements about the creators of contexts to any particular context:
URI context1 = f.createURI("http://example.org/context1");
URI context2 = f.createURI("http://example.org/context2");
URI creator = f.createURI("http://example.org/ontology/creator");
// add stuff about Alice to context1
con.add(alice, RDF.TYPE, person, context1);
con.add(alice, name, alicesName, context1);
// Alice is the creator of context1
con.add(context1, creator, alicesName);
// add stuff about Bob to context2
con.add(bob, RDF.TYPE, person, context2);
con.add(bob, name, bobsName, context2);
// Bob is the creator of context2
con.add(context2, creator, bobsName);
Once we have this information in our repository, we can retrieve all statements about either Alice or Bob by using the context vararg:
// get all statements in either context1 or context2.
RepositoryResult<Statement> result = con.getStatements(null, null, null, true, context1, context2);
Note that the above RepositoryResult will not contain the information that context1 was created by Alice and context2 by Bob. This is because those statements were added without any context, so they are not themselves part of context1 or context2.
To explicitly retrieve statements that do not have an associated context, we do the following:
// get all statements that do not have an associated context
RepositoryResult<Statement> result = con.getStatements(null, null, null, true, (Resource)null);
This will give us only the statements about the
creators of the contexts, because those are the only statements that do
not have an associated context. Note that we have to explicitly cast the
null argument to Resource, because otherwise it
is ambiguous whether we are specifying a single value or an entire array
that is null (a vararg is internally treated as an array). Simply
invoking getStatements(s, p, o, true, null) without
an explicit cast will result in an IllegalArgumentException.
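The cast matters because of how Java itself passes arguments to a vararg parameter. This plain-Java sketch uses a hypothetical describe(Object...) method in place of a Resource... contexts parameter, so it compiles without Sesame on the classpath. An uncast null is passed as the array itself (the compiler only issues a warning). A null cast to the element type becomes a one-element array. Passing nothing yields an empty array, which is what lets "no context given" be distinguished from "the null context".

```java
// Plain Java, no Sesame classes: how a vararg parameter receives its arguments.
class VarargsNullDemo {
    // Stand-in for a "Resource... contexts" parameter; Object is used instead of
    // Resource so the example compiles on its own.
    static String describe(Object... contexts) {
        if (contexts == null) {
            return "array itself is null";
        }
        return "array of length " + contexts.length;
    }

    public static void main(String[] args) {
        System.out.println(describe());                 // array of length 0
        System.out.println(describe((Object) null));    // array of length 1
        System.out.println(describe((Object[]) null));  // array itself is null
        System.out.println(describe(null));             // same as above (compiler warning)
    }
}
```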
We can also get everything that either has no context or is in context1:
// get all statements that do not have an associated context, or that
// are in context1
RepositoryResult<Statement> result = con.getStatements(null, null, null, true, (Resource)null, context1);
So as you can see, you can freely combine contexts in this fashion.
Important:
getStatements(null, null, null, true);
is not the same as:
getStatements(null, null, null, true, (Resource)null);
The former (without any context id parameter) retrieves all statements in the repository, ignoring any context information. The latter, however, only retrieves statements that explicitly do not have any associated context.
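These rules can be summarised in a hypothetical quad-store sketch. ToyQuadStore is invented for illustration and is not how Sesame stores statements. It uses strings for context identifiers, with null meaning "no context". An empty context vararg selects everything, a null entry selects statements without a context, and a null array is rejected, echoing the IllegalArgumentException mentioned for the uncast case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical quad store illustrating the context rules; NOT Sesame code.
// Context identifiers are plain strings; a null context means "no context".
class ToyQuadStore {
    static final class Quad {
        final String s, p, o, context;

        Quad(String s, String p, String o, String context) {
            this.s = s;
            this.p = p;
            this.o = o;
            this.context = context;
        }
    }

    private final List<Quad> quads = new ArrayList<>();

    void add(String s, String p, String o, String context) {
        quads.add(new Quad(s, p, o, context));
    }

    // No contexts given: every statement, whatever its context.
    // One or more contexts: statements in any of them; a null entry selects
    // statements that have no context at all.
    List<Quad> getStatements(String... contexts) {
        if (contexts == null) {
            throw new IllegalArgumentException("cast null to the element type to select 'no context'");
        }
        List<Quad> result = new ArrayList<>();
        for (Quad q : quads) {
            if (contexts.length == 0 || inAny(q.context, contexts)) {
                result.add(q);
            }
        }
        return result;
    }

    private static boolean inAny(String context, String[] contexts) {
        for (String c : contexts) {
            if (Objects.equals(c, context)) {
                return true;
            }
        }
        return false;
    }

    // Removes every statement in one context, e.g. before re-adding an updated file.
    void clear(String context) {
        quads.removeIf(q -> Objects.equals(q.context, context));
    }
}
```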
So far, we have shown individual operations on repositories:
adding statements, removing them, etc. By default, a
RepositoryConnection
runs in autoCommit mode, meaning that each
operation on a RepositoryConnection is immediately sent
to the store and committed.
The RepositoryConnection interface supports a full
transactional mechanism that allows one to group modification
operations together and treat them as a single update: before the
transaction is committed, none of the operations in the transaction
has taken effect, and after, they all take effect. If something goes
wrong at any point during a transaction, it can be
rolled back so that the state of the repository
is the same as before the transaction started. Bundling update
operations in a single transaction often also improves update
performance compared to multiple smaller transactions.
We can achieve this behaviour by switching off the
RepositoryConnection's autoCommit mode. In
the following example, we use a non-autocommit connection to
bundle two file addition operations in a single transaction:
import java.io.File;
import java.io.IOException;
import org.openrdf.OpenRDFException;
import org.openrdf.repository.RepositoryConnection;
import org.openrdf.repository.RepositoryException;
import org.openrdf.rio.RDFFormat;
...
File inputFile1 = new File("/path/to/example1.rdf");
String baseURI1 = "http://example.org/example1/";
File inputFile2 = new File("/path/to/example2.rdf");
String baseURI2 = "http://example.org/example2/";
try {
   RepositoryConnection con = myRepository.getConnection();
   try {
      con.setAutoCommit(false);
      // add the first file
      con.add(inputFile1, baseURI1, RDFFormat.RDFXML);
      // add the second file
      con.add(inputFile2, baseURI2, RDFFormat.RDFXML);
      // if everything went as planned, we can commit the result
      con.commit();
   }
   catch (OpenRDFException e) {
      // something went wrong during the transaction (for example, a parse
      // error), so we roll it back
      con.rollback();
   }
   catch (IOException e) {
      // one of the files could not be read: roll back as well
      con.rollback();
   }
   finally {
      // whatever happens, we want to close the connection when we are done.
      con.close();
   }
}
catch (RepositoryException e) {
   // getConnection(), rollback() or close() failed
}
In the above example, we use a transaction to add two files to the repository. Only if both files can be successfully added will the repository change. If one of the files cannot be added (for example because it cannot be read), then the entire transaction is cancelled and none of the files is added to the repository.
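The all-or-nothing behaviour can be modelled in a few lines. ToyConnection and TransactionDemo are hypothetical stand-ins, not Sesame classes. While autoCommit is off, changes are buffered; commit() applies them together and rollback() discards them. A null second "file" is an invented way of simulating one that cannot be read.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical connection modelling autoCommit on/off; NOT Sesame's RepositoryConnection.
class ToyConnection {
    private final Set<String> store;                  // shared "repository" contents
    private final List<String> pending = new ArrayList<>();
    private boolean autoCommit = true;

    ToyConnection(Set<String> store) {
        this.store = store;
    }

    void setAutoCommit(boolean autoCommit) {
        this.autoCommit = autoCommit;
    }

    void add(String statement) {
        pending.add(statement);
        if (autoCommit) {
            commit();                                 // each operation is its own transaction
        }
    }

    void commit() {
        store.addAll(pending);                        // all buffered changes take effect together
        pending.clear();
    }

    void rollback() {
        pending.clear();                              // forget everything since the last commit
    }
}

class TransactionDemo {
    // Adds two "files" in one transaction; a null second file simulates one that cannot be read.
    static void addBoth(ToyConnection con, List<String> file1, List<String> file2) {
        con.setAutoCommit(false);
        try {
            for (String st : file1) {
                con.add(st);
            }
            if (file2 == null) {
                throw new IllegalStateException("second file cannot be read");
            }
            for (String st : file2) {
                con.add(st);
            }
            con.commit();
        } catch (IllegalStateException e) {
            con.rollback();                           // neither file ends up in the store
        }
    }
}
```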