In this chapter, we explain how you can install a Sesame 2.0 Server and a Web Client on your machine.
The Sesame 2.0 HTTP Server requires the following software:
The following steps describe the easiest procedure to install the Sesame 2.0 server on Tomcat 5.5, using a default configuration of Sesame.
After you have followed these steps, you will have a Sesame 2.0 server running with a default configuration. Section 3.3, “Configuring the Data Directory”, and the sections that follow it describe Sesame's configuration options in detail.
Installing the Web Client application follows exactly the same steps, except that in the final step you should choose the Sesame 2.0 Web Client WAR file (called openrdf-webclient.war) on your local hard disk (this should be in [SESAME_DIST]/war as well). After deployment, the Sesame 2.0 Web Client will be available at http://localhost:8080/openrdf-webclient/.
The data directory is the location where the Sesame server stores the repository configuration settings as well as the actual data held in a repository (for example, for a native store).
By default, the location of the data directory is [HOMEDIR]\Application Data\Aduna\openrdf\server\ on MS Windows (for example: C:\Documents and Settings\username\Application Data\Aduna\openrdf\server\).
On Linux/UNIX, the default location is
[HOMEDIR]/.aduna/openrdf/server/ (for
example
/home/username/.aduna/openrdf/server/).
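The default locations described above can be summarized in a short Python sketch. It is an illustration only, not Sesame's own lookup code; the function name default_data_dir and its parameters are hypothetical.

```python
import ntpath
import posixpath


def default_data_dir(home_dir, windows):
    """Return the default Sesame data directory for a given home directory."""
    if windows:
        # [HOMEDIR]\Application Data\Aduna\openrdf\server\
        path = ntpath.join(home_dir, "Application Data", "Aduna", "openrdf", "server")
        return path + "\\"
    # [HOMEDIR]/.aduna/openrdf/server/
    path = posixpath.join(home_dir, ".aduna", "openrdf", "server")
    return path + "/"


print(default_data_dir("/home/username", windows=False))
print(default_data_dir(r"C:\Documents and Settings\username", windows=True))
```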
The location of this data directory can be reconfigured using the
Java system property aduna.platform.applicationdata.dir. One way to set
this property is to include it in the JAVA_OPTS environment variable, for
example:
set JAVA_OPTS=-Daduna.platform.applicationdata.dir=/path/to/other/dir/ (on MS Windows)
export JAVA_OPTS='-Daduna.platform.applicationdata.dir=/path/to/other/dir/' (on Linux/UNIX)
If you are using Apache Tomcat as a Windows Service you should use the Windows
Service Configure tool to set this property. Other users can
either edit the Tomcat startup script
(startup.bat or
startup.sh) or set the property some other
way (e.g. using a wrapper script).
In the rest of this manual, we will refer to Sesame's data
directory as [SESAME_DATA].
Repositories are configured in the file
[SESAME_DATA]/repositories.xml. This XML
file contains entries for repositories and their parameters.
By default, it only contains a single entry, for an in-memory repository with the ID 'default':
<repository id="default">
<title>Default Repository</title>
<sailstack>
<sail class="org.openrdf.sail.memory.MemoryStore" />
</sailstack>
<acl worldReadable="true" worldWriteable="true" />
</repository>
In Sesame 2, the nature of a repository (whether it is in-memory, on disk, or backed by an RDBMS, and whether or not it does RDFS entailment) is determined by the configuration of the sail stack. There are several options for configuring Sesame repositories. In the following sections we will look at the options per type of repository.
There are two basic types of repositories: RDF repositories and RDF Schema (RDFS) repositories. An RDF repository stores and returns only the RDF triples you explicitly added, whereas an RDF Schema repository also does RDF Schema entailment (i.e. it computes RDF triples that logically follow from the explicitly added triples; see the W3C RDF Semantics specification for an explanation).
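To make the idea of entailment concrete, here is a minimal Python sketch that applies only two of the RDFS entailment rules from the W3C RDF Semantics specification (rdfs9 and rdfs11, both about rdfs:subClassOf). It is not Sesame's inferencer, which covers more rules; the function name rdfs_closure and the ex: resources are made up for the example.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def rdfs_closure(triples):
    """Return the explicit triples plus those entailed by rdfs9 and rdfs11."""
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for (s, p, o) in closure:
            if p != SUBCLASS_OF:
                continue
            for (s2, p2, o2) in closure:
                # rdfs11: rdfs:subClassOf is transitive.
                if p2 == SUBCLASS_OF and s2 == o:
                    new.add((s, SUBCLASS_OF, o2))
                # rdfs9: an instance of a class is an instance of its superclasses.
                if p2 == RDF_TYPE and o2 == s:
                    new.add((s2, RDF_TYPE, o))
        if not new <= closure:
            closure |= new
            changed = True
    return closure


explicit = {
    ("ex:Dog", SUBCLASS_OF, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS_OF, "ex:Animal"),
    ("ex:rex", RDF_TYPE, "ex:Dog"),
}
inferred = rdfs_closure(explicit) - explicit
for triple in sorted(inferred):
    print(triple)
```

In these terms, an RDF repository would return only the triples in explicit, while an RDFS repository would also return the inferred ones.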
In Sesame 2, an RDFS repository is always configured by adding
a Stacked Sail on top of the
Base Sail. The 'default' repository we
saw earlier includes only a base sail
(org.openrdf.sail.memory.MemoryStore);
we will show some examples of other configurations
next.
The configuration of the 'default' repository is a
configuration of an in-memory repository. In the default setup
this configuration contains only a base sail:
org.openrdf.sail.memory.MemoryStore. This
means that the 'default' repository is a simple in-memory
store that does not keep a backup on disk and does not do RDFS
entailment.
To configure the repository to back up its contents on disk, we
can add a persist parameter with value
true:
<repository id="default">
<title>Default Repository</title>
<sailstack>
<sail class="org.openrdf.sail.memory.MemoryStore">
<param name="persist" value="true"/>
</sail>
</sailstack>
<acl worldReadable="true" worldWriteable="true" />
</repository>
If this parameter is set to value true,
the repository will create an on-disk data dump (in the Sesame
data directory), which will be read back into the main repository
upon (re)initialization, for example when the server is
restarted.
By default, the in-memory store persistence mechanism synchronizes the disk backup directly upon any change to the contents of the store. That means that directly after any change (upload, removal) completes, the disk backup is updated.
It is, however, possible to configure a synchronization delay. This can be useful if your application performs several transactions in sequence and you want to prevent disk synchronization in the middle of this sequence. The synchronization delay is specified as a parameter of the sail as follows:
<sailstack>
<sail class="org.openrdf.sail.memory.MemoryStore">
<param name="persist" value="true"/>
<param name="syncDelay" value="1000"/>
</sail>
</sailstack>
In the above example the synchronization is set to a delay of 1,000 milliseconds. This means that after every completed update the disk synchronization is delayed for 1,000ms. If in that time a new change operation (an upload or removal) is started, the synchronization is further delayed until after that operation completes (and so forth).
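The delay behaviour described above can be modelled with a small Python sketch. It is a simplified model, not the MemoryStore implementation: it assumes operations run one after another (no overlap) and that each completed operation restarts the full delay. The function name sync_times is hypothetical.

```python
def sync_times(operations, sync_delay_ms):
    """Return the times (in ms) at which the disk backup would be written.

    operations: list of (start_ms, end_ms) tuples, sorted by start time.
    """
    syncs = []
    deadline = None  # time of the pending synchronization, if any
    for start, end in operations:
        if deadline is not None and start >= deadline:
            # No new operation started within the delay, so the sync ran.
            syncs.append(deadline)
        # Each completed operation (re)schedules the synchronization.
        deadline = end + sync_delay_ms
    if deadline is not None:
        syncs.append(deadline)
    return syncs


# Two updates in quick succession, then a third one much later.
updates = [(0, 100), (500, 700), (3000, 3100)]
print(sync_times(updates, 1000))
print(sync_times(updates, 0))
```

With a delay of 1,000 ms the first two updates share a single synchronization, while a delay of 0 models the default behaviour of synchronizing directly after every change.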
To configure an in-memory repository with RDF Schema
entailment, we need to add an additional Sail to the Sail
stack of the repository. The name of the Sail that performs
inferencing for in-memory repositories is
org.openrdf.sail.inferencer.MemoryStoreRDFSInferencer
and it is configured as follows:
<sailstack>
<sail class="org.openrdf.sail.inferencer.MemoryStoreRDFSInferencer"/>
<sail class="org.openrdf.sail.memory.MemoryStore"/>
</sailstack>
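The following Python sketch reads such a sailstack fragment and separates the stacked sails from the base sail. It assumes that the order of the sail elements runs from the top of the stack down to the base sail, as the example above suggests; the function name sail_classes is hypothetical and this is not how Sesame itself loads its configuration.

```python
import xml.etree.ElementTree as ET

SAILSTACK_XML = """
<sailstack>
  <sail class="org.openrdf.sail.inferencer.MemoryStoreRDFSInferencer"/>
  <sail class="org.openrdf.sail.memory.MemoryStore"/>
</sailstack>
"""


def sail_classes(sailstack_xml):
    """Return the sail class names in document order."""
    root = ET.fromstring(sailstack_xml)
    return [sail.get("class") for sail in root.findall("sail")]


classes = sail_classes(SAILSTACK_XML)
# Assumption: the last sail element is the base sail, the others are stacked on it.
stacked_sails, base_sail = classes[:-1], classes[-1]
print("stacked:", stacked_sails)
print("base:", base_sail)
```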
Sesame 2.0 supports a Native Repository that stores and retrieves its data directly from disk. The advantage of this is that it consumes a lot less memory than the in-memory repository, and is therefore also a lot more scalable. Of course, since it has to access the disk, it is slightly slower than the in-memory store, but it is a good solution for larger data sets.
The Sail for the native repository is
org.openrdf.sail.nativerdf.NativeStore
and the configuration looks as follows:
<repository id="native">
<title>Native Repository</title>
<sailstack>
<sail class="org.openrdf.sail.nativerdf.NativeStore"/>
</sailstack>
<acl worldReadable="true" worldWriteable="true" />
</repository>
The Native Repository uses on-disk indexes to speed up querying. It uses B-Trees for indexing statements, where the index key consists of four fields: subject (s), predicate (p), object (o) and context (c). The order in which each of these fields is used in the key determines the usability of an index for a specific statement query pattern: searching statements with a specific subject in an index that has the subject as the first field is significantly faster than searching these same statements in an index where the subject field is second or third. In the worst case, the 'wrong' statement pattern will result in a sequential scan over the entire set of statements.
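One way to reason about index usability is to count how many leading fields of the index key are fixed by the query pattern. The Python sketch below does this; it is a rough heuristic for illustration, not the native store's actual index selection code, and the function names are hypothetical.

```python
def usable_prefix_length(index, bound_fields):
    """Count how many leading fields of the index key are bound in the query."""
    count = 0
    for field in index:
        if field not in bound_fields:
            break
        count += 1
    return count


def best_index(indexes, bound_fields):
    """Pick the index whose key starts with the most bound fields."""
    return max(indexes, key=lambda index: usable_prefix_length(index, bound_fields))


# Query pattern with a known predicate and object, e.g. "all resources of type X".
bound = {"p", "o"}
print(usable_prefix_length("spoc", bound))  # a length of 0 means a sequential scan
print(usable_prefix_length("posc", bound))
print(best_index(["spoc", "posc"], bound))
```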
By default, the native repository only uses one index, with a
subject-predicate-object-context (spoc) key pattern. However,
it is possible to define more indexes for the native repository,
using the triple-indexes parameter. This
can be used to optimize performance for query patterns that occur
frequently:
<repository id="native">
<title>Native Repository</title>
<sailstack>
<sail class="org.openrdf.sail.nativerdf.NativeStore">
<param name="triple-indexes" value="spoc, posc"/>
</sail>
</sailstack>
<acl worldReadable="true" worldWriteable="true" />
</repository>
The subject, predicate, object and context fields are represented by the characters 's', 'p', 'o' and 'c' respectively. Indexes can be specified by creating 4-letter words from these four characters. Multiple indexes can be specified by separating these words with commas, spaces and/or tabs. For example, the string "spoc, posc" specifies two indexes: a subject-predicate-object-context index and a predicate-object-subject-context index.
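The syntax of the triple-indexes value can be sketched in Python as follows. This is an illustration, not Sesame's parser: the function name parse_triple_indexes is hypothetical, and the check that each of the four characters occurs exactly once in a word is an assumption.

```python
import re


def parse_triple_indexes(value):
    """Split a triple-indexes value into a list of validated index names."""
    # Words are separated by commas, spaces and/or tabs.
    indexes = [word for word in re.split(r"[, \t]+", value.strip()) if word]
    for index in indexes:
        # Assumption: a valid word uses each of 's', 'p', 'o', 'c' exactly once.
        if sorted(index) != sorted("spoc"):
            raise ValueError("invalid index specification: %r" % index)
    return indexes


print(parse_triple_indexes("spoc, posc"))
```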
Creating multiple indexes can speed up querying, but there is a cost factor to take into account as well: adding and removing data will become more expensive, because each index will have to be updated. Also, each index takes up additional disk space.
The native store automatically creates/drops indexes upon (re)initialization, so the parameter can be adjusted and upon the first refresh of the configuration the native store will change its indexing strategy, without loss of data.
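Conceptually, the adjustment at (re)initialization amounts to comparing the indexes that exist on disk with the ones listed in the configuration. The Python sketch below shows that comparison only; it is not the native store's code, and the function name plan_index_changes is hypothetical.

```python
def plan_index_changes(existing, configured):
    """Return (to_create, to_drop) as sorted lists of index names."""
    existing, configured = set(existing), set(configured)
    to_create = sorted(configured - existing)  # configured but not yet on disk
    to_drop = sorted(existing - configured)    # on disk but no longer configured
    return to_create, to_drop


# The store currently has spoc and posc; the new configuration asks for spoc and opsc.
print(plan_index_changes(["spoc", "posc"], ["spoc", "opsc"]))
```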