Chapter 3. Sesame 2.0 HTTP Server and Web Client Installation

Table of Contents

3.1. Required software
3.2. Installation under Tomcat 5.5
3.3. Configuring the Data Directory
3.4. Repository Configuration
3.4.1. RDF vs. RDFS Repositories
3.4.2. In-memory Repositories
3.4.3. Native Repositories

In this chapter, we explain how you can install a Sesame 2.0 Server and a Web Client on your machine.

3.1. Required software

The Sesame 2.0 HTTP Server requires the following software:

  • Java 5 (we recommend Sun J2SDK 1.5.0 or better)
  • A Java Servlet Container with the following minimal specifications:
    • Support for Java Servlet API 2.4
    • Support for Java Server Pages (JSP) 2.0
    We recommend using the latest stable version of Apache Tomcat (version 5.5).

3.2. Installation under Tomcat 5.5

The following steps describe the easiest procedure to install the Sesame 2.0 server on Tomcat 5.5, using a default configuration of Sesame.

  1. Download Sesame 2.0 and unpack the downloaded archive in a location of your choice on your local hard disk. We will refer to this location as [SESAME_DIST].
  2. Download Tomcat 5.5 from http://tomcat.apache.org/.
  3. Install and configure Tomcat. See the Tomcat 5.5 Setup Documentation for details on how to do this for a variety of platforms. We will refer to the directory in which you have installed Tomcat as [CATALINA_HOME] in the rest of this document.
  4. Configure the Tomcat Manager Application. See the Tomcat Manager Documentation for details.
  5. (Re)start your Tomcat server and use a browser to access the Tomcat Manager Application (typically at http://localhost:8080/manager/html/). You will be asked for a Tomcat user name and password. Fill in the user name and password you configured in the previous step to gain access.
  6. In the Manager Application, go to the section marked "Deploy". Under 'Select WAR file to deploy', fill in the location of the Sesame 2.0 WAR file (called openrdf.war) on your local hard disk (this should be in [SESAME_DIST]/war/) or use the 'Browse' button to navigate to the correct file. After you click the 'Deploy' button, the Sesame server will be automatically installed and started.

After you have followed these steps, you will have a Sesame 2.0 server running, using a default configuration. Section 3.3, “Configuring the Data Directory”, and the sections that follow it look at Sesame's configuration options in detail.

Installing the Web Client application follows exactly the same steps, except that in the final step you should choose the Sesame 2.0 Web Client WAR file (called openrdf-webclient.war) on your local hard disk (this should be in [SESAME_DIST]/war/ as well). After deployment the Sesame 2.0 Web Client will be available at http://localhost:8080/openrdf-webclient/.
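To check from code that a deployment succeeded, you can request the application's URL and look at the HTTP status. The following is a minimal sketch, not part of Sesame. It assumes Tomcat's default behaviour of deriving the context path from the WAR file name, and the class and method names (DeploymentCheckSketch, contextUrl, probe) are our own.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

final class DeploymentCheckSketch {

    /** Builds the URL under which a deployed WAR is served (assumes the default context path). */
    static String contextUrl(String host, int port, String warFileName) {
        // Assumption: the context path is the WAR file name without its ".war" suffix.
        String context = warFileName.endsWith(".war")
                ? warFileName.substring(0, warFileName.length() - 4)
                : warFileName;
        return "http://" + host + ":" + port + "/" + context + "/";
    }

    /** Returns the HTTP status code for the URL, or -1 if the server cannot be reached. */
    static int probe(String url) {
        try {
            HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
            con.setConnectTimeout(3000);
            con.setReadTimeout(3000);
            con.setRequestMethod("GET");
            int status = con.getResponseCode();
            con.disconnect();
            return status;
        } catch (IOException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        String url = contextUrl("localhost", 8080, "openrdf-webclient.war");
        System.out.println(url + " -> HTTP status " + probe(url));
    }
}
```

A status of 200 suggests that the Web Client is up. A result of -1 usually means that Tomcat is not running or listens on a different port.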

3.3. Configuring the Data Directory

The data directory is the location where the Sesame server stores its repository configuration settings as well as the actual data of repositories that are kept on disk (for example, a native store).

By default, the location of the data directory is [HOMEDIR]\Application Data\Aduna\openrdf\server\ on MS Windows (for example: C:\Documents and Settings\username\Application Data\Aduna\openrdf\server\).

On Linux/UNIX, the default location is [HOMEDIR]/.aduna/openrdf/server/ (for example /home/username/.aduna/openrdf/server/).

The location of this data directory can be reconfigured using the Java system property aduna.platform.applicationdata.dir. One way to set this property is to include it in the JAVA_OPTS environment variable before starting the servlet container, for example:

  • set JAVA_OPTS=-Daduna.platform.applicationdata.dir=C:\path\to\other\dir\ (on MS Windows)
  • export JAVA_OPTS='-Daduna.platform.applicationdata.dir=/path/to/other/dir/' (on Linux/UNIX)

If you are using Apache Tomcat as a Windows Service you should use the Windows Service Configure tool to set this property. Other users can either edit the Tomcat startup script (startup.bat or startup.sh) or set the property some other way (e.g. using a wrapper script).

In the rest of this manual, we will refer to Sesame's data directory as [SESAME_DATA].
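The lookup described in this section can be sketched as follows. This is an illustration only, not Sesame's own code: the class DataDirSketch and its methods are hypothetical, and we assume that the value of the system property is used as the data directory as-is.

```java
final class DataDirSketch {

    /** Name of the system property described above. */
    static final String PROPERTY = "aduna.platform.applicationdata.dir";

    /** Default data directory for the given platform, following the paths listed above. */
    static String defaultDataDir(String osName, String homeDir) {
        if (osName.toLowerCase().startsWith("windows")) {
            return homeDir + "\\Application Data\\Aduna\\openrdf\\server\\";
        }
        return homeDir + "/.aduna/openrdf/server/";
    }

    /** Returns the override if one is set (assumed to be used verbatim), else the default. */
    static String resolve(String override, String osName, String homeDir) {
        if (override != null && override.trim().length() > 0) {
            return override;
        }
        return defaultDataDir(osName, homeDir);
    }

    public static void main(String[] args) {
        String dir = resolve(System.getProperty(PROPERTY),
                System.getProperty("os.name"),
                System.getProperty("user.home"));
        System.out.println("Expected data directory: " + dir);
    }
}
```

Running this class with the same JAVA_OPTS settings as the servlet container is a quick way to see which directory to look in.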

3.4. Repository Configuration

Repositories are configured in the file [SESAME_DATA]/repositories.xml. This XML file contains entries for repositories and their parameters.

By default, it only contains a single entry, for an in-memory repository with the ID 'default':

<repository id="default">
  <title>Default Repository</title>
  <sailstack>
    <sail class="org.openrdf.sail.memory.MemoryStore" />
  </sailstack>
  <acl worldReadable="true" worldWriteable="true" />
</repository>

In Sesame 2, the nature of a repository (whether it is in-memory, on disk, or uses an RDBMS as backend, and whether it does RDFS entailment or not) is determined by the configuration of the sail stack. There are several options for configuring Sesame repositories. In the following sections we will look at the options per type of repository.
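To get a quick overview of the configured repositories and their sail stacks, the file can be read with any XML parser. The sketch below uses the standard Java DOM API and relies only on the element and attribute names shown in the entry above. The class RepositoryConfigSketch is hypothetical, and nothing is assumed about the root element of repositories.xml.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class RepositoryConfigSketch {

    /** The default entry from this section, used as sample input. */
    static final String DEFAULT_XML =
            "<repository id=\"default\">"
          + "<title>Default Repository</title>"
          + "<sailstack><sail class=\"org.openrdf.sail.memory.MemoryStore\"/></sailstack>"
          + "<acl worldReadable=\"true\" worldWriteable=\"true\"/>"
          + "</repository>";

    /** Returns one line per repository: its ID followed by its sail classes, top sail first. */
    static List<String> describe(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse configuration: " + e.getMessage());
        }
        List<String> result = new ArrayList<String>();
        // Finds <repository> elements at any depth, including the document element itself.
        NodeList repositories = doc.getElementsByTagName("repository");
        for (int i = 0; i < repositories.getLength(); i++) {
            Element repository = (Element) repositories.item(i);
            StringBuilder line = new StringBuilder(repository.getAttribute("id")).append(':');
            NodeList sails = repository.getElementsByTagName("sail");
            for (int j = 0; j < sails.getLength(); j++) {
                line.append(' ').append(((Element) sails.item(j)).getAttribute("class"));
            }
            result.add(line.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        for (String line : describe(DEFAULT_XML)) {
            System.out.println(line); // default: org.openrdf.sail.memory.MemoryStore
        }
    }
}
```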

3.4.1. RDF vs. RDFS Repositories

There are two basic types of repositories: RDF repositories and RDF Schema (RDFS) repositories. An RDF repository stores and returns only the RDF triples you explicitly added. An RDF Schema repository additionally does RDF Schema entailment, i.e. it computes RDF triples that logically follow from the explicitly added triples (see the W3C RDF Semantics specification for an explanation).
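To give an impression of what entailment means, the sketch below applies two of the RDFS entailment rules from the W3C specification (rdfs9 and rdfs11, both about rdfs:subClassOf) to a handful of triples until no new triples appear. It is a toy illustration: full RDFS entailment has more rules, the class RdfsEntailmentSketch is hypothetical, and this is not how Sesame's inferencer is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class RdfsEntailmentSketch {

    static final String TYPE = "rdf:type";
    static final String SUBCLASS = "rdfs:subClassOf";

    /** A triple is represented as a three-element list: subject, predicate, object. */
    static List<String> triple(String s, String p, String o) {
        return Arrays.asList(s, p, o);
    }

    /** Returns the explicit triples plus everything derivable with rules rdfs9 and rdfs11. */
    static Set<List<String>> closure(Collection<List<String>> explicit) {
        Set<List<String>> all = new LinkedHashSet<List<String>>(explicit);
        boolean changed = true;
        while (changed) {
            changed = false;
            List<List<String>> snapshot = new ArrayList<List<String>>(all);
            for (List<String> a : snapshot) {
                for (List<String> b : snapshot) {
                    // Both rules need b = (C subClassOf D) where C is the object of a.
                    if (!b.get(1).equals(SUBCLASS) || !a.get(2).equals(b.get(0))) {
                        continue;
                    }
                    // rdfs9:  (x type C),       (C subClassOf D) => (x type D)
                    // rdfs11: (B subClassOf C), (C subClassOf D) => (B subClassOf D)
                    if (a.get(1).equals(TYPE) || a.get(1).equals(SUBCLASS)) {
                        changed |= all.add(triple(a.get(0), a.get(1), b.get(2)));
                    }
                }
            }
        }
        return all;
    }

    public static void main(String[] args) {
        Set<List<String>> result = closure(Arrays.asList(
                triple("ex:Rex", TYPE, "ex:Dog"),
                triple("ex:Dog", SUBCLASS, "ex:Mammal"),
                triple("ex:Mammal", SUBCLASS, "ex:Animal")));
        for (List<String> t : result) {
            System.out.println(t);
        }
    }
}
```

An RDF repository holding the three explicit triples would return only those three. An RDFS repository would also return, for example, that ex:Rex has type ex:Animal.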

In Sesame 2, an RDFS repository is always configured by adding a Stacked Sail on top of the Base Sail. The 'default' repository we saw earlier includes only a base sail (org.openrdf.sail.memory.MemoryStore). We will show some examples of other configurations next.

3.4.2. In-memory Repositories

The configuration of the 'default' repository is that of an in-memory repository. In the default setup this configuration contains only a base sail: org.openrdf.sail.memory.MemoryStore. This means that the 'default' repository is a simple in-memory store that does not back up its contents to disk and does not do RDFS entailment.

3.4.2.1. Memory Store persistence

To configure the repository to back up its contents on disk, we can add a persist parameter with value true:

<repository id="default">
  <title>Default Repository</title>
  <sailstack>
    <sail class="org.openrdf.sail.memory.MemoryStore">
      <param name="persist" value="true"/>
    </sail>
  </sailstack>
  <acl worldReadable="true" worldWriteable="true" />
</repository>

If this parameter is set to true, the repository will create an on-disk data dump (in the Sesame data directory), which will be read back into the main repository upon (re)initialization - for example, when the server is restarted.

3.4.2.2. Synchronization delay

By default, the in-memory store persistence mechanism synchronizes the disk backup directly upon any change to the contents of the store. That means that directly after any change (upload, removal) completes, the disk backup is updated.

It is, however, possible to configure a synchronization delay. This can be useful if your application performs several transactions in sequence and you want to prevent disk synchronization in the middle of this sequence. The synchronization delay is specified as a parameter of the sail as follows:

<sailstack>
  <sail class="org.openrdf.sail.memory.MemoryStore">
    <param name="persist" value="true"/>
    <param name="syncDelay" value="1000"/>
  </sail>
</sailstack>

In the above example the synchronization is set to a delay of 1,000 milliseconds. This means that after every completed update the disk synchronization is delayed for 1,000ms. If in that time a new change operation (an upload or removal) is started, the synchronization is further delayed until after that operation completes (and so forth).
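The effect of the delay can be worked out with a small simulation. The sketch below models only the behaviour described in this section and is not the memory store's actual implementation. The class SyncDelaySketch is hypothetical, and we assume that a change starting exactly at the moment a synchronization is due does not postpone it.

```java
import java.util.ArrayList;
import java.util.List;

final class SyncDelaySketch {

    /**
     * Computes when disk synchronizations would run.
     *
     * @param ops change operations as {startMs, endMs} pairs, sorted by start time
     *            and not overlapping
     * @param syncDelayMs the configured syncDelay in milliseconds
     * @return the points in time (ms) at which a synchronization runs
     */
    static List<Long> syncTimes(long[][] ops, long syncDelayMs) {
        List<Long> result = new ArrayList<Long>();
        for (int i = 0; i < ops.length; i++) {
            long due = ops[i][1] + syncDelayMs;
            // A change that starts before the synchronization is due postpones it
            // until the delay has passed after that later change completes.
            boolean postponed = i + 1 < ops.length && ops[i + 1][0] < due;
            if (!postponed) {
                result.add(due);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        long[][] ops = { { 0, 200 }, { 700, 900 }, { 3000, 3100 } };
        System.out.println(syncTimes(ops, 1000)); // prints [1900, 4100]
        System.out.println(syncTimes(ops, 0));    // prints [200, 900, 3100]
    }
}
```

With a delay of 1,000 ms the first two changes share a single synchronization, because the second change starts within a second of the first one completing. With a delay of 0 every change is followed directly by its own synchronization.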

3.4.2.3. In-Memory Repository with RDF Schema Entailment

To configure an in-memory repository with RDF Schema entailment, we need to add an additional Sail to the Sail stack of the repository. The name of the Sail that performs inferencing for in-memory repositories is org.openrdf.sail.inferencer.MemoryStoreRDFSInferencer and it is configured as follows:

<sailstack>
  <sail class="org.openrdf.sail.inferencer.MemoryStoreRDFSInferencer"/>
  <sail class="org.openrdf.sail.memory.MemoryStore"/>
</sailstack>

3.4.3. Native Repositories

Sesame 2.0 supports a Native Repository that stores and retrieves its data directly from disk. The advantage of this is that it consumes a lot less memory than the in-memory repository, and is therefore also a lot more scalable. Of course, since it has to access the disk, it is slightly slower than the in-memory store, but it is a good solution for larger data sets.

The Sail for the native repository is org.openrdf.sail.nativerdf.NativeStore and the configuration looks as follows:

<repository id="native">
  <title>Native Repository</title>
  <sailstack>
    <sail class="org.openrdf.sail.nativerdf.NativeStore"/>
  </sailstack>
  <acl worldReadable="true" worldWriteable="true" />
</repository>

3.4.3.1. Configuration of Native Repository Indexes

The Native Repository uses on-disk indexes to speed up querying. It uses B-Trees for indexing statements, where the index key consists of four fields: subject (s), predicate (p), object (o) and context (c). The order in which each of these fields is used in the key determines the usability of an index for a specific statement query pattern: searching statements with a specific subject in an index that has the subject as the first field is significantly faster than searching these same statements in an index where the subject field is second, third or fourth. In the worst case, the 'wrong' statement pattern will result in a sequential scan over the entire set of statements.
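One way to reason about how well an index fits a query pattern is to count how many leading fields of the index key are fixed by the pattern. The sketch below does this for index names written with the letters s, p, o and c. It is a simplified scoring model for illustration, not the native store's actual index selection logic, and the class IndexMatchSketch is hypothetical.

```java
final class IndexMatchSketch {

    /**
     * Counts the leading fields of the index key that the query pattern fixes.
     *
     * @param index a key order such as "spoc"
     * @param boundFields the fields with a specific value in the pattern, e.g. "po"
     */
    static int boundPrefixLength(String index, String boundFields) {
        int n = 0;
        while (n < index.length() && boundFields.indexOf(index.charAt(n)) >= 0) {
            n++;
        }
        return n;
    }

    /** Picks the index with the longest bound prefix; the first one wins on a tie. */
    static String bestIndex(String[] indexes, String boundFields) {
        String best = null;
        int bestScore = -1;
        for (String index : indexes) {
            int score = boundPrefixLength(index, boundFields);
            if (score > bestScore) {
                best = index;
                bestScore = score;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] indexes = { "spoc", "posc" };
        // Pattern with a specific predicate and object, any subject and context.
        System.out.println(boundPrefixLength("spoc", "po")); // prints 0
        System.out.println(boundPrefixLength("posc", "po")); // prints 2
        System.out.println(bestIndex(indexes, "po"));        // prints posc
    }
}
```

A score of 0 corresponds to the worst case described above: no leading field is fixed, so the index cannot narrow down the search.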

By default, the native repository uses only one index, with a subject-predicate-object-context (spoc) key pattern. However, it is possible to define more indexes for the native repository, using the triple-indexes parameter. This can be used to optimize performance for query patterns that occur frequently:

<repository id="native">
  <title>Native Repository</title>
  <sailstack>
    <sail class="org.openrdf.sail.nativerdf.NativeStore">
      <param name="triple-indexes" value="spoc, posc"/>
    </sail>
  </sailstack>
  <acl worldReadable="true" worldWriteable="true" />
</repository>

The subject, predicate, object and context fields are represented by the characters 's', 'p', 'o' and 'c' respectively. Indexes can be specified by creating 4-letter words from these four characters. Multiple indexes can be specified by separating these words with commas, spaces and/or tabs. For example, the string "spoc, posc" specifies two indexes: a subject-predicate-object-context index and a predicate-object-subject-context index.
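The format of the parameter value can be checked with a few lines of code before the configuration is changed. The sketch below splits the value on commas, spaces and tabs. It additionally assumes that every word must use each of the four characters exactly once, which the text above does not state explicitly. The class IndexSpecSketch is hypothetical and not part of Sesame.

```java
import java.util.ArrayList;
import java.util.List;

final class IndexSpecSketch {

    /** Splits a triple-indexes value into index names and rejects malformed ones. */
    static List<String> parse(String spec) {
        List<String> result = new ArrayList<String>();
        for (String word : spec.trim().split("[,\\s]+")) {
            if (word.length() == 0) {
                continue; // empty value or leading separator
            }
            if (!isValidIndex(word)) {
                throw new IllegalArgumentException("Invalid index: '" + word + "'");
            }
            result.add(word);
        }
        return result;
    }

    /** Assumption: a valid index is a 4-letter word containing each of s, p, o and c once. */
    static boolean isValidIndex(String word) {
        if (word.length() != 4) {
            return false;
        }
        for (char field : "spoc".toCharArray()) {
            if (word.indexOf(field) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(parse("spoc, posc")); // prints [spoc, posc]
    }
}
```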

Creating multiple indexes can speed up querying, but there is a cost factor to take into account as well: adding and removing data will become more expensive, because each index will have to be updated. Also, each index takes up additional disk space.

The native store automatically creates and drops indexes upon (re)initialization. The parameter can therefore be adjusted at any time: upon the first refresh of the configuration, the native store will change its indexing strategy, without loss of data.