The included elmo-codegen.jar can be used from the command line to create an RDF ontology file from existing JavaBeans or generate Elmo roles from an RDF ontology file. The command below will search the given jar (example-entities.jar) for classes in the package com.example.entities and output an OWL DL ontology in example-ontology.owl using the given namespace.
java -jar elmo-codegen.jar \
-b "com.example.entities=http://www.example.com/rdf/2008/model#" \
-r example-ontology.owl \
example-entities.jar
In the example below, the ontology example-ontology.owl will be imported, and Elmo concepts that are defined with the given namespace will be created and compiled in example-concepts.jar. This jar will then be ready to be used for development and deployment of an Elmo application.
java -jar elmo-codegen.jar \
-b "com.example.concepts=http://www.example.com/rdf/2008/model#" \
-j example-concepts.jar \
example-ontology.owl
The elmo-codegen.jar will import the ontologies and concepts from elmo-rdfs.jar and elmo-owl.jar, so they don't need to be specified on the command line. However, other dependent concepts jar files must be included at the end of the command.
The elmo-codegen.jar will create concept interfaces from OWL classes and include getters and setters for each of the declared properties within the concept interface. Method declarations will be included in the interface if a special class hierarchy is used. Any class that extends elmo:Message, where elmo is mapped to the namespace "http://www.openrdf.org/rdf/2008/08/elmo#", will be used to create a method declaration in a corresponding interface. Three property restrictions are used to determine the declaration. owl:allValuesFrom restrictions on elmo:target determine the concept that should contain the method. Restrictions on elmo:objectResponse determine the return type of the method; use owl:Nothing for void return types. Restrictions on elmo:literalResponse determine the return type of methods with a literal response. Methods that return a primitive type should include an owl:cardinality restriction of 1 on elmo:literalResponse. Both response properties can be restricted with an owl:maxCardinality of 1 to indicate that they return a single result and not a Set of results. Any other properties declared for classes that extend elmo:Message will be included as parameters in the method declaration, ordered by their URI.
The following RDF/XML fragment will produce an interface called "HelloWorld" with a method "hello" that returns a single String.
<owl:Class rdf:ID="HelloWorld"/>
<owl:Class rdf:ID="hello">
<rdfs:subClassOf rdf:resource="&elmo;Message"/>
<rdfs:subClassOf>
<owl:Restriction>
<owl:onProperty rdf:resource="&elmo;target"/>
<owl:allValuesFrom rdf:resource="#HelloWorld"/>
</owl:Restriction>
</rdfs:subClassOf>
<rdfs:subClassOf>
<owl:Restriction>
<owl:onProperty rdf:resource="&elmo;literalResponse"/>
<owl:allValuesFrom rdf:resource="&xsd;string"/>
</owl:Restriction>
</rdfs:subClassOf>
<rdfs:subClassOf>
<owl:Restriction>
<owl:onProperty rdf:resource="&elmo;literalResponse"/>
<owl:maxCardinality>1</owl:maxCardinality>
</owl:Restriction>
</rdfs:subClassOf>
</owl:Class>
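As a rough illustration, the fragment above would be expected to yield a Java interface along the following lines. This is a hand-written sketch, not actual generator output: the wrapper class HelloWorldSketch and the stand-in implementation HelloWorldSupport are hypothetical names added only to make the example runnable, and the generated code would also carry Elmo annotations and a package that are omitted here.

```java
public class HelloWorldSketch {

    /** Assumed shape of the concept interface generated for #HelloWorld. */
    public interface HelloWorld {
        // owl:maxCardinality 1 on elmo:literalResponse gives a single String;
        // without that restriction the return type would be a Set of results.
        String hello();
    }

    /** Hypothetical stand-in implementation, used only to exercise the interface. */
    static class HelloWorldSupport implements HelloWorld {
        @Override
        public String hello() {
            return "World";
        }
    }

    public static void main(String[] args) {
        HelloWorld entity = new HelloWorldSupport();
        System.out.println("Hello, " + entity.hello());
    }
}
```

In a real application the entity would be obtained from Elmo and not instantiated directly.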
Classes that extend elmo:Message will also generate their own concept interfaces. These concepts will have properties that correspond to the parameters of the method. They will also extend the interface org.openrdf.elmo.Message and have a behaviour that implements elmoInvoke(), which maps the concept properties to the method parameters and invokes the method on the elmoTarget property value.
The elmo-codegen.jar tool can compile valid Java or Groovy behaviour methods while generating concepts and package them both into the generated jar. Any owl:ObjectProperties that extend elmo:method with the annotation elmo:java or elmo:groovy will be used to create a behaviour method for a corresponding concept method. The rdfs:domain indicates the concept that should implement this method and rdfs:range indicates the method declaration that is implemented by the elmo:java or elmo:groovy annotation.
If the method includes parameters, they will be available as read-only variables in the method body. Each variable name is a combination of the compiled namespace prefix of the property and the local name of the property, joined in the same way as for Java properties on concepts.
The following RDF/XML fragment will create a behaviour method that implements the hello method, returning the string "World" for resources that implement the HelloWorld concept.
<owl:ObjectProperty rdf:ID="world">
<rdfs:subPropertyOf rdf:resource="&elmo;method"/>
<rdfs:domain rdf:resource="#HelloWorld"/>
<rdfs:range rdf:resource="#hello"/>
<elmo:java>return "World";</elmo:java>
</owl:ObjectProperty>
Methods with multiple parameters will also include a bridge method that accepts a java.util.Map argument. Although this bridge method is not declared on any interface, it is available through reflection on the proxy entity and is therefore available to interpreted languages, for example through Groovy's named parameter syntax.
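The idea of a Map-based bridge method can be sketched in plain Java as follows. Everything here is illustrative: Greeter, GreeterProxy, the greet method and the map keys "firstName" and "lastName" are hypothetical, and the keys that a generated bridge method actually expects are an assumption that should be checked against the generated code.

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

public class BridgeMethodSketch {

    /** Hypothetical concept interface: declares only the typed method. */
    public interface Greeter {
        String greet(String firstName, String lastName);
    }

    /** Stand-in for a proxy entity that also carries a Map-based bridge method. */
    public static class GreeterProxy implements Greeter {
        @Override
        public String greet(String firstName, String lastName) {
            return "Hello, " + firstName + " " + lastName;
        }

        // Bridge method (not on the interface): unpacks named arguments
        // and delegates to the typed method.
        public String greet(Map<String, Object> args) {
            return greet((String) args.get("firstName"), (String) args.get("lastName"));
        }
    }

    /** Looks up the Map-accepting overload by reflection and invokes it. */
    public static Object invokeWithMap(Object target, String name, Map<String, Object> args) {
        try {
            Method bridge = target.getClass().getMethod(name, Map.class);
            return bridge.invoke(target, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No usable bridge method: " + name, e);
        }
    }

    public static void main(String[] argv) {
        Greeter entity = new GreeterProxy();
        Map<String, Object> args = new HashMap<>();
        args.put("firstName", "Ada");
        args.put("lastName", "Lovelace");
        System.out.println(invokeWithMap(entity, "greet", args));
    }
}
```

The caller only holds a Greeter reference, so the Map overload is reachable only by reflective lookup, which is what dynamic languages do when dispatching named arguments.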
The Elmo scutter is a generic RDF crawler that follows rdfs:seeAlso links in RDF documents, which typically point to other relevant RDF sources on the web. The Elmo scutter is based on original code by Matt Biddulph for Jena.
rdfs:seeAlso is also the mechanism used to connect FOAF profiles, so the scutter can collect FOAF profiles from the Web given a starting location. Several advanced features are provided to support this scenario:
Blacklisting: sites that produce FOAF profiles in large quantities are automatically placed on a blacklist. This is to avoid collecting large amounts of uninteresting FOAF data produced by social networking and blogging services or other dynamic sources.
Whitelisting: the crawler can be limited to a domain (defined by a URL pattern).
Metadata: the crawler can optionally store metadata about the collected statements.
Filtering: incoming statements can be filtered individually. This is useful to remove unnecessary information, such as statements from unknown namespaces.
Persistence: when the scutter is stopped, it saves its state to disk. This makes it possible to continue scuttering from the point where it left off. Also, when the scutter starts, it tries to load the list of visited URLs back from the repository (this requires the saving of metadata to be turned on).
Logging: the scutter uses slf4j to provide detailed logging of the crawl.
The data collected by the scutter is stored in a Sesame repository. We recommend using a Native RDF repository for scuttering, because it provides the best performance for uploads.
The Scutter is available as a Java class as well as a Java servlet. The servlet provides access to all of the above features except filtering (which requires programming), and it can be deployed by placing the Elmo.war file in the web application directory of a Servlet/JSP container.
The servlet initialization parameters to be specified in the web.xml descriptor file are listed below. An example web.xml file is provided in the war file.
Parameter | Description | Optional/Default |
server | URL of the Sesame server to store the collected data | Required |
repository | Name of the repository on the server | Required |
username | Username for access to the Sesame repository | Optional |
password | Password for access to the Sesame repository | Optional |
queue | Location of the file used to save the queue when the scutter is stopped | Required |
start | URL(s) used to start scuttering. URLs should be separated by white space. | Optional |
domain | Limits crawling to URLs that match the provided regular expression. | Optional |
metadata | Produce reified statements containing information about the provenance of the statements and the time they were collected. Possible values: true/false | Optional, defaults to false. |
autoblacklist | Enable/disable automatic blacklisting. Possible values: true/false | Optional, defaults to true (enabled). |
vocab | Restrict crawling to FOAF specific vocabularies (statements with predicates from the RDF, RDFS, FOAF or WGS_84 namespaces) | Optional, only possible value is 'foaf' |
focused | Collect data about a specific set of target persons. The target persons are given as foaf:Person instances in the repository. | Optional, actual value is ignored |
maxThreads | Maximum number of threads allowed to be running. Must be a positive integer. | Optional, defaults to 20. |
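To give a feel for what a value of the domain parameter might look like, the sketch below applies a regular expression to candidate URLs. It is an assumption that the whole URL must match the expression (as opposed to a partial match), and the class DomainFilterSketch and the example.org pattern are hypothetical; the scutter's own matching logic may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class DomainFilterSketch {

    private final Pattern domain;

    public DomainFilterSketch(String regex) {
        this.domain = Pattern.compile(regex);
    }

    /** Assumption: a URL is accepted only if the whole URL matches the expression. */
    public boolean accepts(String url) {
        return domain.matcher(url).matches();
    }

    /** Returns the URLs that fall inside the configured domain, in input order. */
    public List<String> filter(List<String> urls) {
        List<String> accepted = new ArrayList<>();
        for (String url : urls) {
            if (accepts(url)) {
                accepted.add(url);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Hypothetical pattern: example.org and any of its subdomains, http or https.
        DomainFilterSketch filter =
                new DomainFilterSketch("https?://([^/]+\\.)?example\\.org(/.*)?");
        List<String> candidates = Arrays.asList(
                "http://www.example.org/foaf.rdf",
                "http://example.com/foaf.rdf",
                "http://notexample.org/people.rdf");
        System.out.println(filter.filter(candidates));
    }
}
```

Escaping the dots and anchoring on the scheme keeps look-alike hosts such as notexample.org out of the crawl.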
The request parameters to the servlet are listed in the table below. For convenience, an HTML file is provided in the distribution for calling the various operations on the servlet.
Parameter | Description | Optional/Default |
start | Try to load the set of visited URLs and start the scutter | Parameter value ignored. |
stop | Stop the scutter, save the queue to disk | Parameter value ignored. |
preloadQueue | Preload the queue from the saved file | Parameter value ignored. |
clear | Clear the queue and the set of visited URLs | Parameter value ignored. |
Custom filtering of statements can be implemented by passing an instance of the StatementFilter interface to the setStatementFilter method of the Scutter class. See the JavaDoc for more details.
The task of the Elmo smusher is to find equivalent instances in large sets of data. This is a very common problem when processing collections of FOAF profiles, as several sources on the Web may describe the same individual using different identifiers or blank nodes (which are always assumed to be different). While the servlet provided is specific to smushing foaf:Person instances, the underlying mechanism is generic.
The smusher uses instances of ResourceComparator for comparing instances. Implementations of ResourceComparator are given for foaf:Person and swrc:Publication.
The smusher reports the results (matching instances) by calling methods on registered listeners. Listeners implement the SmusherListener interface. Two implementations of SmusherListener are provided: one writes out results in text, while the other represents matches using the owl:sameAs relationship and uploads such statements to a Sesame repository. While Sesame does not directly support OWL semantics, the semantics of this relationship (the equivalence of property values) can be easily axiomatized using Sesame's custom rule language.