Auditing Repository

The auditing repository is a repository wrapper to track changes and store provenance data in the RDF store. The Auditing Repository uses a vocabulary based on the The PROV Ontology. The audit: namespace used by the auditing repository has a namespace of "http://www.openrdf.org/rdf/2012/auditing#".

Any entity that is modified is associated to the activity graph, through the prov:used relationship. Any graph that is modified is associated to the activity graph, through the prov:wasInformedBy relationship. The auditing repository also maintains the prov:wasGeneratedBy relationships of entities pointing to the most recent activity graph that modified it.

The activity graph can be assigned by setting the default insert graph (called activity graph). The default insert graph can be assigned in the AuditingRepositoryConnection, or assigned in a parent ContextAwareRepositoryConnection, or in ObjectConnection#setActivityURI(), or assigned on a particular Operation's Dataset's defaultInsertGraph. If no default insert graph is assigned a generated graph URI is created through the repository's activityFactory's createActivityURI.

A custom ActivityFactory can be used to listen for activity started and ended events. The started event is fired within a transaction but before anything is written to store. The ended event is fired after an activity has been committed.

Furthermore, when enabled and combined with the auditing sail and a delete graph pattern, of a single entity, is removed from the RDF store these triples are reified, using the prov:qualifiedUsage relationship, from both the activity graph and the graph the triple is removed from. The usage is qualified with an audit:changed pointing to the reified triple.

An entity of a triple, with a subject URI, is the subject up to the first '#' or the entire subject URI. An entity of a graph pattern is the URI in the subject position up to the first '#' or the entire subject URI. A graph pattern only has an entity if the subject URIs have exactly one entity and all other var or terms in a subject position are also in an object position. For an entity to be considered modified, a triple with the entity must either be added to the activity graph or a triple must be removed using a graph pattern with a single entity. A graph pattern may removed using a DELETE/WHERE operation, a DELETE DATA operation, or call remove(subject,predicate,object) with a subject and no context.

The auditing repository can also maintain a list of the most recent activity graphs using the type of audit:RecentActivity and can also purge activity graphs that do not contribute to the current state of any entity (all inserted triples, if any, have since been removed). To enable purging of obsolete activity graphs the purge after duration must be set and activity graphs must have a prov:endedAtTime property of earlier then the current time minus the purge after duration.

Users are encouraged to add aditional metadata of the activity graph into the RDF store using The PROV Ontology. To ensure the purging of obsolete activity graphs functions as expected, the PROV should be used (all predicates must in the "http://www.w3.org/ns/prov#" namespace or an rdf:type to that namespace) or the activity graph (or a fragment identifier of the activity graph) used as the subject. The prov:endedAtTime must also be included.

The auditing repository should be used in conjuction with the auditing sail to track both added and removed triples. When the auditing repository is disabled (no default insert graph is assigned) the auditing sail will track inserted triples using a legacy vocabulary for backwards compatibility.