Key: SES-453
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Minor
Assignee: Arjohn Kampman
Reporter: Arjohn Kampman
Votes: 1
Watchers: 1
Sesame

SeRQL-"construct distinct" queries generate duplicates when bnode generation is involved

Created: 03/Oct/07 02:12 PM   Updated: 20/Mar/08 08:41 PM
Component/s: SeRQL
Affects Version/s: 2.0-beta5, 2.0-beta4, 2.0-beta3, 2.0-beta2, 2.0-beta1
Fix Version/s: 2.0-beta6


 Description   
Issue reported on the forum: http://www.openrdf.org/forum/mvnforum/viewthread?thread=1440
--------------------------------------------------------------------------------
As part of a crosswalk from an ontology that holds people as strings to an ontology that has people as objects, we would like to create people objects for people that do not exist as objects yet. This almost works, i.e.

CONSTRUCT DISTINCT {} rdf:type {iribib:Person}; iribib:name {nam}
FROM {doc} enlbib:author {nam} ,
[ {aut} rdf:type {iribib:Person} ;iribib:name {nam} ]
WHERE aut=NULL
using namespace
iribib = <http://iridl.ldeo.columbia.edu/ontologies/iribib.owl#>,
enlbib = <http://iridl.ldeo.columbia.edu/ontologies/john/enlbib.owl#>

does make the blank nodes with the iribib:name property for the missing iribib:Person objects. The only problem is that repeats are not detected (apparently the presence of a blank node means every instance is distinct). While that might be the case from one perspective, I would argue that, since blank nodes only have meaning from their properties, that is not ever what is wanted from CONSTRUCT DISTINCT (and if you did want that you could leave out the DISTINCT request).
--------------------------------------------------------------------------------

This is a problem in the SeRQL parser, which translates the above query to the following query model:

Distinct
   MultiProjection
      Projection
         ProjectionElem "-anon-1" AS "subject"
         ProjectionElem "-const-5" AS "predicate"
         ProjectionElem "-const-6" AS "object"
      Projection
         ProjectionElem "-anon-1" AS "subject"
         ProjectionElem "-const-7" AS "predicate"
         ProjectionElem "nam" AS "object"
      Extension
         ExtensionElem (-const-6)
            ValueConstant (value=http://iridl.ldeo.columbia.edu/ontologies/iribib.owl#Person)
         ExtensionElem (-anon-1)
            BNodeGenerator
         ExtensionElem (-const-7)
            ValueConstant (value=http://iridl.ldeo.columbia.edu/ontologies/iribib.owl#name)
         ExtensionElem (-const-5)
            ValueConstant (value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type)
         Filter
            Compare (=)
               Var (name=aut)
               Null
            LeftJoin
               StatementPattern
                  Var (name=doc)
                  Var (name=-const-1, value=http://iridl.ldeo.columbia.edu/ontologies/john/enlbib.owl#author)
                  Var (name=nam)
               Join
                  StatementPattern
                     Var (name=aut)
                     Var (name=-const-2, value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type)
                     Var (name=-const-3, value=http://iridl.ldeo.columbia.edu/ontologies/iribib.owl#Person)
                  StatementPattern
                     Var (name=aut)
                     Var (name=-const-4, value=http://iridl.ldeo.columbia.edu/ontologies/iribib.owl#name)
                     Var (name=nam)

The DISTINCT operator is applied after the bnodes are generated. Since every generated bnode is a new, unique value, every projected statement is unique as well and DISTINCT removes nothing. To fix this issue, the DISTINCT operator should be applied directly after the FILTER operation (all operations after that are related to the constructor).
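
The effect of the operator order can be illustrated with a small standalone sketch. It is not Sesame code: the helper names (bnode_factory, distinct, construct) and the sample names are hypothetical, and the input is assumed to be the 'nam' values that survive the FILTER, one per matching statement.

```python
import itertools

def bnode_factory():
    """Return a function that yields a fresh blank node label on each call."""
    counter = itertools.count(1)
    return lambda: f"_:b{next(counter)}"

def distinct(items):
    """Drop repeated items while keeping the first occurrence of each."""
    return list(dict.fromkeys(items))

def construct(names, new_bnode):
    """Emit the two constructor statements for each name, using one new bnode per name."""
    triples = []
    for nam in names:
        person = new_bnode()
        triples.append((person, "rdf:type", "iribib:Person"))
        triples.append((person, "iribib:name", nam))
    return triples

# Hypothetical 'nam' bindings left after the FILTER (the same author appears twice).
names = ["A. Author", "A. Author", "B. Writer"]

# Order in the reported query model: bnodes are generated first, DISTINCT last.
late = distinct(construct(names, bnode_factory()))

# Proposed order: DISTINCT first, bnode generation afterwards.
early = construct(distinct(names), bnode_factory())

print(len(late), len(early))
```

In this sketch the late variant keeps all six statements because each carries a different bnode, while the early variant produces four statements, two per distinct name.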

 Comments
Comment by Arjohn Kampman [03/Oct/07 02:18 PM]
Addendum: the proposed solution also requires an additional PROJECTION to restrict the binding sets to the variables that are used in the constructor. Without the PROJECTION, binding sets that share the same 'nam' but differ in 'doc' would still be treated as distinct and would still result in the generation of duplicates.
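
A rough sketch of why the projection matters, again in standalone Python rather than Sesame code. The project and distinct helpers and the sample binding sets are hypothetical; each surviving row is assumed to lead to one generated bnode.

```python
def project(rows, variables):
    """Restrict each binding set to the given variables (as a hashable tuple)."""
    return [tuple((var, row[var]) for var in variables) for row in rows]

def distinct(items):
    """Drop repeated items while keeping the first occurrence of each."""
    return list(dict.fromkeys(items))

# Hypothetical binding sets after the FILTER: one author named in two documents.
rows = [
    {"doc": "doc1", "nam": "A. Author"},
    {"doc": "doc2", "nam": "A. Author"},
    {"doc": "doc3", "nam": "B. Writer"},
]

# DISTINCT over the full binding sets: 'doc' keeps the first two rows apart.
without_projection = distinct(project(rows, ["doc", "nam"]))

# DISTINCT after projecting to the only variable the constructor uses.
with_projection = distinct(project(rows, ["nam"]))

print(len(without_projection), len(with_projection))
```

Here three rows survive without the projection, so the repeated author would still receive two bnodes, while only two rows remain once the binding sets are restricted to 'nam'.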