The following query will always return zero results in Sesame:
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?s ?p ?o
WHERE {
?s ?p ?o .
FILTER("foo"^^xsd:string != "10"^^xsd:integer)
}
I think this is a bug in the literal comparison implementation.
When comparing two datatyped literals, the current QueryEvaluationUtil.compareLiterals() method tries to find a common datatype and then performs the comparison using the ordering defined for that datatype. However, when the two literals share no common datatype, only EQ and NEQ can be evaluated (since no useful ordering can be determined between such values). In that case the method uses RDFterm-equal to determine the return value.
However, in lines 252-267, the method does a peculiar check that I think is wrong or at least incomplete. Consider the FILTER in the above SPARQL query. Clearly, the two operands share no common datatype, so execution should reach line 250 and perform RDFterm-equal on the two values (which will by definition return false). However, the subsequent check in lines 252-267 makes this comparison throw a ValueExprEvaluationException, because both operands have a datatype that is not a calendar type and neither has a language tag.
This is no problem when the comparison is an EQ (because a FILTER will fail silently on a type error, thus effectively evaluating to false), but it is a problem for NEQ. Effectively, FILTER("foo"^^xsd:string != "10"^^xsd:integer) will always fail, which I think is wrong.
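To illustrate why the type error is harmless for EQ but not for NEQ, here is a minimal Python sketch of the FILTER behaviour described above. It is not Sesame code: `ValueExprError`, `compare_as_reported` and `filter_keeps` are hypothetical names, literals are modelled as plain (lexical form, datatype) tuples, and only the "no common datatype" case from the query is modelled.

```python
class ValueExprError(Exception):
    """Stand-in for a SPARQL type error (ValueExprEvaluationException in Sesame)."""


def compare_as_reported(left, op, right):
    """Model of the reported behaviour; literals are (lexical, datatype) tuples."""
    if left[1] != right[1]:
        # With no common datatype, both "=" and "!=" end in a type error.
        raise ValueExprError(f"cannot evaluate {left} {op} {right}")
    # Same datatype: compare lexical forms (a simplification for this sketch).
    same = left[0] == right[0]
    return same if op == "=" else not same


def filter_keeps(condition):
    """A FILTER keeps a solution only if the condition is true; an error drops it."""
    try:
        return bool(condition())
    except ValueExprError:
        return False


foo = ("foo", "xsd:string")
ten = ("10", "xsd:integer")

eq_kept = filter_keeps(lambda: compare_as_reported(foo, "=", ten))
neq_kept = filter_keeps(lambda: compare_as_reported(foo, "!=", ten))
print(eq_kept, neq_kept)  # False False
```

Both filters drop the solution. For "=" that matches the intuitive answer; for "!=" it does not.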
A possible fix would be to add an additional check at lines 260-263 that checks whether the operand datatypes are supported types (string, numerical, or calendar).
As an aside, since the implementation of SPARQL 1.1's IN and NOT IN operators reuses != internally, the issue also affects these operators. See SES-735.
Btw I am not 100% sure if this is actually a bug, it might be to the letter of the spec, but at the very least it seems very un-intuitive to me...
I agree that this behaviour doesn't look very intuitive. I remember that it was quite tricky to get the compareLiterals() method to pass all SPARQL query tests. If you find another variant that handles this particular query 'correctly' without breaking the SPARQL query tests, please let me know.
You're right, I just had a quick look at implementing the fix I proposed, and this suddenly fails several DAWG unit tests (in "open world value testing tests"), in particular, open-eq 08, 10, 11 and 12.
So it looks as if the behavior implemented in Sesame is expected behavior. I still find it a strange way to define != though. Will think about posting a comment to DAWG about this, especially since it makes the NOT IN operator practically useless.
A possible solution would be to implement NOT IN in terms of the sameTerm operator, rather than !=. Will write to DAWG with this proposal.
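A rough Python sketch of the difference between the two formulations, under the behaviour as understood at this point in the thread (!= raising a type error for literals with no common datatype). All names here are hypothetical, literals are modelled as (lexical form, datatype) tuples, and `strict_neq` is a simplification that compares lexical forms when the datatypes match.

```python
class ValueExprError(Exception):
    """Stand-in for a SPARQL type error."""


def strict_neq(a, b):
    """'!=' as modelled here: a type error when the datatypes differ."""
    if a[1] != b[1]:
        raise ValueExprError(f"no common datatype for {a} and {b}")
    return a[0] != b[0]


def not_in_via_neq(x, candidates):
    """NOT IN as a conjunction of '!=' tests, with SPARQL's error rules for &&."""
    saw_error = False
    for c in candidates:
        try:
            if not strict_neq(x, c):
                return False  # false && error is still false
        except ValueExprError:
            saw_error = True
    if saw_error:
        raise ValueExprError("true && error is an error")
    return True


def not_in_via_same_term(x, candidates):
    """NOT IN based on term identity, which never raises a type error."""
    return all(x != c for c in candidates)


x = ("foo", "xsd:string")
candidates = [("10", "xsd:integer"), ("bar", "xsd:string")]

try:
    via_neq = not_in_via_neq(x, candidates)
except ValueExprError:
    via_neq = "type error"

via_same_term = not_in_via_same_term(x, candidates)
print(via_neq, via_same_term)  # type error True
```

With the !=-based version, a single candidate of a different datatype is enough to turn the whole NOT IN test into an error (or false), so a FILTER using it never keeps the solution. The sameTerm-based version does not have this problem, at the cost of no longer treating equal values with different lexical forms as the same.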
Ok, after receiving a response from Andy Seaborne on my comment to DAWG, I have found that Sesame's implementation in QueryEvaluationUtil is in fact not quite correct after all.
The definition of RDFterm-equal states (although this is awfully well hidden, in a footnote) that for datatyped literals, a value-based comparison should be done. Only when datatypes are unknown or unsupported should it throw a type error. So in the case of comparing a string with an integer, the != should just return TRUE.
There is one exception to that last rule: when the lexical-to-value mapping fails (for example, you have a literal "xyz"^^xsd:integer), a type error should be thrown. This is in fact why some of those open-world unit tests I mentioned failed on my initial fix.
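The rules from the last two paragraphs can be sketched as follows. This is a simplified Python model, not the code that went into Sesame: the names (`Literal`, `SparqlTypeError`, `rdf_term_equal`, `not_equal`) are hypothetical, and the table of supported datatypes is deliberately limited to xsd:string and xsd:integer.

```python
import re
from dataclasses import dataclass

XSD = "http://www.w3.org/2001/XMLSchema#"


class SparqlTypeError(Exception):
    """Stand-in for a SPARQL type error."""


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str


def parse_integer(lexical):
    # Reject anything outside the xsd:integer lexical space, e.g. "xyz".
    if not re.fullmatch(r"[+-]?[0-9]+", lexical):
        raise ValueError(lexical)
    return int(lexical)


# Hypothetical, deliberately tiny table of supported datatypes.
PARSERS = {XSD + "string": str, XSD + "integer": parse_integer}


def to_value(lit):
    """Lexical-to-value mapping; a failed mapping is a type error."""
    try:
        # Tag the value with its datatype so value spaces never mix.
        return (lit.datatype, PARSERS[lit.datatype](lit.lexical))
    except ValueError:
        raise SparqlTypeError(f"ill-formed literal: {lit}") from None


def rdf_term_equal(a, b):
    if a.datatype in PARSERS and b.datatype in PARSERS:
        # Supported datatypes: value-based comparison.
        return to_value(a) == to_value(b)
    if a == b:
        return True  # identical terms are equal even if the datatype is unknown
    raise SparqlTypeError(f"cannot compare {a} and {b}")


def not_equal(a, b):
    return not rdf_term_equal(a, b)


foo = Literal("foo", XSD + "string")
ten = Literal("10", XSD + "integer")
print(not_equal(foo, ten))  # True
```

Under this model, `"foo"^^xsd:string != "10"^^xsd:integer` is TRUE, `"010"^^xsd:integer` equals `"10"^^xsd:integer` by value, and comparing `"xyz"^^xsd:integer` with any supported literal raises a type error.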
Fix implemented in the 2.4 branch. Should this be backported to the 2.3 branch?