Issue Details

Key: SES-751
Type: Bug
Status: Resolved
Resolution: Fixed
Priority: Major
Assignee: Jeen Broekstra
Reporter: Jeen Broekstra
Votes: 0
Watchers: 0
Sesame

QueryEvaluationUtil.compareLiterals() throws exception when comparing string-typed with numeric-typed

Created: 15/Mar/11 11:44 PM   Updated: 26/Mar/11 05:08 AM
Component/s: Query Engine
Affects Version/s: 2.3.2
Fix Version/s: 2.4.0-alpha1

Issue Links:
Dependency
 
This issue is a dependency for:
SES-735 SPARQL 1.1 Support: (NOT) IN operator Major Resolved


 Description   
The following query will always return zero results in Sesame:

prefix xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?s ?p ?o
 WHERE {
 ?s ?p ?o .
 FILTER("foo"^^xsd:string != "10"^^xsd:integer)
}

I think this is a bug in the literal comparison implementation.

When comparing two datatyped literals, the current QueryEvaluationUtil.compareLiterals() method tries to find a common datatype and then performs the comparison using the ordering defined for that datatype. However, when two literals that share no datatype are compared, only EQ and NEQ can be evaluated (since no useful ordering can be determined between such values). In that case the method uses RDFterm-equal to determine the return value.

However, in lines 252-267, the method does a peculiar check that I think is wrong, or at least incomplete. Consider the FILTER in the above SPARQL query. Clearly, the two operands share no common datatype, so evaluation should reach line 250 and perform RDFterm-equal on the two values (which will by definition return false). However, the check in lines 252-267 afterwards results in this comparison throwing a ValueExprEvaluationException, because both operands have a datatype which is not a calendar type and neither has a language tag.

This is no problem when the comparison is an EQ (because a FILTER will fail silently on a type error, thus effectively evaluating to false), but it is a problem for NEQ. Effectively, FILTER("foo"^^xsd:string != "10"^^xsd:integer) will always fail, which I think is wrong.
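To make the asymmetry concrete, here is a minimal sketch of how a type error interacts with FILTER. It does not use Sesame's classes: TypeError stands in for ValueExprEvaluationException, and Op, compareStringWithInteger() and filterKeepsSolution() are hypothetical names introduced only for illustration.

```java
public class FilterTypeErrorSketch {

    /** Hypothetical stand-in for Sesame's ValueExprEvaluationException. */
    public static class TypeError extends RuntimeException {
        public TypeError(String message) {
            super(message);
        }
    }

    /** Only equality and inequality matter for this report. */
    public enum Op { EQ, NE }

    /**
     * Models the behaviour described above: comparing an xsd:string literal
     * with an xsd:integer literal raises a type error for both operators.
     */
    public static boolean compareStringWithInteger(Op op) {
        throw new TypeError("incompatible operands for " + op);
    }

    /**
     * A FILTER keeps a solution only when its expression evaluates to true;
     * a type error silently drops the solution.
     */
    public static boolean filterKeepsSolution(Op op) {
        try {
            return compareStringWithInteger(op);
        } catch (TypeError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Both lines print "false": harmless for =, but surprising for !=.
        System.out.println("=  keeps solution: " + filterKeepsSolution(Op.EQ));
        System.out.println("!= keeps solution: " + filterKeepsSolution(Op.NE));
    }
}
```

Because the error is raised before the operator is applied, the result is the same for = and !=, which is why the query in this report returns nothing.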

A possible fix would be to add an additional check at lines 260-263 that checks whether the operand datatypes are supported types (either string, numerical, or calendar).

As an aside, since the implementation of SPARQL 1.1's IN and NOT IN operators reuses != internally, the issue also affects these operators. See SES-735.
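The effect on NOT IN can be sketched as follows. SPARQL 1.1 defines x NOT IN (e1, e2, ...) as equivalent to (x != e1) && (x != e2) && ..., where && follows SPARQL's three-valued logic. The class below models only that logic; Result, and() and notIn() are hypothetical names and are not taken from Sesame's code.

```java
import java.util.Arrays;
import java.util.List;

public class NotInSketch {

    /** Three-valued outcome of a SPARQL expression. */
    public enum Result { TRUE, FALSE, ERROR }

    /** SPARQL logical-and: false wins over error, error wins over true. */
    public static Result and(Result a, Result b) {
        if (a == Result.FALSE || b == Result.FALSE) {
            return Result.FALSE;
        }
        if (a == Result.ERROR || b == Result.ERROR) {
            return Result.ERROR;
        }
        return Result.TRUE;
    }

    /** Combines the results of the individual (x != ei) tests. */
    public static Result notIn(List<Result> notEqualResults) {
        Result acc = Result.TRUE;
        for (Result r : notEqualResults) {
            acc = and(acc, r);
        }
        return acc;
    }

    public static void main(String[] args) {
        // "foo"^^xsd:string NOT IN ("10"^^xsd:integer, "bar"^^xsd:string)
        // Reported behaviour: the first != is a type error, the second is true.
        System.out.println(notIn(Arrays.asList(Result.ERROR, Result.TRUE)));
        // If != simply returned true for string vs. integer:
        System.out.println(notIn(Arrays.asList(Result.TRUE, Result.TRUE)));
    }
}
```

With the reported behaviour, a single list member of a different datatype turns the whole NOT IN test into a type error, so the enclosing FILTER rejects the solution.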


Comment by Jeen Broekstra [16/Mar/11 12:11 AM]
Btw, I am not 100% sure if this is actually a bug. It might be to the letter of the spec, but at the very least it seems very unintuitive to me...

Comment by Arjohn Kampman [16/Mar/11 09:20 AM]
I agree that this behaviour doesn't look very intuitive. I remember that it was quite tricky to get the compareLiterals() method to pass all SPARQL query tests. If you find another variant that passes this particular query 'correctly' without breaking the SPARQL query tests, please let me know.

Comment by Jeen Broekstra [17/Mar/11 01:18 AM]
You're right, I just had a quick look at implementing the fix I proposed, and this suddenly fails several DAWG unit tests (in "open world value testing tests"), in particular, open-eq 08, 10, 11 and 12.

So it looks as if the behavior implemented in Sesame is expected behavior. I still find it a strange way to define != though. Will think about posting a comment to DAWG about this, especially since it makes the NOT IN operator practically useless.

Comment by Jeen Broekstra [17/Mar/11 03:01 AM]
A possible solution would be to implement NOT IN in terms of the sameTerm operator, rather than !=. Will write to DAWG with this proposal.

Comment by Jeen Broekstra [17/Mar/11 04:09 AM]

Comment by Jeen Broekstra [26/Mar/11 04:57 AM]
Ok, after receiving a response from Andy Seaborne on my comment to DAWG, I have found that Sesame's implementation in QueryEvaluationUtil is in fact not quite correct after all.

The definition of RDFterm-equal states (although this is awfully well hidden, in a footnote) that for datatyped literals, a value-based comparison should be done. Only when datatypes are unknown or unsupported should it throw a type error. So in the case of comparing a string with an integer, the != should just return TRUE.

There is one exception to that last rule: when the lexical-to-value mapping fails (for example, you have a literal "xyz"^^xsd:integer), a type error should be thrown. This is in fact why some of those open-world unit tests I mentioned failed on my initial fix.
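The rules from the last two paragraphs can be sketched roughly as below. This is not the code committed to the 2.4 branch: Lit, Kind, TypeError, equal() and notEqual() are hypothetical names, datatypes are reduced to three categories, and BigDecimal parsing is only a rough stand-in for the XML Schema lexical-to-value mapping. The special case of two identical terms with an unsupported datatype is left out.

```java
import java.math.BigDecimal;

public class RdfTermEqualSketch {

    /** Hypothetical stand-in for Sesame's ValueExprEvaluationException. */
    public static class TypeError extends RuntimeException {
        public TypeError(String message) {
            super(message);
        }
    }

    /** Coarse datatype categories, assumed for this sketch only. */
    public enum Kind { STRING, NUMERIC, UNSUPPORTED }

    /** A datatyped literal reduced to its lexical form and category. */
    public static final class Lit {
        public final String lexical;
        public final Kind kind;

        public Lit(String lexical, Kind kind) {
            this.lexical = lexical;
            this.kind = kind;
        }
    }

    /** Lexical-to-value mapping; an ill-typed literal is a type error. */
    static BigDecimal numericValue(Lit lit) {
        try {
            return new BigDecimal(lit.lexical);
        } catch (NumberFormatException e) {
            throw new TypeError("ill-typed numeric literal: " + lit.lexical);
        }
    }

    public static boolean equal(Lit a, Lit b) {
        // Unknown or unsupported datatypes cannot be compared by value.
        if (a.kind == Kind.UNSUPPORTED || b.kind == Kind.UNSUPPORTED) {
            throw new TypeError("unsupported datatype");
        }
        // Map numeric operands to values first, so "xyz"^^xsd:integer fails.
        BigDecimal av = a.kind == Kind.NUMERIC ? numericValue(a) : null;
        BigDecimal bv = b.kind == Kind.NUMERIC ? numericValue(b) : null;
        if (av != null && bv != null) {
            return av.compareTo(bv) == 0; // value-based: 10 equals 10.0
        }
        if (av == null && bv == null) {
            return a.lexical.equals(b.lexical); // both strings
        }
        return false; // string vs. number: different values, no error
    }

    public static boolean notEqual(Lit a, Lit b) {
        return !equal(a, b);
    }

    public static void main(String[] args) {
        Lit foo = new Lit("foo", Kind.STRING);
        Lit ten = new Lit("10", Kind.NUMERIC);
        System.out.println(notEqual(foo, ten)); // prints "true"
    }
}
```

In this sketch, comparing "foo" with "xyz"^^xsd:integer still raises a type error, which matches the explanation given above for the open-world tests that the initial fix broke.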

Comment by Jeen Broekstra [26/Mar/11 05:08 AM]
Fix implemented in the 2.4 branch. Should this be backported to the 2.3 branch?