Issue Details

Key: SES-385
Type: Improvement
Status: Closed
Resolution: Fixed
Priority: Minor
Assignee: Arjohn Kampman
Reporter: Henry Story
Votes: 5
Watchers: 2

Sesame

duplicate statements in SPARQL-construct query results should be filtered as much as possible

Created: 24/Apr/07 08:41 PM   Updated: 07/Jan/10 04:29 PM
Component/s: SPARQL, SeRQL
Affects Version/s: None
Fix Version/s: 2.3-pr1

File Attachments: 1. DuplicateStatementTest.java (2 kb)
2. DuplicateStatementTest2.java (2 kb)

Issue Links:
Related
 
This issue is related to:
SES-700 Further reduce duplicate statements i... Minor Resolved


 Description   
Issue was raised by Henry Story on sesame-devel:
---------------------------------------------------------------------------------
Not sure if this is a bug or not, as I don't have a deep grasp of SPARQL semantics. If I send the following query

CONSTRUCT { ?c a rdfs:Class . }
WHERE { [] a ?c . }

I get a huge number of results like this

:Issue a rdfs:Class , rdfs:Class , rdfs:Class , rdfs:Class ,
rdfs:Class , rdfs:Class , rdfs:Class , rdfs:Class , rdfs:Class ,
rdfs:Class , rdfs:Class , rdfs:Class , rdfs:Class , rdfs:Class ,
rdfs:Class , rdfs:Class , rdfs:Class , rdfs:Class , rdfs:Class ,
rdfs:Class , rdfs:Class , rdfs:Class , rdfs:Class , rdfs:Class ,
rdfs:Class , rdfs:Class , rdfs:Class , rdfs:Class , rdfs:Class ,
rdfs:Class , rdfs:Class , rdfs:Class , rdfs:Class , rdfs:Class ,
rdfs:Class , rdfs:Class , rdfs:Class , rdfs:Class , rdfs:Class ,
rdfs:Class , rdfs:Class , rdfs:Class , rdfs:Class , rdfs:Class ,
rdfs:Class , rdfs:Class , rdfs:Class , rdfs:Class , rdfs:Class ,
rdfs:Class , rdfs:Class , rdfs:Class , rdfs:Class....

There is no DISTINCT clause for CONSTRUCT, but it seems to me that the graph returned is a little verbose. It just repeats the same relation n times.
---------------------------------------------------------------------------------

 Comments
James Leigh suggested that I add my problem with SPARQL 'describe' [1] to this issue.

With "sesame 2.2 beta2" a DESCRIBE query is quite fast but delivers duplicate statements.

With "sesame 2.1.2" performance is rather poor on large data sets, i.e. about 100.0000 triples.

The duplicate results returned by 2.2 beta 2 can be reproduced with the attached JUnit test.

Strictly speaking, duplicate statements in the result of a DESCRIBE query are not wrong, but they are not very convenient either.



[1] http://www.openrdf.org/forum/mvnforum/viewthread?thread=1770

Comment by Simon Reinhardt [26/Mar/09 07:19 PM]
I also noticed a problem with duplicate statements in CONSTRUCT queries.
That is: it returns duplicate statements when there are none in the store!
The query has the following form:

CONSTRUCT
{
<http://example.org/x> <http://example.org/someProperty> ?a .
?a <http://example.org/someOtherProperty> ?b .
}
WHERE
{
<http://example.org/x> <http://example.org/someProperty> ?a .
?a <http://example.org/someOtherProperty> ?b .
}

The data has the characteristic that multiple statements in the store match both the first and the second statement pattern. It seems that for every solution it finds for ?b it repeats ?a, thus giving me the first statement multiple times for the same ?a.

I'm not enough of a SPARQL expert to say whether that's correct or not. But when I tried this pattern against the DBpedia endpoint, the result did not contain duplicates.
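
The behaviour described in this comment can be sketched in plain Java, assuming the engine instantiates the CONSTRUCT template once per solution and does no filtering. This is only an illustration: `TemplateDemo`, the string encoding of triples, the `ex:` prefix and the sample bindings are hypothetical, and nothing here uses the Sesame API.

```java
import java.util.ArrayList;
import java.util.List;

class TemplateDemo {
    // Instantiate the two-triple CONSTRUCT template once per solution.
    // Each solution is a hypothetical {?a, ?b} pair; triples are plain strings.
    static List<String> construct(List<String[]> solutions) {
        List<String> out = new ArrayList<>();
        for (String[] sol : solutions) {
            String a = sol[0];
            String b = sol[1];
            out.add("ex:x ex:someProperty " + a + " .");      // does not depend on ?b
            out.add(a + " ex:someOtherProperty " + b + " .");
        }
        return out;
    }

    public static void main(String[] args) {
        // One value for ?a with two values for ?b yields two solutions.
        List<String[]> solutions = List.of(
                new String[] {"ex:a1", "ex:b1"},
                new String[] {"ex:a1", "ex:b2"});
        construct(solutions).forEach(System.out::println);
    }
}
```

With these two solutions the first template triple is emitted once per value of ?b, so it shows up twice in the four-triple result even though the store holds it only once.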

Comment by Arjohn Kampman [03/Apr/09 08:04 PM]
The SeRQL and SPARQL parsers now add REDUCED modifiers to the generated query model where appropriate. The query engine handles this by removing _consecutive_ duplicates from the query result.
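
As a rough illustration of what removing only consecutive duplicates means, here is a minimal sketch in plain Java. It is not Sesame's actual implementation; the class name `ConsecutiveFilter` and the method `removeConsecutiveDuplicates` are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

class ConsecutiveFilter {
    // Drop an element only when it equals the element immediately before it.
    // Only one previous element is remembered, so memory use stays constant.
    static <T> List<T> removeConsecutiveDuplicates(Iterable<T> input) {
        List<T> out = new ArrayList<>();
        T previous = null;
        boolean first = true;
        for (T item : input) {
            if (first || !Objects.equals(item, previous)) {
                out.add(item);
            }
            previous = item;
            first = false;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> result = List.of("s1", "s1", "s2", "s1");
        System.out.println(removeConsecutiveDuplicates(result)); // prints [s1, s2, s1]
    }
}
```

Note that the last `s1` survives because it does not directly follow another `s1`, so duplicates that are not adjacent in the result still get through.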

SPARQL describe still does not work perfectly.
(Some) duplicates are still returned.

See the attached JUnit test.

Comment by Arjohn Kampman [21/Aug/09 02:25 PM]
The REDUCED modifier doesn't guarantee that there aren't any duplicates; it just indicates to the query engine that duplicates can/should be removed if doing so doesn't add too much overhead. On the other hand, it might make sense to change this into an actual DISTINCT for DESCRIBE queries (but not for CONSTRUCT).
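
For comparison, a full DISTINCT has to remember every statement it has already returned, which is where the extra overhead comes from. A minimal sketch in plain Java, with the hypothetical names `DistinctFilter` and `distinct` and no Sesame classes involved:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class DistinctFilter {
    // Keep the first occurrence of every element, in encounter order.
    // The 'seen' set grows with the number of distinct elements in the result.
    static <T> List<T> distinct(Iterable<T> input) {
        Set<T> seen = new LinkedHashSet<>();
        for (T item : input) {
            seen.add(item);
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<String> result = List.of("s1", "s2", "s1", "s1");
        System.out.println(distinct(result)); // prints [s1, s2]
    }
}
```

Unlike a filter that only compares neighbouring results, this guarantees a duplicate-free result, at the cost of memory proportional to the size of that result.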