History | Log In     View a printable version of the current page. Get help!  
Issue Details [XML]

Key: SES-659
Type: Bug Bug
Status: Resolved Resolved
Resolution: Fixed
Priority: Major Major
Assignee: Arjohn Kampman
Reporter: Nick Giles
Votes: 0
Watchers: 0
Operations

If you were logged in you would be able to see more operations.
Sesame

Nested UNION blocks appear to degrade query performance

Created: 20/Jan/09 05:27 PM   Updated: 07/Oct/09 07:27 PM
Component/s: SPARQL
Affects Version/s: 2.2.2
Fix Version/s: 2.3.0


 Description   
The following query appeared slow in returning results, timing out with only 2 results found when given a time limit of 60 seconds:

PREFIX someOntology: <http://test.com/someOntology.owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT DISTINCT ?connection ?connectionId WHERE {
 ?selected someOntology:locationId "12345"^^xsd:integer .

 FILTER ( BOUND (?selected ) )

 {
  ?selected a someOntology:Location .
  {
   ?selected someOntology:locationId ?locationId .
  } UNION {
   ?selected someOntology:containsLocation ?subLocation .
   ?subLocation someOntology:locationId ?locationId .
  }
  ?location someOntology:locationId ?locationId .
 
  ?node someOntology:locatedAt ?location .
  ?connection someOntology:connectsNode ?node .
 } UNION {
  ?selected a someOntology:Node .
  ?connection someOntology:connectsNode ?selected .
 }

 {
  ?connection someOntology:circuitId ?connectionId .
 } UNION {
  ?connection someOntology:trunkId ?connectionId .
 }

}




Commenting out each side of the nested UNION operation makes a difference:

Finding 99 results in approx 45 seconds:

PREFIX someOntology: <http://test.com/someOntology.owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT DISTINCT ?connection ?connectionId WHERE {
 ?selected someOntology:locationId "12345"^^xsd:integer .

 FILTER ( BOUND (?selected ) )

 {
  ?selected a someOntology:Location .
# {
   ?selected someOntology:locationId ?locationId .
# } UNION {
# ?selected someOntology:containsLocation ?subLocation .
# ?subLocation someOntology:locationId ?locationId .
# }
  ?location someOntology:locationId ?locationId .
 
  ?node someOntology:locatedAt ?location .
  ?connection someOntology:connectsNode ?node .
 } UNION {
  ?selected a someOntology:Node .
  ?connection someOntology:connectsNode ?selected .
 }

 {
  ?connection someOntology:circuitId ?connectionId .
 } UNION {
  ?connection someOntology:trunkId ?connectionId .
 }

}

Finding no results in approx 2 seconds (which is correct):

PREFIX someOntology: <http://test.com/someOntology.owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT DISTINCT ?connection ?connectionId WHERE {
 ?selected someOntology:locationId "12345"^^xsd:integer .

 FILTER ( BOUND (?selected ) )

 {
  ?selected a someOntology:Location .
# {
# ?selected someOntology:locationId ?locationId .
# } UNION {
   ?selected someOntology:containsLocation ?subLocation .
   ?subLocation someOntology:locationId ?locationId .
# }
  ?location someOntology:locationId ?locationId .
 
  ?node someOntology:locatedAt ?location .
  ?connection someOntology:connectsNode ?node .
 } UNION {
  ?selected a someOntology:Node .
  ?connection someOntology:connectsNode ?selected .
 }

 {
  ?connection someOntology:circuitId ?connectionId .
 } UNION {
  ?connection someOntology:trunkId ?connectionId .
 }

}



If we refactor the query to remove the nested UNION, at the cost of some duplication of lines, we get the following, which returns the correct 99 results in approximately 10 seconds (faster even than the single case...

PREFIX someOntology: <http://test.com/someOntology.owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT DISTINCT ?connection ?connectionId WHERE {
 ?selected someOntology:locationId "12345"^^xsd:integer .

 FILTER ( BOUND (?selected ) )

 {
  ?selected a someOntology:Location .
  ?node someOntology:locatedAt ?selected .
  ?connection someOntology:connectsNode ?node .
 } UNION {
  ?selected a someOntology:Location .
  ?selected someOntology:containsLocation ?subLocation .
  ?node someOntology:locatedAt ?subLocation .
  ?connection someOntology:connectsNode ?node .
 } UNION {
  ?selected a someOntology:Node .
  ?connection someOntology:connectsNode ?selected .
 }

 {
  ?connection someOntology:circuitId ?connectionId .
 } UNION {
  ?connection someOntology:trunkId ?connectionId .
 }

}


Is there something fundamental about the query structure, or is this an edge case for the query engine / optimiser?

 All   Comments   Change History      Sort Order:
Comment by Arjohn Kampman [21/Apr/09 03:13 PM]
The last query seems to be different than the others. Not only has it been refactored, but it also seems to be simplified: it misses the path through ?location and ?locationId. That could well explain why this query runs so much faster. To analyze the (lack of) performance for the first query, I 'll need some data to test with. This very much looks like a query optimization issue related to expected result counts of query parts.

Comment by Arjohn Kampman [07/Oct/09 07:27 PM]
The query optimizer has been improved to resolve this issue. Query models with JOIN-UNION-JOIN combinations are now normalized to a UNION-JOIN-JOIN form. The existing join reordering is then applied to the JOIN-JOIN combination. Finally, the query optimizer checks if the left-most JOIN argument can be factered out again, pushing the UNION lower in the query model tree. This last step is only possible if the JOIN order did not change in the second step, or if they changed in a similar way.