openRDF.org Welcome Guest   | Login   
  Search  
  Index  | Recent Threads  | Who's Online  | User List  | Search  | Help  | RSS feeds

Forum has been closed down
This forum has been closed down due to extensive spamming activities. Please use the mailing list instead.


Quick Go »
Thread Status: Normal
Total posts in this thread: 5
[Add To My Favorites] [Watch this Thread]
Author
Previous Thread This topic has been viewed 8045 times and has 4 replies Next Thread
Aug 12, 2008 1:07:05 PM

jans70
Member



Joined: Sep 27, 2007
Posts: 19
Status: Offline

Performance of RDBMS sail using PostgreSQL

Hi, I'm currently evaluating the RDBMS sail in Sesame 2.1.3, at the moment with PostGreSQL. I've got this up and running OK, but initial query performance is disappointing. It looks like the postgres server is the bottleneck, as it's running close to 100% cpu usage while these queries are evaluated. I've read the section in the main Sesame docs on how to configure this triple store, but is there any more information somewhere about how to set it up for best performance, i.e. improving indexing?

We're also seeing some surprising in performance between different queries. For example, this query is much slower if the DISTINCT keyword is left in (by roughly one order of magnitude). Is this to be expected?

SELECT DISTINCT * WHERE { GRAPH ?g { ?s ?p ?o } }
----------------------------------------
[Edit 1 times, last edit by jans70 at Aug 12, 2008 3:57:29 PM]
Show Printable Version of Post     [Link] Report threatening or abusive post: please login first  Go to top 
Aug 12, 2008 1:33:21 PM

james
OpenRDF.org Consultant
Member's Avatar

Canada
Joined: Jul 21, 2005
Posts: 650
Status: Offline
Re: Performance of RDBMS sail using PostgreSQL

The trouble with RDBMS is you have to tweak both the RDF store _and_ the relational store configuration settings. Out-of-the-box settings for most relational store are very conservative. On typical modern hardware they will not scale very well.

You might want to try reading up on max_locks_per_transaction, fsync, or more importantly shared_bufers settings in postgresql (read up on different pool_sizes for mysql). On a typical out-of-the-box Linux distribution kernel tweaks are often needed to achieve reasonable performance, for details or other OSes see the respective database documentation.
----------------------------------------
Sesame Consultant
http://leighnet.ca
http://jamesrdf.blogspot.com
Show Printable Version of Post        Hidden to Guest    http://leighnet.ca [Link] Report threatening or abusive post: please login first  Go to top 
Aug 12, 2008 3:12:07 PM

jans70
Member



Joined: Sep 27, 2007
Posts: 19
Status: Offline

Re: Performance of RDBMS sail using PostgreSQL

OK. Just wondering: do you have any numbers for the sort of performance that can be achieved with the database backends as it currently stands? It would be very useful to understand what's possible to achieve, so we can decide whether it's worth carrying on with tuning the setup. At the moment, the performance is several orders of magnitude behind other options so it's not like it's just small tweaks that are needed.

To give you another example, getting the first 1000 triples out of our repository using the query

SELECT DISTINCT * WHERE { ?s ?p ?o } LIMIT 1000

takes 30 - 40 seconds! This is on a repository that contains less than 1m triples.

Looking at the query that's passed to the database, it looks like:

SELECT DISTINCT t0.subj,
CASE WHEN u_s.value IS NOT NULL THEN u_s.value WHEN b_s.value IS NOT NULL THEN b_s.value END,
t0.pred,
CASE WHEN u_p.value IS NOT NULL THEN u_p.value WHEN lu_p.value IS NOT NULL THEN lu_p.value END,
t0.obj,
CASE WHEN u_o.value IS NOT NULL THEN u_o.value WHEN b_o.value IS NOT NULL THEN b_o.value WHEN l_o.value IS NOT NULL THEN l_o.value WHEN ll_o.value IS NOT NULL THEN ll_o.value WHEN lu_o.value IS NOT NULL THEN lu_o.value END,
NULL ,
d_o.value
FROM (SELECT ctx, subj, pred, obj
FROM triples) t0
LEFT JOIN uri_values u_s ON (u_s.id = t0.subj)
LEFT JOIN bnode_values b_s ON (b_s.id = t0.subj)
LEFT JOIN uri_values u_p ON (u_p.id = t0.pred)
LEFT JOIN long_uri_values lu_p ON (lu_p.id = t0.pred)
LEFT JOIN uri_values u_o ON (u_o.id = t0.obj)
LEFT JOIN bnode_values b_o ON (b_o.id = t0.obj)
LEFT JOIN label_values l_o ON (l_o.id = t0.obj)
LEFT JOIN long_label_values ll_o ON (ll_o.id = t0.obj)
LEFT JOIN long_uri_values lu_o ON (lu_o.id = t0.obj)
LEFT JOIN datatype_values d_o ON (d_o.id = t0.obj)
LIMIT 1000

Lots of joins then... This is slow when executed directly on the database as well.
Have you actually managed to set up a database to execute this sort of query fast on any sizable repository?

Regards,
Jan
Show Printable Version of Post     [Link] Report threatening or abusive post: please login first  Go to top 
Aug 13, 2008 12:16:53 PM

james
OpenRDF.org Consultant
Member's Avatar

Canada
Joined: Jul 21, 2005
Posts: 650
Status: Offline
Re: Performance of RDBMS sail using PostgreSQL

RDBMS SAIL is optimized to store a limited number of predicates that can be described as ObjectProperties or DatatypeProperties with a particular datatype. It is best at complex queries with explicit predicates and limited results. It does perform poorly for bulk import/export of data.

The query you have shown is a bulk export and will always perform poorly in RDBMS SAIL regardless of the settings. For bulk import/export of RDF data the native store performs much better.

In general, if your application will be moving around a lot of data or the data will have unknown or loosely defined predicates then RDBMS SAIL is not going to be your best choice.
----------------------------------------
Sesame Consultant
http://leighnet.ca
http://jamesrdf.blogspot.com
Show Printable Version of Post        Hidden to Guest    http://leighnet.ca [Link] Report threatening or abusive post: please login first  Go to top 
Aug 13, 2008 1:01:26 PM

jans70
Member



Joined: Sep 27, 2007
Posts: 19
Status: Offline

Re: Performance of RDBMS sail using PostgreSQL

OK, thanks for the heads-up.

Regards,
Jan
Show Printable Version of Post     [Link] Report threatening or abusive post: please login first  Go to top 
[Show Printable Version of Thread]