


Mar 29, 2005 7:19:25 PM

willpugh

Really bad performance adding objects to Sesame against Postgres

Hi,

I'm pretty new to Sesame. I tried putting it through a number of perf tests. It seemed that the performance against Postgres was pretty bad.

I was able to load my ontology reasonably quickly, but as I started adding instances to it, it slowed down to about 2 adds a second. This is with the dependency-inference turned off. I wasn't running this on a super fast machine (it's a G4 laptop) but I did expect much faster results than this. The native format performed a lot faster (around 10 times faster).

I was adding objects by creating a blank graph, and populating it with about 3 objects and then adding it to a repository. Is there another way I should be adding instances?

When I looked at the code, I had a couple of questions:
1) There appeared to be a lot of round trips to the server. I believe adding my graph, which had about 3 triples in it, took about 20 round trips. Is there a reason a lot of this work can't be done in memory rather than in the database? Particularly for cases like deciding whether namespaces should be exported or not, it seemed like this operation could happen much less frequently, and work in Sesame could prevent it from ever going to the DB. Am I missing something?

2) There seemed to be a bunch of tables (RawTriples, etc) that temp work was done in, but didn't seem to be created as temp tables. If I have two Sesame clients going against the same DB, would this be safe?

Is there something I'm missing here? Any suggestions for how I can speed up performance?

Thanks!
--Will
Mar 30, 2005 9:05:16 AM

jeen
Re: Really bad performance adding objects to Sesame against Postgres

 
I'm pretty new to Sesame. I tried putting it through a number of perf tests. It seemed that the performance against Postgres was pretty bad.

Yes, unfortunately, performance on PostgreSQL is quite poor. Although the RDBMS SAIL is generic in principle, most of our optimizations are developed for MySQL, which makes that database the best choice in combination with Sesame.
 
I was able to load my ontology reasonably quickly, but as I started adding instances to it, it slowed down to about 2 adds a second.

Ouch. That sounds a bit excessive.
 
This is with the dependency-inference turned off. I wasn't running this on a super fast machine (it's a G4 laptop) but I did expect much faster results than this. The native format performed a lot faster (around 10 times faster).

I was adding objects by creating a blank graph, and populating it with about 3 objects and then adding it to a repository. Is there another way I should be adding instances?

Typically, uploads in bulk go quicker in the long run. If you have the option of creating a larger part of the graph at the client side and adding it to the repository in larger chunks, performance will increase.
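
To illustrate the chunking advice above, here is a minimal, language-neutral sketch in Python. It does not use the Sesame API: `CountingStore`, `add_graph`, `chunked` and `upload` are hypothetical names, and the store only counts how many upload calls it receives, under the assumption that each call carries a fixed overhead regardless of how many triples it contains.

```python
from typing import Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]


class CountingStore:
    """Hypothetical stand-in for a remote repository; counts upload calls."""

    def __init__(self) -> None:
        self.triples: List[Triple] = []
        self.uploads = 0

    def add_graph(self, graph: List[Triple]) -> None:
        # Each call represents one upload with its fixed per-call overhead.
        self.uploads += 1
        self.triples.extend(graph)


def chunked(triples: Iterable[Triple], size: int) -> Iterator[List[Triple]]:
    """Yield successive lists of at most `size` triples."""
    batch: List[Triple] = []
    for triple in triples:
        batch.append(triple)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Flush the final, possibly shorter, batch.
        yield batch


def upload(store: CountingStore, triples: Iterable[Triple], batch_size: int) -> None:
    """Send the triples to the store in graphs of `batch_size` triples."""
    for batch in chunked(triples, batch_size):
        store.add_graph(batch)


# 300 made-up triples, uploaded once in tiny graphs and once in larger chunks.
triples = [(f"ex:item{i}", "ex:hasValue", str(i)) for i in range(300)]

small = CountingStore()
upload(small, triples, batch_size=3)

large = CountingStore()
upload(large, triples, batch_size=100)

print(small.uploads, large.uploads)
```

Both stores end up with the same 300 triples, but the first one needed 100 upload calls and the second only 3. How much time that saves in practice depends on the real per-upload cost, which this sketch does not measure.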
 
When I looked at the code, I had a couple of questions:
1) There appeared to be a lot of round trips to the server. I believe adding my graph, which had about 3 triples in it, took about 20 round trips. Is there a reason a lot of this work can't be done in memory rather than in the database? Particularly for cases like deciding whether namespaces should be exported or not, it seemed like this operation could happen much less frequently, and work in Sesame could prevent it from ever going to the DB. Am I missing something?

The performance bottleneck for namespace exports is a known issue and has actually been fixed in the latest developer's release (see issue SES-140).

As for round trips: the implementation of the Graph API (which I assume is what you are using here) is relatively young. I am quite sure a lot of optimizations are possible; we just haven't had the time yet to fully investigate.
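
To make the general idea behind the question concrete (answering repeated checks from memory instead of from the database), here is a small sketch in Python. It is not how Sesame works and not the SES-140 fix: `FakeBackend` and `NamespaceCache` are hypothetical names, and the backend just counts simulated round trips.

```python
from typing import Set


class FakeBackend:
    """Hypothetical database stand-in that counts simulated round trips."""

    def __init__(self) -> None:
        self.namespaces: Set[str] = set()
        self.round_trips = 0

    def exists(self, namespace: str) -> bool:
        self.round_trips += 1  # one simulated SELECT
        return namespace in self.namespaces

    def insert(self, namespace: str) -> None:
        self.round_trips += 1  # one simulated INSERT
        self.namespaces.add(namespace)


class NamespaceCache:
    """Remembers namespaces already confirmed to be in the backend."""

    def __init__(self, backend: FakeBackend) -> None:
        self._backend = backend
        self._known: Set[str] = set()

    def ensure(self, namespace: str) -> None:
        if namespace in self._known:
            return  # answered from memory, no round trip
        if not self._backend.exists(namespace):
            self._backend.insert(namespace)
        self._known.add(namespace)


backend = FakeBackend()
cache = NamespaceCache(backend)

# The same namespace is checked 50 times, as it might be across many small adds.
for _ in range(50):
    cache.ensure("http://example.org/ns#")

print(backend.round_trips)
```

Only the first call reaches the backend (one existence check plus one insert); the other 49 are served from the in-memory set. A cache like this is only valid while nothing else modifies the same table, so it assumes a single owner of the database.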
 
2) There seemed to be a bunch of tables (RawTriples, etc) that temp work was done in, but didn't seem to be created as temp tables.

That is at least partially because the code was kept compatible with older versions of MySQL. We're planning to abandon that compatibility in future versions, because it is becoming a major drag on performance and on our development speed.
 
If I have two Sesame clients going against the same DB, would this be safe?

You can safely access one Sesame repository multiple times through the client API. However, it is not safe to define more than one SAIL on the same database.
 
Is there something I'm missing here? Any suggestions for how I can speed up performance?

Switching to MySQL would be a good step, if that is possible of course. You might also try the latest Sesame developer's release, which contains the fix for the namespace export issue and might speed things up quite a bit.
----------------------------------------
Researcher at AFSG - Wageningen UR
Mar 30, 2005 9:09:59 AM

jeen
Re: Really bad performance adding objects to Sesame against Postgres

 

 
If I have two Sesame clients going against the same DB, would this be safe?

You can safely access one Sesame repository multiple times through the client API. However, it is not safe to define more than one SAIL on the same database.

To make this a bit clearer: it is perfectly fine to have two or more independent client applications accessing the same repository at the same time through the repository API. The synchronization layer in the SAIL stack will take care of concurrency issues.

What won't work is having one MySQL database, say 'testdb', and defining more than one SAIL on this database (for example, by running two different Sesame servers in parallel). This will cause inconsistencies in the SAIL's caching mechanism.
----------------------------------------
Researcher at AFSG - Wageningen UR