openRDF.org Forum » Sesame & Rio: Help » Thread: Really bad performance adding objects to Sesame against Postgres
Total posts in this thread: 3 |
Mar 29, 2005 7:19:25 PM
willpugh Visitor
Hi,

I'm pretty new to Sesame. I tried putting it through a number of perf tests, and the performance against Postgres seemed pretty bad. I was able to load my ontology reasonably quickly, but as I started adding instances to it, it slowed down to about 2 adds a second. This is with the dependency-inference turned off. I wasn't running this on a super fast machine (it's a G4 laptop), but I did expect much faster results than this. The native format performed a lot faster (around 10 times faster).

I was adding objects by creating a blank graph, populating it with about 3 objects, and then adding it to a repository. Is there another way I should be adding instances?

When I looked at the code, I had a couple of questions:

1) There appeared to be a lot of round trips to the server. I believe adding my graph, which had about 3 triples in it, took about 20 round trips. Is there a reason a lot of this work can't be done in memory rather than in the database? Particularly for cases like deciding whether namespaces should be exported or not, it seemed like this operation could happen much less frequently, and work in Sesame could prevent it from ever going to the DB. Am I missing something?

2) There seemed to be a bunch of tables (RawTriples, etc.) that temp work was done in, but they didn't seem to be created as temp tables. If I had two Sesame clients going against the same DB, would this be safe? Is there something I'm missing here?

Any suggestions for how I can speed up performance?

Thanks!
--Will
Mar 30, 2005 9:05:16 AM
jeen Sesame Addict The Netherlands Joined: Jan 23, 2004 Posts: 1091 Status: Offline |
Yes, unfortunately, performance on PostgreSQL is quite poor. Although the RDBMS SAIL is generic in principle, most of our optimizations are developed for MySQL, which makes that database the best choice in combination with Sesame.
Ouch. That sounds a bit excessive.
Typically, uploads in bulk go quicker in the long run. If you have the option of creating a larger part of the graph at the client side and adding it to the repository in larger chunks, performance will increase.
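The chunking idea above can be sketched roughly as follows. This is only an illustration of client-side batching, not Sesame's actual API: `Triple`, `TripleSink`, `BatchingUploader`, and `uploadInstances` are all hypothetical names, and `TripleSink.addAll` merely stands in for whatever call adds a graph to the repository. The sketch counts how many uploads are made for the same data at different chunk sizes.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchingSketch {

    /** Plain subject/predicate/object holder; a stand-in for an RDF statement. */
    static final class Triple {
        final String subject, predicate, object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    /** Hypothetical stand-in for the repository; each call models one expensive upload. */
    interface TripleSink {
        void addAll(List<Triple> batch);
    }

    /** Buffers triples on the client side and hands them to the sink in chunks. */
    static final class BatchingUploader {
        private final TripleSink sink;
        private final int batchSize;
        private final List<Triple> buffer = new ArrayList<>();

        BatchingUploader(TripleSink sink, int batchSize) {
            if (batchSize < 1) {
                throw new IllegalArgumentException("batchSize must be >= 1");
            }
            this.sink = sink;
            this.batchSize = batchSize;
        }

        void add(Triple t) {
            buffer.add(t);
            if (buffer.size() >= batchSize) {
                flush();
            }
        }

        /** Sends whatever is buffered; does nothing if the buffer is empty. */
        void flush() {
            if (buffer.isEmpty()) {
                return;
            }
            sink.addAll(new ArrayList<>(buffer)); // copy so the sink owns its batch
            buffer.clear();
        }
    }

    /** Adds `count` instances of three triples each and returns the number of uploads made. */
    public static int uploadInstances(int count, int batchSize) {
        int[] uploads = {0};
        BatchingUploader uploader = new BatchingUploader(batch -> uploads[0]++, batchSize);
        for (int i = 0; i < count; i++) {
            String s = "http://example.org/instance/" + i; // made-up URIs for illustration
            uploader.add(new Triple(s, "rdf:type", "ex:Thing"));
            uploader.add(new Triple(s, "ex:name", "instance " + i));
            uploader.add(new Triple(s, "ex:index", Integer.toString(i)));
        }
        uploader.flush(); // send the final partial chunk, if any
        return uploads[0];
    }

    public static void main(String[] args) {
        System.out.println("one small graph per instance: " + uploadInstances(100, 3) + " uploads");
        System.out.println("one chunk of 300 triples: " + uploadInstances(100, 300) + " uploads");
    }
}
```

With 100 instances of three triples each, a chunk size of 3 results in 100 separate uploads, while a chunk size of 300 results in a single one. If each upload costs a roughly fixed number of database round trips, fewer and larger uploads should pay that overhead less often.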
The performance bottleneck for namespace exports is a known issue and has actually been fixed in the latest developer's release (see issue SES-140). As for round trips: the implementation of the Graph API (which I assume is what you are using here) is relatively young. I am quite sure a lot of optimizations are possible, but we haven't had the time yet to fully investigate.
That is at least partially because the code was kept compatible with older versions of MySQL. We're planning to abandon that compatibility in future versions, because it is becoming a major drag on performance and on our development speed.
You can safely access one Sesame repository multiple times through the client API. However, it is not safe to define more than one SAIL on the same database.
Switching to MySQL would be a good step, if possible of course. You might also try out the latest Sesame developer's release, which contains the fix for the namespace export issue and might speed things up quite a bit.
----------------------------------------
Researcher at AFSG - Wageningen UR
Mar 30, 2005 9:09:59 AM
jeen Sesame Addict The Netherlands Joined: Jan 23, 2004 Posts: 1091 Status: Offline |
To make this a bit clearer: it is perfectly fine to have two or more independent client applications accessing the same repository at the same time through the repository API. The synchronization layer in the SAIL stack will take care of concurrency issues.

What won't work is having one MySQL database, say 'testdb', and defining more than one SAIL on this database (for example, by running two different Sesame servers in parallel). This will cause inconsistencies in the SAIL's caching mechanism.
----------------------------------------
Researcher at AFSG - Wageningen UR