
Forum has been closed down
This forum has been closed down due to extensive spamming activities. Please use the mailing list instead.


Total posts in this thread: 21
Jul 20, 2004 3:47:21 PM

arjohn
OpenRDF project lead

The Netherlands
Joined: Jan 23, 2004
Posts: 1289
Status: Offline
A new Sail for Sesame

As you might have read in one of the other threads on this forum, I'm currently working on a new Sail for Sesame. The goals for this Sail implementation are to be as scalable as the current RDBMS Sail, but to be much faster, without requiring additional software (read: databases).

I just finished a first design document that I would like to share with you (attached to this post). Feedback on this document and/or help implementing the stuff is highly appreciated. Feel free to post any feedback to this thread.

Cheers,

Arjohn

Warning: highly technical document attached! ;-)
----------------------------------------
Attachment new sail.pdf (151742 bytes) (Download Count: 218)

----------------------------------------
Arjohn Kampman, OpenRDF project lead, Aduna
Jul 29, 2004 3:31:25 PM

arjohn
OpenRDF project lead

The Netherlands
Joined: Jan 23, 2004
Posts: 1289
Status: Offline
Re: A new Sail for Sesame

Status update: value storage has been implemented and its performance is looking good. I'm currently working on the triple storage using a B-Tree. This differs from what is described in the document; I expect B-Trees to perform better for updates, with similar query performance.
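
Editorial illustration: the post does not say how the B-Tree keys are laid out, so the following is only a minimal sketch of the general idea of a sorted triple index. It assumes triples are stored as three long IDs ordered by subject, predicate, object, and it uses java.util.TreeSet (an in-memory sorted set) as a stand-in for an on-disk B-Tree. The class and method names are hypothetical and are not taken from Sesame.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

/** In-memory stand-in for a sorted (subject, predicate, object) index. */
class SpoIndexSketch {
    // Orders triples by subject, then predicate, then object.
    private static final Comparator<long[]> SPO = (a, b) -> {
        for (int i = 0; i < 3; i++) {
            int c = Long.compare(a[i], b[i]);
            if (c != 0) {
                return c;
            }
        }
        return 0;
    };

    private final TreeSet<long[]> triples = new TreeSet<>(SPO);

    /** Adds a triple; returns false if it was already present. */
    boolean add(long s, long p, long o) {
        return triples.add(new long[] {s, p, o});
    }

    /** Removes a triple; returns false if it was not present. */
    boolean remove(long s, long p, long o) {
        return triples.remove(new long[] {s, p, o});
    }

    /** Finds all triples with the given subject via a range scan over the sorted keys. */
    List<long[]> bySubject(long s) {
        long[] low = {s, Long.MIN_VALUE, Long.MIN_VALUE};
        long[] high = {s, Long.MAX_VALUE, Long.MAX_VALUE};
        return new ArrayList<>(triples.subSet(low, true, high, true));
    }
}
```

Because the keys are kept sorted, updates touch only the affected position in the ordering, and a lookup by subject becomes a contiguous range scan. A real on-disk index would presumably keep several orderings to serve other query patterns.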

Arjohn
----------------------------------------
Arjohn Kampman, OpenRDF project lead, Aduna
Jul 30, 2004 4:56:12 PM

bartman
Sesame Addict


Joined: May 10, 2004
Posts: 113
Status: Offline
Re: A new Sail for Sesame

This draft looks reasonable to me. I haven't had the time to consider it in great detail, but I think it would be helpful to specify which operations the Sail needs on its data structures (sequential scan, element lookup, element insert, element update, ...).

On this basis one could judge the alternatives for each data structure according to the access operations and their importance. Each data structure is likely to have strengths and weaknesses for the different operations. To choose the right trade-offs, it would be necessary to know how important the operations are relative to each other.
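
Editorial illustration: one way to make such an operation list concrete is to write it down as an interface and then measure candidate data structures against it. The sketch below is hypothetical; the interface name, method names, and the list-backed implementation are assumptions made for illustration and do not come from the design document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical operation set that a candidate data structure would have to support. */
interface RecordStore<T> {
    int insert(T element);                 // element insert; returns the new position
    T lookup(int position);                // element lookup
    void update(int position, T element);  // element update
    void scan(Consumer<T> visitor);        // sequential scan in storage order
}

/** Simplest possible implementation, useful as a baseline when comparing alternatives. */
class ListRecordStore<T> implements RecordStore<T> {
    private final List<T> records = new ArrayList<>();

    public int insert(T element) {
        records.add(element);
        return records.size() - 1;
    }

    public T lookup(int position) {
        return records.get(position);
    }

    public void update(int position, T element) {
        records.set(position, element);
    }

    public void scan(Consumer<T> visitor) {
        records.forEach(visitor);
    }
}
```

With the operations fixed like this, each alternative implementation can be benchmarked per method, and the results weighted by how often the Sail is expected to call each one.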

What do you think?

I'm on vacation now, but will get back to this after I return. This is very interesting and important for my application.

Greetings,
Bartman
Aug 2, 2004 10:53:57 PM

newmana
Visitor



Joined: Aug 2, 2004
Posts: 2
Status: Offline

Re: A new Sail for Sesame

I see that you've looked at Kowari and decided it was too big for your needs. I just thought I'd let you know that the triple store in Kowari is actually quite small. Kowari lite includes Jetty, Jena, SOAP and all sorts of supporting libraries. Once you take away all of that you get to about 2-3 MB of jars. If you are only interested in a triple store (just storing longs), not tied to RDF, then it's well under 2MB.

The blog of Paul Gearon (a Tucana/Kowari developer) also has some interesting ideas about a triple store. He was using JRDF, but I guess there's no reason why the underlying store should be tied to Sesame or JRDF if it is storing longs:
http://gearon.blogspot.com/2004_05_01_gearon_archive.html (being a blog it makes sense if you read from the bottom up).
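
Editorial illustration: "just storing longs" presumably means that RDF values are mapped to numeric IDs once, so that the triple store itself only handles fixed-width numbers. The sketch below shows that idea under stated assumptions: values are plain strings, IDs are assigned sequentially starting at 1 (0 is left unused so it can act as a "no value" marker), and everything lives in memory. The class is hypothetical and is not Kowari, JRDF, or Sesame code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Maps RDF values (here simply strings) to long IDs and back. */
class ValueDictionarySketch {
    private final Map<String, Long> idsByValue = new HashMap<>();
    private final List<String> valuesById = new ArrayList<>();

    /** Returns the existing ID for a value, or assigns the next free one (IDs start at 1). */
    long idOf(String value) {
        Long id = idsByValue.get(value);
        if (id == null) {
            valuesById.add(value);
            id = (long) valuesById.size();
            idsByValue.put(value, id);
        }
        return id;
    }

    /** Looks up the value that was assigned the given ID. */
    String valueOf(long id) {
        return valuesById.get((int) (id - 1));
    }

    /** Encodes a statement as three longs, which is all the triple store needs to see. */
    long[] encode(String subject, String predicate, String object) {
        return new long[] {idOf(subject), idOf(predicate), idOf(object)};
    }
}
```

Since the triple store only sees long[] records, it does not need to know whether the IDs came from Sesame, JRDF, or anything else.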
Aug 3, 2004 2:16:07 PM

arjohn
OpenRDF project lead

The Netherlands
Joined: Jan 23, 2004
Posts: 1289
Status: Offline
Re: A new Sail for Sesame

Hi Andrew,

Thanks for the info and the pointer. I'll check it out as soon as I can find some time.

Arjohn
----------------------------------------
Arjohn Kampman, OpenRDF project lead, Aduna
Aug 5, 2004 7:52:29 PM

askutt
Member



Joined: Aug 2, 2004
Posts: 7
Status: Offline

Re: A new Sail for Sesame

I've read your document and have several considerations for you:
You say:

> and the RDBMS Sail is both complicated to install and too slow in aspects like adding and removing statements.

This simply isn't true. I've done tests with undertuned MySQL and been able to get 1650 statements/second sustained for 50 minutes doing the store into the rawtriples table. While that's not lightning fast, it is faster than everyone else's product I've evaluated thus far.

I think you need to consider other changes before simply abandoning your RDBMS, for this simple reason: provided your DB schema has been optimized, the RDBMS will always be more efficient at storing and retrieving data than you will be. While I can understand the desire for a self-contained Sail (and I'm not trying to discourage it), I think you shouldn't consider it an end-all, and should be willing to extend and do further work with what you already have.

You also say:

> The Sail should be able to handle millions (O(10^6) - O(10^7)) of RDF statements.

I'm going to say point-blank I'm interested in storing and accessing several powers of 10 more statements, both as raw triples (i.e., RDFRepository style) and as schemed and inferenced data (i.e., RDFSchemaRepository style). While I'm not really interested in writing exact numbers on a public forum, I will say that 5 million RDF triples is a small amount of data for me.

Some notes on the technical implementation. I've not quoted exact sections; hopefully you can figure out what I'm referring to. If not, reply and I'll be more than happy to expand my rationale and detail exactly what I'm referring to:
  • You're going to need to be able to support 64-bit IDs both for the URIs/bNodes/literals and for the triples themselves.
  • Representing a deleted row as all zeros is OK for performance, but you need a mechanism to remove deleted rows, so the data file can be shrunk back. Someone who deletes 100 thousand rows from the file would probably like the capability to reclaim that space at some point.
  • A query caching mechanism within the Sail itself to store frequently occurring queries would not be a bad thing to implement.
  • If you're not using one already, a red-black tree may provide better performance over a standard balanced B-tree for the triple indexes.
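
Editorial illustration of the second point above: if a deleted row is stored as all zeros, reclaiming space amounts to copying the live rows forward and truncating. The sketch below assumes fixed-width rows of three 64-bit IDs held in a long[] array standing in for the data file; the class and method names are hypothetical, and the actual file layout in the design document may differ.

```java
import java.util.Arrays;

/** Fixed-width triple records where an all-zero record means "deleted". */
class TripleFileCompactionSketch {
    static final int WIDTH = 3; // subject, predicate and object ID, 64 bits each

    /** Marks a record as deleted by zeroing it in place (cheap, no data is moved). */
    static void delete(long[] data, int record) {
        Arrays.fill(data, record * WIDTH, (record + 1) * WIDTH, 0L);
    }

    /** A record counts as deleted when all of its IDs are zero. */
    static boolean isDeleted(long[] data, int record) {
        int base = record * WIDTH;
        return data[base] == 0 && data[base + 1] == 0 && data[base + 2] == 0;
    }

    /** Returns a copy holding only the live records, in their original order. */
    static long[] compact(long[] data) {
        long[] out = new long[data.length];
        int live = 0;
        for (int r = 0; r < data.length / WIDTH; r++) {
            if (!isDeleted(data, r)) {
                System.arraycopy(data, r * WIDTH, out, live * WIDTH, WIDTH);
                live++;
            }
        }
        return Arrays.copyOf(out, live * WIDTH);
    }
}
```

Note that compaction changes record positions, so if triples are addressed by row number elsewhere (for example from an index), those references would have to be rewritten in the same pass.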


While I support the idea of an embedded Sail store (in fact, I think it's a really cool and nifty idea :) ), I'd like to request that this development doesn't cause the abandonment of the existing RDBMS Sail. I realize my requirements are very unique in nature among your user community. However, I also think the RDBMS Sail can be extended to provide better performance than it currently does and to provide the ability to extend to any RDBMS on the planet.

I actually have some interest in doing this right now, provided we determine Sesame is adequate for our needs. If we do, then I would be more than willing to provide code to help extend and scale the RDBMS Sail, and my employer has already agreed to let us release any changes we make back to the mainline Sesame tree, provided you guys would accept it :) The changes we'd be making would include scalability changes (including both speed and volume) and changes to make it easier to support Sesame on other database platforms, and to take greater advantage of features available on those platforms (e.g., stored procedures on Oracle and Postgresql, large object support on Postgresql, JTA on platforms that support it, etc.).

I'm currently working on getting Sesame fully working with MS SQL Server; hopefully after that I'll be able to draft some information about initial changes I'd like to make to support the above.
Aug 6, 2004 10:00:43 AM

jeen
Sesame Addict

The Netherlands
Joined: Jan 23, 2004
Posts: 1091
Status: Offline
Re: A new Sail for Sesame

 
> I've read your document and have several considerations for you:
> You say:
>
> > and the RDBMS Sail is both complicated to install and too slow in aspects like adding and removing statements.
>
> This simply isn't true. I've done tests with undertuned MySQL and been able to get 1650 statements/second sustained for 50 minutes doing the store into the rawtriples table. While that's not lightning fast, it is faster than everyone else's product I've evaluated thus far.

I guess it's all a matter of perspective :) Suffice to say that for some use cases, this performance is not adequate. And the installation issue is a big one for us since it makes it non-trivial to use Sesame embedded in an application.
 

> I think you need to consider other changes before simply abandoning your RDBMS, for this simple reason: provided your DB schema has been optimized, the RDBMS will always be more efficient at storing and retrieving data than you will be. While I can understand the desire for a self-contained Sail (and I'm not trying to discourage it), I think you shouldn't consider it an end-all, and should be willing to extend and do further work with what you already have.

Let me just come straight out and say this: we are not considering abandoning the RDBMS Sail. It will continue to be developed and improved. Of course, any help would be most welcome :D

We don't consider Sesame a single tool, but rather a platform that offers several options for different use cases and scenarios. And we do recognize that a native storage system will not solve all our problems magically, but we have high hopes for it nonetheless...

 

> You also say:
>
> > The Sail should be able to handle millions (O(10^6) - O(10^7)) of RDF statements.
>
> I'm going to say point-blank I'm interested in storing and accessing several powers of 10 more statements, both as raw triples (i.e., RDFRepository style) and as schemed and inferenced data (i.e., RDFSchemaRepository style). While I'm not really interested in writing exact numbers on a public forum, I will say that 5 million RDF triples is a small amount of data for me.

One thing that might interest you then is that the Vrije Universiteit Amsterdam and the Technical University Eindhoven are cooperating to implement a distributed version of Sesame. There is a paper about it.
 

> (...) I actually have some interest in doing this right now, provided we determine Sesame is adequate for our needs. If we do, then I would be more than willing to provide code to help extend and scale the RDBMS Sail, and my employer has already agreed to let us release any changes we make back to the mainline Sesame tree, provided you guys would accept it :) The changes we'd be making would include scalability changes (including both speed and volume) and changes to make it easier to support Sesame on other database platforms, and to take greater advantage of features available on those platforms (e.g., stored procedures on Oracle and Postgresql, large object support on Postgresql, JTA on platforms that support it, etc.).

Well now, that is an offer we can't refuse of course. Your expertise and contributions will be most welcome, let us know if you need anything from us.
 

> I'm currently working on getting Sesame fully working with MS SQL Server; hopefully after that I'll be able to draft some information about initial changes I'd like to make to support the above.

That's great, we'd be more than happy to discuss it.
----------------------------------------
Researcher at AFSG - Wageningen UR
Aug 6, 2004 3:51:46 PM

askutt
Member



Joined: Aug 2, 2004
Posts: 7
Status: Offline

Re: A new Sail for Sesame

 

> I guess it's all a matter of perspective :) Suffice to say that for some use cases, this performance is not adequate. And the installation issue is a big one for us since it makes it non-trivial to use Sesame embedded in an application.

Understandable. My perspective is (obviously) quite the opposite of yours, as I'm mostly interested in high-end enterprise applications, where the installation issue matters less than the performance.

 
> Let me just come straight out and say this: we are not considering abandoning the RDBMS Sail. It will continue to be developed and improved. Of course, any help would be most welcome :D

While I'm not dedicated to this task, I have been tasked with looking at ways to optimize the RDBMS Sail. Whether I'll actually get to make any changes I can't say currently, but I think the outlook is good.

 
> We don't consider Sesame a single tool, but rather a platform that offers several options for different use cases and scenarios.

Which is excellent for everyone, not to mention just plain cool :)

 
> And we do recognize that a native storage system will not solve all our problems magically, but we have high hopes for it nonetheless...

Which is understandable. I realize that my requirements are probably rather unique and that an integrated native Sail would probably be more than adequate for most people building off the Sesame platform. While I don't think I'll be able to contribute to that portion of the project, I really hope you build something that meets your goals and needs.

 

> One thing that might interest you then is that the Vrije Universiteit Amsterdam and the Technical University Eindhoven are cooperating to implement a distributed version of Sesame. There is a paper about it.

Thanks for the paper, I'm going to read it today. Distributed query is one thing we may be looking at in the long-term, but it isn't an immediate goal of our project.

 
> Well now, that is an offer we can't refuse of course. Your expertise and contributions will be most welcome, let us know if you need anything from us.

Well, I certainly will be asking when I get to the actual code stage; sadly, I'm not quite there yet as we're still doing lots of planning, feasibility testing, etc. Hopefully I can get there within a few weeks or so.

 
 
> > I'm currently working on getting Sesame fully working with MS SQL Server; hopefully after that I'll be able to draft some information about initial changes I'd like to make to support the above.
>
> That's great, we'd be more than happy to discuss it.

Well, MS SQL Server is working now ;) As a procedural question, would you rather I open an issue in the issue tracker, or create a post in the bug reports forum? I have working patches and can verify functionality.
----------------------------------------
[Edit 1 times, last edit by askutt at Aug 6, 2004 3:53:24 PM]
Aug 9, 2004 8:58:23 AM

jeen
Sesame Addict

The Netherlands
Joined: Jan 23, 2004
Posts: 1091
Status: Offline
Re: A new Sail for Sesame

 
> (...) As a procedural question, would you rather I open an issue in the issue tracker, or create a post in the bug reports forum? I have working patches and can verify functionality.

An issue in the tracker would be great, that way we can schedule it for a release when appropriate, etc.
----------------------------------------
Researcher at AFSG - Wageningen UR
Sep 8, 2004 9:56:32 AM

bartman
Sesame Addict


Joined: May 10, 2004
Posts: 113
Status: Offline
Re: A new Sail for Sesame

Hi Arjohn,

How is it going with the new sail-Layer you started? What is your experience so far concerning the performance? As you know, I'm especially interested in better update performance of the new layer (due to the limits of the current memory-repository). I could do some early testing if you want.

Bartman
----------------------------------------
[Edit 1 times, last edit by bartman at Sep 8, 2004 10:14:55 AM]