Total posts in this thread: 9
Nov 8, 2004 9:01:33 PM

acyment
Regular
Argentina
Joined: Sep 1, 2004
Posts: 33
Graph transaction bottleneck??

A new performance question for you guys... In the new custom memory inferencer, rules (and axioms) are specified using CONSTRUCT-FROM queries. I've been profiling it with my app and found a severe bottleneck when evaluating these queries.
The code that is run for every rule goes as follows:

QueryResultsGraphBuilder graphBuilder = new QueryResultsGraphBuilder();
query.evaluate(_repository, graphBuilder);
StatementIterator statementIterator = graphBuilder.getGraph().getStatements();


Well, the evaluate() line takes ages to execute, with the majority of the time spent committing the transaction that runs every time the graph builder listener has to add a node to the resulting graph. Finer-grained analysis shows RdfSource::_updateExportedNamespaces doing huge loops and eating most of the processor time.
Do you think this could be avoided by having the graph builder listener start a transaction when the whole listening process begins and committing it when it ends?

cheers,
Alan
----------------------------------------
"Always write code as if the maintenance programmer were an axe murderer who knows where you live."
Nov 9, 2004 10:57:20 AM

arjohn
OpenRDF project lead
The Netherlands
Joined: Jan 23, 2004
Posts: 1289
Re: Graph transaction bottleneck??

 
> Do you think this could be avoided by having the graph builder listener start a transaction when the whole listening process begins and committing it when it ends?

Good plan. I just re-implemented the graph builder class to use a single transaction. As the Graph API doesn't have any transaction features (yet), I had to change the implementation to use the Sail API directly. The modified class is attached to this post. Would you be willing to test it?
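
The difference between the two approaches can be illustrated with a minimal, self-contained sketch. This is not the attached class and does not use the Sesame Sail API: CountingStore, PerStatementBuilder and SingleTransactionBuilder are hypothetical names invented for the illustration, and statements are plain strings. It only shows how the number of commits changes when the transaction spans the whole result instead of a single statement.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a transactional store; NOT the Sesame Sail API.
class CountingStore {
    private final List<String> committed = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();
    private boolean inTransaction = false;
    int commitCount = 0;

    void startTransaction() {
        if (inTransaction) throw new IllegalStateException("transaction already started");
        inTransaction = true;
    }

    void addStatement(String statement) {
        if (!inTransaction) throw new IllegalStateException("no transaction started");
        pending.add(statement);
    }

    void commitTransaction() {
        if (!inTransaction) throw new IllegalStateException("no transaction started");
        committed.addAll(pending);
        pending.clear();
        inTransaction = false;
        commitCount++; // a real store may do expensive bookkeeping on every commit
    }

    int size() {
        return committed.size();
    }
}

// Commits once per reported statement: the slow pattern described in the first post.
class PerStatementBuilder {
    private final CountingStore store;

    PerStatementBuilder(CountingStore store) { this.store = store; }

    void startResult() { }

    void statement(String s) {
        store.startTransaction();
        store.addStatement(s);
        store.commitTransaction();
    }

    void endResult() { }
}

// Opens one transaction when the result starts and commits it when the result ends.
class SingleTransactionBuilder {
    private final CountingStore store;

    SingleTransactionBuilder(CountingStore store) { this.store = store; }

    void startResult() { store.startTransaction(); }

    void statement(String s) { store.addStatement(s); }

    void endResult() { store.commitTransaction(); }
}

class TransactionBatchingDemo {
    public static void main(String[] args) {
        CountingStore slow = new CountingStore();
        PerStatementBuilder perStatement = new PerStatementBuilder(slow);
        perStatement.startResult();
        for (int i = 0; i < 1000; i++) perStatement.statement("statement " + i);
        perStatement.endResult();

        CountingStore fast = new CountingStore();
        SingleTransactionBuilder single = new SingleTransactionBuilder(fast);
        single.startResult();
        for (int i = 0; i < 1000; i++) single.statement("statement " + i);
        single.endResult();

        System.out.println("per-statement commits: " + slow.commitCount);
        System.out.println("single-transaction commits: " + fast.commitCount);
    }
}
```

Both builders end up with the same statements in the store; only the number of commits differs, which matters when each commit triggers per-transaction work such as the namespace update mentioned above.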

Arjohn
----------------------------------------
Attachment: QueryResultsGraphBuilder.java (3794 bytes): revised implementation of QueryResultsGraphBuilder.java

----------------------------------------
Arjohn Kampman, OpenRDF project lead, Aduna
Nov 10, 2004 3:56:27 AM

acyment
Re: Graph transaction bottleneck??

Damn... NullPointerException when calling _reportNamespaces(). Shouldn't CfwQuery::evaluate call startGraphQueryResult() as soon as evaluation begins? I reckon that is where initialization takes place...

cheers,
Alan
Nov 10, 2004 8:31:02 AM

arjohn
Re: Graph transaction bottleneck??

 
> Damn... NullPointerException when calling _reportNamespaces().

My bad. Second attempt attached.
 
> Shouldn't CfwQuery::evaluate call startGraphQueryResult() as soon as evaluation begins? I reckon that is where initialization takes place...

Not necessarily. Quoting from the javadoc for GraphQueryResultListener.namespace(...) :
> Namespaces will be reported before the start of the query result as much as possible to accommodate listeners that need these mappings beforehand (like an RDF/XML document writer).
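
A listener therefore cannot rely on the start callback for its initialization. A minimal sketch of a listener that tolerates namespaces arriving first is shown below. LazyInitListener is a hypothetical class written for this illustration; its method names only loosely mirror the callbacks named in this thread, and it does not implement the real GraphQueryResultListener interface.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical listener; method names loosely follow the callbacks mentioned in the thread.
class LazyInitListener {
    private Map<String, String> namespaces; // created on first use, not in the start callback
    private List<String> statements;
    private boolean started = false;

    // Safe to call from any callback; creates the state exactly once.
    private void ensureInitialized() {
        if (namespaces == null) {
            namespaces = new LinkedHashMap<>();
            statements = new ArrayList<>();
        }
    }

    // May be called BEFORE startGraphQueryResult().
    void namespace(String prefix, String name) {
        ensureInitialized();
        namespaces.put(prefix, name);
    }

    void startGraphQueryResult() {
        ensureInitialized();
        started = true;
    }

    void statement(String s) {
        if (!started) throw new IllegalStateException("result not started");
        statements.add(s);
    }

    void endGraphQueryResult() {
        started = false;
    }

    int namespaceCount() {
        ensureInitialized();
        return namespaces.size();
    }

    int statementCount() {
        ensureInitialized();
        return statements.size();
    }
}

class LazyInitListenerDemo {
    public static void main(String[] args) {
        LazyInitListener listener = new LazyInitListener();
        // Namespace reported first, as the javadoc allows.
        listener.namespace("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
        listener.startGraphQueryResult();
        listener.statement("a b c");
        listener.endGraphQueryResult();
        System.out.println(listener.namespaceCount() + " namespace(s), "
                + listener.statementCount() + " statement(s)");
    }
}
```

Initializing the fields in the constructor would work just as well; the point is only that no callback assumes another one has already run.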


Arjohn
----------------------------------------
Attachment: QueryResultsGraphBuilder.java (3822 bytes): second try at the new QueryResultsGraphBuilder implementation

Nov 11, 2004 12:33:54 PM

acyment
Re: Graph transaction bottleneck??

It worked flawlessly!

Now, for a little more profiling data...

* Query parsing: I don't know why, but SimpleCharStream::FillBuff takes ages to run... (inner PS: have you used a parser generator such as ANTLR for this job? If not, it might be a nice addition to Sesame. I reckon there are some pretty fast algorithms out there.)
* Can you think of any way of pooling instances of SeRQLParser? A new instance must be built for every string to be parsed, and that takes valuable time...
* The RdfRepository constructor always creates a Timer object to be used in syncing. Is this always necessary? It takes a lot of time when tiny ad-hoc repositories are used when building a query result...

I guess that's it for now

cheers,
Alan
Nov 11, 2004 2:04:06 PM

arjohn
Re: Graph transaction bottleneck??

 
> It worked flawlessly!

Great. Just checked it in to the 1.1 CVS branch.

 
> Now, for a little more profiling data...
>
> * Query parsing: I don't know why, but SimpleCharStream::FillBuff takes ages to run... (inner PS: have you used a parser generator such as ANTLR for this job? If not, it might be a nice addition to Sesame. I reckon there are some pretty fast algorithms out there.)

The current SeRQL parser is generated by JavaCC (the source file is serql.jj). I don't know how it compares to ANTLR. Do you think ANTLR generates faster parsers?
 
> * Can you think of any way of pooling instances of SeRQLParser? A new instance must be built for every string to be parsed, and that takes valuable time...

You might consider using SerqlParser.ReInit(Reader). Note that this class is not thread-safe so you'll have to do some synchronization here.
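
A minimal sketch of that reuse pattern follows. JavaCC-generated parsers do expose a ReInit(java.io.Reader) method, but everything else here is an assumption made for the illustration: ToyParser is a hypothetical stand-in whose parse() merely counts whitespace-separated tokens, and SharedParser is an invented wrapper, not part of Sesame.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical stand-in for a JavaCC-generated parser.
// Only the ReInit(Reader) method mirrors the generated API.
class ToyParser {
    private Reader input;

    ToyParser(Reader input) {
        this.input = input;
    }

    // Points the existing parser instance at new input instead of building a new parser.
    void ReInit(Reader input) {
        this.input = input;
    }

    // "Parses" the input by counting whitespace-separated tokens.
    int parse() throws IOException {
        StringBuilder text = new StringBuilder();
        int c;
        while ((c = input.read()) != -1) {
            text.append((char) c);
        }
        String trimmed = text.toString().trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }
}

// Reuses one parser instance; the synchronized method keeps access single-threaded.
class SharedParser {
    private ToyParser parser; // guarded by "this"
    private int parsersCreated = 0;

    synchronized int parse(String query) throws IOException {
        Reader reader = new StringReader(query);
        if (parser == null) {
            parser = new ToyParser(reader);
            parsersCreated++;
        } else {
            parser.ReInit(reader);
        }
        return parser.parse();
    }

    synchronized int getParsersCreated() {
        return parsersCreated;
    }
}

class SharedParserDemo {
    public static void main(String[] args) throws IOException {
        SharedParser shared = new SharedParser();
        System.out.println(shared.parse("SELECT X FROM {X} P {Y}"));
        System.out.println(shared.parse("SELECT Y FROM {X} P {Y}"));
        System.out.println("parsers created: " + shared.getParsersCreated());
    }
}
```

A single synchronized instance serializes all parsing. If several threads parse at the same time, one parser per thread (for example via ThreadLocal) would avoid that contention at the cost of more instances.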
 
> * The RdfRepository constructor always creates a Timer object to be used in syncing. Is this always necessary? It takes a lot of time when tiny ad-hoc repositories are used when building a query result...

This isn't necessary. I'll have a "fix" checked in to the 1.1 CVS branch shortly. You can simply remove the Timer construction from the RdfRepository constructor.
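
One general way to avoid paying for a timer that is never used is to create it lazily, on the first request to schedule a sync. The sketch below illustrates that idea only; it is not the fix that went into CVS, and LazyTimerRepository with its methods is a hypothetical class, not Sesame's RdfRepository.

```java
import java.util.Timer;
import java.util.TimerTask;

// Hypothetical repository; NOT Sesame's RdfRepository.
class LazyTimerRepository {
    private Timer syncTimer; // stays null until a sync is actually scheduled
    private int syncCount = 0;

    // The constructor does no timer work, so short-lived instances stay cheap.
    LazyTimerRepository() { }

    synchronized void scheduleSync(long delayMillis) {
        if (syncTimer == null) {
            syncTimer = new Timer("repository-sync", true); // daemon thread
        }
        syncTimer.schedule(new TimerTask() {
            @Override
            public void run() {
                sync();
            }
        }, delayMillis);
    }

    synchronized void sync() {
        syncCount++; // placeholder for writing data to persistent storage
    }

    synchronized boolean hasTimer() {
        return syncTimer != null;
    }

    synchronized int getSyncCount() {
        return syncCount;
    }

    // Cancels the timer thread if one was ever created.
    synchronized void shutDown() {
        if (syncTimer != null) {
            syncTimer.cancel();
            syncTimer = null;
        }
    }
}

class LazyTimerDemo {
    public static void main(String[] args) {
        LazyTimerRepository adHoc = new LazyTimerRepository();
        System.out.println("timer after construction: " + adHoc.hasTimer());
        adHoc.scheduleSync(50);
        System.out.println("timer after scheduling: " + adHoc.hasTimer());
        adHoc.shutDown();
    }
}
```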

 
> I guess that's it for now
>
> cheers,
> Alan

Thanks a lot for the feedback.

Arjohn
Nov 11, 2004 10:18:15 PM

acyment
Re: Graph transaction bottleneck??

 

> The current SeRQL parser is generated by JavaCC (the source file is serql.jj). I don't know how it compares to ANTLR. Do you think ANTLR generates faster parsers?

I guess benchmarking with the SeRQL grammar is the only way to know, but from what I've read, JFlex-generated parsers are both faster and easier to read.
Nov 12, 2004 9:29:18 AM

arjohn
Re: Graph transaction bottleneck??

 
> I guess benchmarking with the SeRQL grammar is the only way to know, but from what I've read, JFlex-generated parsers are both faster and easier to read.

Interesting link, but JFlex appears to be a lexer only. It needs to be combined with a parser generator like CUP to be useful. My experience with JavaCC has been that having a separate lexer is not ideal. I would rather use a context-aware parser generator that can, for example, distinguish between a simple string token and a full literal token.

Arjohn
Nov 12, 2004 10:46:44 PM

acyment
Re: Graph transaction bottleneck??

Maybe we're best off doing a more in-depth profiling of that damned method...