
Forum closing down
This forum will be closing down due to extensive spamming activities. As a first step, registration of new members has been disabled. Existing members will be able to use the forum for now, but please consider using the sesame-general mailing list instead.


Total posts in this thread: 8
Sep 24, 2004 12:26:37 AM

Hugo
Member



Joined: Sep 13, 2004
Posts: 5
Status: Offline

UTF-8 (Bulgarian language)

My problem is this: I have a furniture dictionary (which also does translations)
with seven languages (Portuguese, English, French, Italian, German,
Spanish and... Bulgarian). All of these languages work fine at query
time except Bulgarian, which Sesame doesn't seem to recognize.

All the information in the RDF file is in UTF-8 format, but that
doesn't seem to be enough. When I query for Bulgarian-related
information, the result comes back empty, even though the information is in fact there
in the RDF file uploaded to the Sesame repository.

A query that should work... but doesn't:

select X from {<!http://www.dictionary.com/bg/Корниз>}
<!http://www.dictionary.com/description> {Y}, {Y}
<!http://www.dictionary.com/term/detail> {X}

Is it possible to attach a file here, with a sample of my RDF file?
Any help would be gold...

Thanks a lot, Hugo.
----------------------------------------
Attachment rdfInUTFSample.rdf (38605 bytes) (Download Count: 71)

Sep 24, 2004 7:49:09 AM

jeen
Sesame Addict

The Netherlands
Joined: Jan 23, 2004
Posts: 1091
Status: Offline
Re: UTF-8 (Bulgarian language)

The problem is probably that the SeRQL engine does not accept non-ASCII chars in variable names or qnames. This is a known bug (see SES-78).

Your case seems slightly different in that you use these chars in URIs, though. Do you get an error when you try this query, or just an empty result?

Thanks for the dataset, that will be useful for testing.
----------------------------------------
Researcher at AFSG - Wageningen UR
Sep 24, 2004 9:23:38 AM

arjohn
OpenRDF project lead

The Netherlands
Joined: Jan 23, 2004
Posts: 1289
Status: Offline
Re: UTF-8 (Bulgarian language)

The problem is with the query: SeRQL uses the N-Triples notation for URIs, which requires characters outside the US-ASCII range to be encoded. You can use the method org.openrdf.rio.ntriples.NTriplesUtil.toNTriplesString(...) to encode the URIs.
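To illustrate the kind of encoding meant here, the following is a minimal Python sketch of N-Triples style escaping, in which every character outside US-ASCII is written as `\uXXXX` (or `\UXXXXXXXX` above U+FFFF). It is not Sesame's `NTriplesUtil`; the function name `ntriples_escape` is hypothetical, and escapes for control characters and quotes are left out for brevity.

```python
def ntriples_escape(text: str) -> str:
    # Hypothetical helper: escape non-ASCII characters N-Triples style.
    # Not part of Sesame; control-character and quote escapes are omitted.
    out = []
    for ch in text:
        code = ord(ch)
        if code < 0x80:
            out.append(ch)                # US-ASCII passes through unchanged
        elif code <= 0xFFFF:
            out.append("\\u%04X" % code)  # 4-digit escape for the BMP
        else:
            out.append("\\U%08X" % code)  # 8-digit escape above U+FFFF
    return "".join(out)


escaped = ntriples_escape("http://www.dictionary.com/bg/Корниз")
print(escaped)
```

Applied to the URI from the first post, only the Cyrillic part of the path changes; the ASCII prefix is left as it is.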

On a related note: maybe strict adherence to the N-Triples notation is not the ideal solution for SeRQL. I guess most people would like to write queries using their full character set.

Arjohn
----------------------------------------
Arjohn Kampman, OpenRDF project lead, Aduna
Sep 24, 2004 2:33:38 PM

Hugo
Member



Joined: Sep 13, 2004
Posts: 5
Status: Offline

Re: UTF-8 (Bulgarian language)

thanks for replying...

I get an empty response...

Thanks, Hugo.
Sep 24, 2004 2:39:40 PM

Hugo
Member



Joined: Sep 13, 2004
Posts: 5
Status: Offline

Re: UTF-8 (Bulgarian language)

Thanks for replying.

Does that mean that I can't query Sesame directly through the online database page, only through the API?

Can you give me an example of that method, applied to the query that I put in my previous post?

Once again, thanks a lot for your help.

Hugo.
Sep 24, 2004 10:50:21 PM

arjohn
OpenRDF project lead

The Netherlands
Joined: Jan 23, 2004
Posts: 1289
Status: Offline
Re: UTF-8 (Bulgarian language)

Hmm, I had a second look and the SeRQL query parser doesn't seem to be using the N-Triples notation for URIs. I guess I should stop posting on the forum before my first cup of coffee;-)

So, the problem must be somewhere else. The Latin-1 characters in your example data appear to be working normally; it's the non-Latin-1 characters that are problematic. Probably, the encoding and decoding that is performed in the HTTP POST call messes up multi-byte characters. I'll have to take a closer look at this.
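One way such a round trip can lose characters is sketched below in Python. It assumes, without confirmation from this thread, that the form parameters are percent-encoded with a Latin-1 charset and that characters the charset cannot represent are replaced with "?". The helper name `form_round_trip` is hypothetical.

```python
from urllib.parse import quote, unquote


def form_round_trip(text: str, charset: str) -> str:
    # Client side: percent-encode the parameter value in the given charset,
    # substituting "?" for anything the charset cannot represent.
    encoded = quote(text, safe="", encoding=charset, errors="replace")
    # Server side: decode the value again using the same charset.
    return unquote(encoded, encoding=charset, errors="replace")


latin1_ok = form_round_trip("café", "latin-1")       # Latin-1 text survives
cyrillic_lost = form_round_trip("Корниз", "latin-1")  # Cyrillic text does not
cyrillic_utf8 = form_round_trip("Корниз", "utf-8")    # UTF-8 keeps it intact

print(latin1_ok, cyrillic_lost, cyrillic_utf8)
```

Under that assumption the behaviour matches the symptom described: Latin-1 text passes through unchanged while Cyrillic text is reduced to placeholder characters, so a URI containing it can no longer match anything in the repository.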

Arjohn
----------------------------------------
Arjohn Kampman, OpenRDF project lead, Aduna
Oct 1, 2004 2:20:17 PM

arjohn
OpenRDF project lead

The Netherlands
Joined: Jan 23, 2004
Posts: 1289
Status: Offline
Re: UTF-8 (Bulgarian language)

Fixed in CVS. All HTTP POST requests have been changed from using www-urlencoded parameters to multipart/file-data and the SeRQL query parser can now handle Unicode characters.
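For readers unfamiliar with multipart requests, here is a rough Python sketch of how a client might build such a body with an explicit UTF-8 charset on each part. It assumes the standard `multipart/form-data` media type; the helper name `build_multipart` and the field name `query` are hypothetical and not taken from Sesame's actual protocol.

```python
import uuid


def build_multipart(fields: dict, charset: str = "utf-8"):
    # Hypothetical helper: returns (Content-Type header value, body bytes).
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append(("--" + boundary).encode("ascii"))
        lines.append(
            ('Content-Disposition: form-data; name="%s"' % name).encode("ascii")
        )
        # Declare the charset per part so the receiver does not have to guess.
        lines.append(("Content-Type: text/plain; charset=" + charset).encode("ascii"))
        lines.append(b"")
        lines.append(value.encode(charset))
    lines.append(("--" + boundary + "--").encode("ascii"))
    lines.append(b"")
    body = b"\r\n".join(lines)
    return "multipart/form-data; boundary=" + boundary, body


# "query" is an assumed parameter name, used only for illustration.
content_type, body = build_multipart(
    {"query": "select X from {<!http://www.dictionary.com/bg/Корниз>} ..."}
)
print(content_type)
```

Because each part carries its own charset declaration and the value is sent as raw bytes, no percent-encoding step is involved that could substitute or misread multi-byte characters.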

Arjohn

PS: the only part that has not been fixed yet is the RDF explorer from the web interface. It will be fixed asap.
----------------------------------------
Arjohn Kampman, OpenRDF project lead, Aduna
----------------------------------------
[Edit 1 times, last edit by arjohn at Oct 1, 2004 2:26:31 PM]
Nov 24, 2007 6:43:15 PM

slon
Member



Joined: Nov 23, 2007
Posts: 7
Status: Offline

Re: UTF-8 (Bulgarian language)

Hi,

I am trying to query the database for an item with id "itemId" that has an attribute named "attributeName" with value "value", e.g. as in the statement:

itemId attributeName "value"^^xsd:string .

When value contains non-ASCII characters this statement is never found. The methods I have attempted were:

(1) Sesame workbench query window
(2) POST and GET queries from custom-written client.

In (2), if I set the Content-Type header to multipart/form-data (not "file-data"!) I get:

exceptionClass=org.openrdf.http.server.ClientRequestException
hasParseInfo=false
exceptionMessage=Unsupported content type\: multipart/form-data

Please help.

Thanks.