


Total posts in this thread: 11
This topic has been viewed 5785 times and has 10 replies.
Nov 26, 2007 5:13:38 PM

slon
Member



Joined: Nov 23, 2007
Posts: 7
Status: Offline

Querying for UTF-8 string containing non-ASCII characters

For some reason I cannot query for a UTF-8 string containing non-ASCII characters, either from sesame-workbench or from a custom HTTP client. Example query:

PREFIX n:<url/n#>
SELECT ?x
WHERE {
?x n:attr "a_utf8_string"^^<http://www.w3.org/2001/XMLSchema#string> .
}

Could someone help please?

Many thanks.
Nov 26, 2007 7:31:39 PM

arjohn
OpenRDF project lead

The Netherlands
Joined: Jan 23, 2004
Posts: 1289
Status: Offline
Re: Querying for UTF-8 string containing non-ASCII characters

Can you try encoding the string using SeRQLUtil.encodeString(...) before sending it to the server?
----------------------------------------
Arjohn Kampman, OpenRDF project lead, Aduna
Nov 26, 2007 8:33:05 PM

slon

Re: Querying for UTF-8 string containing non-ASCII characters

This would only work if my client is written in Java, right? I saw that an analogous problem with strings in Bulgarian was solved this way. This is not an option for me as I don't speak Java.

Is there no way I could properly encode UTF-8 strings when communicating via the HTTP interface? I communicate with the Sesame server via HTTP or the Sesame workbench web interface at the moment.

I successfully uploaded the statements using the ADD method, as detailed in the HTTP interface section, and UTF-8 strings are correctly displayed when I browse the repo. However, querying the repository with SPARQL (POST or GET methods of HTTP), or directly using the query section of the openrdf-workbench web interface, worked only when the strings I search for are ASCII.

Many thanks in advance.
Nov 27, 2007 7:44:07 AM

arjohn
Re: Querying for UTF-8 string containing non-ASCII characters

 
This would only work if my client is written in Java, right? I saw that an analogous problem with strings in Bulgarian was solved this way. This is not an option for me as I don't speak Java.

Yes, but the algorithm is easy enough to translate to any programming language in just a few minutes. You can find it here.
 
Is there no way I could properly encode UTF-8 strings when communicating via the HTTP interface? I communicate with the Sesame server via HTTP or the Sesame workbench web interface at the moment.

I successfully uploaded the statements using the ADD method, as detailed in the HTTP interface section, and UTF-8 strings are correctly displayed when I browse the repo. However, querying the repository with SPARQL (POST or GET methods of HTTP), or directly using the query section of the openrdf-workbench web interface, worked only when the strings I search for are ASCII.

GET-requests are a no-go for anything that is non-ASCII; the HTTP protocol doesn't properly specify how such characters should be processed so the behaviour varies from one server implementation to another.
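
To illustrate the ambiguity described above, here is a small Python sketch (the sample word "café" is just an illustration, not taken from the thread). The same literal percent-encodes to different byte sequences depending on which character encoding the client picks, and nothing in a bare URL tells the server which one was used:

```python
from urllib.parse import quote

# Illustrative literal with one non-ASCII character (an assumption for the demo).
literal = "café"

# Same text, two different percent-encodings depending on the charset.
as_utf8 = quote(literal, encoding="utf-8")
as_latin1 = quote(literal, encoding="latin-1")

print(as_utf8)    # caf%C3%A9
print(as_latin1)  # caf%E9
```

A server that guesses the wrong charset will decode the query into a different string, so the literal no longer matches anything in the repository.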

It should be possible to HTTP-POST the query parameters to the server though. Make sure that the request's Content-Type header is "application/x-www-form-urlencoded" and that it specifies the correct character encoding, for example via the charset parameter. E.g.:
Content-Type: application/x-www-form-urlencoded; charset=utf-8

Then you also need to make sure that the URL-encoding algorithm that you use properly applies the UTF-8 encoding.
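
A minimal Python sketch of such a POST request, under some assumptions: the endpoint URL and repository ID are hypothetical placeholders, the Accept type is just one plausible choice, and the `query` / `queryLn` parameter names are the ones mentioned in this thread. The request is only built here, not sent:

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; replace with your own server and repository ID.
ENDPOINT = "http://localhost:8080/openrdf-sesame/repositories/myrepo"


def build_query_request(sparql: str) -> Request:
    """Build (but do not send) a form-encoded POST carrying the query."""
    # urlencode percent-encodes the UTF-8 bytes of each value, so the
    # resulting body is pure ASCII and safe to put on the wire.
    body = urlencode(
        {"query": sparql, "queryLn": "sparql"}, encoding="utf-8"
    ).encode("ascii")
    headers = {
        "Content-Type": "application/x-www-form-urlencoded; charset=utf-8",
        "Accept": "application/sparql-results+xml",
    }
    # The parameters travel in the request body, not in the URL.
    return Request(ENDPOINT, data=body, headers=headers, method="POST")


req = build_query_request('SELECT ?x WHERE { ?x ?p "Привет" }')
print(req.get_method())
print(req.data.decode("ascii"))
```

Passing `req` to `urllib.request.urlopen` would actually send it; whether the server accepts the charset parameter depends on the Sesame version, as discussed later in this thread.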

Hope this helps.
----------------------------------------
Arjohn Kampman, OpenRDF project lead, Aduna
Nov 27, 2007 9:53:35 AM

slon

Re: Querying for UTF-8 string containing non-ASCII characters

 
It should be possible to HTTP-POST the query parameters to the server though. Make sure that the request's Content-Type header is "application/x-www-form-urlencoded" and that it specifies the correct character encoding, for example via the charset parameter. E.g.:
Content-Type: application/x-www-form-urlencoded; charset=utf-8


Yes, I figured that POST is the way. This is what I get with RC1 though:
exceptionClass=org.openrdf.http.server.ClientRequestException
hasParseInfo=false
exceptionMessage=Unsupported content type\: application/x-www-form-urlencoded; charset\=utf-8
Server would accept 'application/x-www-form-urlencoded', but not 'application/x-www-form-urlencoded; charset=utf-8'. Same story with 'multipart/form-data'...
Nov 27, 2007 10:11:16 AM

slon

Re: Querying for UTF-8 string containing non-ASCII characters

 
Yes, but the algorithm is easy enough to translate to any programming language in just a few minutes. You can find it here.


Agreed.

I looked through the code for the encodeString function. It transpires that the string I would pass to that function would be returned unchanged because, while trying to isolate the problem, I reduced the string I am querying for down to just a few characters. It does not contain any backslashes, tab, newline, backspace or formfeed characters... I'll keep that in mind though, thanks.
Nov 27, 2007 10:38:47 AM

slon

Re: Querying for UTF-8 string containing non-ASCII characters

 
Then you also need to make sure that the URL-encoding algorithm that you use properly applies the UTF-8 encoding.

When communicating with Sesame by POST, do I pass the query as a QUERY_STRING parameter (query=my_query) and send an empty body, or do I send it as the POST body? It wasn't clear to me from the docs which one is the right way, but only QUERY_STRING worked for me.
If I pass the encoded QUERY_STRING to another CGI application, it decodes it correctly and is able to interpret the UTF-8 characters in it. I therefore assume the UTF-8 to URL encoding is handled correctly in my client.
Thank you for your help, I really hope I can sort this...
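
The decoding check described above can also be done locally, without a second CGI application. This is only a sketch in Python: `round_trips` is a hypothetical helper name, and it verifies the client-side encoding only, not how the Sesame server decodes the request:

```python
from urllib.parse import urlencode, parse_qs


def round_trips(sparql: str) -> bool:
    """Check that a query survives form-encoding and decoding as UTF-8."""
    # Encode the way a form POST body (or QUERY_STRING) would be encoded.
    encoded = urlencode({"query": sparql}, encoding="utf-8")
    # Decode strictly as UTF-8; invalid byte sequences would raise an error.
    decoded = parse_qs(encoded, encoding="utf-8", errors="strict")
    # The wire form must be pure ASCII and the value must come back intact.
    return encoded.isascii() and decoded["query"] == [sparql]


print(round_trips('SELECT ?x WHERE { ?x ?p "Привет" }'))
```

If this prints True, the remaining suspects are the request headers, where the parameters are placed (URL versus body), and the server's own decoding.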
Nov 27, 2007 1:22:54 PM

arjohn
Re: Querying for UTF-8 string containing non-ASCII characters

 
 
It should be possible to HTTP-POST the query parameters to the server though. Make sure that the request's Content-Type header is "application/x-www-form-urlencoded" and that it specifies the correct character encoding, for example via the charset parameter. E.g.:
Content-Type: application/x-www-form-urlencoded; charset=utf-8


Yes, I figured that POST is the way. This is what I get with RC1 though:
exceptionClass=org.openrdf.http.server.ClientRequestException
hasParseInfo=false
exceptionMessage=Unsupported content type\: application/x-www-form-urlencoded; charset\=utf-8
Server would accept 'application/x-www-form-urlencoded', but not 'application/x-www-form-urlencoded; charset=utf-8'. Same story with 'multipart/form-data'...

This sounds like a bug in the server. I just logged this as SES-493.
----------------------------------------
Arjohn Kampman, OpenRDF project lead, Aduna
Nov 27, 2007 1:26:50 PM

arjohn
Re: Querying for UTF-8 string containing non-ASCII characters

 
 
Yes, but the algorithm is easy enough to translate to any programming language in just a few minutes. You can find it here.


Agreed.

I looked through the code for the encodeString function. It transpires that the string I would pass to that function would be returned unchanged because, while trying to isolate the problem, I reduced the string I am querying for down to just a few characters. It does not contain any backslashes, tab, newline, backspace or formfeed characters... I'll keep that in mind though, thanks.

You're right. I meant to say NTriplesUtil.escapeString(...). Sorry for the confusion.
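
For non-Java clients, here is a rough Python sketch of N-Triples-style string escaping. It follows the escaping rules of the classic N-Triples format (non-ASCII characters become \uXXXX or \UXXXXXXXX) and is an assumption about what NTriplesUtil.escapeString does; it has not been checked against the Sesame source. The function name `escape_ntriples` is made up for this example:

```python
def escape_ntriples(text: str) -> str:
    """Escape a string so that only printable ASCII remains."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch == "\\":
            out.append("\\\\")
        elif ch == '"':
            out.append('\\"')
        elif ch == "\n":
            out.append("\\n")
        elif ch == "\r":
            out.append("\\r")
        elif ch == "\t":
            out.append("\\t")
        elif 0x20 <= cp <= 0x7E:
            # Printable ASCII passes through unchanged.
            out.append(ch)
        elif cp <= 0xFFFF:
            # Basic Multilingual Plane: four hex digits.
            out.append("\\u%04X" % cp)
        else:
            # Supplementary planes: eight hex digits.
            out.append("\\U%08X" % cp)
    return "".join(out)


print(escape_ntriples("café"))  # caf\u00E9
```

The escaped literal is plain ASCII, so it should pass through URL-encoding and the server unchanged regardless of charset handling, provided the query parser accepts these escape sequences.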
----------------------------------------
Arjohn Kampman, OpenRDF project lead, Aduna
Dec 2, 2007 2:15:26 PM

slon

Re: Querying for UTF-8 string containing non-ASCII characters

Installed RC2. Sesame now accepts the header

Content-Type: application/x-www-form-urlencoded; charset=utf-8

Thank you!

However, when I use POST method to post my UTF-8 query to the server I get:

exceptionClass=org.openrdf.http.server.ClientRequestException
hasParseInfo=false
exceptionMessage=Missing parameter\: query

I have tried various QUERY_STRINGs, some containing only the query parameter and some also containing the queryLn and Infer parameters, with no luck; I don't think they are parsed properly. My HTTP request uses the POST method. The path to the repository has no query string appended (after ?); instead, the parameters are sent via the send() method of the HTTP object, as POST dictates. The HTTP headers are:

'Host': 'localhost', 'Content-Type': 'application/x-www-form-urlencoded; charset=utf-8', 'Accept': 'text/rdf+n3', 'Accept-Charset': 'UTF-8'

Please help.

Many thanks.