Issue Details

Key: RIO-73
Type: Bug
Status: Open
Priority: Major
Assignee: Arjohn Kampman
Reporter: Rob Vesse
Votes: 0
Watchers: 0

Rio

Turtle Files with a UTF-8 BOM fail to parse

Created: 10/Jan/11 09:18 AM   Updated: 22/Feb/11 03:35 PM
Component/s: Turtle parser
Affects Version/s: None
Fix Version/s: None

File Attachments: 1. File ttl-with-bom.ttl (0.1 kb)

Environment: Windows 7


 Description   
If a Turtle file starts with a UTF-8 BOM (byte order mark), it fails to parse. The Unicode specifications do not require a UTF-8 BOM, but they do not disallow one either, so the parser should accept such files.

A sample file, ttl-with-bom.ttl, is attached to this issue.

Comment by Rob Vesse [10/Jan/11 09:18 AM]
Attached a sample file that causes the parser to fail.

Change by Rob Vesse [10/Jan/11 09:18 AM]
Field: Attachment
Original Value:
New Value: ttl-with-bom.ttl [ 10320 ]

Comment by Arjohn Kampman [10/Jan/11 09:28 PM]

Comment by Rob Vesse [22/Feb/11 03:35 PM]
I am well aware of the never-ending story of Sun refusing to fix this, and that this is a core Java bug. It is still an issue that should be fixed in Sesame, and it is fixable: other Java-based projects such as Jena have already done so.

If you don't fix it, you are not providing good data interoperability (which is ultimately what RDF is about), since files created with other tools (especially Windows text editors like Notepad) will fail to parse for no apparent reason.
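
For illustration only, here is a minimal sketch of one way a caller (or the parser itself) could skip a leading UTF-8 BOM before the bytes reach the Turtle parser. The class name BomSkipper and the method skipUtf8Bom are hypothetical and are not part of Sesame, Rio, or Jena; the sketch only assumes the input arrives as a java.io.InputStream and uses nothing beyond the standard library.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical helper, not part of Sesame/Rio: strips a leading UTF-8 BOM.
public class BomSkipper {

    // The three bytes that encode U+FEFF (the byte order mark) in UTF-8.
    private static final int[] UTF8_BOM = {0xEF, 0xBB, 0xBF};

    /**
     * Returns a stream positioned just after a leading UTF-8 BOM if one is
     * present; otherwise returns a stream that yields the original bytes
     * unchanged.
     */
    public static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];

        // read() may return fewer bytes than requested, so loop until the
        // buffer is full or the stream ends.
        int read = 0;
        while (read < head.length) {
            int n = pushback.read(head, read, head.length - read);
            if (n < 0) {
                break;
            }
            read += n;
        }

        // Compare the bytes read against the BOM signature.
        boolean isBom = (read == UTF8_BOM.length);
        for (int i = 0; isBom && i < UTF8_BOM.length; i++) {
            isBom = (head[i] & 0xFF) == UTF8_BOM[i];
        }

        // Not a BOM: push the bytes back so the consumer sees them.
        if (!isBom && read > 0) {
            pushback.unread(head, 0, read);
        }
        return pushback;
    }
}
```

The returned stream could then be handed to the parser in place of the original one, so a file such as the attached ttl-with-bom.ttl would be read starting from its first real character.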