Issue Details

Key: RIO-80
Type: Bug
Status: Open
Priority: Major
Assignee: Peter Ansell
Reporter: Peter Ansell
Votes: 0
Watchers: 1

Rio

TurtleParser does not support bare unlabelled blank node from latest Turtle Working Draft

Created: 10/Jul/12 03:36 AM   Updated: 23/Oct/12 02:47 AM
Component/s: Turtle parser
Affects Version/s: None
Fix Version/s: None

Environment: Sesame-2.6.7
Issue Links:
Dependency
 
This issue is a dependency for:
SES-945 Update Turtle parser to current W3C W... Major Open


Description
I am having some difficulty parsing Turtle files generated by OWLAPI. Judging by the Turtle Team Submission document, the syntax of these files appears to be valid.

The offending files contain lines similar to the following, and the Turtle parser fails on each of them with the same stack trace:

@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
[<urn:P> "007"^^xsd:int] .
[<urn:P> "007"^^<http://www.w3.org/2001/XMLSchema#int>] .
[<urn:P> "007"^^<http://www.w3.org/2001/XMLSchema#integer>] .
[<urn:P> "7"^^<http://www.w3.org/2001/XMLSchema#integer>] .
[<urn:P> 7 ] .
[<urn:P> "7" ].
[<urn:P> "not a number" ].
[ <urn:P> "language literal"@fr ].
[ <urn:P> <urn:Q> ] .

The following similar file, however, parses without error:
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
[] <urn:P> "007"^^xsd:int .
[] <urn:P> "007"^^<http://www.w3.org/2001/XMLSchema#int> .
[] <urn:P> "007"^^<http://www.w3.org/2001/XMLSchema#integer> .
[] <urn:P> "7"^^<http://www.w3.org/2001/XMLSchema#integer> .
[] <urn:P> 7 .
[] <urn:P> "7" .
[] <urn:P> "not a number" .
[] <urn:P> "language literal"@fr .
[] <urn:P> <urn:Q> .
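
The only difference between the two samples is where the statement ends: the failing lines stop right after the closing bracket, while the parsing lines continue with a predicate and object after "[]". As a rough illustration, the following standalone Java sketch separates the two forms. It is not Sesame code; the class and method names are hypothetical, and it assumes only short double-quoted strings and no comments in the input.

```java
// Hypothetical helper, not part of Sesame or Rio.
final class BareBlankNodeCheck {

    /**
     * Returns true if the statement is a bracketed blank node followed
     * directly by the terminating '.', for example "[ <urn:P> 7 ] .".
     * Assumption: only short double-quoted strings, no comments.
     */
    static boolean isBarePropertyList(String statement) {
        String s = statement.trim();
        if (!s.startsWith("[")) {
            return false;
        }
        int depth = 0;
        boolean inString = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (inString) {
                if (c == '\\') {
                    i++; // skip the escaped character
                } else if (c == '"') {
                    inString = false;
                }
            } else if (c == '"') {
                inString = true;
            } else if (c == '[') {
                depth++;
            } else if (c == ']') {
                depth--;
                if (depth == 0) {
                    // Whatever follows the outermost ']' decides the form.
                    return s.substring(i + 1).trim().equals(".");
                }
            }
        }
        return false; // unbalanced brackets
    }

    public static void main(String[] args) {
        // Form that fails in the report above
        System.out.println(isBarePropertyList("[<urn:P> \"007\"^^xsd:int] .")); // true
        // Form that parses fine
        System.out.println(isBarePropertyList("[] <urn:P> \"007\"^^xsd:int .")); // false
    }
}
```

A check like this could be used to pick the affected statements out of a larger generated file before reporting them.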

The stack trace in each case is as follows, even though not all of the samples contain numbers. I focused on testing numbers because the initial broken file contained numbers and the failure appeared to come from parseNumber, but the parser fails in the same way for plain non-numeric literals and URIs:

org.openrdf.rio.RDFParseException: Object for statement missing [line 3]
at org.openrdf.rio.helpers.RDFParserBase.reportFatalError(RDFParserBase.java:525)
at org.openrdf.rio.turtle.TurtleParser.reportFatalError(TurtleParser.java:1109)
at org.openrdf.rio.turtle.TurtleParser.parseNumber(TurtleParser.java:759)
at org.openrdf.rio.turtle.TurtleParser.parseValue(TurtleParser.java:541)
at org.openrdf.rio.turtle.TurtleParser.parsePredicate(TurtleParser.java:391)
at org.openrdf.rio.turtle.TurtleParser.parsePredicateObjectList(TurtleParser.java:311)
at org.openrdf.rio.turtle.TurtleParser.parseTriples(TurtleParser.java:301)
at org.openrdf.rio.turtle.TurtleParser.parseStatement(TurtleParser.java:208)
at org.openrdf.rio.turtle.TurtleParser.parse(TurtleParser.java:186)
at org.openrdf.rio.turtle.TurtleParser.parse(TurtleParser.java:131)

Comments
Comment by Peter Ansell [06/Sep/12 06:57 AM]
The relevant section of the July 2012 W3C Turtle Working Draft is:

http://www.w3.org/TR/2012/WD-turtle-20120710/#unlabeled-bnodes

Comment by Peter Ansell [22/Oct/12 06:13 AM]
I am working on this issue at the following URL:

https://github.com/ansell/openrdf-sesame/compare/master...RIO-80

I modified TurtleParser so that the test file above parses without error, but the change causes an existing test file (test-05.ttl) to fail to parse.

I am asking the public-rdf-comments W3C mailing list [1] to clarify whether the grammar in the July 2012 Working Draft is correct and whether a test suite is available. There are currently two different test suites, from Jena and Raptor, and the current Sesame test suite seems to be an old subset of the Raptor test suite that does not cover all of the current Turtle features.

[1] http://lists.w3.org/Archives/Public/public-rdf-comments/2012Oct/thread.html