Store Token position in the produces quads #377

Open
BenjaminHofstetter opened this issue Feb 16, 2024 · 6 comments

@BenjaminHofstetter

Why do I need that:
After parsing a Turtle file, I lose all information about the source file. For better tooling support, I propose implementing some kind of "source maps" to trace back from quads to positions in the Turtle file.

For instance, in tools like https://shacl-playground.zazuko.com/, when encountering errors in SHACL validation reports, locating the error-causing triple requires human intervention. With source map information, editors could pinpoint the exact location in the Turtle file, aiding in error resolution. Implementing source maps would bridge the gap between parsed files and their source, enhancing tooling support. The tokenizer already generates tokens with line, start, and end information, laying the groundwork for this feature.
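
To make the idea concrete, here is a rough, self-contained sketch of what such a "source map" could look like. It does not use N3.js at all: the toy one-triple-per-line syntax and the names `tokenize` and `parseWithPositions` are assumptions made purely for illustration.

```javascript
// Toy illustration of "source maps" for triples; this is NOT the N3.js lexer.
// Assumes one triple per line: three whitespace-free terms followed by " .".

// Split the text into tokens, remembering where each token sits in the source.
function tokenize(text) {
  const tokens = [];
  text.split('\n').forEach((lineText, index) => {
    const pattern = /\S+/g;
    let match;
    while ((match = pattern.exec(lineText)) !== null) {
      tokens.push({
        value: match[0],
        line: index + 1,                     // 1-based line number
        start: match.index,                  // 0-based column of the first character
        end: match.index + match[0].length,  // column just past the last character
      });
    }
  });
  return tokens;
}

// Group tokens into triples and keep the originating token for each component.
function parseWithPositions(text) {
  const triples = [];
  let current = [];
  for (const token of tokenize(text)) {
    if (token.value === '.') {
      if (current.length === 3) {
        const [subject, predicate, object] = current;
        triples.push({
          subject: subject.value,
          predicate: predicate.value,
          object: object.value,
          source: { subject, predicate, object },
        });
      }
      current = [];
    } else {
      current.push(token);
    }
  }
  return triples;
}

const doc = [
  '<http://example.org/a> <http://example.org/p> "1" .',
  '<http://example.org/b> <http://example.org/p> "2" .',
].join('\n');

const triples = parseWithPositions(doc);
const { line, start, end } = triples[1].source.object;
console.log(`object of triple 2 is at line ${line}, columns ${start}-${end}`);
```

With something like the `source` record available, an editor could jump from a triple named in a validation report straight to the matching span in the file.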

@RubenVerborgh
Member

This would be possible indeed, if the parser emits the context from the tokenizer in the quads.

We have no plans to take this up, but a pull request that puts this functionality behind a flag would be welcome, provided it has no performance impact when switched off.
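
One possible way to keep the disabled path cheap is to pick the term-creating function once, when the parser is constructed, rather than checking the flag for every token. The sketch below only illustrates that pattern: `TinyParser`, `recordPosition`, `createTerm` and `createTermWithPosition` are hypothetical names, not part of the N3.js API, and whether this really has no measurable cost would still need benchmarking.

```javascript
// Hypothetical sketch; none of these names exist in N3.js.

// Plain term creation, as it would work without the feature.
function createTerm(token) {
  return { value: token.value };
}

// Same term, plus the position copied from the lexer token.
function createTermWithPosition(token) {
  const term = createTerm(token);
  term.position = { line: token.line, start: token.start, end: token.end };
  return term;
}

class TinyParser {
  constructor({ recordPosition = false } = {}) {
    // Decide once which function creates terms, so the per-token path
    // contains no flag check when the feature is switched off.
    this._createTerm = recordPosition ? createTermWithPosition : createTerm;
  }

  // `token` mimics a lexer token: { value, line, start, end }.
  readTerm(token) {
    return this._createTerm(token);
  }
}

const token = { value: 'http://example.org/a', line: 3, start: 0, end: 22 };
const plain = new TinyParser().readTerm(token);
const tracked = new TinyParser({ recordPosition: true }).readTerm(token);
console.log('position' in plain, tracked.position.line);
```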

@faubulous

This is exactly what I need too. I am currently developing an RDF editing extension for Visual Studio Code named Mentor. For this use case, I frequently need to resolve URIs and blank nodes to parsed Tokens, and this feature would be extremely helpful.

I found a workaround for URIs which requires parsing the document again after loading and interpreting the Triples, but that only works for URIs and not for blank nodes. This currently blocks me from implementing SHACL support where blank node definitions of (property) shapes are quite common.

Any idea how such source maps could be implemented?

@jeswr
Collaborator

jeswr commented Jun 24, 2024

> Any idea how such source maps could be implemented?

Luckily, tokens emitted by the Lexer already contain their line and position. In the Parser, you could attach this information as a property of Terms every time a new _subject, _predicate, _object or _graph is assigned. For instance, the code here would become:

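// Create the subject term as before
this._subject = this._blankNode();
// Attach the token position only when the proposed flag is enabled
if (this._recordPosition) {
  this._subject[POS] = { line: token.line, start: token.start };
}
this._saveContext('blank', this._graph,
                  this._subject, null, null);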

I would recommend making POS a Symbol that is exported by N3.js, however it could also just be a property name like _internal_position.
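
As a small illustration of why a Symbol key is attractive here, the following sketch attaches a position to a plain term-like object. `POS` and `withPosition` are hypothetical names (N3.js does not export them), and the object literal merely stands in for a real Term.

```javascript
// Sketch only: `POS` and `withPosition` are hypothetical, not N3.js exports.
const POS = Symbol('position');

// Attach the token's line and start offset to the term under the Symbol key.
function withPosition(term, token) {
  term[POS] = { line: token.line, start: token.start };
  return term;
}

const term = withPosition(
  { termType: 'BlankNode', value: 'b0' },  // stand-in for a parsed term
  { line: 12, start: 4 },                  // stand-in for a lexer token
);

// Symbol-keyed properties are skipped by Object.keys and JSON.stringify,
// so code that enumerates or serializes terms does not see the extra data.
console.log(Object.keys(term));
console.log(JSON.stringify(term));
console.log(term[POS]);
```

A string key such as _internal_position would show up in both of those outputs. One thing to keep in mind with either choice: if a parser reuses the same term object for several occurrences, a single property can only hold one position.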

The caveat of this approach is that it might cause a non-negligible performance hit even when the feature is disabled, but I suspect this is something you can perf-test and optimise once the feature is implemented.

@BenjaminHofstetter
Author

BenjaminHofstetter commented Jun 25, 2024

I did a PoC some time ago. I added it as a use case in the RDF-star working group. Maybe in the future we can use RDF-star to define such source maps "externally" from the source Turtle.
w3c/rdf-star#285 (comment)

My PoC uses the N3 parser and exposes the tokens in the quads (not RDF-star).

@faubulous

@BenjaminHofstetter Did you create a patch for N3 and publish the code of the PoC somewhere?

@TallTed
Contributor

TallTed commented Jun 26, 2024

Perhaps change the issue title from —
Store Token position in the produces quads
— to —
Store original positions of Tokens in quads produced by conversion from Turtle
?

(At least, change produces to produced.)
