Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serialize mdast to markdown #127

Open
wants to merge 49 commits into
base: main
Choose a base branch
from
Open

Serialize mdast to markdown #127

wants to merge 49 commits into from

Conversation

bnchi
Copy link

@bnchi bnchi commented Aug 27, 2024

fixes #64

TODO handlers :

  • paragraph
  • text
  • strong
  • emphasis
  • heading
  • html
  • break
  • thematic-break
  • code
  • blockquote
    • Complete the tests for blockquote
  • image
  • inline-code
  • link
  • list
  • list-item
    • Add the tests for list item
  • root
  • definition
  • image-reference
  • link-reference

@bnchi bnchi changed the title Serializing mdast to markdown Serialize mdast to markdown Aug 27, 2024
@bnchi
Copy link
Author

bnchi commented Aug 31, 2024

While working on this, I noticed that the JS implementation relies heavily on regular expressions in some places. I believe we can rewrite some parts of the code in Rust instead of depending on the regex crate. Although the regex crate guarantees linear time complexity, we can eliminate the need for the regex engine in places where simpler Rust code would work. Maybe we can benchmark first but this won't happen until I move over most of the tests into Rust.

what do you think @wooorm ?

One more question what do you think make more sense here for the maintainability of this utility :

  1. To create a new workspace for mdast to markdown ?
  2. To create a separate module to group most of the related code for this util ?

@wooorm
Copy link
Owner

wooorm commented Sep 2, 2024

While the core of markdown-rs is no-std + alloc, that’s for the use case where you only really do markdown -> html, straight.
Here we are working with ASTs. I think it’s fine to depend on regexes there.
And, both JS and Rust will need to be maintained and kept in sync. So, fine to improve things if it’s fast, just, that has to be backported.
I’d recommend going with regexes first. To focus on porting the code. Instead of changing the code. That becomes confusing. Hard to manage.

As for workspace vs module, I have to particular preference currently. You?

@bnchi
Copy link
Author

bnchi commented Sep 3, 2024

Since we're working with an AST and this is a utility a workspace sounds better to me this will help to isolate the tests when we quickly run cargo test and version this util by it's own too.

It's not that hard to change that decision later on, I updated the PR to use a workspace instead and kept the no_std constrain.

@bnchi
Copy link
Author

bnchi commented Sep 7, 2024

I'm working on the definition handler and the handler depends on https://github.com/micromark/micromark/tree/main/packages/micromark-util-decode-string to decode a markdown string, I've looked up the code in this crate and found out you've indeed ported some parts of micromark utils.

I was wondering if it's ok to make the util from within the markdown crate pub in order to leverage functions such as decode_named and decode_numeric and add some needed constants within util/constant that the decode-string relies on. Since this is being developed in a workspace markdown::util needs to be pub.

@wooorm

@wooorm
Copy link
Owner

wooorm commented Sep 9, 2024

Some of the utils are already pub because they‘re used in https://github.com/wooorm/mdxjs-rs. So, having some utils pub is fine. Also, putting them in separate crates is also fine.

@bnchi bnchi marked this pull request as ready for review September 18, 2024 12:54
@bnchi
Copy link
Author

bnchi commented Sep 18, 2024

@wooorm This is ready for review more things might need to be done :

  • Support tight definitions
  • Support for extensions?
  • Clean up the code more
  • Add more tests

I've made it ready for review to see if you think I've missed something before addressing some of the points above. Thanks !

Copy link
Owner

@wooorm wooorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is such an effort. Thank you! 🙇‍♂️

I didn’t check line by line, comparing JS to Rust. I’m assuming that you‘ve probably gotten that right, and trust the tests.

Some small nits.

Otherwise, looking forward to the rest of the work.
And, is there anything in particular you want me to look at?

mdast_util_to_markdown/Cargo.toml Show resolved Hide resolved
reason: value.to_string(),
}
}
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there’s an error struct now in this project, since #114. Which models the JS version vfile-message. Could that be used here too?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes sure

@bnchi
Copy link
Author

bnchi commented Sep 19, 2024

And, is there anything in particular you want me to look at?

  1. if you look at the handler abstraction https://github.com/wooorm/markdown-rs/pull/127/files#diff-91a1db387b2a6664c89d595496a72eab448dfcfe854396bdef68aaaf7c0c0bdaR25 there's no Tracker it doesn't seem like the tracker is important in the serialization process a lot so I opt out of porting it over, are you ok with this decision ?
  2. I didn't test the performance of this implementation do you have a particular markdown file to test the JS implementation which I can use here ?
  3. https://github.com/wooorm/markdown-rs/pull/127/files#diff-f9c02e00091566ab3f89e3fd7e9f44fee067ac1ecee8afc5ee18c129431a2692R11 what do you think of the error messages that the users see ?
  4. There are a few nodes that are not serialized like tables which I think are GFM and a few others, should we add them to this PR too ?
  5. For handlers and extended joins we would probably need to use some sort of dynamic dispatcher and I haven't thought about it much tbh, do you think they should be in this PR or should I propose a solution in an issue and make a separate PR ?
  6. I've depended on Regex less in trivial places in this code if you like I can probably make a separate PR to the JS version with the same changes.
  7. https://github.com/wooorm/markdown-rs/pull/127/files#diff-002c11b91d07900f0eff0a7ebe4d82aec67136f19b764c185362ed5e1e46d377 I've added an Association abstraction within this project, which is the same one defined in mdast here https://github.com/syntax-tree/mdast?tab=readme-ov-file#association this is mainly to make the code a bit easier to read though I'm not sure if that's the right place for the abstraction or if it should be part of the markdown crate.

@bnchi
Copy link
Author

bnchi commented Sep 19, 2024

By looking at the CI it doesn't seem like the tests are running for this project since it's in a separate workspace we probably need to update the CI jobs to include mdast_util_to_markdown in the pipeline.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Serializing mdast to markdown
2 participants