Newlines missing in sidecar #4

Open
BillyCroan opened this issue Jan 21, 2024 · 1 comment

Comments

@BillyCroan

I installed this easyocr version via pipx and compared a batch of files processed with the original ocrmypdf and with this one. easyocr is WAY more accurate at getting the letters right, but the sidecar text is all on one line. That's less than ideal and sounds like a bug to me.

If I run pdftotext on the PDF, the output comes out on multiple lines. But the sidecar is broken.

To reproduce, use --sidecar. I can provide a JPG if you want.

@jbarlow83
Contributor

The output format from easyocr doesn't really have line grouping, so that information has to be inferred. Using pdftotext -layout should give an accurate reconstruction.
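As an illustration of what "inferring" line grouping can involve, here is a minimal sketch. It is not the plugin's actual method, which this thread doesn't show. It assumes each easyocr detection has the shape that easyocr's readtext() returns: a list of four corner points, the recognized text, and a confidence value. The function name group_into_lines and the 0.5 overlap threshold are hypothetical choices. Boxes are sorted top to bottom. A box joins the current line when its vertical extent overlaps that line's extent enough. Each line is then ordered left to right.

```python
from typing import List, Sequence, Tuple

# Assumed shape of one easyocr readtext() result:
# (four corner points, recognized text, confidence).
Point = Sequence[float]
Detection = Tuple[Sequence[Point], str, float]


def group_into_lines(detections: List[Detection], overlap: float = 0.5) -> List[str]:
    """Hypothetical helper: join detected text boxes into lines by vertical overlap.

    A box joins the current line when the vertical span it shares with that line
    is at least `overlap` times the smaller of the two heights.
    """
    # Reduce each box to its left edge, top, bottom, and text.
    boxes = []
    for corners, text, _conf in detections:
        xs = [p[0] for p in corners]
        ys = [p[1] for p in corners]
        boxes.append((min(xs), min(ys), max(ys), text))

    # Process boxes top to bottom (ties broken by x) so each one is compared
    # with the line currently being built.
    boxes.sort(key=lambda b: (b[1], b[0]))

    lines: List[List[tuple]] = []
    line_top = line_bottom = 0.0
    for box in boxes:
        _x0, top, bottom, _text = box
        if lines:
            shared = min(bottom, line_bottom) - max(top, line_top)
            smaller = min(bottom - top, line_bottom - line_top)
            if smaller > 0 and shared / smaller >= overlap:
                # Enough vertical overlap: same line; widen the line's extent.
                lines[-1].append(box)
                line_top = min(line_top, top)
                line_bottom = max(line_bottom, bottom)
                continue
        # Otherwise start a new line.
        lines.append([box])
        line_top, line_bottom = top, bottom

    # Within each line, order boxes left to right before joining the text.
    return [" ".join(b[3] for b in sorted(line, key=lambda b: b[0])) for line in lines]


# Made-up detections for two lines of two words each.
detections = [
    ([[10, 12], [60, 12], [60, 30], [10, 30]], "Hello", 0.99),
    ([[70, 10], [130, 10], [130, 31], [70, 31]], "world", 0.98),
    ([[10, 50], [80, 50], [80, 70], [10, 70]], "Second", 0.97),
    ([[90, 52], [140, 52], [140, 71], [90, 71]], "line", 0.96),
]
print("\n".join(group_into_lines(detections)))  # prints "Hello world", then "Second line"
```

A heuristic like this can break down on skewed scans, multi-column pages, or tables. That is one reason pdftotext -layout, which works from the word positions in the finished PDF, may reconstruct the text more faithfully.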
