Use latest version of objects from object streams (#1) #169

Open
wants to merge 5 commits into
base: master

mjbryant

Previously, when an object parsed from an object stream referenced an indirect object, the parser resolved that reference to whatever version of the target was current at parse time. So if an object stream declares updated versions of two objects and the first references the second, the first object ends up holding the stale version of the second. For example, suppose the content of an object stream is something like this (formatted for clarity, and with probably incorrect offsets):

1 0 2 40 
<</Count 3 /Kids [2 0 R] /Type /Pages>>
<</Count 3 /Kids [4 0 R 5 0 R 6 0 R] /Parent 1 0 R /Type /Pages>>

The object stream here defines both objects (1, 0) and (2, 0). If this is an incremental update for (2, 0), the previous version of the code would make /Kids for (1, 0) point to the previous version of (2, 0). This showed up as incorrect page counts in several PDFs we found in the wild: those PDFs had added pages in incremental updates, and the old /Pages objects with outdated kids were being used.
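To make the failure mode concrete, here is a minimal toy model of it. It is not pdfrw's code: `Ref`, `load_eager`, `load_deferred`, and the page names are hypothetical stand-ins, and objects are plain dicts whose values are lists.

```python
class Ref:
    """Hypothetical stand-in for an indirect reference such as `2 0 R`."""
    def __init__(self, num, gen):
        self.key = (num, gen)


def resolve(values, table):
    # Swap each Ref for whatever the table holds under that key right now.
    return [table[v.key] if isinstance(v, Ref) else v for v in values]


def load_eager(table, stream):
    # Old behaviour (as described above): resolve while parsing each object.
    for key, obj in stream:
        table[key] = {name: resolve(vals, table) for name, vals in obj.items()}


def load_deferred(table, stream):
    # Register every object from the stream first, then resolve references.
    for key, obj in stream:
        table[key] = dict(obj)
    for key, _ in stream:
        obj = table[key]
        for name, vals in list(obj.items()):
            obj[name] = resolve(vals, table)


# Earlier revision: (2, 0) has two kids. The update adds a third page.
old = {(2, 0): {"Kids": ["page4", "page5"]}}
stream = [
    ((1, 0), {"Kids": [Ref(2, 0)]}),
    ((2, 0), {"Kids": ["page4", "page5", "page6"]}),
]

eager = dict(old)
load_eager(eager, stream)
deferred = dict(old)
load_deferred(deferred, stream)
```

With eager resolution, `eager[(1, 0)]["Kids"][0]` is the old two-page object, because (2, 0) had not been replaced yet when (1, 0) was parsed. With deferred resolution it is the updated three-page object.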

I've run this branch against all pdfrw tests and they all still pass. This includes roundtrips for lots of existing PDFs, so I'm fairly confident that it doesn't break existing behavior. It also fixes several of the PDFs that broke for us on pdfrw master.

* Load object streams starting from latest, and don't clobber later versions of objects from object streams

* Ignore pyenv's local file
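
The first commit's idea (load newest first, never overwrite a later version) can be sketched as follows. This is only an illustration under assumed data shapes, not the actual diff: `merge_newest_first` is a hypothetical helper, and each revision is assumed to be a dict mapping `(objnum, generation)` to an already-parsed object.

```python
def merge_newest_first(revisions):
    """Merge object tables given in newest-to-oldest order.

    Hypothetical helper: an entry stored by a newer revision is
    never replaced by an older one.
    """
    table = {}
    for revision in revisions:
        for key, obj in revision.items():
            # setdefault only stores `obj` if `key` is not present yet.
            table.setdefault(key, obj)
    return table


newest = {(2, 0): "updated /Pages"}
oldest = {(1, 0): "root /Pages", (2, 0): "original /Pages"}
merged = merge_newest_first([newest, oldest])
```

Here `merged[(2, 0)]` keeps the updated object, while (1, 0), which only exists in the older revision, is still picked up.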
@pmaupin
Owner

pmaupin commented Jun 29, 2019 via email

3 participants