apply_redactions() does not work as expected #3863

nsklei · 2024-09-15T20:10:38Z

Description of the bug

When using apply_redactions(images=pymupdf.PDF_REDACT_IMAGE_NONE), I get several "MuPDF error: syntax error: cannot find XObject resource" errors. In addition, some pages in the output are completely empty, although all pages originally contain images.

How to reproduce the bug

import pymupdf
from io import BytesIO
from pathlib import Path

# Raw strings, so that "\t" in the Windows path is not read as a tab escape.
file_path = r"path\to\Example_PDF.pdf"
output_path = r"path\to\Example_PDF_redacted.pdf"

new_doc = pymupdf.open(file_path)

for num, page in enumerate(new_doc):
    print(f"Page {num + 1} - {page.rect}:")
    
    for image in page.get_images(full=True):
        print(f"  - Image: {image}")

    redact_rect = page.rect

    if page.rotation in {90, 270}:
        redact_rect = pymupdf.Rect(0, 0, page.rect.height, page.rect.width)

    page.add_redact_annot(redact_rect)
    page.apply_redactions(images=pymupdf.PDF_REDACT_IMAGE_NONE)

byte_stream = BytesIO()
new_doc.save(byte_stream)
byte_stream.seek(0)

Path(output_path).write_bytes(byte_stream.getvalue())

The code above prints the following information:

Page 1 - Rect(0.0, 0.0, 598.3200073242188, 813.5999755859375):
  - Image: (22, 0, 554, 754, 8, 'ICCBased', '', 'Im0', 'DCTDecode', 0)
  - Image: (23, 43, 554, 754, 8, 'ICCBased', '', 'Im1', 'DCTDecode', 0)
Page 2 - Rect(0.0, 0.0, 598.3200073242188, 816.47998046875):
  - Image: (25, 0, 554, 756, 8, 'ICCBased', '', 'Im001', 'DCTDecode', 0)
  - Image: (26, 44, 554, 756, 8, 'ICCBased', '', 'Im002', 'DCTDecode', 0)
Page 3 - Rect(0.0, 0.0, 815.760009765625, 596.8800048828125):
  - Image: (28, 0, 553, 756, 8, 'ICCBased', '', 'Im001', 'DCTDecode', 0)
  - Image: (29, 45, 553, 756, 8, 'ICCBased', '', 'Im002', 'DCTDecode', 0)
Page 4 - Rect(0.0, 0.0, 815.760009765625, 597.5999755859375):
  - Image: (31, 0, 554, 756, 8, 'ICCBased', '', 'Im001', 'DCTDecode', 0)
  - Image: (32, 46, 554, 756, 8, 'ICCBased', '', 'Im002', 'DCTDecode', 0)
Page 5 - Rect(0.0, 0.0, 815.0399780273438, 597.5999755859375):
  - Image: (34, 0, 554, 755, 8, 'ICCBased', '', 'Im001', 'DCTDecode', 0)
  - Image: (35, 47, 554, 755, 8, 'ICCBased', '', 'Im002', 'DCTDecode', 0)
Page 6 - Rect(0.0, 0.0, 806.4000244140625, 598.3200073242188):
  - Image: (37, 0, 554, 747, 8, 'ICCBased', '', 'Im001', 'DCTDecode', 0)
  - Image: (38, 48, 554, 747, 8, 'ICCBased', '', 'Im002', 'DCTDecode', 0)
Page 7 - Rect(0.0, 0.0, 815.0399780273438, 597.5999755859375):
  - Image: (39, 0, 554, 755, 8, 'ICCBased', '', 'Im001', 'DCTDecode', 0)
  - Image: (40, 49, 554, 755, 8, 'ICCBased', '', 'Im002', 'DCTDecode', 0)
MuPDF error: syntax error: cannot find XObject resource 'Im1'

MuPDF error: syntax error: cannot find XObject resource 'Im2'

Page 8 - Rect(0.0, 0.0, 815.760009765625, 596.8800048828125):
  - Image: (41, 0, 553, 756, 8, 'ICCBased', '', 'Im001', 'DCTDecode', 0)
  - Image: (42, 50, 553, 756, 8, 'ICCBased', '', 'Im002', 'DCTDecode', 0)
MuPDF error: syntax error: cannot find XObject resource 'Im1'

MuPDF error: syntax error: cannot find XObject resource 'Im2'

As you can see, each page contains two images. The function should remove all content from the PDF file except the images.
But after saving the byte_stream, some pages are completely empty.

PyMuPDF version

1.24.10

Operating system

Windows

Python version

3.12


JorjMcKie · 2024-09-15T20:21:54Z

This post cannot be accepted as a bug report because no reproducer file is provided.

JorjMcKie · 2024-09-15T21:47:24Z

test2.pdf

MuPDF bug report: https://bugs.ghostscript.com/show_bug.cgi?id=708032.

JorjMcKie · 2024-09-15T21:49:13Z

@nsklei - You are aware that all pages only contain images - no text, no vector graphics.
So your redactions effectively are no-ops!

nsklei · 2024-09-15T21:54:26Z

Thank you for reviewing my issue and creating a bug report.
The behaviour described in your bug report is correct. I am aware that all pages contain only images and nothing else, so the redactions should indeed be no-ops in this case.

JorjMcKie · 2024-09-17T08:59:07Z

I found that removing page rotation avoids the problem:

for page in doc:
    page.add_redact_annot(page.rect * page.derotation_matrix)
    page.remove_rotation()
    page.apply_redactions(images=pymupdf.PDF_REDACT_IMAGE_NONE)

Works without problem.

nsklei · 2024-09-17T09:48:14Z

Thank you for providing a solution to my problem. I tested your suggestion and it works perfectly :)

JorjMcKie · 2024-09-17T10:12:56Z

Thanks for the feedback!
I am going to re-open this until the fix itself is publicly available. This is our policy for dealing with issue resolutions.

JorjMcKie added and removed the "example required" (Waiting for information) label Sep 15, 2024

JorjMcKie added the "upstream bug" (bug outside this package) label Sep 15, 2024

nsklei closed this as completed Sep 17, 2024

JorjMcKie reopened this Sep 17, 2024
