Decompress FlateDecode Objects in PDF in Python

Question

I am trying to decompress the data in a PDF with the following code:

import re
import zlib

pdf = open("some_doc.pdf", "rb").read()
stream = re.compile(r'.*?FlateDecode.*?stream(.*?)endstream', re.S)

for s in stream.findall(pdf):
    s = s.strip('\r\n')
    try:
        print(zlib.decompress(s))
        print("")
    except:
        pass

But it shows me the following error:

File "D:\pdf_flatedecode.py", line 8, in <module>
    for s in stream.findall(pdf):
TypeError: cannot use a string pattern on a bytes-like object

Please help me; I am not able to find the problem. My Python version is 3.7.1.

Andrey Starkov · Accepted Answer

The core problem is that you open your PDF in binary mode, so pdf is a bytes object and the regex has to be compiled from a bytes pattern, not a str pattern. I'm not 100% sure it works the way you intend, but try this:

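import re
import zlib

# Read the PDF as bytes; the regex pattern must then be bytes as well.
with open("some_doc.pdf", "rb") as f:
    pdf = f.read()

stream = re.compile(rb'.*?FlateDecode.*?stream(.*?)endstream', re.S)

for s in stream.findall(pdf):
    s = s.strip(b'\r\n')
    try:
        print(zlib.decompress(s).decode('UTF-8'))
        print("")
    except (zlib.error, UnicodeDecodeError):
        # Skip streams that are not valid zlib data or not UTF-8 text.
        pass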
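
For reference, here is a minimal sketch of the same idea wrapped in a function and run against a synthetic in-memory object instead of a real file. The names FLATE_STREAM and inflate_streams and the sample bytes are made up for illustration; they are not part of the answer above. The sketch uses zlib.decompressobj so that the end-of-line bytes before endstream end up as unused trailing data and do not need to be stripped, which avoids cutting off compressed bytes that happen to be \r or \n. A regex scan like this is only a rough approach: it ignores /Length, other filters, predictors, and encryption.

```python
import re
import zlib

# Bytes pattern (note the rb prefix): a dictionary mentioning FlateDecode,
# the "stream" keyword plus its end-of-line, then everything up to "endstream".
FLATE_STREAM = re.compile(rb"FlateDecode.*?stream\r?\n(.*?)endstream", re.S)


def inflate_streams(pdf_bytes):
    """Return the decompressed payload of each FlateDecode stream found.

    Hypothetical helper for illustration only.
    """
    results = []
    for match in FLATE_STREAM.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj tolerates trailing bytes after the zlib stream
            # (such as the newline before "endstream").
            results.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            # Not valid zlib data (for example, an encrypted stream); skip it.
            continue
    return results


# Synthetic stand-in for a single PDF object, built in memory.
payload = b"BT /F1 12 Tf (Hello) Tj ET"
sample = (
    b"1 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
    + zlib.compress(payload)
    + b"\nendstream\nendobj\n"
)

streams = inflate_streams(sample)
```

With a real file, the bytes would come from open("some_doc.pdf", "rb").read() as in the answer, and each returned item is still bytes, so decoding it as text is only meaningful for streams that actually hold text.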

Tags:

python

python-3.x

San
