
I have PDFs that are in two-column format. Is there a way to read each PDF in column order (left column first, then right) without cropping each PDF individually?
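I found an alternative method: crop each page into two parts, left and right, extract the text from each part, and then join the left and right text for every page. This assumes the boundary between the columns sits at the middle of the page. You can try this: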
# https://github.com/jsvine/pdfplumber
import pdfplumber

# Crop boundaries as fractions of the page size. pdfplumber measures
# coordinates from the top-left corner of the page.
x0 = 0    # left edge of the left column
x1 = 0.5  # boundary between the two columns
y0 = 0    # top of the page
y1 = 1    # bottom of the page

all_content = []
with pdfplumber.open("file_path") as pdf:
    for i, page in enumerate(pdf.pages):
        width = float(page.width)
        height = float(page.height)

        # Crop the left half of the page and extract its text
        left_bbox = (x0 * width, y0 * height, x1 * width, y1 * height)
        left_text = page.crop(left_bbox).extract_text() or ""

        # Crop the right half of the page and extract its text
        right_bbox = (x1 * width, y0 * height, 1 * width, y1 * height)
        right_text = page.crop(right_bbox).extract_text() or ""

        # Left column first, then right column
        page_context = '\n'.join([left_text, right_text])
        all_content.append(page_context)

        if i < 2:  # print the first two merged pages as a quick check
            print(page_context)
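
If your PDFs have more than two columns, the same idea can be generalized. The following is only a rough sketch, assuming the columns are of equal width and span the full page height; `column_bboxes` and `join_columns` are hypothetical helper names, not part of pdfplumber.

```python
def column_bboxes(width, height, n_columns=2):
    """Return one (x0, top, x1, bottom) box per column, left to right.

    Assumption: columns are equally wide and cover the full page height.
    Coordinates are measured from the top-left corner of the page.
    """
    if n_columns < 1:
        raise ValueError("n_columns must be at least 1")
    col_width = float(width) / n_columns
    return [
        (i * col_width, 0.0, (i + 1) * col_width, float(height))
        for i in range(n_columns)
    ]


def join_columns(texts):
    """Join per-column texts in reading order, skipping empty or None entries."""
    return "\n".join(t for t in texts if t)
```

Inside the page loop you could then collect `page.crop(bbox).extract_text()` for each box returned by `column_bboxes(page.width, page.height)` and pass the results to `join_columns`. If the columns in your files are not split evenly, you would need to adjust the boxes by hand.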