I have a list of PDF files, and I want to analyze the first page of each document to extract information. I've tried a lot of free and paid OCR tools, but in my case the results aren't good enough.
So I want to try the ChatGPT API in Python instead. How do I go about it?
Also, I saw in the OpenAI Vision documentation that there is a detail parameter, but no example is provided. How do I use this parameter?
Firstly, you need to extract the first page of each document as an image (here, a PNG). The snippet below uses PyMuPDF, which is imported as fitz.
import fitz  # PyMuPDF

def save_first_page_as_png(pdf_path: str, image_path: str) -> None:
    # Open the PDF, render only its first page (index 0) and save it as PNG
    with fitz.open(pdf_path) as pdf_document:
        first_page = pdf_document.load_page(0)
        pixmap = first_page.get_pixmap()
        pixmap.save(image_path)
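PyMuPDF renders at 72 dpi by default, so one point of page size (1 pt = 1/72 inch) becomes one pixel; an A4 page therefore comes out at roughly 596x842 pixels. If the text is small, get_pixmap also accepts a dpi argument (for example first_page.get_pixmap(dpi=150)). The sketch below only does the arithmetic to help you pick a DPI; rendered_size is a hypothetical helper, and rounding up with math.ceil is an assumption that approximates how PyMuPDF sizes the pixmap.

```python
import math

# PDF page sizes are measured in points: 1 pt = 1/72 inch
POINTS_PER_INCH = 72

def rendered_size(width_pt: float, height_pt: float, dpi: int = 72) -> tuple[int, int]:
    """Approximate pixel size of a page rendered at `dpi` (assumed: rounded up)."""
    zoom = dpi / POINTS_PER_INCH
    return math.ceil(width_pt * zoom), math.ceil(height_pt * zoom)

# A4 is about 595.28 x 841.89 pt
print(rendered_size(595.28, 841.89))       # (596, 842) at the default 72 dpi
print(rendered_size(595.28, 841.89, 150))  # (1241, 1754)
```

Keep in mind that with detail set to "low" the image is downscaled to 512x512 anyway, so a higher DPI mostly pays off with "high".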
Then, to send the image to the ChatGPT API, you need to encode it in base64.
import base64

def encode_image(image_path: str) -> str:
    # Read the raw PNG bytes and return them as a base64 text string
    with open(image_path, 'rb') as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')
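The base64 string is sent inside a data URL of the form data:image/png;base64,<payload>. If you later send other formats (JPEG, for instance), a small helper can build the whole URL and pick the MIME type from the file extension. This is a standard-library sketch; image_to_data_url is a hypothetical name, not part of any OpenAI library.

```python
import base64
import mimetypes

def image_to_data_url(image_path: str) -> str:
    """Build a base64 data URL for an image file (hypothetical helper)."""
    # Guess the MIME type from the extension, e.g. ".png" -> "image/png"
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot infer an image MIME type for {image_path!r}")
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string can be used directly as the "url" value in the request payload.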
Finally, you can call the ChatGPT API, passing the detail parameter along with the image.
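import os
import requests

# Read the key from the environment instead of hard-coding it
api_key = os.environ.get("OPENAI_API_KEY", "your_api_key")

def call_gpt4_with_image(base64_image: str, detail: str = "low") -> dict:
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}"
    }
    payload = {
        # gpt-4-vision-preview has since been deprecated; swap in a current
        # vision-capable model (e.g. "gpt-4o") if this one is rejected
        "model": "gpt-4-vision-preview",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        # Replace with the question about your documents
                        "text": "Describe this image."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/png;base64,{base64_image}",
                            # "low", "high" or "auto"
                            "detail": detail
                        }
                    }
                ]
            }
        ],
        "max_tokens": 300
    }
    response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)
    response.raise_for_status()  # fail loudly on HTTP errors (bad key, quota, ...)
    result = response.json()
    print(result)
    return result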
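Printing the raw JSON is fine for a first test, but to extract information you usually want just the model's text and the token counts. In a Chat Completions response these live under choices[0].message.content and usage. The sketch below pulls them out of the parsed dict; extract_answer is a hypothetical helper, and the sample dict is made up purely to show the shape.

```python
def extract_answer(response_json: dict) -> tuple[str, dict]:
    """Return (assistant text, usage dict) from a parsed Chat Completions response."""
    if "error" in response_json:
        # On failure the API returns an "error" object instead of "choices"
        raise RuntimeError(response_json["error"].get("message", "unknown error"))
    message = response_json["choices"][0]["message"]
    usage = response_json.get("usage", {})
    return message["content"], usage

# Made-up response, trimmed to the fields used above
sample = {
    "choices": [{"message": {"role": "assistant", "content": "Sample answer text"}}],
    "usage": {"prompt_tokens": 98, "completion_tokens": 87, "total_tokens": 185},
}
text, usage = extract_answer(sample)
print(text)
print(usage["prompt_tokens"], usage["completion_tokens"])
```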
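Note that the detail parameter can be "low" or "high" (or "auto", the default, which lets the model decide); see the documentation. With "low", the model receives a 512x512 low-resolution version of the image for a fixed cost of 85 tokens. With "high", it additionally sees detailed 512px tiles of the image, which costs more tokens but helps with small text.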
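To compare what "low" and "high" cost for your page size, here is a rough token estimator. It follows the image-cost rules OpenAI published for GPT-4 with vision: 85 tokens for "low"; for "high", fit the image within 2048x2048, shrink it so the shortest side is 768px, then charge 170 tokens per 512px tile plus 85. Treat it as an approximation: the rules can differ between models and change over time, and this sketch assumes small images are never upscaled. estimate_image_tokens is a hypothetical helper.

```python
import math

def estimate_image_tokens(width: int, height: int, detail: str = "low") -> int:
    """Approximate input tokens for one image (assumed GPT-4 Vision cost rules)."""
    if detail == "low":
        # Low detail: flat cost regardless of image size
        return 85
    if detail != "high":
        raise ValueError("detail must be 'low' or 'high' ('auto' is decided by the model)")
    w, h = float(width), float(height)
    # Step 1: shrink to fit within a 2048 x 2048 square
    if max(w, h) > 2048:
        scale = 2048 / max(w, h)
        w, h = w * scale, h * scale
    # Step 2: shrink so the shortest side is 768 px (assumption: never upscale)
    if min(w, h) > 768:
        scale = 768 / min(w, h)
        w, h = w * scale, h * scale
    # Step 3: 170 tokens per 512 px tile plus 85 base tokens
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles

print(estimate_image_tokens(596, 842, "low"))
print(estimate_image_tokens(596, 842, "high"))
```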
Here is an example of a full workflow.
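from pathlib import Path

pdf_paths = ['pdf1.pdf', 'pdf2.pdf', 'pdf3.pdf']
for pdf_path in pdf_paths:
    # pdf1.pdf -> pdf1_1st_page.png (pathlib only touches the file name,
    # unlike str.replace, which would also hit ".pdf" elsewhere in the path)
    first_page_path = str(Path(pdf_path).with_name(Path(pdf_path).stem + '_1st_page.png'))
    save_first_page_as_png(pdf_path, first_page_path)
    base64_image = encode_image(first_page_path)
    call_gpt4_with_image(base64_image)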
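If your PDFs sit in a folder rather than in a hard-coded list, you can collect them with pathlib and derive each output PNG path in one place. This is a sketch; first_page_png_paths is a hypothetical helper, and matching the extension case-insensitively (so report.PDF is included) is a design choice, not something the workflow requires.

```python
from pathlib import Path

def first_page_png_paths(folder: str) -> list[tuple[Path, Path]]:
    """Pair each PDF in `folder` with a PNG path for its first page."""
    pairs = []
    for pdf_path in sorted(Path(folder).iterdir()):
        # Match ".pdf" case-insensitively and skip sub-directories
        if pdf_path.is_file() and pdf_path.suffix.lower() == ".pdf":
            png_path = pdf_path.with_name(f"{pdf_path.stem}_1st_page.png")
            pairs.append((pdf_path, png_path))
    return pairs
```

Each pair can then be fed to the loop above, e.g. save_first_page_as_png(str(pdf_path), str(png_path)).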
EDIT: Regarding price, I tried this on the first page of a PDF. The PNG was 596x842 pixels, and the request (question + image) used 98 input tokens and 87 output tokens. With the pricing at the time of writing ($0.01 / 1K input tokens and $0.03 / 1K output tokens), that's a total of $0.00359 ($3.59 per 1K images).
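The arithmetic behind that figure is 98 × $0.01/1000 + 87 × $0.03/1000 = $0.00098 + $0.00261 = $0.00359. A tiny helper makes it easy to redo with your own token counts; the default prices are the ones quoted above and have likely changed, so pass the current ones. request_cost is a hypothetical name.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_1k: float = 0.01,
                 output_price_per_1k: float = 0.03) -> float:
    """Dollar cost of one request, given per-1K-token prices (defaults as quoted above)."""
    return (input_tokens * input_price_per_1k
            + output_tokens * output_price_per_1k) / 1000

cost = request_cost(98, 87)
print(f"${cost:.5f} per image, ${cost * 1000:.2f} per 1K images")
# prints "$0.00359 per image, $3.59 per 1K images"
```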