The ChatGPT web interface makes PDF upload easy. Is there an OpenAI API that can receive PDFs?
I know there are third-party libraries that can read PDFs, but given that a PDF contains images and other important information, it might be better to feed the actual PDF directly to a model like GPT-4 Turbo.
To add context, my use case is RAG (retrieval-augmented generation). In the code below I handle the PDF and a prompt. Normally I'd append the extracted text to the end of the prompt, and I could still do that with a PDF if I extracted its contents manually.
The following code is taken from https://platform.openai.com/docs/assistants/tools/code-interpreter. Is this how I'm supposed to do it?
# Upload a file with an "assistants" purpose
file = client.files.create(
    file=open("example.pdf", "rb"),
    purpose="assistants"
)

# Create an assistant using the file ID
assistant = client.beta.assistants.create(
    instructions="You are a personal math tutor. When asked a math question, write and run code to answer the question.",
    model="gpt-4-1106-preview",
    tools=[{"type": "code_interpreter"}],
    file_ids=[file.id]
)
There is an upload endpoint as well, but it seems the intent of those endpoints is fine-tuning and assistants. I think RAG is an ordinary use case and not necessarily tied to assistants.
May 2025 edit: according to the official guide, OpenAI GPT-4.1 lets you extract the content of (or answer questions about) an input PDF file foobar.pdf
stored locally, with a solution along the lines of
from openai import OpenAI
import os

filename = "foobar.pdf"
prompt = """Extract the content from the file provided without altering it.
Just output its exact content and nothing else."""

client = OpenAI(api_key=os.environ.get("MY_OPENAI_KEY"))

# Upload the PDF with the "user_data" purpose
file = client.files.create(
    file=open(filename, "rb"),
    purpose="user_data"
)

# Pass both the uploaded file and the prompt to the Responses API
response = client.responses.create(
    model="gpt-4.1",
    input=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_file",
                    "file_id": file.id,
                },
                {
                    "type": "input_text",
                    "text": prompt,
                },
            ],
        }
    ],
)

print(response.output_text)
The prompt can of course be replaced with the desired user request, and I assume the OpenAI key is stored in an environment variable named MY_OPENAI_KEY.
P.S. I have edited the answer because this approach is much more streamlined with respect to the assistants-based 2024 solution that you can see in the edit history; it is heavily inspired by https://medium.com/@erik-kokalj/effectively-analyze-pdfs-with-gpt-4o-api-378bd0f6be03
One solution: convert the PDF to images and feed them to the vision model as multi-image inputs (https://platform.openai.com/docs/guides/vision).
GPT-4 with vision is not a different model that does worse at text tasks because it has vision; it is simply GPT-4 with vision added.
Since it's the same model with vision capabilities, this should be sufficient for both text and image analysis.
You could also extract the images from the PDF and feed them separately, alongside the extracted text, in a multi-input pipeline. I prefer the first approach, but ideally you should run experiments to see which produces better results:
text only + images only vs. page images (containing both).
Converting a PDF to images, like extracting images from a PDF, can be done locally in Python; it isn't a difficult task that requires support from a provider like OpenAI.