
Opening pdf urls with pyPdf

Tags:

python

pdf

pypdf

How would I open a PDF from a URL instead of from the disk?

Something like:

input1 = PdfFileReader(file("http://example.com/a.pdf", "rb"))

I want to open several files from the web and download a single PDF that merges all of them.

asked Mar 17 '12 at 15:03 by meadhikari


People also ask

How do I open a PDF in PyPDF2?

Though PyPDF2 doesn't contain a specific method for reading remote files, you can use Python's urllib.request module to read the remote file as bytes, wrap those bytes in an io.BytesIO object (the reader needs a seekable file-like object, not a raw bytes value), and pass that object to PdfFileReader(). The rest of the process is the same as reading a local PDF file.
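For Python 3, a minimal sketch of the approach described above, using only the standard library. The helper name fetch_pdf_stream, its timeout parameter and the "%PDF-" header check are my own illustrative assumptions, not part of PyPDF2:

```python
import io
import urllib.request


def fetch_pdf_stream(url, timeout=30):
    """Download a PDF and return it as a seekable in-memory binary stream.

    PDF readers jump around inside the file (the cross-reference table is
    located near the end), so the raw bytes are wrapped in io.BytesIO
    instead of being passed to the reader directly.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        data = response.read()
    # Heuristic sanity check (an assumption, not a library API): a PDF's
    # "%PDF-" header normally appears at the very start of the file.
    if b"%PDF-" not in data[:1024]:
        raise ValueError(f"{url} does not look like a PDF")
    return io.BytesIO(data)
```

The returned stream can then be handed to the reader, e.g. PdfFileReader(fetch_pdf_stream(url)) in older PyPDF2 releases, or PdfReader(fetch_pdf_stream(url)) in the current pypdf package.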


2 Answers

I think urllib2 will get you what you want.

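from urllib2 import Request, urlopen
from pyPdf import PdfFileWriter, PdfFileReader
from StringIO import StringIO

url = "http://www.silicontao.com/ProgrammingGuide/other/beejnet.pdf"
writer = PdfFileWriter()

# Download the whole PDF into memory. PdfFileReader needs a seekable
# file-like object, so wrap the downloaded bytes in a StringIO buffer.
remoteFile = urlopen(Request(url)).read()
memoryFile = StringIO(remoteFile)
pdfFile = PdfFileReader(memoryFile)

# Copy every page of the downloaded PDF into the writer
for pageNum in xrange(pdfFile.getNumPages()):
    currentPage = pdfFile.getPage(pageNum)
    # Optional: stamp each page with the first page of a separately
    # loaded "watermark" PdfFileReader before adding it
    #currentPage.mergePage(watermark.getPage(0))
    writer.addPage(currentPage)

# Save the collected pages to a local file
with open("output.pdf", "wb") as outputStream:
    writer.write(outputStream)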
answered Oct 19 '22 at 01:10 by John


Well, you can first download the PDF to disk and then open the local copy with pyPdf:

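import os
import urllib
from pyPdf import PdfFileReader

url = 'http://example.com/a.pdf'

# Save the remote file locally, named after the last URL path component.
# Use binary mode ('wb'): PDFs are binary data.
fileName = url.split('/')[-1]
webFile = urllib.urlopen(url)
pdfFile = open(fileName, 'wb')
pdfFile.write(webFile.read())
webFile.close()
pdfFile.close()

# Make sure the saved file has a .pdf extension
base, ext = os.path.splitext(fileName)
if ext.lower() != '.pdf':
    os.rename(fileName, base + '.pdf')
    fileName = base + '.pdf'

input1 = PdfFileReader(open(fileName, "rb"))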
answered Oct 19 '22 at 01:10 by Switch
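A Python 3 sketch of the same download-to-disk approach using only the standard library; download_pdf and its dest_dir and timeout parameters are illustrative names of my own. It derives the file name from the URL path (ignoring any query string), adds a .pdf extension only when it is missing, and streams the response to disk instead of reading it all at once:

```python
import os
import posixpath
import shutil
import urllib.parse
import urllib.request


def download_pdf(url, dest_dir=".", timeout=30):
    """Save the resource at url into dest_dir and return the local path."""
    # Last component of the URL path, with %-escapes decoded
    path_part = urllib.parse.urlparse(url).path
    name = urllib.parse.unquote(posixpath.basename(path_part)) or "download"
    if not name.lower().endswith(".pdf"):
        name += ".pdf"
    path = os.path.join(dest_dir, name)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        with open(path, "wb") as out:  # binary mode: PDFs are binary data
            # Copy in chunks so large PDFs are not held entirely in memory
            shutil.copyfileobj(response, out)
    return path
```

The saved file can then be opened by path, e.g. PdfReader(download_pdf(url)), assuming the pypdf package is installed.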