REQUESTS: Return file object from url (as with open('','rb') )

I want to download a file straight into memory using requests so that I can pass it directly to the PyPDF2 reader without writing it to disk, but I can't figure out how to pass it as a file object. Here's what I've tried:

import requests as req
from PyPDF2 import PdfFileReader

r_file = req.get('http://www.location.come/somefile.pdf')
rs_file = req.get('http://www.location.come/somefile.pdf', stream=True)

with open('/location/somefile.pdf', 'wb') as f:
    for chunk in r_file.iter_content():
        f.write(chunk)

local_file = open('/location/somefile.pdf', 'rb')

#Works:
pdf = PdfFileReader(local_file)

#As expected, these don't work:
pdf = PdfFileReader(rs_file)
pdf = PdfFileReader(r_file)
pdf = PdfFileReader(rs_file.content)
pdf = PdfFileReader(r_file.content)
pdf = PdfFileReader(rs_file.raw)
pdf = PdfFileReader(r_file.raw)

TimY asked May 05 '15 09:05


1 Answer

Without having to know anything about requests, you can always make a file-like object out of anything you have in memory as a string using StringIO.

In particular:

  • Python 2 StringIO.StringIO(s) is a binary file.
  • Python 2 cStringIO.StringIO(s) is the same, but possibly more efficient.
  • Python 3 io.BytesIO(b) is a binary file (constructed from bytes).
  • Python 3 io.StringIO(s) is a Unicode text file.
  • Python 2 io.BytesIO(s) is a binary file.
  • Python 2 io.StringIO(u) is a Unicode text file (constructed from unicode).

(The first two are "binary" in the Python 2 sense--no line-ending conversion. The others are "binary" vs. "text" in the Python 3 sense--bytes vs. Unicode.)
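To make the bytes vs. Unicode distinction concrete, here is a minimal sketch for Python 3 using only the standard library. The variable names and sample data are made up for illustration.

```python
import io

# A binary file-like object is constructed from bytes...
binary_file = io.BytesIO(b"caf\xc3\xa9")
# ...and a text file-like object is constructed from Unicode text.
text_file = io.StringIO(u"caf\u00e9")

binary_data = binary_file.read()  # bytes: the 5 raw UTF-8 bytes
text_data = text_file.read()      # str: 4 Unicode characters

# Mixing the two up fails: BytesIO rejects Unicode text.
try:
    io.BytesIO(u"not bytes")
    mixing_allowed = True
except TypeError:
    mixing_allowed = False
```

A downloaded PDF is raw bytes, which is why the binary variant is the one that matters here.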

So, io.BytesIO(response.content) gives you a valid binary file-like object in both Python 2 and Python 3. If you only care about Python 2, cStringIO.StringIO(response.content) may be more efficient.
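As a rough sketch of what that looks like, the snippet below wraps a bytes payload in io.BytesIO and treats it like a file opened with 'rb'. To stay runnable without a network connection, it uses a placeholder byte string in place of response.content. The helper name bytes_to_file and the sample bytes are hypothetical.

```python
import io


def bytes_to_file(content):
    """Wrap an in-memory bytes payload in a seekable binary file-like object."""
    return io.BytesIO(content)


# Stand-in for response.content (placeholder bytes, not a real PDF).
fake_content = b"%PDF-1.4\n% placeholder bytes, not a real PDF\n"

pdf_file = bytes_to_file(fake_content)

header = pdf_file.read(8)  # read from it like a file opened with 'rb'
pdf_file.seek(0)           # rewind so a parser can start from the beginning
```

With the question's variables, the equivalent call would presumably be PdfFileReader(io.BytesIO(r_file.content)), assuming requests and PyPDF2 are installed and the download succeeded.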

Of course "file-like" only goes so far; if the library tries to, e.g., call the fileno method and start making C calls against the file descriptor it isn't going to work. But 99% of the time, this works.
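The fileno limitation can be checked up front. The sketch below (Python 3, standard library only) tests whether an object has a real file descriptor and, if not, copies the bytes into a temporary file. The helper name has_real_descriptor is made up for illustration, and the temporary file is only one possible fallback; it does touch the disk, so it only makes sense when a library truly insists on a descriptor.

```python
import io
import tempfile


def has_real_descriptor(f):
    """Return True if f is backed by an OS-level file descriptor."""
    try:
        f.fileno()
        return True
    except (AttributeError, io.UnsupportedOperation):
        # io.BytesIO raises io.UnsupportedOperation from fileno().
        return False


buf = io.BytesIO(b"data")
in_memory = has_real_descriptor(buf)

# Fallback: copy the in-memory bytes into a temporary file.
with tempfile.TemporaryFile() as tmp:
    tmp.write(buf.getvalue())
    tmp.seek(0)
    on_disk = has_real_descriptor(tmp)
    round_trip = tmp.read()
```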

abarnert answered Sep 24 '22 13:09