Fetch a file from a local url with Python requests?

I am using Python's requests library in one method of my application. The body of the method looks like this:

import io
import requests

def handle_remote_file(url, **kwargs):
    response = requests.get(url, ...)
    # response.content is bytes, so buffer it in a BytesIO
    buff = io.BytesIO()
    buff.write(response.content)
    ...
    return True

I'd like to write some unit tests for that method, and in those tests I want to pass a fake local URL such as:

from unittest import TestCase

class RemoteTest(TestCase):
    def setUp(self):
        self.url = 'file:///tmp/dummy.txt'

    def test_handle_remote_file(self):
        self.assertTrue(handle_remote_file(self.url))
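
The test above assumes that /tmp/dummy.txt already exists. A minimal sketch, using only the standard library, of creating the fixture at test time and deriving its file:// URL; make_fixture_url is a hypothetical helper name, not part of any library:

```python
import tempfile
from pathlib import Path


def make_fixture_url(directory, name, data):
    """Write `data` to directory/name and return a file:// URL for it.

    Hypothetical test helper. Path.as_uri() needs an absolute path, hence
    resolve(); it also percent-encodes characters such as spaces.
    """
    path = Path(directory, name).resolve()
    path.write_bytes(data)
    return path.as_uri()


# The temporary directory, and the fixture inside it, are deleted on exit.
with tempfile.TemporaryDirectory() as tmp:
    url = make_fixture_url(tmp, 'dummy file.txt', b'hello')
    # On Linux, url looks something like file:///tmp/<random>/dummy%20file.txt
    data = Path(tmp, 'dummy file.txt').read_bytes()
```

In a TestCase, the same calls would go in setUp, with the directory cleaned up in tearDown.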

When I call requests.get with a local URL, I get the KeyError exception below:

requests.get('file:///tmp/dummy.txt')

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests/packages/urllib3/poolmanager.pyc in connection_from_host(self, host, port, scheme)
76 
77         # Make a fresh ConnectionPool of the desired type
78         pool_cls = pool_classes_by_scheme[scheme]
79         pool = pool_cls(host, port, **self.connection_pool_kw)
80 

KeyError: 'file'

So the question is: how can I pass a local URL to requests.get?

PS: I made up the above example. It possibly contains many errors.

Ozgur Vatansever asked Apr 12 '12 12:04



5 Answers

As @WooParadog explained, the requests library doesn't know how to handle local files. However, current versions let you define transport adapters.

Therefore you can simply define your own adapter which is able to handle local files, e.g.:

import os

import requests
from requests_testadapter import Resp

class LocalFileAdapter(requests.adapters.HTTPAdapter):
    def build_response_from_file(self, request):
        # Strip the leading 'file://' to get the local path
        file_path = request.url[7:]
        with open(file_path, 'rb') as f:
            buff = bytearray(os.path.getsize(file_path))
            f.readinto(buff)
            resp = Resp(buff)
            r = self.build_response(request, resp)

            return r

    def send(self, request, stream=False, timeout=None,
             verify=True, cert=None, proxies=None):

        return self.build_response_from_file(request)

requests_session = requests.session()
requests_session.mount('file://', LocalFileAdapter())
requests_session.get('file://<some_local_path>')

I'm using the requests-testadapter module in the above example.
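
The adapter above derives the path with request.url[7:], which works for simple URLs like file:///tmp/dummy.txt but leaves percent-escapes such as %20 in place and ignores any host part. A stdlib-only sketch of a more careful conversion, similar in spirit to the url2pathname-based answers on this page; file_url_to_path is a hypothetical name, and the printed result assumes a POSIX system (url2pathname behaves differently on Windows):

```python
from urllib.parse import urlparse
from urllib.request import url2pathname


def file_url_to_path(url):
    """Convert a file:// URL to a local filesystem path.

    Hypothetical helper: unlike slicing off the first 7 characters, this
    decodes percent-escapes (e.g. %20 -> space) and rejects URLs that name
    a remote host.
    """
    parsed = urlparse(url)
    if parsed.scheme != 'file':
        raise ValueError('not a file:// URL: %r' % url)
    if parsed.netloc not in ('', 'localhost'):
        raise ValueError('remote file URLs are not supported: %r' % url)
    return url2pathname(parsed.path)


print(file_url_to_path('file:///tmp/my%20file.txt'))  # /tmp/my file.txt on POSIX
```

Inside an adapter, file_path = file_url_to_path(request.url) could stand in for the slicing.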

b1r3k answered Sep 29 '22 21:09


Here's a transport adapter I wrote which is more featureful than b1r3k's and has no additional dependencies beyond Requests itself. I haven't tested it exhaustively yet, but what I have tried seems to be bug-free.

import requests
import os, sys

if sys.version_info.major < 3:
    from urllib import url2pathname
else:
    from urllib.request import url2pathname

class LocalFileAdapter(requests.adapters.BaseAdapter):
    """Protocol Adapter to allow Requests to GET file:// URLs

    @todo: Properly handle non-empty hostname portions.
    """

    @staticmethod
    def _chkpath(method, path):
        """Return an HTTP status for the given filesystem path."""
        if method.lower() in ('put', 'delete'):
            return 501, "Not Implemented"  # TODO
        elif method.lower() not in ('get', 'head'):
            return 405, "Method Not Allowed"
        elif os.path.isdir(path):
            return 400, "Path Not A File"
        elif not os.path.isfile(path):
            return 404, "File Not Found"
        elif not os.access(path, os.R_OK):
            return 403, "Access Denied"
        else:
            return 200, "OK"

    def send(self, req, **kwargs):  # pylint: disable=unused-argument
        """Return the file specified by the given request

        @type req: C{PreparedRequest}
        @todo: Should I bother filling `response.headers` and processing
               If-Modified-Since and friends using `os.stat`?
        """
        path = os.path.normcase(os.path.normpath(url2pathname(req.path_url)))
        response = requests.Response()

        response.status_code, response.reason = self._chkpath(req.method, path)
        if response.status_code == 200 and req.method.lower() != 'head':
            try:
                response.raw = open(path, 'rb')
            except (OSError, IOError) as err:
                response.status_code = 500
                response.reason = str(err)

        if isinstance(req.url, bytes):
            response.url = req.url.decode('utf-8')
        else:
            response.url = req.url

        response.request = req
        response.connection = self

        return response

    def close(self):
        pass

(Despite the name, it was completely written before I thought to check Google, so it has nothing to do with b1r3k's.) As with the other answer, follow this with:

requests_session = requests.session()
requests_session.mount('file://', LocalFileAdapter())
r = requests_session.get('file:///path/to/your/file')
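
The @todo in send() asks whether response.headers should be filled in from os.stat. One possible approach, sketched with the standard library only; file_headers is a hypothetical helper, and the choice of headers is an assumption rather than something the adapter above does:

```python
import mimetypes
import os
import tempfile
from email.utils import formatdate


def file_headers(path):
    """Build HTTP-style headers for a local file from os.stat().

    Hypothetical helper: Content-Type is guessed from the file extension,
    and Last-Modified uses the HTTP date format (always expressed in GMT).
    """
    st = os.stat(path)
    headers = {
        'Content-Length': str(st.st_size),
        'Last-Modified': formatdate(st.st_mtime, usegmt=True),
    }
    content_type, _encoding = mimetypes.guess_type(path)
    if content_type is not None:
        headers['Content-Type'] = content_type
    return headers


# Usage with a throwaway .txt file.
with tempfile.NamedTemporaryFile(suffix='.txt', delete=False) as tmp:
    tmp.write(b'hello')
headers = file_headers(tmp.name)
os.remove(tmp.name)
```

In send(), the result could be merged after the 200 check with response.headers.update(file_headers(path)); handling If-Modified-Since would still be separate work.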

ssokolow answered Sep 29 '22 21:09


The easiest way seems to be using requests-file.

https://github.com/dashea/requests-file (available through PyPI too)

"Requests-File is a transport adapter for use with the Requests Python library to allow local filesystem access via file:// URLs."

This in combination with requests-html is pure magic :)

Sil answered Sep 27 '22 21:09


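packages/urllib3/poolmanager.py pretty much explains it: Requests doesn't support local URLs, because urllib3 only has connection pools for the http and https schemes, so looking up 'file' raises the KeyError.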

pool_classes_by_scheme = {
    'http': HTTPConnectionPool,
    'https': HTTPSConnectionPool,
}
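
The table above is a plain dictionary keyed by scheme, so any scheme without an entry fails the lookup. The adapter-based answers work because a requests Session picks its transport by URL prefix, and Session.mount adds entries to that table (when nothing matches, current versions of requests raise InvalidSchema, "No connection adapters were found", rather than this urllib3 KeyError). A toy model of prefix-based lookup, written only for illustration; SchemeDispatcher is a hypothetical class, not part of requests or urllib3:

```python
class SchemeDispatcher:
    """Toy model of prefix-based transport lookup (not the real requests code).

    Handlers are registered per URL prefix; the longest matching prefix wins,
    and an unregistered scheme fails the lookup, much like the KeyError above.
    """

    def __init__(self):
        self.handlers = {}

    def mount(self, prefix, handler):
        self.handlers[prefix] = handler

    def get(self, url):
        # Check longer prefixes first so a more specific mount takes priority.
        for prefix in sorted(self.handlers, key=len, reverse=True):
            if url.lower().startswith(prefix.lower()):
                return self.handlers[prefix](url)
        raise KeyError(url.split(':', 1)[0])


dispatcher = SchemeDispatcher()
dispatcher.mount('http://', lambda url: 'http handler for ' + url)

try:
    dispatcher.get('file:///tmp/dummy.txt')
except KeyError as err:
    print('lookup failed:', err)  # lookup failed: 'file'

# Registering a handler for the missing prefix plays the role that mounting
# a transport adapter on 'file://' plays for a requests Session.
dispatcher.mount('file://', lambda url: 'file handler for ' + url)
result = dispatcher.get('file:///tmp/dummy.txt')
```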

WooParadog answered Sep 25 '22 21:09


In a recent project, I had the same issue. Since requests doesn't support the "file" scheme, I patch our code to load the content locally. First, I define a function to replace requests.get:

import six

def local_get(url, **kwargs):
    "Fetch a stream from local files."
    p_url = six.moves.urllib.parse.urlparse(url)
    if p_url.scheme != 'file':
        raise ValueError("Expected file scheme")

    filename = six.moves.urllib.request.url2pathname(p_url.path)
    # Note: this returns a raw file object, not a requests.Response
    return open(filename, 'rb')

Then, somewhere in test setup or decorating the test function, I use mock.patch to patch the get function on requests:

@mock.patch('requests.get', local_get)
def test_handle_remote_file(self):
    ...

This technique is somewhat brittle -- it doesn't help if the underlying code calls requests.request or constructs a Session and calls that. There may be a way to patch requests at a lower level to support file: URLs, but in my initial investigation, there didn't seem to be an obvious hook point, so I went with this simpler approach.
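
The local_get above returns a plain file object, so code that reads response.content (as handle_remote_file in the question does) would fail under the patch. A stdlib-only sketch of a replacement that returns a minimal response-like object instead; FakeResponse and local_get_response are hypothetical names, and only the url, content and status_code attributes of a real Response are imitated:

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


@dataclass
class FakeResponse:
    """Hypothetical stand-in exposing only the Response attributes a test reads."""
    url: str
    content: bytes
    status_code: int = 200


def local_get_response(url, **kwargs):
    """Hypothetical drop-in for requests.get that serves file:// URLs.

    Keyword arguments such as timeout or headers are accepted and ignored,
    so call sites that pass them keep working while patched.
    """
    parsed = urlparse(url)
    if parsed.scheme != 'file':
        raise ValueError('Expected file scheme, got %r' % parsed.scheme)
    with open(url2pathname(parsed.path), 'rb') as f:
        return FakeResponse(url=url, content=f.read())


# Usage: read back a temporary fixture through its file:// URL.
with tempfile.TemporaryDirectory() as tmp:
    fixture = Path(tmp, 'dummy.txt')
    fixture.write_bytes(b'dummy contents')
    resp = local_get_response(fixture.resolve().as_uri(), timeout=5)
```

It would be applied the same way, e.g. @mock.patch('requests.get', local_get_response), and it shares the brittleness described above: code that goes through requests.request or a Session is not affected.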

Jason R. Coombs answered Sep 25 '22 21:09