Pandas has a very convenient ability to read CSV and other formats from URLs. However, when the data is protected by simple HTTP authentication, Pandas cannot prompt the user for the authentication details (user ID, password). What is the best way to work around this limitation?
What I am currently doing is:
import pandas as pd
import requests
from requests.auth import HTTPBasicAuth
response = requests.get('http://my.data.url/metrics/crawler/counts', auth=HTTPBasicAuth('userid', 'password'), stream=True)
pd.read_csv(response.raw)
Is there a better way?
Since pandas 1.2, pandas.read_csv has a storage_options argument:
Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib as header options. For other URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are forwarded to fsspec. Please see fsspec and urllib for more details.
For HTTP(S) URLs, storage_options controls the request headers, so one can construct a Basic Auth Authorization header and pass it there.
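To make the header construction concrete, here is a minimal sketch using only the standard library. The helper name basic_auth_header is hypothetical (it is not part of pandas or urllib), and the URL and the user:pass credentials are placeholders. The urllib Request is built only to show which header would be sent, on the assumption that pandas attaches the key-value pairs to a urllib request as the quoted documentation describes; no network access happens here.

```python
from base64 import b64encode
from urllib.request import Request


def basic_auth_header(username: str, password: str) -> dict:
    """Hypothetical helper: build an HTTP Basic Auth Authorization header."""
    # Basic Auth is the word "Basic", a space, then base64("username:password").
    token = b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Placeholder credentials, matching the dummy server used in this answer.
headers = basic_auth_header("user", "pass")

# Constructing a Request does not open a connection; it only records the headers.
request = Request("http://localhost:8000/test.csv", headers=headers)
print(request.get_header("Authorization"))  # Basic dXNlcjpwYXNz
```

The resulting dict could then be passed as storage_options=basic_auth_header('user', 'pass') to pandas.read_csv. Returning a str value rather than bytes keeps the header readable when printed or logged.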
Run an HTTP server protected with user:pass Basic Auth, serving a dummy CSV file, test.csv:
docker run --rm -it -p 8000:8000 python:2-alpine sh -c \
  "echo -e 'key,value\nfoo,1\nbar,2' > test.csv \
   && pip install https://github.com/smarzola/extended_http_server/archive/e2006102.zip \
   && ext_http_server -a user:pass"
Python code to read it via pandas.read_csv would look like the following.
In [1]: import pandas
In [2]: try:
   ...:   pandas.read_csv('http://localhost:8000/test.csv')
   ...: except Exception as ex:
   ...:   print(ex)
HTTP Error 401: Unauthorized
In [3]: from base64 import b64encode
   ...: 
   ...: pandas.read_csv(
   ...:   'http://localhost:8000/test.csv',
   ...:   storage_options={'Authorization': b'Basic %s' % b64encode(b'user:pass')},
   ...: )
Out[3]: 
   key  value
0  foo      1
1  bar      2