Handling HTTP authentication when accessing remote URLs via pandas

Tags: python, pandas

Pandas can conveniently read CSV and other formats directly from URLs. However, when the data is protected by simple HTTP authentication, pandas cannot prompt the user for the authentication details (user ID, password). What is the best way to work around this limitation?

What I am currently doing is:

response = requests.get('http://my.data.url/metrics/crawler/counts', auth=HTTPBasicAuth('userid', 'password'), stream=True)
pd.read_csv(response.raw)

Is there a better way?

fccoelho asked Oct 09 '15 13:10


1 Answer

Since pandas 1.2, pandas.read_csv has a storage_options argument, which the documentation describes as:

Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib as header options. For other URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are forwarded to fsspec. Please see fsspec and urllib for more details.

For HTTP(S) URLs, storage_options sets the request headers, so one can construct a Basic Auth Authorization header and pass it there.
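
A minimal sketch of building that header with only the standard library. The helper name basic_auth_header is hypothetical (it is not part of pandas), and it returns str values rather than bytes:

```python
import base64


def basic_auth_header(user, password):
    """Build a headers dict carrying an HTTP Basic Auth Authorization header."""
    # Basic Auth is "user:password", base64-encoded and prefixed with "Basic ".
    credentials = f"{user}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


print(basic_auth_header("user", "pass"))
# → {'Authorization': 'Basic dXNlcjpwYXNz'}
```

The returned dict can be passed as storage_options to pandas.read_csv (pandas 1.2 or later). Base64 is an encoding, not encryption, so the credentials are only protected in transit when the URL uses HTTPS.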

To try it out, run an HTTP server protected with user:pass Basic Auth that serves a dummy CSV file, test.csv.

docker run --rm -it -p 8000:8000 python:2-alpine sh -c \
  "echo -e 'key,value\nfoo,1\nbar,2' > test.csv \
   && pip install https://github.com/smarzola/extended_http_server/archive/e2006102.zip \
   && ext_http_server -a user:pass"

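Python code to read it via pandas.read_csv looks like the following. The first call, without credentials, fails with a 401; the second passes the Authorization header and succeeds.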

In [1]: import pandas

In [2]: try:
   ...:   pandas.read_csv('http://localhost:8000/test.csv')
   ...: except Exception as ex:
   ...:   print(ex)
HTTP Error 401: Unauthorized

In [3]: from base64 import b64encode
   ...: 
   ...: pandas.read_csv(
   ...:   'http://localhost:8000/test.csv',
   ...:   storage_options={'Authorization': b'Basic %s' % b64encode(b'user:pass')},
   ...: )
Out[3]: 
   key  value
0  foo      1
1  bar      2
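
For pandas versions older than 1.2, where storage_options is not available, one possible fallback is to send the same header with urllib and hand the downloaded bytes to pandas.read_csv. This is only a sketch under those assumptions: the function names build_basic_auth_request and read_csv_basic_auth are hypothetical and not part of pandas.

```python
import base64
import io
import urllib.request


def build_basic_auth_request(url, user, password):
    """Return a urllib Request carrying a Basic Auth Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def read_csv_basic_auth(url, user, password, **read_csv_kwargs):
    """Download a CSV protected by Basic Auth and parse it with pandas."""
    import pandas  # imported lazily so the request helper works without pandas

    request = build_basic_auth_request(url, user, password)
    with urllib.request.urlopen(request) as response:
        # Read the whole body into memory, then let pandas parse the bytes.
        return pandas.read_csv(io.BytesIO(response.read()), **read_csv_kwargs)
```

Against the test server above, this would be called as read_csv_basic_auth('http://localhost:8000/test.csv', 'user', 'pass'). The whole response is buffered in memory, which is fine for small files but may not suit very large ones.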
saaj answered Nov 13 '22 12:11