Pandas has a very convenient ability to read CSV and other formats from URLs. However, when the data is protected by simple HTTP authentication, Pandas cannot prompt the user for the authentication details (user ID, password). What is the best way to work around this limitation?
What I am currently doing is:
response = requests.get('http://my.data.url/metrics/crawler/counts', auth=HTTPBasicAuth('userid', 'password'), stream=True)
pd.read_csv(response.raw)
Is there a better way?
Since pandas 1.2, pandas.read_csv has a storage_options argument:
Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib as header options. For other URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are forwarded to fsspec. Please see fsspec and urllib for more details.
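To make "forwarded to urllib as header options" concrete, here is a rough stdlib sketch of what that amounts to for an HTTP(S) URL: the dict becomes the headers of a urllib.request.Request. This is an approximation for illustration, not pandas' actual code; build_request is a hypothetical name and the header values are placeholders. No network access is needed because the request is only built, not sent.

```python
import urllib.request


def build_request(url, storage_options):
    """Attach storage_options key-value pairs as HTTP request headers.

    Hypothetical helper approximating how pandas treats storage_options
    for HTTP(S) URLs; pandas' internal code differs in detail.
    """
    return urllib.request.Request(url, headers=storage_options)


req = build_request(
    "http://localhost:8000/test.csv",
    {"Authorization": "Basic dXNlcjpwYXNz", "User-Agent": "my-script"},
)
# urllib normalizes header names with str.capitalize(), e.g. "User-agent"
print(req.get_header("Authorization"))  # Basic dXNlcjpwYXNz
print(req.get_header("User-agent"))  # my-script
```

Because any key-value pair ends up as a header, the same mechanism also covers token-based schemes (for example a "Bearer" Authorization header), not only Basic Auth.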
For HTTP(S) URLs, storage_options therefore controls the request headers, so one can construct a Basic Auth Authorization header.
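A Basic Auth header is just "Basic " followed by the base64 encoding of "user:password" (RFC 7617). A small sketch of a helper that builds the storage_options dict from credentials; basic_auth_options is a hypothetical name, and encoding the credentials as UTF-8 is an assumption (servers that expect another charset would need a different encoding). It returns a str value, which avoids the bytes formatting used in the session below.

```python
from base64 import b64encode


def basic_auth_options(user, password):
    """Hypothetical helper: build storage_options carrying HTTP Basic Auth.

    RFC 7617 joins the credentials as "user:password", so the user id
    itself must not contain a colon. UTF-8 encoding is an assumption.
    """
    if ":" in user:
        raise ValueError("Basic Auth user ids cannot contain ':'")
    token = b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


print(basic_auth_options("user", "pass"))
# {'Authorization': 'Basic dXNlcjpwYXNz'}
```

The result can be passed straight through, e.g. pandas.read_csv(url, storage_options=basic_auth_options("user", "pass")).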
To demonstrate, run an HTTP server protected with Basic Auth (user:pass) that serves a dummy CSV file, test.csv:
docker run --rm -it -p 8000:8000 python:2-alpine sh -c \
"echo -e 'key,value\nfoo,1\nbar,2' > test.csv \
&& pip install https://github.com/smarzola/extended_http_server/archive/e2006102.zip \
&& ext_http_server -a user:pass"
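If Docker is not at hand, a stdlib-only Python 3 stand-in can serve the same dummy file. This is a sketch under the assumption that only /test.csv and the fixed credentials user:pass matter; it is not the extended_http_server used above, and BasicAuthCSVHandler and start_server are hypothetical names.

```python
import base64
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

CSV_BODY = b"key,value\nfoo,1\nbar,2\n"
# The only Authorization header value this sketch accepts (user:pass).
EXPECTED = "Basic " + base64.b64encode(b"user:pass").decode("ascii")


class BasicAuthCSVHandler(BaseHTTPRequestHandler):
    """Serve CSV_BODY at /test.csv, but only with the right Basic Auth header."""

    def do_GET(self):
        if self.headers.get("Authorization") != EXPECTED:
            # Challenge the client, as a Basic Auth protected server would.
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="test"')
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        if self.path != "/test.csv":
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/csv")
        self.send_header("Content-Length", str(len(CSV_BODY)))
        self.end_headers()
        self.wfile.write(CSV_BODY)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def start_server(port=8000):
    """Start the server in a background daemon thread and return it."""
    server = HTTPServer(("localhost", port), BasicAuthCSVHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    # Serve in the foreground on port 8000, like the Docker command.
    HTTPServer(("localhost", 8000), BasicAuthCSVHandler).serve_forever()
```

Running the file directly serves http://localhost:8000/test.csv until interrupted; start_server(0) picks a free port, which is handy in tests.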
The Python code to read it via pandas.read_csv looks like the following:
In [1]: import pandas

In [2]: try:
   ...:     pandas.read_csv('http://localhost:8000/test.csv')
   ...: except Exception as ex:
   ...:     print(ex)
   ...:
HTTP Error 401: Unauthorized

In [3]: from base64 import b64encode
   ...:
   ...: pandas.read_csv(
   ...:     'http://localhost:8000/test.csv',
   ...:     storage_options={'Authorization': b'Basic %s' % b64encode(b'user:pass')},
   ...: )
Out[3]:
   key  value
0  foo      1
1  bar      2
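Since pandas itself will not prompt for credentials, the prompting can happen just before the call. A minimal sketch, assuming an interactive terminal: read_csv_with_prompt is a hypothetical helper that asks for whatever credentials were not passed in (the password via getpass, so it is not echoed) and forwards them to pandas.read_csv as a Basic Auth header through storage_options (pandas 1.2 or later).

```python
import getpass
from base64 import b64encode

import pandas as pd


def read_csv_with_prompt(url, user=None, password=None, **kwargs):
    """Hypothetical helper: ask for Basic Auth credentials, then read the CSV.

    Missing credentials are prompted for interactively; extra keyword
    arguments are passed on to pandas.read_csv unchanged.
    """
    if user is None:
        user = input("User: ")
    if password is None:
        password = getpass.getpass("Password: ")  # not echoed to the terminal
    token = b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return pd.read_csv(
        url, storage_options={"Authorization": f"Basic {token}"}, **kwargs
    )


# Example, assuming the demo server from above is listening on port 8000:
# df = read_csv_with_prompt("http://localhost:8000/test.csv")
```

Keeping the credentials out of the source code is the main benefit here; for non-interactive jobs, passing user and password explicitly (for example from environment variables) skips the prompts.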