I'm trying to make API calls on the consumer complaint dataset, available online (https://data.consumerfinance.gov/dataset/Consumer-Complaints/s6ew-h6mp), with the SodaPy library (https://github.com/xmunoz/sodapy). I just want to get the CSV data; the webpage says it has 906182 rows.
I've followed the example on GitHub as best as I can, but it's just not working. Here's the code:
from sodapy import Socrata
client = Socrata("data.consumerfinance.gov", "apptoken", username="myusername", password="mypassword")
results = client.get("s6ew-h6mp")
I want to get the entire dataset, but I keep getting the following error:
ReadTimeout: HTTPSConnectionPool(host='data.consumerfinance.gov', port=443): Read timed out. (read timeout=10)
Any clues on how to work through this?
By default, the Socrata client times out after 10 seconds, which matches the "read timeout=10" in your error.
You can increase the limit by updating the client's 'timeout' instance variable, like so:
from sodapy import Socrata
client = Socrata("data.consumerfinance.gov", "apptoken", username="myusername", password="mypassword")
# change the timeout variable to an arbitrarily large number of seconds
client.timeout = 50
results = client.get("s6ew-h6mp")
It's possible that the connection is timing out because the dataset is too large to return in a single request. You can try to download a subset of the data using the limit option, e.g.:
results = client.get("s6ew-h6mp", limit=1000)
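If the goal is still the entire dataset, one option is to combine limit with an offset and fetch it page by page, so that no single request has to return all the rows. The following is only a sketch: fetch_all_rows and fetch_page are hypothetical names (not part of sodapy), and it assumes the server returns a short or empty page once the data runs out.

```python
def fetch_all_rows(fetch_page, page_size=1000):
    """Collect rows by requesting fixed-size pages until a short page arrives.

    fetch_page is a hypothetical callable taking `limit` and `offset`
    keyword arguments and returning a list of rows.
    """
    rows = []
    offset = 0
    while True:
        page = fetch_page(limit=page_size, offset=offset)
        rows.extend(page)
        # A page shorter than page_size (including an empty one) means
        # there is nothing left to fetch.
        if len(page) < page_size:
            break
        offset += page_size
    return rows


# Possible usage with the client from the answer above (not run here):
# results = fetch_all_rows(
#     lambda limit, offset: client.get(
#         "s6ew-h6mp", limit=limit, offset=offset, order=":id"
#     )
# )
```

Ordering by ":id" is there to keep the row order stable between pages; treat that, and the page size, as assumptions to adjust for your case.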
You can also query subsets of the data using SoQL keywords.
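As a rough illustration of the SoQL approach, the sketch below passes select, where, order and limit keywords to client.get. The helper name fetch_subset is hypothetical, and the column names (product, company, state, date_received) are assumptions that should be checked against the dataset's actual columns.

```python
def fetch_subset(client, dataset_id, max_rows=1000):
    """Request a filtered, trimmed-down slice of a dataset.

    `client` is expected to be a sodapy Socrata client. The column names
    below are assumed for illustration and may not match the real dataset.
    """
    return client.get(
        dataset_id,
        select="product, company, state",  # return only these columns
        where="state = 'CA'",              # keep only matching rows
        order="date_received DESC",        # newest rows first
        limit=max_rows,                    # cap the response size
    )


# Possible usage with the client from the answer above (not run here):
# results = fetch_subset(client, "s6ew-h6mp")
```

Selecting fewer columns and filtering on the server keeps each response small, which makes a read timeout less likely.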
Otherwise, the sodapy module is built on the requests module, so the requests documentation could also be useful.