I want to download the file at the URL below using Python. I tried the following code, but it does not work. I think the problem is related to the file format. I would be glad if you could suggest modifications to my code, or new code that I can use for this purpose.
Link to the website
https://www.gov.uk/government/statistics/transport-use-during-the-coronavirus-covid-19-pandemic
URL required to be downloaded
https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/959864/COVID-19-transport-use-statistics.ods
My Code
from urllib import request
response = request.urlopen("https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/959864/COVID-19-transport-use-statistics.ods")
csv = response.read()
csvstr = str(csv).strip("b'")
lines = csvstr.split("\\n")
f = open("historical.csv", "w")
for line in lines:
    f.write(line + "\n")
f.close()
Basically, I only want to download the file. I have heard that BeautifulSoup can be used for this, but I don't have much experience with it. Any code that serves my purpose would be highly appreciated.
Thanks
To download the file:
In [1]: import requests
In [2]: url = 'https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/959864/COVID-19-transport-use-statistics.ods'
In [3]: with open('COVID-19-transport-use-statistics.ods', 'wb') as out_file:
   ...:     content = requests.get(url, stream=True).content
   ...:     out_file.write(content)
You can then read the file with pandas-ods-reader. Install it first:
pip install pandas-ods-reader
Then:
In [4]: from pandas_ods_reader import read_ods
In [5]: df = read_ods('COVID-19-transport-use-statistics.ods', 1)
In [6]: df
Out[6]:
                   Department for Transport statistics  ...   unnamed.9
0    https://www.gov.uk/government/statistics/trans...  ...        None
1                                                 None  ...        None
2    Use of transport modes: Great Britain, since 1...  ...        None
3    Figures are percentages of an equivalent day o...  ...        None
4                                                 None  ...  Percentage
..                                                 ...  ...         ...
390                  Transport for London Tube and Bus  ...        None
391                               Buses (excl. London)  ...        None
392                                            Cycling  ...        None
393                                  Any other queries  ...        None
394                                    Media enquiries  ...        None
You can then save it as a CSV, if that is what you want, using df.to_csv('my_data.csv', index=False)
I see that you are just trying to download the file, which is in .ods format. Saving it with a .csv extension won't convert it into a CSV file.
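To illustrate why the extension alone changes nothing: an .ods file is a ZIP archive that carries a `mimetype` entry, so its bytes are not text lines at all. A minimal sketch of checking this with the standard library follows; the helper name `looks_like_ods` is made up for this example and is not part of any library.

```python
import zipfile

# MIME type that an OpenDocument spreadsheet declares in its "mimetype" entry
ODS_MIMETYPE = "application/vnd.oasis.opendocument.spreadsheet"


def looks_like_ods(path):
    """Return True if `path` is a ZIP archive that declares itself as ODS.

    Hypothetical helper for illustration only.
    """
    # A plain CSV (or any text file) is not a ZIP archive, so it fails here
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        # Generic ZIP files have no "mimetype" entry
        if "mimetype" not in archive.namelist():
            return False
        declared = archive.read("mimetype").decode("ascii", errors="replace")
    return declared.strip() == ODS_MIMETYPE
```

Running this on a downloaded .ods that was merely renamed to .csv should still return True, which is a quick way to confirm that the content was never converted.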
The following code will download the file. I have used the requests library, which is a better option than urllib.
import requests
file_url = "https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/959864/COVID-19-transport-use-statistics.ods"
file_data = requests.get(file_url).content
# Open the file in binary write mode, because the downloaded data is raw bytes
with open("historical.ods", "wb") as file:
    file.write(file_data)
The output file can be opened in MS Excel.
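If installing requests is not an option, the same download can be sketched with the standard library alone. The key change from the code in the question is that the bytes are written unchanged in binary mode instead of being passed through str() and split into lines. The function name `download_file` is an assumption made for this example.

```python
import shutil
from urllib import request


def download_file(url, dest_path):
    """Copy the raw bytes served at `url` into `dest_path` without decoding.

    Hypothetical helper for illustration only.
    """
    # urlopen returns a file-like object; copyfileobj copies it in chunks,
    # so the whole file does not need to be held in memory at once
    with request.urlopen(url) as response, open(dest_path, "wb") as out_file:
        shutil.copyfileobj(response, out_file)
    return dest_path
```

Calling download_file(file_url, "historical.ods") with the file_url defined above should produce the same file as the requests version.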