
Python basics - request data from API and write to a file

Tags: python, file-io

I am trying to use the "requests" package to retrieve info from GitHub, as the Requests doc page explains:

import requests
r = requests.get('https://api.github.com/events')

And this:

with open(filename, 'wb') as fd:
    for chunk in r.iter_content(chunk_size):
        fd.write(chunk)

I have to say I don't understand the second code block.

  • filename - in what form do I provide the path to the file, if one is created? Where will it be saved if not?
  • 'wb' - what is this variable? (shouldn't the second parameter be 'mode'?)
  • the following two lines probably iterate over the data retrieved by the request and write it to the file

The explanation in the Python docs is not helping much either.

EDIT: What I am trying to do:

  • use Requests to connect to an API (GitHub and later the Facebook Graph API)
  • retrieve data into a variable
  • write this into a file (later, as I get more familiar with Python, into my local MySQL database)

Alexander Starbuck asked Dec 24 '22 09:12


1 Answer

Filename

When using open, a relative path is resolved against your current working directory. So open('file.txt', 'w') creates a new file named file.txt in whatever directory you ran Python from, which is your script's folder only if you launched the script from there. You can also specify an absolute path, for example /home/user/file.txt on Linux. If a file named file.txt already exists, its contents will be completely overwritten.
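As a minimal sketch of how this resolution works, the snippet below prints where a relative and an absolute name would end up. The helper name resolve_target and the file names are made up for illustration; nothing is written to disk.

```python
import os
import tempfile


def resolve_target(filename):
    """Return the absolute path that open(filename, ...) would use.

    Hypothetical helper for illustration: relative names are resolved
    against the current working directory, absolute ones stay as they are.
    """
    return os.path.abspath(filename)


# A relative name lands in the current working directory.
print(resolve_target("file.txt"))

# An absolute path is used as given, whatever the working directory is.
absolute_name = os.path.join(tempfile.gettempdir(), "file.txt")
print(resolve_target(absolute_name))
```

If the file should always sit next to the script, one common approach is to build the path from the script's own location instead of relying on the working directory.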

Mode

The 'wb' option is indeed the mode. The 'w' means write and the 'b' means binary. You use 'w' when you want to write to (rather than read from) a file, and you use 'b' for binary files (rather than text files). Using 'b' may look a little odd in this case, as the content is text, but iter_content yields bytes, so on Python 3 the file has to be opened in binary mode for that loop to work. Plain 'w' is the right choice when you write a str such as r.text. Read more on the modes in the docs for open.
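On Python 3 the mode also decides which type write() accepts. The following sketch, assuming Python 3 and using a made-up string in place of a real response, writes the same content once as str in text mode and once as bytes in binary mode, then shows what happens when the two are mixed up.

```python
import os
import tempfile

text = "héllo"                # str, like r.text
data = text.encode("utf-8")   # bytes, like r.content or a chunk from iter_content

workdir = tempfile.mkdtemp()  # scratch directory, so no real file is overwritten
text_path = os.path.join(workdir, "as_text.txt")
bytes_path = os.path.join(workdir, "as_bytes.bin")
mismatch_path = os.path.join(workdir, "mismatch.txt")

# Text mode: pass a str, Python encodes it with the given encoding.
with open(text_path, "w", encoding="utf-8") as fd:
    fd.write(text)

# Binary mode: pass bytes, they are written unchanged.
with open(bytes_path, "wb") as fd:
    fd.write(data)

# Handing bytes to a text-mode file raises TypeError.
mismatch_error = None
try:
    with open(mismatch_path, "w", encoding="utf-8") as fd:
        fd.write(data)
except TypeError as exc:
    mismatch_error = exc
    print("text mode rejected bytes:", exc)
```

Both files end up with identical bytes on disk; the only difference is who does the encoding.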

The Loop

This part uses the iter_content method from requests, which is intended for large downloads that you may not want to hold in memory all at once (it only saves memory if the request was made with stream=True). This is unnecessary in this case, since the page in question is only 89 KB. See the requests library docs for more info.
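To make the chunked pattern concrete, here is a small sketch of the same loop wrapped in a function. The name save_chunks and the fake chunk list are assumptions for illustration; the list stands in for a streamed response body so the example runs without a network call.

```python
import os
import tempfile


def save_chunks(chunks, filename):
    """Write an iterable of bytes objects to filename, one chunk at a time.

    Hypothetical helper: chunks can be any iterable of bytes, for example
    r.iter_content(chunk_size=8192) on a response fetched with stream=True.
    Returns the total number of bytes written.
    """
    written = 0
    with open(filename, "wb") as fd:
        for chunk in chunks:
            written += fd.write(chunk)
    return written


# Stand-in for a streamed response body.
fake_chunks = [b"first ", b"second ", b"third"]
target = os.path.join(tempfile.mkdtemp(), "download.bin")
total = save_chunks(fake_chunks, target)
print(total, "bytes written to", target)
```

With requests, the call would look something like save_chunks(r.iter_content(chunk_size=8192), 'events.json') after r = requests.get(url, stream=True); only one chunk is held in memory at a time.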

Conclusion

The example you are looking at is meant to handle the most general case, in which the remote file might be binary and too big to fit in memory. However, we can make your code more readable and easier to understand if you are only accessing small webpages containing text:

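import requests

# Fetch the page; r.text is the response body decoded to a str
r = requests.get('https://api.github.com/events')

# Text mode matches the str being written; utf-8 avoids platform-default encoding surprises
with open('events.txt', 'w', encoding='utf-8') as fd:
    fd.write(r.text)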
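
Since the goal is to get the data into a variable first, it may help to know that an API like this one returns JSON, which requests can parse with r.json(). The sketch below is an assumption-laden illustration, not part of the original example: save_json is a made-up helper, and the events list is invented sample data standing in for r.json() so that it runs offline.

```python
import json
import os
import tempfile


def save_json(data, filename):
    """Write already-parsed JSON data to filename as indented UTF-8 text."""
    with open(filename, "w", encoding="utf-8") as fd:
        json.dump(data, fd, indent=2)


# Made-up sample standing in for r.json(); real events have many more fields.
events = [{"id": "1", "type": "PushEvent"}, {"id": "2", "type": "WatchEvent"}]

target = os.path.join(tempfile.mkdtemp(), "events.json")
save_json(events, target)

# Reading the file back gives the same Python structure.
with open(target, encoding="utf-8") as fd:
    restored = json.load(fd)
```

Keeping the parsed data as lists and dicts also makes the later step of inserting rows into a database more direct than re-reading raw text.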

TheSchwa answered Jan 08 '23 20:01