How to parallelize Python API calls?

I am developing a program that emails me when artists I follow on Spotify release new music. It does this by getting the number of albums each artist has when the script runs and comparing the results with those from a previous day, saved as a CSV file.

This involves API calls to verify that the artist is on Spotify (I was getting errors that certain albums were not on Spotify) and then to get the number of albums for that artist. These calls are very time-consuming, especially since I have close to a thousand individual artists.

I would like to know how to parallelize these API calls, and I'd welcome any other suggestions for speeding up the overall program. Below is the portion of the code that makes the API calls. Thank you in advance for your time.

# given an artist name, returns the first matching artist from a Spotify search (or None)
def get_artist_info(spotipy_instance, name):
    results = spotipy_instance.search(q='artist:' + name, type='artist')
    items = results['artists']['items']
    if len(items) > 0:
        return items[0]
    else:
        return None

# returns a list of unique album names for the given artist (a search result dict)
def get_artist_albums(spotipy_instance, artist):
    albums = []
    results = spotipy_instance.artist_albums(artist['id'], album_type='album')
    albums.extend(results['items'])
    while results['next']:
        results = spotipy_instance.next(results)
        albums.extend(results['items'])
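    # collect unique album names (the same album can be listed more than once);
    # keep them as str: adding name.encode('utf-8') stored bytes on Python 3,
    # so the `name not in seen` check never matched and the list held bytes
    seen = set()
    for album in albums:
        name = album['name']
        # print(album['name'] + ": " + album['id'])
        if name not in seen:
            seen.add(name)
    return list(seen)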

def get_all_artists_info(spotipy_instance, list_of_all_artists):
    all_artist_info = []
    print("Getting number of albums for all artists")
    # bar = Bar('Loading...', max=len(list_of_all_artists), suffix='%(index)d/%(max)d - %(percent).1f%% - %(eta)ds')
    for artist_name in list_of_all_artists:
        # increment_progress_bar(bar)
        # print(artist_name)
        artist_info = get_artist_info(spotipy_instance, artist_name)
        if artist_info is not None:  
            albums = get_artist_albums(spotipy_instance, artist_info)
            # print(albums)
            artist = Artist(artist_name, len(albums), albums)
            all_artist_info.append(artist)
        else:
            print("\nCan't find " + artist_name)
            artist = Artist(artist_name, -1, [])
            all_artist_info.append(artist)
        # print(" ")
    # bar.finish()
    print("Done!\n")

    all_artist_info.sort(key=lambda artist: artist.name)

    return all_artist_info
asked Apr 05 '18 02:04 by 123423423242134


1 Answer

So basically you have three options here:

  1. Threading
  2. Multiprocessing
  3. Async code (if you are using Python 3.5 or above)

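Threading runs the calls in several OS threads inside one process. Because of CPython's GIL, threads do not execute Python code in parallel, but the GIL is released while a thread waits on the network, so threads still overlap I/O-bound work such as API calls well. The downside is some memory overhead (each thread has its own stack) and context switching done by the operating system, which is less efficient than switching between coroutines on an event loop. Example with the requests-toolbelt threading module: https://toolbelt.readthedocs.io/en/latest/threading.html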
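
A minimal threading sketch follows. It uses the standard library's concurrent.futures.ThreadPoolExecutor rather than the toolbelt library linked above, and fetch_artist is a hypothetical stand-in for the per-artist work (get_artist_info plus get_artist_albums in the question), with time.sleep() simulating the network wait so the sketch runs on its own.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_artist(name):
    """Hypothetical stand-in for one artist's blocking API round trips.

    In the question's program this would call get_artist_info() and
    get_artist_albums() and build an Artist; here time.sleep() simulates
    the network latency.
    """
    time.sleep(0.1)  # simulated network latency
    return name, len(name)  # placeholder "album count"


def fetch_all(names, max_workers=10):
    # executor.map runs fetch_artist in up to max_workers threads and
    # yields results in the same order as the input names.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fetch_artist, names))


if __name__ == "__main__":
    artists = [f"artist{i}" for i in range(20)]
    start = time.perf_counter()
    results = fetch_all(artists)
    elapsed = time.perf_counter() - start
    # 20 simulated calls of 0.1 s take about 2 s one after another;
    # with 10 threads they overlap and finish in a fraction of that.
    print(f"{len(results)} artists in {elapsed:.2f} s")
```

Because executor.map preserves input order, the results can be sorted or written to the CSV exactly as before. The value max_workers=10 is an arbitrary assumption; keep it modest, since the Spotify Web API rate-limits clients. It is also worth checking whether one spotipy client can safely be shared between threads; creating one client per thread is the conservative choice.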

Multiprocessing spawns multiple Python processes, which introduces even more resource overhead, because each process holds a full Python interpreter in memory. Communicating between processes is also less trivial: arguments and results have to be serialized (pickled) and passed from one process to another.

Async is the best fit here if you are using Python 3.5 or above. You can think of it as similar to threading, but with context switching done by the event loop inside a single thread and without the memory overhead of a separate stack per thread. You would need an async HTTP library to do that, such as aiohttp, which runs on top of asyncio. Example usage: https://pawelmhm.github.io/asyncio/python/aiohttp/2016/04/22/asyncio-aiohttp.html
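
A minimal asyncio sketch of the same idea follows (it needs Python 3.7+ for asyncio.run). fetch_artist is a hypothetical coroutine in which asyncio.sleep() stands in for awaiting a real async HTTP request, for example through an aiohttp session, and the semaphore limit of 10 is an assumption. Note that spotipy is a synchronous library, so its methods cannot be awaited directly; going fully async means calling the Spotify Web API through an async client, or on Python 3.9+ pushing the blocking spotipy calls into threads with asyncio.to_thread, which brings threads back in.

```python
import asyncio
import time


async def fetch_artist(name, semaphore):
    """Hypothetical stand-in for one non-blocking HTTP request.

    With a real async client the awaited call would be the network
    request; asyncio.sleep() simulates it here.
    """
    async with semaphore:  # cap the number of requests in flight
        await asyncio.sleep(0.1)  # simulated network latency
        return name, len(name)  # placeholder "album count"


async def fetch_all(names, limit=10):
    semaphore = asyncio.Semaphore(limit)
    # gather runs all coroutines concurrently on one event loop in one
    # thread and returns their results in the same order as names.
    return await asyncio.gather(*(fetch_artist(n, semaphore) for n in names))


if __name__ == "__main__":
    artists = [f"artist{i}" for i in range(20)]
    start = time.perf_counter()
    results = asyncio.run(fetch_all(artists))
    print(f"{len(results)} artists in {time.perf_counter() - start:.2f} s")
```

The semaphore plays the role that max_workers plays in a thread pool: without it, gather would start every request at once, which is an easy way to hit API rate limits.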

So, in summary, from the best option to the worst:

  • Async
  • Threading
  • Multiprocessing
answered Oct 27 '22 15:10 by Quba