
Looping through a list of pandas dataframes

Two quick pandas questions for you.

  1. I have a list of dataframes I would like to apply a filter to.

    countries = [us, uk, france]
    for df in countries:
        df = df[(df["Send Date"] > '2016-11-01') & (df["Send Date"] < '2016-11-30')] 
    

    When I run this, the df's don't change afterwards. Why is that? If I loop through the dataframes to create a new column, as below, this works fine, and changes each df in the list.

     for df in countries:
          df["Continent"] = "Europe"
    
  2. As a follow-up question, I noticed something strange when I created a list of dataframes for different countries. I defined the list, then applied transformations to each df in the list. After I transformed these dfs, I called the list again. I was surprised to see that the list still pointed to the unchanged dataframes, and I had to redefine the list to update the results. Could anybody shed any light on why that is?

Iwan Thomas asked Jan 23 '17 17:01





1 Answer

Taking a look at this answer, you can see that for df in countries: is equivalent to something like

for idx in range(len(countries)):
    df = countries[idx]
    # do something with df

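which won't modify anything in your list: assigning to df only rebinds the loop variable to a new, filtered DataFrame, while the list element (and the name us, uk or france) still refers to the original object. Adding a column with df["Continent"] = "Europe" works because it changes that original object in place instead of rebinding the name. It is also generally bad practice to modify a list while iterating over it in a loop like this.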
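To make the difference concrete, here is a minimal sketch with made-up data. Only the "Send Date" and "Continent" column names come from the question; the sample dates and the helper names (start, end, rows_after_rebinding, has_continent) are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
us = pd.DataFrame({"Send Date": pd.to_datetime(["2016-10-15", "2016-11-10"])})
uk = pd.DataFrame({"Send Date": pd.to_datetime(["2016-11-20", "2016-12-05"])})
countries = [us, uk]

start, end = pd.Timestamp("2016-11-01"), pd.Timestamp("2016-11-30")

for df in countries:
    # Rebinding: `df` now names a new, filtered DataFrame.
    # The list slot still references the original, unfiltered object.
    df = df[(df["Send Date"] > start) & (df["Send Date"] < end)]

# Each frame in the list still has both of its rows.
rows_after_rebinding = [len(d) for d in countries]

for df in countries:
    # In-place mutation: adds a column to the very object the list references.
    df["Continent"] = "Europe"

# The new column is visible through the list and through `us` / `uk`.
has_continent = ["Continent" in d.columns for d in countries]
```

The first loop leaves the list untouched because it only changes what the name df points to; the second loop changes the objects themselves.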

A better approach would be a list comprehension. You can try something like

 countries = [us, uk, france]
 countries = [df[(df["Send Date"] > '2016-11-01') & (df["Send Date"] < '2016-11-30')]
              for df in countries] 

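Notice that with a list comprehension like this, we aren't actually modifying the original list. Instead, we are creating a new list and assigning it to the variable that held our original list. The same reasoning covers your second question: the list stores references to the DataFrame objects, so a transformation that returns a new DataFrame (rather than changing one in place) leaves the list pointing at the old objects until you rebuild it.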
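As a rough illustration of that point, the sketch below checks that the comprehension produces a new list and that the individual names still refer to the unfiltered frames. The sample data and the names original_list, is_new_list, original_rows and filtered_rows are assumptions, not part of the question.

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
us = pd.DataFrame({"Send Date": pd.to_datetime(["2016-10-15", "2016-11-10"])})
uk = pd.DataFrame({"Send Date": pd.to_datetime(["2016-11-20", "2016-12-05"])})
countries = [us, uk]
original_list = countries  # second reference to the same list object

start, end = pd.Timestamp("2016-11-01"), pd.Timestamp("2016-11-30")

# Build a new list of filtered frames and rebind `countries` to it.
countries = [
    df[(df["Send Date"] > start) & (df["Send Date"] < end)]
    for df in countries
]

is_new_list = countries is not original_list     # a different list object
original_rows = [len(d) for d in original_list]  # old frames are unchanged
filtered_rows = [len(d) for d in countries]      # new frames are filtered

# `us` and `uk` still name the unfiltered frames; rebind them if needed.
us, uk = countries
```

If you still want to use the individual names afterwards, unpacking the new list back into them (the last line) keeps the names and the list in sync.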

Also, you might consider placing all of your data in a single DataFrame with an additional country column or something along those lines. Python-level loops are generally slower, and a list of DataFrames is often much less convenient to work with than a single DataFrame, which can fully leverage the vectorized pandas methods.
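One possible way to do that is sketched below, assuming the frames share the same columns. The "country" column name, the sample dates and the variable names combined, november and rows_per_country are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the "Send Date" column comes from the question.
us = pd.DataFrame({"Send Date": pd.to_datetime(["2016-10-15", "2016-11-10"])})
uk = pd.DataFrame({"Send Date": pd.to_datetime(["2016-11-20", "2016-12-05"])})
france = pd.DataFrame({"Send Date": pd.to_datetime(["2016-11-29", "2016-11-30"])})

frames = {"us": us, "uk": uk, "france": france}

# Stack the frames and record each row's origin in a "country" column
# (the column name is an assumption).
combined = pd.concat(
    [df.assign(country=name) for name, df in frames.items()],
    ignore_index=True,
)

start, end = pd.Timestamp("2016-11-01"), pd.Timestamp("2016-11-30")

# One vectorized filter over all countries instead of a loop over frames.
november = combined[(combined["Send Date"] > start) & (combined["Send Date"] < end)]

# Per-country results are then a groupby away.
rows_per_country = november.groupby("country").size().to_dict()
```

With everything in one frame, the date filter runs once, and per-country work can go through groupby("country") rather than a Python loop.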

miradulo answered Oct 06 '22 23:10
