
Splitting a DataFrame into several DataFrames

I have one DataFrame where different rows can have the same value for one column.
As an example:

import numpy as np
import pandas as pd

# "City" is listed first so the column order matches the output shown below
df = pd.DataFrame({
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
    "Name": ["Alice", "Bob", "John", "Mark", "Emma", "Mary"],
})

     City       Name
0    Seattle    Alice
1    Seattle    Bob
2    Portland   John
3    Seattle    Mark
4    Seattle    Emma
5    Portland   Mary

Here, a given value for "City" (e.g. "Portland") is shared by several rows.

I want to split this data frame into several data frames, each containing the rows that share one value of that column. For the example above, I want to get the following data frames:

     City       Name
0    Seattle    Alice
1    Seattle    Bob
3    Seattle    Mark
4    Seattle    Emma

and

     City       Name
2    Portland   John
5    Portland   Mary

Following this answer, I create a mask that can be used to generate one data frame:

def mask_with_in1d(df, column, val):
    mask = np.in1d(df[column].values, [val])
    return df[mask]

# Return the last data frame above
mask_with_in1d(df, 'City', 'Portland')

The problem is creating all the data frames efficiently and assigning a name to each one. I am doing it this way:

unique_values = np.sort(df['City'].unique())
for city_value in unique_values:
    exec("df_{0} = mask_with_in1d(df, 'City', '{0}')".format(city_value))

which gives me the data frames df_Seattle and df_Portland that I can further manipulate.

Is there a better way of doing this?

unfolx asked Jan 30 '23 04:01



1 Answer

Do you have a fixed list of cities you want to do this for? The simplest solution is to group by city and then loop over the groups:

for city, names in df.groupby("City"):
    print(city)
    print(names)

Portland
       City  Name
2  Portland  John
5  Portland  Mary
Seattle
      City   Name
0  Seattle  Alice
1  Seattle    Bob
3  Seattle   Mark
4  Seattle   Emma

You could then assign each group to a dictionary or similar (df_city[city] = names) if you want df_city["Portland"] to work. It depends on what you want to do with the groups once they are split.
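The dictionary idea could look roughly like the sketch below. It rebuilds the example frame from the question so that it runs on its own; df_city is the name used in the sentence above, and the dict comprehension is just one possible way to fill it, not the only one.

```python
import pandas as pd

df = pd.DataFrame({
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
    "Name": ["Alice", "Bob", "John", "Mark", "Emma", "Mary"],
})

# Iterating over a groupby yields (key, sub-frame) pairs,
# so each city becomes a dictionary key mapped to its own DataFrame.
df_city = {city: group for city, group in df.groupby("City")}

# Look up one city by name; the original row index is preserved.
print(df_city["Portland"])
```

Compared with creating variables through exec, a dictionary keeps working when a city name is not a valid Python identifier (for example, one containing a space), and the full set of groups can be iterated with df_city.items().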

Ken Syme answered Jan 31 '23 18:01
