I have one DataFrame
where different rows can have the same value for one column.
As an example:
import pandas as pd
df = pd.DataFrame( {
"Name" : ["Alice", "Bob", "John", "Mark", "Emma" , "Mary"] ,
"City" : ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"] } )
City Name
0 Seattle Alice
1 Seattle Bob
2 Portland John
3 Seattle Mark
4 Seattle Emma
5 Portland Mary
Here, a given value for "City" (e.g. "Portland") is shared by several rows.
I want to create from this data frame several data frames that have in common the value of one column. For the example above, I want to get the following data frames:
City Name
0 Seattle Alice
1 Seattle Bob
3 Seattle Mark
4 Seattle Emma
and
City Name
2 Portland John
5 Portland Mary
From this answer, I am creating a mask that can be used to generate one data frame:
def mask_with_in1d(df, column, val):
mask = np.in1d(df[column].values, [val])
return df[mask]
# Return the last data frame above
mask_with_in1d(df, 'City', 'Portland')
The problem is to create efficiently all data frames, to which a name will be assigned. I am doing it this way:
unique_values = np.sort(df['City'].unique())
for city_value in unique_values:
exec("df_{0} = mask_with_in1d(df, 'City', '{0}')".format(city_value))
which gives me the data frames df_Seattle
and df_Portland
that I can further manipulate.
Is there a better way of doing this?
Have you got a fixed list of cities you want to do this for? Simplest solution is to group by city and can then loop over the groups
for city, names in df.groupby("City"):
print(city)
print(names)
Portland
City Name
2 Portland John
5 Portland Mary
Seattle
City Name
0 Seattle Alice
1 Seattle Bob
3 Seattle Mark
4 Seattle Emma
Could then assign to a dictionary or some such (df_city[city] = names
) if you wanted df_city["Portland"]
to work. Depends what you want to do with the groups once split.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With