I have a panda dataframe
City State
0 Cambridge MA
1 NaN DC
2 Boston MA
3 Washignton DC
4 NaN MA
5 Tampa FL
6 Danvers MA
7 Miami FL
8 Cambridge MA
9 Miami FL
10 NaN FL
11 Washington DC
I want to fill NaNs based on most frequent state if the state appears before so I group by state and apply the following code:
df['City'] = df.groupby('State').transform(lambda x:x.fillna(x.value_counts().idxmax()))
The above code works for if all states have occurred before the output will be
City State
0 Cambridge MA
1 Washignton DC
2 Boston MA
3 Washignton DC
4 Cambridge MA
5 Tampa FL
6 Danvers MA
7 Miami FL
8 Cambridge MA
9 Miami FL
10 Miami FL
11 Washington DC
However I want to add a condtion so that if a state never occur its city will be the most frequent in the entire City column ie if the dataframe is
City State
0 Cambridge MA
1 NaN DC
2 Boston MA
3 Washignton DC
4 NaN MA
5 Tampa FL
6 Danvers MA
7 Miami FL
8 Cambridge MA
9 Miami FL
10 NaN FL
11 Washington DC
12 NaN NY
NY has never occurred before I want output to be
City State
0 Cambridge MA
1 Washignton DC
2 Boston MA
3 Washignton DC
4 Cambridge MA
5 Tampa FL
6 Danvers MA
7 Miami FL
8 Cambridge MA
9 Miami FL
10 Miami FL
11 Washington DC
12 Cambridge NY
The code above gives a ValueError: ('attempt to get argmax of an empty sequence') because "NY" has never occurred before.
IIUC:
def f(x):
if x.count()<=0:
return np.nan
return x.value_counts().index[0]
df['City'] = df.groupby('State')['City'].transform(f)
df['City'] = df['City'].fillna(df['City'].value_counts().idxmax())
Output:
City State
0 Cambridge MA
1 Washignton DC
2 Cambridge MA
3 Washignton DC
4 Cambridge MA
5 Miami FL
6 Cambridge MA
7 Miami FL
8 Cambridge MA
9 Miami FL
10 Miami FL
11 Washignton DC
12 Cambridge NY
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With