I'm working with a small dataframe (9K rows) that has county and city fields (i.e., columns). Some rows have NaN values for the county but have county names in the city column.
I can find these by doing
filter1 = df['county'].isna()
filter2 = df['city'].str.contains('County')
df[filter1 & filter2]
For rows that match these criteria, I want to assign the city value to the county field. This Stack Overflow question discusses how to do this, but I can't get any of the examples to work.
I've tried a variety of things:
1. df[filter1 & filter2]['county'] = df[filter1 & filter2]['city']
2. df['county'] = df.apply(lambda x: x['city'] if ( x['county'].isna() & 'County' in x['city'] ) else x['county'] , axis=1 )
3. df['county'] = np.where(df['county']=='unknown' & df['city'].str.contains('County'), df['city'], df['county'] )
I get a variety of errors with each attempt. The comments gave helpful feedback, but I couldn't figure out the syntax for using .loc for this. I got as far as
df.loc[df['county'].isna() & df['city'].str.contains('County'), 'county'] = ???
I don't know if the .loc approach would work, but even if it does, how do I specify the city field of the corresponding row on the right-hand side?
I thought this might be because county is a string field, so the lambda function objected to calling .isna() on it. I changed the NaNs to "unknown" but I still get the error. (I've seen a lot of comments about the slowness of .apply(), but on a 9K-row table it should be fine.)
This seems like a pretty basic operation, but I can't figure out the correct syntax to make the change. What am I doing wrong?
You can obtain the indices of the rows matched by your filters and then use .loc to make the desired change:
filter1 = df['county'].isna()
filter2 = df['city'].str.contains('County')
idx = df[filter1 & filter2].index
df.loc[idx, "county"] = df.loc[idx, "city"]
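Here is a self-contained sketch of the same fix on a tiny made-up DataFrame (the data is hypothetical, not your 9K-row table). One caveat worth hedging: if the city column itself contains NaN, str.contains returns NaN for those rows, which breaks boolean indexing, so na=False is passed to keep the mask strictly boolean. The intermediate index is also unnecessary; you can use the boolean mask directly with .loc:

```python
import pandas as pd
import numpy as np

# Hypothetical example data
df = pd.DataFrame({
    'city':   ['Austin', 'Travis County', 'Harris County', 'Dallas'],
    'county': ['Travis', np.nan,          np.nan,          'Dallas'],
})

# na=False treats NaN city values as non-matches, keeping the mask boolean
mask = df['county'].isna() & df['city'].str.contains('County', na=False)

# Copy the city value into county for the matching rows
df.loc[mask, 'county'] = df.loc[mask, 'city']
print(df)
```

Your original attempt 1 failed because df[filter1 & filter2]['county'] = ... is chained indexing, which assigns into a temporary copy rather than the original DataFrame; .loc with the mask and column name in a single call avoids that.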