I have the following DataFrame
A B C
1.0 abc 1.0
abc 1.0 abc
-1.11 abc abc
I have mixed datatypes (float and str). How can I drop values <= -1 in column A?
I get an error if I do the following because of the mixed datatypes
df['A'] = (df['A'] != "abc") & (df['A'] > -1)
TypeError: '>' not supported between instances of 'str' and 'int'
How can I change my object to make abc a str and 1.0 a float so I can:
(df['A'] != "abc") & (df['A'] > -1)
print(df['A'].dtype)
-> object
I would like the expected output
df =
A B C
1.0 abc 1.0
abc 1.0 abc
NaN abc abc
There are at least a couple of different approaches to this problem.
pd.DataFrame.loc accepts a Boolean series, so you can calculate a mask via pd.to_numeric and feed it into the loc setter.
Note there is no need to specify df['A'] != 'abc', because pd.to_numeric with errors='coerce' converts these values to NaN, and any comparison against NaN evaluates to False.
mask = pd.to_numeric(df['A'], errors='coerce') <= -1
df.loc[mask, 'A'] = np.nan
print(df)
A B C
0 1 abc 1
1 abc 1 abc
2 NaN abc abc
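For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch of the mask approach. The DataFrame is rebuilt by hand from the question's table, and the intermediate name numeric_a is only illustrative. The comparison uses <= -1 to match the condition stated in the question.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame; every column mixes floats and strings,
# so each one ends up with dtype object.
df = pd.DataFrame({
    "A": [1.0, "abc", -1.11],
    "B": ["abc", 1.0, "abc"],
    "C": [1.0, "abc", "abc"],
})

# Temporary numeric view of column A: entries that cannot be parsed
# as numbers (here "abc") become NaN instead of raising.
numeric_a = pd.to_numeric(df["A"], errors="coerce")

# Comparisons against NaN evaluate to False, so "abc" is never selected.
mask = numeric_a <= -1

# Overwrite only the selected cells; the other cells keep their original values.
df.loc[mask, "A"] = np.nan
print(df)
```

Because the conversion happens on a temporary series, the original column is left as-is apart from the cells selected by the mask, so the string abc survives in column A.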
Alternatively, see @Jan's solution, which is preferable if you expect values to be numeric and are only looking for alternative treatment in edge cases.