Given a dataframe:
Data
1 246804
2 135272
3 898.01
4 3453.33
5 shine
6 add
7 522
8 Nan
9 string
10 29.11
11 20
I would like two new columns Floats
and Strings
, both having the same length as the original dataframe. Getting the Floats
column is easy:
In [176]: pd.to_numeric(df.Data, errors='coerce')
Out[176]:
1 246804.00
2 135272.00
3 898.01
4 3453.33
5 NaN
6 NaN
7 522.00
8 NaN
9 NaN
10 29.11
11 20.00
Name: Data, dtype: float64
As you can see, non Floats are coerced to NaN
, which is exactly what I want.
To get strings, this is what I do:
In [177]: df[df.Data.str.isalpha()]
Out[177]:
Data
5 shine
6 add
8 Nan
9 string
But as you can see, it does not retain the non-String values as NaN
. I want something like this:
1 NaN
2 NaN
3 NaN
4 NaN
5 shine
6 add
7 NaN
8 Nan (not NaN)
9 string
10 NaN
11 NaN
How can I get it to do so?
To get Strings
, you can use Boolean Indexing on the Data
column and located where Floats
is null.
df['Floats'] = pd.to_numeric(df.Data, errors='coerce')
df['Strings'] = df.Data.loc[df.Floats.isnull()] # Optional: .astype(str)
>>> df
# Output:
# Data Floats Strings
# 1 246804 246804.00 NaN
# 2 135272 135272.00 NaN
# 3 898.01 898.01 NaN
# 4 3453.33 3453.33 NaN
# 5 shine NaN shine
# 6 add NaN add
# 7 522 522.00 NaN
# 8 Nan NaN Nan
# 9 string NaN string
# 10 29.11 29.11 NaN
# 11 20 20.00 NaN
floats = pd.to_numeric(df.Data, 'coerce')
pd.DataFrame(dict(
floats=floats,
strings=df.Data.mask(floats.notnull())
))
floats strings
1 246804.00 NaN
2 135272.00 NaN
3 898.01 NaN
4 3453.33 NaN
5 NaN shine
6 NaN add
7 522.00 NaN
8 NaN Nan
9 NaN string
10 29.11 NaN
11 20.00 NaN
You can even make it more obvious within mask
by passing an alternative
floats = pd.to_numeric(df.Data, 'coerce')
pd.DataFrame(dict(
floats=floats,
strings=df.Data.mask(floats.notnull(), '')
))
floats strings
1 246804.00
2 135272.00
3 898.01
4 3453.33
5 NaN shine
6 NaN add
7 522.00
8 NaN Nan
9 NaN string
10 29.11
11 20.00
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With