I have a 227x4 DataFrame with country names and numerical values that I need to clean (wrangle?).
Here's an abstraction of the DataFrame:
import pandas as pd
import random
import string
import numpy as np
# six random three-letter "country names"
pdn = pd.DataFrame(["".join(random.choice(string.ascii_letters) for i in range(3)) for j in range(6)], columns=['Country Name'])
# np.random.random_integers is deprecated and removed in newer NumPy; randint(1, 11) draws from the same 1..10 range
measures = pd.DataFrame(np.random.randint(1, 11, size=(6, 2)), columns=['Measure1', 'Measure2'])
df = pdn.merge(measures, how='inner', left_index=True, right_index=True)
df.iloc[4,1] = 'str'
df.iloc[1,2] = 'stuff'
print(df)
Country Name Measure1 Measure2
0 tua 6 3
1 MDK 3 stuff
2 RJU 7 2
3 WyB 7 8
4 Nnr str 3
5 rVN 7 4
How do I replace the string values with np.nan in all columns without touching the country names?
I tried using a boolean mask:
mask = df.loc[:,measures.columns].applymap(lambda x: isinstance(x, (int, float))).values
print(mask)
[[ True True]
[ True False]
[ True True]
[ True True]
[False True]
[ True True]]
# I thought the following would replace the False positions with np.nan (the default) in place, but it didn't
df.loc[:,measures.columns].where(mask, inplace=True)
print(df)
Country Name Measure1 Measure2
0 tua 6 3
1 MDK 3 stuff
2 RJU 7 2
3 WyB 7 8
4 Nnr str 3
5 rVN 7 4
# this gives the right output, but unfortunately it's missing the country names
print(df.loc[:,measures.columns].where(mask))
Measure1 Measure2
0 6 3
1 3 NaN
2 7 2
3 7 8
4 NaN 3
5 7 4
I have looked at several questions related to mine ([1], [2], [3], [4], [5], [6], [7], [8]), but could not find one that answered my concern.
Assign the result back to only the columns of interest:
cols = ['Measure1','Measure2']
mask = df[cols].applymap(lambda x: isinstance(x, (int, float)))
df[cols] = df[cols].where(mask)
print (df)
Country Name Measure1 Measure2
0 uFv 7 8
1 vCr 5 NaN
2 qPp 2 6
3 QIC 10 10
4 Suy NaN 8
5 eFS 6 4
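A likely reason the in-place attempt in the question left df unchanged is that selecting several columns returns a new object, so where() only altered that temporary result. The sketch below illustrates this with fixed data (the values are hypothetical, copied from the question's printout, and the mask is built with Series.map rather than applymap so that it does not depend on the pandas version):

```python
import pandas as pd

# Hypothetical fixed data mirroring the question's layout
df = pd.DataFrame({
    "Country Name": ["tua", "MDK", "RJU", "WyB", "Nnr", "rVN"],
    "Measure1": [6, 3, 7, 7, "str", 7],
    "Measure2": [3, "stuff", 2, 8, 3, 4],
})
cols = ["Measure1", "Measure2"]

# True where the cell holds a number, False where it holds anything else
mask = df[cols].apply(lambda s: s.map(lambda x: isinstance(x, (int, float))))

# where() returns a new DataFrame; df itself is not modified by this line
cleaned = df[cols].where(mask)
still_a_string = df.loc[4, "Measure1"]

# Assigning the result back is what actually changes df
df[cols] = cleaned
print(df)
```

After the assignment, the two string cells hold NaN and the Country Name column is left as it was.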
A meta-question: is it normal that it takes me more than 3 hours to formulate a question here (including research)?
In my opinion, yes. Writing a good question is really hard.
cols = ['Measure1','Measure2']
df[cols] = df[cols].applymap(lambda x: x if not isinstance(x, str) else np.nan)
or
df[cols] = df[cols].applymap(lambda x: np.nan if isinstance(x, str) else x)
Result:
In [22]: df
Out[22]:
Country Name Measure1 Measure2
0 nBl 10.0 9.0
1 Ayp 8.0 NaN
2 diz 4.0 1.0
3 aad 7.0 3.0
4 JYI NaN 10.0
5 BJO 9.0 8.0
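Another option, not used in the answers above, is pd.to_numeric with errors="coerce", which may be convenient when the measure columns should end up with a numeric dtype. This is only a sketch on hypothetical fixed data; note that it behaves differently from the isinstance check, because a numeric-looking string such as "12" would be parsed to 12 instead of becoming NaN.

```python
import pandas as pd

# Hypothetical fixed data mirroring the question's layout
df = pd.DataFrame({
    "Country Name": ["tua", "MDK", "RJU", "WyB", "Nnr", "rVN"],
    "Measure1": [6, 3, 7, 7, "str", 7],
    "Measure2": [3, "stuff", 2, 8, 3, 4],
})
cols = ["Measure1", "Measure2"]

# errors="coerce" turns every value that cannot be parsed as a number into NaN;
# apply() runs pd.to_numeric column by column and forwards the keyword argument
df[cols] = df[cols].apply(pd.to_numeric, errors="coerce")
print(df)
print(df.dtypes)
```

Because NaN is a float, both measure columns come back as float64, which matches the 10.0-style values in the result shown above.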