Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to replace 'any strings' with nan in pandas DataFrame using a boolean mask?

I have a 227x4 DataFrame with country names and numerical values to clean (wrangle ?).

Here's an abstraction of the DataFrame:

import pandas as pd
import random
import string
import numpy as np
pdn = pd.DataFrame(["".join([random.choice(string.ascii_letters) for i in range(3)]) for j in range (6)], columns =['Country Name'])
measures = pd.DataFrame(np.random.random_integers(10,size=(6,2)), columns=['Measure1','Measure2'])
df = pdn.merge(measures, how= 'inner', left_index=True, right_index =True)

df.iloc[4,1] = 'str'
df.iloc[1,2] = 'stuff'
print(df)

  Country Name Measure1 Measure2
0          tua        6        3
1          MDK        3    stuff
2          RJU        7        2
3          WyB        7        8
4          Nnr      str        3
5          rVN        7        4

How do I replace string values with np.nan in all columns without touching the country names?

I tried using a boolean mask:

mask = df.loc[:,measures.columns].applymap(lambda x: isinstance(x, (int, float))).values
print(mask)

[[ True  True]
 [ True False]
 [ True  True]
 [ True  True]
 [False  True]
 [ True  True]]

# I thought the following would replace by default false with np.nan in place, but it didn't
df.loc[:,measures.columns].where(mask, inplace=True)
print(df)

  Country Name Measure1 Measure2
0          tua        6        3
1          MDK        3    stuff
2          RJU        7        2
3          WyB        7        8
4          Nnr      str        3
5          rVN        7        4


# this give a good output, unfortunately it's missing the country names
print(df.loc[:,measures.columns].where(mask))

  Measure1 Measure2
0        6        3
1        3      NaN
2        7        2
3        7        8
4      NaN        3
5        7        4

I have looked at several questions related to mine ([1], [2], [3], [4], [5], [6], [7], [8]), but could not find one that answered my concern.

like image 838
Malik Koné Avatar asked Oct 29 '17 14:10

Malik Koné


People also ask

How do you replace all NaN values with string in pandas?

Convert Nan to Empty String in PandasUse df. replace(np. nan,'',regex=True) method to replace all NaN values to an empty string in the Pandas DataFrame column.

How do you replace NaN values in a string?

We can replace the NaN with an empty string using df. replace() function. This function will replace an empty string inplace of the NaN value.

How do you conditionally replace values in pandas?

You can replace values of all or selected columns based on the condition of pandas DataFrame by using DataFrame. loc[ ] property. The loc[] is used to access a group of rows and columns by label(s) or a boolean array. It can access and can also manipulate the values of pandas DataFrame.


2 Answers

Assign only columns of interest:

cols = ['Measure1','Measure2']
mask = df[cols].applymap(lambda x: isinstance(x, (int, float)))

df[cols] = df[cols].where(mask)
print (df)
  Country Name Measure1 Measure2
0          uFv        7        8
1          vCr        5      NaN
2          qPp        2        6
3          QIC       10       10
4          Suy      NaN        8
5          eFS        6        4

A meta-question, Is it normal that it takes me more than 3 hours to formulate a question here (including research) ?

In my opinion yes, create good question is really hard.

like image 78
jezrael Avatar answered Sep 30 '22 00:09

jezrael


cols = ['Measure1','Measure2']
df[cols] = df[cols].applymap(lambda x: x if not isinstance(x, str) else np.nan)

or

df[cols] = df[cols].applymap(lambda x: np.nan if isinstance(x, str) else x)

Result:

In [22]: df
Out[22]:
  Country Name  Measure1  Measure2
0          nBl      10.0       9.0
1          Ayp       8.0       NaN
2          diz       4.0       1.0
3          aad       7.0       3.0
4          JYI       NaN      10.0
5          BJO       9.0       8.0
like image 39
MaxU - stop WAR against UA Avatar answered Sep 30 '22 01:09

MaxU - stop WAR against UA