How to replace 'any strings' with nan in pandas DataFrame using a boolean mask?

Tags: python, python-3.x, pandas, dataframe, numpy

I have a 227x4 DataFrame with country names and numerical values that I need to clean (wrangle?).

Here's an abstraction of the DataFrame:

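import pandas as pd
import random
import string
import numpy as np

# Six random three-letter "country names"
pdn = pd.DataFrame(["".join(random.choice(string.ascii_letters) for _ in range(3)) for _ in range(6)], columns=['Country Name'])
# np.random.random_integers is deprecated; randint(1, 11) draws from the same 1..10 range.
# Casting to object lets the strings below be assigned without an upcasting warning or error in recent pandas.
measures = pd.DataFrame(np.random.randint(1, 11, size=(6, 2)), columns=['Measure1', 'Measure2']).astype(object)
df = pdn.merge(measures, how='inner', left_index=True, right_index=True)

# Inject two stray strings into the numeric columns
df.iloc[4, 1] = 'str'
df.iloc[1, 2] = 'stuff'
print(df)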

  Country Name Measure1 Measure2
0          tua        6        3
1          MDK        3    stuff
2          RJU        7        2
3          WyB        7        8
4          Nnr      str        3
5          rVN        7        4

How do I replace string values with np.nan in all columns without touching the country names?

I tried using a boolean mask:

mask = df.loc[:,measures.columns].applymap(lambda x: isinstance(x, (int, float))).values
print(mask)

[[ True  True]
 [ True False]
 [ True  True]
 [ True  True]
 [False  True]
 [ True  True]]

# I thought the following would replace the False positions with np.nan in place, but it didn't
df.loc[:,measures.columns].where(mask, inplace=True)
print(df)

  Country Name Measure1 Measure2
0          tua        6        3
1          MDK        3    stuff
2          RJU        7        2
3          WyB        7        8
4          Nnr      str        3
5          rVN        7        4


# This gives the right output, but unfortunately it is missing the country names
print(df.loc[:,measures.columns].where(mask))

  Measure1 Measure2
0        6        3
1        3      NaN
2        7        2
3        7        8
4      NaN        3
5        7        4

I have looked at several questions related to mine ([1], [2], [3], [4], [5], [6], [7], [8]), but could not find one that answered my concern.

asked Oct 29 '17 14:10

Malik Koné

2 Answers

Assign the result back to only the columns of interest:

cols = ['Measure1','Measure2']
mask = df[cols].applymap(lambda x: isinstance(x, (int, float)))

df[cols] = df[cols].where(mask)
print(df)
  Country Name Measure1 Measure2
0          uFv        7        8
1          vCr        5      NaN
2          qPp        2        6
3          QIC       10       10
4          Suy      NaN        8
5          eFS        6        4
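
The in-place attempt in the question most likely has no visible effect because df.loc[:, cols] (like df[cols]) returns a new object, so where() only ever changes that temporary selection and never df itself. Here is a minimal sketch of the difference on a small fixed frame. The sample data and the helper name is_number are made up for illustration, and Series.map is used column by column as an assumed stand-in for applymap, which newer pandas versions deprecate:

```python
import numpy as np
import pandas as pd

# Small fixed frame standing in for the random one in the question (illustrative data).
df = pd.DataFrame({
    'Country Name': ['tua', 'MDK', 'Nnr'],
    'Measure1': [6, 3, 'str'],
    'Measure2': [3, 'stuff', 3],
})
cols = ['Measure1', 'Measure2']


def is_number(x):
    # Hypothetical helper: True for ints and floats, False for strings.
    return isinstance(x, (int, float))


# Build the boolean mask one column at a time with Series.map.
mask = df[cols].apply(lambda s: s.map(is_number))

# where() on a selection returns a new frame; df is not modified by this line.
selection = df[cols].where(mask)
untouched = df.loc[2, 'Measure1']  # still the original string

# Assigning the result back to the same columns is what updates df.
df[cols] = selection
print(df)
```

The country names are never part of the selection, so they pass through unchanged.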

You also asked a meta-question: "Is it normal that it takes me more than 3 hours to formulate a question here (including research)?"

In my opinion, yes. Writing a good question is really hard.

answered Sep 30 '22 00:09

jezrael

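cols = ['Measure1','Measure2']
# Keep every value that is not a string; replace strings with NaN
df[cols] = df[cols].applymap(lambda x: x if not isinstance(x, str) else np.nan)

# Equivalent, with the condition written the other way round:
df[cols] = df[cols].applymap(lambda x: np.nan if isinstance(x, str) else x)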

Result:

In [22]: df
Out[22]:
  Country Name  Measure1  Measure2
0          nBl      10.0       9.0
1          Ayp       8.0       NaN
2          diz       4.0       1.0
3          aad       7.0       3.0
4          JYI       NaN      10.0
5          BJO       9.0       8.0
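
If the goal is simply "numbers or NaN", a different technique from the one above may also work: pd.to_numeric with errors='coerce', applied to each measure column. This is only a sketch on made-up sample data, not the method either answer uses:

```python
import pandas as pd

# Illustrative data: two stray strings in otherwise numeric columns.
df = pd.DataFrame({
    'Country Name': ['nBl', 'Ayp', 'JYI'],
    'Measure1': [10, 8, 'str'],
    'Measure2': [9, 'stuff', 10],
})
cols = ['Measure1', 'Measure2']

# errors='coerce' turns anything that cannot be parsed as a number into NaN.
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')
print(df)
```

One difference to keep in mind: unlike the isinstance check, this also converts numeric-looking strings such as '7' into numbers instead of replacing them with NaN. The columns typically end up with a float dtype, as in the result shown above.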

answered Sep 30 '22 01:09

MaxU - stop WAR against UA
