 

Replacing values in specific columns in a Pandas Dataframe, when number of columns are unknown

I am brand new to Python and Stack Exchange. I have been trying to replace invalid values (x < -3 or x > 12) with np.nan in specific columns.

I don't know how many columns I will have to deal with, so I need general code that takes this into account. I do know, however, that the first two columns are ids and names respectively. I have searched Google and Stack Exchange but haven't been able to find a solution for this specific case.

My question is: how would one replace values found in the third column and onwards?

My dataframe looks like this:

[image: Data]

I tried this line:

Data[Data > 12.0] = np.nan

This replaced the first two columns with NaN.

[image: 1st attempt]

I then tried this line:

Data[(Data.iloc[(range(2,Columns))] >=12) & (Data.iloc[(range(2,Columns))]<=-3)] = np.nan

where,

Columns = len(Data.columns)

This is clearly wrong, since it replaces all values in rows 2 to 6 (Columns = 7).
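
A minimal sketch of what may be going on here, using a made-up frame (the names df, m1 and m2 are hypothetical, not from the original data): a single indexer passed to .iloc selects rows by position, whereas selecting columns needs a second indexer after the comma.

```python
import pandas as pd

# Hypothetical frame: two identifier columns followed by two numeric columns
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["a", "b", "c", "d"],
    "m1": [0.0, 1.0, 2.0, 3.0],
    "m2": [4.0, 5.0, 6.0, 7.0],
})

# A single indexer selects ROWS by position: rows 2 and 3, all columns
rows = df.iloc[range(2, 4)]

# A row indexer plus a column indexer: all rows, third column onwards
cols = df.iloc[:, 2:]

print(rows.shape)
print(cols.shape)
```

With this layout, rows keeps all four columns for two rows, while cols keeps all four rows for the two numeric columns.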

[image: 2nd attempt]

Any thoughts would be greatly appreciated.

Python 3.6.1 64bits, Qt 5.6.2, PyQt5 5.6 on Darwin

asked Nov 20 '25 04:11 by J.Doe


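1 Answer

You're looking for the applymap() method (deprecated since pandas 2.1 in favor of DataFrame.map(), which takes the same kind of element-wise function).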

import pandas as pd
import numpy as np

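# get the columns after the second one
cols = Data.columns[2:]

# replace out-of-range values (x < -3 or x > 12) with NaN in those columns;
# note that new_df holds only these columns, not the first two
new_df = Data[cols].applymap(lambda x: np.nan if x > 12 or x < -3 else x)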

Documentation: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.applymap.html

This approach assumes your columns after the second contain float or int values.
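
As a possible alternative, here is a minimal vectorized sketch that avoids the per-element lambda by using DataFrame.mask() and writes the result back next to the id and name columns. The sample data and the names m1, m2 and cleaned are hypothetical, chosen only for illustration; the thresholds follow the question (x < -3 or x > 12 is invalid).

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two identifier columns followed by numeric columns
Data = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["a", "b", "c"],
    "m1": [5.0, 13.0, -4.0],
    "m2": [-3.0, 12.0, 0.5],
})

# Every column from the third one onwards, however many there are
cols = Data.columns[2:]
values = Data[cols]

# mask() replaces entries where the condition is True with NaN by default
cleaned = Data.copy()
cleaned[cols] = values.mask((values < -3) | (values > 12))

print(cleaned)
```

Here the boundary values -3 and 12 are kept, and the first two columns are never compared against the thresholds. Like the applymap() version, this assumes the remaining columns are numeric.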

answered Nov 24 '25 13:11 by economy


