Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

pandas: fill nans given a condition

Tags:

python

pandas

I'm struggling with something that seemed to be trivial but apparently isn't. General picture: data - pandas dataframe - contains(among others) TOTAL_VISITS and NUM_PRINTS columns.

Objective: given num_prints parameter, find rows where NUM_prints = num_prints and fill nans with a given number.

Where I stopped and it didn't make sense anymore:

indices= data['NUM_PRINTS'] == num_prints

data.loc[indices,'TOTAL_VISITS'].fillna(5,inplace=True)

This should work as much as I know and read. didn't fill nans with anything in practice, seemed like it worked with a copy or something as it didn't change anything in the original object.

What works:

data.loc[indices,'TOTAL_VISITS'] = 2

this does fill the column with 2's on the num_print condition, but does not consider nans.

data['TOTAL_VISITS'].fillna(0, inplace=True)

this does fill the nans in total visits with 0, but does not consider the num_prints condition.

I'm kinda hopeless as a regular for loop with .iloc and conditions takes way too long than I can handle.

like image 390
user9548409 Avatar asked Mar 25 '18 14:03

user9548409


2 Answers

I think this one works fine

data['TOTAL_VISITS'] = np.where(data['NUM_PRINTS'] == 1, 100, data['TOTAL_VISITS'])
like image 194
Shashank Singh Yadav Avatar answered Nov 04 '22 08:11

Shashank Singh Yadav


I think need filter in both sides and apply fillna only for filtered rows:

np.random.seed(1213)

c = ['TOTAL_VISITS', 'A', 'NUM_PRINTS']
data = pd.DataFrame(np.random.choice([1,np.nan,3,4], size=(10,3)), columns=c)
print (data)
   TOTAL_VISITS    A  NUM_PRINTS
0           1.0  4.0         4.0
1           NaN  3.0         1.0
2           1.0  1.0         1.0
3           4.0  3.0         3.0
4           1.0  3.0         4.0
5           4.0  4.0         3.0
6           4.0  1.0         4.0
7           NaN  4.0         3.0
8           NaN  NaN         3.0
9           3.0  NaN         1.0


num_prints = 1
indices= data['NUM_PRINTS'] == num_prints
data.loc[indices,'TOTAL_VISITS'] = data.loc[indices,'TOTAL_VISITS'].fillna(100)
#alternative
#data.loc[indices,'TOTAL_VISITS'] = data['TOTAL_VISITS'].fillna(100)
print (data)
   TOTAL_VISITS    A  NUM_PRINTS
0           1.0  4.0         4.0
1         100.0  3.0         1.0
2           1.0  1.0         1.0
3           4.0  3.0         3.0
4           1.0  3.0         4.0
5           4.0  4.0         3.0
6           4.0  1.0         4.0
7           NaN  4.0         3.0
8           NaN  NaN         3.0
9           3.0  NaN         1.0
like image 45
jezrael Avatar answered Nov 04 '22 07:11

jezrael