I'm struggling with something that seemed to be trivial but apparently isn't. General picture: data
- pandas dataframe - contains(among others) TOTAL_VISITS
and NUM_PRINTS
columns.
Objective: given num_prints
parameter, find rows where NUM_prints = num_prints
and fill nan
s with a given number.
Where I stopped and it didn't make sense anymore:
indices= data['NUM_PRINTS'] == num_prints
data.loc[indices,'TOTAL_VISITS'].fillna(5,inplace=True)
This should work as much as I know and read. didn't fill nans with anything in practice, seemed like it worked with a copy or something as it didn't change anything in the original object.
What works:
data.loc[indices,'TOTAL_VISITS'] = 2
this does fill the column with 2's on the num_print
condition, but does not consider nans.
data['TOTAL_VISITS'].fillna(0, inplace=True)
this does fill the nans in total visits with 0, but does not consider the num_prints
condition.
I'm kinda hopeless as a regular for
loop with .iloc
and conditions takes way too long than I can handle.
I think this one works fine
data['TOTAL_VISITS'] = np.where(data['NUM_PRINTS'] == 1, 100, data['TOTAL_VISITS'])
I think need filter in both sides and apply fillna
only for filtered rows:
np.random.seed(1213)
c = ['TOTAL_VISITS', 'A', 'NUM_PRINTS']
data = pd.DataFrame(np.random.choice([1,np.nan,3,4], size=(10,3)), columns=c)
print (data)
TOTAL_VISITS A NUM_PRINTS
0 1.0 4.0 4.0
1 NaN 3.0 1.0
2 1.0 1.0 1.0
3 4.0 3.0 3.0
4 1.0 3.0 4.0
5 4.0 4.0 3.0
6 4.0 1.0 4.0
7 NaN 4.0 3.0
8 NaN NaN 3.0
9 3.0 NaN 1.0
num_prints = 1
indices= data['NUM_PRINTS'] == num_prints
data.loc[indices,'TOTAL_VISITS'] = data.loc[indices,'TOTAL_VISITS'].fillna(100)
#alternative
#data.loc[indices,'TOTAL_VISITS'] = data['TOTAL_VISITS'].fillna(100)
print (data)
TOTAL_VISITS A NUM_PRINTS
0 1.0 4.0 4.0
1 100.0 3.0 1.0
2 1.0 1.0 1.0
3 4.0 3.0 3.0
4 1.0 3.0 4.0
5 4.0 4.0 3.0
6 4.0 1.0 4.0
7 NaN 4.0 3.0
8 NaN NaN 3.0
9 3.0 NaN 1.0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With