Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Conditional column arithmetic in pandas dataframe

I have a pandas dataframe with the following structure:

import numpy as np
import pandas as pd
myData = pd.DataFrame({'x': [1.2,2.4,5.3,2.3,4.1], 'y': [6.7,7.5,8.1,5.3,8.3], 'condition':[1,1,np.nan,np.nan,1],'calculation': [np.nan]*5})

print myData

   calculation  condition    x    y
0          NaN          1  1.2  6.7
1          NaN          1  2.4  7.5
2          NaN        NaN  5.3  8.1
3          NaN        NaN  2.3  5.3
4          NaN          1  4.1  8.3

I want to enter a value in the 'calculation' column based on the values in 'x' and 'y' (e.g. x/y) but only in those cells where the 'condition' column contains NaN (np.isnan(myData['condition']). The final dataframe should look like this:

   calculation  condition    x    y
0          NaN          1  1.2  6.7
1          NaN          1  2.4  7.5
2        0.654        NaN  5.3  8.1
3        0.434        NaN  2.3  5.3
4          NaN          1  4.1  8.3

I'm happy with the idea of stepping through each row in turn using a 'for' loop and then using 'if' statements to make the calculations but the actual dataframe I have is very large and I wanted do the calculations in an array-based way. Is this possible? I guess I could calculate the value for all rows and then delete the ones I don't want but this seems like a lot of wasted effort (the NaNs are quite rare in the dataframe) and, in some cases where 'condition' equals 1, the calculation cannot be made due to division by zero.

Thanks in advance.

like image 405
user1718097 Avatar asked Jan 28 '15 10:01

user1718097


2 Answers

Use where and pass your condition to it, this will then only perform your calculation where the rows meet the condition:

In [117]:

myData['calculation'] = (myData['x']/myData['y']).where(myData['condition'].isnull())
myData
Out[117]:
   calculation  condition    x    y
0          NaN          1  1.2  6.7
1          NaN          1  2.4  7.5
2     0.654321        NaN  5.3  8.1
3     0.433962        NaN  2.3  5.3
4          NaN          1  4.1  8.3
like image 192
EdChum Avatar answered Sep 20 '22 23:09

EdChum


EdChum's answer worked for me well! Still, I wanted to extend this thread as I think it will be useful for other people.

Let's assume your dataframe is

          c     x    y
0         1   1.2  6.7
1         1   2.4  7.5
2         0   5.3  8.1
3         0   2.3  5.3
4         1   4.1  8.3

and you would like to update 0s in column c with associated x/y.

          c     x    y
0         1   1.2  6.7
1         1   2.4  7.5
2      0.65   5.3  8.1
3      0.43   2.3  5.3
4         1   4.1  8.3

You can do

myData['c'] = (myData['x']/myData['y']).where(cond=myData['c']==0, other=myData['c'])

or

myData['c'].where(cond=myData['c'] != 0, other=myData['x']/myData['y'], inplace=True)

In both cases where 'cond' is not satisfied, 'other' is performed. In the second code snippet, inplace flag also works nicely (as it would also in the first code snippet.)

I found these solutions from pandas official site "where" and pandas official site "indexing"

This kind of operations are exactly what I need most of the time. I am new to Pandas and it took me a while to find this useful thread. Could anyone recommend some comprehensive tutorials to practice these types of arithmetic operations? I need to "filter/ groupby/ slice a dataframe then apply different functions/operations to each group/slice separately or all at once and keep it all inplace." Cheers!

like image 37
GuD Avatar answered Sep 19 '22 23:09

GuD