
conditional fill in pandas dataframe

I have a dataframe df with float values in column A. I want to add another column B such that:

  1. B[0] = A[0]

    For i > 0:

  2. If A[i] is NaN, then B[i] = A[i]; otherwise go to step 3.
  3. B[i] = B[i-1] if abs((B[i-1] - A[i]) / B[i-1]) < 0.3, else A[i].

A sample dataframe df can be generated as follows:

import numpy as np
import pandas as pd
df = pd.DataFrame(1000*(2+np.random.randn(500, 1)), columns=list('A'))
df.loc[1, 'A'] = np.nan
df.loc[15, 'A'] = np.nan
df.loc[240, 'A'] = np.nan
df.loc[241, 'A'] = np.nan
Gerry asked Jan 04 '19 16:01


3 Answers

This can be done fairly efficiently with Numba. If you are not able to use Numba, just omit @njit and your logic will run as a Python-level loop.

import numpy as np
import pandas as pd
from numba import njit

np.random.seed(0)
df = pd.DataFrame(1000*(2+np.random.randn(500, 1)), columns=['A'])
df.loc[1, 'A'] = np.nan
df.loc[15, 'A'] = np.nan
df.loc[240, 'A'] = np.nan

@njit
def recurse_nb(x):
    out = x.copy()
    for i in range(1, x.shape[0]):
        if not np.isnan(x[i]) and (abs(1 - x[i] / out[i-1]) < 0.3):
            out[i] = out[i-1]
    return out

df['B'] = recurse_nb(df['A'].values)

print(df.head(10))

             A            B
0  3764.052346  3764.052346
1          NaN          NaN
2  2978.737984  2978.737984
3  4240.893199  4240.893199
4  3867.557990  4240.893199
5  1022.722120  1022.722120
6  2950.088418  2950.088418
7  1848.642792  1848.642792
8  1896.781148  1848.642792
9  2410.598502  2410.598502
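
As a rough sketch of the "omit @njit" fallback, the decorator can be made optional so the same function runs with or without Numba. The no-op njit replacement, the name recurse and the small input array are illustrative assumptions, not part of the original answer.

```python
import numpy as np

try:
    from numba import njit  # use Numba when it is installed
except ImportError:
    def njit(func):
        # Assumed fallback: return the function unchanged,
        # so the loop simply runs at Python level.
        return func

@njit
def recurse(x):
    out = x.copy()
    for i in range(1, x.shape[0]):
        # Keep the previous output when the relative change is below 30%.
        # If x[i] or out[i-1] is NaN, the condition is False and x[i] is kept.
        if not np.isnan(x[i]) and abs(1 - x[i] / out[i - 1]) < 0.3:
            out[i] = out[i - 1]
    return out

a = np.array([100.0, 110.0, np.nan, 120.0, 125.0, 200.0])
b = recurse(a)
print(b)
```

The result is the same either way; only the speed differs.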
jpp answered Nov 15 '22 14:11


Not sure how you want to handle the first B[i-1] and the case of dividing by NaN:

df = pd.DataFrame([1,2,3,4,5,None,6,7,8,9,10], columns=['A'])
b1 = df.A.shift(1)
b1[0] = 1
b = list(map(lambda a,b1: a if np.isnan(a) else (b1 if abs(b1-a)/b1 < 0.3 else a), df.A, b1 ))
df['B'] = b

df
       A    B
0    1.0  1.0
1    2.0  2.0
2    3.0  3.0
3    4.0  4.0
4    5.0  4.0
5    NaN  NaN
6    6.0  6.0
7    7.0  6.0
8    8.0  7.0
9    9.0  8.0
10  10.0  9.0

As per @jpp, you could also build list b with a list comprehension:

# Any comparison involving NaN is False, so a NaN in either a or prev falls through to a
b = [prev if abs(prev - a) / prev < 0.3 else a for a, prev in zip(df.A, b1)]
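
One thing worth checking: because b1 comes from df.A.shift(1), each value is compared with the previous A, whereas the rule in the question compares with the previous B. The sketch below puts the two side by side on the same sample data. The names prev_a, shifted and recursive are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([1, 2, 3, 4, 5, None, 6, 7, 8, 9, 10], columns=['A'])

# Shifted version: each value is compared with the previous A.
prev_a = df['A'].shift(1)
prev_a.iloc[0] = df['A'].iloc[0]
shifted = [p if abs(p - a) / p < 0.3 else a for a, p in zip(df['A'], prev_a)]

# Recursive version: each value is compared with the previous B.
recursive = [df['A'].iloc[0]]
for a in df['A'].iloc[1:]:
    p = recursive[-1]
    # Comparisons involving NaN are False, so NaN inputs fall through to a.
    recursive.append(p if abs(p - a) / p < 0.3 else a)

print(pd.DataFrame({'A': df['A'], 'shifted': shifted, 'recursive': recursive}))
```

On this data the two agree up to index 7 and diverge from index 8 onwards (7, 8, 9 for the shifted version against 8, 8, 8 for the recursive one), so which one is right depends on whether the dependence on B[i-1] is really required.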
tasha answered Nov 15 '22 14:11


A simple solution that I came up with is the following. I was wondering if there is a more Pythonic way of doing this:

a = df['A'].values
b = []
b.append(a[0])  # B[0] = A[0]
for i in range(1, len(a)):
    if np.isnan(a[i]):
        b.append(a[i])
    else:
        # keep the previous B when the relative change is below 30%
        b.append(b[i-1] if abs(1 - a[i]/b[i-1]) < 0.3 else a[i])
df['B'] = b
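
One possibly more compact variant, offered only as a sketch, is itertools.accumulate from the standard library, which carries the previous output forward in the same way as the loop. The helper name step and the small input array are made up for illustration.

```python
from itertools import accumulate

import numpy as np

def step(prev, cur):
    # Keep the previous output when the relative change is below 30%.
    # If either value is NaN the comparison is False, so cur is returned,
    # which covers the explicit np.isnan branch of the loop.
    return prev if abs(1 - cur / prev) < 0.3 else cur

a = np.array([100.0, 110.0, np.nan, 120.0, 125.0, 200.0])
b = list(accumulate(a, step))
print(b)
```

With the dataframe from the question this would be df['B'] = list(accumulate(df['A'].values, step)). It is still a Python-level loop underneath, so it is shorter rather than faster.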
Gerry answered Nov 15 '22 14:11