
Efficiently updating NaNs in a pandas DataFrame from a prior row and a specific column's value

I have a pandas DataFrame that looks like this:

# Output 
#        A     B     C     D
# 0    3.0   6.0   7.0   4.0
# 1   42.0  44.0   1.0   3.0
# 2    4.0   2.0   3.0  62.0
# 3   90.0  83.0  53.0  23.0
# 4   22.0  23.0  24.0   NaN
# 5    5.0   2.0   5.0  34.0
# 6    NaN   NaN   NaN   NaN
# 7    NaN   NaN   NaN   NaN
# 8    2.0  12.0  65.0   1.0
# 9    5.0   7.0  32.0   7.0
# 10   2.0  13.0   6.0  12.0
# 11   NaN   NaN   NaN   NaN
# 12  23.0   NaN  23.0  34.0
# 13  61.0   NaN  63.0   3.0
# 14  32.0  43.0  12.0  76.0
# 15  24.0   2.0  34.0   2.0

What I would like to do is fill each NaN with the B value from the nearest preceding row that has one. The exception is column D, where I would like NaNs replaced with zeros.

I've looked into ffill and fillna, but neither seems to be able to do the job.

My solution so far:

import pandas as pd

def fix_abc(row, column, df):

    # If the row/column value is null/NaN
    if pd.isnull(row[column]):

        # Get the value of B from the row before (if there is one)
        prior = row.name - 1
        value = df['B'].iloc[prior] if prior >= 0 else row[column]

        # If that value is empty, go to the row before that
        while pd.isnull(value) and prior >= 1:
            prior = prior - 1
            value = df['B'].iloc[prior]

    else:
        value = row[column]

    return value

df['A'] = df.apply( lambda x: fix_abc(x,'A',df), axis=1 )
df['B'] = df.apply( lambda x: fix_abc(x,'B',df), axis=1 )
df['C'] = df.apply( lambda x: fix_abc(x,'C',df), axis=1 )


def fix_d(x):
    # Replace a missing D with 0; otherwise keep the existing D value
    if pd.isnull(x['D']):
        return 0
    return x['D']

df['D'] = df.apply(fix_d, axis=1)

This feels quite inefficient and slow, so I'm wondering if there is a quicker, more efficient way to do this.

Example output:

#        A     B     C     D
# 0    3.0   6.0   7.0   4.0
# 1   42.0  44.0   1.0   3.0
# 2    4.0   2.0   3.0  62.0
# 3   90.0  83.0  53.0  23.0
# 4   22.0  23.0  24.0   0.0
# 5    5.0   2.0   5.0  34.0
# 6    2.0   2.0   2.0   0.0
# 7    2.0   2.0   2.0   0.0
# 8    2.0  12.0  65.0   1.0
# 9    5.0   7.0  32.0   7.0
# 10   2.0  13.0   6.0  12.0
# 11  13.0  13.0  13.0   0.0
# 12  23.0  13.0  23.0  34.0
# 13  61.0  13.0  63.0   3.0
# 14  32.0  43.0  12.0  76.0
# 15  24.0   2.0  34.0   2.0

I have put the code, including the data for the DataFrame, into a Python fiddle (here).

Asked by ManreeRist, May 21 '17 14:05


1 Answer

fillna offers several ways to do the filling. In this case, column D can simply be filled with 0, and column B can be forward-filled (the 'pad' method, which is what ffill does). Columns A and C can then be filled from the already-filled column B, like:

Code:

df['D'] = df.D.fillna(0)
df['B'] = df.B.ffill()  # forward fill; fillna(method='pad') is deprecated in newer pandas
df['A'] = df.A.fillna(df['B'])
df['C'] = df.C.fillna(df['B'])
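If there are more columns than A and C, the same steps can be wrapped in a small function. This is only a sketch: fill_from_reference and its parameters are names made up for illustration, not part of pandas, and it assumes every column other than the reference and the zero-filled ones should fall back to the reference column. The order matters, since B has to be forward-filled before the other columns borrow from it.

```python
import numpy as np
import pandas as pd


def fill_from_reference(df, ref="B", zero_cols=("D",)):
    """Hypothetical helper: forward-fill `ref`, fill `zero_cols` with 0,
    and fill every remaining column from the forward-filled `ref`."""
    out = df.copy()
    # Forward-fill the reference column first; the other columns depend on it.
    out[ref] = out[ref].ffill()
    for col in out.columns:
        if col == ref:
            continue
        if col in zero_cols:
            out[col] = out[col].fillna(0)
        else:
            # Series.fillna aligns on the index, so each NaN takes the
            # reference value from its own row.
            out[col] = out[col].fillna(out[ref])
    return out


# Small made-up frame shaped like rows 5 to 8 of the question's data
sample = pd.DataFrame({
    "A": [5.0, np.nan, np.nan, 2.0],
    "B": [2.0, np.nan, np.nan, 12.0],
    "C": [5.0, np.nan, np.nan, 65.0],
    "D": [34.0, np.nan, np.nan, 1.0],
})
filled = fill_from_reference(sample)
print(filled)
```

The function works on a copy, so the original frame is left untouched.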

Test Code:

from io import StringIO
import pandas as pd

df = pd.read_fwf(StringIO(u"""
       A     B     C     D
     3.0   6.0   7.0   4.0
    42.0  44.0   1.0   3.0
     4.0   2.0   3.0  62.0
    90.0  83.0  53.0  23.0
    22.0  23.0  24.0   NaN
     5.0   2.0   5.0  34.0
     NaN   NaN   NaN   NaN
     NaN   NaN   NaN   NaN
     2.0  12.0  65.0   1.0
     5.0   7.0  32.0   7.0
     2.0  13.0   6.0  12.0
     NaN   NaN   NaN   NaN
    23.0   NaN  23.0  34.0
    61.0   NaN  63.0   3.0
    32.0  43.0  12.0  76.0
    24.0   2.0  34.0   2.0"""), header=1)

print(df)

df['D'] = df.D.fillna(0)
df['B'] = df.B.ffill()  # forward fill; fillna(method='pad') is deprecated in newer pandas
df['A'] = df.A.fillna(df['B'])
df['C'] = df.C.fillna(df['B'])
print(df)

Results:

       A     B     C     D
0    3.0   6.0   7.0   4.0
1   42.0  44.0   1.0   3.0
2    4.0   2.0   3.0  62.0
3   90.0  83.0  53.0  23.0
4   22.0  23.0  24.0   NaN
5    5.0   2.0   5.0  34.0
6    NaN   NaN   NaN   NaN
7    NaN   NaN   NaN   NaN
8    2.0  12.0  65.0   1.0
9    5.0   7.0  32.0   7.0
10   2.0  13.0   6.0  12.0
11   NaN   NaN   NaN   NaN
12  23.0   NaN  23.0  34.0
13  61.0   NaN  63.0   3.0
14  32.0  43.0  12.0  76.0
15  24.0   2.0  34.0   2.0

       A     B     C     D
0    3.0   6.0   7.0   4.0
1   42.0  44.0   1.0   3.0
2    4.0   2.0   3.0  62.0
3   90.0  83.0  53.0  23.0
4   22.0  23.0  24.0   0.0
5    5.0   2.0   5.0  34.0
6    2.0   2.0   2.0   0.0
7    2.0   2.0   2.0   0.0
8    2.0  12.0  65.0   1.0
9    5.0   7.0  32.0   7.0
10   2.0  13.0   6.0  12.0
11  13.0  13.0  13.0   0.0
12  23.0  13.0  23.0  34.0
13  61.0  13.0  63.0   3.0
14  32.0  43.0  12.0  76.0
15  24.0   2.0  34.0   2.0
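
To check that this matches a row-by-row approach and to get a feel for the speed difference, something like the following could be used. It is a sketch on synthetic data, not a benchmark: fill_rowwise and fill_vectorized are illustrative names, and the random frame is built to mirror the question's pattern, where A and C are only missing on rows where B is missing too. If A were missing on a row where B is present, the vectorized version would take that row's own B rather than an earlier one.

```python
import timeit

import numpy as np
import pandas as pd


def fill_rowwise(df):
    """Row-by-row version: walk back to the nearest earlier non-null B."""
    out = df.copy()
    b = df["B"].to_numpy()

    def nearest_prior_b(row, column):
        if not pd.isnull(row[column]):
            return row[column]
        for i in range(row.name - 1, -1, -1):
            if not np.isnan(b[i]):
                return b[i]
        return np.nan

    for col in ["A", "B", "C"]:
        out[col] = df.apply(lambda r: nearest_prior_b(r, col), axis=1)
    out["D"] = df["D"].fillna(0)
    return out


def fill_vectorized(df):
    """Column-wise version using fillna and ffill."""
    out = df.copy()
    out["D"] = out["D"].fillna(0)
    out["B"] = out["B"].ffill()
    out["A"] = out["A"].fillna(out["B"])
    out["C"] = out["C"].fillna(out["B"])
    return out


# Synthetic data (an assumption for illustration): fully empty rows,
# rows missing only B, and rows missing only D.
rng = np.random.default_rng(0)
n = 300
big = pd.DataFrame(
    rng.integers(1, 100, size=(n, 4)).astype(float), columns=list("ABCD")
)
empty_rows = rng.random(n) < 0.2
b_only = rng.random(n) < 0.1
d_only = rng.random(n) < 0.1
empty_rows[0] = b_only[0] = False  # keep B present in the first row
big.loc[empty_rows, ["A", "B", "C", "D"]] = np.nan
big.loc[b_only, "B"] = np.nan
big.loc[d_only, "D"] = np.nan

same = fill_rowwise(big).equals(fill_vectorized(big))
print("identical results:", same)

t_row = timeit.timeit(lambda: fill_rowwise(big), number=3)
t_vec = timeit.timeit(lambda: fill_vectorized(big), number=3)
print(f"row-wise: {t_row:.4f}s, vectorized: {t_vec:.4f}s")
```

The timings depend on the machine and the pandas version, so none are quoted here.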
Answered by Stephen Rauch, Oct 05 '22 20:10