
Automating fill with calculation function in python

Tags: python, pandas

The code below works and gives the expected result: wherever c is missing, it fills df['c'] with the previous value of c multiplied by b. The problem is that I need to apply this to a larger data set (len(df.index) is about 10,000). Each call to df['c'] = df.apply(func, axis=1) only fills a missing value whose predecessor is already known, so a run of consecutive gaps needs one call per gap, and I would have to write that line a couple of thousand times. A while loop is not an option in pandas for a data set of this size. Any ideas?

import pandas as pd
import numpy as np

rng = pd.date_range('1/1/2011', periods=10, freq='D')

df = pd.DataFrame({'a': [None] * 10, 'b': [2, 3, 10, 3, 5, 8, 4, 1, 2, 6]}, index=rng)
df["c"] = np.nan

# set the known starting values by label (avoids chained assignment)
df.loc[rng[0], "c"] = 1
df.loc[rng[2], "c"] = 3


def func(x):
    if pd.notnull(x['c']):
        return x['c']
    else:
        return df.iloc[df.index.get_loc(x.name) - 1]['c'] * x['b']

df['c'] = df.apply(func, axis =1)
df['c'] = df.apply(func, axis =1)
df['c'] = df.apply(func, axis =1)
df['c'] = df.apply(func, axis =1)
df['c'] = df.apply(func, axis =1)
df['c'] = df.apply(func, axis =1)
df['c'] = df.apply(func, axis =1)
sorownas asked Aug 31 '25 17:08



1 Answer

Here is a nice way of solving a recurrence problem like this one. There will be docs on this in v0.16.2 (releasing next week); see also the numba docs.

This is quite performant because the loop, which does the real heavy lifting, runs as fast JIT-compiled code.

import pandas as pd
import numpy as np
from numba import jit

rng = pd.date_range('1/1/2011', periods=10, freq='D')
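df = pd.DataFrame({'a': np.nan, 'b': [2, 3, 10, 3, 5, 8, 4, 1, 2, 6]}, index=rng)
# .ix was removed in pandas 1.0: create the column first, then set known values by label
df["c"] = np.nan
df.loc[rng[0], "c"] = 1
df.loc[rng[2], "c"] = 3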

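@jit(nopython=True)
def ffill(arr_b, arr_c):
    # arr_b: multipliers; arr_c: known values, NaN where a value must be computed
    n = len(arr_b)
    assert len(arr_b) == len(arr_c)
    result = arr_c.copy()

    # result[0] is taken as given; each later gap is previous result * current b
    for i in range(1, n):
        if not np.isnan(arr_c[i]):
            result[i] = arr_c[i]
        else:
            result[i] = result[i-1] * arr_b[i]

    return result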

df['d'] = ffill(df.b.values, df.c.values)

             a   b   c      d
2011-01-01 NaN   2   1      1
2011-01-02 NaN   3 NaN      3
2011-01-03 NaN  10   3      3
2011-01-04 NaN   3 NaN      9
2011-01-05 NaN   5 NaN     45
2011-01-06 NaN   8 NaN    360
2011-01-07 NaN   4 NaN   1440
2011-01-08 NaN   1 NaN   1440
2011-01-09 NaN   2 NaN   2880
2011-01-10 NaN   6 NaN  17280
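
If numba is not available, the same recurrence can probably be expressed with pandas alone. This is a different technique from the jitted loop above, not a drop-in copy of it: each known value of c starts a new run, and within a run the result is a cumulative product of the known value followed by the b values. The helper name fill_by_cumprod is made up for this sketch, and it assumes the first row of c is known (as it is in the example data).

```python
import numpy as np
import pandas as pd

rng = pd.date_range("1/1/2011", periods=10, freq="D")
df = pd.DataFrame({"b": [2, 3, 10, 3, 5, 8, 4, 1, 2, 6]}, index=rng)
df["c"] = np.nan
df.loc[rng[0], "c"] = 1
df.loc[rng[2], "c"] = 3


def fill_by_cumprod(b, c):
    """Hypothetical helper: fill gaps in c with previous value * b.

    Assumes c's first entry is not NaN, so every row belongs to a run
    that starts with a known value.
    """
    known = c.notna()
    # Each known value opens a new run; cumsum gives every row its run id.
    run_id = known.cumsum()
    # Use c where it is known, otherwise b, as the factor for that row.
    factors = b.where(~known, c)
    # Cumulative product within each run reproduces the recurrence.
    return factors.groupby(run_id).cumprod()


df["d"] = fill_by_cumprod(df["b"], df["c"])
print(df)
```

On the example data this should give the same d column as the output above. For very long runs the product can grow large, just as it does in the loop, so it is worth checking against a small sample first.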
Jeff answered Sep 02 '25 06:09
