functional programming and python pandas dataframes in pipelines

I was wondering what the best functional-programming practice is for writing a pipeline of functions that take pandas DataFrames (or any other mutable type) as input.

Here are two ideas, but I hope something better exists :)

Idea #1 - not functional, but saves memory

def foo(df, param):
    df['col'] = df['col'] + param

def pipeline(df):
    foo(df, 1)
    foo(df, 2)
    foo(df, 3)

Idea #2 - more functional, but wastes memory by calling .copy()

def foo(df, param):
    df = df.copy()
    df['col'] = df['col'] + param
    return df

def pipeline(df):
    df1 = foo(df, 1)
    df2 = foo(df1, 2)
    df3 = foo(df2, 3)
    return df3
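One possible variant of idea #2, offered as a sketch rather than the asker's method: the chain of named intermediates (df1, df2, df3) can be folded away with functools.reduce, a swapped-in technique not mentioned in the question. The `params` argument is a hypothetical generalisation, and the frame and values are made up. The sketch also checks that the caller's frame is left untouched.

```python
from functools import reduce

import pandas as pd

def foo(df, param):
    # Returns a new DataFrame; the input is left untouched.
    df = df.copy()
    df['col'] = df['col'] + param
    return df

def pipeline(df, params=(1, 2, 3)):
    # Hypothetical generalisation: fold foo over a sequence of parameters,
    # i.e. foo(foo(foo(df, 1), 2), 3), without naming each intermediate.
    return reduce(foo, params, df)

original = pd.DataFrame({'col': [0, 10]})
result = pipeline(original)

print(original['col'].tolist())  # [0, 10]: input unchanged
print(result['col'].tolist())    # [6, 16]
```

Each step still makes a full copy. The difference is that reduce keeps only the latest accumulator, so earlier intermediate frames are not held alive by names the way df1 and df2 are in the original pipeline.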
user1403546 asked Dec 23 '22 01:12


1 Answer

You can chain method calls that operate on the DataFrame; take a look at DataFrame.pipe in pandas. Something like this, with a couple of non-foo operations added in:

df = (df.pipe(foo, 1)
        .pipe(foo, 2)
        .pipe(foo, 3)
        .drop(columns=['drop', 'these'])
        .assign(NEW_COL=lambda x: x['OLD_COL'] / 10))

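When you use pipe, df is passed to foo as its first argument, so df.pipe(foo, 1) calls foo(df, 1). foo must return a DataFrame for the chain to continue, which is why this works with the copying foo from idea #2 but not with the in-place foo from idea #1 (it returns None).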
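A self-contained sketch of the chain above, assuming the copying foo from idea #2. The column names and values are made up, and `scale` is a hypothetical helper added only to show pipe's tuple form: when a function expects the DataFrame under a keyword other than its first argument, pipe accepts a (callable, data_keyword) tuple.

```python
import pandas as pd

def foo(df, param):
    # Copying version from idea #2: returns a new DataFrame.
    df = df.copy()
    df['col'] = df['col'] + param
    return df

def scale(factor, data):
    # Hypothetical helper whose DataFrame argument is NOT the first one.
    return data.assign(col=data['col'] * factor)

df = pd.DataFrame({'col': [0, 10], 'drop': [1, 2], 'OLD_COL': [50, 100]})

out = (df.pipe(foo, 1)
         .pipe(foo, 2)
         .pipe(foo, 3)
         .pipe((scale, 'data'), 2)  # calls scale(2, data=<current frame>)
         .drop(columns=['drop'])
         .assign(NEW_COL=lambda x: x['OLD_COL'] / 10))

print(out)
```

Because every step returns a new object, the original df is never modified.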

T Burgis answered Jan 04 '23 01:01