I was wondering about the best functional-programming practice for writing a pipeline of functions that process pandas DataFrames (or any other mutable input type).
Here are two ideas, but I hope something better exists :)
idea #1 - no functional programming, but saves memory by mutating in place:

    def foo(df, param):
        df['col'] = df['col'] + param

    def pipeline(df):
        foo(df, 1)
        foo(df, 2)
        foo(df, 3)
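To make the trade-off concrete, here is a minimal sketch (the column name 'col' and the toy values are my own) showing that idea #1 modifies the caller's DataFrame:

```python
import pandas as pd

def foo(df, param):
    # mutates df in place - no copy is made
    df['col'] = df['col'] + param

df = pd.DataFrame({'col': [0, 10]})
foo(df, 1)
foo(df, 2)
foo(df, 3)
print(df['col'].tolist())  # the caller's df now holds [6, 16]
```

The upside is zero copies; the downside is that any other code holding a reference to df sees the change.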
idea #2 - more functional programming, but wastes memory by doing .copy() (note that pipeline should also return the final result):

    def foo(df, param):
        df = df.copy()
        df['col'] = df['col'] + param
        return df

    def pipeline(df):
        df1 = foo(df, 1)
        df2 = foo(df1, 2)
        df3 = foo(df2, 3)
        return df3
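A quick sketch of idea #2 (again with made-up toy data), showing that the original DataFrame is left untouched:

```python
import pandas as pd

def foo(df, param):
    df = df.copy()  # new object; the caller's df is untouched
    df['col'] = df['col'] + param
    return df

df = pd.DataFrame({'col': [0, 10]})
result = foo(foo(foo(df, 1), 2), 3)
print(df['col'].tolist())      # original is unchanged: [0, 10]
print(result['col'].tolist())  # [6, 16]
```

Each call allocates a full copy, so for large frames this trades memory for referential transparency.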
You can chain function calls operating on the dataframe. Also take a look at DataFrame.pipe
in pandas. Something like this, adding in a couple of non-foo operations:
    df = (df.pipe(foo, 1)
            .pipe(foo, 2)
            .pipe(foo, 3)
            .drop(columns=['drop', 'these'])
            .assign(NEW_COL=lambda x: x['OLD_COL'] / 10))
When you use .pipe, df is passed as the first argument to foo.
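Here is a self-contained version of that chain, assuming the idea-#2 style foo (which returns the new DataFrame, as .pipe requires) and some made-up columns matching the names in the snippet:

```python
import pandas as pd

def foo(df, param):
    df = df.copy()
    df['col'] = df['col'] + param
    return df

df = pd.DataFrame({'col': [0, 10],
                   'drop': 1, 'these': 2,
                   'OLD_COL': [100, 200]})

# .pipe(foo, n) calls foo(df, n) and passes the result down the chain
out = (df.pipe(foo, 1)
         .pipe(foo, 2)
         .pipe(foo, 3)
         .drop(columns=['drop', 'these'])
         .assign(NEW_COL=lambda x: x['OLD_COL'] / 10))

print(out.columns.tolist())  # ['col', 'OLD_COL', 'NEW_COL']
print(out['col'].tolist())   # [6, 16]
```

Because foo copies its input, the original df is never mutated; the chain reads top to bottom like a pipeline of pure steps.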