I was wondering about the best functional-programming practice for writing a pipeline of functions that process pandas DataFrames (or any other mutable input type).
Here are two ideas, but I hope something better exists :)
idea #1 - no functional programming, but saves memory by mutating in place:

    def foo(df, param):
        df['col'] = df['col'] + param

    def pipeline(df):
        foo(df, 1)
        foo(df, 2)
        foo(df, 3)
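To make the trade-off concrete, here is a minimal sketch (the column name 'col' and the toy values are my own) showing that idea #1 modifies the caller's DataFrame:

```python
import pandas as pd

def foo(df, param):
    # mutates df in place - no copy is made
    df['col'] = df['col'] + param

df = pd.DataFrame({'col': [0, 10]})
foo(df, 1)
foo(df, 2)
foo(df, 3)
print(df['col'].tolist())  # the caller's df now holds [6, 16]
```

The upside is zero copies; the downside is that any other code holding a reference to df sees the change.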
idea #2 - more functional programming, but wastes memory by doing .copy() (note that pipeline should also return the final result):

    def foo(df, param):
        df = df.copy()
        df['col'] = df['col'] + param
        return df

    def pipeline(df):
        df1 = foo(df, 1)
        df2 = foo(df1, 2)
        df3 = foo(df2, 3)
        return df3
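A quick sketch of idea #2 (again with made-up toy data), showing that the original DataFrame is left untouched:

```python
import pandas as pd

def foo(df, param):
    df = df.copy()  # new object; the caller's df is untouched
    df['col'] = df['col'] + param
    return df

df = pd.DataFrame({'col': [0, 10]})
result = foo(foo(foo(df, 1), 2), 3)
print(df['col'].tolist())      # original is unchanged: [0, 10]
print(result['col'].tolist())  # [6, 16]
```

Each call allocates a full copy, so for large frames this trades memory for referential transparency.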
You can chain function calls operating on the dataframe. Also take a look at DataFrame.pipe
in pandas. Something like this, adding in a couple of non-foo operations:
    df = (df.pipe(foo, 1)
            .pipe(foo, 2)
            .pipe(foo, 3)
            .drop(columns=['drop', 'these'])
            .assign(NEW_COL=lambda x: x['OLD_COL'] / 10))
When you use .pipe, df is passed as the first argument to foo.
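Here is a self-contained version of that chain, assuming the idea-#2 style foo (which returns the new DataFrame, as .pipe requires) and some made-up columns matching the names in the snippet:

```python
import pandas as pd

def foo(df, param):
    df = df.copy()
    df['col'] = df['col'] + param
    return df

df = pd.DataFrame({'col': [0, 10],
                   'drop': 1, 'these': 2,
                   'OLD_COL': [100, 200]})

# .pipe(foo, n) calls foo(df, n) and passes the result down the chain
out = (df.pipe(foo, 1)
         .pipe(foo, 2)
         .pipe(foo, 3)
         .drop(columns=['drop', 'these'])
         .assign(NEW_COL=lambda x: x['OLD_COL'] / 10))

print(out.columns.tolist())  # ['col', 'OLD_COL', 'NEW_COL']
print(out['col'].tolist())   # [6, 16]
```

Because foo copies its input, the original df is never mutated; the chain reads top to bottom like a pipeline of pure steps.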