Passing pandas DataFrame by reference

Question

My question is about the apparent immutability of a pandas DataFrame when it is passed by reference to a function. Consider the following code:

import sys

import pandas as pd

def foo(df1, df2):

    df1['B'] = 1
    df1 = df1.join(df2['C'], how='inner')

    return()

def main(argv = None):

    # Create DataFrames.
    df1 = pd.DataFrame(range(0,10,2), columns=['A'])
    df2 = pd.DataFrame(range(1,11,2), columns=['C'])

    foo(df1, df2)    # Pass df1 and df2 by reference.

    print(df1)

    return(0)

if __name__ == '__main__':
    status = main()
    sys.exit(status)

The output is

and not

In fact, if foo is defined as

def foo(df1, df2):

    df1 = df1.join(df2['C'], how='inner')
    df1['B'] = 1

    return()

(i.e. the "join" statement before the other statement) then the output is simply

I'm intrigued as to why this is the case. Any insights would be appreciated.

Jezzamon · Accepted Answer

The issue is caused by this line:

df1 = df1.join(df2['C'], how='inner')

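df1.join(df2['C'], how='inner') returns a new DataFrame. After this line, the local name df1 no longer refers to the DataFrame that was passed in as the argument; it has been rebound to the new result. The original DataFrame continues to exist and is not changed by the join. Any in-place change made before the reassignment, such as df1['B'] = 1, still lands on the original, while anything done after it only affects the new DataFrame. That is why the order of the two statements matters. This isn't really a pandas issue, just the general way assignment to a parameter name works in Python and in many other languages.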
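The rebinding described above can be checked directly with an identity test. This is only a minimal sketch: the extra name original and the boolean return value are additions for illustration and are not part of the code in the question.

```python
import pandas as pd


def foo(df1, df2):
    original = df1                          # second name for the caller's object (illustrative)
    df1['B'] = 1                            # in-place: adds column B to the caller's DataFrame
    df1 = df1.join(df2['C'], how='inner')   # rebinds the local name to a new DataFrame
    return df1 is original                  # False: the local name now points elsewhere


df1 = pd.DataFrame({'A': list(range(0, 10, 2))})
df2 = pd.DataFrame({'C': list(range(1, 11, 2))})

rebound_to_same_object = foo(df1, df2)

print(rebound_to_same_object)   # False
print(list(df1.columns))        # ['A', 'B']
```

The caller's DataFrame gains column B because that statement ran before the reassignment, but it never sees column C, which exists only on the new object created inside foo.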

Some pandas methods accept an inplace argument that modifies the object in place, which would do what you want, but join does not. If you need the joined result outside the function, return the new DataFrame and reassign it in the caller.
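A minimal sketch of the return-and-reassign approach, reusing the function and variable names from the question (the sample data is rebuilt here so the snippet runs on its own):

```python
import pandas as pd


def foo(df1, df2):
    df1 = df1.join(df2['C'], how='inner')   # new DataFrame bound to the local name
    df1['B'] = 1                            # modifies the new DataFrame only
    return df1                              # hand the result back to the caller


df1 = pd.DataFrame({'A': list(range(0, 10, 2))})
df2 = pd.DataFrame({'C': list(range(1, 11, 2))})

df1 = foo(df1, df2)          # reassign outside the function
print(list(df1.columns))     # ['A', 'C', 'B']
```

With the result returned, the order of the two statements inside foo no longer changes which columns the caller ends up with, only the order in which they appear.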

Tags: python, pass-by-reference, pandas, dataframe, immutability

labrynth
