
How to modify a pandas DataFrame in a function so that changes are seen by the caller?

Tags: python, pandas

I find myself performing the same repetitive tasks on various pandas DataFrames, so I wrote a function to do the processing. How do I modify df inside process_df(df) so that the caller sees all the changes, without assigning a return value?

A simplified version of the code:

import pandas as pd

def process_df(df):
    df.columns = map(str.lower, df.columns)

df = pd.DataFrame({'A': [1], 'B': [2]})
process_df(df)
print df
   A  B
0  1  2

EDIT new code:

def process_df(df):
    df = df.loc[:, 'A']

df = pd.DataFrame({'A': [1], 'B': [2]})
process_df(df)
print df
   A  B 
0  1  2
ChaimG asked Feb 02 '16 05:02


1 Answer

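Indexing a DataFrame with loc, iloc, etc. (or the older ix, which has since been removed from pandas) is a read operation: it returns a new object, which may be a view or a copy of the underlying data. Assigning that result to df inside the function only rebinds the local name, so the caller's DataFrame is untouched. To modify the caller's frame, you need to use in-place operations on the object itself. For example,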

from pandas import DataFrame

def process_df(df):
    # drop all columns except for A
    df.drop(df.columns[df.columns != 'A'], axis=1, inplace=True)

df = DataFrame({'A': [1, 2, 3], 'B': [1, 2, 3]})
process_df(df)  # df now contains only column A
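
The difference between rebinding the local name and mutating the shared object can be checked directly. The following is a minimal sketch in Python 3 syntax. The names rebind, mutate and frame are hypothetical, chosen only for illustration.

```python
import pandas as pd

def rebind(df):
    # Assignment rebinds the local name `df` to a new object;
    # the caller's DataFrame is not affected.
    df = df.loc[:, ['A']]

def mutate(df):
    # An in-place operation changes the object the caller also holds.
    df.drop(columns=[c for c in df.columns if c != 'A'], inplace=True)

frame = pd.DataFrame({'A': [1], 'B': [2]})

rebind(frame)
cols_after_rebind = list(frame.columns)  # still both columns

mutate(frame)
cols_after_mutate = list(frame.columns)  # only 'A' remains
```

Both functions receive a reference to the same object. Only the second one acts on that object instead of replacing the local variable.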

To change the order of columns, you can do something like this:

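def process_df(df):
    # swap A and B (assumes df has exactly the columns A and B, in that order)
    # first relabel the columns in the new order...
    df.columns = ['B', 'A']
    # ...then copy the data back so each label keeps its original values
    df[['B', 'A']] = df[['A', 'B']]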
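
As a quick check of the swap, here is a small sketch in Python 3 syntax, with distinct values in each column so the result is easy to read. It assumes the frame has exactly the two columns A and B, in that order. The names swap_columns and frame are illustrative. It also relies on df[[...]] = <DataFrame> assigning columns by position rather than aligning on labels, which appears to be how recent pandas versions behave but is worth verifying on yours.

```python
import pandas as pd

def swap_columns(df):
    # Relabel the columns in the new order, then restore each label's data.
    df.columns = ['B', 'A']
    df[['B', 'A']] = df[['A', 'B']]

frame = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
swap_columns(frame)

column_order = list(frame.columns)  # order as seen by the caller
a_values = frame['A'].tolist()      # data under label A
b_values = frame['B'].tolist()      # data under label B
```

If the swap worked, the caller sees the columns in the order B, A, and each label still carries its original values.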
Igor Raush answered Oct 03 '22 19:10