Return output of function that takes pandas dataframe as a parameter

Tags: python, pandas

I have a pandas dataframe that looks like:

import pandas as pd

d = {'some_col': ['A', 'B', 'C', 'D', 'E'],
     'alert_status': [1, 2, 0, 0, 5]}
df = pd.DataFrame(d)

Quite a few tasks at my job require the same pandas operations, so I'm beginning to write standardized functions that take a dataframe as a parameter and return something. Here's a simple one:

def alert_read_text(df, alert_status=None):
    if alert_status is None:
        print('Warning: A column name with the alerts must be specified')
    alert_read_criteria = df[alert_status] >= 1
    df[alert_status].loc[alert_read_criteria] = 1
    alert_status_dict = {0 : 'Not Read',
                         1 : 'Read'}
    df[alert_status] = df[alert_status].map(alert_status_dict)
    return df[alert_status]

I'm looking to have the function return a series. This way, one could add a column to an existing data frame:

df['alert_status_text'] = alert_read_text(df, alert_status='alert_status')

However, although the function correctly returns a series, it also modifies the existing column. How do I make it so that the original column passed in does not get modified?

asked Jul 31 '14 22:07 by DataSwede



1 Answer

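As you've discovered, the passed-in dataframe gets modified. Python hands the function a reference to the same object rather than a copy, so any in-place change made inside the function is visible to the caller. This is how Python works in general and has nothing to do with pandas as such.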
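
To make that concrete, here is a minimal sketch of the difference between mutating the passed object and merely rebinding the local name. The function names and the 'flag' column are made up for illustration and are not part of the question's code.

```python
import pandas as pd

def mutate_in_place(frame):
    # Assigning a column writes into the object the caller also holds.
    frame['flag'] = 1

def rebind_only(frame):
    # assign() builds a new dataframe; rebinding the local name
    # does not touch the caller's object.
    frame = frame.assign(flag=2)
    return frame

df = pd.DataFrame({'some_col': ['A', 'B']})

mutate_in_place(df)
print('flag' in df.columns)   # True: the caller's dataframe changed

rebind_only(df)
print(df['flag'].tolist())    # [1, 1]: the caller's dataframe is unchanged
```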

So if you don't want to modify the passed df, take a copy:

def alert_read_text(df, alert_status=None):
    if alert_status is None:
        print('Warning: A column name with the alerts must be specified')
    # Work on a copy so the caller's dataframe is left untouched
    df_copy = df.copy()
    alert_read_criteria = df_copy[alert_status] >= 1
    # A single .loc call avoids chained assignment
    df_copy.loc[alert_read_criteria, alert_status] = 1
    alert_status_dict = {0: 'Not Read',
                         1: 'Read'}
    df_copy[alert_status] = df_copy[alert_status].map(alert_status_dict)
    return df_copy[alert_status]
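
Copying the whole dataframe works, but if only one column is needed it may be enough to avoid in-place writes altogether. The following is a sketch of that variant, not the approach from the answer above: it swaps the boolean-mask assignment for Series.clip(upper=1), which returns a new Series. It assumes the column holds numeric values; anything other than 0 or 1 after clipping (for example a negative number) maps to NaN, just as in the original function.

```python
import pandas as pd

def alert_read_text(df, alert_status):
    # clip() returns a new Series, so df[alert_status] is never written to.
    # Values >= 1 collapse to 1; 0 stays 0.
    capped = df[alert_status].clip(upper=1)
    # map() also returns a new Series; values missing from the dict become NaN.
    return capped.map({0: 'Not Read', 1: 'Read'})

df = pd.DataFrame({'some_col': ['A', 'B', 'C', 'D', 'E'],
                   'alert_status': [1, 2, 0, 0, 5]})
df['alert_status_text'] = alert_read_text(df, alert_status='alert_status')
print(df)
```

Because nothing is assigned back into df inside the function, the 'alert_status' column keeps its original integers and only the new 'alert_status_text' column is added by the caller.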

Also see related: pandas dataframe, copy by value

answered Oct 12 '22 08:10 by EdChum