I have a pandas dataframe that looks like:
import pandas as pd

d = {'some_col': ['A', 'B', 'C', 'D', 'E'],
     'alert_status': [1, 2, 0, 0, 5]}
df = pd.DataFrame(d)
Many tasks at my job require the same operations in pandas, so I'm beginning to write standardized functions that take a dataframe as a parameter and return something. Here's a simple one:
def alert_read_text(df, alert_status=None):
    if alert_status is None:
        print('Warning: A column name with the alerts must be specified')
    alert_read_criteria = df[alert_status] >= 1
    df[alert_status].loc[alert_read_criteria] = 1
    alert_status_dict = {0: 'Not Read',
                         1: 'Read'}
    df[alert_status] = df[alert_status].map(alert_status_dict)
    return df[alert_status]
I'm looking to have the function return a series. This way, one could add a column to an existing data frame:
df['alert_status_text'] = alert_read_text(df, alert_status='alert_status')
However, currently, this function will correctly return a series, but also modifies the existing column. How do you make it so the original column passed in does not get modified?
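As you've discovered, the dataframe passed in gets modified. Python hands the function a reference to the same object rather than a copy, so any in-place change made inside the function is visible to the caller. This is how Python works in general and has nothing to do with pandas as such.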
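To make that concrete, here is a minimal sketch of the difference between changing the passed object in place and rebinding the local name to a new object. The function names `mutate_in_place` and `rebind_locally` are made up for illustration and are not part of the question's code.

```python
import pandas as pd


def mutate_in_place(frame):
    # Assigning a column writes into the very object the caller passed in,
    # so the caller sees the change.
    frame["alert_status"] = 0


def rebind_locally(frame):
    # assign() builds a new DataFrame; rebinding the local name to it
    # leaves the caller's object alone.
    frame = frame.assign(alert_status=99)
    return frame


df = pd.DataFrame({"some_col": ["A", "B"], "alert_status": [1, 2]})

rebind_locally(df)
after_rebind = df["alert_status"].tolist()  # caller's data is unchanged

mutate_in_place(df)
after_mutate = df["alert_status"].tolist()  # caller's data has changed
```

Only the second call alters the caller's dataframe, which is the same effect the function in the question has on its input.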
So if you don't want to modify the passed df then take a copy:
def alert_read_text(df, alert_status=None):
    if alert_status is None:
        raise ValueError('A column name with the alerts must be specified')
    # Work on a copy so the caller's DataFrame is left untouched
    copy = df.copy()
    alert_read_criteria = copy[alert_status] >= 1
    # A single .loc call avoids chained assignment
    copy.loc[alert_read_criteria, alert_status] = 1
    alert_status_dict = {0: 'Not Read',
                         1: 'Read'}
    copy[alert_status] = copy[alert_status].map(alert_status_dict)
    return copy[alert_status]
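If copying the whole dataframe feels heavy, one possible alternative (a sketch, not the approach given in the answer above) is to never assign into the dataframe at all and build the result from a boolean Series instead. The name `alert_read_text_no_copy` is hypothetical. Note one difference in behavior: missing values compare as False here and so come out as 'Not Read'.

```python
import pandas as pd


def alert_read_text_no_copy(df, alert_status):
    # The comparison creates a brand-new boolean Series; df is only read.
    is_read = df[alert_status] >= 1
    # Map the booleans to labels, producing another new Series.
    return is_read.map({True: "Read", False: "Not Read"})


df = pd.DataFrame({"some_col": ["A", "B", "C", "D", "E"],
                   "alert_status": [1, 2, 0, 0, 5]})
df["alert_status_text"] = alert_read_text_no_copy(df, alert_status="alert_status")
```

After this runs, `alert_status` still holds the original integers and only the new `alert_status_text` column has been added.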
Also see related: pandas dataframe, copy by value