
Why Does this DataFrame Modification within Function Change Global Outside Function?

Why does the function below change the global DataFrame named df? Shouldn't it just change a local df within the function, but not the global df?

import pandas as pd

df = pd.DataFrame()

def adding_var_inside_function(df):
    df['value'] = 0

print(df.columns) # Index([], dtype='object')
adding_var_inside_function(df)
print(df.columns) # Index([u'value'], dtype='object')
asked Jun 15 '16 21:06 by Michael




1 Answer

From the pandas docs:

Mutability and copying of data

All pandas data structures are value-mutable (the values they contain can be altered) but not always size-mutable. The length of a Series cannot be changed, but, for example, columns can be inserted into a DataFrame. However, the vast majority of methods produce new objects and leave the input data untouched. In general, though, we like to favor immutability where sensible.
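To make the distinction in that passage concrete, here is a minimal sketch (the column names are made up for illustration). It contrasts a method that returns a new object, rename(), with column assignment via [], which inserts into the existing DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# rename() returns a new DataFrame; df keeps its original columns.
renamed = df.rename(columns={"a": "x"})
print(renamed is df)
print(list(df.columns))

# Column assignment with [] inserts the column into df itself.
df["c"] = 0
print(list(df.columns))
```

The assignment in the question, df['value'] = 0, is of the second kind, which is why the caller sees the new column.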

Here is another example, showing that cell values are mutable:

In [21]: df
Out[21]:
   a  b  c
0  3  2  0
1  3  3  1
2  4  0  0
3  2  3  2
4  0  4  4

In [22]: df2 = df

In [23]: df2.loc[0, 'a'] = 100

In [24]: df
Out[24]:
     a  b  c
0  100  2  0
1    3  3  1
2    4  0  0
3    2  3  2
4    0  4  4

df2 is not a copy. Both names refer to the same DataFrame object:

In [28]: id(df) == id(df2)
Out[28]: True
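The same thing happens when a DataFrame is passed to a function: the parameter is just another name for the caller's object. The sketch below uses two hypothetical helper functions (not from the question). It shows that the identity is shared, and that rebinding the local name to a new object does not affect the caller, whereas mutating the object does:

```python
import pandas as pd

def same_object(frame, caller_id):
    # The parameter refers to the very object the caller passed in.
    return id(frame) == caller_id

def rebind_only(frame):
    # Rebinding the local name points it at a new object;
    # the caller's DataFrame is not touched.
    frame = pd.DataFrame({"other": [1]})
    return frame

df = pd.DataFrame({"a": [3, 3, 4]})
shared = same_object(df, id(df))
rebound = rebind_only(df)

print(shared)
print(list(df.columns))
print(list(rebound.columns))
```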

A version of your function that won't mutate the DataFrame passed in:

def adding_var_inside_function(df):
    df = df.copy()   # work on a copy so the caller's DataFrame is left untouched
    df['value'] = 0
    return df

In [30]: df
Out[30]:
     a  b  c
0  100  2  0
1    3  3  1
2    4  0  0
3    2  3  2
4    0  4  4

In [31]: adding_var_inside_function(df)
Out[31]:
     a  b  c  value
0  100  2  0      0
1    3  3  1      0
2    4  0  0      0
3    2  3  2      0
4    0  4  4      0

In [32]: df
Out[32]:
     a  b  c
0  100  2  0
1    3  3  1
2    4  0  0
3    2  3  2
4    0  4  4
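As a further illustration, this small sketch (with made-up data) contrasts a plain assignment, which only creates an alias, with copy(), which by default makes a deep copy whose cell values are independent of the original:

```python
import pandas as pd

df = pd.DataFrame({"a": [3, 3, 4], "b": [2, 3, 0]})
alias = df               # second name for the same object
independent = df.copy()  # separate object with its own data

# Editing the copy leaves df unchanged.
independent.loc[0, "a"] = 100
after_copy_edit = int(df.loc[0, "a"])

# Editing through the alias changes df.
alias.loc[0, "a"] = 100
after_alias_edit = int(df.loc[0, "a"])

print(after_copy_edit, after_alias_edit)
```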
answered Nov 15 '22 08:11 by MaxU - stop WAR against UA