Why does the function below change the global DataFrame named df? Shouldn't it just change a local df within the function, but not the global df?
import pandas as pd

df = pd.DataFrame()

def adding_var_inside_function(df):
    df['value'] = 0

print(df.columns)  # Index([], dtype='object')
adding_var_inside_function(df)
print(df.columns)  # Index([u'value'], dtype='object')
From the pandas docs:
Mutability and copying of data
All pandas data structures are value-mutable (the values they contain can be altered) but not always size-mutable. The length of a Series cannot be changed, but, for example, columns can be inserted into a DataFrame. However, the vast majority of methods produce new objects and leave the input data untouched. In general, though, we like to favor immutability where sensible.
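As a small sketch of the distinction the docs draw (the data and column names here are made up for illustration): methods such as rename() and drop() return a new object by default, while inserting a column by item assignment changes the existing DataFrame.

```python
import pandas as pd

# Illustrative data, not taken from the question.
df = pd.DataFrame({"a": [3, 3, 4], "b": [2, 3, 0]})

# rename() and drop() return new DataFrames by default;
# the input DataFrame keeps its original columns.
renamed = df.rename(columns={"a": "x"})
dropped = df.drop(columns=["b"])
print(list(df.columns))
print(list(renamed.columns))
print(list(dropped.columns))

# Item assignment inserts the column into the existing object.
df["value"] = 0
print(list(df.columns))
```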
Here is another example, showing that the values (cells) are mutable:
In [21]: df
Out[21]:
   a  b  c
0  3  2  0
1  3  3  1
2  4  0  0
3  2  3  2
4  0  4  4

In [22]: df2 = df

In [23]: df2.loc[0, 'a'] = 100

In [24]: df
Out[24]:
     a  b  c
0  100  2  0
1    3  3  1
2    4  0  0
3    2  3  2
4    0  4  4
df2 is not a copy; it is another reference to the same object as df:
In [28]: id(df) == id(df2)
Out[28]: True
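The same thing happens when a DataFrame is passed to a function: the parameter name refers to the caller's object. A rough sketch of the difference between mutating that object and rebinding the local name (the function names mutate and rebind are hypothetical, chosen only for this illustration):

```python
import pandas as pd

def mutate(frame):
    # Item assignment changes the object that `frame` refers to,
    # which is the same object the caller passed in.
    frame["value"] = 0

def rebind(frame):
    # Assigning to the name only changes the local binding;
    # the caller's object is left alone.
    frame = pd.DataFrame({"other": [1]})
    return frame

df = pd.DataFrame()

rebind(df)
cols_after_rebind = list(df.columns)
print(cols_after_rebind)  # the caller's DataFrame has no new columns

mutate(df)
cols_after_mutate = list(df.columns)
print(cols_after_mutate)  # the caller's DataFrame now has 'value'
```

So the function in the question changes the global df because df['value'] = 0 is a mutation of the shared object, not an assignment to the local name.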
A version of your function that won't mutate the DataFrame passed in:
def adding_var_inside_function(df):
    df = df.copy()   # work on a copy, so the caller's DataFrame is untouched
    df['value'] = 0
    return df
In [30]: df
Out[30]:
     a  b  c
0  100  2  0
1    3  3  1
2    4  0  0
3    2  3  2
4    0  4  4

In [31]: adding_var_inside_function(df)
Out[31]:
     a  b  c  value
0  100  2  0      0
1    3  3  1      0
2    4  0  0      0
3    2  3  2      0
4    0  4  4      0

In [32]: df
Out[32]:
     a  b  c
0  100  2  0
1    3  3  1
2    4  0  0
3    2  3  2
4    0  4  4
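As a possible alternative, and only a sketch: DataFrame.assign() returns a new DataFrame with the extra column, so the explicit copy is not needed. The function name adding_var_with_assign and the sample data are assumptions made for this example.

```python
import pandas as pd

def adding_var_with_assign(df):
    # assign() returns a new DataFrame that includes the extra column;
    # the DataFrame passed in is not modified.
    return df.assign(value=0)

# Illustrative data, loosely based on the first rows shown above.
df = pd.DataFrame({"a": [100, 3], "b": [2, 3], "c": [0, 1]})

result = adding_var_with_assign(df)
print(list(result.columns))
print(list(df.columns))
```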