I am trying to understand how copying a pandas DataFrame works. When I assign a copy of an object in Python, I am not used to changes to the original object affecting copies of that object. For example:
x = 3
y = x
x = 4
print(y)
3
While x has subsequently been changed, y remains the same. In contrast, when I make changes to a pandas df after assigning it to a copy df1, the copy is also affected by changes to the original DataFrame.
import pandas as pd
import numpy as np
def minusone(x):
    return int(x) - 1
df = pd.DataFrame({"A": [10, 20, 30, 40, 50], "B": [20, 30, 10, 40, 50], "C": [32, 234, 23, 23, 42523]})
df1 = df
print(df1['A'])
0 10
1 20
2 30
3 40
4 50
Name: A, dtype: int64
df['A'] = np.vectorize(minusone)(df['A'])
print(df1['A'])
0 9
1 19
2 29
3 39
4 49
Name: A, dtype: int64
The solution appears to be making a deep copy with copy.deepcopy(), but because this behavior is different from the behavior I am used to in Python, I was wondering if someone could explain the reasoning behind this difference, or whether it is a bug.
In your first example, you did not modify the value of x; you assigned a new value to x.
In your second example, you did modify the value of df, by changing one of its columns.
You can see the same effect with built-in types too:
>>> x = []
>>> y = x
>>> x.append(1)
>>> y
[1]
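A minimal sketch contrasting the two cases side by side. The is operator checks whether two names refer to the same object, so it shows where rebinding a name and mutating an object diverge.

```python
# Rebinding a bare name: y keeps referring to the original object.
x = 3
y = x
print(x is y)      # True: both names refer to the same int object
x = 4              # x now refers to a different object; nothing is mutated
print(x is y, y)   # False 3

# Mutating an object: every name bound to it sees the change.
a = []
b = a
a.append(1)        # the single shared list is changed in place
print(a is b, b)   # True [1]
```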
The behavior is not specific to Pandas; it is fundamental to Python. There are many, many questions on this site about this same issue, all stemming from the same misunderstanding. The syntax barename = value does not have the same behavior as any other construct in Python.
When you use name[key] = value, name.attr = value, or name.methodcall(), you may be mutating the object referred to by name, you may be copying something, and so on, depending on the object. When you use name = value (where name is a single identifier: no dots, no brackets), you never mutate anything and never copy anything; you only make name refer to a different object.
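A small sketch of the three forms with plain Python objects. The dict and the types.SimpleNamespace instance are just convenient stand-ins chosen for illustration; the point is which statements affect a second name bound to the same object.

```python
from types import SimpleNamespace

# name[key] = value: mutates the dict that both names refer to
d = {"k": 1}
alias = d
d["k"] = 2
print(alias)          # {'k': 2}

# name = value: rebinds d only; alias still refers to the old dict
d = {"k": 3}
print(alias)          # {'k': 2}
print(d)              # {'k': 3}

# name.attr = value: also mutates the shared object
obj = SimpleNamespace(attr=1)
other = obj
obj.attr = 2
print(other.attr)     # 2
```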
In your first example, you used the syntax x = ... . In your second example, you used the syntax df['A'] = ... . These are not the same syntax, so you can't assume they have the same behavior.
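The same distinction applied to a DataFrame, as a sketch with a smaller frame than the one in the question. Assigning to df['A'] changes the one DataFrame that both names share, while assigning to the bare name df only rebinds df and leaves df1 pointing at the old object.

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30]})
df1 = df                       # two names for one DataFrame; no copy is made

df["A"] = df["A"] - 1          # df[...] = ...: modifies the shared DataFrame
print(df1["A"].tolist())       # [9, 19, 29]

df = pd.DataFrame({"A": [0]})  # df = ...: rebinds df; df1 keeps the old object
print(df1["A"].tolist())       # [9, 19, 29]
print(df is df1)               # False
```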
The way to make a copy depends on the kind of object you're trying to copy. For your case, use df1 = df.copy().
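A minimal sketch of the fix, again on a reduced version of the question's frame. DataFrame.copy() defaults to deep=True, so the new frame gets its own copy of the data and later changes to df do not show up in df1.

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30]})
df1 = df.copy()                # independent DataFrame (deep=True is the default)

df["A"] = df["A"] - 1          # change the original only
print(df["A"].tolist())        # [9, 19, 29]
print(df1["A"].tolist())       # [10, 20, 30]
print(df1 is df)               # False
```

One caveat from the pandas documentation: even with deep=True, Python objects stored in object-dtype columns are not copied recursively; only the references to them are copied.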