I'm having a bit of trouble altering a duplicated pandas DataFrame and not having the edits apply to both the duplicate and the original DataFrame.
Here's an example. Say I create an arbitrary DataFrame from a list of dictionaries:
In [67]: d = [{'a':3, 'b':5}, {'a':1, 'b':1}]
In [68]: d = DataFrame(d)
In [69]: d
Out[69]:
   a  b
0  3  5
1  1  1
Then I assign the 'd' dataframe to variable 'e' and apply some arbitrary math to column 'a' using apply:
In [70]: e = d
In [71]: e['a'] = e['a'].apply(lambda x: x + 1)
The problem arises in that the apply function apparently applies to both the duplicate DataFrame 'e' and original DataFrame 'd', which I cannot for the life of me figure out:
In [72]: e # duplicate DataFrame
Out[72]:
   a  b
0  4  5
1  2  1
In [73]: d # original DataFrame, notice the alterations to frame 'e' were also applied
Out[73]:
   a  b
0  4  5
1  2  1
I've searched both the pandas documentation and Google for a reason why this would be so, but to no avail. I can't understand what is going on here at all.
I've also tried the math operations using an element-wise operation (e.g., e['a'] = [i + 1 for i in e['a']]), but the problem persists. Is there a quirk in the pandas DataFrame type that I'm not aware of? I appreciate any insight someone might be able to offer.
To copy a pandas DataFrame, use the copy() method. DataFrame.copy() makes a copy of the object's indices and data. It accepts one parameter, deep (True by default), and returns a Series or DataFrame of the same type as the caller.
This is not a pandas-specific issue. In Python, assignment never copies anything:
>>> a = [1,2,3]
>>> b = a
>>> b[0] = 'WHOA!'
>>> a
['WHOA!', 2, 3]
If you want a new DataFrame, make a copy with e = d.copy().
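As a rough sketch of that fix, here is the example from the question with a copy in place of the plain assignment. The `pd` alias and the helper names (`alias`, `same_object`, `copied_object`, `original_a`, `copy_a`) are mine for illustration, not from the original post:

```python
import pandas as pd

d = pd.DataFrame([{'a': 3, 'b': 5}, {'a': 1, 'b': 1}])

# copy() returns a new DataFrame (deep=True by default),
# so changes to e no longer show up in d.
e = d.copy()
e['a'] = e['a'].apply(lambda x: x + 1)

# Plain assignment, by contrast, only gives the same object a second name.
alias = d
same_object = alias is d      # True: both names refer to one DataFrame
copied_object = e is d        # False: e is a separate object

original_a = d['a'].tolist()  # column 'a' of the original, unchanged
copy_a = e['a'].tolist()      # column 'a' of the copy, incremented
```

Checking `x is y` is a quick way to tell whether two names refer to the same object or to separate ones.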
Edit: I should clarify that assignment to a bare name never copies anything. Assignment to an item or attribute (e.g., a[1] = x or a.foo = bar) is converted into method calls under the hood and may do copying depending on what kind of object a is.
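To illustrate the distinction, here is a small sketch using plain lists. The `Logged` class is a hypothetical helper written only for this example; it records item assignments to show that `c[1] = 'x'` goes through `__setitem__`:

```python
a = [1, 2, 3]
b = a             # bare-name assignment: a and b now name the same list
b[0] = 'WHOA!'    # item assignment: mutates that one shared list
after_item_assignment = list(a)

b = [7, 8, 9]     # rebinding the bare name b: a is not touched
after_rebinding = list(a)


class Logged(list):
    """Hypothetical list subclass that records every item assignment."""

    def __init__(self, *args):
        super().__init__(*args)
        self.calls = []

    def __setitem__(self, index, value):
        # Python calls this method for statements like c[1] = 'x'.
        self.calls.append((index, value))
        super().__setitem__(index, value)


c = Logged([1, 2, 3])
c[1] = 'x'        # dispatched to Logged.__setitem__(1, 'x')
```

Because item and attribute assignment are ordinary method calls, what they do (mutate in place, copy, or something else) is up to the object's type, whereas `b = a` always just binds a name.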