Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

DataFrame.apply in python pandas alters both original and duplicate DataFrames

Tags:

python

pandas

I'm having a bit of trouble altering a duplicated pandas DataFrame and not having the edits apply to both the duplicate and the original DataFrame.

Here's an example. Say I create an arbitrary DataFrame from a list of dictionaries:

In [67]: d = [{'a':3, 'b':5}, {'a':1, 'b':1}]

In [68]: d = DataFrame(d)

In [69]: d

Out[69]: 
   a  b
0  3  5
1  1  1

Then I assign the 'd' dataframe to variable 'e' and apply some arbitrary math to column 'a' using apply:

In [70]: e = d

In [71]: e['a'] = e['a'].apply(lambda x: x + 1)

The problem arises in that the apply function apparently applies to both the duplicate DataFrame 'e' and original DataFrame 'd', which I cannot for the life of me figure out:

In [72]: e # duplicate DataFrame
Out[72]: 
   a  b
0  4  5
1  2  1

In [73]: d # original DataFrame, notice the alterations to frame 'e' were also applied
Out[73]:  
   a  b
0  4  5
1  2  1

I've searched both the pandas documentation and Google for a reason why this would be so, but to no avail. I can't understand what is going on here at all.

I've also tried the math operations using a element-wise operation (e.g., e['a'] = [i + 1 for i in e['a']] ), but the problem persists. Is there a quirk in the pandas DataFrame type that I'm not aware of? I appreciate any insight someone might be able to offer.

like image 907
MikeGruz Avatar asked Jun 01 '12 04:06

MikeGruz


People also ask

Are two pandas DataFrames identical?

The output is False because the two dataframes are not equal to each other. They have different elements. Example #2: Use equals() function to test for equality between two data frame object with NaN values. Note : NaNs in the same location are considered equal.

How do I duplicate a pandas DataFrame?

To copy Pandas DataFrame, use the copy() method. The DataFrame. copy() method makes a copy of the provided object's indices and data. The copy() method accepts one parameter called deep, and it returns the Series or DataFrame that matches the caller.

How do I merge two DataFrames in pandas and remove duplicates?

To concatenate DataFrames, use the concat() method, but to ignore duplicates, use the drop_duplicates() method.


1 Answers

This is not a pandas-specific issue. In Python, assignment never copies anything:

>>> a = [1,2,3]
>>> b = a
>>> b[0] = 'WHOA!'
>>> a
['WHOA!', 2, 3]

If you want a new DataFrame, make a copy with e = d.copy().

Edit: I should clarify that assignment to a bare name never copies anything. Assignment to an item or attribute (e.g., a[1] = x or a.foo = bar) is converted into method calls under the hood and may do copying depending on what kind of object a is.

like image 147
BrenBarn Avatar answered Sep 24 '22 04:09

BrenBarn