Pandas DataFrame mutability

I am fairly new to pandas DataFrames and would appreciate it if someone could briefly explain DataFrame mutability using the following example:

import numpy as np
import pandas as pd

d1 = pd.date_range('1/1/2016', periods=10, freq='W')
col1 = ['open', 'high', 'low', 'close']
list1 = np.random.rand(10, 4)
df1 = pd.DataFrame(list1, index=d1, columns=col1)

To my understanding, df1 is currently a reference to a DataFrame object.

If I pass df1 or a slice of df1 (e.g. df1.iloc[2:3,1:2]) as input to a new DataFrame (e.g. df2=pd.DataFrame(df1)), is df2 a new, independent DataFrame instance, or does it still refer to df1's data, so that changes made through df2 would affect df1?

Any other points I should pay attention to regarding DataFrame mutability would also be very much appreciated.

user7786493 asked Jul 09 '17 06:07


1 Answer

This:

df2 = pd.DataFrame(df1)

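constructs a new DataFrame object. The constructor has a copy parameter whose default was False in older pandas versions (recent versions default to None, which behaves like False for DataFrame input). According to the documentation, it means: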

> Copy data from inputs. Only affects DataFrame / 2d ndarray input

So by default df2 shares its data with df1. If you want a complete, independent copy instead, do this:

df2 = pd.DataFrame(df1, copy=True)

Or more concisely and idiomatically:

df2 = df1.copy()
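
A minimal sketch of what an independent copy means in practice, assuming a small all-float frame (the column names and values here are illustrative, not taken from the question): a write to the copy leaves the original untouched.

```python
import numpy as np
import pandas as pd

# Illustrative frame; the names and values are assumptions for this sketch.
df1 = pd.DataFrame(np.zeros((3, 2)), columns=["open", "close"])

df2 = df1.copy()        # .copy() is a deep copy by default
df2.iloc[0, 0] = 99.0   # write to the copy only

print(df1.iloc[0, 0])   # the original still holds 0.0
print(df2.iloc[0, 0])   # the copy holds 99.0
```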

If you do this:

df2 = df1.iloc[2:3,1:2].copy()

You will again get an independent copy. But if you do this:

df2 = pd.DataFrame(df1.iloc[2:3,1:2])

It will probably share the data with df1, but this style leaves it unclear whether you intend to modify df1 through df2, so I suggest not writing such code. Instead, if you do not want a copy, just write:

df2 = df1.iloc[2:3,1:2]

In summary: if you want a reference to existing data, do not call pd.DataFrame() or any other method at all. If you want an independent copy, call .copy().
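
If you are unsure which case you are in, one way to check is to ask NumPy whether the underlying buffers overlap. The sketch below is only a rough diagnostic: shares_data is a hypothetical helper, not part of pandas, and it assumes a frame with a single dtype (for mixed dtypes, to_numpy() has to build a new array, so the check would be misleading). The results for the first two cases can depend on your pandas version, so they are printed rather than stated here. Note also that shared memory does not always mean a write through one object shows up in the other: with Copy-on-Write (the default from pandas 3.0), data may be shared until one of the objects is modified.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.rand(10, 4),
                   columns=["open", "high", "low", "close"])

def shares_data(a, b):
    """Hypothetical helper: True if the two frames' value buffers overlap."""
    return np.shares_memory(a.to_numpy(), b.to_numpy())

wrapped = pd.DataFrame(df1)          # constructor call, default copy setting
sliced = df1.iloc[2:3, 1:2]          # plain slice, no explicit copy
copied = df1.iloc[2:3, 1:2].copy()   # explicit independent copy

print("pd.DataFrame(df1):        ", shares_data(df1, wrapped))
print("df1.iloc[2:3, 1:2]:       ", shares_data(df1, sliced))
print("df1.iloc[2:3, 1:2].copy():", shares_data(df1, copied))  # False
```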

John Zwinck answered Sep 17 '22 19:09
