Pandas DataFrame mutability

Question

I am pretty new to pandas DataFrames, and I would appreciate it if someone could briefly explain DataFrame mutability using the following example:

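import numpy as np
import pandas as pd

d1 = pd.date_range('1/1/2016', periods=10, freq='W')  # 10 weekly timestamps
col1 = ['open', 'high', 'low', 'close']
list1 = np.random.rand(10, 4)  # 10x4 array of random floats
df1 = pd.DataFrame(list1, d1, col1)  # positional arguments: data, index, columns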

As I understand it, df1 is currently a reference to a DataFrame object.

If I pass df1 or a slice of df1 (e.g. df1.iloc[2:3,1:2]) as input to a new DataFrame (e.g. df2=pd.DataFrame(df1)), is df2 a new, independent DataFrame, or does it still refer to df1's data, so that changes made through df2 also affect df1?

Any other points I should pay attention to regarding DataFrame mutability would also be very much appreciated.

John Zwinck · Accepted Answer

This:

df2 = pd.DataFrame(df1)

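constructs a new DataFrame object. The constructor has a copy parameter whose default, at the time of writing, is False (newer pandas versions document the default as None, which behaves like copy=False for DataFrame input). The documentation describes the parameter as: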

> Copy data from inputs. Only affects DataFrame / 2d ndarray input

So data will be shared between df2 and df1 by default. If you want there to be no sharing, but rather a complete copy, do this:

df2 = pd.DataFrame(df1, copy=True)

Or more concisely and idiomatically:

df2 = df1.copy()
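To see the difference for yourself, here is a minimal sketch that compares the two approaches using np.shares_memory. The names wrapped and copied, and the fixed random seed, are my own choices for illustration and are not part of the question. Whether the plain constructor call reports shared memory can depend on your pandas version and on whether Copy-on-Write is enabled, so treat that first printed value as something to check rather than a guarantee.

```python
import numpy as np
import pandas as pd

# Rebuild a frame like the one in the question (the fixed seed is an assumption,
# used only so the example is reproducible).
rng = np.random.default_rng(0)
d1 = pd.date_range("1/1/2016", periods=10, freq="W")
df1 = pd.DataFrame(rng.random((10, 4)), index=d1,
                   columns=["open", "high", "low", "close"])

wrapped = pd.DataFrame(df1)  # constructor call without an explicit copy
copied = df1.copy()          # explicit, independent copy

# np.shares_memory reports whether two arrays overlap in memory.
wrapped_shares = bool(np.shares_memory(df1.to_numpy(), wrapped.to_numpy()))
copied_shares = bool(np.shares_memory(df1.to_numpy(), copied.to_numpy()))
print("pd.DataFrame(df1) shares memory with df1:", wrapped_shares)
print("df1.copy() shares memory with df1:", copied_shares)

# Writing to the explicit copy leaves the original untouched.
before = df1.iloc[0, 0]
copied.iloc[0, 0] = -1.0
print("df1 unchanged after writing to the copy:", df1.iloc[0, 0] == before)
```

The part you can rely on is the last check: after .copy(), writes to the copy do not reach df1.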

If you do this:

df2 = df1.iloc[2:3,1:2].copy()

You will again get an independent copy. But if you do this:

df2 = pd.DataFrame(df1.iloc[2:3,1:2])

It will probably share the data with df1, but this style makes it unclear whether you intend to modify either DataFrame afterwards, so I suggest not writing such code. Instead, if you want no copy, just write this:

df2 = df1.iloc[2:3,1:2]

In summary: if you want a reference to existing data, do not call pd.DataFrame() or any other method at all. If you want an independent copy, call .copy().
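As a rough illustration of the slice case, the sketch below takes the same one-cell slice both ways. The names sub_ref and sub_copy and the fixed seed are hypothetical, chosen for this example only. Whether the plain slice reports shared memory, and whether a write through it would reach df1, may vary with your pandas version and Copy-on-Write settings, which is one more reason to call .copy() whenever you need independence.

```python
import numpy as np
import pandas as pd

# Frame shaped like the one in the question (seed is an assumption for reproducibility).
rng = np.random.default_rng(0)
d1 = pd.date_range("1/1/2016", periods=10, freq="W")
df1 = pd.DataFrame(rng.random((10, 4)), index=d1,
                   columns=["open", "high", "low", "close"])

sub_ref = df1.iloc[2:3, 1:2]          # plain slice, no copy requested
sub_copy = df1.iloc[2:3, 1:2].copy()  # independent copy of the same cell

# Check whether the plain slice overlaps df1's memory on this pandas version.
slice_shares = bool(np.shares_memory(df1.to_numpy(), sub_ref.to_numpy()))
print("plain slice shares memory with df1:", slice_shares)

# Writing to the copied slice does not touch the original frame.
original = df1.iloc[2, 1]
sub_copy.iloc[0, 0] = -1.0
print("df1 cell unchanged after writing to the copied slice:",
      df1.iloc[2, 1] == original)
```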

Tags: python, pandas, dataframe

user7786493
