I am pretty new to pandas DataFrames, and I would appreciate it if someone could briefly explain DataFrame mutability using the following example:
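import numpy as np
import pandas as pd
d1 = pd.date_range('1/1/2016', periods=10, freq='W')  # weekly dates used as the index
col1 = ['open', 'high', 'low', 'close']
list1 = np.random.rand(10, 4)
df1 = pd.DataFrame(list1, d1, col1)  # positional arguments: data, index, columns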
To my understanding, currently df1 is a reference to a df object.
If I pass df1 or a slice of df1 (e.g. df1.iloc[2:3,1:2]) as input to a new DataFrame (e.g. df2=pd.DataFrame(df1)), is df2 a new DataFrame instance, or does it still refer to df1's data, so that changes made through df2 also affect df1?
Any other points I should pay attention to regarding DataFrame mutability would also be very much appreciated.
This:
df2 = pd.DataFrame(df1)
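Constructs a new DataFrame. There is a copy parameter that, for DataFrame input, defaults to not copying (the default was False in older pandas releases; newer ones use None, which behaves like False for DataFrame input). According to the documentation, it means: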
> Copy data from inputs. Only affects DataFrame / 2d ndarray input
So data will be shared between df2 and df1 by default. If you want there to be no sharing, but rather a complete copy, do this:
df2 = pd.DataFrame(df1, copy=True)
Or more concisely and idiomatically:
df2 = df1.copy()
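As a minimal sketch of what "independent copy" means here, the following rebuilds a frame like the one in the question, copies it, writes to the copy, and checks that the original is untouched. The variable names `original`, `unchanged`, and `shares` are introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the one in the question.
d1 = pd.date_range('1/1/2016', periods=10, freq='W')
df1 = pd.DataFrame(np.random.rand(10, 4), index=d1,
                   columns=['open', 'high', 'low', 'close'])

df2 = df1.copy()             # .copy() makes a deep copy by default

original = df1.iloc[0, 0]    # remember the value before touching df2
df2.iloc[0, 0] = -1.0        # write to the copy only

# The original still holds its old value, and the two frames do not
# share an underlying buffer.
unchanged = bool(df1.iloc[0, 0] == original)
shares = np.shares_memory(df1.to_numpy(), df2.to_numpy())
print(unchanged, shares)
```

Because the values come from np.random.rand, they lie in [0, 1), so -1.0 can never collide with the original entry.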
If you do this:
df2 = df1.iloc[2:3,1:2].copy()
You will again get an independent copy. But if you do this:
df2 = pd.DataFrame(df1.iloc[2:3,1:2])
It will probably share the data, but this style is pretty unclear if you intend to modify df2, so I suggest not writing such code. Instead, if you want no copy, just say this:
df2 = df1.iloc[2:3,1:2]
In summary: if you want a reference to existing data, do not call pd.DataFrame() or any other method at all. If you want an independent copy, call .copy().
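Whether a write made through df2 actually reaches df1 can depend on the pandas version and settings (for example, whether Copy-on-Write is enabled), so it may be safer to test it than to assume it. The sketch below does that for the pd.DataFrame(df1) case; the name `write_propagated` is hypothetical, and no particular result is claimed because it can differ between installations.

```python
import numpy as np
import pandas as pd

# Small deterministic frame so the check is easy to follow.
df1 = pd.DataFrame(np.arange(12.0).reshape(3, 4),
                   columns=['open', 'high', 'low', 'close'])

df2 = pd.DataFrame(df1)      # no explicit copy requested

before = df1.iloc[0, 0]      # value in df1 before writing through df2
df2.iloc[0, 0] = -1.0        # write through the second frame

# True if the write was visible in df1 (data shared and mutated in place),
# False if pandas kept df1 isolated from the write.
write_propagated = bool(df1.iloc[0, 0] != before)
print("write through df2 reached df1:", write_propagated)
```

If this prints True on your installation and you need isolation, df1.copy() is the explicit way to get it.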