Views versus copies To put it very simply, a view is a subset of the original object ( DataFrame or Series ) linked to the original source, while a copy is an entirely new object .
loc[mask] returns a new DataFrame with a copy of the data from df . Then df.
The copy() method returns a copy of the DataFrame. By default, the copy is a "deep copy" meaning that any changes made in the original DataFrame will NOT be reflected in the copy.
copy() Pandas copy() function is used to create a copy of the Pandas object. Variables are also used to generate a copy of the object. Still, variables are just pointer to an object, and any change in new data will also change the previous data.
Here's the rules, subsequent override:
All operations generate a copy
If inplace=True
is provided, it will modify in-place; only some operations support this
An indexer that sets, e.g. .loc/.iloc/.iat/.at
will set inplace.
An indexer that gets on a single-dtyped object is almost always a view (depending on the memory layout it may not be that's why this is not reliable). This is mainly for efficiency. (the example from above is for .query
; this will always return a copy as its evaluated by numexpr
)
An indexer that gets on a multiple-dtyped object is always a copy.
Your example of chained indexing
df[df.C <= df.B].loc[:,'B':'E']
is not guaranteed to work (and thus you shoulld never do this).
Instead do:
df.loc[df.C <= df.B, 'B':'E']
as this is faster and will always work
The chained indexing is 2 separate python operations and thus cannot be reliably intercepted by pandas (you will oftentimes get a SettingWithCopyWarning
, but that is not 100% detectable either). The dev docs, which you pointed, offer a much more full explanation.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With