
What rules does Pandas use to generate a view vs a copy?

People also ask

What is view and copy of DataFrame?

Views versus copies: to put it very simply, a view is a subset of the original object (DataFrame or Series) linked to the original source, while a copy is an entirely new object.

Does LOC return a copy or view?

loc[mask] returns a new DataFrame with a copy of the data from df . Then df.

Why do we use copy () in pandas?

The copy() method returns a copy of the DataFrame. By default, the copy is a "deep copy" meaning that any changes made in the original DataFrame will NOT be reflected in the copy.

When should I use pandas copy?

The pandas copy() function is used to create a copy of a pandas object. Assigning the object to a new variable does not create a copy: the variable is just another reference to the same object, so any change made through the new name also changes the original data.
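A minimal sketch of the difference between a second variable name and copy(). The frame and the names original, alias and independent are made up for illustration:

```python
import pandas as pd

original = pd.DataFrame({"x": [1, 2, 3]})

alias = original               # a second name for the same object, not a copy
independent = original.copy()  # deep copy by default: separate data

# Change the original after both names exist.
original.loc[0, "x"] = 99

print(alias is original)          # the alias is the very same object
print(alias.loc[0, "x"])          # so it sees the new value
print(independent.loc[0, "x"])    # the copy keeps the old value
```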


Here are the rules; later ones override earlier ones:

  • All operations generate a copy

  • If inplace=True is provided, it will modify in-place; only some operations support this

  • An indexer that sets, e.g. .loc/.iloc/.iat/.at, will set in place.

  • An indexer that gets on a single-dtyped object is almost always a view (depending on the memory layout it may not be, which is why this is not reliable). This is mainly for efficiency. (The example from above is for .query; this will always return a copy, as it's evaluated by numexpr.)

  • An indexer that gets on a multiple-dtyped object is always a copy.
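The first three rules can be sketched with a small made-up frame. The column names and values here are assumptions for illustration only. The view-versus-copy behaviour of getters is deliberately left out, since it depends on memory layout and on the pandas version:

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 1, 2], "B": [6.0, 4.0, 5.0]})

# Rule 1: an ordinary operation returns a new object and leaves df alone.
sorted_df = df.sort_values("A")
order_after_plain_sort = df["A"].tolist()   # still the original order

# Rule 2: with inplace=True the same operation modifies df and returns None.
result = df.sort_values("A", inplace=True)

# Rule 3: setting through an indexer writes into df itself.
df.loc[df["A"] == 1, "B"] = -1.0

print(order_after_plain_sort)
print(result)
print(df)
```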

Your example of chained indexing

df[df.C <= df.B].loc[:,'B':'E']

is not guaranteed to work (and thus you should never do this).

Instead do:

df.loc[df.C <= df.B, 'B':'E']

as this is faster and will always work.

The chained indexing is 2 separate Python operations and thus cannot be reliably intercepted by pandas (you will oftentimes get a SettingWithCopyWarning, but that is not 100% detectable either). The dev docs, which you pointed to, offer a much fuller explanation.
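To see why this matters when setting values, here is a rough sketch that compares the two forms on a made-up frame. The helper make_frame and its data are hypothetical. Warnings are silenced because the kind of warning pandas emits for the chained form differs between versions:

```python
import warnings

import pandas as pd


def make_frame():
    # Hypothetical data with columns A..E, matching the 'B':'E' slice above.
    return pd.DataFrame(
        {"A": [1, 2, 3], "B": [5, 1, 7], "C": [2, 4, 6], "D": [0, 0, 0], "E": [9, 9, 9]}
    )


# Chained form: the boolean mask builds a temporary frame first,
# so the assignment lands on that temporary and is lost.
chained = make_frame()
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    chained[chained.C <= chained.B].loc[:, "B":"E"] = -1

# Single .loc call: one operation, so the assignment reaches the frame itself.
direct = make_frame()
direct.loc[direct.C <= direct.B, "B":"E"] = -1

print(chained)  # unchanged
print(direct)   # rows where C <= B now hold -1 in columns B..E
```

Here the mask selects rows 0 and 2, so only the single .loc call changes them.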