pandas view vs. copy: the docs say "nobody knows"?

Tags: python, pandas

There are lots of questions on StackOverflow about chained indexing and whether a particular operation makes a view or a copy (for instance, here or here). I still don't fully get it, but the amazing part is that the official docs say "nobody knows". Here's an example from the docs; can you tell me whether they really meant that, or whether they're just being flippant?

From http://pandas-docs.github.io/pandas-docs-travis/indexing.html?highlight=view#why-does-assignment-fail-when-using-chained-indexing

def do_something(df):
   foo = df[['bar', 'baz']]  # Is foo a view? A copy? Nobody knows!
   # ... many lines here ...
   foo['quux'] = value       # We don't know whether this will modify df or not!
   return foo

Seriously? For that specific example, is it really true that "nobody knows" and that the result is non-deterministic? Will it really behave differently on two different dataframes? Are the rules really that complex? Or did the author mean that there is a definite answer, but most people aren't aware of it?

Asked by user2543623 on Aug 23 '16 at 12:08



1 Answer

I think I can demonstrate something that clarifies your situation. In your example, the selection starts out as a view, but once you modify it by adding a column it becomes a copy. You can test this by checking the private attribute ._is_view:

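In [29]:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5,3), columns=list('abc'))

def doSomething(df):
    a = df[['b','c']]              # select two columns of df
    print('before ', a._is_view)   # _is_view is a private attribute
    a['d'] = 0                     # add a new column to the selection
    print('after ', a._is_view)

doSomething(df)
df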

before  True
after  False
Out[29]:
          a         b         c
0  0.108790  0.580745  1.820328
1  1.066503 -0.238707 -0.655881
2 -1.320731  2.038194 -0.894984
3 -0.962753 -3.961181  0.109476
4 -1.887774  0.909539  1.318677

So here we can see that a is initially a view on the selected columns of the original df, but once you add a column to it this is no longer true, and the original df is not modified.
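If the goal is simply to stop guessing, one option is to take an explicit copy of the selection before modifying it. The following is a minimal sketch, not something from the docs example: the column names and the sample values are made up for illustration, and ._is_view is deliberately avoided because it is a private attribute whose result may differ between pandas versions.

```python
import numpy as np
import pandas as pd


def do_something(df):
    # Explicit copy: foo owns its own data, so nothing done to it
    # below can reach back into df.
    foo = df[['b', 'c']].copy()
    foo['d'] = 0  # adds a column to foo only
    return foo


# Illustrative data (assumed, not from the original post).
df = pd.DataFrame(np.arange(15.0).reshape(5, 3), columns=list('abc'))
result = do_something(df)

print(list(df.columns))      # ['a', 'b', 'c']
print(list(result.columns))  # ['b', 'c', 'd']
```

With the copy made explicitly, the function's effect on df no longer depends on whether the selection happened to be a view. If you do want to change the original, assign through df itself (for example with df.loc) rather than through an intermediate selection.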

Answered by EdChum on Sep 23 '22 at 15:09