There are lots of questions on StackOverflow about chained indexing and whether a particular operation makes a view or a copy (for instance, here or here). I still don't fully get it, but the amazing part is that the official docs say "nobody knows" (!?). Here's an example from the docs; can you tell me whether they really meant that, or whether they're just being flippant?
From http://pandas-docs.github.io/pandas-docs-travis/indexing.html?highlight=view#why-does-assignment-fail-when-using-chained-indexing
def do_something(df):
    foo = df[['bar', 'baz']]  # Is foo a view? A copy? Nobody knows!
    # ... many lines here ...
    foo['quux'] = value       # We don't know whether this will modify df or not!
    return foo
Seriously? For that specific example, is it really true that "nobody knows" and that the result is non-deterministic? Will it really behave differently on two different dataframes? Are the rules really that complex? Or did the author mean that there is a definite answer, but most people aren't aware of it?
Pandas DataFrame copy() Method The copy() method returns a copy of the DataFrame. By default, the copy is a "deep copy" meaning that any changes made in the original DataFrame will NOT be reflected in the copy.
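As a minimal sketch of what the deep-copy default means in practice (the frame and the name `snapshot` are made up for illustration and are not from the original post):

```python
import pandas as pd

# Hypothetical data, reusing the column names from the question.
df = pd.DataFrame({"bar": [1, 2, 3], "baz": [4, 5, 6]})

# copy() defaults to deep=True, so snapshot gets its own data.
snapshot = df.copy()

# Modify the original after the copy was taken.
df.loc[0, "bar"] = 100

print(df.loc[0, "bar"])        # value in the modified original
print(snapshot.loc[0, "bar"])  # value in the copy, taken before the change
```

The second print shows that the copy kept the value it had when it was taken.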
I think I can demonstrate something to clarify your situation. In your example, the result is initially a view, but once you try to modify it by adding a column it turns into a copy. You can test this by looking at the (private) attribute ._is_view:
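In [29]:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5,3), columns=list('abc'))

def doSomething(df):
    a = df[['b','c']]                # select two columns of df
    print('before ', a._is_view)     # _is_view is a private attribute
    a['d'] = 0                       # add a new column to the selection
    print('after ', a._is_view)

doSomething(df)
df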
before True
after False
Out[29]:
          a         b         c
0  0.108790  0.580745  1.820328
1  1.066503 -0.238707 -0.655881
2 -1.320731  2.038194 -0.894984
3 -0.962753 -3.961181  0.109476
4 -1.887774  0.909539  1.318677
So here we can see that initially a is a view on a subsection of the original df, but once you add a column to it this is no longer true, and we can see that the original df is not modified.
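If the goal is simply to avoid the question, one common approach is to take an explicit copy, so the result is independent of df by construction. The sketch below assumes that is what the caller wants. The function name `do_something_explicit` is hypothetical, and the check uses only public attributes rather than `_is_view`:

```python
import numpy as np
import pandas as pd

def do_something_explicit(df):
    # Explicit copy: foo owns its data regardless of how the
    # column selection itself is implemented internally.
    foo = df[["b", "c"]].copy()
    foo["d"] = 0  # adds a column to foo only
    return foo

df = pd.DataFrame(np.random.randn(5, 3), columns=list("abc"))
result = do_something_explicit(df)

print(list(df.columns))      # columns of the original
print(list(result.columns))  # columns of the returned frame
```

Because of the explicit `.copy()`, the new column 'd' appears only on the returned frame and df keeps its original three columns.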