In[216]: foo = pd.DataFrame({'a':[1,2,3], 'b':[3,4,5]})
In[217]: bar = foo.ix[:1]
In[218]: bar
Out[218]:
a b
0 1 3
1 2 4
A view is created as expected.
In[219]: bar['a'] = 100
In[220]: bar
Out[220]:
a b
0 100 3
1 100 4
In[221]: foo
Out[221]:
a b
0 100 3
1 100 4
2 3 5
If the view is modified, so is the original DataFrame foo. However, if the assignment is done with None, a copy seems to be made. Can anyone shed some light on what is happening, and on the logic behind it?
In[222]: bar['a'] = None
In[223]: bar
Out[223]:
a b
0 None 3
1 None 4
In[224]: foo
Out[224]:
a b
0 100 3
1 100 4
2 3 5
When you assign bar['a'] = None, you're forcing the column to change its dtype from an integer type (e.g., int64) to object. Doing so forces pandas to allocate a new array of object for the column, and then of course it writes to that new array instead of to the old array that's shared with the original DataFrame.
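The same effect can be reproduced at the NumPy level. This is only a minimal sketch of the idea, not what pandas does internally; the names original, view and converted are made up for illustration.

```python
import numpy as np

original = np.array([1, 2, 3], dtype=np.int64)
view = original[:2]                  # basic slice: shares memory with `original`

view[:] = 100                        # same dtype, so the write lands in the shared buffer
print(original)                      # first two elements are now 100

# An int64 buffer cannot hold None, so storing it needs a new object array.
converted = view.astype(object)      # astype allocates a fresh array
converted[:] = None                  # this write only touches the new array

print(np.shares_memory(original, view))       # True
print(np.shares_memory(original, converted))  # False
print(original)                      # unchanged by the None assignment
```

Once the dtype has to change, the data no longer lives in the buffer shared with the original, which is why foo keeps its old values.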
You are doing a form of chained assignment, which is a really bad idea.
Pandas will generally warn you that you are modifying a view (even more so in 0.15.0).
In [49]: foo = pd.DataFrame({'a':[1,2,3], 'b':[3,4,5]})
In [51]: foo
Out[51]:
a b
0 1 3
1 2 4
2 3 5
In [52]: bar = foo.ix[:1]
In [53]: bar
Out[53]:
a b
0 1 3
1 2 4
In [54]: bar.dtypes
Out[54]:
a int64
b int64
dtype: object
# this is an internal method (but is for illustration)
In [56]: bar._is_view
Out[56]: True
# this will warn in 0.15.0
In [57]: bar['a'] = 100
/usr/local/bin/ipython:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
#!/usr/local/bin/python
In [58]: bar._is_view
Out[58]: True
# bar is now a copied object (and will replace the existing dtypes with new ones).
In [59]: bar['a'] = None
In [60]: bar.dtypes
Out[60]:
a object
b int64
dtype: object
You should never rely on whether something is a view (even in numpy), except in certain very performance-critical situations. It is not a guaranteed construct; it depends on the memory layout of the underlying data.
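As a small NumPy illustration of why this is fragile (a sketch only; the variable names are arbitrary): two indexing expressions that select the same elements can differ in whether they return a view.

```python
import numpy as np

arr = np.arange(6)

sliced = arr[:3]           # basic slicing
picked = arr[[0, 1, 2]]    # integer-array ("fancy") indexing, same elements

slice_is_view = np.shares_memory(arr, sliced)    # True: a view on arr's buffer
picked_is_view = np.shares_memory(arr, picked)   # False: a freshly allocated copy

print(slice_is_view, picked_is_view)
```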
You should very rarely try to set data so that it propagates through a view, and doing this in pandas is almost always going to cause trouble when you have mixed dtypes. (In numpy you can only have a view on a single dtype; I am not even sure what a view on a multi-dtyped array which changes the dtype does, or if it's even allowed.)
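A sketch of how to make the intent explicit instead, assuming a reasonably recent pandas (it uses .loc rather than .ix, which was removed in later releases): take an explicit copy when you want an independent subset, and assign through a single .loc call on the original when you want to change it.

```python
import pandas as pd

foo = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 4, 5]})

# Independent subset: copy explicitly, then modify freely.
bar = foo.loc[:1].copy()
bar['a'] = 100
foo_a_after_bar_edit = foo['a'].tolist()   # foo is untouched: [1, 2, 3]

# Changing the original: one .loc call on foo itself, no intermediate object.
foo.loc[:1, 'a'] = 100
foo_a_after_loc = foo['a'].tolist()        # [100, 100, 3]

print(foo_a_after_bar_edit, foo_a_after_loc)
```

Either way the result does not depend on whether the intermediate object happened to be a view.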