 

Why does a copy get created when assigned with None?

In[216]: foo = pd.DataFrame({'a':[1,2,3], 'b':[3,4,5]})
In[217]: bar = foo.ix[:1]
In[218]: bar
Out[218]: 
   a  b
0  1  3
1  2  4

A view is created as expected.

In[219]: bar['a'] = 100
In[220]: bar
Out[220]: 
     a  b
0  100  3
1  100  4
In[221]: foo
Out[221]: 
     a  b
0  100  3
1  100  4
2    3  5

If the view is modified, so is the original DataFrame foo. However, if the assignment is done with None, a copy seems to be made instead. Can anyone shed some light on what is happening, and maybe on the logic behind it?

In[222]: bar['a'] = None
In[223]: bar
Out[223]: 
      a  b
0  None  3
1  None  4
In[224]: foo
Out[224]: 
     a  b
0  100  3
1  100  4
2    3  5
Anthony asked Sep 04 '14 17:09


2 Answers

When you assign bar['a'] = None, you're forcing the column to change its dtype from an integer dtype (e.g., int64) to object, because an integer array cannot hold None.

Doing so forces it to allocate a new array of object for the column, and then of course it writes to that new array instead of to the old array that's shared with the original DataFrame.
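The same effect can be sketched at the NumPy level. This is only an illustration, assuming a plain int64 array as a stand-in for the storage behind column 'a'; the variable names are hypothetical and pandas' internals are more involved than this.

```python
import numpy as np

# Stand-in for the int64 data behind column 'a' (an assumption for illustration).
original = np.array([1, 2, 3], dtype=np.int64)

# Basic slicing returns a view: no new buffer is allocated.
view = original[:2]
shares_before = np.shares_memory(original, view)

# An integer array cannot store None, so the values must first be converted
# to dtype=object, and astype allocates a brand-new array for that.
as_object = view.astype(object)
as_object[:] = None
shares_after = np.shares_memory(original, as_object)

print(shares_before, shares_after)
print(original)  # the original buffer was never written to
```

Once the dtype has to change, the write lands in the new array, so nothing can propagate back to the original data.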

abarnert answered Sep 19 '22 11:09


You are doing a form of chained assignment; the pandas documentation on returning a view versus a copy (linked in the warning below) explains why this is a really bad idea.

See this question as well here

Pandas will generally warn you that you are modifying a view (even more so in 0.15.0).

In [49]: foo = pd.DataFrame({'a':[1,2,3], 'b':[3,4,5]})

In [51]: foo
Out[51]: 
   a  b
0  1  3
1  2  4
2  3  5

In [52]: bar = foo.ix[:1]

In [53]: bar
Out[53]: 
   a  b
0  1  3
1  2  4

In [54]: bar.dtypes
Out[54]: 
a    int64
b    int64
dtype: object

# _is_view is an internal attribute (shown here only for illustration)
In [56]: bar._is_view
Out[56]: True

# this will warn in 0.15.0
In [57]: bar['a'] = 100
/usr/local/bin/ipython:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  #!/usr/local/bin/python

In [58]: bar._is_view
Out[58]: True

# after this assignment bar is a copied object (column 'a' is replaced by a new object-dtype column)
In [59]: bar['a'] = None

In [60]: bar.dtypes
Out[60]: 
a    object
b     int64
dtype: object

You should never rely on whether something is a view (even in numpy), except in certain very performance-critical situations. A view is not guaranteed; whether you get one depends on the memory layout of the underlying data.
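A small NumPy sketch of that point, using a made-up 2x3 array: the same operation can return a view or a copy depending on layout and on how the data is indexed.

```python
import numpy as np

grid = np.arange(6).reshape(2, 3)

# Flattening a C-contiguous array can reuse the existing buffer (a view).
flat_of_grid = grid.reshape(-1)

# Flattening the transpose cannot, so NumPy silently returns a copy.
flat_of_transpose = grid.T.reshape(-1)

# Basic slicing gives a view; integer-list ("fancy") indexing gives a copy.
sliced = grid[:1]
fancy = grid[[0]]

for name, arr in [("reshape", flat_of_grid), ("T.reshape", flat_of_transpose),
                  ("slice", sliced), ("fancy", fancy)]:
    print(name, np.shares_memory(grid, arr))
```

np.shares_memory is a handy way to check after the fact, but code that only works when the answer is True is fragile.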

You should very, very rarely try to set data so that it propagates through a view, and doing this in pandas is almost always going to cause trouble when you have mixed dtypes. (In numpy you can only have a view on a single dtype; I am not even sure what a view on a multi-dtyped array that changes the dtype does, or if it's even allowed.)
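A minimal sketch of the two unambiguous alternatives, reusing the question's foo and bar names: write to the original in a single .loc call when you want the original changed, or take an explicit .copy() when you want an independent subset. It uses .loc rather than .ix, on the assumption of a recent pandas where .ix is no longer available.

```python
import pandas as pd

foo = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 4, 5]})

# Intent: modify the original. One .loc call, no intermediate object.
# Label slices are inclusive, so :1 selects the rows labelled 0 and 1.
foo.loc[:1, 'a'] = 100

# Intent: work on an independent subset. Make the copy explicit.
bar = foo.loc[:1].copy()
bar['a'] = None  # changes bar only, whatever dtype the column ends up with

print(foo)
print(bar)
```

Either way the result no longer depends on whether pandas happened to hand back a view.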

Jeff answered Sep 21 '22 11:09