In pandas, does the .iloc method return a copy or a view?

The result seems a little random: sometimes it's a copy, sometimes it's a view. For example:

    import pandas as pd

    df = pd.DataFrame([{'name': 'Marry', 'age': 21}, {'name': 'John', 'age': 24}],
                      index=['student1', 'student2'])
    df

              age   name
    student1   21  Marry
    student2   24   John

Now let me try to modify it a little.

    df2 = df.loc['student1']
    df2[0] = 23
    df

              age   name
    student1   21  Marry
    student2   24   John

As you can see, nothing changed. df2 is a copy. However, if I add another student into the dataframe...

    df.loc['student3'] = ['old', 'Tom']
    df

              age   name
    student1   21  Marry
    student2   24   John
    student3  old    Tom

Now I try to change the age again:

    df3 = df.loc['student1']
    df3[0] = 33
    df

              age   name
    student1   33  Marry
    student2   24   John
    student3  old    Tom

Now df3 suddenly became a view. What is going on? I guess the value 'old' is the key?

1 Answer

You are starting with a DataFrame that has two columns with two different dtypes:

    df.dtypes
    Out:
    age      int64
    name    object
    dtype: object

Since different dtypes are stored in different numpy arrays under the hood, you have two different blocks for them:

    df.blocks
    Out:
    {'int64':           age
     student1   21
     student2   24, 'object':            name
     student1  Marry
     student2   John}

If you select the first row of this DataFrame, pandas has to take one value from each block, which makes it necessary to create a copy.

    df2.is_copy
    Out[40]: <weakref at 0x7fc4487a9228; to 'DataFrame' at 0x7fc4488f9dd8>
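The need for a copy in the mixed-dtype case can be illustrated with plain NumPy. This is only a rough sketch of the idea, not pandas' actual internals: the arrays `ages` and `names` are hypothetical stand-ins for the int64 and object blocks, and the row is assembled by hand.

```python
import numpy as np

# Hypothetical stand-ins for the two blocks of the mixed-dtype DataFrame.
ages = np.array([21, 24], dtype=np.int64)
names = np.array(['Marry', 'John'], dtype=object)

# A row spans both arrays, so its values must be gathered into a new array.
row = np.array([ages[0], names[0]], dtype=object)

# The new array owns its memory, so writing to it leaves the sources alone.
row[0] = 23

shares_with_ages = np.shares_memory(row, ages)
shares_with_names = np.shares_memory(row, names)
print(shares_with_ages, shares_with_names)
print(ages[0])
```

No single existing buffer contains both values of the row, so there is nothing the row could be a view of.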

In the second attempt, you are changing the dtypes. Since 'old' cannot be stored in an integer array, the age column is cast to object dtype.

    df.loc['student3'] = ['old', 'Tom']
    df.dtypes
    Out:
    age     object
    name    object
    dtype: object
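The reason for the upcast can be seen at the NumPy level. The snippet below is a minimal sketch using a standalone array named `ages` (an illustrative name, not something pandas exposes): an int64 array rejects a string, while an object array accepts any Python value.

```python
import numpy as np

ages = np.array([21, 24], dtype=np.int64)

# An int64 array cannot hold the string 'old'.
try:
    ages[0] = 'old'
    stored = True
except ValueError:
    stored = False

# An object array stores references to arbitrary Python objects,
# so integers and strings can sit side by side.
mixed = np.array([21, 24, 'old'], dtype=object)
print(stored, mixed.dtype)
```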

Now all data for this DataFrame is stored in a single block (and in a single numpy array):

    df.blocks
    Out:
    {'object':           age   name
     student1   21  Marry
     student2   24   John
     student3  old    Tom}

At this step, slicing the first row can be done on the numpy array without creating a copy, so it returns a view.

    df3._is_view
    Out: True
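The single-block case can be sketched the same way with NumPy. The array `block` below is a hypothetical, simplified stand-in for the one object block (the real internal layout may differ), but it shows why a row of a single array can be handed out as a view.

```python
import numpy as np

# Hypothetical stand-in for the single object block: one 2-D array
# holding every value of the DataFrame.
block = np.array([[21, 'Marry'], [24, 'John'], ['old', 'Tom']], dtype=object)

# Basic indexing of one row returns a view onto the same memory.
row = block[0]
row[0] = 33

row_is_view = np.shares_memory(row, block)
print(row_is_view)
print(block[0, 0])
```

Because `row` and `block` share memory, the write through `row` shows up in `block`, which mirrors what happened to `df` after `df3[0] = 33`.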
