
In Pandas, does the .iloc method return a copy or a view?

I find the result a little bit random: sometimes it's a copy, sometimes it's a view. For example:

    df = pd.DataFrame([{'name':'Marry', 'age':21},{'name':'John','age':24}],index=['student1','student2'])
    df
              age   name
    student1   21  Marry
    student2   24   John

Now, let me try to modify it a little bit.

    df2 = df.loc['student1']
    df2[0] = 23
    df
              age   name
    student1   21  Marry
    student2   24   John

As you can see, nothing changed. df2 is a copy. However, if I add another student into the dataframe...

    df.loc['student3'] = ['old','Tom']
    df
              age   name
    student1   21  Marry
    student2   24   John
    student3  old    Tom

Try to change the age again:

    df3 = df.loc['student1']
    df3[0] = 33
    df
              age   name
    student1   33  Marry
    student2   24   John
    student3  old    Tom

Now df3 suddenly became a view. What is going on? I guess the value 'old' is the key?

Qiyu asked Dec 25 '17 23:12




1 Answer

You are starting with a DataFrame that has two columns with two different dtypes:

    df.dtypes
    Out:
    age      int64
    name    object
    dtype: object

Since different dtypes are stored in different numpy arrays under the hood, you have two different blocks for them:

    df.blocks
    Out:
    {'int64':           age
     student1   21
     student2   24,
     'object':            name
     student1  Marry
     student2   John}

If you attempt to slice the first row of this DataFrame, it has to get one value from each of the two blocks, which makes it necessary to create a copy.

    df2.is_copy
    Out[40]: <weakref at 0x7fc4487a9228; to 'DataFrame' at 0x7fc4488f9dd8>
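The mixed-dtype case can be reproduced with a short sketch. It rebuilds the two-student frame from a dict of lists (a constructor chosen here for brevity, not the one used in the question) and writes to the row by label rather than by position. The warning filter is only there because some pandas versions emit a SettingWithCopyWarning on this kind of assignment.

```python
import warnings

import pandas as pd

# Same data as in the question: one integer column and one string column.
df = pd.DataFrame(
    {"age": [21, 24], "name": ["Marry", "John"]},
    index=["student1", "student2"],
)

# The row needs one value from each column, so pandas builds a new Series.
row = df.loc["student1"]

with warnings.catch_warnings():
    # Some pandas versions warn when writing to an object derived from a frame.
    warnings.simplefilter("ignore")
    row["age"] = 23

print(row["age"])                 # the Series was modified
print(df.loc["student1", "age"])  # the DataFrame still holds the old value
```

With mixed dtypes the write lands only on the extracted Series, which matches the first experiment in the question.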

In the second attempt, you are changing the dtypes. Since 'old' cannot be stored in an integer array, pandas casts the age column to an object Series.

    df.loc['student3'] = ['old','Tom']
    df.dtypes
    Out:
    age     object
    name    object
    dtype: object

Now all data for this DataFrame is stored in a single block (and in a single numpy array):

    df.blocks
    Out:
    {'object':           age   name
     student1   21  Marry
     student2   24   John
     student3  old    Tom}

At this step, slicing the first row can be done on the numpy array without creating a copy, so it returns a view.

    df3._is_view
    Out: True
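Because the view-or-copy outcome depends on how the columns happen to be stored, one way to avoid relying on it is to state the intent explicitly. The sketch below is a suggestion rather than part of the original experiment: it assumes the same two-student frame, takes an independent snapshot with `.copy()`, and performs the intended in-place update through `df.loc` with both a row and a column label. The name `snapshot` is illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [21, 24], "name": ["Marry", "John"]},
    index=["student1", "student2"],
)

# Independent snapshot: changes to it are not meant to reach df.
snapshot = df.loc["student1"].copy()
snapshot["age"] = 23

# Intentional update of the original: write through df itself.
df.loc["student1", "age"] = 33

print(snapshot["age"])
print(df.loc["student1", "age"])
```

Written this way, the result does not depend on whether the row selection happened to be a view or a copy.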

ayhan answered Sep 22 '22 16:09