Understanding pandas dataframe indexing

Tags: python, pandas, dataframe

Summary: This doesn't work:

df[df.key==1]['D'] = 1

but this does:

df.D[df.key==1] = 1

Why?

Reproduction:

In [1]: import pandas as pd

In [2]: from numpy.random import randn

In [4]: df = pd.DataFrame(randn(6,3),columns=list('ABC'))

In [5]: df
Out[5]: 
          A         B         C
0  1.438161 -0.210454 -1.983704
1 -0.283780 -0.371773  0.017580
2  0.552564 -0.610548  0.257276
3  1.931332  0.649179 -1.349062
4  1.656010 -1.373263  1.333079
5  0.944862 -0.657849  1.526811

In [6]: df['D']=0.0

In [7]: df['key']=3*[1]+3*[2]

In [8]: df
Out[8]: 
          A         B         C  D  key
0  1.438161 -0.210454 -1.983704  0    1
1 -0.283780 -0.371773  0.017580  0    1
2  0.552564 -0.610548  0.257276  0    1
3  1.931332  0.649179 -1.349062  0    2
4  1.656010 -1.373263  1.333079  0    2
5  0.944862 -0.657849  1.526811  0    2

This doesn't work:

In [9]: df[df.key==1]['D'] = 1

In [10]: df
Out[10]: 
          A         B         C  D  key
0  1.438161 -0.210454 -1.983704  0    1
1 -0.283780 -0.371773  0.017580  0    1
2  0.552564 -0.610548  0.257276  0    1
3  1.931332  0.649179 -1.349062  0    2
4  1.656010 -1.373263  1.333079  0    2
5  0.944862 -0.657849  1.526811  0    2

but this does:

In [11]: df.D[df.key==1] = 3.4

In [12]: df
Out[12]: 
          A         B         C    D  key
0  1.438161 -0.210454 -1.983704  3.4    1
1 -0.283780 -0.371773  0.017580  3.4    1
2  0.552564 -0.610548  0.257276  3.4    1
3  1.931332  0.649179 -1.349062  0.0    2
4  1.656010 -1.373263  1.333079  0.0    2
5  0.944862 -0.657849  1.526811  0.0    2

My question is:

Why does only the second way work? I can't see a difference in the selection/indexing logic.

Version is 0.10.0

Edit: This should not be done like this anymore. Since version 0.11 there is .loc; see http://pandas.pydata.org/pandas-docs/stable/indexing.html
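For reference, here is a minimal sketch of what the .loc form mentioned in the edit could look like for this example. The random columns are replaced by a single fixed column A so the result is reproducible; that substitution is an assumption made for illustration only.

```python
import pandas as pd

# Same layout as the reproduction, with fixed values instead of randn
# (assumption: the actual numbers in A do not matter for the point made).
df = pd.DataFrame({"A": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]})
df["D"] = 0.0
df["key"] = 3 * [1] + 3 * [2]

# The row mask and the column label go into one .loc call, so the
# assignment is a single __setitem__ on df itself and no intermediate
# object sits between df and the assigned value.
df.loc[df.key == 1, "D"] = 3.4

print(df["D"].tolist())  # [3.4, 3.4, 3.4, 0.0, 0.0, 0.0]
```

Because rows and column are selected in one step, there is no question of whether an intermediate result was a view or a copy.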

asked Jan 07 '13 09:01

K.-Michael Aye

1 Answer

The pandas documentation says:

Returning a view versus a copy

The rules about when a view on the data is returned are entirely dependent on NumPy. Whenever an array of labels or a boolean vector are involved in the indexing operation, the result will be a copy. With single label / scalar indexing and slicing, e.g. df.ix[3:6] or df.ix[:, 'A'], a view will be returned.
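Since the quoted passage defers to NumPy, the rule can be checked on a plain array. This is only a sketch at the NumPy level; the names a, by_mask and by_slice are made up for the illustration, and it says nothing about how a particular pandas version stores its columns.

```python
import numpy as np

a = np.arange(6.0)          # [0., 1., 2., 3., 4., 5.]
mask = a < 3

by_mask = a[mask]           # boolean indexing: NumPy allocates a new array
by_slice = a[0:3]           # basic slicing: a view on the same buffer

print(np.shares_memory(a, by_mask))   # False
print(np.shares_memory(a, by_slice))  # True

by_mask[:] = -1.0           # only the copy changes
by_slice[:] = 9.0           # writes through to a

print(a.tolist())           # [9.0, 9.0, 9.0, 3.0, 4.0, 5.0]
```

Writing into by_mask leaves a alone, while writing into by_slice changes it, which is the same asymmetry the question runs into.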

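In df[df.key==1]['D'] = 1 you first do boolean indexing (which returns a copy of the DataFrame), then select the column ['D'] on that copy. The assignment therefore changes the temporary copy, and df itself stays as it was.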

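In df.D[df.key==1] = 3.4, you first select a column, then do boolean indexing on the resulting Series. By the rule quoted above, selecting a single column gives a view, so the assignment reaches the data in df.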

This seems to make the difference, although I must admit that it is a little counterintuitive.

Edit: The difference was identified by Dougal, see his comment: In the first version, the copy is made when __getitem__ is called for the boolean indexing, and the assignment then goes to that copy. In the second version, the boolean mask is passed only to __setitem__, which assigns in place instead of returning a copy.
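To make the first case concrete, here is a rough sketch that splits the chained statement into the two calls Python actually makes. The name tmp is hypothetical and stands for the unnamed intermediate object; the warnings filter is only there because many pandas versions emit a SettingWithCopyWarning on this pattern.

```python
import warnings

import pandas as pd

df = pd.DataFrame({"D": [0.0] * 6, "key": 3 * [1] + 3 * [2]})
mask = df.key == 1

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence SettingWithCopyWarning, if any
    # df[mask]['D'] = 1 is executed as these two steps:
    tmp = df[mask]   # step 1: __getitem__ with a boolean mask, a new DataFrame
    tmp["D"] = 1     # step 2: __setitem__ on that new object, not on df

print(tmp["D"].tolist())  # [1, 1, 1]
print(df["D"].tolist())   # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

In the original one-liner tmp has no name, so the modified copy is discarded as soon as the statement finishes and the assignment appears to do nothing.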

answered Oct 15 '22 07:10

Thorsten Kranz
