Start:stop slicing inconsistencies between numpy and Pandas?

Tags:

python

pandas

numpy

I am a bit surprised/confused by the following difference between numpy and Pandas:

import numpy as np
import pandas as pd
a = np.random.randn(10,10)

> a[:3, 0, np.newaxis]

array([[-1.91687144],
       [-0.6399471 ],
       [-0.10005721]])

However:

b = pd.DataFrame(a)

> b.ix[:3,0]

0   -1.916871
1   -0.639947
2   -0.100057
3    0.251988

In other words, numpy does not include the stop index in start:stop notation, but Pandas does. I thought Pandas was based on Numpy. Is this a bug? Intentional?

351

asked Feb 28 '13 01:02

Amelio Vazquez-Reina

3 Answers

This is documented, and it's part of Advanced Indexing. The key here is that you're not using a stop index at all.

The ix attribute is a special thing that lets you do various kinds of advanced indexing by label—choosing a list of labels, using an inclusive range of labels instead of a half-exclusive range of indices, and various other things.

If you don't want that, just don't use it:

In [191]: b[:3][0]
Out[191]: 
0   -0.209386
1    0.050345
2    0.318414
Name: 0
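For anyone reading this on a recent pandas: as far as I know, ix was deprecated in 0.20 and removed in 1.0, so the snippet from the question no longer runs there. Here is a minimal sketch of the same comparison using loc (label-based, stop included) and iloc (position-based, stop excluded). The fixed seed and the variable names are my own assumptions, not part of the question.

```python
import numpy as np
import pandas as pd

# Fixed seed only so the example is reproducible (an assumption, not in the question).
rng = np.random.default_rng(0)
a = rng.standard_normal((10, 10))
b = pd.DataFrame(a)

numpy_slice = a[:3, 0]        # positions 0, 1, 2: stop excluded
by_label = b.loc[:3, 0]       # labels 0, 1, 2, 3: stop included
by_position = b.iloc[:3, 0]   # positions 0, 1, 2: stop excluded
plain_slice = b[:3][0]        # integer row slice on the frame is positional here

print(len(numpy_slice), len(by_label), len(by_position), len(plain_slice))
```

Only the loc version returns four rows, which matches what ix did in the question because the default index is made of integer labels.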

If you play with this a bit more without reading the docs, you'll probably come up with a case where your labels are, say, 'A', 'B', 'C', 'D' instead of 0, 1, 2, 3, and suddenly b.ix[:3] will return only 3 rows instead of 4, and you'll be baffled all over again.

The difference is that in that case, b.ix[:3] is a slice on indices, not on labels.

What you've requested in your code is actually ambiguous between "all labels up to and including 3" and "all indices up to but not including 3", and labels always win with ix (because if you don't want label slicing, you don't have to use ix in the first place). And that's why I said the problem is that you're not using a stop index at all.
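To make the string-label case concrete, here is a small sketch that spells out the two interpretations explicitly with iloc and loc instead of relying on ix to choose between them. The frame, its column names, and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with string row labels instead of 0, 1, 2, 3.
df = pd.DataFrame(
    np.arange(8).reshape(4, 2),
    index=list("ABCD"),
    columns=["x", "y"],
)

first_three = df.iloc[:3]   # positions 0, 1, 2: stop excluded
up_to_c = df.loc[:"C"]      # labels 'A' through 'C': stop included

print(list(first_three.index))
print(list(up_to_c.index))
```

Both selections contain rows A, B and C, but for different reasons: the first stops before position 3, the second stops at the label 'C' and includes it.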

161

answered Sep 30 '22 18:09

abarnert

When the index type is integer, DataFrame.ix uses label-based indexing only. According to the documentation, a label-based slice includes both start and stop.

http://pandas.pydata.org/pandas-docs/dev/indexing.html#advanced-indexing-with-labels

Slicing with labels is semantically slightly different because the slice start and stop are inclusive in the label-based case.

Label-based indexing with integer axis labels is a thorny topic. It has been discussed heavily on mailing lists and among various members of the scientific Python community. In pandas, our general viewpoint is that labels matter more than integer locations. Therefore, with an integer axis index only label-based indexing is possible with the standard tools like .ix. The following code will generate exceptions

answered Sep 30 '22 18:09

HYRY

From (docs):

Slicing has standard Python semantics for integer slices

...

Slicing with labels is semantically slightly different because the slice start and stop are inclusive in the label-based case.
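The difference is easiest to see when the integer labels do not coincide with the positions. A minimal sketch, assuming a Series whose index is 10, 20, 30, 40 (values and names are made up for illustration):

```python
import pandas as pd

values = [1.0, 2.0, 3.0, 4.0]
s = pd.Series(values, index=[10, 20, 30, 40])

by_label = s.loc[10:30]     # labels 10, 20, 30: start and stop both included
by_position = s.iloc[0:2]   # positions 0, 1: standard Python semantics
as_list = values[0:2]       # plain Python list slice, for comparison

print(list(by_label.index))
print(list(by_position.index))
print(as_list)
```

The label slice returns three elements because 30 is a label and is included; the positional slice returns two, the same as the plain list slice.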

answered Sep 30 '22 17:09

wim
