
slicing pandas DataFrame with negative index with ix() method

DataFrame.ix does not slice the DataFrame the way I expect when I use negative indexing.

I have a DataFrame object and want to slice the last 2 rows.

    In [90]: df = pd.DataFrame(np.random.randn(10, 4))

    In [91]: df
    Out[91]: 
            0         1         2         3
    0  1.985922  0.664665 -2.800102  1.695480
    1  0.580509  0.782473  1.032970  1.559917
    2  0.584387  1.798743  0.095950  0.071999
    3  1.956221  0.075530 -0.391008  1.692585
    4 -0.644979 -1.959265  0.749394 -0.437995
    5 -1.204964  0.653912 -1.426602  2.409855
    6  1.178886  2.177259 -0.165106  1.145952
    7  1.410595 -0.761426 -1.280866  0.609122
    8  0.110534 -0.234781 -0.819976  0.252080
    9  1.798894  0.553394 -1.358335  1.278704

One way to do it:

    In [92]: df[-2:]
    Out[92]: 
              0         1         2         3
    8  0.110534 -0.234781 -0.819976  0.252080
    9  1.798894  0.553394 -1.358335  1.278704

Another way to do it:

    In [93]: df.ix[len(df)-2:, :]
    Out[93]: 
              0         1         2         3
    8  0.110534 -0.234781 -0.819976  0.252080
    9  1.798894  0.553394 -1.358335  1.278704

Now I want to use negative indexing, but I run into a problem:

    In [94]: df.ix[-2:, :]
    Out[94]: 
              0         1         2         3
    0  1.985922  0.664665 -2.800102  1.695480
    1  0.580509  0.782473  1.032970  1.559917
    2  0.584387  1.798743  0.095950  0.071999
    3  1.956221  0.075530 -0.391008  1.692585
    4 -0.644979 -1.959265  0.749394 -0.437995
    5 -1.204964  0.653912 -1.426602  2.409855
    6  1.178886  2.177259 -0.165106  1.145952
    7  1.410595 -0.761426 -1.280866  0.609122
    8  0.110534 -0.234781 -0.819976  0.252080
    9  1.798894  0.553394 -1.358335  1.278704

How do I use negative indexing with DataFrame.ix() correctly? Thanks.

Julia He asked Dec 26 '12 03:12


2 Answers

This is a bug:

    In [1]: df = pd.DataFrame(np.random.randn(10, 4))

    In [2]: df
    Out[2]: 
              0         1         2         3
    0 -3.100926 -0.580586 -1.216032  0.425951
    1 -0.264271 -1.091915 -0.602675  0.099971
    2 -0.846290  1.363663 -0.382874  0.065783
    3 -0.099879 -0.679027 -0.708940  0.138728
    4 -0.302597  0.753350 -0.112674 -1.253316
    5 -0.213237 -0.467802  0.037350  0.369167
    6  0.754915 -0.569134 -0.297824 -0.600527
    7  0.644742  0.038862  0.216869  0.294149
    8  0.101684  0.784329  0.218221  0.965897
    9 -1.482837 -1.325625  1.008795 -0.150439

    In [3]: df.ix[-2:]
    Out[3]: 
              0         1         2         3
    0 -3.100926 -0.580586 -1.216032  0.425951
    1 -0.264271 -1.091915 -0.602675  0.099971
    2 -0.846290  1.363663 -0.382874  0.065783
    3 -0.099879 -0.679027 -0.708940  0.138728
    4 -0.302597  0.753350 -0.112674 -1.253316
    5 -0.213237 -0.467802  0.037350  0.369167
    6  0.754915 -0.569134 -0.297824 -0.600527
    7  0.644742  0.038862  0.216869  0.294149
    8  0.101684  0.784329  0.218221  0.965897
    9 -1.482837 -1.325625  1.008795 -0.150439

https://github.com/pydata/pandas/issues/2600

Note that df[-2:] will work:

    In [4]: df[-2:]
    Out[4]: 
              0         1         2         3
    8  0.101684  0.784329  0.218221  0.965897
    9 -1.482837 -1.325625  1.008795 -0.150439

Wes McKinney answered Sep 28 '22 04:09


The main purpose of ix is to allow NumPy-like indexing with support for row and column labels, so I'm not sure your use case is what it was intended for. Here are a couple of workarounds I can think of, both fairly trivial:

    In [142]: df.ix[:][-2:]
    Out[142]:
              0         1         2         3
    8  0.386882 -0.836112 -0.108250 -0.433797
    9  0.642468 -0.399255 -0.911456 -0.497720

    In [161]: df.ix[df.index[-2:],:]
    Out[161]:
              0         1         2         3
    8  0.386882 -0.836112 -0.108250 -0.433797
    9  0.642468 -0.399255 -0.911456 -0.497720
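
The second workaround selects rows by label, so the same idea carries over to `.loc`, which stands in for `.ix` here because `.ix` was removed in pandas 1.0. This is only a minimal sketch: the frame, its index labels `10, 20, 30, 40`, and the variable names are illustrative assumptions, not part of the original session.

```python
import numpy as np
import pandas as pd

# Illustrative frame; the index labels are deliberately not 0..n-1.
df = pd.DataFrame(np.arange(12).reshape(4, 3), index=[10, 20, 30, 40])

# Slicing the Index object itself is positional, so this yields the
# last two labels whatever their values are.
last_labels = df.index[-2:]

# Passing those labels to .loc selects the matching rows by label.
last_two = df.loc[last_labels, :]
print(last_two)
```

Because the negative slice is applied to `df.index` rather than to the indexer, it does not matter whether the labels happen to be integers.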

I don't think ix supports negative indexing at all. It seems to just ignore it altogether:

    In [181]: df.ix[-100:,:]
    Out[181]:
              0         1         2         3
    0 -1.144137 -1.042034 -2.158838  0.674055
    1 -0.424184  1.237318 -1.846130  0.575357
    2 -0.844974 -0.541060  2.197364 -0.031898
    3  0.846263  1.244450 -1.570566 -0.477919
    4 -0.193445  0.171045 -0.235587 -1.185583
    5  1.361539 -1.107389 -1.321081 -0.776407
    6  0.505907 -1.364414 -2.093770  0.144016
    7 -0.888465 -0.329153  0.491264 -0.363472
    8  0.386882 -0.836112 -0.108250 -0.433797
    9  0.642468 -0.399255 -0.911456 -0.497720
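
One possible reading of this output, offered as a sketch rather than a statement about the `.ix` internals: if the negative number is treated as a label rather than a position, a slice that starts at a label below every existing label simply starts at the first row. The example below reproduces that behaviour with `.loc`, which is purely label-based and is used here as a stand-in because `.ix` is not available in pandas 1.0 and later. The frame contents are illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative frame with the default integer index 0..9.
df = pd.DataFrame(np.arange(40).reshape(10, 4))

# .loc interprets -100 and -2 as labels. Neither label exists, and the
# index is sorted, so each slice begins at the first row.
from_minus_100 = df.loc[-100:]
from_minus_2 = df.loc[-2:]

print(len(from_minus_100), len(from_minus_2))
```

Both slices contain every row, which matches what the `.ix` calls in this thread returned.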

Edit: From the pandas documentation we have:

Label-based indexing with integer axis labels is a thorny topic. It has been discussed heavily on mailing lists and among various members of the scientific Python community. In pandas, our general viewpoint is that labels matter more than integer locations. Therefore, with an integer axis index only label-based indexing is possible with the standard tools like .ix. The following code will generate exceptions:

    s = Series(range(5))
    s[-1]
    df = DataFrame(np.random.randn(5, 4))
    df
    df.ix[-2:]

This deliberate decision was made to prevent ambiguities and subtle bugs (many users reported finding bugs when the API change was made to stop “falling back” on position-based indexing).
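
For readers on a current release: `.ix` was deprecated in pandas 0.20.0 and removed in 1.0.0, and the label/position split is now explicit, with `.loc` for labels and `.iloc` for integer positions. A minimal sketch of taking the last two rows by position, assuming pandas 1.0 or later and an illustrative frame (the variable names are not from the original post):

```python
import numpy as np
import pandas as pd

# Illustrative frame with the default integer index 0..9.
df = pd.DataFrame(np.arange(40).reshape(10, 4))

# .iloc is purely positional, so negative bounds count from the end.
last_two = df.iloc[-2:]

# tail(n) is a convenience method that returns the last n rows.
same_rows = df.tail(2)

# The same split applies to a Series with an integer index:
s = pd.Series(range(5))
last_value = s.iloc[-1]  # positional access to the final element

# Plain s[-1] is a label lookup; there is no label -1, so it raises.
try:
    s[-1]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

print(last_two)
print(last_value, label_lookup_failed)
```

`df[-2:]` from the question also keeps working, since a bare slice on a DataFrame with an integer index slices rows by position.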

Zelazny7 answered Sep 28 '22 04:09