DataFrame.ix does not return the slice I expect when I use negative indexing.
I have a DataFrame and want to slice its last 2 rows.
In [90]: df = pd.DataFrame(np.random.randn(10, 4))
In [91]: df
Out[91]:
0 1 2 3
0 1.985922 0.664665 -2.800102 1.695480
1 0.580509 0.782473 1.032970 1.559917
2 0.584387 1.798743 0.095950 0.071999
3 1.956221 0.075530 -0.391008 1.692585
4 -0.644979 -1.959265 0.749394 -0.437995
5 -1.204964 0.653912 -1.426602 2.409855
6 1.178886 2.177259 -0.165106 1.145952
7 1.410595 -0.761426 -1.280866 0.609122
8 0.110534 -0.234781 -0.819976 0.252080
9 1.798894 0.553394 -1.358335 1.278704
One way to do it:
In [92]: df[-2:]
Out[92]:
0 1 2 3
8 0.110534 -0.234781 -0.819976 0.252080
9 1.798894 0.553394 -1.358335 1.278704
Another way to do it:
In [93]: df.ix[len(df)-2:, :]
Out[93]:
0 1 2 3
8 0.110534 -0.234781 -0.819976 0.252080
9 1.798894 0.553394 -1.358335 1.278704
Now I want to use negative indexing with ix, but it returns every row instead of the last two:
In [94]: df.ix[-2:, :]
Out[94]:
0 1 2 3
0 1.985922 0.664665 -2.800102 1.695480
1 0.580509 0.782473 1.032970 1.559917
2 0.584387 1.798743 0.095950 0.071999
3 1.956221 0.075530 -0.391008 1.692585
4 -0.644979 -1.959265 0.749394 -0.437995
5 -1.204964 0.653912 -1.426602 2.409855
6 1.178886 2.177259 -0.165106 1.145952
7 1.410595 -0.761426 -1.280866 0.609122
8 0.110534 -0.234781 -0.819976 0.252080
9 1.798894 0.553394 -1.358335 1.278704
How do I use negative indexing with DataFrame.ix correctly? Thanks.
This is a bug:
In [1]: df = pd.DataFrame(np.random.randn(10, 4))
In [2]: df
Out[2]:
0 1 2 3
0 -3.100926 -0.580586 -1.216032 0.425951
1 -0.264271 -1.091915 -0.602675 0.099971
2 -0.846290 1.363663 -0.382874 0.065783
3 -0.099879 -0.679027 -0.708940 0.138728
4 -0.302597 0.753350 -0.112674 -1.253316
5 -0.213237 -0.467802 0.037350 0.369167
6 0.754915 -0.569134 -0.297824 -0.600527
7 0.644742 0.038862 0.216869 0.294149
8 0.101684 0.784329 0.218221 0.965897
9 -1.482837 -1.325625 1.008795 -0.150439
In [3]: df.ix[-2:]
Out[3]:
0 1 2 3
0 -3.100926 -0.580586 -1.216032 0.425951
1 -0.264271 -1.091915 -0.602675 0.099971
2 -0.846290 1.363663 -0.382874 0.065783
3 -0.099879 -0.679027 -0.708940 0.138728
4 -0.302597 0.753350 -0.112674 -1.253316
5 -0.213237 -0.467802 0.037350 0.369167
6 0.754915 -0.569134 -0.297824 -0.600527
7 0.644742 0.038862 0.216869 0.294149
8 0.101684 0.784329 0.218221 0.965897
9 -1.482837 -1.325625 1.008795 -0.150439
https://github.com/pydata/pandas/issues/2600
Note that df[-2:] will work:
In [4]: df[-2:]
Out[4]:
0 1 2 3
8 0.101684 0.784329 0.218221 0.965897
9 -1.482837 -1.325625 1.008795 -0.150439
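A minimal sketch of position-based ways to take the last two rows, assuming a pandas version that provides `.iloc` and `DataFrame.tail` (note that `.ix` was later deprecated and removed in pandas 1.0). The deterministic `np.arange` data is an assumption made for illustration and replaces the random data used above.

```python
import numpy as np
import pandas as pd

# Deterministic data (an assumption for illustration) so rows are easy to recognize.
df = pd.DataFrame(np.arange(40).reshape(10, 4))

by_getitem = df[-2:]       # plain slice on the frame: positional row slice
by_tail = df.tail(2)       # last two rows
by_iloc = df.iloc[-2:, :]  # purely position-based indexer, rows and columns

print(by_iloc)
```

All three select the rows labelled 8 and 9 here. `.iloc` is the closest match to the `df.ix[-2:, :]` spelling in the question, since it also accepts a column selector.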
ix's main purpose is to allow NumPy-like indexing with support for row and column labels, so I'm not sure your use case is what it was intended for. Here are a couple of ways I can think of, mostly trivial:
In [142]: df.ix[:][-2:]
Out[142]:
0 1 2 3
8 0.386882 -0.836112 -0.108250 -0.433797
9 0.642468 -0.399255 -0.911456 -0.497720
In [161]: df.ix[df.index[-2:],:]
Out[161]:
0 1 2 3
8 0.386882 -0.836112 -0.108250 -0.433797
9 0.642468 -0.399255 -0.911456 -0.497720
I don't think ix supports negative indexing at all. It seems to just ignore it altogether:
In [181]: df.ix[-100:,:]
Out[181]:
0 1 2 3
0 -1.144137 -1.042034 -2.158838 0.674055
1 -0.424184 1.237318 -1.846130 0.575357
2 -0.844974 -0.541060 2.197364 -0.031898
3 0.846263 1.244450 -1.570566 -0.477919
4 -0.193445 0.171045 -0.235587 -1.185583
5 1.361539 -1.107389 -1.321081 -0.776407
6 0.505907 -1.364414 -2.093770 0.144016
7 -0.888465 -0.329153 0.491264 -0.363472
8 0.386882 -0.836112 -0.108250 -0.433797
9 0.642468 -0.399255 -0.911456 -0.497720
Edit: From the pandas documentation we have:
Label-based indexing with integer axis labels is a thorny topic. It has been discussed heavily on mailing lists and among various members of the scientific Python community. In pandas, our general viewpoint is that labels matter more than integer locations. Therefore, with an integer axis index only label-based indexing is possible with the standard tools like .ix. The following code will generate exceptions:
s = Series(range(5))
s[-1]
df = DataFrame(np.random.randn(5, 4))
df
df.ix[-2:]
This deliberate decision was made to prevent ambiguities and subtle bugs (many users reported finding bugs when the API change was made to stop “falling back” on position-based indexing).
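The ambiguity described in the quoted documentation can be sketched with an integer index that itself contains negative labels. This is only an illustration: the index `range(-3, 7)` and the `np.arange` data are assumptions, and it uses the separate `.loc` (label-based) and `.iloc` (position-based) indexers rather than `.ix`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose integer labels run from -3 to 6 (10 rows).
df = pd.DataFrame(np.arange(40).reshape(10, 4), index=range(-3, 7))

# Label-based: -2 is a label, so this keeps every row labelled -2 or higher.
by_label = df.loc[-2:, :]

# Position-based: -2 counts from the end, so this keeps the last two rows.
by_position = df.iloc[-2:, :]

print(len(by_label))             # 9
print(list(by_position.index))   # [5, 6]
```

The same slice expression means two different things depending on whether `-2` is read as a label or as a position, which is why the two behaviours are kept in separate indexers.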