
pandas: selecting array of index labels with .loc

Tags: python, pandas

Consider this DataFrame:

import pandas as pd

df = pd.DataFrame({u'A': {2.0: 2.2, 7.0: 1.4, 8.0: 1.4, 9.0: 2.2},
                   u'B': {2.0: 7.2, 7.0: 6.3, 8.0: 4.4, 9.0: 5.0}})

Which looks like this:

      A       B
2    2.2     7.2
7    1.4     6.3
8    1.4     4.4
9    2.2     5.0

I'd like to select the rows with index labels 2 and 7 (numbers, not strings):

df.loc[[2, 7]]

gives an error!

IndexError: indices are out-of-bounds

However, df.loc[7] and df.loc[2] work fine and as expected. Also, if I define the dataframe index with strings instead of numbers:

df2 = pd.DataFrame({u'A': {'2': 2.2, '7': 1.4, '8': 1.4, '9': 2.2},
                    u'B': {'2': 7.2, '7': 6.3, '8': 4.4, '9': 5.0}})

df2.loc[['2', '8']]

it works fine.

This is not the behavior I expected from df.loc. Is it a bug or just a gotcha? Can I pass an array of numbers as index labels, not just as positions?

I can convert all indices to strings and then operate with .loc but it would be very inconvenient for the rest of my code.

Thanks for your time!

cd98 asked Nov 07 '13 19:11



1 Answer

This is a bug in pandas 0.12; version 0.13 fixes it. In other words, label selection with a list should work whether the labels are numbers or strings.
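As a minimal sketch of the fixed behavior, assuming a pandas release newer than 0.12 and the same float-labelled frame as in the question, passing a list of numeric labels to .loc should select the matching rows:

```python
import pandas as pd

# Same frame as in the question: the index holds float labels, not positions.
df = pd.DataFrame({'A': {2.0: 2.2, 7.0: 1.4, 8.0: 1.4, 9.0: 2.2},
                   'B': {2.0: 7.2, 7.0: 6.3, 8.0: 4.4, 9.0: 5.0}})

# A list passed to .loc is interpreted as labels, so this selects
# the rows labelled 2 and 7 rather than the rows at positions 2 and 7.
by_label = df.loc[[2, 7]]
print(by_label)
```

The same list passed to .iloc would be read as positions, and position 7 does not exist in a four-row frame.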

On 0.12 you could work around it like this (uses an internal method though):

In [10]: df.iloc[df.index.get_indexer([2,7])]
Out[10]: 
     A    B
2  2.2  7.2
7  1.4  6.3
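One caveat with this workaround, sketched here under the assumption that a requested label might be absent: get_indexer returns -1 for a label it cannot find, and .iloc would read -1 as "the last row". The label 5 and the filtering step below are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'A': {2.0: 2.2, 7.0: 1.4, 8.0: 1.4, 9.0: 2.2},
                   'B': {2.0: 7.2, 7.0: 6.3, 8.0: 4.4, 9.0: 5.0}})

# Translate labels to positions; 5 is a hypothetical label missing from the index.
positions = df.index.get_indexer([2, 5, 7])

# Missing labels come back as -1, which .iloc would treat as the last row,
# so drop them before selecting by position.
found = positions[positions >= 0]
subset = df.iloc[found]
print(subset)
```

Filtering silently skips missing labels; raising an error instead may be preferable when every label is expected to exist.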
Jeff answered Nov 03 '22 09:11