
How to work with data indexed by floats in pandas

Tags: python, pandas

I use a pandas DataFrame with a hierarchical index, and in one particular case it is indexed by float values.

Here is an example:

import pandas as pd

example_data = [
    {'a': 1.2, 'b': 30, 'v': 123},
    {'a': 1.2, 'b': 60, 'v': 1234},
    {'a': 3, 'b': 30, 'v': 12345},
    {'a': 3, 'b': 60, 'v': 123456},
]
frame = pd.DataFrame(example_data)
frame.set_index(['a', 'b'])

Now I'd like to use partial indexing to select the part of the frame with a == 1.2 and then display it. The documentation shows how to do this for a string index, but that approach does not work for floats: when I try frame.loc[1.2], I get an error saying that 1.2 is not a proper indexer for Int64Index, which makes sense since I am indexing with a float.

Is there any way to work with a float index in pandas? How can I fix my hierarchical index?

Actual error message was:

TypeError: the label [1.2] is not a proper indexer for this index type (Int64Index)
jb. asked Jul 04 '14 15:07




1 Answer

Pandas has no problem with float labels when the index has a single level (that is, when it is not a multi-index):

In [178]:

frame = frame.set_index(['a'])
frame.loc[1.2]
Out[178]:
      b     v
a            
1.2  30   123
1.2  60  1234
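
For readers who want to reproduce the single-level case from scratch, here is a minimal self-contained sketch built from the question's data. The variable names `single` and `rows` are illustrative and not part of the original answer. Note that `set_index` returns a new DataFrame by default, so the result is assigned here; the question's snippet calls `frame.set_index(['a', 'b'])` without assigning it, and the `Int64Index` named in the error message would be consistent with the default index still being in place, although that is an inference from the message and not something the answer states.

```python
import pandas as pd

example_data = [
    {'a': 1.2, 'b': 30, 'v': 123},
    {'a': 1.2, 'b': 60, 'v': 1234},
    {'a': 3, 'b': 30, 'v': 12345},
    {'a': 3, 'b': 60, 'v': 123456},
]

# set_index returns a new DataFrame, so the result has to be assigned.
single = pd.DataFrame(example_data).set_index('a')

# Label-based lookup on the float index returns every row labelled 1.2.
rows = single.loc[1.2]
print(rows)
```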

If you do have a multi-index, you can build a boolean mask from the values of index level 0 (the first level) and use it to select the rows:

In [180]:

mask = frame.index.get_level_values(0)
frame.loc[mask == 1.2]
Out[180]:
           v
a   b       
1.2 30   123
    60  1234

The mask variable itself holds the level 0 value for each row; comparing it with 1.2 is what produces the boolean array used for the selection:

In [181]:

mask
Out[181]:
Float64Index([1.2, 1.2, 3.0, 3.0], dtype='float64')
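
Exact equality works in the example because the labels and the lookup value come from the same literal, 1.2. When a lookup value is the result of arithmetic, a tolerance-based comparison may be safer. The following is a sketch under that assumption; `numpy.isclose` and the names `level_values`, `exact`, `close`, and `selected` are additions for illustration and do not appear in the original answer.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame([
    {'a': 1.2, 'b': 30, 'v': 123},
    {'a': 1.2, 'b': 60, 'v': 1234},
    {'a': 3, 'b': 30, 'v': 12345},
    {'a': 3, 'b': 60, 'v': 123456},
]).set_index(['a', 'b'])

level_values = frame.index.get_level_values(0)

# Exact comparison: one boolean per row.
exact = level_values == 1.2

# 0.4 * 3 is not exactly 1.2 in binary floating point, so compare with a
# tolerance instead (assumption: the default isclose tolerances are acceptable).
close = np.isclose(level_values, 0.4 * 3)

selected = frame.loc[close]
print(selected)
```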

It is better and more explicit to specify the level using the name:

mask = frame.index.get_level_values('a')
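
As a self-contained sketch of the named-level variant, the snippet below selects the rows with a == 1.2 by level name. It also shows `DataFrame.xs`, which is a different technique from the mask used in this answer and is included only as a possible alternative; the names `by_mask` and `by_xs` are illustrative.

```python
import pandas as pd

frame = pd.DataFrame([
    {'a': 1.2, 'b': 30, 'v': 123},
    {'a': 1.2, 'b': 60, 'v': 1234},
    {'a': 3, 'b': 30, 'v': 12345},
    {'a': 3, 'b': 60, 'v': 123456},
]).set_index(['a', 'b'])

# Same mask approach as in the answer, with the level addressed by name.
by_mask = frame.loc[frame.index.get_level_values('a') == 1.2]

# Alternative: cross-section on level 'a'. By default the selected level
# is dropped, so the result is indexed by 'b' only.
by_xs = frame.xs(1.2, level='a')

print(by_mask)
print(by_xs)
```

The mask keeps both index levels in the result, while `xs` drops the level that was selected on, so the choice depends on which shape is more convenient downstream.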
EdChum answered Oct 04 '22 22:10