What is the purpose of floating point index in Pandas?

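import pandas as pd

# Series values reconstructed from the output shown below
s = pd.Series([141.125, 142.250, 143.375, 143.375, 144.500, 145.125])
s.index=[0.0,1.1,2.2,3.3,4.4,5.5]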
s.index
# Float64Index([0.0, 1.1, 2.2, 3.3, 4.4, 5.5], dtype='float64')
s
# 0.0    141.125
# 1.1    142.250
# 2.2    143.375
# 3.3    143.375
# 4.4    144.500
# 5.5    145.125
s.index=s.index.astype('float32')
s.index
# Float64Index([              0.0, 1.100000023841858, 2.200000047683716,
#               3.299999952316284, 4.400000095367432,               5.5],
#              dtype='float64')

What's the intuition behind floating point indices? I'm struggling to understand when we would use them instead of int indices (it seems like you can have three types of indices: int64, float64, or object, e.g. s.index=['a','b','c','d','e','f']).

From the code above, it also looks like Pandas really wants float indices to be in 64-bit, as these 64-bit floats are getting cast to 32-bit floats and then back to 64-bit floats, with the dtype of the index remaining 'float64'.
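The altered labels in that output are what any float32 round trip produces, independent of pandas. A minimal sketch using only the standard library (the helper name through_float32 is hypothetical, introduced here for illustration):

```python
import struct


def through_float32(x: float) -> float:
    """Round x to the nearest 32-bit float, then widen it back to 64 bits."""
    return struct.unpack("f", struct.pack("f", x))[0]


labels = [0.0, 1.1, 2.2, 3.3, 4.4, 5.5]
round_tripped = [through_float32(x) for x in labels]

print(round_tripped)
# [0.0, 1.100000023841858, 2.200000047683716, 3.299999952316284, 4.400000095367432, 5.5]

# Only values exactly representable in 32 bits survive unchanged
unchanged = [a == b for a, b in zip(labels, round_tripped)]
print(unchanged)
# [True, False, False, False, False, True]
```

So the precision is already lost at the cast to float32; storing the result as float64 afterwards just exposes the extra digits.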

How do people use float indices?

Is the idea that you might have some statistical calculation over data and want to rank on the result of it, but those results may be floats? And we want to force float64 to avoid losing resolution?

phoenixdown asked Jun 14 '20 06:06




1 Answer

Float indices are generally useless for label-based indexing, because floating point values cannot always be represented or compared exactly. Of course, pd.Float64Index is there in the API for completeness, but that doesn't always mean you should use it. Jeff (core library contributor) has this to say on GitHub:

[...] It is rarely necessary to actually use a float index; you are often better off served by using a column. The point of the index is to make individual elements faster, e.g. df[1.0], but this is quite tricky; this is the reason for having an issue about this.

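The tricky part is that two values you think of as the same number don't always compare equal, depending on how each one ended up represented in bits. A label computed as 0.1 * 3 is not identical to the literal 0.3, and 1.1 stored as a 32-bit float is not identical to 1.1 stored as a 64-bit float.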
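A minimal sketch of how this bites label-based lookup, assuming labels that were computed rather than typed in as literals (the series values here are made up for illustration):

```python
import numpy as np
import pandas as pd

# Labels produced by arithmetic: 0.1, 0.2 and 0.30000000000000004
labels = [0.1 * k for k in range(1, 4)]
s = pd.Series([10, 20, 30], index=labels)

same_number = (0.1 * 3 == 0.3)   # False: the two floats differ in the last bit
found = 0.3 in s.index           # False for the same reason

try:
    s.loc[0.3]
    lookup_ok = True
except KeyError:
    # The exact label 0.3 is not in the index
    lookup_ok = False

# A tolerance-based boolean mask sidesteps exact equality
close = s[np.isclose(s.index.to_numpy(), 0.3)]
print(same_number, found, lookup_ok, close.tolist())
```

The tolerance-based mask works, but at that point the index is no longer giving you a fast exact lookup, which is the argument for keeping such values in a regular column.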

Floating indices are useful in a few situations (as cited in the GitHub issue), mainly for representing a time axis, or for extremely fine-grained, high-precision measurements in, for example, astronomical data. For most other cases there's pd.cut or pd.qcut for binning your data, because working with categorical data is usually easier than working with continuous data.
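A rough sketch of both ideas, using made-up sample data: range slicing on a sorted float time axis does not need the bounds to match any label exactly, and pd.cut turns continuous values into labelled bins. The variable names and bin labels are assumptions for illustration.

```python
import pandas as pd

# Float index as a time axis (seconds); slicing is by value range
signal = pd.Series([1.0, 1.5, 2.0, 2.5, 3.0],
                   index=[0.0, 0.5, 1.0, 1.5, 2.0])
window = signal.loc[0.4:1.6]      # neither bound is an actual label
print(window.index.tolist())      # [0.5, 1.0, 1.5]

# Binning continuous measurements instead of indexing by them
measurements = pd.Series([0.05, 0.31, 0.48, 0.52, 0.77, 0.99])
binned = pd.cut(measurements,
                bins=[0.0, 0.25, 0.5, 0.75, 1.0],
                labels=["q1", "q2", "q3", "q4"])
print(binned.tolist())            # ['q1', 'q2', 'q2', 'q3', 'q4', 'q4']
```

pd.qcut takes a similar form but chooses the bin edges from quantiles of the data rather than from fixed values.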

cs95 answered Oct 20 '22 16:10
