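import pandas as pd

# Values reconstructed from the output printed below
s = pd.Series([141.125, 142.250, 143.375, 143.375, 144.500, 145.125])
s.index = [0.0, 1.1, 2.2, 3.3, 4.4, 5.5]
s.index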
# Float64Index([0.0, 1.1, 2.2, 3.3, 4.4, 5.5], dtype='float64')
s
# 0.0 141.125
# 1.1 142.250
# 2.2 143.375
# 3.3 143.375
# 4.4 144.500
# 5.5 145.125
s.index = s.index.astype('float32')
s.index
# Float64Index([ 0.0, 1.100000023841858, 2.200000047683716,
# 3.299999952316284, 4.400000095367432, 5.5],
# dtype='float64')
What's the intuition behind floating point indices? I'm struggling to understand when we would use them instead of int indices. (It seems like you can have three types of indices: int64, float64, or object, e.g. s.index=['a','b','c','d','e','f'].)
From the code above, it also looks like Pandas really wants float indices to be 64-bit: the 64-bit floats are cast to 32-bit floats and then back to 64-bit floats, with the dtype of the index remaining 'float64'.
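To see where values such as 1.100000023841858 come from, here is a minimal sketch that needs only the standard library. It assumes the cast simply rounds each label to the nearest 32-bit float and then widens it back to 64 bits; the helper name `through_float32` is made up for illustration.

```python
import struct


def through_float32(x: float) -> float:
    """Round x to the nearest 32-bit float, then widen it back to a 64-bit Python float."""
    return struct.unpack("f", struct.pack("f", x))[0]


labels = [0.0, 1.1, 2.2, 3.3, 4.4, 5.5]
round_tripped = [through_float32(x) for x in labels]

for before, after in zip(labels, round_tripped):
    status = "unchanged" if before == after else "changed"
    print(repr(before), "->", repr(after), f"({status})")
```

Only labels that are exactly representable in 32 bits (here 0.0 and 5.5) survive the round trip; the others come back as slightly different 64-bit values, so they no longer match the original labels.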
How do people use float indices?
Is the idea that you might run some statistical calculation over data and want to rank on the result, and those results may be floats? And do we want to force float64 to avoid losing resolution?
An index is like an address: it is how any data point in a DataFrame or Series is accessed. Both rows and columns have indexes; the row labels are called the index, and the column labels are simply the column names. Pandas has had three data structures: DataFrame, Series, and Panel (Panel has since been removed from pandas).
Pandas Indexing: Series. A Series is a one-dimensional array of data. It can hold data of any type: string, integer, float, dictionaries, lists, booleans, and more.
The Python index() method finds the position of an element in a list, or of a substring in a string. It returns the lowest index at which the specified element occurs. If the specified item does not exist, a ValueError is raised.
Float64Index is a special case of Index with purely float labels. Deprecated since version 1.4.0: In pandas v2.0 Float64Index will be removed and NumericIndex used instead. Float64Index will remain fully functional for the duration of pandas 1.x.
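Given that deprecation, one way to build a float-labelled index without naming Float64Index at all is to go through the generic `pd.Index` constructor. This is a small sketch, not the only option; as far as I know the concrete class you get back differs between pandas 1.x and 2.x, so the code checks the dtype rather than the class.

```python
import pandas as pd

# Build a float-labelled index through the generic constructor.
idx = pd.Index([0.0, 1.1, 2.2], dtype="float64")

# The class name depends on the installed pandas version; the dtype does not.
print(type(idx).__name__, idx.dtype)
print(pd.api.types.is_float_dtype(idx.dtype))
```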
Float indices are generally of little use for label-based indexing because of the usual floating-point limitations. pd.Float64Index is in the API for completeness, but that doesn't mean you should use it. Jeff (a core library contributor) has this to say on GitHub:
[...] It is rarely necessary to actually use a float index; you are often better off served by using a column. The point of the index is to make individual elements faster, e.g. df[1.0], but this is quite tricky; this is the reason for having an issue about this.
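The tricky part is that two values you both think of as 1.0 aren't always equal: a label and a lookup key that were computed in different ways can differ in their last bits, and then the exact-match lookup fails.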
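A small sketch of that failure mode, using made-up data: the key `0.1 + 0.2` looks like 0.3 but is a different 64-bit float, so an exact label lookup misses it. Matching with a tolerance via `np.isclose` is one possible workaround, shown here under the assumption that NumPy's default tolerances are acceptable for your data.

```python
import numpy as np
import pandas as pd

# Hypothetical series with float labels.
s = pd.Series([10, 20, 30], index=[0.1, 0.2, 0.3])

key = 0.1 + 0.2  # arithmetic result that is close to, but not equal to, 0.3

# Exact label membership fails because the bits differ.
exact_hit = key in s.index
print(repr(key), "in index:", exact_hit)

# Tolerance-based match: select labels that are numerically close to the key.
mask = np.isclose(s.index.to_numpy(), key)
matched = s[mask]
print(matched)
```

The tolerance-based selection scans the whole index, so it gives up the fast lookup that an index is meant to provide, which is part of why a plain column is often the better home for float data.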
Float indices are useful in a few situations (as cited in the GitHub issue), mainly for recording a temporal axis (time) or extremely fine-grained measurements in, for example, astronomical data. For most other cases there's pd.cut or pd.qcut for binning your data, because working with categorical data is usually easier than working with continuous data.
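As a rough illustration of the column-plus-binning approach, the sketch below keeps the float values from the question in an ordinary column and bins them. The column names, bin edges, and bin labels are arbitrary choices made for this example.

```python
import pandas as pd

# Float values live in a regular column instead of the index.
df = pd.DataFrame({
    "t": [0.0, 1.1, 2.2, 3.3, 4.4, 5.5],
    "value": [141.125, 142.250, 143.375, 143.375, 144.500, 145.125],
})

# Fixed-width bins with explicit edges; include_lowest keeps t == 0.0 in the first bin.
df["t_bin"] = pd.cut(df["t"], bins=[0, 2, 4, 6],
                     labels=["low", "mid", "high"], include_lowest=True)

# Quantile-based bins: two groups with an equal number of rows.
df["t_half"] = pd.qcut(df["t"], q=2, labels=["lower", "upper"])

# Group on the categorical label rather than on the raw floats.
means = df.groupby("t_bin", observed=True)["value"].mean()
print(df)
print(means)
```

Selecting or grouping by "low", "mid", or "high" avoids exact float comparisons entirely.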