What is the purpose of floating point index in Pandas?

import pandas as pd

# Series values reconstructed from the output shown below
s = pd.Series([141.125, 142.250, 143.375, 143.375, 144.500, 145.125])
s.index = [0.0, 1.1, 2.2, 3.3, 4.4, 5.5]
s.index
# Float64Index([0.0, 1.1, 2.2, 3.3, 4.4, 5.5], dtype='float64')
s
# 0.0    141.125
# 1.1    142.250
# 2.2    143.375
# 3.3    143.375
# 4.4    144.500
# 5.5    145.125
s.index = s.index.astype('float32')
s.index
# Float64Index([              0.0, 1.100000023841858, 2.200000047683716,
#               3.299999952316284, 4.400000095367432,               5.5],
#              dtype='float64')

What's the intuition behind floating point indices? I'm struggling to understand when we would use them instead of int indices. It seems there are three types of index: int64, float64, or object (e.g. s.index=['a','b','c','d','e','f']).

From the code above, it also looks like Pandas really wants float indices to be 64-bit: the 64-bit floats are cast to 32-bit floats and then back to 64-bit floats, and the dtype of the index remains 'float64'.
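The changed labels in that output can be reproduced with NumPy alone. This is a minimal sketch, assuming NumPy is available; it only illustrates the float64 to float32 to float64 round trip and says nothing about how any particular pandas version stores the index internally.

```python
import numpy as np

labels = np.array([0.0, 1.1, 2.2, 3.3, 4.4, 5.5], dtype="float64")

# Cast down to 32-bit, then back up to 64-bit, as the index appears to do.
round_trip = labels.astype("float32").astype("float64")

# Values that are exactly representable in float32 (0.0, 5.5) survive;
# the others pick up the float32 rounding error.
unchanged = (round_trip == labels)
print(round_trip[1])   # 1.100000023841858
print(unchanged)
```

The practical consequence is that a lookup with the original label 1.1 no longer matches the stored label after such a round trip.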

How do people use float indices?

Is the idea that you might run some statistical calculation over the data and want to rank on its results, which may be floats? And is float64 forced to avoid losing resolution?

asked Jun 14 '20 06:06

phoenixdown

1 Answer

Float indices are generally of little use for label-based indexing because of the usual floating-point limitations. pd.Float64Index exists in the API for completeness, but that doesn't mean you should use it. Jeff (a core library contributor) has this to say on GitHub:

[...] It is rarely necessary to actually use a float index; you are often better off served by using a column. The point of the index is to make individual elements faster, e.g. df[1.0], but this is quite tricky; this is the reason for having an issue about this.

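The tricky part is that two floats you expect to be equal aren't always equal, depending on how each one was computed and how it ends up represented in bits. For example, 0.1 + 0.2 == 0.3 is False, so a label computed one way may not match the "same" label stored in the index.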
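A minimal sketch of that pitfall with a label lookup. The Series and its values are made up for illustration, and the tolerance-based lookup via np.isclose is just one possible workaround, not something the quoted issue prescribes.

```python
import numpy as np
import pandas as pd

# Hypothetical data: three values labelled 0.1, 0.2 and 0.3.
s = pd.Series([10, 20, 30], index=[0.1, 0.2, 0.3])

# Arithmetic produces 0.30000000000000004, not the stored label 0.3.
key = 0.1 + 0.2
found_exact = key in s.index

# Workaround: match labels within a tolerance instead of exactly.
mask = np.isclose(s.index.to_numpy(), key)
value = s[mask].iloc[0]

print(found_exact)  # False
print(value)        # 30
```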

Float indices are useful in a few situations (as cited in the GitHub issue), mainly for recording a temporal axis (time) or extremely fine-grained measurements, for example in astronomical data. For most other cases there's pd.cut or pd.qcut for binning your data, because categorical data is usually easier to work with than continuous data.
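Both options can be sketched briefly. The sample values, bin edges, and label names below are made up for illustration: the first part uses a sorted float index as a time axis and slices it by range, and the second keeps the measurement as data and bins it with pd.cut.

```python
import pandas as pd

# Hypothetical signal sampled every 0.5 s; the float index is the time axis.
signal = pd.Series([1.0, 1.5, 2.0, 2.5, 3.0], index=[0.0, 0.5, 1.0, 1.5, 2.0])

# Range slicing on a sorted float index does not need exact label matches,
# so it avoids the equality problem of single-label lookups.
window = signal.loc[0.4:1.6]

# Alternative: bin continuous readings into categories with pd.cut.
readings = pd.Series([0.12, 0.57, 1.49, 2.98])
binned = pd.cut(readings, bins=[0, 1, 2, 3], labels=["low", "mid", "high"])

print(window.index.tolist())  # [0.5, 1.0, 1.5]
print(binned.tolist())        # ['low', 'low', 'mid', 'high']
```

Range slicing is where a float index tends to be comfortable to use; exact-label access such as signal[0.5] is the part that stays fragile.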

answered Oct 20 '22 16:10

cs95

Tags: python, floating-point, pandas