On the float_precision argument to pandas.read_csv

The documentation for the argument in this post's title says:

float_precision : string, default None

Specifies which converter the C engine should use for floating-point values. The options are None for the ordinary converter, high for the high-precision converter, and round_trip for the round-trip converter.

I'd like to learn more about the three algorithms mentioned, preferably without having to dig into the source code¹.

Q: Do these algorithms have names I can Google for to learn exactly what they do and how they differ?

(Also, one side question: what exactly is "the C engine" in this context? Is that a Pandas-specific thing, or a Python-wide thing? None of the above?)

¹ Not being familiar with the code base in question, I expect it would take me a long time just to locate the relevant source code. But even assuming I manage to find it, my experience is that implementations of this sort of algorithm are so highly optimized, and at such a low level, that without some high-level description it is really difficult, at least for me, to follow what's going on.

asked Jun 22 '17 11:06

kjo

1 Answer

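You asked about the actual algorithms. The closest I can find is the place in the parser source where the converter is selected: https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/parsers.pyx#L492

This is taken from a related answer; kudos to MaxU ("Understanding pandas.read_csv() float parsing"). Each float_precision option maps to a converter function: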

Ordinary: double_converter_nogil = xstrtod
High: double_converter_nogil = precise_xstrtod
Round-Trip: double_converter_withgil = round_trip

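From here, you're in C-land: xstrtod, precise_xstrtod and round_trip are C functions in the pandas source, so those are the names to search for.

As for your side question, "the C engine" is a pandas-specific thing, not a Python-wide one. read_csv takes an engine argument, and the C engine is pandas' own parser, with its critical code paths written in Cython or C for speed. float_precision only applies when that engine is used.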
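If you want to see whether the three options differ on your own data, a small experiment is often quicker than reading the C code. The following is only a sketch: the sample value and the helper name parse_with are made up for illustration, and whether the results actually differ depends on the input string and on your pandas version (newer releases changed what the default None option does).

```python
import io

import pandas as pd

# Hypothetical sample value with more digits than a double can hold exactly.
text = "1.2345678901234567"


def parse_with(float_precision):
    """Parse `text` as a one-cell CSV using the given float_precision option."""
    csv = io.StringIO("x\n" + text + "\n")
    # float_precision is only honoured by the C engine, so request it explicitly.
    df = pd.read_csv(csv, engine="c", float_precision=float_precision)
    return df["x"].iloc[0]


# Python's own float() serves as the reference conversion.
reference = float(text)

results = {str(opt): parse_with(opt) for opt in (None, "high", "round_trip")}

for name, value in results.items():
    # Show the parsed value and whether it matches float() bit for bit.
    print(f"{name:>10}: {value!r}  matches float(): {value == reference}")
```

The round_trip option should always match float(), since it is meant to reproduce Python's own string-to-double conversion; the other two may or may not be off in the last digit for a given input.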

answered Sep 24 '22 16:09

MisterJT

Tags: python, algorithm, floating-point, pandas, ieee-754