The documentation for the float_precision argument of pandas.read_csv (the argument in this post's title) says:
float_precision : string, default None
Specifies which converter the C engine should use for floating-point values. The options are None for the ordinary converter, high for the high-precision converter, and round_trip for the round-trip converter.
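To see the settings side by side, here is a small sketch that parses the same column with each float_precision value and counts how many results are bit-for-bit identical to Python's own float(). The sample strings are arbitrary choices of mine. Which settings disagree with float(), and by how much, depends on your pandas version (the meaning of None changed in pandas 1.2), so I haven't listed expected counts. My assumption is that round_trip should match float() on every value.

```python
import io

import pandas as pd

# Arbitrary sample strings (my own choice), some with more digits than a double can hold.
values = [
    "0.1",
    "0.30000000000000004",
    "2.718281828459045",
    "3.141592653589793238462643383279",
    "123456.789e-3",
    "-6.02214076e23",
]
csv_text = "x\n" + "\n".join(values) + "\n"


def parse(precision):
    """Parse the single column with the C engine and the given float_precision."""
    frame = pd.read_csv(io.StringIO(csv_text), engine="c", float_precision=precision)
    return frame["x"].tolist()


# Python's float() is correctly rounded, so it serves as the reference here.
reference = [float(v) for v in values]

results = {precision: parse(precision) for precision in (None, "high", "round_trip")}
for precision, parsed in results.items():
    same = sum(p == r for p, r in zip(parsed, reference))
    print(f"float_precision={precision!r}: {same}/{len(values)} identical to float()")
```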
I'd like to learn more about the three algorithms mentioned, preferably without having to dig into the source code¹.
Q: Do these algorithms have names I can Google for to learn exactly what they do and how they differ?
(Also, one side question: what exactly is "the C engine" in this context? Is that a Pandas-specific thing, or a Python-wide thing? None of the above?)
1 Not being familiar with the code base in question, I expect it would take me a long time just to locate the relevant source code. But even assuming I manage to find it, my experience with this sort of algorithm is that their implementations are so highly optimized, and at such a low level, that without some high-level description it is really difficult, at least for me, to follow what's going on.
pandas.read_csv is used to load a CSV file as a pandas DataFrame. Beyond simply loading the file, it has many parameters that can be customized to control how the data is parsed.
In the simplest case, pandas.read_csv("data.csv") returns a new DataFrame with the data and labels from the file data.csv, which is passed as the first argument.
For example, with infer_datetime_format=True and parse_dates enabled, pandas will attempt to infer the format of the datetime strings in the columns and, if it can be inferred, switch to a faster method of parsing them. In some cases this can increase the parsing speed by 5-10x. (This option is deprecated as of pandas 2.0.)
You asked about the actual algorithms; the closest I can find is: https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/parsers.pyx#L492
This is taken from a related answer, kudos to MaxU (Understanding pandas.read_csv() float parsing)
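In the pandas source at the time, the three settings mapped to these C functions:
Ordinary (float_precision=None): double_converter_nogil = xstrtod
High (float_precision='high'): double_converter_nogil = precise_xstrtod
Round-Trip (float_precision='round_trip'): double_converter_withgil = round_trip
Note that since pandas 1.2, None selects the high-precision converter, and the old ordinary converter is available as float_precision='legacy'.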
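For intuition about why the converters can differ in the last bit, here is a deliberately simplified converter of my own. naive_strtod is a hypothetical name, and this is not a port of pandas' xstrtod or precise_xstrtod: it accumulates the decimal digits in a double and then scales by a power of ten using repeated squaring, so several steps can each round. A correctly rounded conversion, such as Python's float() (based on David Gay's dtoa.c), guarantees the nearest double instead. As far as I can tell, round_trip hands the string to Python's own conversion routine, which would explain why it is the "withgil" converter; treat that as my reading, not a confirmed detail.

```python
import re

# Optional sign, integer digits, optional fraction, optional exponent.
_FLOAT_RE = re.compile(r"([+-]?)(\d*)(?:\.(\d*))?(?:[eE]([+-]?\d+))?$")


def naive_strtod(text: str) -> float:
    """Hypothetical simplified converter (NOT pandas' code); may be off by an ulp."""
    m = _FLOAT_RE.match(text.strip())
    if m is None or not (m.group(2) or m.group(3)):
        raise ValueError(f"not a decimal number: {text!r}")
    sign, int_digits, frac_digits, exp_text = m.groups()
    frac_digits = frac_digits or ""
    number = 0.0
    for ch in int_digits + frac_digits:
        number = number * 10.0 + int(ch)  # starts rounding once digits exceed 2**53
    exponent = int(exp_text or 0) - len(frac_digits)
    # Scale by 10**|exponent| via repeated squaring; each multiply/divide may round.
    p10, n = 10.0, abs(exponent)
    while n:
        if n & 1:
            number = number / p10 if exponent < 0 else number * p10
        n >>= 1
        p10 *= p10
    return -number if sign == "-" else number


samples = ["0.1", "2.718281828459045", "3.141592653589793238462643383279",
           "123456.789e-3", "-6.02214076e23"]
for s in samples:
    approx, exact = naive_strtod(s), float(s)
    print(f"{s:>34}  naive={approx!r}  float()={exact!r}  same={approx == exact}")
```

The printed "same" column shows where the multi-step rounding lands on a different double than the correctly rounded result; the differences, when present, are in the last bit or so.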
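From here, you're in C-land. You also asked what "the C engine" is: it is pandas-specific. read_csv has an engine parameter, and engine='c' (the default in most cases) selects pandas' own parser, whose critical code paths are written in Cython and C. engine='python' selects a slower pure-Python parser, which does not support float_precision.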
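A minimal sketch of the two engines on a tiny in-memory CSV (the data is made up). It assumes, based on how pandas validates options, that passing float_precision together with engine="python" raises a ValueError; python_engine_rejects_float_precision is just a helper name I made up to check that.

```python
import io

import pandas as pd

csv_text = "a,b\n1,0.1\n2,0.2\n"

# engine="c": pandas' own parser, implemented in Cython and C; float_precision applies here.
df_c = pd.read_csv(io.StringIO(csv_text), engine="c", float_precision="round_trip")

# engine="python": pandas' pure-Python parser, used without float_precision.
df_py = pd.read_csv(io.StringIO(csv_text), engine="python")


def python_engine_rejects_float_precision() -> bool:
    """Hypothetical helper: True if pandas refuses float_precision with engine='python'."""
    try:
        pd.read_csv(io.StringIO(csv_text), engine="python", float_precision="high")
    except ValueError as err:
        print("ValueError:", err)
        return True
    return False


print(df_c["b"].tolist(), df_py["b"].tolist())
print(python_engine_rejects_float_precision())
```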