
On the float_precision argument to pandas.read_csv

The documentation for the argument in this post's title says:

float_precision : string, default None

Specifies which converter the C engine should use for floating-point values. The options are None for the ordinary converter, high for the high-precision converter, and round_trip for the round-trip converter.

I'd like to learn more about the three algorithms mentioned, preferably without having to dig into the source code [1].


Q: Do these algorithms have names I can Google for to learn exactly what they do and how they differ?


(Also, one side question: what exactly is "the C engine" in this context? Is that a Pandas-specific thing, or a Python-wide thing? None of the above?)


[1] Not being familiar with the code base in question, I expect it would take me a long time just to locate the relevant source code. But even assuming I manage to find it, my experience with algorithms of this sort is that their implementations are so highly optimized, and at such a low level, that without some high-level description it is really difficult, at least for me, to follow what's going on.

kjo asked Jun 22 '17 11:06




1 Answer

You asked about the actual algorithms. The closest I can find is the place in the parser source where the converter is selected: https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/parsers.pyx#L492

The mapping below is taken from a related answer, kudos to MaxU ("Understanding pandas.read_csv() float parsing"):

Ordinary: double_converter_nogil = xstrtod
High: double_converter_nogil = precise_xstrtod
Round-Trip: double_converter_withgil = round_trip
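To see the three settings side by side, here is a minimal sketch that parses the same one-cell CSV with each float_precision value and compares the result to Python's own float(). The sample value and the helper name parse_with are made up for illustration. Note also that the meaning of None may depend on your pandas version (newer releases changed the default converter and added a 'legacy' option), so check the docs for the version you run.

```python
import io

import pandas as pd

# Hypothetical sample value with more digits than a float64 can hold.
TEXT = "0.1234567890123456789"
CSV = "x\n" + TEXT + "\n"


def parse_with(float_precision):
    """Parse the one-cell CSV with the C engine and return the parsed float."""
    df = pd.read_csv(
        io.StringIO(CSV),
        engine="c",  # float_precision is an option of the C engine
        float_precision=float_precision,
    )
    return float(df["x"].iloc[0])


# Python's built-in conversion serves as the reference value.
reference = float(TEXT)

results = {name: parse_with(name) for name in (None, "high", "round_trip")}

for name, value in results.items():
    print(f"{name!s:>10}: {value!r}  equals float(): {value == reference}")
```

The round_trip converter should always agree with float(), since it defers to Python's own string-to-double routine. The other two are faster hand-written converters and may differ from it in the last bits for some inputs.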

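From here, you're in C-land. As for your side question: "the C engine" is a pandas-specific thing, not a Python-wide one. read_csv has an engine argument, and the C engine is pandas' own CSV parser, with its critical code paths written in Cython or C for speed.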
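On the "C engine" part: read_csv accepts an engine argument, and a quick way to convince yourself that this is a pandas-level choice is to parse the same text with both the "c" and "python" engines. The sketch below uses a made-up three-line CSV whose values are exactly representable as floats, so both engines should produce the same frame.

```python
import io

import pandas as pd

# Hypothetical sample data; every value is exactly representable in binary.
CSV = "a,b\n1,2.5\n3,4.5\n"

# Same input, two parser back ends selected via the engine argument.
df_c = pd.read_csv(io.StringIO(CSV), engine="c")
df_py = pd.read_csv(io.StringIO(CSV), engine="python")

print(df_c.dtypes)
print("frames identical:", df_c.equals(df_py))
```

The float_precision argument only applies to the C engine, which is why the documentation words it that way.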

MisterJT answered Sep 24 '22 16:09
