Engines in Python Pandas read_csv

In the documentation for the pd.read_csv() method in pandas, the description of the "sep" parameter mentions engines such as the C engine and the Python engine.

The documentation link is: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html

What are these engines? What is the role of each engine? Is there any analogy which can help understand these engines better?

asked Oct 12 '18 07:10 by PUNEET AGARWAL




1 Answer

The pd.read_csv documentation notes specific differences between the 'c' (default) and 'python' engines. The names indicate the language in which each parser is written. Specifically, the docs note:

Where possible pandas uses the C parser (specified as engine='c'), but may fall back to Python if C-unsupported options are specified.
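A minimal sketch of that fallback, assuming a reasonably recent pandas release (the exact warning and error texts vary between versions) and made-up sample data. A two-character sep such as "::" is interpreted as a regex, which the C parser does not support.

```python
import io
import warnings

import pandas as pd

data = "a::b\n1::2\n3::4"

# For a plain comma-separated file, both engines produce the same DataFrame.
csv_text = data.replace("::", ",")
df_c = pd.read_csv(io.StringIO(csv_text), engine="c")
df_py = pd.read_csv(io.StringIO(csv_text), engine="python")
same_result = df_c.equals(df_py)

# A multi-character sep is treated as a regex. With no engine given,
# pandas falls back to the 'python' engine and emits a ParserWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    df_fallback = pd.read_csv(io.StringIO(data), sep="::")
fell_back = any(issubclass(w.category, pd.errors.ParserWarning) for w in caught)

# Asking for engine='c' explicitly turns the fallback into a ValueError.
c_error = None
try:
    pd.read_csv(io.StringIO(data), sep="::", engine="c")
except ValueError as exc:
    c_error = exc

print(df_fallback)
print("same result:", same_result, "| fell back:", fell_back)
print("C engine refused:", c_error)
```

Passing engine='python' explicitly avoids the warning, as the warning text itself suggests.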

Here are the main differences you should note (as of v0.23.4):

  • 'c' is faster, while 'python' is currently more feature-complete.
  • 'python' supports skipfooter, while 'c' does not.
  • 'python' supports flexible sep other than a single character (including regex), while 'c' does not.
  • 'python' supports sep=None with delim_whitespace=False, which means it can auto-detect a delimiter, while 'c' does not.
  • 'c' supports float_precision, while 'python' does not (or does not need it).
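Several of these differences can be checked with a short sketch. It assumes a reasonably recent pandas (supported options and error messages change between versions) and uses invented sample data: skipfooter and sep=None go through the 'python' engine, while float_precision is passed to the 'c' engine.

```python
import io

import pandas as pd

# skipfooter: drop a trailing summary line (python engine).
footer_data = "a,b\n1,2\n3,4\nTOTAL,6"
df_footer = pd.read_csv(io.StringIO(footer_data), skipfooter=1, engine="python")

# The C engine rejects skipfooter when it is requested explicitly.
try:
    pd.read_csv(io.StringIO(footer_data), skipfooter=1, engine="c")
    c_skipfooter_ok = True
except ValueError:
    c_skipfooter_ok = False

# sep=None: the python engine auto-detects the delimiter (here ';').
semicolon_data = "a;b\n1;2\n3;4"
df_sniffed = pd.read_csv(io.StringIO(semicolon_data), sep=None, engine="python")

# float_precision selects the C engine's float converter;
# 'round_trip' is the round-trip converter.
df_float = pd.read_csv(io.StringIO("x\n0.1"), engine="c", float_precision="round_trip")

print(df_footer)
print(df_sniffed)
print(df_float["x"].iloc[0])
```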

Version notes:

  • dtype supported in 'python' v0.20.0+.
  • delim_whitespace supported in 'python' v0.18.1+.
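For the dtype note, a minimal sketch assuming pandas v0.20.0 or later; the column names and values are invented for illustration. Reading the hypothetical 'code' column as str keeps its leading zeros, which default type inference would drop.

```python
import io

import pandas as pd

data = "id,code\n1,007\n2,042"

# dtype is honoured by the python engine: 'code' stays text, 'id' is inferred.
df = pd.read_csv(io.StringIO(data), engine="python", dtype={"code": str})

print(df)
print(df.dtypes)
```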

Note that the above may change as features are developed. If you see unexpected behaviour in later versions, check IO Tools (Text, CSV, HDF5, …).

answered Oct 09 '22 20:10 by jpp