
Trailing delimiter confuses pandas read_csv

A CSV (comma-delimited) file whose lines end with an extra trailing delimiter seems to confuse pandas.read_csv. (The data file is [1].)

It treats the extra delimiter as if there were an extra column, so each row has one more field than the header has names. pandas.read_csv then takes the first column as row labels. The overall effect is that columns and headers are no longer aligned: the first column becomes the row labels, the second column is named by the first header, and so on.
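A small made-up sample reproduces the misalignment described above. The column names and values here are invented for illustration and are not taken from the FEC file:

```python
import io

import pandas as pd

# Hypothetical three-column sample; every data row ends with an extra comma.
csv_text = (
    "name,city,amount\n"
    "alice,Austin,100,\n"
    "bob,Boston,250,\n"
)

# With default settings, each data row has four fields against three header
# names, so pandas uses the first field as the row label and shifts the rest.
df = pd.read_csv(io.StringIO(csv_text))

print(df.index.tolist())    # row labels come from the intended "name" column
print(df["name"].tolist())  # the "name" header now sits over the city values
print(df["amount"].isna().all())  # the last header sits over the empty field
```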

It is quite annoying. Is there a way to tell pandas.read_csv to do the right thing? I couldn't find one.

Great book, BTW.


[1]: 2012 FEC Election Database from chapter 9 of the book Python for Data Analysis

edwardw asked Dec 05 '12 09:12





1 Answer

For everyone who is still finding this: Wes wrote a blog post about it. The problem is that when a row has one value more than the header has names, the first value is treated as the row's name.

This behaviour can be changed by passing index_col=False to read_csv.
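As a minimal sketch of that option, assuming a made-up three-column sample with trailing commas (the names and values are invented, not from the FEC file):

```python
import io

import pandas as pd

# Hypothetical three-column sample; every data row ends with an extra comma.
csv_text = (
    "name,city,amount\n"
    "alice,Austin,100,\n"
    "bob,Boston,250,\n"
)

# index_col=False tells pandas not to use the first column as the index,
# so the header names line up with the first three fields of each row.
df = pd.read_csv(io.StringIO(csv_text), index_col=False)

print(df.columns.tolist())
print(df["name"].tolist())
print(df["amount"].tolist())
```

Depending on the pandas version, this call may emit a ParserWarning about the header length not matching the data; the empty trailing field is simply dropped.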

k-nut answered Oct 14 '22 13:10
