A csv (comma delimited) file, where lines have an extra trailing delimiter, seems to confuse pandas.read_csv
. (The data file is [1])
It treats the extra delimiter as if there's an extra column. So there's one more column than what headers require. Then pandas.read_csv
takes the first column as row labels. The overall effect is that columns and headers are not aligned any more - the first column becomes row labels, the second column is named by first header, etc.
It is quite annoying. Any idea how to tell pandas.read_csv
do the right thing? I couldn't find one.
Great book, BTW.
[1]: 2012 FEC Election Database from chapter 9 of the book Python for Data Analysis
Python loads CSV files 100 times faster than Excel files. Use CSVs. Con: csv files are nearly always bigger than . xlsx files.
Pandas is a powerful and flexible Python package that allows you to work with labeled and time series data. It also provides statistics methods, enables plotting, and more. One crucial feature of Pandas is its ability to write and read Excel, CSV, and many other types of files.
The difference between read_csv() and read_table() is almost nothing. In fact, the same function is called by the source: read_csv() delimiter is a comma character. read_table() is a delimiter of tab \t .
For everyone who is still finding this. Wes wrote a blogpost about this. The problem if there is one value too many in the row it is treated as the rows name.
This behaviour can be changed by setting index_col=False
as an option to read_csv
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With