Why does pandas read_csv not support multiple comment characters (#, @, ...)?

Tags:

python

pandas

I found the pandas read_csv method to be faster than numpy loadtxt. Unfortunately, I now find myself in a situation where I have to go back to numpy, because loadtxt has the option of setting comments=['#','@']. As far as I can tell from the documentation, read_csv only accepts a single comment character, such as comment='#'. Are there any suggestions or workarounds that would let me stay with pandas instead of pivoting back to numpy? Also, why does pandas not support multiple comment indicators?

# save this in test.dat
@ bla
# bla
1 2 3 4

Minimal example:

import numpy as np
import pandas as pd

# does work, but only one type of comment is accounted for
df = pd.read_csv('test.dat', index_col=0, header=None, comment='#')

# does not work (not surprising after reading the docs: comment must be a single character)
df = pd.read_csv('test.dat', index_col=0, header=None, comment=['#','@'])

# does work but is slow; note that loadtxt returns a NumPy array, not a DataFrame
arr = np.loadtxt('test.dat', comments=['#','@'])
asked Nov 17 '16 16:11 by Asking Questions

1 Answer

The short answer is that nobody has implemented it in pandas yet. A quick look through the pandas GitHub issues shows that someone else has already suggested it, and the maintainers seem open to a patch that implements it: https://github.com/pandas-dev/pandas/issues/13948
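Until that lands, one workaround that keeps pandas' C parser doing the actual parsing is to rewrite every extra marker to a single comment character first, then let read_csv's own comment option skip those lines. This is a minimal sketch, not a pandas feature: read_with_comments is a hypothetical helper name, it assumes every marker is a single character that never occurs inside real data values, and it passes sep=r"\s+" because test.dat is whitespace-separated.

```python
import io

import pandas as pd

# Sample data from the question: both "#" and "@" start a comment.
RAW = """# save this in test.dat
@ bla
# bla
1 2 3 4
"""


def read_with_comments(text, comments=("#", "@"), **kwargs):
    """Hypothetical helper: read text that uses several comment characters.

    Assumes each marker is exactly one character. Every marker is rewritten
    to the first one, so read_csv's single-character `comment` option can
    handle all of them.
    """
    target = comments[0]
    # Map each extra marker character to the target marker.
    table = str.maketrans({c: target for c in comments[1:]})
    return pd.read_csv(io.StringIO(text.translate(table)),
                       comment=target, **kwargs)


df = read_with_comments(RAW, index_col=0, header=None, sep=r"\s+")
print(df)
```

For a file on disk you could pass pathlib.Path('test.dat').read_text() as the text. That holds the whole file in memory, but the translation is a single pass over the string, so the tokenizing itself still happens in pandas.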

This could be a good opportunity for you to contribute back to the pandas project if you feel comfortable doing so; otherwise, keep an eye on issue 13948 in case someone else implements it. The part of the codebase that handles comments appears to be around _check_comments here: https://github.com/pandas-dev/pandas/blob/master/pandas/io/parsers.py#L2348
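If you do look into a patch, or need markers longer than one character (e.g. //), the core idea is simple: cut each line at the first occurrence of any marker and drop lines that end up empty. The following is a standalone sketch of that idea, not pandas' internal _check_comments code. strip_comments is a hypothetical helper, it assumes at least one non-empty marker is given, and the trailing "@ trailing note" in the sample data is added here only for illustration.

```python
import io
import re

import pandas as pd

# Question's sample plus an inline comment (added for illustration).
RAW = """# save this in test.dat
@ bla
# bla
1 2 3 4  @ trailing note
"""


def strip_comments(text, comments=("#", "@")):
    """Hypothetical helper: remove everything from the first marker to end of line.

    Markers may be multi-character strings; assumes at least one non-empty marker.
    Lines that are blank after stripping are dropped.
    """
    # One alternation pattern that matches any of the (escaped) markers.
    pattern = re.compile("|".join(re.escape(c) for c in comments))
    kept = []
    for line in text.splitlines():
        # Keep only the part of the line before the first marker.
        line = pattern.split(line, maxsplit=1)[0].rstrip()
        if line:
            kept.append(line)
    return "\n".join(kept)


df = pd.read_csv(io.StringIO(strip_comments(RAW)),
                 index_col=0, header=None, sep=r"\s+")
print(df)
```

Because the filtering loop runs in pure Python, it adds per-line overhead on large files; in exchange it handles inline and multi-character markers in one place.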

answered Sep 29 '22 07:09 by Randy