
Pandas guess delimiter with sep=None

Tags: python, pandas, csv

Pandas documentation has this:

With sep=None, read_csv will try to infer the delimiter automatically in some cases by “sniffing”.

How can I access pandas' guess for the delimiter?

I want to read in 10 lines of my file, have pandas guess the delimiter, and start up my GUI with that delimiter already selected. But I don't know how to access what pandas thinks is the delimiter.

Also, is there a way to pass pandas a list of strings to restrict its guesses to?

Gregory Lepore asked Jun 10 '15 12:06




2 Answers

Looking at the source code, I doubt that it's possible to get the delimiter out of read_csv. But pandas internally uses the Sniffer class from the csv module. Here's an example that should get you going:

import csv
s = csv.Sniffer()
# sniff() returns a Dialect; its .delimiter attribute holds the guess
print(s.sniff("a,b,c").delimiter)
print(s.sniff("a;b;c").delimiter)
print(s.sniff("a#b#c").delimiter)

Output:

,
;
#

What remains is to read the first few lines of the file and pass them to Sniffer.sniff(), which I'll leave up to you.
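
A minimal sketch of that remaining step, assuming an uncompressed text file. The helper name guess_delimiter, the 10-line sample size, and the candidate string are illustrative choices, not part of pandas or of the original answer. The delimiters argument of Sniffer.sniff() takes a string of candidate characters, which is one way to restrict the guesses as the question asks:

```python
import csv
from itertools import islice


def guess_delimiter(path, n_lines=10, candidates=",;\t|"):
    """Guess the delimiter from the first n_lines of a text file.

    guess_delimiter is a hypothetical helper; only csv.Sniffer comes
    from the standard library.
    """
    with open(path, newline="") as f:
        # Read at most n_lines lines and join them into one sample string.
        sample = "".join(islice(f, n_lines))
    # delimiters= limits the sniffer to the listed single characters.
    return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
```

Sniffer.sniff() raises csv.Error when it cannot settle on a delimiter, so a GUI would probably want to catch that and fall back to a default. The result can then be passed to read_csv as sep.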

Matt answered Sep 21 '22 15:09


csv.Sniffer is the simplest solution, but it doesn't work directly on compressed files. Here's what works, although it relies on a private member (reader._engine), so it may break between pandas versions:

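import pandas as pd
# sep=None with the python engine makes pandas sniff the delimiter
reader = pd.read_csv('path/to/file.tar.gz', sep=None, engine='python', iterator=True)
# _engine is private: this reads the dialect of the underlying csv reader
sep = reader._engine.data.dialect.delimiter
reader.close()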
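
If relying on a private attribute is a concern, one possible alternative for a plain gzip-compressed CSV (a single .gz file, not a tar archive) is to decompress a small sample yourself and hand it to csv.Sniffer. This is a sketch under that assumption, not something pandas does for you, and guess_delimiter_gzip is a hypothetical helper name:

```python
import csv
import gzip
from itertools import islice


def guess_delimiter_gzip(path, n_lines=10):
    """Guess the delimiter of a gzip-compressed CSV from its first lines.

    Assumes path points at a single gzip-compressed text file.
    """
    # Mode "rt" decompresses and decodes to text while reading.
    with gzip.open(path, "rt", newline="") as f:
        sample = "".join(islice(f, n_lines))
    return csv.Sniffer().sniff(sample).delimiter
```

Only the first n_lines lines are decompressed, so this should stay cheap even for large files.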
Eugene Pakhomov answered Sep 21 '22 15:09