
Auto-detect the delimiter in a CSV file using pd.read_csv

Is there a way for read_csv to auto-detect the delimiter? numpy's genfromtxt does this. My files use a single space, a double space, or a tab as delimiters. genfromtxt() handles them, but it is slower than pandas' read_csv. Any ideas?

SEU asked Sep 09 '17 22:09




1 Answer

Option 1

Use delim_whitespace=True (note that this parameter is deprecated since pandas 2.2, which recommends sep=r'\s+' instead):

df = pd.read_csv('file.csv', delim_whitespace=True)

Option 2

Pass a regular expression to the sep parameter:

df = pd.read_csv('file.csv', sep=r'\s+')

This is equivalent to the first option: both treat any run of whitespace (spaces or tabs) as a single delimiter.
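To check the behaviour without a file on disk, here is a minimal sketch that feeds read_csv an in-memory string. The sample data and column names are made up for illustration; they mix a single space, a double space, and a tab, as described in the question.

```python
import io

import pandas as pd

# Hypothetical sample data (not from the question): fields are separated
# by a single space, a double space, or a tab.
raw = "a b  c\td\n1 2  3\t4\n5  6\t7 8\n"

# sep=r"\s+" treats any run of whitespace as one delimiter.
# The raw string avoids Python's invalid-escape warning for "\s".
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

print(df)
```

If this parses into four columns and two rows, the same sep value should work on the real files, assuming none of the fields themselves contain whitespace.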


Documentation for pd.read_csv.

cs95 answered Oct 05 '22 00:10