
Python import CSV short code (pandas?) delimited with ';' and ',' in entries

I need to import a CSV file in Python on Windows. My file is delimited by ';' and has strings with non-English symbols and commas (',').

I've read these posts:

Importing a CSV file into a sqlite3 database table using Python

Python import csv to list

When I run:

with open('d:/trade/test.csv', 'r') as f1:
    reader1 = csv.reader(f1)
    your_list1 = list(reader1)

I get a problem: commas are changed to the '-' symbol.

When I try:

df = pandas.read_csv(csvfile)

I get an error:

pandas.io.common.CParserError: Error tokenizing data. C error: Expected 1 fields in line 13, saw 2.

Please help. I would prefer to use pandas as the code is shorter without listing all field names from the CSV file.

I understand that a workaround would be to temporarily replace the commas. Still, I would like to solve it with parameters to pandas.

Alexei Martianov asked Jun 19 '16 06:06


3 Answers

Pandas solution: use read_csv with the regex separator [;,], which splits on both ';' and ','. You need to add engine='python'; otherwise pandas emits this warning:

ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.

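import io

import pandas as pd

# sample data: ';' separates the first two fields, ',' the last two
temp = u"""a;b;c
1;1,8
1;2,1
1;3,6
1;4,3
1;5,7
"""
# after testing, replace io.StringIO(temp) with the filename
# sep is a regex matching ';' or ','; regex separators need engine='python'
df = pd.read_csv(io.StringIO(temp), sep="[;,]", engine='python')
print(df)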

   a  b  c
0  1  1  8
1  1  2  1
2  1  3  6
3  1  4  3
4  1  5  7

jezrael answered Sep 27 '22 21:09


The pandas documentation for read_csv describes the sep parameter:

pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html

sep : str, default ‘,’

    Delimiter to use. If sep is None, will try to automatically determine this.

Pandas did not parse my file delimited by ';' because the default value of sep is ',', not None (the value that requests automatic detection). Passing sep=';' to read_csv fixed the issue.
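The fix can be sketched as follows. The sample data here is hypothetical and only stands in for the real file, which was not posted; it assumes ';' between fields and commas inside some text values.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real file: ';' separates the
# fields, and some text values contain commas and non-English characters.
sample = (
    "name;city;note\n"
    "Анна;Москва;hello, world\n"
    "José;Madrid;a, b, c\n"
)

# sep=';' makes pandas split on semicolons only, so the commas stay
# inside the text values instead of starting new fields.
df = pd.read_csv(io.StringIO(sample), sep=";")
print(df.shape)
print(df.loc[0, "note"])
```

For a file on disk, replace io.StringIO(sample) with the path. If the non-English characters come out garbled, the encoding parameter of read_csv may also be needed; which encoding the original file uses is not known from the question.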

Alexei Martianov answered Sep 27 '22 21:09


Unless your CSV file is broken, you can let the csv module guess the format with csv.Sniffer.

import csv

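with open('d:/trade/test.csv', 'r', newline='') as f1:
    # guess the dialect (delimiter, quoting) from the first 1024 characters
    dialect = csv.Sniffer().sniff(f1.read(1024))
    # rewind so the reader starts again from the first row
    f1.seek(0)
    r = csv.reader(f1, dialect=dialect)
    for row in r:
        print(row)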
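
If the guess turns out to be unreliable, the candidates can be narrowed with the delimiters argument of sniff. This is a minimal sketch on a hypothetical in-memory sample, not the asker's actual file:

```python
import csv
import io

# Hypothetical sample standing in for the first lines of the real file.
sample = (
    "name;city;note\n"
    "Анна;Москва;hello, world\n"
    "José;Madrid;a, b, c\n"
    "Zoë;Köln;plain\n"
)

# Only ';' and ',' are considered as candidate delimiters.
dialect = csv.Sniffer().sniff(sample, delimiters=";,")
print(repr(dialect.delimiter))

# Parse the same text with the detected dialect.
rows = list(csv.reader(io.StringIO(sample), dialect=dialect))
print(rows[1])
```

The sniffer picks the candidate that occurs a consistent number of times per line, so commas that appear only inside some text values should not be chosen here. It raises csv.Error when no delimiter can be determined.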

totoro answered Sep 27 '22 20:09