Double-quoted elements in a CSV can't be read with pandas


I have an input file where every value is stored as a string. It is inside a csv file with each entry inside double quotes.

Example file:

    "column1","column2", "column3", "column4", "column5", "column6"
    "AM", "07", "1", "SD", "SD", "CR"
    "AM", "08", "1,2,3", "PR,SD,SD", "PR,SD,SD", "PR,SD,SD"
    "AM", "01", "2", "SD", "SD", "SD"

There are only six columns. What options do I need to pass to pandas read_csv to read this correctly?

I currently am trying:

    import pandas as pd
    df = pd.read_csv(file, quotechar='"')

but this gives me the error message:

    CParserError: Error tokenizing data. C error: Expected 6 fields in line 3, saw 14

This apparently means that it is ignoring the '"' and treating every comma as a field separator. However, in line 3, columns 3 through 6 should be strings with commas in them ("1,2,3", "PR,SD,SD", "PR,SD,SD", "PR,SD,SD").
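The field count in the error message can be reproduced without pandas. The following is a minimal sketch using Python's standard csv module on line 3 of the example. It assumes that the pandas C parser, like the standard reader with default settings, only treats a quote as special when it is the first character of a field; the variable names are illustrative.

```python
import csv

# Line 3 of the example file, with a space after each comma.
line = '"AM", "08", "1,2,3", "PR,SD,SD", "PR,SD,SD", "PR,SD,SD"'

# With default settings a quote only opens a quoted field when it is the
# first character of that field. Every field after the first starts with a
# space, so its quotes are kept as literal text and each comma splits.
fields = next(csv.reader([line]))

print(len(fields))   # number of fields the reader sees on this line
print(fields[:5])    # the first few fields, quotes and spaces included
```

If the assumption holds, the count printed here matches the "saw 14" in the error, which points at the space after each comma rather than at the quote character itself.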

How do I get pandas.read_csv to parse this correctly?

Thanks.


asked Oct 27 '14 19:10

PopcornKing

1 Answer

This will work. It falls back to the Python parser because the separators are not regular (a comma, sometimes followed by a space). If the file used only commas, the C parser would be used, which is much faster.

    In [1]: import csv

    In [2]: !cat test.csv
    "column1","column2", "column3", "column4", "column5", "column6"
    "AM", "07", "1", "SD", "SD", "CR"
    "AM", "08", "1,2,3", "PR,SD,SD", "PR,SD,SD", "PR,SD,SD"
    "AM", "01", "2", "SD", "SD", "SD"

    In [3]: pd.read_csv('test.csv',sep=',\s+',quoting=csv.QUOTE_ALL)
    pandas/io/parsers.py:637: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators; you can avoid this warning by specifying engine='python'.
      ParserWarning)
    Out[3]:
          "column1","column2" "column3"   "column4"   "column5"   "column6"
    "AM"                "07"       "1"        "SD"        "SD"        "CR"
    "AM"                "08"   "1,2,3"  "PR,SD,SD"  "PR,SD,SD"  "PR,SD,SD"
    "AM"                "01"       "2"        "SD"        "SD"        "SD"
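As an alternative that is not part of the answer above, read_csv also accepts skipinitialspace=True, which skips blanks after each delimiter so that every field begins with its quote character. The sketch below assumes the only irregularity in the file is that optional space after the commas. It uses io.StringIO in place of a real file path, and dtype=str is added on the assumption that values such as "07" should stay strings, as the question describes.

```python
import io

import pandas as pd

# Same content as the example file; io.StringIO stands in for a path on disk.
data = io.StringIO(
    '"column1","column2", "column3", "column4", "column5", "column6"\n'
    '"AM", "07", "1", "SD", "SD", "CR"\n'
    '"AM", "08", "1,2,3", "PR,SD,SD", "PR,SD,SD", "PR,SD,SD"\n'
    '"AM", "01", "2", "SD", "SD", "SD"\n'
)

# skipinitialspace=True drops the blanks after each comma, so each field
# starts with the quote character and is parsed as a quoted field.
# dtype=str keeps every value as a string instead of inferring numbers.
df = pd.read_csv(data, quotechar='"', skipinitialspace=True, dtype=str)

print(df.columns.tolist())  # column names from the header row
print(df.loc[1].tolist())   # the row whose values contain embedded commas
```

Because the separator stays a plain comma, this should not trigger the fallback to the Python engine. The quote characters are also stripped from the values, unlike in the output shown above.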


answered Sep 28 '22 21:09

Jeff
