Pandas Read_CSV quotes issue

I have a file that looks like:

'colA'|'colB'
'word"A'|'A'
'word'B'|'B'

I want to use pd.read_csv('input.csv',sep='|', quotechar="'") but I get the following output:

colA    colB
word"A   A
wordB'   B

The last row is not correct; it should be word'B B. How do I get around this? I have tried various combinations of options, but none of them reads both rows correctly. I need some CSV-reading expertise!
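The problem can be reproduced without a file by reading the same sample through io.StringIO. This is a minimal sketch in which the in-memory string `data` stands in for input.csv. The second row's colA does not come back as word'B: the ' after "word" closes the quoted field early.

```python
import io

import pandas as pd

# Same sample as in the question, held in memory instead of input.csv
data = """'colA'|'colB'
'word"A'|'A'
'word'B'|'B'"""

df = pd.read_csv(io.StringIO(data), sep="|", quotechar="'")
print(df)
```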

asked Jun 02 '16 at 10:06 by A. Jameel




2 Answers

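I think you need to read the file with the default quote character (so ' is treated as ordinary text) and then remove the enclosing quotes with str.strip and apply: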

import pandas as pd
import io

temp=u"""'colA'|'colB'
'word"A'|'A'
'word'B'|'B'"""

# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), sep='|')

df = df.apply(lambda x: x.str.strip("'"))
df.columns = df.columns.str.strip("'")
print(df)
     colA colB
0  word"A    A
1  word'B    B
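A variation on this approach is sketched below. It assumes every field is wrapped in exactly one pair of single quotes. quoting=csv.QUOTE_NONE switches quote handling off entirely, so both ' and " are kept literally. A regex then removes only one leading and one trailing '. Plain str.strip("'") would also drop a genuine trailing apostrophe. The helper name strip_one_quote_pair and the extra row 'the dogs'' are hypothetical, added only for illustration.

```python
import csv
import io

import pandas as pd

# Question's sample plus a hypothetical row whose value ends in an apostrophe
data = """'colA'|'colB'
'word"A'|'A'
'word'B'|'B'
'the dogs''|'C'"""

# QUOTE_NONE: no quote processing at all, every ' and " stays in the text
raw = pd.read_csv(io.StringIO(data), sep="|", quoting=csv.QUOTE_NONE, dtype=str)


def strip_one_quote_pair(values):
    # Works on a Series or an Index: drop exactly one enclosing pair of '
    return values.str.replace(r"^'(.*)'$", r"\1", regex=True)


df = raw.apply(strip_one_quote_pair)
df.columns = strip_one_quote_pair(raw.columns)
print(df)

# For comparison: strip("'") also removes the apostrophe that belongs to the value
naive = raw.apply(lambda col: col.str.strip("'"))
print(naive.iloc[2, 0])
```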
answered Sep 20 '22 at 20:09 by jezrael


The source of the problem is that ' is used both as the quote character and as a regular character inside the values.

You can escape the inner ' in the file, e.g. with /:

'colA'|'colB'
'word"A'|'A'
'word/'B'|'B'

And then use escapechar:

>>> pd.read_csv('input.csv',sep='|',quotechar="'",escapechar="/")
     colA colB
0  word"A    A
1  word'B    B
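If you control how the file is produced, Python's csv module can write it in this escaped form directly. The sketch below makes three assumptions: the values are available as Python lists, every field should be wrapped in ', and / never occurs in the data. Setting doublequote=False with escapechar="/" makes the writer emit /' instead of '' for an embedded quote. The result is then read back with the same read_csv options as above.

```python
import csv
import io

import pandas as pd

# Hypothetical source rows; in practice these come from wherever the data originates
rows = [["colA", "colB"], ['word"A', "A"], ["word'B", "B"]]

buf = io.StringIO()
writer = csv.writer(
    buf,
    delimiter="|",
    quotechar="'",
    quoting=csv.QUOTE_ALL,  # wrap every field in ' like the original file
    doublequote=False,      # do not double an embedded quote ...
    escapechar="/",         # ... prefix it with / instead
    lineterminator="\n",
)
writer.writerows(rows)
text = buf.getvalue()
print(text)

df = pd.read_csv(io.StringIO(text), sep="|", quotechar="'", escapechar="/")
print(df)
```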

You can also use quoting=csv.QUOTE_ALL, but the output will include the quote characters: quotechar stays at its default ("), so the single quotes are read as ordinary characters.

>>> import pandas as pd
>>> import csv
>>> pd.read_csv('input.csv',sep='|',quoting=csv.QUOTE_ALL)
     'colA' 'colB'
0  'word"A'    'A'
1  'word'B'    'B'
>>>
answered Sep 21 '22 at 20:09 by Yaron