Pandas Read_CSV quotes issue

Tags:

python

pandas

dataframe

csv

quoting

I have a file that looks like:

'colA'|'colB'
'word"A'|'A'
'word'B'|'B'

I want to use pd.read_csv('input.csv',sep='|', quotechar="'") but I get the following output:

colA    colB
word"A   A
wordB'   B

The last row is not correct; it should be word'B B. How do I get around this? I have tried various combinations of options, but none of them reads both rows correctly. I need some csv reading expertise!

asked Jun 02 '16 10:06

A. Jameel

2 Answers

I think you need str.strip with apply:

import pandas as pd
import io

temp=u"""'colA'|'colB'
'word"A'|'A'
'word'B'|'B'"""

# After testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), sep='|')

# Strip the surrounding single quotes from every value and from the column names
df = df.apply(lambda x: x.str.strip("'"))
df.columns = df.columns.str.strip("'")
print(df)
     colA colB
0  word"A    A
1  word'B    B
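
The same idea can be made a bit more explicit. This is only a minimal sketch, assuming every column may be read as text: quoting=csv.QUOTE_NONE switches quote handling off entirely, dtype=str keeps any unquoted numeric fields as strings so that .str always works, and the regex removes just one wrapping quote per side (str.strip would remove repeated ones). read_single_quoted is a hypothetical helper name, not part of pandas.

```python
import csv
import io

import pandas as pd


def read_single_quoted(source, sep="|"):
    """Hypothetical helper: read a file whose fields are wrapped in single quotes."""
    # QUOTE_NONE disables quote handling, so ' and " are both kept as plain data.
    # dtype=str keeps every column as text, so the .str accessor is always available.
    df = pd.read_csv(source, sep=sep, quoting=csv.QUOTE_NONE, dtype=str)
    # Remove one leading and one trailing single quote from each value.
    df = df.apply(lambda col: col.str.replace(r"^'|'$", "", regex=True))
    # Do the same for the header row.
    df.columns = df.columns.str.replace(r"^'|'$", "", regex=True)
    return df


sample = "'colA'|'colB'\n'word\"A'|'A'\n'word'B'|'B'\n"
df = read_single_quoted(io.StringIO(sample))
print(df)
```

After testing, io.StringIO(sample) can be replaced with the filename, as in the snippet above.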

answered Sep 20 '22 20:09

jezrael

The source of the problem is that ' is used both as the quote character and as a regular character inside a field.

You can escape it in the file, e.g. with /:

'colA'|'colB'
'word"A'|'A'
'word/'B'|'B'

And then use escapechar:

>>> pd.read_csv('input.csv',sep='|',quotechar="'",escapechar="/")
     colA colB
0  word"A    A
1  word'B    B
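
If you cannot edit the file by hand, the escaping step could be done in code before parsing. The following is only a sketch under several assumptions: lines end with \n, the | delimiter never appears inside a field, / does not otherwise occur in the data, and a quote is "inner" whenever it is neither at the start or end of a line nor next to a delimiter. The names raw, inner_quote and escaped are illustrative.

```python
import io
import re

import pandas as pd

raw = "'colA'|'colB'\n'word\"A'|'A'\n'word'B'|'B'\n"

# Match a single quote that has an ordinary character on both sides,
# i.e. one that is not adjacent to |, a newline, or the start/end of the text.
inner_quote = re.compile(r"(?<=[^|\n])'(?=[^|\n])")

# Prefix each inner quote with the escape character.
escaped = inner_quote.sub("/'", raw)

df = pd.read_csv(io.StringIO(escaped), sep="|", quotechar="'", escapechar="/")
print(df)
```

For a real file, raw would come from reading the file's contents as text; the rest stays the same.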

You can also use quoting=csv.QUOTE_ALL without setting quotechar, but then ' is treated as ordinary data, so the output will include the quote characters:

>>> import pandas as pd
>>> import csv
>>> pd.read_csv('input.csv',sep='|',quoting=csv.QUOTE_ALL)
     'colA' 'colB'
0  'word"A'    'A'
1  'word'B'    'B'
>>>

answered Sep 21 '22 20:09

Yaron
