I have a file that looks like:
'colA'|'colB'
'word"A'|'A'
'word'B'|'B'
I want to use pd.read_csv('input.csv', sep='|', quotechar="'"), but I get the following output:
colA colB
word"A A
wordB' B
The last row is not correct; it should be word'B B. How do I get around this? I have tried various iterations, but none of them reads both rows correctly. I need some CSV reading expertise!
I think you need str.strip with apply:
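import io
import pandas as pd

temp = """'colA'|'colB'
'word"A'|'A'
'word'B'|'B'"""
# After testing, replace io.StringIO(temp) with the filename.
# With the default quotechar ('"'), the single quotes are read as ordinary characters.
df = pd.read_csv(io.StringIO(temp), sep='|')
# Strip the surrounding single quotes from every value and from the column names.
df = df.apply(lambda x: x.str.strip("'"))
df.columns = df.columns.str.strip("'")
print(df)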
colA colB
0 word"A A
1 word'B B
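One caveat with the approach above: str.strip("'") removes every leading and trailing quote, so a value that legitimately ends in a quote would lose it. Here is a minimal sketch of a more conservative variant, assuming every field in the file is wrapped in exactly one pair of single quotes. The quoting=csv.QUOTE_NONE and dtype=str arguments and the EDGE_QUOTE name are my additions for illustration, not part of the answer above.

```python
import csv
import io

import pandas as pd

temp = """'colA'|'colB'
'word"A'|'A'
'word'B'|'B'"""

# Matches one quote at the very start or one quote at the very end of a string.
EDGE_QUOTE = r"^'|'$"

# QUOTE_NONE turns quote handling off, so both ' and " are read as ordinary characters.
df = pd.read_csv(io.StringIO(temp), sep="|", quoting=csv.QUOTE_NONE, dtype=str)

# Remove at most one wrapping quote on each side of every value and column name.
df = df.apply(lambda col: col.str.replace(EDGE_QUOTE, "", regex=True))
df.columns = df.columns.str.replace(EDGE_QUOTE, "", regex=True)
print(df)
```

Turning quoting off also means a field that happens to start with a double quote is not misread as a quoted field, which the default settings would do.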
The source of the problem is that ' is used both as the quote character and as a regular character inside a field.
You can escape the inner quote in the file, e.g.:
'colA'|'colB'
'word"A'|'A'
'word/'B'|'B'
And then use escapechar:
>>> pd.read_csv('input.csv',sep='|',quotechar="'",escapechar="/")
colA colB
0 word"A A
1 word'B B
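If editing the file by hand is not practical, the escaping could be done in code before parsing. The following is only a sketch under stated assumptions: "/" does not occur anywhere in the data, and an inner quote never sits directly next to a "|" or a line break. The INNER_QUOTE pattern and the variable names are hypothetical helpers, not part of pandas.

```python
import io
import re

import pandas as pd

raw = """'colA'|'colB'
'word"A'|'A'
'word'B'|'B'"""

# A quote counts as "inner" when it touches neither a delimiter nor a line boundary.
INNER_QUOTE = re.compile(r"(?<=[^|\r\n])'(?=[^|\r\n])")

# Prefix every inner quote with the escape character.
escaped = INNER_QUOTE.sub("/'", raw)

df = pd.read_csv(io.StringIO(escaped), sep="|", quotechar="'", escapechar="/")
print(df)
```

For a real file, read its text into raw first (for example with open('input.csv').read()) and pass the escaped text to read_csv through io.StringIO as shown.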
You can also use quoting=csv.QUOTE_ALL, but the output will then include the quote characters:
>>> import pandas as pd
>>> import csv
>>> pd.read_csv('input.csv',sep='|',quoting=csv.QUOTE_ALL)
'colA' 'colB'
0 'word"A' 'A'
1 'word'B' 'B'
>>>