 

Python parse CSV ignoring comma with double-quotes

I have a CSV file with lines like this:

"AAA", "BBB", "Test, Test", "CCC"
"111", "222, 333", "XXX", "YYY, ZZZ"

and so on ...

I don't want to split on commas that are inside double quotes, i.e. my expected result should be

AAA BBB Test, Test CCC 

My code:

import csv

with open('values.csv', 'rb') as f:
    reader = csv.reader(f)
    for row in reader:
        print row

I tried using the csv module in Python, but no luck: the parser splits on every comma.

Please let me know if I'm missing something.

asked by Abhi, Feb 03 '14 12:02





2 Answers

This should do:

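import csv

lines = '''"AAA", "BBB", "Test, Test", "CCC"
           "111", "222, 333", "XXX", "YYY, ZZZ"'''.splitlines()
# skipinitialspace=True is what fixes the problem; quotechar='"' and
# delimiter=',' are already the defaults and are spelled out only for clarity.
for l in csv.reader(lines, quotechar='"', delimiter=',',
                    quoting=csv.QUOTE_ALL, skipinitialspace=True):
    print(l)

# Output:
# ['AAA', 'BBB', 'Test, Test', 'CCC']
# ['111', '222, 333', 'XXX', 'YYY, ZZZ']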
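The same fix applies when reading from a file such as the question's values.csv under Python 3. The following is a minimal sketch, assuming the file uses the comma-plus-space layout shown in the question; the read_rows helper and the temporary file are hypothetical, added only to keep the example self-contained. Note that Python 3 expects CSV files opened in text mode with newline='' rather than 'rb'.

```python
import csv
import os
import tempfile

# Sample data from the question (fields separated by a comma and a space).
SAMPLE = '"AAA", "BBB", "Test, Test", "CCC"\n"111", "222, 333", "XXX", "YYY, ZZZ"\n'


def read_rows(path):
    """Hypothetical helper: parse a CSV file whose delimiters are followed by a space."""
    # Python 3: open in text mode with newline='' as the csv module recommends.
    with open(path, newline='') as f:
        # skipinitialspace=True drops the space after each comma, so the
        # opening double quote starts a quoted field and inner commas are kept.
        return list(csv.reader(f, skipinitialspace=True))


if __name__ == '__main__':
    # Write the sample to a throwaway file instead of a real values.csv.
    fd, path = tempfile.mkstemp(suffix='.csv')
    try:
        with os.fdopen(fd, 'w', newline='') as f:
            f.write(SAMPLE)
        for row in read_rows(path):
            print(row)
    finally:
        os.remove(path)
```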
answered by Michael, Sep 20 '22 06:09



You have spaces before the quote characters in your input. Set skipinitialspace to True to skip any whitespace following a delimiter:

When True, whitespace immediately following the delimiter is ignored. The default is False.
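If several readers need this setting, it can be bundled into a named dialect with csv.register_dialect. This is a minimal sketch; the dialect name comma_space is hypothetical and chosen only for illustration.

```python
import csv

# Register a dialect (hypothetical name) that differs from the default
# 'excel' dialect only by skipping whitespace after each delimiter.
csv.register_dialect('comma_space', skipinitialspace=True)

lines = [
    '"AAA", "BBB", "Test, Test", "CCC"',
    '"111", "222, 333", "XXX", "YYY, ZZZ"',
]

# Any reader created with the dialect name gets skipinitialspace=True.
rows = list(csv.reader(lines, dialect='comma_space'))
for row in rows:
    print(row)
```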

>>> import csv
>>> lines = '''\
... "AAA", "BBB", "Test, Test", "CCC"
... "111", "222, 333", "XXX", "YYY, ZZZ"
... '''
>>> reader = csv.reader(lines.splitlines())
>>> next(reader)
['AAA', ' "BBB"', ' "Test', ' Test"', ' "CCC"']
>>> reader = csv.reader(lines.splitlines(), skipinitialspace=True)
>>> next(reader)
['AAA', 'BBB', 'Test, Test', 'CCC']
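To get the single-line layout the question asks for (AAA BBB Test, Test CCC), the parsed fields can be joined with spaces. This is a minimal sketch, assuming the space-separated form is only wanted for display.

```python
import csv

lines = '''\
"AAA", "BBB", "Test, Test", "CCC"
"111", "222, 333", "XXX", "YYY, ZZZ"
'''

reader = csv.reader(lines.splitlines(), skipinitialspace=True)
# Join each row's fields with a single space to match the expected output.
joined = [' '.join(row) for row in reader]
for line in joined:
    print(line)
```

Once joined, the field boundaries are lost (the comma in "Test, Test" looks like any other text), so keep the list form if the fields still need to be processed individually.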
answered by Martijn Pieters, Sep 20 '22 06:09
