
Remove unwanted commas from CSV using Python

Tags: python, csv

I need some help. I have a CSV file that contains an address field, and whoever entered the data into the original database used commas to separate the different parts of the address. For example:

Flat 5, Park Street

When I try to use the CSV file, this one entry is treated as two separate fields when it is in fact a single field. I have used Python to strip out commas where they fall between inverted commas, since those are easy to distinguish from commas that should actually be there, but this problem has me stumped.

Any help would be gratefully received.

Thanks.

merlin_1980 asked Oct 05 '22 11:10


2 Answers

You can define the separating and quoting characters with Python's CSV reader. For example:

With this CSV:

1,`Flat 5, Park Street`

And this Python:

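import csv

# Python 3: open in text mode; newline='' lets the csv module handle line endings
with open('14144315.csv', newline='') as csvfile:
    # Commas separate fields; backticks mark quoted text
    rowreader = csv.reader(csvfile, delimiter=',', quotechar='`')
    for row in rowreader:
        print(row)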

You will see this output:

['1', 'Flat 5, Park Street']

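This uses commas to separate values and treats the quote character (a backtick in this example; substitute whichever inverted comma your file uses) as protecting any commas inside a quoted field.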
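If the goal is a file that other tools read correctly, one option is to parse with the non-standard quote character and write the rows back out with the csv module's default double-quote quoting. The following is only a sketch: the `requote` helper and the sample data are made up for illustration, and it assumes every field containing a comma really is wrapped in the old quote character.

```python
import csv
import io


def requote(text, old_quote="`"):
    """Re-emit CSV text that uses old_quote so it uses standard double quotes."""
    # Parse with the non-standard quote character.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=",", quotechar=old_quote)
    out = io.StringIO(newline="")
    # The default writer quotes only the fields that need it, using '"'.
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


# Hypothetical sample data standing in for the real file.
sample = "1,`Flat 5, Park Street`\n2,Elm Road\n"
converted = requote(sample)
print(converted, end="")
```

With a real file, the same reader and writer can be pointed at files opened with `newline=''` instead of the in-memory buffers used here.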

Jason Sperske answered Oct 10 '22 04:10


The CSV file was not generated properly. CSV files should have some form of escaping of text, usually using double-quotes:

1,John Doe,"City, State, Country",12345

Some CSV exports do this to all fields (this is an option when exporting from Excel/LibreOffice), but ambiguous fields (such as those including commas) must be escaped.

Either fix this manually or properly regenerate the CSV. Without quoting, a comma inside a field is indistinguishable from a delimiter, so this cannot be fixed programmatically.
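If the CSV can be regenerated from the original data, Python's csv writer applies this quoting automatically. The sketch below reuses the example row from above; the `to_csv_line` helper is hypothetical and exists only to show the difference between quoting just the ambiguous fields and quoting every field.

```python
import csv
import io

row = ["1", "John Doe", "City, State, Country", "12345"]


def to_csv_line(fields, quoting):
    """Serialise one row to CSV text with the given quoting mode."""
    buf = io.StringIO(newline="")
    csv.writer(buf, quoting=quoting, lineterminator="\n").writerow(fields)
    return buf.getvalue()


# QUOTE_MINIMAL (the default) quotes only fields that contain special characters.
minimal = to_csv_line(row, csv.QUOTE_MINIMAL)
# QUOTE_ALL quotes every field, like the "quote all text cells" export option.
quote_all = to_csv_line(row, csv.QUOTE_ALL)

print(minimal, end="")
print(quote_all, end="")
```

Either form reads back as four fields with a standard CSV parser, because the commas in the address sit inside double quotes.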

Edit: I just noticed something about "inverted commas" being used for escaping - if that is the case see Jason Sperske's answer, which is spot on.

Yuval Adam answered Oct 10 '22 04:10