Remove special characters from csv file using python

There seems to be something on this topic already (How to replace all those Special Characters with white spaces in python?), but I can't figure this simple task out for the life of me.

I have a .CSV file with 75 columns and almost 4000 rows. I need to replace all the 'special characters' ($ # & * etc.) with '_' and write to a new file. Here's what I have so far:

import csv

input = open('C:/Temp/Data.csv', 'rb')
lines = csv.reader(input)
output = open('C:/Temp/Data_out1.csv', 'wb')
writer = csv.writer(output)

conversion = '-"/.$'
text =  input.read()
newtext = '_'
for c in text:
    newtext += '_' if c in conversion else c
    writer.writerow(c)

input.close()
output.close()

All this succeeds in doing is to write everything to the output file as a single column, producing over 65K rows. Additionally, the special characters are still present!

Sorry for the redundant question. Thank you in advance!

Jenny asked Apr 01 '13 19:04



2 Answers

I might do something like

import csv

with open("special.csv", "rb") as infile, open("repaired.csv", "wb") as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    conversion = set('_"/.$')
    for row in reader:
        newrow = [''.join('_' if c in conversion else c for c in entry) for entry in row]
        writer.writerow(newrow)

which turns

$ cat special.csv
th$s,2.3/,will-be
fixed.,even.though,maybe
some,"shoul""dn't",be

(note that I have a quoted value) into

$ cat repaired.csv 
th_s,2_3_,will-be
fixed_,even_though,maybe
some,shoul_dn't,be
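The "rb"/"wb" modes above follow the Python 2 convention. A rough Python 3 sketch of the same row-by-row idea is below; it assumes text-mode streams opened with newline="" (as the csv module docs recommend) and swaps the character-by-character join for str.translate. The names SPECIAL, TABLE and clean_rows are made up for illustration, and SPECIAL uses the character set from the question, so '-' gets replaced here too, unlike in the sample output above.

```python
import csv
import io

# Characters to replace; taken from the question's `conversion` string.
SPECIAL = '-"/.$'
# Translation table mapping each special character to an underscore.
TABLE = str.maketrans({c: "_" for c in SPECIAL})


def clean_rows(infile, outfile, table=TABLE):
    """Copy CSV rows from infile to outfile, translating every field."""
    writer = csv.writer(outfile)
    for row in csv.reader(infile):
        writer.writerow([field.translate(table) for field in row])


# In-memory demo with the sample data. For real files, open both with
# newline="", e.g.
#   with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
sample = 'th$s,2.3/,will-be\nfixed.,even.though,maybe\nsome,"shoul""dn\'t",be\n'
source = io.StringIO(sample, newline="")
result = io.StringIO(newline="")
clean_rows(source, result)
cleaned = result.getvalue()
print(cleaned)
```

Because the replacement happens per field after parsing, the quotes that delimit a field are never touched; only a quote character inside a value is replaced.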

Right now, your code is reading the entire file into one big string:

text =  input.read()

Starting from a _ character:

newtext = '_'

Looping over every single character in text:

for c in text:

Add the corrected character to newtext (very slowly):

    newtext += '_' if c in conversion else c

And then write the original character (?), as a one-field row of its own, to the new csv:

    writer.writerow(c)

.. which is unlikely to be what you want. :^)
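To see where the single column and the huge row count come from, here is a small sketch (Python 3, writing to an in-memory buffer instead of a file, which is my assumption and not part of the original code). writerow() iterates over whatever it is given, so a string is treated as a sequence of one-character fields:

```python
import csv
import io

buffer = io.StringIO(newline="")
writer = csv.writer(buffer)

# writerow() iterates over its argument. A one-character string yields a
# single item, so each call emits a row with exactly one field.
for c in "a$b":
    writer.writerow(c)

# A longer string is split into one field per character.
writer.writerow("a$b")

rows = buffer.getvalue().splitlines()
print(rows)  # ['a', '$', 'b', 'a,$,b']
```

So calling writerow(c) once per character produces one row per character of the file, which matches the 65K+ single-column rows described in the question.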

DSM answered Sep 28 '22 09:09


This doesn't seem to need to deal with CSVs in particular (as long as the special characters aren't your column delimiters).

conversion = '-"/.$'
newtext = '_'

# Read every line; readlines() keeps the trailing newline on each one.
with open('C:/Temp/Data.csv', 'r') as infile:
    lines = infile.readlines()

outputLines = []
for line in lines:
    temp = line
    for c in conversion:
        temp = temp.replace(c, newtext)
    outputLines.append(temp)

# The lines still end in "\n", so write them back unchanged;
# appending another "\n" would insert a blank line after every row.
with open('C:/Temp/Data_out1.csv', 'w') as outfile:
    outfile.writelines(outputLines)
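One caveat with working on raw text, sketched below with a made-up sample line: since '"' is in the conversion string, the quotes around a quoted field get replaced as well, and any comma that field was protecting then acts as a delimiter. This is only an illustration of that edge case (Python 3, in-memory data); if no quoted field contains a comma, the line-based approach is not affected.

```python
import csv
import io

# Hypothetical row with a quoted field that contains a comma.
line = 'id,"Smith, John",3.5\n'

# Raw text replacement, as in the line-based approach: the quote
# characters become underscores, so the comma they protected now
# splits the name into two columns.
raw = line
for c in '-"/.$':
    raw = raw.replace(c, "_")

before = next(csv.reader(io.StringIO(line)))
after = next(csv.reader(io.StringIO(raw)))
print(len(before), before)
print(len(after), after)
```

The original row parses as three columns, while the rewritten one parses as four.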

dckrooney answered Sep 28 '22 10:09