I have a bunch of CSV files. In some of them, missing data are represented by empty cells, but in others there is a period. I want to loop over all my files, open them, delete any periods that occur alone, and then save and close the file.
I've read a bunch of other questions about doing whole-word-only searches using re.sub(). That is what I want to do (delete . when it occurs alone but not the . in 3.5), but I can't get the syntax right for a whole-word-only search where the whole word is a special character ('.'). Also, I'm worried those answers might be a little different in the case where a whole word can be distinguished by tab and newlines too. That is, does /b work in my CSV file case?
UPDATE: Here is a function I wound up writing after seeing the help below. Maybe it will be useful to someone else.
import csv, re
def clean(infile, outfile, chars):
'''
Open a file, remove all specified special characters used to represent missing data, and save.\n\n
infile:\tAn input file path\n
outfile:\tAn output file path\n
chars:\tA list of strings representing missing values to get rid of
'''
in_temp = open(infile)
out_temp = open(outfile, 'wb')
csvin = csv.reader(in_temp)
csvout = csv.writer(out_temp)
for row in csvin:
row = re.split('\t', row[0])
for colno, col in enumerate(row):
for char in chars:
if col.strip() == char:
row[colno] = ''
csvout.writerow(row)
in_temp.close()
out_temp.close()
Something like this should do the trick... This data wouldn't happen to be coming out of SAS would it - IIRC, that quite often used '.' as missing for numeric values.
import csv
with open('input.csv') as fin, open('output.csv', 'wb') as fout:
csvin = csv.reader(fin)
csvout = csv.writer(fout)
for row in csvin:
for colno, col in enumerate(row):
if col.strip() == '.':
row[colno] = ''
csvout.writerow(row)
Why not just use the csv module?
#!/usr/bin/env python
import csv
with open(somefile) as infile:
r=csv.reader(infile)
rows = []
for row in csv:
rows.append(['' if f == "." else f for f in row])
with open(newfile, 'w') as outfile:
w=csv.writer(outfile)
w.writelines(rows)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With