I have a large CSV file (1 GB) from which I would like to remove the commas. The data are all positive integers. I have tried dlmwrite with a space as the delimiter, but the values are then written in decimal format. I have also tried fprintf, but then I lose the shape of the matrix (i.e. all data appear in one line or column).
Is there a simple way to read in a CSV file (input.txt):
1, 2, 3, 4, 5
2, 3, 4, 5, 6
and then output to a text file (output.txt) in the form:
1 2 3 4 5
2 3 4 5 6
Some people's names contain commas, for example Joe Blow, CFA. This comma breaks the CSV format, since it is interpreted as a column separator. From what I have read, the most common prescription seems to be replacing that character, or replacing the delimiter with a new value (e.g. this|that|the, other).
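As a rough illustration of the delimiter swap described above, here is a minimal Python sketch using the standard csv module. The sample rows and the write_rows helper are made up for this example. It also shows that, with default settings, the csv writer quotes any field containing the delimiter, which may be an alternative to changing the delimiter at all.

```python
import csv
import io

# Hypothetical sample rows; the first name contains a comma.
rows = [["Joe Blow, CFA", "42"], ["Jane Doe", "37"]]

def write_rows(rows, delimiter=","):
    """Serialize rows to a string using the given delimiter."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()

# Option 1: switch to a delimiter that does not occur in the data.
piped = write_rows(rows, delimiter="|")

# Option 2: keep the comma and let the writer quote the affected field.
quoted = write_rows(rows, delimiter=",")

# Reading the quoted form back recovers the original fields.
round_trip = list(csv.reader(io.StringIO(quoted)))
```

Either form only helps if whatever reads the file is told about the delimiter or understands CSV quoting.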
In Python, if the format is really that simple (and there already is a space after each comma):
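with open("infile.csv") as infile, open("outfile.csv", "w") as outfile:
    # Stream line by line so the whole file never has to fit in memory.
    for line in infile:
        # Dropping each comma leaves the space that already follows it.
        outfile.write(line.replace(",", ""))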
If you can't be sure about whitespace:
import re

with open("infile.csv") as infile, open("outfile.csv", "w") as outfile:
    for line in infile:
        # Replace each comma, plus any whitespace around it, with one space.
        outfile.write(re.sub(r"\s*,\s*", " ", line))
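To check the regular-expression approach on the sample data from the question, the loop can be wrapped in a small function that works on any file-like objects. This is only a sketch: the name commas_to_spaces is made up here, and the pattern uses [ \t]* instead of \s* as a deliberate variation, so that a comma at the very end of a line does not swallow the newline.

```python
import io
import re

# A comma together with any spaces or tabs around it (but not newlines).
SEPARATOR = re.compile(r"[ \t]*,[ \t]*")

def commas_to_spaces(src, dst):
    """Copy src to dst line by line, turning each comma separator into one space."""
    for line in src:
        dst.write(SEPARATOR.sub(" ", line))

# In-memory stand-ins for input.txt and output.txt.
src = io.StringIO("1, 2, 3, 4, 5\n2 ,3,4 , 5,6\n")
dst = io.StringIO()
commas_to_spaces(src, dst)
result = dst.getvalue()
```

For the real 1 GB file, the same function could be called with two objects returned by open(); since it reads one line at a time, memory use should stay small.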
Personally, I like to use sed, a command-line program that replaces strings.
It is available on Linux and, via a Cygwin install, on Windows as well.
Using
sed -i 's/,/ /g' filename
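all the commas in the file are replaced by spaces, and the file is modified in place. If each comma is already followed by a space, as in the example in the question, this leaves two spaces between values; sed -i 's/, */ /g' filename replaces the comma and any spaces after it with a single space instead.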
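If sed is not available, a similar in-place edit can be sketched in Python with the standard fileinput module. The function name replace_commas_in_place is made up for this example, and the code assumes the file is plain text that can be overwritten safely.

```python
import fileinput

def replace_commas_in_place(path):
    """Rewrite the file at path, turning each comma into a space."""
    # With inplace=True, anything printed inside the loop is written
    # back to the file instead of to standard output.
    with fileinput.input(files=[path], inplace=True) as lines:
        for line in lines:
            # The line keeps its own newline, so suppress print's.
            print(line.replace(",", " "), end="")
```

As with sed -i, the original contents are overwritten, so it may be worth trying it on a copy first.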