
Remove Commas from Large CSV (1GB)

I have a large CSV file (1 GB) from which I would like to remove the commas. The data are all positive integers. I have tried dlmwrite with a space as the delimiter, but the values are then written in decimal format. I have also tried fprintf, but then I lose the shape of the matrix (i.e. all the data end up in a single line or column).

Is there a simple way to read a CSV file (input.txt) such as:

1, 2, 3, 4, 5
2, 3, 4, 5, 6

and then output to a text file (output.txt) in the form:

1 2 3 4 5
2 3 4 5 6
Asked by user1566235 on Jul 31 '12 at 15:07


2 Answers

In Python, if the format is really that simple (and there already is a space after each comma):

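# Stream line by line so the 1 GB file is never held in memory at once.
with open("infile.csv") as infile, open("outfile.csv", "w") as outfile:
    for line in infile:
        # "1, 2, 3" becomes "1 2 3": the space after each comma remains.
        outfile.write(line.replace(",", ""))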

If you can't be sure about whitespace:

import re
with open("infile.csv") as infile, open("outfile.csv", "w") as outfile:
    for line in infile:
        # Collapse each comma and any whitespace around it into one space.
        outfile.write(re.sub(r"\s*,\s*", " ", line))
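
As a quick check of the regex approach, here is a minimal sketch that wraps the substitution in a function and runs it on the sample rows from the question. The names commas_to_spaces and convert_file are made up for illustration. The pattern is also an assumption of this sketch: it matches only spaces and tabs around the comma, so that a trailing comma at the end of a row cannot consume the line break.

```python
import re

# A comma plus any spaces or tabs around it (newlines are deliberately excluded).
_COMMA = re.compile(r"[ \t]*,[ \t]*")


def commas_to_spaces(line):
    """Return the line with each comma and its surrounding blanks replaced by one space."""
    return _COMMA.sub(" ", line)


def convert_file(src_path, dst_path):
    """Copy src_path to dst_path one line at a time, so memory use stays small."""
    with open(src_path) as infile, open(dst_path, "w") as outfile:
        for line in infile:
            outfile.write(commas_to_spaces(line))


if __name__ == "__main__":
    # Sample rows: one spaced as in the question, one with irregular spacing.
    for row in ["1, 2, 3, 4, 5\n", "2,3 ,4,  5,6\n"]:
        print(commas_to_spaces(row), end="")
```

For the real file, something like convert_file("input.txt", "output.txt") should do, with the file names taken from the question.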
Answered by Tim Pietzcker on Oct 05 '22 at 17:10


Personally, I like to use sed, a command-line stream editor that can replace strings.

It is available on Linux and, via a Cygwin install, on Windows as well.

Using

sed -i 's/,/ /g' filename

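all the commas in the file are replaced by spaces, and the file is edited in place. For input like the example in the question, where a space already follows each comma, the expression 's/, */ /g' avoids doubled spaces.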
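
Where sed is not available, roughly the same in-place edit can be sketched in Python with the standard-library fileinput module. This is only an illustration: the function name replace_commas_in_place is hypothetical, and the replacement mirrors the sed command exactly (each comma becomes one space, with no handling of spaces that are already there).

```python
import fileinput


def replace_commas_in_place(path):
    """Rewrite the file at `path`, turning every comma into a space."""
    # With inplace=True, fileinput redirects print() output back into the file,
    # and the file is still processed one line at a time.
    with fileinput.input(files=[path], inplace=True) as f:
        for line in f:
            print(line.replace(",", " "), end="")
```

Because the original is overwritten, it may be worth trying this on a copy of the file first.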

Answered by Hugo van den Brand on Oct 05 '22 at 17:10