
Removing newline from a csv file

Tags:

python

newline

I am trying to process a CSV file in Python that has a ^M character in the middle of each row, which acts as a newline. I can't open the file in any mode other than 'rU'.

If I do open the file in 'rU' mode, it treats the ^M as a line break, splits each row in two, and gives me twice the number of rows.

I want to remove the newline altogether. How?

ganesh reddy asked Jan 17 '13 23:01


1 Answer

Note that, as the docs say:

csvfile can be any object which supports the iterator protocol and returns a string each time its next() method is called — file objects and list objects are both suitable.
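
To make that concrete, here is a minimal sketch showing that csv.reader accepts any iterable of strings, not only an open file. The sample rows are made up for illustration.

```python
import csv

# Hypothetical sample data: a plain list of strings stands in for a file.
lines = ['name,city\n', 'ann,Oslo\n', 'bob,Lima\n']

# csv.reader only needs something it can iterate over line by line.
rows = list(csv.reader(lines))
print(rows)  # [['name', 'city'], ['ann', 'Oslo'], ['bob', 'Lima']]

# A generator works just as well, which is what makes filtering possible.
rows_from_gen = list(csv.reader(line.upper() for line in lines))
```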

So, you can always stick a filter on the file before handing it to your reader or DictReader. Instead of this:

import csv
with open('myfile.csv', 'rU') as myfile:
    for row in csv.reader(myfile):
        print(row)  # process each row here

Do this:

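import csv
# 'rb', not 'rU': universal newlines would turn each '\r' into '\n' before the filter runs.
with open('myfile.csv', 'rb') as myfile:
    filtered = (line.replace('\r', '') for line in myfile)
    for row in csv.reader(filtered):
        print(row)  # process each row here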

That '\r' is the Python (and C) way of spelling ^M. So, this just strips all ^M characters out, no matter where they appear, by replacing each one with an empty string.
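
For Python 3, where the 'rU' mode is no longer available (it was removed in 3.11), the same filter might look like the sketch below. It assumes the stray '\r' sits in the middle of each row and that real rows end in '\n'. Opening with newline='\n' keeps Python from treating the '\r' as a line break before the filter runs. The sample contents and temporary path are made up for illustration.

```python
import csv
import os
import tempfile

# Hypothetical sample: each row has a stray '\r' in the middle.
sample = 'id,na\rme\n1,al\rice\n2,b\rob\n'

path = os.path.join(tempfile.mkdtemp(), 'myfile.csv')
with open(path, 'w', newline='') as f:  # newline='' writes '\r' and '\n' untouched
    f.write(sample)

# newline='\n' splits lines on '\n' only and leaves '\r' in the text,
# so the generator below actually sees the '\r' characters it removes.
with open(path, newline='\n') as myfile:
    filtered = (line.replace('\r', '') for line in myfile)
    rows = list(csv.reader(filtered))

print(rows)  # [['id', 'name'], ['1', 'alice'], ['2', 'bob']]
```

The generator is lazy, so the file is still read one line at a time rather than loaded into memory.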


"I guess I want to modify the file permanently as opposed to filtering it."

If you want to modify the file before running your Python script on it, why not do that from outside of Python? sed, tr, many text editors, etc. can all do this for you. Here's a GNU sed example (GNU sed is often installed as gsed on macOS and BSD; on Linux it is usually just sed):

gsed -i'' 's/\r//g' myfile.csv

But if you want to do it in Python, it's not much more verbose, and you might find it more readable.

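You can't really modify a file in place if you want to insert or delete bytes in the middle. The usual solution is to write a new file, then either move the new file over the old one (with os.rename this is Unix only, since on Windows the rename fails if the destination exists) or move the old file aside first and delete it afterwards (cross-platform).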

The cross-platform version:

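import os

os.rename('myfile.csv', 'myfile.csv.bak')
# Binary mode on both sides: no newline translation, so every '\r' reaches replace().
# The b'' literals make this work on Python 2 and Python 3 alike.
with open('myfile.csv.bak', 'rb') as infile, open('myfile.csv', 'wb') as outfile:
    for line in infile:
        outfile.write(line.replace(b'\r', b''))
os.remove('myfile.csv.bak')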

The less-clunky, but Unix-only, version:

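import os
import tempfile
from contextlib import closing

# dir='.' keeps the temp file on the same filesystem, so the rename cannot fail across devices.
temp = tempfile.NamedTemporaryFile(dir='.', delete=False)
with open('myfile.csv', 'rb') as myfile, closing(temp):
    for line in myfile:
        temp.write(line.replace(b'\r', b''))
os.rename(temp.name, 'myfile.csv')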
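
On Python 3.3 and later, os.replace overwrites an existing destination on both Unix and Windows, so a single version can probably cover both cases. The following is a minimal sketch under that assumption. strip_cr_in_place is a hypothetical helper name, and the demo file is made up.

```python
import os
import tempfile

def strip_cr_in_place(path):
    """Rewrite the file at path with every carriage-return byte removed."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, 'wb') as out, open(path, 'rb') as src:
            for line in src:
                out.write(line.replace(b'\r', b''))
        os.replace(tmp_path, path)  # overwrites path on Unix and Windows
    except BaseException:
        os.remove(tmp_path)  # do not leave a half-written temp file behind
        raise

# Demo on a throwaway file.
demo = os.path.join(tempfile.mkdtemp(), 'myfile.csv')
with open(demo, 'wb') as f:
    f.write(b'1,al\rice\n2,b\rob\n')
strip_cr_in_place(demo)
with open(demo, 'rb') as f:
    result = f.read()
print(result)  # b'1,alice\n2,bob\n'
```

Note that this also turns any '\r\n' line endings into plain '\n', since it removes every '\r' byte regardless of position.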

abarnert answered Nov 15 '22 04:11