I am trying to process a CSV file in Python that has a ^M character in the middle of each row, which acts as a newline. I can't open the file in any mode other than 'rU'.
If I do open the file in 'rU' mode, the ^M is read as a newline, so each row is split in two and I get twice the number of rows.
I want to remove that newline altogether. How?
Note that, as the docs say:
csvfile can be any object which supports the iterator protocol and returns a string each time its next() method is called; file objects and list objects are both suitable.
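To make the quoted point concrete, here is a minimal sketch (the sample data is made up for illustration) showing that csv.reader parses a plain list of strings just as it would a file:

```python
import csv

# Any iterable that yields strings works; no file object is required.
lines = ['name,qty', 'apple,3', 'pear,5']
rows = list(csv.reader(lines))
print(rows)
```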
So, you can always stick a filter on the file before handing it to your reader or DictReader. Instead of this:
import csv
with open('myfile.csv', 'rU') as myfile:
    for row in csv.reader(myfile):
        print(row)
Do this:
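# newline='\n' ends lines only at '\n'; 'rU' would turn each ^M into a line break before the filter runs
with open('myfile.csv', newline='\n') as myfile:
    filtered = (line.replace('\r', '') for line in myfile)
    for row in csv.reader(filtered):
        print(row)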
That '\r' is the Python (and C) way of spelling ^M. So, this just strips all ^M characters out, no matter where they appear, by replacing each one with an empty string.
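As a quick check of the filter idea, here is a minimal sketch that runs on in-memory data rather than a real file. The sample text and the helper name strip_carriage_returns are made up for illustration. It assumes Python 3, where io.StringIO with newline='\n' behaves like a file opened with newline='\n', so lines end only at '\n' and the stray ^M stays inside the line until the filter removes it.

```python
import csv
import io

def strip_carriage_returns(lines):
    """Yield each line with every carriage return (^M) removed."""
    for line in lines:
        yield line.replace('\r', '')

# Two logical rows, each with a stray ^M in the middle (made-up sample data).
sample = io.StringIO('a,b\r,c\nd,e\r,f\n', newline='\n')

rows = list(csv.reader(strip_carriage_returns(sample)))
print(rows)
```

The reader should see two rows of three fields each, rather than four half-rows.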
I guess I want to modify the file permanently as opposed to filtering it.
First, if you want to modify the file before running your Python script on it, why not do that from outside of Python? sed, tr, many text editors, etc. can all do this for you. Here's a GNU sed example:
gsed -i'' 's/\r//g' myfile.csv
But if you want to do it in Python, it's not that much more verbose, and you might find it more readable, so:
First, you can't really modify a file in-place if you want to insert or delete from the middle. The usual solution is to write a new file, and either move the new file over the old one (Unix only) or delete the old one (cross-platform).
The cross-platform version:
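import os

os.rename('myfile.csv', 'myfile.csv.bak')
# newline='\n' on both files: read lines ending only at '\n', write them back untranslated
with open('myfile.csv.bak', newline='\n') as infile, open('myfile.csv', 'w', newline='\n') as outfile:
    for line in infile:
        outfile.write(line.replace('\r', ''))
os.remove('myfile.csv.bak')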
The less-clunky, but Unix-only, version:
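import os
import tempfile
from contextlib import closing

# Text mode, created in the current directory so os.rename() stays on one filesystem
temp = tempfile.NamedTemporaryFile('w', newline='\n', dir='.', delete=False)
with open('myfile.csv', newline='\n') as myfile, closing(temp):
    for line in myfile:
        temp.write(line.replace('\r', ''))
os.rename(temp.name, 'myfile.csv')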
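The two variants can also be folded into one helper. This is a sketch only, assuming Python 3.3 or later, where os.replace overwrites the destination on both Unix and Windows. The function name remove_carriage_returns and the UTF-8 default are illustrative choices, not part of the answer above.

```python
import os
import tempfile

def remove_carriage_returns(path, encoding='utf-8'):
    """Rewrite the file at `path` with every carriage return removed (hypothetical helper)."""
    # Create the temp file next to the target so os.replace() stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, encoding=encoding, newline='\n') as src, \
         tempfile.NamedTemporaryFile('w', encoding=encoding, newline='\n',
                                     dir=directory, delete=False) as dst:
        for line in src:
            dst.write(line.replace('\r', ''))
    # Both files are closed here; swap the cleaned copy into place.
    os.replace(dst.name, path)
```

Calling remove_carriage_returns('myfile.csv') would then clean the file in one step. Note that the sketch does no cleanup if writing fails partway, and the rewritten file takes on the temporary file's permissions rather than the original's.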