How can I remove carriage return from a text file with Python?

The things I've googled haven't worked, so I'm turning to experts!

I have some text in a tab-delimited text file that has some sort of carriage return in it (when I open it in Notepad++ and use "show all characters", I see [CR][LF] at the end of the line). I need to remove this carriage return (or whatever it is), but I can't seem to figure it out. Here's a snippet of the text file showing a line with the carriage return:

firstcolumn secondcolumn    third   fourth  fifth   sixth       seventh
moreoftheseventh        8th             9th 10th    11th    12th                    13th

Here's the code I'm trying to use to replace it, but it's not finding the return:

with open(infile, "r") as f:
    for line in f:
        if "\n" in line:
            line = line.replace("\n", " ")

My script just doesn't find the carriage return. Am I doing something wrong or making an incorrect assumption about this carriage return? I could just remove it manually in a text editor, but there are about 5000 records in the text file that may also contain this issue.

Further information: The goal here is to select two columns from the text file, so I split on \t characters and refer to the values by index in the resulting list. It works on any line without the returns, but fails on the lines with the returns because, for example, there is no element 9 in those lines.

vals = line.split("\t")
print(vals[0] + " " + vals[9])

So, for the line of text above, this code fails because there is no index 9 in that particular array. For lines of text that don't have the [CR][LF], it works as expected.
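One way to check which records are affected is to count the tab-separated fields on each physical line. This is only a sketch for Python 3: the name find_short_lines is hypothetical, and the default of 13 expected fields is an assumption read off the sample record above, not something the file format guarantees.

```python
def find_short_lines(path, expected_fields=13):
    """Return (line_number, field_count) for lines with too few fields.

    expected_fields=13 is an assumption based on the sample record.
    """
    short = []
    # newline="" keeps "\r\n" as-is instead of translating it to "\n".
    with open(path, "r", newline="") as f:
        for number, line in enumerate(f, start=1):
            # Drop the line ending, then split into columns.
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) < expected_fields:
                short.append((number, len(fields)))
    return short
```

Each line number it reports is a candidate for a record that was broken in two by a stray [CR][LF].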

asked Jul 15 '13 15:07 by mrcoulson




1 Answer

Technically, there is an answer!

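# Binary mode ("rb") turns off newline translation, so each line still ends in b"\r\n".
with open(filetoread, "rb") as inf:
    # Write in binary too ("wb"); Python 3 refuses to write bytes to a text-mode file.
    with open(filetowrite, "wb") as fixed:
        for line in inf:
            # Remove the carriage returns and keep the line feeds.
            fixed.write(line.replace(b"\r", b""))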

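The b in open(filetoread, "rb") opens the file in binary mode, which turns off newline translation, so the \r\n bytes reach the script unchanged and can be removed. In text mode, Python 3 by default translates \r\n to \n as it reads, so the carriage return never shows up in line. This answer actually came from Stack Overflow user Kenneth Reitz, off the site.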
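If the stray break splits one record across two physical lines, as in the sample in the question, the halves also need to be joined back together. Here is a minimal sketch of one way to do that, not a definitive fix: the name rejoin_records is hypothetical, the default of 13 fields per record is an assumption taken from the sample, and the stray break is replaced with a single space, as the question's own code intended.

```python
def rejoin_records(src, dst, expected_fields=13):
    """Copy src to dst, merging lines that were split mid-record.

    expected_fields=13 is an assumption based on the sample record.
    """
    with open(src, "rb") as inf, open(dst, "wb") as fixed:
        parts = []
        for line in inf:
            # Drop the line ending (b"\r\n" or b"\n") but keep tabs.
            stripped = line.rstrip(b"\r\n")
            if not stripped and not parts:
                continue  # skip blank lines between records
            parts.append(stripped)
            # Join the pieces collected so far with a space.
            record = b" ".join(parts)
            if record.count(b"\t") + 1 >= expected_fields:
                fixed.write(record + b"\n")
                parts = []
        if parts:
            # Leftover lines that never reached the expected width.
            fixed.write(b" ".join(parts) + b"\n")
```

The output uses plain \n line endings. A record is treated as complete once it has the expected number of tab-separated fields, so a stray break inside the last column would not be detected by this approach.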

Thanks everyone!

answered Sep 28 '22 04:09 by mrcoulson