How do I convert LF to CRLF?

Tags: python, unix

I found a list of the majority of English words online, but the line breaks are Unix-style (LF) and the files are encoded in UTF-8. I found it on this website: http://dreamsteep.com/projects/the-english-open-word-list.html

How do I convert the line breaks to CRLF so I can iterate over them? The program I will be using them in goes through each line in the file, so the words have to be one per line.

This is a portion of the file: bitbackbitebackbiterbackbitersbackbitesbackbitingbackbittenbackboard

It should be:

bit
backbite
backbiter
backbiters
backbites
backbiting
backbitten
backboard

How can I convert my files to this type? Note: it's 26 files (one per letter) with 80,000 words or so in total (so the program should be very fast).

I don't know where to start because I've never worked with Unicode. Thanks in advance!

Using rU as the parameter (as suggested), with this in my code:

with open(my_file_name, 'rU') as my_file:
    for line in my_file:
        new_words.append(str(line))
my_file.close()

I get this error:

Traceback (most recent call last):
  File "<pyshell#5>", line 1, in <module>
    addWords('B Words')
  File "D:\my_stuff\Google Drive\documents\SCHOOL\Programming\Python\Programming Class\hangman.py", line 138, in addWords
    for line in my_file:
  File "C:\Python3.3\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 7488: character maps to <undefined>

Can anyone help me with this?

asked Dec 19 '12 14:12

Rushy Panchal

3 Answers

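You can use the replace method of strings, like this:

txt.replace('\n', '\r\n')

EDIT: in your case (Python 3, UTF-8 files):

# Reading in text mode turns every line ending into '\n' (universal newlines).
# newline='' on the output stops Python from translating '\n' a second time,
# which would otherwise produce '\r\r\n' on Windows.
with open('input.txt', encoding='utf-8') as inp, open('output.txt', 'w', encoding='utf-8', newline='') as out:
    txt = inp.read()
    txt = txt.replace('\n', '\r\n')
    out.write(txt)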
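
A minimal sketch of the same replacement done on raw bytes, which avoids both text-mode newline translation and the encoding question (in UTF-8 the byte 0x0A only ever means LF). The helper names lf_to_crlf and convert_file are hypothetical and not part of the answer above; the regular expression skips line feeds that already follow a carriage return, so running it twice does no harm.

```python
import re
from pathlib import Path


def lf_to_crlf(data: bytes) -> bytes:
    """Return data with every bare LF turned into CRLF; existing CRLF pairs are kept."""
    # (?<!\r)\n matches a line feed that is not already preceded by a carriage return.
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)


def convert_file(src: str, dst: str) -> None:
    """Read src as raw bytes, fix the line endings, and write the result to dst."""
    Path(dst).write_bytes(lf_to_crlf(Path(src).read_bytes()))
```

For the 26 files in the question, convert_file could be called once per file in a loop; each file is read and written in a single pass.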

answered Oct 23 '22 07:10

dugres

Instead of converting, you should be able to just open the file using Python's universal newline support:

f = open('words.txt', 'rU')

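(Note the U. In Python 3, universal newlines are already the default in text mode; the 'U' flag is deprecated there and was removed in Python 3.11.)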
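
For Python 3 (the traceback in the question points at C:\Python3.3), here is a hedged sketch of reading the word list with universal newlines and an explicit encoding. The function name read_words and the throwaway sample file are assumptions made for illustration; the UTF-8 encoding is taken from the question.

```python
import os
import tempfile


def read_words(path, encoding="utf-8"):
    """Return the words stored one per line in path, without line endings."""
    # newline=None (the default) is universal newlines mode: '\n', '\r\n'
    # and '\r' are all delivered to the program as '\n'.
    with open(path, "r", encoding=encoding, newline=None) as f:
        return [line.rstrip("\n") for line in f]


# Demo on a temporary file that mixes LF and CRLF line endings.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "words.txt")
    with open(sample, "wb") as f:
        f.write(b"bit\nbackbite\r\nbackbiter\n")
    print(read_words(sample))  # ['bit', 'backbite', 'backbiter']
```

Because the line endings are normalised on read, the files themselves do not need to be rewritten for the loop to see one word per line.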

answered Oct 23 '22 06:10

NPE

You don't need to convert the line endings in the files in order to be able to iterate over them. As suggested by NPE, simply use Python's universal newlines mode.

The UnicodeDecodeError happens because the files you are processing are encoded as UTF-8, but open() was not given an encoding, so Python 3 falls back to the platform default (cp1252 on your Windows machine) when it converts the bytes read from the file into strings (i.e. sequences of Unicode code points). The traceback shows this: the error is raised while iterating over the file, inside encodings\cp1252.py, before str(line) ever runs. Some bytes in those files, such as 0x8d, have no mapping in cp1252, and that causes the UnicodeDecodeError.

If you pass encoding='utf-8' to open(), you should no longer get the UnicodeDecodeError. str(line) is then redundant, because each line is already a str; line.decode('utf-8') would only apply if the file were opened in binary mode ('rb'). Check out the Text Vs. Data Instead of Unicode Vs. 8-bit writeup for some more details.
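
The mismatch between the two codecs can be reproduced in a few lines. This is an illustrative sketch on a made-up sample, not the contents of the word list: the character 'Í' (U+00CD) is chosen only because its UTF-8 encoding contains the byte 0x8d that appears in the error message, and try_decode is a hypothetical helper.

```python
# Hypothetical sample: 'Í' (U+00CD) is encoded in UTF-8 as the bytes 0xC3 0x8D.
raw = "\u00cd\n".encode("utf-8")


def try_decode(data: bytes, encoding: str) -> str:
    """Decode data with encoding, returning the text or the error message."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return "error: " + str(exc)


print(try_decode(raw, "cp1252"))  # fails: 0x8d has no mapping in cp1252
print(try_decode(raw, "utf-8"))   # succeeds and returns the original text
```

The same bytes decode cleanly or fail depending only on the codec, which is why naming the encoding when opening the file matters.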

Finally, you might also find The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky useful.

answered Oct 23 '22 06:10

Eric Rahmig
