
How to ignore invalid lines in a file?

I'm iterating over a file

for line in io.TextIOWrapper(readFile, encoding = 'utf8'):

when the file contains the following line

b'"""\xea\x11"\t1664\t507\t137\t2\n'

that generates the following exception

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xea in position 3: invalid continuation byte

How can I make my script ignore such lines and continue with the good ones?

Jader Dias asked Dec 06 '22 04:12


1 Answer

If you want to skip the whole line whenever it contains any invalid bytes, you have to know which line failed to decode. That means you can't let TextIOWrapper do the decoding for you; you have to read the file in binary mode and decode each line manually. What you want to do is this:

for bline in readFile:
    try:
        line = bline.decode('utf-8')
    except UnicodeDecodeError:
        continue
    # do stuff with line
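
As a self-contained illustration, the same loop can be wrapped in a generator and tried on in-memory data. This is only a sketch: the helper name decode_valid_lines and the sample rows around the bad line from the question are made up for the example, and io.BytesIO stands in for a real file opened in binary mode.

```python
import io


def decode_valid_lines(binary_file, encoding="utf-8"):
    """Yield decoded lines from a binary file, skipping undecodable ones.

    Hypothetical helper name; the logic is the try/except loop above.
    """
    for raw_line in binary_file:
        try:
            yield raw_line.decode(encoding)
        except UnicodeDecodeError:
            # The whole line is dropped, not just the offending bytes.
            continue


# io.BytesIO stands in for a file opened with mode "rb".
# The middle row is the invalid line from the question; the others are invented.
sample = io.BytesIO(
    b"good\t1\n"
    b'"""\xea\x11"\t1664\t507\t137\t2\n'
    b"also good\t2\n"
)

lines = list(decode_valid_lines(sample))
```

Here lines ends up holding only the two valid rows, each still carrying its trailing newline.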

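However, note that this does not give you the same newline behavior as a text file. By default, a text file translates '\r\n' and '\r' line endings to '\n', whereas iterating over a binary file splits only on b'\n' and leaves any '\r' in the line. If you need that translation, you have to do it explicitly as well.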
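
One way to be explicit about newlines is to normalize the line ending after decoding. The following is a minimal sketch under a few assumptions: the function name iter_text_lines is hypothetical, only Windows-style '\r\n' endings are handled, and files that use a lone '\r' as the line terminator are not covered, because binary iteration does not split on '\r' at all.

```python
import io


def iter_text_lines(binary_file, encoding="utf-8"):
    """Yield decoded lines, skipping invalid ones and mapping '\\r\\n' to '\\n'.

    Hypothetical helper; it only approximates text-mode newline translation.
    """
    for raw_line in binary_file:
        try:
            line = raw_line.decode(encoding)
        except UnicodeDecodeError:
            continue  # skip the whole undecodable line
        if line.endswith("\r\n"):
            # Replace the Windows line ending with a plain newline.
            line = line[:-2] + "\n"
        yield line


# Invented sample data: a CRLF line, an undecodable line, and an LF line.
sample = io.BytesIO(b"first\r\n\xea\x11 bad\r\nlast\n")
normalized = list(iter_text_lines(sample))
```

With this sample, the undecodable middle line is dropped and the remaining lines both end in a plain '\n'.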

abarnert answered Dec 29 '22 01:12