The UnicodeDecodeError normally happens when decoding an str string from a certain coding. Since codings map only a limited number of str strings to unicode characters, an illegal sequence of str characters will cause the coding-specific decode() to fail.
The file is being read as a bunch of str
s, but it should be unicode
s. Python tries to implicitly convert, but fails. Change:
job_titles = [line.strip() for line in title_file.readlines()]
to explicitly decode the str
s to unicode
(here assuming UTF-8):
job_titles = [line.decode('utf-8').strip() for line in title_file.readlines()]
It could also be solved by importing the codecs
module and using codecs.open
rather than the built-in open
.
This works fine for me.
f = open(file_path, 'r+', encoding="utf-8")
You can add a third parameter encoding to ensure the encoding type is 'utf-8'
Note: this method works fine in Python3, I did not try it in Python2.7.
For me there was a problem with the terminal encoding. Adding UTF-8 to .bashrc solved the problem:
export LC_CTYPE=en_US.UTF-8
Don't forget to reload .bashrc afterwards:
source ~/.bashrc
You can try this also:
import sys
reload(sys)
sys.setdefaultencoding('utf8')
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With