I want to read a file that contains German characters as well as plain ASCII ones. I found that I can do it like this:
>>> import codecs
>>> file = codecs.open('file.txt', 'r', encoding='UTF-8')
>>> lines = file.readlines()
This works when I run my job in Python IDLE, but when I run it from somewhere else it does not give the correct result. Any ideas?
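Approach 1: This approach uses the third-party library Unidecode (it is not part of the standard library and must be installed separately). The library transliterates non-ASCII characters in Python: it provides a unidecode() function that takes Unicode text and tries to represent it in ASCII. Note that this changes the text (for example, 'ü' becomes 'u') rather than preserving the original characters.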
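A minimal sketch of that transliteration step follows. The helper name to_ascii is hypothetical, and the fallback branch is an assumption added here so the snippet also runs where Unidecode is not installed; it uses the standard unicodedata module, which is a cruder technique and not what Unidecode does internally.

```python
import unicodedata

try:
    # Third-party package, installed with: pip install Unidecode
    from unidecode import unidecode
except ImportError:
    unidecode = None


def to_ascii(text):
    """Return an ASCII approximation of text (hypothetical helper)."""
    if unidecode is not None:
        return unidecode(text)
    # Fallback (assumption, not Unidecode's algorithm): decompose accented
    # letters and drop everything that is not ASCII. Characters without a
    # decomposition, such as 'ß', are simply lost here.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii("Müller"))
```

Transliteration is lossy, so it only suits cases where an ASCII approximation is acceptable; it does not fix a file that is being decoded with the wrong encoding.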
You need to know which character encoding the file uses. If you don't know that beforehand, you can try guessing it with the chardet module. First install it:
$ pip install chardet
Then, for example, read the file in binary mode and pass its bytes to chardet.detect():
>>> import chardet
>>> chardet.detect(open("file.txt", "rb").read())
{'confidence': 0.9690625, 'encoding': 'utf-8'}
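If installing chardet is not an option, a rough alternative is to try a short list of likely encodings in order. This is a different, simpler technique than chardet's statistical detection. The function name decode_with_fallback and the candidate list (UTF-8, then cp1252 for Western European text) are assumptions chosen for illustration.

```python
def decode_with_fallback(raw, encodings=("utf-8", "cp1252")):
    """Decode bytes with the first candidate encoding that succeeds.

    Returns a (text, encoding_name) tuple. Hypothetical helper.
    """
    for name in encodings:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a code point, so this never raises,
    # although the result may not be the text the author intended.
    return raw.decode("latin-1"), "latin-1"


utf8_bytes = "Grüße".encode("utf-8")
cp1252_bytes = "Grüße".encode("cp1252")

print(decode_with_fallback(utf8_bytes))
print(decode_with_fallback(cp1252_bytes))
```

Trying UTF-8 first is deliberate: non-ASCII text saved in a legacy single-byte encoding is rarely valid UTF-8, so a successful UTF-8 decode is a fairly strong signal.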
So then, open the file with the detected encoding:
>>> import codecs
>>> lines = codecs.open('file.txt', 'r', encoding='utf-8').readlines()
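The same read can be sketched with io.open, which accepts an encoding argument on both Python 2 and Python 3 (on Python 3 it is the built-in open). The helper name read_lines and the sample file contents are assumptions for illustration; the with block also closes the file, which the one-liner above leaves to the garbage collector.

```python
import io
import os
import tempfile


def read_lines(path, encoding="utf-8"):
    """Read all lines of a text file, decoding with an explicit encoding."""
    with io.open(path, "r", encoding=encoding) as handle:
        return handle.readlines()


# Write a small sample file with German text, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")
    with io.open(path, "w", encoding="utf-8") as handle:
        handle.write("Straße\nÜbung\n")
    lines = read_lines(path)

print(lines)
```

Passing the encoding explicitly means the result does not depend on the platform's default encoding, which can differ between environments.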