How to read a utf-8 encoded text file using Python

Tags: python, encoding, utf-8

I need to analyse a text file in Tamil (UTF-8 encoded). I'm using the nltk package for Python in IDLE. When I try to read the text file, I get the error below. How do I avoid this?

corpus = open('C:\\Users\\Customer\\Desktop\\DISSERTATION\\ettuthokai.txt').read()

Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    corpus = open('C:\\Users\\Customer\\Desktop\\DISSERTATION\\ettuthokai.txt').read()
  File "C:\Users\Customer\AppData\Local\Programs\Python\Python35-32\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 33: character maps to <undefined>


asked Dec 01 '16 19:12

Ramprashanth

1 Answer

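Since you are using Python 3, just add the encoding parameter to open(). Without it, Python falls back to the platform's default encoding (cp1252 here, as the traceback shows), which cannot decode the UTF-8 bytes of the file: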

corpus = open(
    r"C:\Users\Customer\Desktop\DISSERTATION\ettuthokai.txt", encoding="utf-8"
).read()
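
A slightly fuller sketch of the same idea, in case it helps: it wraps the read in a with-block so the file is closed afterwards, and it also reproduces the original error on a short Tamil string. The helper name read_utf8_text, the sample string, and the temporary file are assumptions made for illustration; they are not part of the question.

```python
import os
import tempfile


def read_utf8_text(path, encoding="utf-8"):
    """Return the whole file as a str, decoded with the given encoding."""
    # The with-block closes the file even if decoding raises an error.
    with open(path, encoding=encoding) as handle:
        return handle.read()


# Hypothetical sample text; a throwaway file stands in for ettuthokai.txt.
sample = "எட்டுத்தொகை"
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as tmp:
    tmp.write(sample)
    tmp_path = tmp.name

corpus = read_utf8_text(tmp_path)
os.remove(tmp_path)

# Decoding the same UTF-8 bytes as cp1252 fails, as in the traceback:
# the Tamil pulli sign encodes to a byte sequence ending in 0x8d,
# which cp1252 leaves undefined.
try:
    sample.encode("utf-8").decode("cp1252")
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True
```

With the real file, the call would be read_utf8_text(r"C:\Users\Customer\Desktop\DISSERTATION\ettuthokai.txt"). If the file was saved with a byte order mark, passing encoding="utf-8-sig" strips it on read.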

answered Oct 04 '22 17:10

Antonis Christofides
