UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 257: invalid start byte

Question

I am new to Python and want to apply preprocessing steps, but I get a decoding error. Here is my code:

import nltk
from nltk.tokenize import word_tokenize,sent_tokenize
from nltk.corpus import stopwords
from nltk.tag import pos_tag
from nltk.stem import PorterStemmer

ps = PorterStemmer()
print("\n Reading file without stopwords.")
text_file = open('preprocessing.txt', encoding='utf-8').read()
stop_words = set(stopwords.words("english"))
words = word_tokenize(text_file)
filtered_sentence = [w for w in words if w not in stop_words]
print(filtered_sentence)
print("\n Removed stopword.")
print(stop_words)
print("\n Stemming.")
# Stem each token; iterating over text_file itself would yield single characters
for w in words:
    print(ps.stem(w))
    print(w)
print(sent_tokenize(text_file))
print("\n tokenization.")
print(word_tokenize(text_file))
print("\n part of speech tagging.")
print(pos_tag(words))

I want to show the result in a specific format, but the output is:

", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 257: invalid start byte

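For context, the error can be reproduced without the file. Byte 0x92 is a continuation byte in UTF-8, so it cannot start a character, while in Windows-1252 (cp1252) it is the right single quotation mark that word processors often insert as a curly apostrophe. The sketch below uses a made-up byte string, not the actual contents of preprocessing.txt, and only assumes that the file was saved in a Windows-1252-style encoding.

```python
# Hypothetical sample: "don't" with a curly apostrophe stored as byte 0x92
raw = b"don\x92t"

try:
    raw.decode("utf-8")
    error_message = None
except UnicodeDecodeError as exc:
    # Same kind of failure as in the traceback: 0x92 is not a valid start byte
    error_message = str(exc)

print(error_message)

# Decoding the same bytes as Windows-1252 yields the curly apostrophe (U+2019)
decoded = raw.decode("cp1252")
print(decoded)
```

If the real file behaves the same way, the data is probably not UTF-8 at all, and the encoding passed to open() is what needs to change.
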
Rahila T - Intel · Accepted Answer

Please try to read the data using encoding='unicode_escape'. For example:

text_file = open('preprocessing.txt', encoding='unicode_escape').read()

This resolved the UnicodeDecodeError for me.

Alternatively, pass the file name as a raw string (this only makes a difference when the path contains backslashes):

text_file = open(r'preprocessing.txt', encoding='unicode_escape').read()
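One caveat with the approach above: the unicode_escape codec decodes bytes as Latin-1 and also interprets backslash sequences, so byte 0x92 comes out as the control character U+0092 rather than an apostrophe. If the file was actually saved as Windows-1252, where 0x92 is the right single quotation mark, decoding it as cp1252 keeps the punctuation readable. The helper below is a minimal sketch under that assumption. The name read_text_with_fallback and the encoding order are illustrative choices, not part of the original answer.

```python
from pathlib import Path


def read_text_with_fallback(path, encodings=("utf-8", "cp1252")):
    """Return the file's text, trying each candidate encoding in order."""
    data = Path(path).read_bytes()
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: decode with the first encoding and replace bad bytes with U+FFFD
    return data.decode(encodings[0], errors="replace")
```

With this in place, the line that reads the file in the question could become text_file = read_text_with_fallback('preprocessing.txt'), and the rest of the script stays the same.
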

Tags:

python-3.x

umarsaleem