Code:
import nltk
eng_lish= open("C:/Users/Nouros/Desktop/Thesis/english.csv","rb", encoding='utf8').read()
bang_lish= open("C:/Users/Nouros/Desktop/Thesis/banglish.csv","rb", encoding='utf8').read()
Problem:
Traceback (most recent call last):
File "C:/Users/Nouros/Desktop/Thesis/nltk_run_copy.py", line 3, in <module>
eng_lish= open("C:/Users/Nouros/Desktop/Thesis/english.csv","rb",encoding="utf-8")
ValueError: binary mode doesn't take an encoding argument
You're reading csv files, which are text files, so you need an encoding but not binary mode.
You should therefore not use rb to open them (doing so is advised when using the csv module in Python 2, but it's irrelevant in other contexts).
Just use plain text mode:
open("C:/Users/Nouros/Desktop/Thesis/english.csv","r", encoding='utf8').read()
Personally, I would prefer using the csv module, to avoid splitting lines and columns by hand:
import csv
with open(r"C:\Users\Nouros\Desktop\Thesis\english.csv", "r", encoding='utf8') as f:
    cr = csv.reader(f, delimiter=",")  # "," is the default delimiter
    rows = list(cr)  # create a list of rows, for instance
(Note that the csv module recommends passing newline="" when opening files in Python 3. The problems it prevents show up mostly when writing files, although newlines embedded in quoted fields can also be misread without it.)
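As a rough illustration of the newline="" recommendation, here is a minimal sketch that writes a small CSV file and reads it back. The temporary path and the sample rows are made up for the example and are not from the question; the point is only that a field containing a line break survives the round trip when both open() calls pass newline="".

```python
import csv
import os
import tempfile

# Hypothetical sample file in a temporary directory (not the asker's data).
path = os.path.join(tempfile.mkdtemp(), "sample.csv")

# Write in text mode with an explicit encoding and newline="".
with open(path, "w", encoding="utf8", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["word", "note"])
    writer.writerow(["bhalo", "first line\nsecond line"])

# Read it back the same way: text mode, encoding given, no "b" in the mode.
with open(path, "r", encoding="utf8", newline="") as f:
    rows = list(csv.reader(f))

print(rows)
```

The second row keeps its embedded line break as part of a single field, because the csv module is left to handle line endings itself.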
Binary mode by definition does not take an encoding, because you are reading raw bytes. An encoding is only relevant when you want to read text. Different encodings interpret the same bytes differently: in some encodings a single byte represents a character, while in others a character may take several bytes. This is the whole purpose of an encoding: to map between bytes and text characters.
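To make the difference concrete, here is a small sketch that reads the same file in both modes. The file name and its contents ("café") are assumptions chosen so that one character takes two bytes in UTF-8.

```python
import os
import tempfile

# Hypothetical sample file; "é" occupies two bytes in UTF-8.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf8") as f:
    f.write("café")

# Binary mode: no encoding involved, the result is a bytes object.
with open(path, "rb") as f:
    raw = f.read()

# Text mode: the bytes are decoded into a str using the given encoding.
with open(path, "r", encoding="utf8") as f:
    text = f.read()

print(type(raw).__name__, len(raw))    # bytes 5
print(type(text).__name__, len(text))  # str 4

# Decoding the raw bytes by hand gives the same text.
same = raw.decode("utf8") == text

# Combining binary mode with an encoding reproduces the error from the question.
try:
    open(path, "rb", encoding="utf8")
    error = None
except ValueError as exc:
    error = str(exc)
print(error)
```

So if bytes are really what you want, open with "rb" and drop the encoding argument; if you want text, keep the encoding and drop the "b".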