
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 100: character maps to <undefined>

I have two files in the same directory:

  1. http://nlp.lsi.upc.edu/awn/AWNDatabaseManagement.py.gz

  2. The XML database of Arabic WordNet, upc_db.xml (http://nlp.lsi.upc.edu/awn/get_bd.php)

When I try to run the .py file, it gives me the error shown in the image. I am trying to check that the .py file works so I can import it as a WordNet for Arabic words.

Can you help me through it?

Thanks

image for error

Abdelrahman Yasser asked Oct 17 '25 06:10



2 Answers

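The XML database is a text file, so pass encoding="utf-8" when opening it (assuming the database is UTF-8 encoded). Without an explicit encoding, open() falls back to the platform default, which on Windows is often cp1252 (the 'charmap' codec named in the error), and byte 0x90 has no mapping there. UTF-8 can encode all 1,112,064 valid Unicode code points using one to four one-byte code units, so it is the simplest choice.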
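As a rough illustration of why the explicit encoding matters, here is a minimal sketch. The file name sample.xml and the sample word are made up for the demo, and cp1252 is passed explicitly only to imitate a typical Windows default; it is not taken from AWNDatabaseManagement.py.

```python
import os
import tempfile

# Hypothetical sample: an Arabic word containing U+0650 (kasra),
# which UTF-8 encodes as the two bytes D9 90.
sample = "\u0643\u0650\u062a\u0627\u0628"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.xml")  # stand-in for upc_db.xml
    with open(path, "w", encoding="utf-8") as f:
        f.write(sample)

    # Imitate the Windows default codec: cp1252 has no character for 0x90.
    try:
        with open(path, encoding="cp1252") as f:
            f.read()
        cp1252_failed = False
    except UnicodeDecodeError:
        cp1252_failed = True

    # Naming the encoding explicitly reads the text back intact.
    with open(path, encoding="utf-8") as f:
        text = f.read()
```

If the real database turned out to use a different encoding, the same pattern would apply with that encoding's name instead of "utf-8".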

Muhammad Afzaal answered Oct 18 '25 19:10



To read the XML file without decoding errors, open it in binary mode:

ent = open(ent, 'rb')

instead of

ent = open(ent)
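
Binary mode works because Python then returns raw bytes and never applies the default codec. A minimal sketch of that idea follows; the file name sample_db.xml and the element names db and word are hypothetical, and xml.etree.ElementTree is used here only as a convenient standard-library parser, not necessarily the one AWNDatabaseManagement.py relies on.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical miniature database; the real upc_db.xml layout may differ.
word = "\u0643\u0650\u062a\u0627\u0628"
xml_text = '<?xml version="1.0" encoding="utf-8"?><db><word value="%s"/></db>' % word

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample_db.xml")
    with open(path, "wb") as f:
        f.write(xml_text.encode("utf-8"))

    # 'rb' yields bytes, so no codec is applied and nothing can fail to decode.
    with open(path, "rb") as f:
        raw = f.read()

    # The parser accepts a binary file object and decodes using the
    # encoding named in the XML declaration.
    with open(path, "rb") as f:
        root = ET.parse(f).getroot()

value = root[0].get("value")
```

Note that code expecting str from the file object will see bytes in binary mode, so any later string handling has to decode them explicitly.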
Abdelrahman Yasser answered Oct 18 '25 19:10
