I am running the following code in IDLE (Python). I want to enter an Arabic string and get its stem, but it doesn't work:
>>> from nltk.stem.isri import ISRIStemmer
>>> st = ISRIStemmer()
>>> w= 'حركات'
>>> join = w.decode('Windows-1256')
>>> print st.stem(join).encode('Windows-1256').decode('utf-8')
The result is the same text as in w, 'حركات', which is not the stem.
But when I do the following:
>>> print st.stem(u'اعلاميون')
it succeeds and returns the stem, which is 'علم'.
Why does passing some words to the stem() function not return the stem?
The code above won't work in Python 3, because a str there is already Unicode text and has no decode() method; only bytes objects can be decoded. So there is no need to decode from UTF-8 anymore.
Here is the new code that should work just fine in Python 3.
import nltk
from nltk.stem.isri import ISRIStemmer
st = ISRIStemmer()
w= 'حركات'
print(st.stem(w))
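If the word does not come from a literal but from a file opened in binary mode or from a socket, it may still arrive as bytes in Python 3. A minimal sketch of normalizing the input before stemming follows; to_text is a hypothetical helper (not part of NLTK), and UTF-8 as the default encoding is an assumption about the data source.

```python
def to_text(value, encoding='utf-8'):
    """Return value as str, decoding it only if it is still bytes.

    to_text is a hypothetical helper for illustration; the default
    encoding is an assumption and must match how the bytes were produced.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Both calls produce the same Unicode string, ready to pass to st.stem().
from_bytes = to_text('حركات'.encode('utf-8'))
from_str = to_text('حركات')
print(from_bytes == from_str)
```

The result of to_text(...) can then be passed to st.stem() exactly as w is in the code above.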
OK, I solved the problem myself (in Python 2) using the following:
w = 'حركات'
st.stem(w.decode('utf-8'))
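and it gives the root correctly, which is 'حرك'. The original attempt failed because the byte string was UTF-8 encoded, so decoding it as Windows-1256 produced the wrong characters.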
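To see why the codec matters, here is a small sketch written in Python 3. It assumes the word reached the interpreter as UTF-8 bytes, which is what the working fix suggests but is not confirmed for every console. Decoding those bytes with the wrong codec yields a different string, so the stemmer never sees the real Arabic word.

```python
word = 'حركات'

# Assumption: this is the byte string a UTF-8 console hands to Python 2.
raw = word.encode('utf-8')

# Decoding with the wrong codec maps each byte to an unrelated character.
wrong = raw.decode('windows-1256', errors='replace')

# Decoding with the codec that matches the bytes restores the word.
right = raw.decode('utf-8')

print(len(word), len(raw), len(wrong))
print(right == word)
print(wrong == word)
```

The five-letter word becomes ten bytes in UTF-8, and the Windows-1256 decode turns those into ten unrelated characters. A stemmer given that string has nothing to match against.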