I am facing some problem for accessing nltk data. I have tried nltk.download(). The gui page has come with HTTP Error 403: Forbidden error. I have also try to install from command line which is provided here.
python -m nltk.downloader all
and get this error.
C:\Python36\lib\runpy.py:125: RuntimeWarning: 'nltk.downloader' found in sys.modules after import of package 'nltk', but prior to execution of 'nltk.downloader'; this may result in unpredictable behaviour warn(RuntimeWarning(msg)) [nltk_data] Error loading all: HTTP Error 403: Forbidden.
I also go through How do I download NLTK data? and Failed loading english.pickle with nltk.data.load.
The problem is coming from the nltk download server. If you look at the gui's config, it's pointing to this link
https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml
If you access this link in the browser, you get this as a message :
Error 403 Forbidden.
Forbidden.
Guru Mediation:
Details: cache-lcy1125-LCY 1501134862 2002107460
Varnish cache server
So, I was going to file an issue on github, but someone else already did that here : https://github.com/nltk/nltk/issues/1791
A workaround was suggested here: https://github.com/nltk/nltk/issues/1787.
Based on the discussion on github:
It seems like the Github is down/blocking access to the raw content on the repo.
The suggested workaround is to manually download as follows:
PATH_TO_NLTK_DATA=/home/username/nltk_data/
wget https://github.com/nltk/nltk_data/archive/gh-pages.zip
unzip gh-pages.zip
mv nltk_data-gh-pages/ $PATH_TO_NLTK_DATA
People also suggested using an laternative index as follows:
python -m nltk.downloader -u https://pastebin.com/raw/D3TBY4Mj punkt
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With