I am trying to run the following commands:
import nltk
nltk.download('all')
But I am getting this error:
Traceback (most recent call last):
  File "./update.py", line 3, in <module>
    nltk.download('all')
  File "/usr/lib/python3.6/site-packages/nltk/downloader.py", line 664, in download
    for msg in self.incr_download(info_or_id, download_dir, force):
  File "/usr/lib/python3.6/site-packages/nltk/downloader.py", line 534, in incr_download
    try: info = self._info_or_id(info_or_id)
  File "/usr/lib/python3.6/site-packages/nltk/downloader.py", line 508, in _info_or_id
    return self.info(info_or_id)
  File "/usr/lib/python3.6/site-packages/nltk/downloader.py", line 875, in info
    self._update_index()
  File "/usr/lib/python3.6/site-packages/nltk/downloader.py", line 825, in _update_index
    ElementTree.parse(compat.urlopen(self._url)).getroot())
  File "/usr/lib/python3.6/xml/etree/ElementTree.py", line 1196, in parse
    tree.parse(source, parser)
  File "/usr/lib/python3.6/xml/etree/ElementTree.py", line 597, in parse
    self._root = parser._parse_whole(source)
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 23, column 143
I am new to Python, so I am not really sure what I should do. I looked into the source module reported above and noticed that it is trying to download an XML file. So I ran the command below, and it did not give me any error.
compat.urlopen('https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml')
So I presume the issue is not in the download but in the parser. Can someone suggest how I should proceed from here?
punkt is the package required for tokenization. You can download it with the NLTK download manager, or programmatically with nltk.download('punkt').
index.xml had a typo. It is already patched. Just checked, and nltk.download('all') works fine!
See: nltk/nltk_data#70
The problem is with the XML that NLTK has returned.
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 23, column 143
At 23:143 we see the problem, a missing '=':
... unzip="1" unzipped_size"1917" url="https...
NLTK will surely fix this soon; until then, I'm not sure what the best workaround is.
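To find this kind of problem yourself, you can read the position straight off the exception instead of counting characters by hand. Below is a minimal sketch, not NLTK's own code: locate_parse_error is a hypothetical helper, and the sample string is a made-up fragment that only imitates the missing '=' shown above (it is not the real index.xml). It relies on the position attribute of xml.etree.ElementTree.ParseError, which holds a (line, column) tuple.

```python
import xml.etree.ElementTree as ET


def locate_parse_error(xml_text, context=30):
    """Return (line, column, snippet) for the first XML parse error, or None.

    Hypothetical helper for illustration; `context` is the number of
    characters shown on each side of the reported column.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple.
        line, column = err.position
        lines = xml_text.splitlines()
        bad_line = lines[line - 1] if 0 < line <= len(lines) else ""
        start = max(column - context, 0)
        return line, column, bad_line[start:column + context]
    return None


# Made-up sample that imitates the missing '=' (not the real index.xml).
sample = (
    "<index>\n"
    '  <package id="demo" unzip="1" unzipped_size"1917" '
    'url="https://example.invalid/demo.zip"/>\n'
    "</index>"
)

result = locate_parse_error(sample)
if result is not None:
    line, column, snippet = result
    print("parse error near line", line, "column", column)
    print(snippet)
```

Running the same helper on the text you get back from the index URL should print the area around the broken attribute, which makes it easier to report the problem upstream.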