NLTK: set proxy server

Tags:

python

nltk

proxy-server

I'm trying to learn NLTK (the Natural Language Toolkit, written in Python) and I want to install a sample data set to run some examples.

My web connection uses a proxy server, and I'm trying to specify the proxy address as follows:

>>> nltk.set_proxy('http://proxy.example.com:3128' ('USERNAME', 'PASSWORD'))
>>> nltk.download()

But I get an error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object is not callable

I decided to set up a ProxyBasicAuthHandler before calling nltk.download():

import urllib2

auth_handler = urllib2.ProxyBasicAuthHandler(urllib2.HTTPPasswordMgrWithDefaultRealm())
auth_handler.add_password(realm=None, uri='http://proxy.example.com:3128/', user='USERNAME', passwd='PASSWORD')
opener = urllib2.build_opener(auth_handler)
urllib2.install_opener(opener)

import nltk
nltk.download()

But now I get HTTP Error 407 - Proxy Authentication Required.

The documentation says that if the proxy is set to None then this function will attempt to detect the system proxy. But it isn't working.

How can I install a sample data set for NLTK?

764

asked Dec 17 '12 05:12

ymn

7 Answers

There is an error on the website where you got the code for your first attempt (I have seen the same error there).

The line in error is

nltk.set_proxy('http://proxy.example.com:3128' ('USERNAME', 'PASSWORD'))

You need a comma to separate the arguments. The correct line should be

nltk.set_proxy('http://proxy.example.com:3128', ('USERNAME', 'PASSWORD'))

This will work just fine.
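
To see why the missing comma produces that particular TypeError, here is a minimal sketch that does not need NLTK at all. The function name is just for illustration; the point is that a string followed directly by a parenthesised tuple is parsed as a call on the string.

```python
def call_string_like_function():
    proxy = 'http://proxy.example.com:3128'
    try:
        # Without a comma, "string (tuple)" is parsed as calling the string
        # with the tuple's items as arguments.
        proxy('USERNAME', 'PASSWORD')
    except TypeError as exc:
        return str(exc)
    return None

message = call_string_like_function()
print(message)
```

With the comma in place, the URL and the credentials are passed as two separate arguments instead.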

127

answered Oct 05 '22 21:10

demongolem

I run NLTK 3.2.5 and Python 3.6 on Windows 10. I use this script:

nltk.set_proxy('http://user:password@proxy.example.com:3128')
nltk.download()
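
One caveat when putting credentials in the URL: characters such as @ or : in the user name or password have to be percent-encoded, otherwise the URL is ambiguous. A small sketch using only the standard library follows. The helper name build_proxy_url is made up for this example, and it assumes the proxy client decodes the credentials again (Python's urllib proxy handling does this, as far as I know).

```python
from urllib.parse import quote

def build_proxy_url(host, port, user, password):
    # safe="" forces characters like "@", ":" and "/" to be percent-encoded
    user_part = quote(user, safe="")
    password_part = quote(password, safe="")
    return "http://{}:{}@{}:{}".format(user_part, password_part, host, port)

proxy_url = build_proxy_url("proxy.example.com", 3128, "user", "p@ss:word")
print(proxy_url)
```

The resulting string can then be passed to nltk.set_proxy() in place of the hand-written URL.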

answered Oct 05 '22 23:10

jcpg

I was getting the same error too, but I found a solution that works reliably. You need to download nltk_data MANUALLY and put it in the /usr/share/nltk_data directory on Linux, or in C:\nltk_data on Windows.
Here are the steps to follow:
1. Download the nltk_data zip file from this GitHub link:
https://github.com/nltk/nltk_data/tree/gh-pages
2. The data comes as a zip archive, so extract it.
3. For Ubuntu users in particular, the following command opens a file manager with root privileges, which makes the copy/paste steps easier:
sudo nautilus
You can then copy into /usr/share or create a folder there.
4. On Linux, create a folder named nltk_data in /usr/share; on Windows, create the same folder in C:\.
5. Paste all the contents of nltk_data-gh-pages (which you just extracted) into the nltk_data folder you just created.
6. From the nltk_data/packages folder, copy all the folders and paste them into the nltk_data folder. Now you are done.
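
If you prefer to script the extract-and-copy steps, here is a rough sketch using only the standard library. The function name and the default root folder name nltk_data-gh-pages are assumptions based on the steps above, so adjust them to match your archive. It needs Python 3.8+ for dirs_exist_ok, and writing to a system directory may need elevated permissions.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path

def install_packages_from_zip(zip_path, target_dir, root_name="nltk_data-gh-pages"):
    """Copy every folder under <root_name>/packages in the archive into target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    with tempfile.TemporaryDirectory() as tmp:
        # Step 2: extract the downloaded archive to a scratch directory
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(tmp)
        packages = Path(tmp) / root_name / "packages"
        # Step 6: copy each folder under packages/ into the nltk_data folder
        for entry in sorted(packages.iterdir()):
            if entry.is_dir():
                shutil.copytree(entry, target / entry.name, dirs_exist_ok=True)
                copied.append(entry.name)
    return copied
```

It returns the names of the folders it copied, so you can check that folders such as corpora ended up in the target directory.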

Since this is my first answer, I may not have explained the process clearly. If you have trouble with any of these steps, please leave a comment.

answered Oct 05 '22 23:10

Ankit Maurya

The options suggested above did not work for me. Here's what worked in my Windows environment: remove the parentheses around the credentials and pass them as separate arguments.

nltk.set_proxy('http://proxy.example.com:3128', 'USERNAME', 'PASSWORD')

answered Oct 05 '22 21:10

DACW

I run NLTK 3.0 and Python 3.4 on Windows, and proxy authentication works well if I remove the brackets around the credentials, so use this script:

nltk.set_proxy('http://proxy.example.com:3128', 'username', 'password')
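
For anyone curious what a call like this plausibly does underneath, here is a sketch with plain urllib (Python 3). This is not NLTK's actual source, only an approximation under the assumption that the helper installs a proxy handler plus a basic-auth handler. build_proxy_opener is a made-up name.

```python
import urllib.request

def build_proxy_opener(proxy, user=None, password=""):
    # Route both http and https requests through the given proxy
    handlers = [urllib.request.ProxyHandler({"http": proxy, "https": proxy})]
    if user is not None:
        # Register the credentials for the proxy, whatever realm it reports
        password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
        password_mgr.add_password(realm=None, uri=proxy, user=user, passwd=password)
        handlers.append(urllib.request.ProxyBasicAuthHandler(password_mgr))
    return urllib.request.build_opener(*handlers)

opener = build_proxy_opener("http://proxy.example.com:3128", "USERNAME", "PASSWORD")
# urllib.request.install_opener(opener) would make it the process-wide default
```

Note that this only covers basic authentication. If your proxy requires another scheme such as NTLM, it will still answer with a 407.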

answered Oct 05 '22 21:10

diah_stis

If you want to install an NLTK corpus manually:

1) Go to http://www.nltk.org/nltk_data/ and download the NLTK corpus file you need.

2) In a Python shell, check the value of nltk.data.path

3) Choose one of the paths that exists on your machine, and unzip the data files into the corpora subdirectory inside it.

4) Now you can import the data: from nltk.corpus import stopwords

Reference: https://medium.com/@satorulogic/how-to-manually-download-a-nltk-corpus-f01569861da9
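
Steps 2 and 3 can be sketched in a few lines. The helper names below are hypothetical. In practice you would pass nltk.data.path as the list of candidates; the example uses a plain list so that it runs without NLTK installed.

```python
from pathlib import Path

def first_existing_dir(candidates):
    # Return the first candidate that is an existing directory, or None
    for candidate in candidates:
        path = Path(candidate)
        if path.is_dir():
            return path
    return None

def corpora_dir(candidates):
    # The corpus files go into the "corpora" subdirectory of that path
    base = first_existing_dir(candidates)
    return None if base is None else base / "corpora"

# With NLTK available: corpora_dir(nltk.data.path)
print(corpora_dir(["/definitely/missing/dir", "."]))
```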

answered Oct 05 '22 22:10

SVK

You can also set the system proxy in bash by setting the appropriate environment variables.

Some of the proxy settings which I keep are:

http_proxy=http://127.0.0.1:3129/
ftp_proxy=http://127.0.0.1:3129/
all_proxy=socks://127.0.0.1:3129/
https_proxy=http://127.0.0.1:3129/

You can make the environment variable changes permanent by editing your ~/.bashrc file. Sample edit:

export http_proxy=http://127.0.0.1:3129/
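
To check that Python actually sees the variable, you can ask urllib which proxies it picks up from the environment. A small sketch follows. It sets the variable inside the process only so the example is self-contained; normally the value would come from your shell.

```python
import os
import urllib.request

# Assumption: same value as in the export line above
os.environ["http_proxy"] = "http://127.0.0.1:3129/"

# Collects the *_proxy variables from the environment into a dict keyed by scheme
proxies = urllib.request.getproxies_environment()
print(proxies.get("http"))
```

If the entry is missing here, a Python tool that relies on system proxy detection is unlikely to find the proxy either.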

answered Oct 05 '22 21:10

Sibi
