Change nltk.download() path directory from default ~/nltk_data


I was trying to download/update Python nltk packages on a computing server, and it returned this [Errno 122] Disk quota exceeded error.

Specifically:

    [nltk_data] Downloading package stop words to /home/sh2264/nltk_data...
    [nltk_data] Error downloading u'stopwords' from
    [nltk_data] <https://raw.githubusercontent.com/nltk/nltk_data/gh-
    [nltk_data] pages/packages/corpora/stopwords.zip>: [Errno 122]
    [nltk_data] Disk quota exceeded:
    [nltk_data] u'/home/sh2264/nltk_data/corpora/stopwords.zip
    False

How can I change the entire path for nltk packages, and what other changes should I make to ensure nltk loads without errors?


asked Jul 01 '17 04:07

shenglih

2 Answers

This can be configured either from Python code (nltk.download(..., download_dir=)) or in the GUI. Bizarrely, nltk seems to ignore its own environment variable NLTK_DATA entirely and default its download directory to a standard set of five paths, regardless of whether NLTK_DATA is defined and where it points, and regardless of whether nltk's five default dirs even exist on the machine or architecture(!). Some of that is documented in Installing NLTK Data, although it's incomplete and rather buried; it is reproduced below with much clearer formatting:

Command line installation

The downloader will search for an existing nltk_data directory to install NLTK data. If one does not exist it will attempt to create one in a central location (when using an administrator account) or otherwise in the user’s filespace. If necessary, run the download command from an administrator account, or using sudo. The recommended system location is:

C:\nltk_data (Windows);

/usr/local/share/nltk_data (Mac) and

/usr/share/nltk_data (Unix).

You can use the -d flag to specify a different location (but if you do this, be sure to set the NLTK_DATA environment variable accordingly).

Run the command python -m nltk.downloader all

To ensure central installation, run the command: sudo python -m nltk.downloader -d /usr/local/share/nltk_data all

But really, the docs should say: sudo python -m nltk.downloader -d $NLTK_DATA all
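For scripted setup (for example in a provisioning script on a server), the command-line form above can be driven from Python. This is a minimal sketch with hypothetical helper names (downloader_command, run_downloader): it passes the same directory to the -d flag and to NLTK_DATA for the child process. It does not handle sudo, so it assumes the target directory is already writable by the current user.

```python
import os
import subprocess
import sys


def downloader_command(data_dir, packages=("all",)):
    """Build the argv for `python -m nltk.downloader -d DATA_DIR PKG ...`."""
    return [sys.executable, "-m", "nltk.downloader", "-d", str(data_dir), *packages]


def run_downloader(data_dir, packages=("all",)):
    """Run the command-line downloader with NLTK_DATA pointing at the same directory.

    The env change applies only to the child process; export NLTK_DATA in your
    shell profile as well so later Python sessions search this directory.
    """
    env = dict(os.environ, NLTK_DATA=str(data_dir))
    # check=True raises subprocess.CalledProcessError on a non-zero exit status
    return subprocess.run(downloader_command(data_dir, packages), env=env, check=True)
```

For instance, run_downloader("/opt/share/nltk_data", ["stopwords"]) fetches only the stopwords corpus instead of all packages.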

Now, as for which path NLTK_DATA should point to, nltk doesn't give any proper guidance, but it should be a generic standalone path that is not under any install tree (so not under <python-install-directory>/lib/site-packages) or under any user directory: hence /usr/local/share, /opt/share or similar. On macOS 10.7+, /usr (and thus /usr/local) is hidden by default in Finder, so /opt/share may well be a better choice. Alternatively, run chflags nohidden /usr/local/share.
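The rule of thumb above can be turned into a quick sanity check before exporting NLTK_DATA. This is a heuristic sketch with a hypothetical function name (is_standalone_data_dir); it assumes "install tree" means the interpreter's stdlib and site-packages directories as reported by sysconfig, and "user dir" means the current home directory.

```python
import sysconfig
from pathlib import Path


def is_standalone_data_dir(candidate):
    """Heuristic check for the NLTK_DATA rule of thumb.

    Returns False if `candidate` is inside the interpreter's stdlib or
    site-packages directories, or inside the current user's home directory.
    """
    path = Path(candidate).expanduser().resolve()
    paths = sysconfig.get_paths()
    excluded = [paths["stdlib"], paths["purelib"], paths["platlib"], Path.home()]
    for root in excluded:
        root = Path(root).resolve()
        # path is the root itself or lies somewhere beneath it
        if path == root or root in path.parents:
            return False
    return True
```

It deliberately does not reject everything under sys.prefix: a system Python often has prefix /usr, which would otherwise rule out the recommended /usr/local/share. On a typical machine, is_standalone_data_dir("/opt/share/nltk_data") should return True.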


answered Oct 05 '22 23:10

smci

According to the documentation:

By default, packages are installed in either a system-wide directory (if Python has sufficient access to write to it); or in the current user’s home directory. However, the download_dir argument may be used to specify a different installation target, if desired.
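The default rule quoted above explains the error in the question: presumably no system-wide directory was writable on the shared server, so the download fell back to /home/sh2264/nltk_data and hit the home quota. A simplified model of that rule (not nltk's own code; pick_download_dir and its parameters are hypothetical):

```python
import os
from pathlib import Path


def pick_download_dir(system_dirs, home=None):
    """Simplified model of the documented default (not nltk's actual implementation).

    Return the first system-wide directory that exists and is writable by the
    current user; otherwise fall back to <home>/nltk_data.
    """
    for d in system_dirs:
        if os.path.isdir(d) and os.access(d, os.W_OK):
            return str(d)
    home = Path.home() if home is None else Path(home)
    return str(home / "nltk_data")
```

Passing download_dir explicitly sidesteps this fallback entirely.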

For example, to specify the download directory:

import nltk
nltk.download('treebank', download_dir='/mnt/data/treebank')
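Downloading to a custom directory is only half of it: nltk also has to search that directory when loading. A minimal sketch with hypothetical helper names (use_nltk_data_dir, download_into), assuming nltk reads NLTK_DATA as an os.pathsep-separated list when it is first imported and keeps its search path in the list nltk.data.path:

```python
import os
from pathlib import Path


def use_nltk_data_dir(data_dir):
    """Create data_dir if needed, prepend it to NLTK_DATA, and return its absolute path.

    NLTK_DATA is read when nltk is imported, so this only affects nltk if it is
    called before `import nltk` (or for child processes started afterwards).
    """
    path = str(Path(data_dir).expanduser().resolve())
    os.makedirs(path, exist_ok=True)
    entries = [p for p in os.environ.get("NLTK_DATA", "").split(os.pathsep) if p]
    if path not in entries:
        os.environ["NLTK_DATA"] = os.pathsep.join([path] + entries)
    return path


def download_into(package, data_dir):
    """Download `package` into data_dir and make it loadable in this session."""
    import nltk  # only this function needs nltk; use_nltk_data_dir() is stdlib-only

    path = use_nltk_data_dir(data_dir)
    # nltk.data.path is already built if nltk was imported, so add the directory explicitly
    if path not in nltk.data.path:
        nltk.data.path.insert(0, path)
    # download_dir overrides the default location (~/nltk_data in the question)
    return nltk.download(package, download_dir=path)
```

Usage, with a hypothetical path on a volume that has free quota: download_into("stopwords", "/scratch/sh2264/nltk_data"), then from nltk.corpus import stopwords and stopwords.words("english"). For new sessions, export NLTK_DATA in your shell profile instead of relying on the in-process change.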

answered Oct 06 '22 01:10

Ortomala Lokni
