
Trying to pull text data with the keras get_file function

Tags: python, text, keras

I am currently looking at a keras program that tries to generate text data using a CNN. In the code provided to me by my professor, I use the function:

path = get_file('input.txt', origin='https://www.dropbox.com/s/2z0zdn54cqu3cqj/input.txt?dl=0')

get_file is imported with:

from keras.utils.data_utils import get_file

The original text corpus provided to us worked just fine. However, whenever I changed the origin URL inside the get_file call and changed the name the file is saved under, I started getting HTML code instead of text. Is there a particular reason for this? For example, I get HTML code with both https://github.com/nlp-compromise/nlp-corpus/blob/master/poe/man_of_crowd.txt and https://raw.githubusercontent.com/nlp-compromise/nlp-corpus/master/poe/man_of_crowd.txt (the second link is the raw file).

SDG asked Jan 28 '26 23:01


1 Answer

The first link, https://github.com/nlp-compromise/nlp-corpus/blob/master/poe/man_of_crowd.txt, looks as if it resolves to a text file, but it is actually an HTML page on GitHub that displays the file. That is why you get HTML code when downloading from this link.
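As a rough illustration of the difference between the two URL shapes, here is a minimal sketch that rewrites a GitHub "blob" page URL into the corresponding raw URL. The helper name github_blob_to_raw is hypothetical (it is not part of Keras or GitHub tooling), and it assumes the simple /<user>/<repo>/blob/<ref>/<path> layout seen in the question:

```python
from urllib.parse import urlsplit, urlunsplit


def github_blob_to_raw(url: str) -> str:
    """Rewrite a github.com '/<user>/<repo>/blob/<ref>/<path>' URL to its raw form.

    Hypothetical helper for illustration; URLs that do not match that
    shape are returned unchanged.
    """
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Expected shape: [user, repo, "blob", ref, path...]
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        return url
    user, repo, _, *rest = segments  # drop the "blob" segment
    raw_path = "/" + "/".join([user, repo, *rest])
    return urlunsplit((parts.scheme, "raw.githubusercontent.com", raw_path, "", ""))


blob_url = "https://github.com/nlp-compromise/nlp-corpus/blob/master/poe/man_of_crowd.txt"
raw_url = github_blob_to_raw(blob_url)
print(raw_url)
```

For the URL in the question, this produces the same raw.githubusercontent.com address that the asker listed as the second link.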

The second link, https://raw.githubusercontent.com/nlp-compromise/nlp-corpus/master/poe/man_of_crowd.txt, is the raw link and points directly to the text file. When you download the file using:

>>> from keras.utils.data_utils import get_file
>>> path = get_file('man_of_crowd.txt',
...                 'https://raw.githubusercontent.com/nlp-compromise/nlp-corpus/master/poe/man_of_crowd.txt')

Downloading data from https://raw.githubusercontent.com/nlp-compromise/nlp-corpus/master/poe/man_of_crowd.txt
16384/20391 [=======================>......] - ETA: 0s

It is downloaded as a plain text file and saved at this path:

>>> print(path)
/home/<username>/.keras/datasets/man_of_crowd.txt

Under the hood, the Keras utility function uses urllib.request through a six wrapper, so it saves whatever the URL returns. The source code for get_file can be found in the Keras GitHub repository.
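Because the download step does not care what kind of content comes back, a quick check on the saved file can catch an HTML page before it is used as a corpus. The following is only a heuristic sketch: looks_like_html is a hypothetical helper, not a Keras function, and the two temporary files stand in for downloaded files so the example runs without network access:

```python
import os
import tempfile


def looks_like_html(path: str, probe_bytes: int = 512) -> bool:
    """Heuristic: does the start of the file look like an HTML document?"""
    with open(path, "rb") as f:
        head = f.read(probe_bytes).lstrip().lower()
    return head.startswith((b"<!doctype html", b"<html"))


# Demo with two temporary files standing in for downloaded files.
with tempfile.TemporaryDirectory() as tmp:
    html_path = os.path.join(tmp, "page.txt")
    text_path = os.path.join(tmp, "corpus.txt")
    with open(html_path, "wb") as f:
        f.write(b"\n<!DOCTYPE html>\n<html><head></head><body></body></html>")
    with open(text_path, "wb") as f:
        f.write(b"Plain text stands in for the corpus here.")
    html_result = looks_like_html(html_path)
    text_result = looks_like_html(text_path)

print(html_result, text_result)
```

In the setting above, the same check could be applied to the path returned by get_file, for example looks_like_html(path), before reading the file as training text.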

shubhamsingh answered Jan 31 '26 12:01


