I want to download the sign language dataset from Kaggle into my Colab notebook.
So far I have always used wget with the direct link to the zip file, for example:
!wget --no-check-certificate \
https://storage.googleapis.com/laurencemoroney-blog.appspot.com/rps.zip \
-O /tmp/rps.zip
However, when I right-click the download button on Kaggle, select "Copy link", and paste the result, I get:
https://www.kaggle.com/datamunge/sign-language-mnist/download
When I open this link in my browser, it prompts me to download a file named 3258_5337_bundle_archive.zip.
So I tried:
!wget --no-check-certificate \
https://www.kaggle.com/datamunge/sign-language-mnist/download3258_5337_bundle_archive.zip \
-O /tmp/kds.zip
It does not work: either the file cannot be found, or the downloaded zip archive is only a few KB instead of about 101 MB, and unzipping it fails.
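A download of only a few KB instead of ~101 MB most likely means wget saved something other than the archive (for example an HTML page, since Kaggle downloads normally require being signed in). A minimal sketch to check this from Python, using only the standard library; the function name inspect_download is hypothetical:

```python
import zipfile
from pathlib import Path


def inspect_download(path):
    """Return (size_in_bytes, is_zip) for a downloaded file."""
    p = Path(path)
    size = p.stat().st_size
    # zipfile.is_zipfile looks for the zip "end of central directory" record;
    # an HTML page saved under a .zip name does not have one.
    return size, bool(zipfile.is_zipfile(p))
```

In Colab you could call print(inspect_download("/tmp/kds.zip")) right after the wget cell; a False in the second position confirms the file is not a real archive.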
How can I download this file into Colab, ideally directly with wget?
Kaggle recommends using their own API instead of wget or rsync.
First, create a Kaggle API token: on Kaggle's website, go to "My Account", scroll to the API section, and click "Create New API Token". This downloads a kaggle.json file to your machine.
Then run the following in Google Colab:
from google.colab import files
files.upload() # Browse for the kaggle.json file that you downloaded
# Create the ~/.kaggle directory (-p: no error if it already exists), copy kaggle.json there, and restrict the file's permissions.
! mkdir -p ~/.kaggle
! cp kaggle.json ~/.kaggle/
! chmod 600 ~/.kaggle/kaggle.json
# You can check if everything's okay by running this command.
! kaggle datasets list
# Download and unzip sign-language-mnist dataset into '/usr/local'
! kaggle datasets download -d datamunge/sign-language-mnist --path '/usr/local' --unzip
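If you prefer to do the token setup from Python instead of shell commands, here is a minimal sketch of the same mkdir/cp/chmod steps using only the standard library. The function name install_kaggle_token is hypothetical, and it assumes kaggle.json is in the current working directory after files.upload(), as the cp command above also assumes. Because mkdir uses exist_ok=True, rerunning the cell is safe.

```python
import os
import shutil
import stat
from pathlib import Path


def install_kaggle_token(src="kaggle.json", config_dir="~/.kaggle"):
    """Copy kaggle.json into config_dir and make it readable only by you.

    Mirrors: mkdir -p ~/.kaggle; cp kaggle.json ~/.kaggle/;
    chmod 600 ~/.kaggle/kaggle.json
    """
    target_dir = Path(config_dir).expanduser()
    target_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    target = target_dir / "kaggle.json"
    shutil.copyfile(src, target)
    os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)  # equivalent to chmod 600
    return target
```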
Used info from here: https://www.kaggle.com/general/74235
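The -d argument is the owner/dataset part of the dataset's page URL, not the download link or the bundle file name. A small sketch that derives it from a copied URL; the helper name dataset_slug is hypothetical, and it assumes URLs of the form kaggle.com/<owner>/<dataset>/... or kaggle.com/datasets/<owner>/<dataset>/...:

```python
from urllib.parse import urlparse


def dataset_slug(url):
    """Turn a Kaggle dataset URL into the owner/dataset slug the CLI expects."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    # Skip a leading "datasets" segment if the URL uses that layout.
    if parts and parts[0] == "datasets":
        parts = parts[1:]
    if len(parts) < 2:
        raise ValueError(f"not a dataset URL: {url}")
    return f"{parts[0]}/{parts[1]}"
```

For the link from the question, dataset_slug("https://www.kaggle.com/datamunge/sign-language-mnist/download") gives "datamunge/sign-language-mnist", the value passed to -d above.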
This is the simplest way I have found (for a competition, use kaggle competitions download with the competition's name instead of kaggle datasets download):
import os
os.environ['KAGGLE_USERNAME'] = "xxxx"
os.environ['KAGGLE_KEY'] = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
!kaggle datasets download -d iarunava/happy-house-dataset
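Hardcoding the key in the notebook exposes it to anyone you share the notebook with. A minimal alternative sketch, assuming the kaggle.json file from your account page (which holds a "username" and a "key" field) is available in the working directory: read it and set the same two environment variables before running the kaggle command. The helper name export_kaggle_credentials is hypothetical.

```python
import json
import os


def export_kaggle_credentials(json_path="kaggle.json"):
    """Set KAGGLE_USERNAME and KAGGLE_KEY from a kaggle.json token file."""
    with open(json_path) as f:
        token = json.load(f)  # expected shape: {"username": ..., "key": ...}
    os.environ["KAGGLE_USERNAME"] = token["username"]
    os.environ["KAGGLE_KEY"] = token["key"]
    return token["username"]
```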