
Is there a command to download data from a particular folder of a Kaggle competition using the Kaggle API?

I'm trying to download data from the Kaggle competition state-farm-distracted-driver-detection.

The dataset has the following directory structure:

|-driver_imgs_list.csv
|-sample-submission.csv
|imgs
|   |test
|   |train
|       |c0
|       |c1
|       |c2
|          |-img_100029.jpg
|          |-img_100108.jpg

I want to download only the imgs/train/c2 folder. I know how to download the full dataset and individual files, but I can't figure out how to download a particular folder using the API.

Initially I tried the Kaggle CLI, with which I can download a particular image as follows:

kaggle competitions download state-farm-distracted-driver-detection -f imgs/train/c2/img_100029.jpg

But when I tried the following command to download the c2 folder, I got a "Not Found" error:

kaggle competitions download state-farm-distracted-driver-detection -f imgs/train/c2
404 - Not Found

Is there any command to download a particular folder from a competition with the Kaggle API?

As another attempt, I used the Kaggle API from Python to download that folder.

My idea is this: there is a file named "driver_imgs_list.csv" which lists the class names (c0, c1, c2, ...) along with their corresponding image files. Since I want the c2 class folder, I used pandas to collect the image file names of class c2 into a list, and then tried to download each file in a for loop as follows:

from kaggle.api.kaggle_api_extended import KaggleApi
import pandas as pd
api = KaggleApi()
api.authenticate()

data = pd.read_csv("driver_imgs_list.csv")

# File names of all images listed under class c2
images = data[data["classname"] == "c2"]["img"].tolist()

for name in images:
    file = "imgs/train/c2/{}".format(name)
    api.competition_download_file('state-farm-distracted-driver-detection', file, quiet=False, force=True)

Even with the above code I get the same "not found" error:

HTTP response body: b'{"code":404,"message":"NotFound"}'

How can I download a particular folder, either with the Kaggle CLI or from Python?

harshini gulipalli asked Mar 09 '20 19:03



1 Answer

Could it be that the error message is accurate and the file really isn't in the competition's data?

Another possibility is that it has to do with the order of the files: I was able to get your code running after calling .sort_values() on the Series of image names:
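
One way to check that is to compare the paths built from the CSV against a listing of the competition's files. The following is only a minimal sketch: it assumes you already have the listed file names as plain strings (for example copied from the output of `kaggle competitions files`, whose format and completeness may differ between client versions), and the helper names `class_paths` and `missing_paths` are hypothetical, not part of the Kaggle client.

```python
import csv
import io


def class_paths(csv_text, classname, prefix="imgs/train"):
    """Build the expected remote path for every image of one class.

    csv_text is the content of driver_imgs_list.csv; only its
    'classname' and 'img' columns are used.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return sorted(
        "{}/{}/{}".format(prefix, row["classname"], row["img"])
        for row in reader
        if row["classname"] == classname
    )


def missing_paths(expected, listed):
    """Return the expected paths that do not appear in the listing."""
    listed_set = set(listed)
    return [path for path in expected if path not in listed_set]
```

If `missing_paths` returns a non-empty list, the CSV names files that the listing does not contain, which would explain a 404 for those paths.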

data = pd.read_csv('driver_imgs_list.csv')
filenames = 'imgs/train/c2/' + data[data['classname'] == 'c2']['img'].sort_values()

for filename in filenames:
    api.competition_download_file('state-farm-distracted-driver-detection', filename)

However, I only let it run for about 10 files, so it could still be that there is a mismatch between the files listed in the CSV and the files actually available in the dataset.
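
To find out which files actually fail instead of stopping at the first 404, the loop can record errors and keep going. This is a rough sketch, not a feature of the Kaggle client: `download_all` is a hypothetical helper, and it deliberately catches any exception because the exact exception type raised by the client is an assumption I have not verified.

```python
def download_all(paths, download):
    """Call download(path) for each path and keep going on errors.

    Returns (succeeded, failed), where failed holds (path, message) pairs.
    """
    succeeded, failed = [], []
    for path in paths:
        try:
            download(path)
        except Exception as exc:  # broad on purpose: exception type is not assumed
            failed.append((path, str(exc)))
        else:
            succeeded.append(path)
    return succeeded, failed
```

With the code above it could be called as `download_all(filenames, lambda p: api.competition_download_file('state-farm-distracted-driver-detection', p))`; the `failed` list then shows whether the 404s are limited to particular files.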

jorijnsmit answered Oct 07 '22 18:10