I want to write a Python script that downloads a public dataset from Kaggle.com.
The Kaggle API is written in Python, but almost all of the documentation and resources that I can find are on how to use the API from the command line, and very little on how to use the kaggle library within Python.
Some users seem to know how to do this, see for example several answers to this question, but the hints are not enough to resolve my specific issue.
Namely, I have a script that looks like this:
from kaggle.api.kaggle_api_extended import KaggleApi
api = KaggleApi('content of my json metadata file')
file = api.datasets_download_file(
owner_slug='the-owner-slug',
dataset_slug='the-dataset-slug',
file_name='the-file-name.csv',
)
I came up with this by looking at the method's signature: api.datasets_download_file(owner_slug, dataset_slug, file_name, **kwargs)
I get the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa1 in position 12: invalid start byte
Beyond the solution to this specific problem, I would be really happy to know how to go about troubleshooting errors with the Kaggle library, other than going through the code itself. In fact, perhaps the issue has nothing to do with UTF-8 encoding, but I don't know how to figure this out. What if it is just that the filename is wrong, or something as silly as that?
The CSV file is nothing special: three columns, the first is a timestamp, the other two are integers.
Installation. Ensure you have Python and the package manager pip installed. Run the following command to access the Kaggle API using the command line: pip install kaggle (You may need to do pip install --user kaggle on Mac/Linux. This is recommended if problems come up during the installation process.)
API credentials. To use the Kaggle API, sign up for a Kaggle account at https://www.kaggle.com. Then go to the 'Account' tab of your user profile (https://www.kaggle.com/<username>/account) and select 'Create API Token'. This will trigger the download of kaggle.json, a file containing your API credentials.
Step 1: Visit the Kaggle website and select the Dataset tab. Step 2: Select any dataset and click Download. Step 3: The downloaded file will be a ZIP archive; unzip it. Step 4: Upload the extracted files to your Jupyter Notebook.
I published a blog post that explains most of the common use cases of competition, datasets and kernel interactions.
Here are the steps involved in using the Kaggle API from Python.
Setting up API Key
Go to the Account tab of your Kaggle profile, https://www.kaggle.com/<username>/account, and click ‘Create API Token’. A file named kaggle.json will be downloaded. Move this file into the ~/.kaggle/ folder on Mac and Linux, or to C:\Users\<username>\.kaggle\ on Windows.
Alternatively, you can set the KAGGLE_USERNAME and KAGGLE_KEY environment variables to the values from kaggle.json so that the API can authenticate.
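If you go the environment-variable route, something like the following rough sketch may help. It reads the token file and copies its two fields ("username" and "key") into the variables named above. The helper name load_kaggle_credentials and the example path are my own, not part of the kaggle package. As far as I can tell, some versions of the library authenticate as soon as the package is imported, so it is safest to set the variables before importing kaggle.

```python
import json
import os


def load_kaggle_credentials(json_path):
    """Copy the username and key from a kaggle.json file into the environment.

    Hypothetical helper: not part of the kaggle package.
    Returns the username so the caller can confirm which account was loaded.
    """
    with open(json_path, encoding="utf-8") as fh:
        creds = json.load(fh)
    # These are the variable names the answer above mentions.
    os.environ["KAGGLE_USERNAME"] = creds["username"]
    os.environ["KAGGLE_KEY"] = creds["key"]
    return creds["username"]


# Usage (the path is an assumption; adjust it to where your token lives):
# load_kaggle_credentials(os.path.expanduser("~/Downloads/kaggle.json"))
# from kaggle.api.kaggle_api_extended import KaggleApi  # import only afterwards
```

The import is left commented out on purpose, so the credentials are in place before the library gets a chance to look for them.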
Authenticating With API Server
from kaggle.api.kaggle_api_extended import KaggleApi
api = KaggleApi()
api.authenticate()
Downloading Datasets
# Download all files of a dataset
# Signature: dataset_download_files(dataset, path=None, force=False, quiet=True, unzip=False)
api.dataset_download_files('avenn98/world-of-warcraft-demographics')
# Download a single file
# Signature: dataset_download_file(dataset, file_name, path=None, force=False, quiet=True)
api.dataset_download_file('avenn98/world-of-warcraft-demographics', 'WoW Demographics.csv')
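Note that dataset_download_files defaults to unzip=False (see its signature above), so what lands on disk is a ZIP archive rather than the CSV itself. Reading such a file as text is one plausible way to end up with a decode error, although I cannot confirm that this is what happened in the question. Here is a minimal sketch, using only the standard library, that checks whether a downloaded file is really an archive and extracts it if so. The function name unpack_if_zipped and the paths in the usage note are hypothetical.

```python
import zipfile
from pathlib import Path


def unpack_if_zipped(download_path, dest_dir):
    """Extract download_path into dest_dir if it is a ZIP archive.

    Returns the list of extracted paths, or [download_path] unchanged
    when the file is not an archive. Hypothetical helper, not part of kaggle.
    """
    download_path = Path(download_path)
    dest_dir = Path(dest_dir)
    # is_zipfile inspects the file contents, not the extension.
    if not zipfile.is_zipfile(download_path):
        return [download_path]
    with zipfile.ZipFile(download_path) as archive:
        archive.extractall(dest_dir)
        return [dest_dir / name for name in archive.namelist()]


# Usage (file names are assumptions):
# files = unpack_if_zipped("world-of-warcraft-demographics.zip", "data")
```

Checking the contents instead of the file extension also covers the case where an archive was saved under a .csv name.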
Downloading Competition Files
# Download all files for a competition
# Signature: competition_download_files(competition, path=None, force=False, quiet=True)
api.competition_download_files('titanic')
# Download single file for a competition
# Signature: competition_download_file(competition, file_name, path=None, force=False, quiet=False)
api.competition_download_file('titanic','gender_submission.csv')
Submitting to competitions
# Signature: competition_submit(file_name, message, competition, quiet=False)
api.competition_submit('gender_submission.csv', 'API Submission', 'titanic')
Retrieving the Leaderboard
# Signature: competition_view_leaderboard(id, **kwargs)
leaderboard = api.competition_view_leaderboard('titanic')
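On the troubleshooting side of the question: the signatures quoted in the comments above do not have to be dug out of the source by hand. Here is a small sketch using the standard inspect module that lists the public methods of any object together with their signatures. The helper name describe_methods is my own. It works on any Python object, and pointing it at a KaggleApi instance is shown only as a commented usage example.

```python
import inspect


def describe_methods(obj, prefix=""):
    """Map public method names on obj (starting with prefix) to their signatures.

    Hypothetical helper for exploring an unfamiliar API object.
    """
    found = {}
    for name in dir(obj):
        if name.startswith("_") or not name.startswith(prefix):
            continue
        try:
            member = getattr(obj, name)
        except Exception:
            continue  # skip attributes that fail on access
        if not callable(member):
            continue
        try:
            found[name] = str(inspect.signature(member))
        except (TypeError, ValueError):
            found[name] = "(signature unavailable)"
    return found


# Usage (assumes the kaggle package is installed and credentials are set up):
# from kaggle.api.kaggle_api_extended import KaggleApi
# for name, sig in describe_methods(KaggleApi(), "dataset_").items():
#     print(name, sig)
```

Filtering by prefix makes it easier to tell similarly named methods apart, such as dataset_download_file here versus the datasets_download_file used in the question.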