I'd like to create some ridiculously-easy-to-use pip packages for loading common machine-learning datasets in Python. (Yes, some stuff already exists, but I want it to be even simpler.)
What I'd like to achieve is this:
pip install dataset
wget http://mydata.com/data.tar.gz
Note that the data does not reside in the Python package itself, but is downloaded from somewhere else. This question is about bullets 2 and 3. Is there a way to do this with setuptools?
The command to install any external Python package is pip install. The pip version can be checked using pip --version or pip -V. If the path shows Python 2.7, then make sure you have Python version 3 installed and then run pip as pip3.
Follow the steps below to install the Setuptools package on Linux using the setup.py file. Step 1: Download the latest source package of Setuptools for Python 3 from the website. Step 3: Go to the setuptools-60.5.0 folder and enter the following command to install the package.
You generally don't need to worry about setuptools: either it isn't really needed, or the high-level installers will make sure you have a recent enough version installed. In the latter case, as long as the operations they have to perform are simple enough, they generally won't fail.
setuptools is a library built on top of distutils, which has been deprecated (and is up for removal as of Python 3.12).
As alluded to by Kevin, Python package installs should be completely reproducible, and any potential external-download issues should be pushed to runtime. This therefore shouldn't be handled with setuptools.
Instead, to avoid burdening the user, consider downloading the data in a lazy way, upon load. Example:
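    import os

    DATA_DIR = 'data'  # placeholder: wherever the extracted data should live

    def download_data(url='http://...'):
        # Download; extract data to disk.
        # Raise an exception if the link is bad, or we can't connect, etc.
        raise NotImplementedError

    def load_data():
        # Download lazily, only the first time the data is needed.
        if not os.path.exists(DATA_DIR):
            download_data()
        data = read_data_from_disk(DATA_DIR)  # placeholder reader, defined elsewhere
        return data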
We could then describe download_data in the docs, but the majority of users would never need to bother with it. This is somewhat similar to the behavior in the imageio module with respect to downloading necessary decoders at runtime, rather than making the user manage the external downloads themselves.
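As a rough illustration of the lazy approach, here is a minimal sketch that fills in download_data using only the standard library. The default DATA_DIR location, the extra data_dir and url parameters, and the choice to return a sorted file listing (as a stand-in for real parsing) are all assumptions made for this example; the URL is the one from the question. Note that tarfile extraction should only be used on archives you trust.

```python
import os
import shutil
import tarfile
import tempfile
import urllib.request

# Assumed values for illustration: the URL is taken from the question,
# the directory is a hypothetical per-user location.
DATA_URL = "http://mydata.com/data.tar.gz"
DATA_DIR = os.path.join(os.path.expanduser("~"), ".dataset")


def download_data(url=DATA_URL, data_dir=DATA_DIR):
    """Download a .tar.gz archive and extract it into data_dir."""
    parent = os.path.dirname(os.path.abspath(data_dir))
    os.makedirs(parent, exist_ok=True)
    # Work in a temporary sibling directory so that a failed download
    # never leaves a half-filled data_dir behind.
    tmp_dir = tempfile.mkdtemp(dir=parent)
    try:
        archive = os.path.join(tmp_dir, "data.tar.gz")
        # urlopen raises URLError (or HTTPError) if the link is bad
        # or the connection fails.
        with urllib.request.urlopen(url) as response, open(archive, "wb") as out:
            shutil.copyfileobj(response, out)
        extracted = os.path.join(tmp_dir, "extracted")
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(extracted)  # only for archives you trust
        # Move the finished directory into place in one step.
        os.rename(extracted, data_dir)
    finally:
        shutil.rmtree(tmp_dir, ignore_errors=True)


def load_data(data_dir=DATA_DIR, url=DATA_URL):
    """Return the dataset's file names, downloading on first use."""
    if not os.path.exists(data_dir):
        download_data(url, data_dir)
    # Stand-in for real parsing: just list what was extracted.
    return sorted(os.listdir(data_dir))
```

With this shape, the first call to load_data() pays the download cost and later calls only read from disk. Extracting into a temporary directory and renaming at the end is one way to make sure the existence check in load_data never sees a partial download.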
Python's packaging guidance states that installing a package should never require executing Python code. This means that you may not be able to download anything during the installation process.
If you want to download some additional data, do it after you install the package. For example, you could download the data when your package is imported and cache it somewhere, so that it is not downloaded again on every new import.
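The caching idea could be sketched roughly as follows. Everything here is an assumption for illustration: the DATASET_CACHE_DIR environment variable, the function names default_cache_dir and ensure_cached, and the ".complete" marker file are hypothetical, not part of pip or setuptools.

```python
import os
from pathlib import Path


def default_cache_dir(package_name):
    """Return a per-user cache directory for package_name."""
    # DATASET_CACHE_DIR is a hypothetical override for this sketch.
    override = os.environ.get("DATASET_CACHE_DIR")
    if override:
        return Path(override) / package_name
    # Fall back to XDG_CACHE_HOME if set, otherwise ~/.cache.
    base = os.environ.get("XDG_CACHE_HOME") or Path.home() / ".cache"
    return Path(base) / package_name


def ensure_cached(cache_dir, fetch):
    """Call fetch(cache_dir) only if no earlier run completed."""
    marker = cache_dir / ".complete"
    if not marker.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        fetch(cache_dir)
        # Written last, so an interrupted fetch is retried next time.
        marker.touch()
    return cache_dir
```

A package's __init__.py (or, more conservatively, its load function) could call ensure_cached(default_cache_dir("dataset"), some_fetch_function), where some_fetch_function is whatever downloads the files into the given directory. The marker file is one simple way to tell a finished download apart from an interrupted one.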