 

Not enough disk space when loading dataset with TFDS

I was implementing a DCGAN application based on the lsun-bedroom dataset. I was planning to use tfds, since lsun is in its catalog. Since the full dataset contains 42.7 GB of images, I only wanted to load a portion (10%) of it, and used the following code, written according to the manual. Unfortunately, an error occurred saying there was not enough disk space. Is there a possible solution with tfds, or should I use another API to load the data?

tfds.load('lsun/bedroom', split='train[:10%]')

Not enough disk space. Needed: 42.77 GiB (download: 42.77 GiB, generated: Unknown size)

I was testing on Google Colab
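Before attempting a large download on Colab, it can help to compare the free space on the VM's disk against the download size reported in the error (42.77 GiB here). A minimal sketch using the standard library, where the helper name `enough_space` is my own:

```python
import shutil

def enough_space(path, needed_gib):
    """Return True if the filesystem containing `path` has at least
    `needed_gib` GiB free."""
    free_gib = shutil.disk_usage(path).free / 2**30
    return free_gib >= needed_gib

# lsun/bedroom needs 42.77 GiB for the download alone (per the error above),
# plus additional space for the generated dataset.
print(enough_space("/", 42.77))
```

A standard Colab VM typically has well under 2x 42.77 GiB free, so this check fails there for lsun/bedroom once generation space is accounted for.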

asked Feb 19 '26 by krenerd

2 Answers

TFDS downloads the dataset from the original author's website. As datasets are often published as monolithic archives (e.g. lsun.zip), it is unfortunately impossible for TFDS to download/install only part of the dataset.

The split argument only filters the dataset after it has been fully generated. Note: you can see the download size of each dataset in the catalog: https://www.tensorflow.org/datasets/catalog/overview
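The point above can be illustrated with the index arithmetic behind percent slicing: the slice boundaries are computed on the already-generated dataset, which is why the full download must happen first. A pure-Python sketch, where `percent_slice` is my own helper, the example count of 3,033,042 train images is taken from the TFDS catalog entry for lsun/bedroom, and exact TFDS rounding may differ slightly:

```python
def percent_slice(num_examples, start_pct=0, end_pct=100):
    # Slice boundaries are derived from the total number of generated
    # examples, so they cannot be known (or applied) before the whole
    # dataset exists on disk.
    start = num_examples * start_pct // 100
    end = num_examples * end_pct // 100
    return start, end

n = 3_033_042  # lsun/bedroom train examples, per the TFDS catalog
print(percent_slice(n, 10, 100))  # 'train[10%:]' -> the last 90%
print(percent_slice(n, 0, 10))    # 'train[:10%]' -> the first 10%
```

Note also that `train[10%:]` selects everything after the first 10% (i.e. 90% of the data); to take only a 10% portion, the slice would be `train[:10%]`.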

answered Feb 21 '26 by Conchylicultor

To me, there seems to be some kind of issue with, or at least a misunderstanding about, the 'split' argument of tfds.load(). 'split' appears to select a given portion of the dataset only after the whole dataset has been downloaded.

I got the same error message when downloading the dataset called "librispeech". Whatever I set 'split' to, the whole dataset is downloaded, which is too big for my disk.

I did manage to download the much smaller "mnist" dataset, but I found that both the train and test splits were downloaded even with 'split' set to 'test'.

answered Feb 21 '26 by Phys


