
Can tf.keras.utils.get_file() be used to load local zip files?

I have a zip file containing 4 image folders. The tutorial I followed on Google Colab uses a similar zip file, but that file is hosted online and its link is passed as the value of the origin parameter, which is required. I uploaded my zip file to Google Drive and can access it in Colab. Is it possible to load a local file using get_file()?

VishnuVS asked Feb 11 '20 09:02


3 Answers

Yes, and it can also extract the archive for you. For example:

from tensorflow import keras
dataset_url = 'https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz'
# untar=True unpacks the downloaded .tgz archive
data_dir = keras.utils.get_file('folder_name', origin=dataset_url, cache_dir='.', untar=True)

Here cache_dir sets the base directory where the download is cached, and untar=True tells get_file() to unpack the tar archive after downloading. For a zip file, pass extract=True instead.

Anubhav answered Nov 12 '22 02:11


I recently ran into this myself. After not finding answers, I put on the old thinking cap and solved it. The documentation for tf.keras.utils.get_file() states that the first two arguments are mandatory and the rest fall back to their defaults. These are fname, the file name used for reference and naming in the cache, and origin, which must be a URL from which the image/data is obtained.

import os
import sys
from tensorflow import keras
myFile = sys.argv[1]  # just for example...
fullPath = os.path.abspath(myFile)  # or similar, depending on your scenario
data_for_processing = keras.utils.get_file(os.path.basename(myFile), 'file://' + fullPath)

The trick is that file:// is the URL scheme for local files, so origin can point at a path on disk.
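Plain string concatenation can produce a malformed URL when the path contains characters that are special in URLs, such as '#' or '%'. A minimal sketch of building the URL with the standard library instead is shown below; to_file_url is a hypothetical helper name, not part of Keras, and the example path is only illustrative.

```python
from pathlib import Path


def to_file_url(path):
    """Return a file:// URL for a local path (hypothetical helper, not part of Keras)."""
    # resolve() makes the path absolute; as_uri() adds the file:// scheme
    # and percent-encodes characters such as spaces.
    return Path(path).resolve().as_uri()


if __name__ == "__main__":
    # Illustrative path only; substitute the location of your own file.
    print(to_file_url("/gdrive/My Drive/your_file.zip"))
```

The returned string can then be passed as the origin argument in place of the 'file://' concatenation.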

fotonix answered Nov 12 '22 02:11


If you have mounted your Google Drive and can access the files stored there from Colab, you can reach them at a path like '/gdrive/My Drive/your_file'. I needed to unzip the file, so I used:

import zipfile
with zipfile.ZipFile(your_file, 'r') as zip_ref:
    zip_ref.extractall(directory_to_extract_to)

I used '/content' as the directory_to_extract_to.
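Since flow_from_directory expects one subfolder per class, it can help to check which top-level folders the archive contains before pointing a generator at it. The following is a rough sketch of that check, assuming the zip stores each class in its own folder; extract_and_list is a hypothetical helper name introduced here for illustration.

```python
import zipfile


def extract_and_list(zip_path, dest_dir):
    """Extract zip_path into dest_dir and return the archive's top-level folder names."""
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        zip_ref.extractall(dest_dir)
        names = zip_ref.namelist()
    # Zip entries always use '/' as the separator, so the first path
    # component of any entry inside a folder is a top-level folder name.
    top_level = {name.split("/")[0] for name in names if "/" in name}
    return sorted(top_level)
```

Calling extract_and_list(your_file, '/content') would extract as above and return the folder names, which should match the classes you expect.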

Then you can access the data the usual way.

base_dir = '/content/my_folder'

train_generator = datagen.flow_from_directory(
    base_dir,
    target_size=(IMAGE_SIZE, IMAGE_SIZE),
    batch_size=BATCH_SIZE,
    subset='training')

VishnuVS answered Nov 12 '22 02:11