Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Delete all files in 'folder' or with prefix in Google Cloud Bucket from Java

I know the idea of 'folders' is sort of non existent or different in Google Cloud Storage, but I need a way to delete all objects in a 'folder' or with a given prefix from Java.

The GcsService has a delete function, but as far as I can tell it only takes 1 GscFilename object and does not honor wildcards (i.e., "folderName/**" did not work).

Any tips?

like image 295
shieldstroy Avatar asked Oct 13 '14 22:10

shieldstroy


3 Answers

The API only supports deleting a single object at a time. You can only request many deletions using many HTTP requests or by batching many delete requests. There is no API call to delete multiple objects using wildcards or the like. In order to delete all of the objects with a certain prefix, you'd need to list the objects, then make a delete call for each object that matches the pattern.

The command-line utility, gsutil, does exactly that when you ask it to delete the path "gs://bucket/dir/**. It fetches a list of objects matching that pattern, then it makes a delete call for each of them.

If you need a quick solution, you could always have your Java program exec gsutil.

Here is the code that corresponds to the above answer in case anyone else wants to use it:

public void deleteFolder(String bucket, String folderName) throws CoultNotDeleteFile {
  try
  {
    ListResult list = gcsService.list(bucket, new ListOptions.Builder().setPrefix(folderName).setRecursive(true).build());

    while(list.hasNext())
    {
      ListItem item = list.next();
      gcsService.delete(new GcsFilename(file.getBucket(), item.getName()));
    }
  }
  catch (IOException e)
  {
    //Error handling
  }
}
like image 67
Brandon Yarbrough Avatar answered Oct 29 '22 14:10

Brandon Yarbrough


Extremely late to the party, but here's for current google searches. We can delete multiple blobs efficiently by leveraging com.google.cloud.storage.StorageBatch.

Like so:

public static void rmdir(Storage storage, String bucket, String dir) {
    StorageBatch batch = storage.batch();
    Page<Blob> blobs = storage.list(bucket, Storage.BlobListOption.currentDirectory(),
            Storage.BlobListOption.prefix(dir));
    for(Blob blob : blobs.iterateAll()) {
        batch.delete(blob.getBlobId());
    }
    batch.submit();
}

This should run MUCH faster than deleting one by one when your bucket/folder contains a non trivial amount of items.

Edit since this is getting a little attention, I'll demo error handling:

public static boolean rmdir(Storage storage, String bucket, String dir) {
    List<StorageBatchResult<Boolean>> results = new ArrayList<>();
    StorageBatch batch = storage.batch();
    try {
        Page<Blob> blobs = storage.list(bucket, Storage.BlobListOption.currentDirectory(),
            Storage.BlobListOption.prefix(dir));
        for(Blob blob : blobs.iterateAll()) {
            results.add(batch.delete(blob.getBlobId()));
        }
    } finally {
        batch.submit();
        return results.stream().allMatch(r -> r != null && r.get());
    }
}

This method will: Delete every blob in the given folder of the given bucket returning true if so. The method will return false otherwise. One can look into the return method of batch.delete() for a better understanding and error proofing.

To ensure ALL items are deleted, you could call this like:

boolean success = false
while(!success)) {
    success = rmdir(storage, bucket, dir);
}
like image 11
MeetTitan Avatar answered Oct 29 '22 14:10

MeetTitan


I realise this is an old question, but I just stumbled upon the same issue and found a different way to resolve it.

The Storage class in the Google Cloud Java Client for Storage includes a method to list the blobs in a bucket, which can also accept an option to set a prefix to filter results to blobs whose names begin with the prefix.

For example, deleting all the files with a given prefix from a bucket can be achieved like this:

Storage storage = StorageOptions.getDefaultInstance().getService();
Iterable<Blob> blobs = storage.list("bucket_name", Storage.BlobListOption.prefix("prefix")).iterateAll();
for (Blob blob : blobs) {
    blob.delete(Blob.BlobSourceOption.generationMatch());
}
like image 9
dilico Avatar answered Oct 29 '22 14:10

dilico