
What is the easiest way of deleting all my blobstore data?

What is the best way to remove all blobs from the blobstore? I'm using Python.

I have quite a lot of blobs and I'd like to delete them all. I'm currently doing the following:

class deleteBlobs(webapp.RequestHandler): 
    def get(self): 
        all = blobstore.BlobInfo.all(); 
        more = (all.count()>0) 
        blobstore.delete(all); 
        if more: 
            taskqueue.add(url='/deleteBlobs',method='GET'); 

Which seems to be using tons of CPU and (as far as I can tell) doing nothing useful.

defuz asked Dec 21 '22 09:12



2 Answers

I use this approach:

import datetime

from google.appengine.ext import blobstore
from google.appengine.ext import webapp
from google.appengine.ext.webapp import util

class IndexHandler(webapp.RequestHandler):
    def get(self):
        self.response.headers['Content-Type'] = 'text/plain'
        self.response.out.write('Hello. Blobstore is being purged.\n\n')
        try:
            # Fetch at most 400 blobs per request; larger batches failed for me.
            blobs = blobstore.BlobInfo.all().fetch(400)

            # Delete each blob individually and count how many were removed.
            index = 0
            for blob in blobs:
                blob.delete()
                index += 1

            # Take a single timestamp so hour, minute and second are consistent.
            now = datetime.datetime.now()
            self.response.out.write(str(index) + ' items deleted at ' + str(now.hour) + ':' + str(now.minute) + ':' + str(now.second))
            # A full batch means more blobs may remain, so reload the handler.
            if index == 400:
                self.redirect("/purge")

        except Exception, e:
            self.response.out.write('Error is: ' + repr(e) + '\n')

APP = webapp.WSGIApplication(
    [
        ('/purge', IndexHandler),
    ],
    debug=True)

def main():
    util.run_wsgi_app(APP)


if __name__ == '__main__':
    main()

In my experience, deleting more than 400 blobs in one request fails, so the handler reloads itself after every batch of 400. I also tried blobstore.delete(query.fetch(400)), but nothing happened and nothing was deleted; I think there's a bug right now.

beruic answered Dec 24 '22 03:12



You're passing the query object to the delete method, which will iterate over it fetching it in batches, then submit a single enormous delete. This is inefficient because it requires multiple fetches, and won't work if you have more results than you can fetch in the available time or with the available memory. The task will either complete once and not require chaining at all, or more likely, fail repeatedly, since it can't fetch every blob at once.

Also, calling count executes the query just to determine the count, which is a waste of time since you're going to try fetching the results anyway.

Instead, you should fetch results in batches using fetch, and delete each batch. Use cursors to position the next batch, so the query doesn't have to iterate over all the 'tombstoned' records before finding the first live one. Ideally, delete multiple batches per task, using a timer to determine when you should stop and chain the next task.
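
A minimal sketch of that loop, kept independent of the App Engine SDK so it can be run anywhere. The names purge_in_batches, fetch_batch, delete_batch and InMemoryStore are hypothetical and only for illustration; the commented wiring at the bottom is an untested assumption about how the same loop might be hooked up to blobstore, query cursors and taskqueue.

```python
import time


def purge_in_batches(fetch_batch, delete_batch, cursor=None,
                     time_budget=20.0, clock=time.time):
    """Delete batch after batch until nothing is left or time runs out.

    fetch_batch(cursor) must return (keys, next_cursor).
    delete_batch(keys) must delete exactly those keys.
    Returns (deleted_count, resume_cursor); resume_cursor is None when done.
    """
    started = clock()
    deleted = 0
    while True:
        keys, cursor = fetch_batch(cursor)
        if not keys:
            return deleted, None      # nothing left, no need to chain a task
        delete_batch(keys)
        deleted += len(keys)
        if clock() - started >= time_budget:
            return deleted, cursor    # out of time, resume from this cursor


class InMemoryStore(object):
    """Hypothetical stand-in for the blobstore, used only to run the sketch."""

    def __init__(self, count, batch_size):
        self.all_keys = ['blob-%d' % i for i in range(count)]
        self.live = set(self.all_keys)
        self.batch_size = batch_size

    def fetch_batch(self, cursor):
        # Here the cursor is simply an offset into the key list.
        position = cursor or 0
        keys = []
        while position < len(self.all_keys) and len(keys) < self.batch_size:
            key = self.all_keys[position]
            if key in self.live:
                keys.append(key)
            position += 1
        return keys, position

    def delete_batch(self, keys):
        self.live.difference_update(keys)


# On App Engine the two callables might look like this (untested assumption):
#
#   def fetch_batch(cursor):
#       query = blobstore.BlobInfo.all()
#       if cursor:
#           query.with_cursor(cursor)
#       keys = [info.key() for info in query.fetch(100)]
#       return keys, query.cursor()
#
#   def delete_batch(keys):
#       blobstore.delete(keys)
#
# A non-None resume cursor would then be handed to the next task, for
# example with taskqueue.add(url='/deleteBlobs', params={'cursor': cursor}).
```

The time budget should sit comfortably below the request deadline, so the handler still has time to enqueue the follow-up task before it is cut off.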

Nick Johnson answered Dec 24 '22 01:12
