Right now I use remote_api and appcfg.py download_data
to take a snapshot of my database every night. It takes a long time (6 hours) and is expensive. Without rolling my own change-based backup (I'd be too scared to do something like that), what's the best option for making sure my data is safe from failure?
PS: I recognize that Google's data is probably way safer than mine. But what if one day I accidentally write a program that deletes it all?
Google has released Firestore, a new version of Datastore with several improvements and additional features. Existing Datastore users can access these features by creating a database in "Firestore in Datastore mode".
Use the gcloud datastore export command to export all entities in your database to a Cloud Storage location such as gs://bucket-name/datastore-exports/export-name, where bucket-name is the name of your Cloud Storage bucket and datastore-exports/export-name is an optional prefix. You cannot re-use the same prefix for another export operation.
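A nightly export could then look something like the following sketch; the bucket name, prefix, and kind names are placeholders:

# export every kind, returning immediately while the export runs in the background
gcloud datastore export gs://bucket-name/datastore-exports/export-name --async

# or export only specific kinds (the kind names here are hypothetical)
gcloud datastore export gs://bucket-name/datastore-exports/export-name --kinds='Book,Author'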
I think you've pretty much identified all of your choices:

1. Trust Google not to lose your data, and hope you never accidentally write something that deletes it.
2. Keep doing full backups with download_data, perhaps less frequently than once per night if it is prohibitively expensive.
3. Roll your own incremental, change-based backup.

Option 3 is actually an interesting idea. You'd need a modification timestamp on all entities, and you wouldn't catch deleted entities, but otherwise it's very doable with remote_api and cursors.
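For instance, each model could carry an auto-updated timestamp property. The model and property names below are only an illustration; what matters is that the downloader further down assumes the property is called updated_at and is indexed:

from google.appengine.ext import db

class Account(db.Model):
    # hypothetical kind; auto_now=True makes the datastore refresh
    # this property on every put(), giving a last-modified timestamp
    name = db.StringProperty()
    updated_at = db.DateTimeProperty(auto_now=True)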
Edit:
Here's a simple incremental downloader for use with remote_api. Again, the caveats are that it won't notice deleted entities, and it assumes all entities store the last modification time in a property named updated_at. Use it at your own peril.
import os
import hashlib
import gzip

from google.appengine.api import app_identity
from google.appengine.ext.db.metadata import Kind
from google.appengine.api.datastore import Query
from google.appengine.datastore.datastore_query import Cursor

INDEX = 'updated_at'   # property holding each entity's last-modified time
BATCH = 50             # entities fetched per round trip
DEPTH = 3              # directory fan-out levels derived from the key hash

path = ['backups', app_identity.get_application_id()]
for kind in Kind.all():
    kind = kind.kind_name
    if kind.startswith('__'):
        continue  # skip built-in metadata kinds
    while True:
        print 'Fetching %d %s entities' % (BATCH, kind)
        # Resume from the cursor saved by the previous run, if any.
        path.extend([kind, 'cursor.txt'])
        try:
            cursor = open(os.path.join(*path)).read()
            cursor = Cursor.from_websafe_string(cursor)
        except IOError:
            cursor = None
        path.pop()
        query = Query(kind, cursor=cursor)
        query.Order(INDEX)
        entities = query.Get(BATCH)
        for entity in entities:
            # Spread files across DEPTH levels of directories keyed by a hash
            # of the entity key, to avoid one huge flat directory per kind.
            hash = hashlib.sha1(str(entity.key())).hexdigest()
            for i in range(DEPTH):
                path.append(hash[i])
            try:
                os.makedirs(os.path.join(*path))
            except OSError:
                pass  # directory already exists
            path.append('%s.xml.gz' % entity.key())
            print 'Writing', os.path.join(*path)
            file = gzip.open(os.path.join(*path), 'wb')
            file.write(entity.ToXml())
            file.close()
            path = path[:-1-DEPTH]
        if entities:
            # Persist the cursor so the next run picks up where this one stopped.
            path.append('cursor.txt')
            file = open(os.path.join(*path), 'w')
            file.write(query.GetCursor().to_websafe_string())
            file.close()
            path.pop()
        path.pop()
        if len(entities) < BATCH:
            break
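To actually run this against a deployed app, the script needs a remote_api connection. A minimal sketch, assuming the remote_api handler is enabled in app.yaml (builtins: remote_api: on) and using the old SDK's password-based ConfigureRemoteApi helper; the app ID is a placeholder:

import getpass
from google.appengine.ext.remote_api import remote_api_stub

def auth_func():
    # prompts for the credentials of an app administrator
    return raw_input('Email: '), getpass.getpass('Password: ')

remote_api_stub.ConfigureRemoteApi(
    None, '/_ah/remote_api', auth_func, 'your-app-id.appspot.com')
# ...then run the downloader above in the same process.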