
Google Datastore - Blob or Text

Two possible ways of persisting large strings in the Google Datastore are the Text and Blob data types.

From a storage consumption perspective, which of the two is recommended? And which is recommended from a protobuf serialization and deserialization perspective?

Keyur asked Jun 04 '10 20:06





1 Answer

There is no significant performance difference between the two, so use whichever one best fits your data. Use BlobProperty for binary data (Python 2 str objects) and TextProperty for textual data (unicode or str objects). Note that a str stored in a TextProperty must contain only ASCII bytes (values below hex 80, decimal 128); a BlobProperty has no such restriction.
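To make the ASCII restriction concrete, here is a minimal sketch in plain Python that needs no App Engine SDK. The helper name can_store_as_text is hypothetical and is not part of the datastore API; it only mimics the ASCII decoding that rejects a byte string with high bytes, under the assumption that this is the check that matters here.

```python
def can_store_as_text(raw):
    """Hypothetical helper: True if every byte in raw is below 0x80."""
    try:
        raw.decode('ascii')
        return True
    except UnicodeDecodeError:
        return False

# All 128 ASCII byte values, built so it works on Python 2 and 3.
ascii_bytes = bytes(bytearray(range(128)))
# UTF-8 encoding of U+0080: both bytes are >= 0x80.
non_ascii_bytes = b'\xc2\x80'

print(can_store_as_text(ascii_bytes))
print(can_store_as_text(non_ascii_bytes))
```

A byte string that fails this check can still go into a Text field if it is first decoded with its real encoding (for example raw.decode('utf-8')), or it can be stored unchanged in a Blob.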

Both of these properties are derived from UnindexedProperty as you can see in the source.

Here is a sample app which demonstrates that there is no difference in storage overhead between the two for ASCII or UTF-8 strings:

import struct

from google.appengine.ext import db, webapp
from google.appengine.ext.webapp.util import run_wsgi_app

class TestB(db.Model):
    v = db.BlobProperty(required=False)

class TestT(db.Model):
    v = db.TextProperty(required=False)

class MainPage(webapp.RequestHandler):
    def get(self):
        self.response.headers['Content-Type'] = 'text/plain'

        # try simple ASCII data and a bytestring with non-ASCII bytes
        ascii_str = ''.join([struct.pack('>B', i) for i in xrange(128)])
        arbitrary_str = ''.join([struct.pack('>2B', 0xC2, 0x80+i) for i in xrange(64)])
        u = unicode(arbitrary_str, 'utf-8')

        t = [TestT(v=ascii_str), TestT(v=ascii_str*1000), TestT(v=u*1000)]
        b = [TestB(v=ascii_str), TestB(v=ascii_str*1000), TestB(v=arbitrary_str*1000)]

        # demonstrate error cases
        try:
            err = TestT(v=arbitrary_str)
            assert False, "should have caused an error: can't store non-ascii bytes in a Text"
        except UnicodeDecodeError:
            pass
        try:
            err = TestB(v=u)
            assert False, "should have caused an error: can't store unicode in a Blob"
        except db.BadValueError:
            pass

        # determine the serialized size of each model (note: no keys assigned)
        fEncodedSz = lambda o : len(db.model_to_protobuf(o).Encode())
        sz_t = tuple([fEncodedSz(x) for x in t])
        sz_b = tuple([fEncodedSz(x) for x in b])

        # output the results
        self.response.out.write("text:   1=>%dB  2=>%dB  3=>%dB\n" % sz_t)
        self.response.out.write("blob:   1=>%dB  2=>%dB  3=>%dB\n" % sz_b)

application = webapp.WSGIApplication([('/', MainPage)])
def main(): run_wsgi_app(application)
if __name__ == '__main__': main()

And here is the output:

text:   1=>172B  2=>128047B  3=>128047B
blob:   1=>172B  2=>128047B  3=>128047B
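The matching numbers can be checked by hand. The sketch below rebuilds the same payloads in plain Python (no App Engine SDK) and subtracts their byte lengths from the sizes reported above. The variable names are illustrative, and the reading of the remainder as fixed protobuf framing is an assumption rather than something measured here.

```python
# 128 ASCII bytes, as in the sample app.
ascii_bytes = bytes(bytearray(range(128)))
# 64 two-byte UTF-8 sequences (U+0080 through U+00BF), also 128 bytes in total.
utf8_bytes = b''.join(bytes(bytearray([0xC2, 0x80 + i])) for i in range(64))
text = utf8_bytes.decode('utf-8')  # 64 characters

# Payload byte lengths for the three test cases.
payloads = [len(ascii_bytes), len(ascii_bytes * 1000), len(utf8_bytes * 1000)]
# Serialized sizes reported in the output above (identical for text and blob).
reported = [172, 128047, 128047]

overhead = [r - p for r, p in zip(reported, payloads)]
print(payloads)   # [128, 128000, 128000]
print(overhead)   # [44, 47, 47]
```

So the per-entity overhead is a few dozen bytes regardless of property type, and the Unicode case matches the ASCII case because its UTF-8 encoding happens to be the same length.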
David Underhill answered Sep 22 '22 05:09
