2 possible ways of persisting large strings in the Google Datastore are Text
and Blob
data types.
From a storage consumption perspective, which of the 2 is recommended? Same question from a protobuf serialization and deserialization perspective.
Datastore is a highly scalable NoSQL database for your web and mobile applications.
Datastore is a NoSQL document database built for automatic scaling, high performance, and ease of application development.
The Blobstore API allows your application to serve data objects, called blobs, that are much larger than the size allowed for objects in the Datastore service. Blobs are useful for serving large files, such as video or image files, and for allowing users to upload large data files.
There is no significant performance difference between the two - just use whichever one best fits your data. BlobProperty
should be used to store binary data (e.g., str objects) while TextProperty
should be used to store any textual data (e.g., unicode
or str
objects). Note that if you store a str
in a TextProperty
, it must only contain ASCII bytes (less than hex 80 or decimal 128) (unlike BlobProperty
).
Both of these properties are derived from UnindexedProperty
as you can see in the source.
Here is a sample app which demonstrates that there is no difference in storage overhead for these ASCII or UTF-8 strings:
import struct
from google.appengine.ext import db, webapp
from google.appengine.ext.webapp.util import run_wsgi_app
class TestB(db.Model):
v = db.BlobProperty(required=False)
class TestT(db.Model):
v = db.TextProperty(required=False)
class MainPage(webapp.RequestHandler):
def get(self):
self.response.headers['Content-Type'] = 'text/plain'
# try simple ASCII data and a bytestring with non-ASCII bytes
ascii_str = ''.join([struct.pack('>B', i) for i in xrange(128)])
arbitrary_str = ''.join([struct.pack('>2B', 0xC2, 0x80+i) for i in xrange(64)])
u = unicode(arbitrary_str, 'utf-8')
t = [TestT(v=ascii_str), TestT(v=ascii_str*1000), TestT(v=u*1000)]
b = [TestB(v=ascii_str), TestB(v=ascii_str*1000), TestB(v=arbitrary_str*1000)]
# demonstrate error cases
try:
err = TestT(v=arbitrary_str)
assert False, "should have caused an error: can't store non-ascii bytes in a Text"
except UnicodeDecodeError:
pass
try:
err = TestB(v=u)
assert False, "should have caused an error: can't store unicode in a Blob"
except db.BadValueError:
pass
# determine the serialized size of each model (note: no keys assigned)
fEncodedSz = lambda o : len(db.model_to_protobuf(o).Encode())
sz_t = tuple([fEncodedSz(x) for x in t])
sz_b = tuple([fEncodedSz(x) for x in b])
# output the results
self.response.out.write("text: 1=>%dB 2=>%dB 3=>%dB\n" % sz_t)
self.response.out.write("blob: 1=>%dB 2=>%dB 3=>%dB\n" % sz_b)
application = webapp.WSGIApplication([('/', MainPage)])
def main(): run_wsgi_app(application)
if __name__ == '__main__': main()
And here is the output:
text: 1=>172B 2=>128047B 3=>128047B
blob: 1=>172B 2=>128047B 3=>128047B
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With