I want to build an application that will serve a lot of people (more than 2 million) so I think that I should use Google Cloud Datastore. However I also know that there is an option to use Google Cloud SQL and still serve a lot of people using mySQL (like what Facebook and Youtube do).
Is this a correct assumption to use Datastore rather that the relational Cloud SQL with this many users? Thank you in advance
Datastore is a NoSQL document database built for automatic scaling, high performance, and ease of application development. Datastore features include: Atomic transactions. Datastore can execute a set of operations where either all succeed, or none occur.
Cloud Datastore—a document database built for automatic scaling, high performance, and ease of use. Cloud Bigtable—an alternative to HBase, a columnar database system running on HDFS. Suitable for high throughput applications.
Unlike BigTable, Datastore is optimized for smaller set of data. Although Cloud Datastore is a NoSQL db and you don't need to define a schema before storing a row, it actually uses more for ad hoc storage of structured data. Cloud Datastore does not have SQL, but have an API called GQL to perform some kind of queries.
At a high level, Datastore is best suited for storing hierarchical, transactional data that has a flexible, non-relational schema. In particular, consider using Datastore if your web application requires some of the following: Any amount of storage capacity: Datastore is agnostic to the amount of data stored.
To give an intelligent answer, I would need to know a lot more about your app. But... I'll outline the biggest gotchas I've found...
Google Datastore is effectively a distributed hierarchical data store. To get the scalability they wanted there had to be some compromises. As a developer you will find that these are anywhere from easy to work around, difficult to work around, or impossible to work around. The latter is far more likely than you would ever assume.
If you are accustomed to relational databases and the ability to manipulate data across multiple tables within the same transaction, you are likely to pull your hair out with datastore. The biggest(?) gotcha is that transactions are only supported across a limited number of entity groups (5 at the current time). To give a simple example, say you had a simple parent-child relationship and you needed to update child records under more than 5 parents at the same time within a transaction... can't be done (yes, really). If you reorganize your data structures and try to put all of the former child records under a single entity so they can be updated in a single transaction, you will come across another limitation... the fact that you can't reliably update the same entity group more than once per second (yes, really). And if you query an entity type across parents without specifying the root entity of each, you will get what is euphemistically referred to as "eventual consistency"... which means it isn't (yes, really).
The above is all in Google's documentation, but you are likely to gloss over it if you are just getting started (of course it can handle it!).
It is not strictly true that Facebook and YouTube are using MySQL to serve the majority of their content to the majority of their users. They both mainly use very large NoSQL stores (Cassandra and BigTable) for scalability, and probably use MySQL for smaller scale work that demands more complex relational storage. Try to use Datastore if you can, because you can start for free and will also save money when handling large volumes of data.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With