Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I store the date with datastore?

Datastore documentation is very clear that there is an issue with "hotspots" if you include 'monotonically increasing values' (like the current unix time), however there isn't a good alternative mentioned, nor is it addressed whether storing the exact same (rather than increasing values) would create "hotspots":

"Do not index properties with monotonically increasing values (such as a NOW() timestamp). Maintaining such an index could lead to hotspots that impact Cloud Datastore latency for applications with high read and write rates." https://cloud.google.com/datastore/docs/best-practices

I would like to store the time when each particular entity is inserted into the datastore, if that's not possible though, storing just the date would also work.

That almost seems more likely to cause "hotspots" though, since every new entity for 24 hours would get added to the same index (that's my understanding anyway).

Perhaps there's something more going on with how indexes work (I am having trouble finding great explanations of exactly how they work) and having the same value index over and over again is fine, but incrementing values is not.

I would appreciate if anyone has an answer to this question, or else better documentation for how datastore indexes work.

like image 830
SAM Avatar asked Jan 17 '17 06:01

SAM


1 Answers

Is your application actually planning on querying the date? If not, consider simply not indexing that property. If you only need to read that property infrequently, consider writing a mapreduce rather than indexing.

That advice is given due to the way BigTable tablets work, which is described here: https://ikaisays.com/2011/01/25/app-engine-datastore-tip-monotonically-increasing-values-are-bad/

To the best of my knowledge, it's more important to have the primary key of an entity not be a monotonically increasing number. It would be better to have a string key, so the entity can be stored with better distribution.

But saying this as a non-expert, I can't imagine that indexes on individual properties with monotonic values would be as problematic, if it's legitimately needed. I know with the Nomulus codebase for example, we had a legitimate need for an index on time, because we wanted to delete commit logs older than a specific time.

One cool thing I think happens with these monotonic indexes is that, when these tablet splits don't happen, fetching the leftmost or rightmost element in the index actually has better latency properties than fetching stuff in the middle of the index. For example, if you do a query that just grabs the first result in the index, it can actually go faster than a key lookup.

like image 132
Justine Tunney Avatar answered Oct 25 '22 22:10

Justine Tunney