I will describe the data and case.
record {
customerId: "id", <---- indexed
binaryData: "data" <---- not indexed
}
Expectations:
What we have tried:
We have this approach implemented in relational database, in AWS RDS MariaDB. Server is 32GB RAM, 2TB GP2 SSD, 8 CPU. I found that IOPS usage was high and insert speed was not satisfactory. After investigation I concluded that due to random nature of customerId there is high rate of different writes to index. After this I did the following:
Even like this, I could not make more than 1-3M record writes per hour. Different write concurrencies were tested, current value is 4 concurrent writers. After all modifications it's not clear what else we can improve:
Apparently, ACID principles are holding us back. I am in look for flatly scalable key-value store and will be glad to hear any ideas and roughly estimations.
So if I understand you...
That's quite doable in NoSQL.
The scale is not the issue. It's your cost.
$1k a month for 1 TB of data sounds like a reasonable goal. I just don't think that the public clouds are quite there yet.
Let me give an example with my recommendation: Scylla Cloud and Scylla Open Source. (Disclosure: I work for ScyllaDB.)
I will caution you that your $1k/month capitation on costs might cause you to consider and make some tradeoffs.
As is typical in high availability deployments, to ensure data redundancy in case of node failure, you could use 3x i3.2xlarge instances on AWS (can store 1.9 TB per instance).
You want the extra capacity to run compactions. We use incremental compaction, which saves on space amplification, but you don't want to go with the i3.xlarge (0.9 tb each), which is under your 1 tb limit unless really pressed for costs. In which case you'll have to do some sort of data eviction (like a TTL) to keep your data to around <600 gb.
Even with annual reserved pricing for Scylla Cloud (see here: https://www.scylladb.com/product/scylla-cloud/#pricing) of $764.60/server, to run the three i3.2xlarge would be $2,293.80/month. More than twice your budget.
Now, if you eschew managed services, and want to run self-service, you could go Scylla Open Source, and just look at the on-demand instance pricing (see here: https://aws.amazon.com/ec2/pricing/on-demand/). For 3x i3.2xlarge, you are running each at $0.624/hour. That's a raw on-demand cost of $449.28 each, which doesn't include incidentals like backups, data transfer, etc. But you could get three instances for $1,347.84. Open Source. Not managed.
Still over your budget, but closer. If you could get reserved pricing, that might just make it.
Edit: Found the reserve pricing:
3x i3.2xlarge is going to cost you
So, again, backups, monitoring, and other costs are above that. But you should be able to bring the raw server cost <$1,000 to meet your needs using Scylla Open Source.
But the admin burden is on your team (and their time isn't exactly zero cost).
For example, if you want monitoring on your system, you'll need to set up something like Prometheus, Grafana or Datadog. That will be other servers or services, and they aren't free. (The cost of backups and monitoring by our team are covered with Scylla Cloud. Part of the premium for the service.)
Another way to save money is to only do 2x replication. Which puts your data in a real risky place in case you lose a server. It is not recommended.
All of this was based on maximal assumptions of your data. That your records are all around 2k (not 1k). That you're not getting much utility out of data compression, which ScyllaDB has built in – see part one (https://www.scylladb.com/2019/10/04/compression-in-scylla-part-one/) and part two (https://www.scylladb.com/2019/10/07/compression-in-scylla-part-two/).
To my mind, you should be able to squeak through with your $1k/month budget if you go reserved pricing and open source. Though adding on monitoring and backups and other incidental costs (which I haven't calculated here) may end you up back over that number again.
Otherwise, $2.3k/month in a fully-managed-cloud enterprise package and you can sleep easy at night.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With