
Best Data Store for huge data with large number of reads and writes

I need to store around 100 million records in a database. Around 60-70% of them will be deleted daily, and the same number of records will be inserted daily. I feel a document database like HBase or Bigtable would fit this. There are many other data stores, such as Cassandra and MongoDB. Which data store would suit this kind of problem, given that there will be a huge number of reads/writes (on the order of tens of millions) daily?

sravan_kumar asked Dec 23 '11 08:12




1 Answer

Based on the characteristics you've mentioned (JSON documents, access by key, 100 million records, balanced read/write), I'd say CouchDB or Membase are good candidates (here's a quick comparison).

Both HBase and Cassandra can probably also work, but for HBase you'd need to install a lot of components (Hadoop, ZooKeeper, etc.) that you won't really use, and Cassandra is better when you have more writes than reads (at least the last time I used it).

Bigtable is, unfortunately, internal to Google :)
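
As a rough sanity check on the numbers in the question, here is a back-of-envelope sketch in Python. It assumes the daily deletes and inserts are spread evenly over 24 hours, which the question does not state, so real peaks could be considerably higher. The function names are made up for illustration. Under that assumption the churn works out to an average of about 1,400 to 1,600 write operations per second.

```python
# Back-of-envelope estimate of the average write rate for the workload
# described in the question. Assumption (not from the question): the load
# is spread evenly across the day; bursty traffic would peak much higher.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_write_ops(total_records: int, churn_percent: int) -> int:
    """Deletes plus inserts per day, given the share of records replaced daily."""
    deletes = total_records * churn_percent // 100
    inserts = deletes  # the question says the same amount is re-inserted
    return deletes + inserts


def average_ops_per_second(ops_per_day: int) -> float:
    """Average rate if the operations were spread evenly over one day."""
    return ops_per_day / SECONDS_PER_DAY


TOTAL_RECORDS = 100_000_000

for churn_percent in (60, 70):
    writes = daily_write_ops(TOTAL_RECORDS, churn_percent)
    rate = average_ops_per_second(writes)
    print(f"{churn_percent}% churn: {writes:,} writes/day, ~{rate:,.0f} writes/s on average")
```

This only gives an average. Whichever store you pick should be tested against your actual peak load and delete pattern.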

Arnon Rotem-Gal-Oz answered Sep 28 '22 20:09