Practical Limits of ElasticSearch + Cassandra

I am planning to use ElasticSearch to index my Cassandra database. Has anyone run into the practical limits of ElasticSearch? Do things get slow in the petabyte range? Also, has anyone had any problems using ElasticSearch to index Cassandra?

asked Jun 15 '11 14:06

Henry

2 Answers

See this thread from 2011, which mentions ElasticSearch configurations with 1700 shards of 200 GB each, or roughly a third of a petabyte (about 340 TB). I would expect the architecture of ElasticSearch to support almost limitless horizontal scalability, because each shard's index works separately from all other shards.
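To make the "1/3 petabyte" figure concrete, here is a rough back-of-the-envelope check in Python. It only uses the two numbers quoted from the thread (1700 shards, 200 GB each) and assumes decimal units (1 PB = 1,000,000 GB); the variable names are just for illustration.

```python
# Back-of-the-envelope check of the cluster size quoted in the thread.
# Assumption: decimal units, i.e. 1 TB = 1,000 GB and 1 PB = 1,000,000 GB.
SHARD_COUNT = 1700      # number of shards mentioned in the thread
SHARD_SIZE_GB = 200     # size of each shard in GB, as mentioned in the thread

total_gb = SHARD_COUNT * SHARD_SIZE_GB
total_tb = total_gb / 1_000
total_pb = total_gb / 1_000_000

print(f"{total_gb} GB = {total_tb} TB = {total_pb} PB")
```

That works out to 340 TB, which is where "about a third of a petabyte" comes from.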

The practical limits (which would apply to any other solution as well) include the time needed to actually load that much data in the first place. Managing a Cassandra cluster (or any other distributed datastore) of that size will also involve a significant workload just for maintenance, load balancing, etc.

answered Oct 07 '22 13:10

DNA

Sonian is the company kimchy alludes to in that thread. We have over a petabyte on AWS across multiple ES clusters. There isn't a technical limitation on how far you can scale ES horizontally, but as DNA mentioned there are practical problems. The biggest by far is the network, and it applies to every distributed data store: you can only move so much across the wire at a time. When ES has to recover from a failure, it has to move data. The best option is to use smaller shards across more nodes (more concurrent transfers), but with more nodes you risk a higher rate of failure and an exorbitant cost per byte.
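As a rough illustration of why smaller shards on more nodes recover faster, here is a minimal Python sketch. It is an idealized model, not a measurement: it assumes every shard is copied in parallel from a different node over its own link, and it ignores protocol overhead, throttling and disk speed. The 2000 GB total and the 1 Gbit/s link are made-up numbers, and both function names are hypothetical.

```python
def transfer_seconds(size_gb: float, link_gbps: float) -> float:
    """Idealized time to copy size_gb gigabytes over a link_gbps (gigabit/s) link."""
    return size_gb * 8 / link_gbps  # 8 bits per byte


def recovery_seconds(total_gb: float, shard_count: int, link_gbps: float) -> float:
    """Idealized time to restore total_gb split into shard_count equal shards.

    Assumption: each shard is copied at the same time from a different node,
    each over its own link, so the wall-clock time is that of a single shard.
    """
    return transfer_seconds(total_gb / shard_count, link_gbps)


# Hypothetical example: 2000 GB to restore over 1 Gbit/s links.
one_big_shard = recovery_seconds(2000, 1, 1.0)
ten_small_shards = recovery_seconds(2000, 10, 1.0)

print(f"1 shard of 2000 GB:  {one_big_shard / 3600:.1f} hours")
print(f"10 shards of 200 GB: {ten_small_shards / 60:.1f} minutes")
```

In this model the same amount of data comes back ten times faster when it is spread over ten sources, which is the "more concurrent transfers" effect. The model says nothing about the downside: more nodes means more things that can fail and more to pay for.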

answered Oct 07 '22 14:10

drewr
