Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Solr vs. ElasticSearch [closed]

What are the core architectural differences between these technologies?

Also, what use cases are generally more appropriate for each?

like image 823
Ben ODay Avatar asked Apr 18 '12 15:04

Ben ODay


People also ask

Which is better Elasticsearch or Solr?

Solr has more advantages when it comes to the static data, because of its caches and the ability to use an uninverted reader for faceting and sorting – for example, e-commerce. On the other hand, Elasticsearch is better suited – and much more frequently used – for timeseries data use cases, like log analysis use cases.

What is the main architectural difference between Elasticsearch and Solr?

1 Ingest and Query services. The Elasticsearch query process is structured very similarly to the Solr service. The main difference lies in the microservice architecture of the system, and the exits to the Elasticsearch and the ZooKeeper administrative functions, rather than to Solr and the monolithic search server.

Is Elasticsearch based on Solr?

Elasticsearch is completely based on JSON and is suitable for time series and NoSQL data. This tool is much younger than Solr, but it has gained a lot of popularity because of its feature-rich use cases.

Is Solr open-source?

Solr is a leading open source search engine from the Apache Software Foundation's Lucene project.


2 Answers

Update

Now that the question scope has been corrected, I might add something in this regard as well:

There are many comparisons between Apache Solr and ElasticSearch available, so I'll reference those I found most useful myself, i.e. covering the most important aspects:

  • Bob Yoplait already linked kimchy's answer to ElasticSearch, Sphinx, Lucene, Solr, Xapian. Which fits for which usage?, which summarizes the reasons why he went ahead and created ElasticSearch, which in his opinion provides a much superior distributed model and ease of use in comparison to Solr.

  • Ryan Sonnek's Realtime Search: Solr vs Elasticsearch provides an insightful analysis/comparison and explains why he switched from Solr to ElasticSeach, despite being a happy Solr user already - he summarizes this as follows:

    Solr may be the weapon of choice when building standard search applications, but Elasticsearch takes it to the next level with an architecture for creating modern realtime search applications. Percolation is an exciting and innovative feature that singlehandedly blows Solr right out of the water. Elasticsearch is scalable, speedy and a dream to integrate with. Adios Solr, it was nice knowing you. [emphasis mine]

  • The Wikipedia article on ElasticSearch quotes a comparison from the reputed German iX magazine, listing advantages and disadvantages, which pretty much summarize what has been said above already:

    Advantages:

    • ElasticSearch is distributed. No separate project required. Replicas are near real-time too, which is called "Push replication".
    • ElasticSearch fully supports the near real-time search of Apache Lucene.
    • Handling multitenancy is not a special configuration, where with Solr a more advanced setup is necessary.
    • ElasticSearch introduces the concept of the Gateway, which makes full backups easier.

    Disadvantages:

    • Only one main developer [not applicable anymore according to the current elasticsearch GitHub organization, besides having a pretty active committer base in the first place]
    • No autowarming feature [not applicable anymore according to the new Index Warmup API]

Initial Answer

They are completely different technologies addressing completely different use cases, thus cannot be compared at all in any meaningful way:

  • Apache Solr - Apache Solr offers Lucene's capabilities in an easy to use, fast search server with additional features like faceting, scalability and much more

  • Amazon ElastiCache - Amazon ElastiCache is a web service that makes it easy to deploy, operate, and scale an in-memory cache in the cloud.

    • Please note that Amazon ElastiCache is protocol-compliant with Memcached, a widely adopted memory object caching system, so code, applications, and popular tools that you use today with existing Memcached environments will work seamlessly with the service (see Memcached for details).

[emphasis mine]

Maybe this has been confused with the following two related technologies one way or another:

  • ElasticSearch - It is an Open Source (Apache 2), Distributed, RESTful, Search Engine built on top of Apache Lucene.

  • Amazon CloudSearch - Amazon CloudSearch is a fully-managed search service in the cloud that allows customers to easily integrate fast and highly scalable search functionality into their applications.

The Solr and ElasticSearch offerings sound strikingly similar at first sight, and both use the same backend search engine, namely Apache Lucene.

While Solr is older, quite versatile and mature and widely used accordingly, ElasticSearch has been developed specifically to address Solr shortcomings with scalability requirements in modern cloud environments, which are hard(er) to address with Solr.

As such it would probably be most useful to compare ElasticSearch with the recently introduced Amazon CloudSearch (see the introductory post Start Searching in One Hour for Less Than $100 / Month), because both claim to cover the same use cases in principle.

like image 152
Steffen Opel Avatar answered Sep 17 '22 21:09

Steffen Opel


I see some of the above answers are now a bit out of date. From my perspective, and I work with both Solr(Cloud and non-Cloud) and ElasticSearch on a daily basis, here are some interesting differences:

  • Community: Solr has a bigger, more mature user, dev, and contributor community. ES has a smaller, but active community of users and a growing community of contributors
  • Maturity: Solr is more mature, but ES has grown rapidly and I consider it stable
  • Performance: hard to judge. I/we have not done direct performance benchmarks. A person at LinkedIn did compare Solr vs. ES vs. Sensei once, but the initial results should be ignored because they used non-expert setup for both Solr and ES.
  • Design: People love Solr. The Java API is somewhat verbose, but people like how it's put together. Solr code is unfortunately not always very pretty. Also, ES has sharding, real-time replication, document and routing built-in. While some of this exists in Solr, too, it feels a bit like an after-thought.
  • Support: there are companies providing tech and consulting support for both Solr and ElasticSearch. I think the only company that provides support for both is Sematext (disclosure: I'm Sematext founder)
  • Scalability: both can be scaled to very large clusters. ES is easier to scale than pre-Solr 4.0 version of Solr, but with Solr 4.0 that's no longer the case.

For more thorough coverage of Solr vs. ElasticSearch topic have a look at https://sematext.com/blog/solr-vs-elasticsearch-part-1-overview/ . This is the first post in the series of posts from Sematext doing direct and neutral Solr vs. ElasticSearch comparison. Disclosure: I work at Sematext.

like image 35
Otis Gospodnetic Avatar answered Sep 19 '22 21:09

Otis Gospodnetic