I'm looking for a stand-alone full-text search server with the following properties: <ul> <li>Must operate as a stand-alone server that can serve search requests from multiple clients</li> <li>Must be able to do "bulk indexing" by indexing the result of an SQL query, e.g. <code>SELECT id, text_to_index FROM documents;</code></li> <li>Must be free software and must run on Linux with MySQL as the database</li> <li>Must be fast (which rules out MySQL's built-in full-text search)</li> </ul> The alternatives I've found that have these properties are: <ul> <li>Solr (based on Lucene)</li> <li>ElasticSearch (also based on Lucene)</li> <li>Sphinx</li> </ul> My questions: <ul> <li>How do they compare?</li> <li>Have I missed any alternatives?</li> <li>I know that each use case is different, but are there certain cases where I would definitely not want to use a certain package?</li> </ul>
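To make the "bulk indexing" requirement concrete, here is a minimal sketch of what it involves on the client side: run the SQL query and turn each row into a document for the search server. It assumes Python's built-in `sqlite3` as a stand-in for MySQL (a real setup would use a MySQL driver) and emits a JSON array of flat documents, the general shape Solr's JSON update handler accepts. The field names `id` and `text` are hypothetical and would have to match your index schema.

```python
import json
import sqlite3


def rows_to_solr_json(conn, query="SELECT id, text_to_index FROM documents;"):
    """Run the bulk-indexing query and serialize each row as a search document.

    Returns a JSON array of flat documents. The field names "id" and "text"
    are hypothetical and must match the target index schema.
    """
    cursor = conn.execute(query)
    docs = [{"id": str(doc_id), "text": text} for doc_id, text in cursor]
    return json.dumps(docs)


if __name__ == "__main__":
    # In-memory SQLite database standing in for the MySQL "documents" table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, text_to_index TEXT)")
    conn.executemany(
        "INSERT INTO documents (id, text_to_index) VALUES (?, ?)",
        [(1, "first document"), (2, "second document")],
    )
    print(rows_to_solr_json(conn))
```

Posting the body to the server (and committing) is left out because the endpoint depends on the Solr version and core layout. With Sphinx, the indexer can run such a query itself from its configuration file (the `sql_query` setting of a data source), so no client code is needed.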

Choosing a stand-alone full-text search server: Sphinx or SOLR? [closed]

2 Answers

Unless you need to extend the search functionality in any proprietary way, Sphinx is your best bet.

Sphinx advantages:

Development and setup are faster
Much better (and faster) aggregation. This was the killer feature for us.
Not XML. This is what ultimately ruled out Solr for us. We had to return rather large result sets (think hundreds of results) and then aggregate them ourselves, since Solr's aggregation was lacking. The time spent serializing to and from XML absolutely killed performance. For small result sets, though, it was perfectly fine.
Best documentation I've seen in an open-source app
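For reference, Solr's `wt` request parameter selects the response format, and `wt=json` returns JSON instead of XML. Below is a minimal sketch of the kind of client-side aggregation described above, applied to a JSON response body. The sample response and the `category` field are hypothetical; only the nesting of documents under `response.docs` follows Solr's JSON output.

```python
import json
from collections import Counter


def aggregate_by_field(solr_json, field):
    """Count how many returned documents share each value of `field`.

    `solr_json` is a response body requested with wt=json; documents are
    read from response.docs. Assumes `field` is single-valued (a string or
    number, not a list); documents lacking the field are skipped.
    """
    body = json.loads(solr_json)
    docs = body["response"]["docs"]
    return Counter(doc[field] for doc in docs if field in doc)


# Hypothetical response body; real responses also carry a responseHeader.
sample = json.dumps({
    "response": {
        "numFound": 3,
        "start": 0,
        "docs": [
            {"id": "1", "category": "books"},
            {"id": "2", "category": "music"},
            {"id": "3", "category": "books"},
        ],
    }
})

print(aggregate_by_field(sample, "category"))
```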

Solr advantages:

Can be extended.
Can be queried directly from a web app; for example, autocomplete-style searches can hit the Solr server directly via AJAX.
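Hitting Solr from the browser comes down to a plain HTTP GET against its select handler. A minimal sketch of building such an autocomplete URL, assuming a Solr instance at `http://localhost:8983/solr/select` and a hypothetical `title` field: it uses a trailing-wildcard prefix query, escapes Lucene query-syntax characters in the user's input, and asks for JSON so browser-side script can consume the result directly.

```python
from urllib.parse import urlencode

# Assumed endpoint: the exact path depends on the Solr version and core layout.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"

# Characters with special meaning in Lucene query syntax.
LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\/')


def escape_lucene(text):
    """Backslash-escape Lucene query-syntax characters in user input."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in text)


def autocomplete_url(prefix, field="title", rows=10):
    """Build a GET URL for documents whose `field` starts with `prefix`.

    `field` is a hypothetical schema field name.
    """
    params = {
        "q": f"{field}:{escape_lucene(prefix)}*",  # trailing wildcard = prefix query
        "fl": f"id,{field}",  # return only what a suggestion list needs
        "rows": rows,
        "wt": "json",  # JSON response writer instead of XML
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


print(autocomplete_url("spin"))
```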

answered Sep 23 '22 11:09

larf311

I've been using Solr successfully for almost 2 years now, and have never used Sphinx, so I'm obviously biased. However, I'll try to keep it objective by quoting the docs or other people. I'll also take patches to my answer :-)

Similarities:

Both Solr and Sphinx satisfy all of your requirements. They're fast and designed to index and search large bodies of data efficiently.
Both have a long list of high-traffic sites using them.
Both offer commercial support.
Both offer client API bindings for several platforms/languages.
Both can be distributed to increase speed and capacity.
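Distributing an index generally means each shard answers the query with its own best hits and a coordinator merges them into one ranked list. The sketch below shows only that merge step; it is a conceptual illustration, not the actual merging code of Solr or Sphinx, and the `(score, doc_id)` tuples are hypothetical shard results, each already sorted by descending score.

```python
import heapq
from itertools import islice


def merge_shard_results(shard_results, top_n):
    """Merge per-shard hit lists into a single global top-N list.

    Each element of `shard_results` is a list of (score, doc_id) tuples
    sorted by descending score. heapq.merge consumes the lists lazily, so
    only as many hits as needed for the top N are pulled.
    """
    merged = heapq.merge(*shard_results, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, top_n))


shard_a = [(9.1, "a1"), (4.0, "a2"), (1.2, "a3")]
shard_b = [(7.5, "b1"), (6.3, "b2")]
print(merge_shard_results([shard_a, shard_b], top_n=3))
```

Each shard only has to return its own top N hits, since a document outside a shard's top N cannot make it into the global top N.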

Here are some differences:

Solr, being an Apache project, is obviously Apache 2.0-licensed, while Sphinx is GPLv2. This means that if you ever need to embed or extend (not just "use") Sphinx in a commercial application that you distribute without releasing its source code, you'll have to buy a commercial license.
Solr is easily embeddable in Java applications.
Solr is built on top of Lucene, which is a proven technology over 8 years old with a huge user base. Whenever Lucene gets a new feature or speedup, Solr gets it too. Many of the developers committing to Solr are also Lucene committers.
Sphinx integrates more tightly with RDBMSs, especially MySQL.
Solr can be integrated with Hadoop to build distributed applications.
Solr can be integrated with Nutch to quickly build a fully-fledged web search engine with crawler.
Solr can index proprietary formats like Microsoft Word, PDF, etc. Sphinx can't.
Solr comes with a spell-checker out of the box.
Solr comes with facet support out of the box. Faceting in Sphinx takes more work.
Sphinx doesn't allow partial index updates for field data.
In Sphinx, all document ids must be unique, non-zero unsigned integers. Solr doesn't even require a unique key for many operations, and unique keys can be either integers or strings.
Solr supports field collapsing (currently as an additional patch only) to avoid duplicating similar results. Sphinx doesn't seem to provide any feature like this.
While Sphinx is designed to retrieve only document ids, Solr can return whole documents containing pretty much any kind of data, which makes it more independent of an external data store and saves the extra round trip.
Solr, except when used embedded, runs in a Java web container such as Tomcat or Jetty, which requires additional specific configuration and tuning (or you can use the included Jetty and just launch it with java -jar start.jar). Sphinx needs no such additional configuration.
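Because Sphinx hands back document ids, displaying results takes a second query against the database, and the application must restore the ranking itself, since `WHERE id IN (...)` returns rows in no guaranteed order. A minimal sketch, again using `sqlite3` as a stand-in for MySQL (placeholder syntax differs between drivers) and a hard-coded list of hypothetical ids in place of a real Sphinx result:

```python
import sqlite3


def fetch_in_rank_order(conn, ranked_ids):
    """Fetch document rows for `ranked_ids`, preserving the search ranking.

    `ranked_ids` stands in for the ids a Sphinx query would return, best
    match first. Ids with no matching row are silently dropped.
    """
    if not ranked_ids:
        return []
    placeholders = ",".join("?" * len(ranked_ids))
    rows = conn.execute(
        f"SELECT id, text_to_index FROM documents WHERE id IN ({placeholders})",
        ranked_ids,
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    return [by_id[doc_id] for doc_id in ranked_ids if doc_id in by_id]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, text_to_index TEXT)")
    conn.executemany(
        "INSERT INTO documents VALUES (?, ?)",
        [(1, "alpha"), (2, "beta"), (3, "gamma")],
    )
    # Hypothetical ranking: document 3 scored highest, then document 1.
    print(fetch_in_rank_order(conn, [3, 1]))
```

An integer primary key such as `documents.id` maps directly onto Sphinx's requirement for unique, non-zero unsigned integer ids.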

Mauricio Scheffer

Tags: mysql, full-text-search, solr, lucene, sphinx

knorv
