What are the options when it comes to SaaS/hosted full text search? How should I evaluate the different options available?
I'm looking for something that uses Lucene, solr, or sphinx on the backend, and provides a REST API for submitting documents to index, and running searches.
I could build my own EC2 AMI, but I'd have to configure EBS and other stuff, monitor it, etc.
A full-text index is a special type of index that provides index access for full-text queries against character or binary column data. A full-text index breaks the column into tokens and these tokens make up the index data.
Full text search is a more advanced way to search a database. Full text search quickly finds all instances of a term (word) in a table without having to scan rows and without having to know which column a term is stored in.
Full-text search allows you to search the full text of all EDGAR filings submitted since 2001. The full text of a filing includes all data in the filing itself as well as all attachments (such as exhibits) to the filing.
Web search engines and document editing software make extensive use of the full-text search technique in functions for searching a text database stored on the Web or on the local drive of a computer; it lets the user find a word or phrase anywhere within the database or document.
Websolr provides a cloud-based Solr with a control panel. It's in private beta as of this writing, but you can get the service through Heroku.
Another hosted Solr service is PowCloud, also in private beta, which seems to offer strong Wordpress integration.
SolrHQ: another beta service providing a hosted Solr solution, with Joomla and Wordpress integrations.
Acquia Search offers Solr integration for Drupal sites.
If you decide to build your own EC2 instance, the SolrOnAmazonEC2 wiki page might be useful. Or you could just get LucidWorks Solr for EC2, which is probably the easiest and fastest way to get Solr on EC2.
Engine Yard provides a cloud-based Sphinx service.
Indextank is a hosted real-time full text search solution. It's pretty simple to set up (you can get an index running in a couple of minutes) and it's very powerfull (Reddit runs over IndexTank). It provides Java, Python, Ruby and Php clients as well as a Rest API specification. There's an awesome support service (including live chat). You should give it a try.
Another option, particularly for UK people is http://www.netaphorsearch.com/ . I should point out I own Netaphor Ltd. We support the Solr REST API but also have a PHP connector so that you can get up and running very quickly.
Have a look at Artirix - UK company but also in the US http://www.artirix.com. I know they power some sites such as Globrix.com in the UK based on SOLR and have a bunch of other products for crawling and data processing
My five cents
http://indexisto.com/
Offers free hosted Elastic Search if you are ready for advertisement in search results. But anyway you can start with free, and switch to no ads paid account.
It's also not just hosted Elastic Search, but ready to ase Ajax search box (that really impress) to embed to you site (mobile and tablet adopted), and some useful features like statistics, image resizing. There are several options to fill the index with documents - crawler, API and DB connector
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With