
Redis full text search : reverse indexing or sunspot?

Tags: redis, sunspot

I have 3.5 million read-only records currently stored in a MySQL DB that I would like to move to Redis for performance reasons. So far, I've managed to store things like this in Redis:

1 {"type":"Country","slug":"albania","name_fr":"Albanie","name_en":"Albania"}
2 {"type":"Country","slug":"armenia","name_fr":"Arménie","name_en":"Armenia"}
...

The key I use here is the legacy MySQL id, so with some Ruby glue I can break as few things as possible in the existing app (and this is a serious concern here).

Now the problem is when I need to search for the keyword "Armenia" inside the value part. It seems there are only two ways out:

Either I multiply the Redis indexes:

  • id => JSON values (as shown above)
  • slug => id (reverse indexing based on the slug, that could do the basic search trick)
  • finally, another huge index specifically for autocomplete, as shown in this post : http://oldblog.antirez.com/post/autocomplete-with-redis.html

Or I use Sunspot or some other full-text search engine (unfortunately, I currently use ThinkingSphinx, which is too tightly tied to MySQL :-( ).

So, what would you do? Do you think moving a single table from MySQL to Redis is even a good idea? I'm worried about the memory footprint those gigantic Redis key/value sets could have on a server with 16 GB of RAM.

Any feedback from a similar Redis usage?

gbarillot asked Jun 16 '13 13:06




3 Answers

Before I give a real answer, I should mention that I don't see a good reason for you to use Redis here. Based on the use cases you describe, something like Elasticsearch sounds more appropriate.

That said, if you just want to be able to search for a few different fields within your JSON, you've got two options:

  1. Auxiliary index that points field_key -> list_of_ids (in your case, "Armenia" -> 2).
  2. Use Lua on top of Redis with JSON encoding and decoding to get at what you want. This is way more flexible and space efficient, but will be slower as your table grows.
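
Option 1 could look roughly like the sketch below. It is an illustration in Python rather than the asker's Ruby code, and it makes several assumptions: plain dicts and sets stand in for Redis string keys and Redis sets, the names `records`, `index`, `add_record`, and `search` are hypothetical, and values are lowercased so that lookups are case-insensitive.

```python
import json
from collections import defaultdict

# Stand-ins for Redis: `records` plays the role of the id -> JSON string keys,
# `index` plays the role of one Redis set of ids per indexed field value.
records = {}
index = defaultdict(set)

INDEXED_FIELDS = ("slug", "name_fr", "name_en")  # assumed searchable fields


def add_record(record_id, doc):
    """Store the JSON document and register its id under each indexed value."""
    records[record_id] = json.dumps(doc, ensure_ascii=False)
    for field in INDEXED_FIELDS:
        value = doc.get(field)
        if value:
            index[value.lower()].add(record_id)


def search(term):
    """Return the decoded documents that have an indexed field equal to `term`."""
    ids = sorted(index.get(term.lower(), set()))
    return [json.loads(records[i]) for i in ids]


add_record(1, {"type": "Country", "slug": "albania",
               "name_fr": "Albanie", "name_en": "Albania"})
add_record(2, {"type": "Country", "slug": "armenia",
               "name_fr": "Arménie", "name_en": "Armenia"})

matches = search("Armenia")
```

Against a real Redis server, each `index[...]` entry would be a set filled with SADD and read with SMEMBERS. This only handles exact matches on whole field values, not prefixes or full-text queries.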

Again, I don't think either is appropriate for you because it doesn't sound like Redis is going to be a good choice for you, but if you must, those should work.

Eli answered Oct 03 '22 12:10


Here's my take on Redis. Basically I think of it as an in-memory cache that can be configured to evict the least recently used (LRU) data, so that only recently used data stays in memory. That is the role I gave it in my use case, and the logic may help you think about yours.

I'm currently using Redis to cache results for a search engine based on some complex (slow) queries, backed by data in another DB (similar to your case). So Redis serves as cache storage for answering queries. Every query is served from Redis, or from the DB on a Redis cache miss. Note that Redis is not replacing the DB in my case, but merely extending it as a cache. This fit my specific use case, because adding Redis was meant to help with future scalability. The idea is that repeated access to recent data (in my case, a user repeating a query) can be served by Redis, taking some load off the DB.
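
The read path described above is commonly called cache-aside. Here is a minimal sketch of it in Python, under stated assumptions: a dict stands in for Redis GET/SET, and `QueryCache`, `slow_db_query`, and the `search:` key prefix are hypothetical names, not taken from my actual setup.

```python
import json


class QueryCache:
    """Cache-aside helper: check the cache first, fall back to the database."""

    def __init__(self, cache, fetch_from_db):
        self.cache = cache                  # dict standing in for Redis GET/SET
        self.fetch_from_db = fetch_from_db  # callable that runs the slow query
        self.hits = 0
        self.misses = 0

    def get(self, query):
        key = "search:" + query
        cached = self.cache.get(key)
        if cached is not None:
            self.hits += 1
            return json.loads(cached)
        # Cache miss: run the slow query, then store the result for next time.
        self.misses += 1
        result = self.fetch_from_db(query)
        self.cache[key] = json.dumps(result)
        return result


def slow_db_query(query):
    # Hypothetical placeholder for the real (slow) database query.
    return [{"id": 2, "name_en": "Armenia"}] if query == "armenia" else []


store = QueryCache(cache={}, fetch_from_db=slow_db_query)
first = store.get("armenia")   # miss: goes to the "database"
second = store.get("armenia")  # hit: served from the cache
```

With a real Redis server, the LRU behaviour itself is not implemented in application code; it would come from Redis settings such as `maxmemory` together with `maxmemory-policy allkeys-lru`.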

Basically my Redis schema ended up looking somewhat like the duplicated indexes you outlined above. I used sets and sorted sets to create "batches" of Redis keys, each of which pointed to specific query results stored under a particular Redis key. And in the DB, I still had the complete data set and an index.

If your data set fits in RAM, you could do the "table dump" into Redis and get rid of the need for MySQL. I could see this working, as long as you plan for persistent Redis storage and for possible growth of your data, if this "table" will grow in the future.
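
To judge whether the data fits in RAM, a back-of-the-envelope estimate may help. The sketch below measures one sample value from the question and multiplies it out. The per-key overhead of 100 bytes is purely an assumption, not a measured Redis figure, and real records may be larger than the sample, so treat the result as an order of magnitude only.

```python
import json

sample = {"type": "Country", "slug": "armenia",
          "name_fr": "Arménie", "name_en": "Armenia"}

# Size of one value as compact UTF-8 JSON, matching the format in the question.
value_bytes = len(json.dumps(sample, ensure_ascii=False,
                             separators=(",", ":")).encode("utf-8"))

RECORD_COUNT = 3_500_000      # from the question
ASSUMED_OVERHEAD_BYTES = 100  # assumption: per-key bookkeeping, not measured


def estimate_gib(records, payload_bytes, overhead_bytes):
    """Rough total in GiB for `records` keys of the given payload size."""
    return records * (payload_bytes + overhead_bytes) / 1024 ** 3


estimate = estimate_gib(RECORD_COUNT, value_bytes, ASSUMED_OVERHEAD_BYTES)
print(f"{value_bytes} bytes per value, roughly {estimate:.2f} GiB in total")
```

Any auxiliary or autocomplete indexes come on top of this figure, so it is worth loading a representative sample into a test instance and checking the output of the Redis `INFO memory` command before committing to the move.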

So depending on your actual use case, how you see Redis fitting into your stack, and the load your DB serves, don't rule out the possibility of having to do both of the options you outlined above (which happened in my case).

Hope this helps!

chinnychinchin answered Oct 03 '22 12:10


Redis does provide Full Text Search with RediSearch.

RediSearch implements a search engine on top of Redis. This also enables more advanced features, such as exact phrase matching, auto-suggestions, and numeric filtering for text queries, that are not possible or not efficient with hand-built Redis indexes.

Guy Korland answered Oct 03 '22 13:10