
How can ElasticSearch be used to implement social search?

I’m trying to create a business search with social features using ElasticSearch. I have a business directory, and users can interact with those businesses in different ways: by reviewing them, checking into them, etc.

When a user searches for a business, I'd like to be able to show them the businesses that their friends have interacted with at the top of the results (or filter based on those interactions). What's the best way to set up my index to achieve this?

I can think of a few possible solutions, but I'm a beginner with ES and I'm not sure which will cause problems:

  1. I could use multi-tenancy and create a separate index for each user. I've ruled this out because the number of users is much greater than the number of businesses or the amount of user-specific content.

  2. I could add a list of user/score pairs to each indexed business. Every user who has interacted with the business would be in there, and the score would represent the amount of interaction they'd had with the business (this is good enough for my filtering/sorting purposes). Every time they interact with the business, I would update the score in the index. The problem with this is that I only care about my friends' activity, so I would need to figure out some way to take into account who my friends are when creating a composite score for the business. I don't know how to do this in ES.

  3. I could create a similar scheme, but instead of keeping score of my interactions with a business, the score would reflect my friends' interactions with the business. This takes away the need to model my social graph in ElasticSearch, but it does mean that any time a person interacts with a business, I would need to update all of their friends' scores. It would also mean that the list of user/score pairs for each business would be larger, since it'll need to include anybody who has a friend who has interacted with the business.

  4. The final solution I can think of is to keep track of every individual interaction that happens to a business, and add it to the business’s document in ES. This doesn’t seem realistic to me, since it combines the problems from the other solutions. But it’s probably the most straightforward approach in terms of keeping the index up to date.

Thanks for your help!

asked May 21 '12 16:05 by Borys



2 Answers

I'm voting for a modified #2.

Instead of storing each user/score pair inside of the business document itself, I would create a Parent/Child relationship. This lets you update the score of the child (the user scores) without having to reindex the entire business document (and all the other user scores).
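A minimal sketch of what that relationship could look like, written as plain Python dicts so that no client library is assumed. The field names (`name`, `user_id`, `score`, `relation`) and the type names `business` and `user_score` are illustrative assumptions. The sketch uses the `join` field type from more recent Elasticsearch releases; older releases declared the parent with a `_parent` mapping on the child type instead.

```python
# Hypothetical index body: a single join field declares that
# "business" documents are parents of "user_score" documents.
BUSINESS_INDEX_BODY = {
    "mappings": {
        "properties": {
            "name": {"type": "text"},
            "user_id": {"type": "keyword"},
            "score": {"type": "float"},
            "relation": {
                "type": "join",
                "relations": {"business": "user_score"},
            },
        }
    }
}


def business_doc(name):
    """Parent document: one per business."""
    return {"name": name, "relation": "business"}


def user_score_doc(business_id, user_id, score):
    """Child document: one per (user, business) pair.

    Updating a score only means reindexing this small document,
    not the parent business or its other children.
    """
    return {
        "user_id": str(user_id),
        "score": float(score),
        "relation": {"name": "user_score", "parent": business_id},
    }
```

Child documents have to be indexed with routing set to the parent's ID so that both end up on the same shard.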

Check out this page for a great tutorial (parent/child documents are covered about halfway down): http://www.spacevatican.org/2012/6/3/fun-with-elasticsearch-s-children-and-nested-documents/

Then you can use a has_child filter or top_children query to find only those businesses that your friends have scores for. There are a few caveats about ordering child documents, but they're covered by that tutorial, so make sure you read to the bottom.
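As a rough illustration, a search body that restricts results to businesses with a child score from one of your friends might be built like this. The child type `user_score` and the fields `name` and `user_id` are assumed names, and the helper `social_business_query` is hypothetical; only the `bool`, `match`, `terms`, and `has_child` clauses are Elasticsearch query DSL.

```python
def social_business_query(text, friend_ids):
    """Build a search body: businesses matching `text` that have at
    least one child score document written by one of `friend_ids`."""
    return {
        "query": {
            "bool": {
                # Normal relevance scoring on the business itself.
                "must": [{"match": {"name": text}}],
                # Keep only businesses with a matching child document.
                "filter": [
                    {
                        "has_child": {
                            "type": "user_score",
                            "query": {
                                "terms": {
                                    "user_id": [str(f) for f in friend_ids]
                                }
                            },
                        }
                    }
                ],
            }
        }
    }
```

The returned dict would be sent as the body of a search request against the business index.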

Then I'd just perform a normal query for all "non-social" ranked searches.

Alternatively, you could lump everything together and add boosts to the matches that your friends have scored, so that everything ranks appropriately. It may just be easier to perform two queries and combine them yourself.
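For the two-query approach, the combining step can be as simple as the sketch below. It assumes each hit is a dict with an `id` key (a hypothetical shape, not the raw Elasticsearch response) and puts the social hits first, followed by the regular hits that were not already shown.

```python
def merge_results(social_hits, regular_hits, limit=10):
    """Friends' businesses first, then the rest, without duplicates."""
    merged, seen = [], set()
    for hit in list(social_hits) + list(regular_hits):
        if hit["id"] not in seen:
            seen.add(hit["id"])
            merged.append(hit)
    return merged[:limit]


social = [{"id": "b2"}, {"id": "b7"}]
regular = [{"id": "b1"}, {"id": "b2"}, {"id": "b3"}]
top = merge_results(social, regular, limit=4)
```

Here `top` contains b2, b7, b1, b3 in that order: b2 appears once, in its social position.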

answered Oct 20 '22 17:10 by Zach


There's another set of solutions that has the upside of being extremely fast (i.e. taking advantage of what ES is best at), but looks terrible to anyone who knows even the first thing about designing data storage/retrieval systems.

If your 'business' index is smaller than your 'user' index (e.g. 10,000 businesses, 1,000,000 users):

  1. Create 2 indexes: User and Business.
  2. Business index should have an 'array' field that holds the ids of every user who has ever "interacted" with it (e.g. "users: 1,4,23,26,127,8678")
  3. User index should have a nested array field with one object per business, holding the business ID and meta information such as reviews and checkins (e.g. "business_id:1233,rating: 7.5,checkins:21")
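To make the two document shapes concrete, here is an in-memory sketch. The keys `id`, `users`, and `interactions` and the helper `record_checkin` are assumptions for illustration; in practice each change would be written back to the corresponding index.

```python
def record_checkin(business, user, rating=None):
    """Apply one interaction to both documents (in memory only)."""
    # Business side: flat array of user ids, each id stored once.
    if user["id"] not in business["users"]:
        business["users"].append(user["id"])

    # User side: one nested object per business with meta information.
    for entry in user["interactions"]:
        if entry["business_id"] == business["id"]:
            break
    else:
        entry = {"business_id": business["id"], "rating": None, "checkins": 0}
        user["interactions"].append(entry)

    entry["checkins"] += 1
    if rating is not None:
        entry["rating"] = rating
    return business, user


biz = {"id": 1233, "name": "Cafe", "users": []}
usr = {"id": 4, "interactions": []}
record_checkin(biz, usr, rating=7.5)
record_checkin(biz, usr)
```

After these two calls the business lists user 4 once, and the user holds a single nested entry for business 1233 with two checkins.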

When you search for a business, do a quick string query or filter query with the User's friend ids (OR of course) against the Business index. The tf-idf scoring should automatically rank the businesses that your friends have interacted with the most at the top. If you need more info, just hit the User index to get the meta data for each of your friends (rating, checkins, etc). This should be lightning fast and super efficient, because ES is absolutely fantastic at matching arrays as individual terms. That's what it's for, yo!

If your 'business' index is significantly larger than your 'user' index, reverse the pattern: put an indexed array of the business_ids a user has interacted with on the user index.
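One way to express the OR over friend ids is a `bool` query with one `term` clause per friend, sketched below as a plain dict. The field names `users` and `name` and the helper `friends_interaction_query` are assumptions. Each `should` clause that matches adds to the score, so a business several friends have interacted with tends to rank above one matched by a single friend.

```python
def friends_interaction_query(friend_ids, text=None):
    """Search body: businesses whose `users` array contains any friend id."""
    bool_query = {
        # One clause per friend, ORed together.
        "should": [{"term": {"users": fid}} for fid in friend_ids],
        # Require at least one friend to match (acts as the filter).
        "minimum_should_match": 1,
    }
    if text:
        bool_query["must"] = [{"match": {"name": text}}]
    return {"query": {"bool": bool_query}}
```

Setting `minimum_should_match` to 0 would turn the friend clauses into a pure boost, so businesses with no friend activity would still be returned.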
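With the reversed pattern, one possible reading is that you first fetch your friends' user documents and tally the business ids yourself, then use the tally to filter or boost the business search. The sketch below shows only the tally step; the `business_ids` key and the helper name are assumptions.

```python
from collections import Counter


def friend_business_counts(friend_docs):
    """Count how many friends have interacted with each business."""
    counts = Counter()
    for doc in friend_docs:
        # set() so a repeated id within one document counts once.
        counts.update(set(doc.get("business_ids", [])))
    return counts


friends = [
    {"id": 1, "business_ids": [10, 20]},
    {"id": 4, "business_ids": [20, 30, 20]},
    {"id": 23},
]
counts = friend_business_counts(friends)
```

In this example business 20 has a count of 2, while businesses 10 and 30 each have a count of 1.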

answered Oct 20 '22 17:10 by thoughtpunch