How can ElasticSearch be used to implement social search?

I’m trying to create a business search with social features using ElasticSearch. I have a business directory, and users can interact with those businesses in different ways: by reviewing them, checking into them, etc.

When a user searches for a business, I'd like to be able to show them the businesses that their friends have interacted with at the top of the results (or filter based on those interactions). What's the best way to set up my index to achieve this?

I can think of a few possible solutions, but I'm a beginner with ES and I'm not sure what will cause problems:

1. I could use multi-tenancy and create a separate index for each user. I've ruled this out because the number of users is much greater than the number of businesses or the amount of user-specific content.
2. I could add a list of user/score pairs to each indexed business. Every user who has interacted with the business would be in there, and the score would represent the amount of interaction they've had with the business (this is good enough for my filtering/sorting purposes). Every time they interact with the business, I would update the score in the index. The problem with this is that I only care about my friends' activity, so I would need to figure out some way to take into account who my friends are when creating a composite score for the business. I don't know how to do this in ES.
3. I could create a similar scheme, but instead of keeping score of my interactions with a business, the score would reflect my friends' interactions with the business. This takes away the need to model my social graph in ElasticSearch, but it does mean that any time a person interacts with a business, I would need to update all of their friends' scores. It would also mean that the list of user/score pairs for each business would be larger, since it would need to include anybody who has a friend who has interacted with the business.
4. The final solution I can think of is to keep track of every individual interaction that happens to a business, and add it to the business's document in ES. This doesn't seem realistic to me, since it combines the problems of the other solutions. But it's probably the most straightforward approach in terms of keeping the index up to date.

Thanks for your help!

asked May 21 '12 16:05

Borys

2 Answers

I'm voting for a modified #2.

Instead of storing each user/score pair inside of the business document itself, I would create a Parent/Child relationship. This lets you update the score of the child (the user scores) without having to reindex the entire business document (and all the other user scores).

Check out this page for a great tutorial (parent/children are covered about halfway down): http://www.spacevatican.org/2012/6/3/fun-with-elasticsearch-s-children-and-nested-documents/

Then you can use a has_child filter or top_children query to find only those businesses that your friends have scores for. There are a few caveats about ordering child documents, but they're covered by that tutorial, so make sure you read to the bottom.
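To make the has_child idea concrete, here is a minimal sketch that builds such a request body as a plain Python dict. The child type name `user_score` and the field `user_id` are hypothetical, and this uses the query form of has_child rather than the filter form mentioned above, so adjust it to whatever your ES version and mapping expect.

```python
def friends_business_query(friend_ids, child_type="user_score", user_field="user_id"):
    """Build a has_child request body that matches businesses having at
    least one child score document written by any of the given friends.

    child_type and user_field are assumed names, not part of any real mapping.
    """
    return {
        "query": {
            "has_child": {
                "type": child_type,
                # Match child documents whose user id is one of the friends.
                "query": {"terms": {user_field: list(friend_ids)}},
            }
        }
    }


if __name__ == "__main__":
    import json
    print(json.dumps(friends_business_query([1, 4, 23]), indent=2))
```

The resulting dict would be sent as the body of a search against the business index.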

Then I'd just perform a normal query for all "non-social" ranked searches.

Alternatively, you could lump everything together and add boosts to the matches that your friends have scored, so that everything ranks appropriately. It may just be easier to perform two queries and combine them yourself.
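If you go the two-query route, combining the results yourself could look roughly like this. It is only a sketch: it assumes each hit is a dict carrying the `_id` of the business (as in the hits of a search response), and the function name and `limit` parameter are made up for illustration.

```python
def merge_results(social_hits, regular_hits, limit=10):
    """Put friend-related hits first, then the remaining regular hits.

    A business that appears in both lists is kept only once, at its
    first (social) position.
    """
    merged = []
    seen = set()
    for hit in list(social_hits) + list(regular_hits):
        business_id = hit["_id"]
        if business_id in seen:
            continue  # already listed via the social query
        seen.add(business_id)
        merged.append(hit)
    return merged[:limit]
```

This keeps each query simple, at the cost of doing the final ranking in your application instead of in ES.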

answered Oct 20 '22 17:10

Zach

There's another set of solutions that has the upside of being extremely fast (i.e. taking advantage of what ES is best at), but that looks terrible to anyone who knows even the first thing about designing data storage/retrieval systems.

If your 'business' index is smaller than your 'user' index (e.g. 10,000 businesses, 1,000,000 users):

1. Create 2 indexes: User and Business.
2. The Business index should have an 'array' field that holds the IDs of every user who has ever "interacted" with it (e.g. "users: 1,4,23,26,127,8678").
3. The User index should have a nested array field with business IDs and reviews, check-ins, etc. in a nested object with meta information (e.g. "business_id:1233,rating: 7.5,checkins:21").
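As a rough illustration of the two document shapes above, here is a sketch that keeps both sides in sync when a user checks in. It works on in-memory dicts only; the field name `interactions` and the helper `record_checkin` are hypothetical, and writing the updated documents back to ES is left out.

```python
def record_checkin(business_doc, user_doc, user_id, business_id):
    """Update both documents (in memory) after one check-in."""
    # Business side: a flat array of user ids, each stored once.
    if user_id not in business_doc["users"]:
        business_doc["users"].append(user_id)

    # User side: one nested object per business with the meta information.
    for entry in user_doc["interactions"]:
        if entry["business_id"] == business_id:
            entry["checkins"] += 1
            return
    user_doc["interactions"].append(
        {"business_id": business_id, "rating": None, "checkins": 1}
    )


if __name__ == "__main__":
    business = {"name": "Example Cafe", "users": [1, 4]}
    user = {"name": "Example User", "interactions": []}
    record_checkin(business, user, user_id=23, business_id=1233)
    print(business)
    print(user)
```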

When you search for a business, run a quick query string query or filter against the Business index with the user's friend IDs (ORed together, of course). The tf-idf scoring should automatically rank the businesses that your friends have interacted with the most at the top. If you need more info, just hit the User index to get the metadata for each of your friends (rating, check-ins, etc.). This should be lightning fast and super efficient, because ES is absolutely fantastic at matching arrays as individual terms. That's what it's for, yo!
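One way to express the "OR of friend IDs" search is a bool query with one optional term clause per friend, sketched below as a plain Python dict. The `users` field comes from the example above; the `name` field, the function name and the `only_friends` switch are assumptions for illustration. Each matching friend clause adds to the score, so businesses matching more friends should tend to rank higher, though exact scoring depends on your ES version and similarity settings.

```python
def social_search_query(text, friend_ids, only_friends=False):
    """Build a search body: text match on the business, boosted by friends.

    With only_friends=True, businesses no friend has interacted with
    are filtered out instead of merely ranked lower.
    """
    # One optional clause per friend id, matched against the 'users' array.
    friend_clauses = [{"term": {"users": fid}} for fid in friend_ids]
    bool_query = {
        "must": [{"match": {"name": text}}],  # 'name' is an assumed field
        "should": friend_clauses,
    }
    if only_friends and friend_clauses:
        # Require at least one friend clause to match.
        bool_query["minimum_should_match"] = 1
    return {"query": {"bool": bool_query}}
```

For example, `social_search_query("pizza", [1, 4, 23])` ranks all pizza places, lifting the ones friends 1, 4 or 23 have interacted with.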

If your 'business' index is significantly larger than your 'user' index, reverse the pattern: put an indexed array of the business_ids a user has interacted with on the User index.
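For the reversed pattern, a sketch of the application-side step might look like this: after fetching your friends' user documents, count how many friends touched each business and use the ranked IDs to filter or boost the business search. The `business_ids` field follows the description above; the function name is hypothetical.

```python
from collections import Counter


def rank_businesses_by_friends(friend_docs):
    """Return business ids ordered by how many friends interacted with them."""
    counts = Counter()
    for doc in friend_docs:
        # set() so that one friend counts at most once per business.
        counts.update(set(doc.get("business_ids", [])))
    # Most friends first; ties broken by id to keep the order stable.
    return sorted(counts, key=lambda business_id: (-counts[business_id], business_id))
```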

answered Oct 20 '22 17:10

thoughtpunch
