
Full text search for irregular rapper names with Solr

I'm implementing full text search functionality on my rap website, and I'm running into some issues with rapper and song names.

For example, someone might want to search for the rapper "Cam'ron" using the query "camron" (leaving out the mid-word apostrophe). Likewise, someone might search for the song "3 Peat" using the query "3peat".
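
To make the mismatch concrete, here is a minimal Python sketch. It assumes a hypothetical analyzer that only lowercases text and splits it on whitespace, which is not necessarily the chain used on this site. It also treats "the indexed text and the query share a term" as a crude stand-in for a match. The names `naive_analyze` and `shares_a_term` are made up for illustration.

```python
def naive_analyze(text: str) -> set[str]:
    """Hypothetical baseline analyzer: lowercase, then split on whitespace."""
    return set(text.lower().split())


def shares_a_term(indexed: str, query: str) -> bool:
    """Crude stand-in for matching: do both texts produce at least one common term?"""
    return bool(naive_analyze(indexed) & naive_analyze(query))


for indexed, query in [("Cam'ron", "camron"), ("3 Peat", "3peat")]:
    print(f"{indexed!r} -> {sorted(naive_analyze(indexed))}, "
          f"{query!r} -> {sorted(naive_analyze(query))}, "
          f"match: {shares_a_term(indexed, query)}")
```

Neither pair shares a term. The apostrophe stays inside cam'ron, and 3peat stays a single token that never equals 3 or peat.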

"The Notorious B.I.G." is a bit of a weird case: "The Notorious BIG" and "The Notorious B.I.G." both work (I guess because the solr.StandardFilterFactory removes dots from acronyms?), but "The Notorious B.I.G" (i.e., minus the trailing dot) doesn't.

Ideally all reasonable variations of these names should work. I'm guessing the answer has something to do with the solr.WordDelimiterFilterFactory, but I'm not sure.

Also, I'm using Sunspot with Rails if that's relevant.

Tom Lehman asked May 24 '10 05:05


1 Answer

Yes, you're right: you need to configure WordDelimiterFilterFactory properly. Try enabling all of its options, and don't forget preserveOriginal, which also keeps the original term alongside the generated ones.

generateWordParts - turns B.I.G. into the terms B, I, G

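generateNumberParts - keeps the number parts when a token is split, so (together with generateWordParts and splitOnNumerics, which is on by default) 3Peat becomes the terms 3 and Peat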

catenateWords - joins runs of word parts, so B.I.G. also produces BIG

catenateNumbers - joins runs of number parts, so Rapper 802.11 produces Rapper 80211

catenateAll - joins all parts, so Rapper-802.11 produces Rapper80211

splitOnCaseChange - splits on lowercase-to-uppercase transitions, so GanGsTa becomes Gan, Gs, Ta

preserveOriginal - also keeps the original token, so Rapper-802.11RuuLlZ is indexed as-is in addition to its parts.
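
To see why these options make the variants line up, the behaviour described above can be approximated with a small Python model. This is a simplified sketch, not Solr's implementation. It assumes a whitespace tokenizer before the filter and a lowercase filter after it, always splits on letter/digit transitions (Solr's splitOnNumerics, on by default), ignores options such as stemEnglishPossessive, and says nothing about how the query parser combines terms. The function names are hypothetical.

```python
import itertools


def split_subwords(token: str, split_on_case_change: bool = True) -> list[str]:
    """Split on non-alphanumerics, letter/digit transitions and (optionally)
    lowercase-to-uppercase transitions."""
    parts: list[str] = []
    current = ""
    for ch in token:
        if not ch.isalnum():
            # Any non-alphanumeric character acts as a delimiter and is dropped.
            if current:
                parts.append(current)
            current = ""
            continue
        if current:
            prev = current[-1]
            digit_boundary = prev.isdigit() != ch.isdigit()
            case_boundary = split_on_case_change and prev.islower() and ch.isupper()
            if digit_boundary or case_boundary:
                parts.append(current)
                current = ""
        current += ch
    if current:
        parts.append(current)
    return parts


def word_delimiter(token: str, generate_word_parts: bool = True,
                   generate_number_parts: bool = True, catenate_words: bool = True,
                   catenate_numbers: bool = True, catenate_all: bool = True,
                   split_on_case_change: bool = True,
                   preserve_original: bool = True) -> list[str]:
    parts = split_subwords(token, split_on_case_change)
    out = [token] if preserve_original else []
    for part in parts:
        wanted = generate_number_parts if part.isdigit() else generate_word_parts
        if wanted:
            out.append(part)
    # catenateWords / catenateNumbers join maximal runs of adjacent parts of one kind.
    for is_number, run in itertools.groupby(parts, key=str.isdigit):
        run = list(run)
        if len(run) > 1 and (catenate_numbers if is_number else catenate_words):
            out.append("".join(run))
    if catenate_all and len(parts) > 1:
        out.append("".join(parts))
    # Model a lowercase filter placed after this one; drop duplicates, keep order.
    return list(dict.fromkeys(t.lower() for t in out))


def analyze(text: str) -> set[str]:
    """Whitespace tokenizer followed by the simplified word delimiter model."""
    terms: set[str] = set()
    for token in text.split():
        terms.update(word_delimiter(token))
    return terms


print(word_delimiter("GanGsTa"))        # ['gangsta', 'gan', 'gs', 'ta']
print(word_delimiter("Rapper-802.11"))  # ['rapper-802.11', 'rapper', '802', '11', '80211', 'rapper80211']
print("camron" in analyze("Cam'ron"))   # True
print("big" in analyze("The Notorious B.I.G"))      # True
print(sorted(analyze("3peat") & analyze("3 Peat")))  # ['3', 'peat']
```

In this model, camron, B.I.G (without the trailing dot) and 3peat each produce terms that the indexed names also produce. The ordering matters: if lowercasing ran before the delimiter step, splitOnCaseChange would have no case transitions left to split on.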

Yurish answered Sep 27 '22 23:09