SQL database representing different types of events (concerts, football matches, charity collections etc.), where each contain event-related data (concert - artist name, match - host/visitor team). All of these events inherit from one, general table event
, which contains data related to all of them (name, description, location, start/end date).
Inheritance is implemented using table-per-subclass model known from Hibernate or Doctrine. The database also stores tables artists
(id
, name
, birth_date
) and football_teams
(id
, name
, country
, coach_name
) used in event_concerts
and event_football_matches
tables (through FKs).
Create a search engine that given some criteria ({name: "manchester", startDate: "01.01.2012 - 01.02.2012"}
or {location: "london", description: "artists +metallica -bieber"}
) will return all events that meet the criteria, as well as results from artists
/football_teams
tables.
Some properties of those events contain large pieces of text, that should be searched through in fulltext-search manner.
Given following search criteria:
{ location: "london", startDate: "05.11.2012 - 07.11.2012" }
Search engine should return:
As you can see, startDate (event-related property) is considered only in case of events.
Search engine has to scan lots of tables, that's why I believe I should use dedicated software (Sphinx, Lucene, ...?) and create separate index just for the searching.
Could anyone suggest some solution for building such an index? What software could I use as a base for that search engine?
Just to clarify: none of the properties is required. Some of them contain dates which will be searched using exact-match, some of them contain short text (like a location) that also will be searched using exact-match. But some of them contain long pieces of text, and that needs to be searched in full-text manner.
I see three approaches.
Migrate to Couchdb. To make the location
+ start time
search, you build a view using [location, event_start_time]
as key. During search time, you query by using ?startkey=["london,"05.11.2012"]&endkey=["london", "07.11.2012"]
. To enable full text search on description
, you build a specific view like this.
Build ElasticSearch index. The search in your example is just a structured query using two data fields, in which location
is a must, and event start time
is a range. Full-text-search in elasticsearch is more powerful, you can define specific analyzer to process text in your data. Elasticsearch also supports search with geospatial data.
Use Amazon CloudSearch. There are some known limits. I list some I did not like:
For some comparisons between search implementations:
You can use a search table with a tsearch2 column that will allow you to do the fulltext search and every column that you need to do a query against : name and location for instance.
If your tables inherit all from the same model it will be easier to reference them in the search table. If not, you can build a content type table to put in place generic foreign keys which will be used in your search table to reference the result rows which will allow eventually to retrieve the results of the query. The table is populated using triggers. This might be the most efficient solution especially if you know SQL (maybe PL/SQL) well.
If it's not the case, I recommend you use Lucene to build the index you need in Java, and query it as you wish.
I'm assuming you are not using a databse with native full text implementation so this is a poor man's solution but it's something to get you going and delivery something before you go to a full blown solution such as Lucene or any other search services suggested that might need a longer learning time.
You could simply have a scheduled process running from time to time to get all the data from all these different tables and dump them into an "index" table where you could perform the search without all the joins.
You could have a table such as:
eventId - int
keywords - nvarchar(max)
location - nvarchar
start_date - datetime
end_date - datetime
This way you would have everything you need to perform the search in one place. For the search itself you could probably get a decent result with like operator in a query.
Not the best search platform a man can dream of but something that you can get working in a few hours.
You have a lot of different requirements, and no software will be able to fulfill all of them.
Let us talk about every one of them.
You need a system that manages to construct a well-formatted query based on user input. This system manages not only to do +metallica -bieber -> text CONTAIN metallica AND text NOT CONTAIN bieber
Your example is not entirely accurate. How would the system manage to find "Arsenal", based on 'location: London' ? How does the system manage to find "Neil Christian" based on 'location: London' ?
Solution: I am pretty sure systems like Lucene support rich syntax on search queries. However, it will be the framework syntax, not your own. Please note that specifying your own syntax is difficult; you will need to construct a lexer, parser and your own translation into the query tree of your search framework.
You need to search multiple entities at once. This is no problem for search frameworks.
However, you also want to define the search structure and link this back to your entities. This will be a little bit harder, but not impossible to do.
This is also pretty simple to do.
Advanced search frameworks should have support for semantic search (e.g. London is related to Queen, since they occur frequently together in some texts). This may be dependent on your training dataset though.
The thing which is hard to do is not the inheritance, nor is it the full text search. You should however really define how the data is matched. Search systems are not magic hocus pocus, although Google may make it look that way. Search depends on real mathematics; you should specify these mathematical rules on your syntax before going any further on this.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With