Reverse Search Best Practices?

I'm building an app that needs reverse searches. By this I mean that users of the app will enter search parameters and save them; then, whenever a new object is entered into the system, if it matches a user's saved search parameters, that user is sent a notification, etc.

I am having a hard time finding solutions for this type of problem.

I am using Django, and I am thinking of building the searches with Q objects and pickling them, as outlined here: http://www.djangozen.com/blog/the-power-of-q

The way I see it, when a new object is entered into the database, I will have to load every single saved query from the db and run each one against this single new object to see whether it matches... This doesn't seem ideal - has anyone tackled such a problem before?
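For concreteness, here's a minimal sketch of what I have in mind (Item, SavedSearch, and the field names are hypothetical):

    import pickle
    from django.db.models import Q

    from myapp.models import Item, SavedSearch  # hypothetical models

    # Saving a search: Q objects can be pickled and stored in a binary/text field.
    def save_search(owner, q):
        SavedSearch.objects.create(owner=owner, pickled_query=pickle.dumps(q))

    # e.g. save_search(request.user, Q(title__icontains="django") & Q(price__lt=100))

    # Matching one new object: pin each saved query to the new row's pk,
    # so the database only ever examines that single record.
    def matching_searches(new_item):
        for saved in SavedSearch.objects.all():
            q = pickle.loads(saved.pickled_query)
            if Item.objects.filter(q, pk=new_item.pk).exists():
                yield saved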

edub asked Mar 12 '10 08:03


2 Answers

At the database level, many databases offer triggers, which fire automatically when rows are inserted or updated.

Another approach is to have timed jobs that periodically fetch all items from the database whose last-modified date falls after the last run; these then get filtered and alerts issued. You can perhaps push some of the filtering into the database query itself. However, this is a bit trickier if notifications also need to be sent when items are deleted.
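A rough sketch of such a job, assuming a hypothetical Item model with a 'modified' timestamp and a notification hook:

    from datetime import datetime

    from myapp.models import Item  # hypothetical model with a 'modified' field

    last_run = datetime.utcnow()  # in practice, persist this between runs

    def poll_and_notify():
        global last_run
        cutoff, last_run = last_run, datetime.utcnow()
        for item in Item.objects.filter(modified__gte=cutoff):
            for saved in matching_searches(item):  # helper sketched in the question
                notify(saved.owner, item)          # hypothetical notification hook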

You can also put triggers manually into the code that submits data to the database, which is perhaps more flexible and certainly doesn't rely on specific features of the database.
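In Django, the natural place for such an application-level trigger is a post_save signal handler (again, Item and notify() are hypothetical):

    from django.db.models.signals import post_save
    from django.dispatch import receiver

    from myapp.models import Item  # hypothetical model

    @receiver(post_save, sender=Item)
    def on_new_item(sender, instance, created, **kwargs):
        if created:
            for saved in matching_searches(instance):  # helper sketched in the question
                notify(saved.owner, instance)          # hypothetical notification hook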

A nice way for the triggers and the alerts to communicate is through message queues - brokers such as RabbitMQ and other AMQP implementations will scale with your site.
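A sketch of that hand-off with the pika AMQP client: the save-side trigger publishes only the new object's id, and a separate worker consumes ids and evaluates the saved searches (queue name and payload shape are illustrative):

    import json
    import pika

    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = conn.channel()
    channel.queue_declare(queue="new_items", durable=True)

    # Producer: called from the save trigger.
    def publish_new_item(item_id):
        channel.basic_publish(exchange="", routing_key="new_items",
                              body=json.dumps({"item_id": item_id}))

    # Consumer: a separate worker process runs the saved searches.
    def on_message(ch, method, properties, body):
        item_id = json.loads(body)["item_id"]
        # ... load the item, run the saved searches, send alerts ...
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(queue="new_items", on_message_callback=on_message)
    channel.start_consuming()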

Will answered Sep 28 '22 06:09


The amount of effort you should put into solving this problem is directly related to the number of stored queries you are dealing with.

Over 20 years ago we handled stored queries by treating them as minidocs and indexing them on all of their must-have and may-have terms. A new doc's term list was used as a sort of query against this "database of queries", which built a list of possibly interesting searches to run; only those searches were then run against the new docs. This may sound convoluted, but when there are more than a few stored queries (say anywhere from 10,000 to 1,000,000 or more) and you have a complex query language that supports a hybrid of Boolean and similarity-based searching, it substantially reduced the number we had to execute as full-on queries -- often no more than 10 or 15 queries.
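The core of that "database of queries" is just an inverted index over the queries themselves; in illustrative Python it looks something like this:

    from collections import defaultdict

    # Inverted index: term -> ids of the stored queries that use that term.
    queries_by_term = defaultdict(set)

    def index_query(query_id, must_terms, may_terms):
        for term in set(must_terms) | set(may_terms):
            queries_by_term[term].add(query_id)

    def candidate_queries(doc_terms):
        # Treat the new doc's term list as a query against the query index.
        candidates = set()
        for term in set(doc_terms):
            candidates |= queries_by_term[term]
        return candidates  # only these get executed as full-on queries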

One thing that helped was that we were in control of the horizontal and the vertical of the whole thing. We used our query parser to build a parse tree and that was used to build the list of must/may have terms we indexed the query under. We warned the customer away from using certain types of wildcards in the stored queries because it could cause an explosion in the number of queries selected.

Update for comment:

Short answer: I don't know for sure.

Longer answer: We were dealing with a custom-built text search engine, and part of its query syntax allowed slicing the doc collection in certain ways very efficiently, with special emphasis on date_added. We played a lot of games because we were ingesting 4-10 million new docs a day and running them against up to 1,000,000+ stored queries on DEC Alphas with 64MB of main memory. (This was in the late 80's/early 90's.)

I'm guessing that filtering on something equivalent to date_added could be used in combination with the date of the last time you ran your queries, or maybe the highest id at last query run time. If you need to re-run the queries against a modified record, you could use its id as part of the query.
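For example, keeping a high-water mark of the last id processed (names are illustrative, reusing the hypothetical Item model from above):

    last_seen_id = 0  # in practice, persist this between runs

    def items_to_check():
        global last_seen_id
        batch = list(Item.objects.filter(pk__gt=last_seen_id).order_by("pk"))
        if batch:
            last_seen_id = batch[-1].pk
        return batch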

For me to get any more specific, you're going to have to get a lot more specific about exactly what problem you are trying to solve and the scale of the solution you are trying to accomplish.

Peter Rowell answered Sep 28 '22 07:09