Say I'm designing a tool that would save code snippets either in a PostgreSQL/MySQL database or on the file system. I want to search through these snippets. Using a search engine like Sphinx doesn't seem practical because we need exact text matches of code when searching code.
grep and ack have always worked great, but storing snippets in a database makes a large collection more manageable in certain ways. I'm wondering how the performance of running grep recursively over a tree of directories compares to running a query like SQL's LIKE or MySQL's REGEXP function over an equivalent number of records with TEXT blobs.
If you have 1M files to grep through, you will (as best I'm aware) go through each one with a regular expression.
For all intents and purposes, you're going to end up doing the same thing over table rows if you mass-query them using a LIKE operator or a regular expression.
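To make the "same thing over table rows" point concrete, here is a minimal sketch. It uses SQLite from Python's standard library as a stand-in for PostgreSQL/MySQL (an assumption made only so the example is self-contained), and the sample snippets and function names are hypothetical. An unanchored LIKE has no index to lean on, so the query plan reports a scan of the whole table, just as grep visits every file.

```python
import re
import sqlite3

# Hypothetical sample data standing in for snippet files / table rows.
SNIPPETS = {
    1: "def parse_config(path):\n    return json.load(open(path))",
    2: "SELECT id FROM users WHERE name LIKE '%bob%';",
    3: "for (int i = 0; i < n; i++) { total += parse_config(i); }",
}


def grep_like_scan(pattern):
    """Test every snippet against a regex, the way grep tests every file."""
    rx = re.compile(pattern)
    return sorted(sid for sid, body in SNIPPETS.items() if rx.search(body))


def sql_like_scan(needle):
    """Run an unanchored LIKE over every row and return (ids, query plan)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE snippets (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany("INSERT INTO snippets VALUES (?, ?)", SNIPPETS.items())
    sql = "SELECT id FROM snippets WHERE body LIKE ? ORDER BY id"
    param = ("%" + needle + "%",)
    ids = [row[0] for row in conn.execute(sql, param)]
    # The human-readable plan step is the last column of each plan row.
    plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, param)]
    conn.close()
    return ids, plan


if __name__ == "__main__":
    ids, plan = sql_like_scan("config")
    print(ids, grep_like_scan("config"))
    print(plan)  # wording varies by SQLite version, but it is a table scan
```

Both functions return the same ids because both do the same work: one pass over every record.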
My own experience with grep, however, is that I seldom look for something that doesn't contain at least one full word, so you might be able to take advantage of a database to reduce the set you're searching.
MySQL has native full text search features, but I'd recommend against them because (before MySQL 5.6) they mean you're not using InnoDB.
Postgres has full text search too; you can read about it here:
http://www.postgresql.org/docs/current/static/textsearch.html
After creating an index on a tsvector column, you can then do your "grep" in two steps, one to immediately find rows that might vaguely qualify, followed by another on your true criteria:
select * from docs where tsvcol @@ :tsquery and (regexp at will);
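When the word filter is selective, that will be significantly faster than anything grep can do, because the index narrows the candidate rows before any regular expression runs.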
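The two-step idea can be sketched in plain Python, with an in-memory inverted index playing the role of the tsvector index. This is only an analogue under stated assumptions: the tokenizer, the sample documents, and the function names are hypothetical, and PostgreSQL's real text search parser normalizes words differently.

```python
import re
from collections import defaultdict

# Assumed tokenizer: identifier-like words. PostgreSQL's parser differs.
WORD_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_index(docs):
    """Map each lowercased word to the set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, body in docs.items():
        for word in WORD_RE.findall(body):
            index[word.lower()].add(doc_id)
    return index


def two_step_search(docs, index, word, pattern):
    """Step 1: index lookup for candidates. Step 2: regex on candidates only."""
    candidates = index.get(word.lower(), set())
    rx = re.compile(pattern)
    return sorted(d for d in candidates if rx.search(docs[d]))


# Hypothetical sample documents.
DOCS = {
    1: "def parse_config(path):\n    return load(path)",
    2: "config = parse_config('app.ini')",
    3: "print('hello world')",
}
INDEX = build_index(DOCS)

if __name__ == "__main__":
    # Docs 1 and 2 mention parse_config; only doc 1 defines it.
    print(two_step_search(DOCS, INDEX, "parse_config", r"def\s+parse_config\("))
```

The regular expression never touches document 3, which is the whole point: the cheaper index lookup shrinks the set before the expensive check runs.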
I can't compare them, but both will take a long time. My guess is grep will be faster.
But MySQL supports full text indexing and searching, which will be faster than grep -- I'm guessing again.
Also, I don't understand what the problem with Sphinx or Lucene is. Anyway, here's a benchmark for MySQL, Sphinx and Lucene