
Database/datasource optimized for string matching?

I want to store a large number (thousands) of strings and match against them using wildcards.

For example, here is some sample stored content:

  • Folder1
  • Folder1/Folder2
  • Folder1/*
  • Folder1/Folder2/Folder3
  • Folder2/Folder*
  • */Folder4
  • */Fo*4

(Each line has additional data too, such as tags, but the matching is only against that key.)

Here are examples of strings I would like to match against the stored data:

  • Folder1
  • Folder1/Folder2/Folder3
  • Folder3

(* is the wildcard here; it could be a different character.)
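
To make the intended semantics concrete, here is a minimal sketch of the matching, not a definitive implementation. It assumes the wildcard matches any run of characters, including "/", which the question does not state. The names compile_wildcard, build_index and find_matches are hypothetical. Python is used only for illustration; the same idea should map to PHP's preg_quote() and preg_match().

```python
import re

def compile_wildcard(pattern, wildcard="*"):
    """Turn a stored wildcard pattern into a compiled regex.

    Assumption: the wildcard matches any run of characters, including "/".
    """
    # Escape the literal pieces, then join them with ".*" where the wildcard was.
    parts = [re.escape(part) for part in pattern.split(wildcard)]
    return re.compile(".*".join(parts), re.DOTALL)

def build_index(patterns):
    # Compile every stored pattern once, up front.
    return [(p, compile_wildcard(p)) for p in patterns]

def find_matches(index, query):
    # Linear scan: keep the stored patterns that match the whole query string.
    return [p for p, rx in index if rx.fullmatch(query)]

# The sample content from the question.
STORED = [
    "Folder1",
    "Folder1/Folder2",
    "Folder1/*",
    "Folder1/Folder2/Folder3",
    "Folder2/Folder*",
    "*/Folder4",
    "*/Fo*4",
]
INDEX = build_index(STORED)

print(find_matches(INDEX, "Folder1/Folder2/Folder3"))
```

Under that assumption, "Folder1/Folder2/Folder3" matches the stored keys "Folder1/*" and "Folder1/Folder2/Folder3", "Folder1" matches only itself, and "Folder3" matches nothing.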

I naively considered storing the strings in a MySQL table and using % wildcards with the LIKE operator, but a MySQL index only helps with the literal characters to the left of the first wildcard, and in my case the wildcard can be anywhere (e.g. %/Folder3).

So I'm looking for a fast solution that can be used from PHP. I am open to anything: a separate server, a PHP library using files with regexes, ...

Matthieu Napoli asked Feb 22 '13 12:02


1 Answer

Have you considered using MySQL's regular expression engine? Try something like this:

SELECT * FROM your_table WHERE your_query_string REGEXP pattern_column

This returns the rows whose stored regex (pattern_column) matches your query string. I expect it to perform better than running a query that pulls all of the data and doing the matching in PHP.
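
Note that this query expects pattern_column to hold regular expressions, not the raw wildcard keys, and MySQL's REGEXP matches anywhere in the string unless the pattern is anchored. The following is a rough sketch of converting a wildcard key into an anchored regex before storing it. The helper name wildcard_to_regex is hypothetical, and the escaping rules are an assumption based on POSIX-style syntax, so check them against your MySQL version.

```python
def wildcard_to_regex(pattern, wildcard="*"):
    """Convert a wildcard key into an anchored regex string.

    Assumptions: the wildcard matches any run of characters, and the
    target engine accepts POSIX-style bracket expressions such as [.].
    """
    # Metacharacters that are made literal by wrapping them in brackets.
    bracket_escaped = set(".+?()|{}$*[")
    out = []
    for ch in pattern:
        if ch == wildcard:
            out.append(".*")           # wildcard: any run of characters
        elif ch in bracket_escaped:
            out.append("[" + ch + "]") # literal metacharacter, e.g. [.]
        elif ch in "^\\":
            out.append("\\" + ch)      # these two need a backslash instead
        else:
            out.append(ch)
    # Anchor both ends so the regex must match the whole query string.
    return "^" + "".join(out) + "$"

print(wildcard_to_regex("Folder1/*"))  # ^Folder1/.*$
print(wildcard_to_regex("*/Fo*4"))     # ^.*/Fo.*4$
```

Passing the result as a bound parameter rather than building it into the SQL text avoids having to double backslashes inside a string literal.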

More info here: http://dev.mysql.com/doc/refman/5.1/en/regexp.html

BrickWall10 answered Nov 15 '22 09:11
