
MongoDB: match accented characters as their underlying character

Using MongoDB's "db.foo.find()" syntax, how can I make a query match both a letter and its accented variants?

For example, if I have a list of names in my database:
João
François
Jesús

How can I make a search for the strings "Joao", "Francois", or "Jesus" match the corresponding stored name?
I am hoping that I don't have to do a search like this every time:
db.names.find({name : /Fr[aã...][nñ][cç][all accented o characters][all accented i characters]s/ })

Josh asked Oct 10 '11 01:10


3 Answers

As of MongoDB 3.2, you can use the $text operator, which requires a text index on the field, and set $diacriticSensitive to false (which is also the default):

{
  $text:
    {
      $search: <string>,
      $language: <string>,
      $caseSensitive: <boolean>,
      $diacriticSensitive: <boolean>
    }
}

See more in the Mongo docs: https://docs.mongodb.com/manual/reference/operator/query/text/
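As a rough sketch of how this could look for the names in the question: the helper name buildTextFilter is hypothetical, and the collection name (names) and field name (name) are taken from the question. Note that $text matches whole words rather than substrings, so this probably suits full-name lookups better than partial matches.

```javascript
// Hypothetical helper (not part of MongoDB): builds a $text filter document
// that ignores both case and diacritics.
function buildTextFilter(term) {
  return {
    $text: {
      $search: term,
      $caseSensitive: false,
      $diacriticSensitive: false,
    },
  };
}

// Assumed usage in the mongo shell, with the collection from the question:
//   db.names.createIndex({ name: "text" })      // $text needs a text index
//   db.names.find(buildTextFilter("Francois"))
console.log(JSON.stringify(buildTextFilter("Francois")));
```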

Eliezer Steinbock answered Oct 12 '22 15:10


I suggest adding an indexed field, e.g. NameSearchable, that holds a simplified form of each string:

  • João -> JOAO
  • François -> FRANCOIS
  • Jesús -> JESUS
  • Jürgen -> JUERGEN

The same mapping that is used when inserting new items in the database can be used when searching. The original string with correct casing and accents will be preserved.

Most importantly, the query can make use of indexing. Case-insensitive queries and regex queries cannot use indexes (with the exception of rooted regexes, i.e. those anchored at the start of the string) and will grow prohibitively slow on large collections.

Oh, and since the simplified strings can be created from the original strings, it's not a problem to add this to existing collections.
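One possible way to build such a mapping is sketched below. It assumes Unicode decomposition (NFD) followed by stripping combining marks is good enough; the function name toSearchable and its overrides parameter are made up for illustration. Plain stripping turns Jürgen into JURGEN, so a language-specific rule like ü -> UE has to be supplied explicitly.

```javascript
// Hypothetical helper: maps a name to its simplified, searchable form.
// `overrides` is an optional Map of character -> replacement, applied
// before accents are stripped (e.g. German "ü" -> "ue").
function toSearchable(text, overrides = new Map()) {
  // Compose first so each accented letter is a single code point.
  const replaced = Array.from(text.normalize("NFC"), (ch) =>
    overrides.has(ch) ? overrides.get(ch) : ch
  ).join("");
  return replaced
    .normalize("NFD")                 // split letters from combining marks
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining marks
    .toUpperCase();
}

const german = new Map([["\u00fc", "ue"]]); // "ü" -> "ue"

console.log(toSearchable("João"));           // "JOAO"
console.log(toSearchable("François"));       // "FRANCOIS"
console.log(toSearchable("Jürgen", german)); // "JUERGEN"

// Assumed usage in the mongo shell (field name from the answer above):
//   db.names.createIndex({ NameSearchable: 1 })
//   db.names.find({ NameSearchable: toSearchable("Joao") })
```

Running the same function over the search term and over the stored names keeps both sides consistent, and the lookup becomes an exact match that a regular index can serve.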

mnemosyn answered Oct 12 '22 17:10


This blog post uses the approach you were trying, a regex with a character class for each accented letter: http://tech.rgou.net/en/php/pesquisas-nao-sensiveis-ao-caso-e-acento-no-mongodb-e-php/

As far as I know, this is the only solution for the latest MongoDB version.
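A minimal sketch of that regex idea, so the pattern does not have to be typed by hand each time. This is not the blog's code: the ACCENT_CLASSES table and the accentInsensitivePattern function are illustrative names, and the table only covers a handful of Latin letters.

```javascript
// Illustrative, deliberately incomplete table: base letter -> accepted variants.
const ACCENT_CLASSES = {
  a: "aàáâãäå",
  c: "cç",
  e: "eèéêë",
  i: "iìíîï",
  n: "nñ",
  o: "oòóôõö",
  u: "uùúûü",
};

// Expands every known base letter of `term` into a character class and
// escapes any other character that is special inside a regex.
function accentInsensitivePattern(term) {
  return Array.from(term.toLowerCase(), (ch) => {
    if (Object.prototype.hasOwnProperty.call(ACCENT_CLASSES, ch)) {
      return "[" + ACCENT_CLASSES[ch] + "]";
    }
    return ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  }).join("");
}

const pattern = new RegExp(accentInsensitivePattern("Francois"), "i");
console.log(pattern.source);

// Assumed usage in the mongo shell, with the collection from the question:
//   db.names.find({ name: pattern })
```

An unanchored, case-insensitive regex like this generally cannot use an index efficiently, so it is likely to be slow on large collections.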

Menda answered Oct 12 '22 16:10