Since v3.4 collations are available for find operations, especially as matches for diacritic characters are concerned. While a find query with a definite value ($eq opeartor or corresponding construct) will match letters and correspondent diacritics, the same is not true if a $regex is used in order to achieve a match on a partial search string (a 'LIKE').
Is there a to make the $regex query use the collation the same way than the $eq query?
consider example collection testcoll:
{ "_id" : ObjectId("586b7a0163aff45945462bea"), "city" : "Antwerpen" },
{ "_id" : ObjectId("586b7a0663aff45945462beb"), "city" : "Antwërpen" }
this query will find both records
db.testcoll.find({city: 'antwerpen'}).collation({"locale" : "en_US", "strength" : 1});
the same query using a regex will not (finds the record with 'Antwerpen' only)
db.testcoll.find({city: /antwe/i}).collation({"locale" : "en_US", "strength" : 1});
Documentation
Case insensitive regular expression queries generally cannot use indexes effectively. The $regex implementation is not collation-aware and is unable to utilize case-insensitive indexes.
I face this same issue today and I searched the Internet like crazy trying to find a solution. Didn't find any. So I came up with my on solution, a little frankenstein that worked for me.
I created a function which removes all the special characters from a string and then replaces all the characters that could be special to the equivalent regexp that could be special. In the end I just add a "i"
option to cover the capitalize strings in my DB.
export const convertStringToRegexp = (text: string) => {
let regexp = '';
const textNormalized = text
.normalize('NFD')
.replace(/[\u0300-\u036f]/g, '') // remove all accents
.replace(/[|\\{}()[\]^$+*?.]/g, '\\$&') // remove all regexp reserved char
.toLowerCase();
regexp = textNormalized
.replace(/a/g, '[a,á,à,ä,â,ã]')
.replace(/e/g, '[e,é,ë,è,ê]')
.replace(/i/g, '[i,í,ï,ì,î]')
.replace(/o/g, '[o,ó,ö,ò,õ,ô]')
.replace(/u/g, '[u,ü,ú,ù,û]')
.replace(/c/g, '[c,ç]')
.replace(/n/g, '[n,ñ]')
.replace(/[ªº°]/g, '[ªº°]');
return new RegExp(regexp, 'i'); // "i" -> ignore case
};
And in my find()
method, I just use this function with $regex
option, like this:
db.testcoll.find({city: {$regex: convertStringToRegexp('twerp')} })
/*
Output:
[
{ "_id" : ObjectId("586b7a0163aff45945462bea"), "city" : "Antwerpen" },
{ "_id" : ObjectId("586b7a0663aff45945462beb"), "city" : "Antwërpen" }
]
*/
I also create a .spec.ts
file (using Chai) to test this function. Of course you could adapt to Jest.
describe('ConvertStringToRegexp', () => {
it('should convert all "a" to regexp', () => {
expect(convertStringToRegexp('TAÁdaáh!')).to.deep.equal(
/t[a,á,à,ä,â,ã][a,á,à,ä,â,ã]d[a,á,à,ä,â,ã][a,á,à,ä,â,ã]h!/i
);
});
it('should convert all "e" to regexp', () => {
expect(convertStringToRegexp('MEÉeéh!')).to.deep.equal(
/m[e,é,ë,è,ê][e,é,ë,è,ê][e,é,ë,è,ê][e,é,ë,è,ê]h!/i
);
});
it('should convert all "i" to regexp', () => {
expect(convertStringToRegexp('VÍIiishí!')).to.deep.equal(
/v[i,í,ï,ì,î][i,í,ï,ì,î][i,í,ï,ì,î][i,í,ï,ì,î]sh[i,í,ï,ì,î]!/i
);
});
it('should convert all "o" to regexp', () => {
expect(convertStringToRegexp('ÓOoóhhhh!!!!')).to.deep.equal(
/[o,ó,ö,ò,õ,ô][o,ó,ö,ò,õ,ô][o,ó,ö,ò,õ,ô][o,ó,ö,ò,õ,ô]hhhh!!!!/i
);
});
it('should convert all "u" to regexp', () => {
expect(convertStringToRegexp('ÚUhuuúll!')).to.deep.equal(
/[u,ü,ú,ù,û][u,ü,ú,ù,û]h[u,ü,ú,ù,û][u,ü,ú,ù,û][u,ü,ú,ù,û]ll!/i
);
});
it('should convert all "c" to regexp', () => {
expect(convertStringToRegexp('Cacacacaca')).to.deep.equal(
/[c,ç][a,á,à,ä,â,ã][c,ç][a,á,à,ä,â,ã][c,ç][a,á,à,ä,â,ã][c,ç][a,á,à,ä,â,ã][c,ç][a,á,à,ä,â,ã]/i
);
});
it('should remove all special characters', () => {
expect(
convertStringToRegexp('hello 123 °º¶§∞¢£™·ª•*!@#$%^WORLD?.')
).to.deep.equal(
/h[e,é,ë,è,ê]ll[o,ó,ö,ò,õ,ô] 123 [ªº°][ªº°]¶§∞¢£™·[ªº°]•\*!@#\$%\^w[o,ó,ö,ò,õ,ô]rld\?\./i
);
});
it('should accept all regexp reserved characters', () => {
expect(
convertStringToRegexp('Olá [-[]{}()*+?.,\\o/^$|#s] Mundo! ')
).to.deep.equal(
/* eslint-disable @typescript-eslint/no-explicit-any */
/[o,ó,ö,ò,õ,ô]l[a,á,à,ä,â,ã] \[-\[\]\{\}\(\)\*\+\?\.,\\[o,ó,ö,ò,õ,ô]\/\^\$\|#s\] m[u,ü,ú,ù,û][n,ñ]d[o,ó,ö,ò,õ,ô]! /i
);
});
});
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With