
PostgreSQL accent + case insensitive search

I'm looking for a way to support case-insensitive and accent-insensitive search with good performance. Until now we had no issues with this on MS SQL Server; on Oracle we had to use Oracle Text; and now we need it on PostgreSQL.

I've found this post about it, but we need to combine it with case insensitivity. We also need to use indexes, otherwise performance could suffer. Does anyone have real-world experience with the best approach for large databases?

Robert asked Feb 20 '15 11:02


1 Answer

If you need to "combine with case insensitive", there are a number of options, depending on your exact requirements.

Probably the simplest option: make the expression index case insensitive.

Building on the function f_unaccent() laid out in the referenced answer:

  • Does PostgreSQL support "accent insensitive" collations?

CREATE INDEX users_lower_unaccent_name_idx ON users(lower(f_unaccent(name)));

Then:

SELECT *
FROM   users
WHERE  lower(f_unaccent(name)) = lower(f_unaccent('João'));
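
For context: the index above only works if f_unaccent() is declared IMMUTABLE. The unaccent() function shipped with the unaccent module is only STABLE, so it cannot be used in an index expression directly. What follows is a minimal sketch of such a wrapper, assuming the unaccent extension is installed in schema public; the exact definition in the referenced answer may differ.

```sql
-- Assumption: the extension lives in schema "public".
CREATE EXTENSION IF NOT EXISTS unaccent SCHEMA public;

-- IMMUTABLE wrapper around the STABLE unaccent() function.
-- Schema-qualifying both the function and the dictionary keeps the
-- result independent of the caller's search_path.
CREATE OR REPLACE FUNCTION public.f_unaccent(text)
  RETURNS text
  LANGUAGE sql IMMUTABLE STRICT AS
$func$
SELECT public.unaccent('public.unaccent', $1)
$func$;
```

Note that if the unaccent dictionary is changed later, indexes built on this function would need to be rebuilt, since PostgreSQL trusts the IMMUTABLE declaration.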

Or you could build the lower() into the function f_unaccent(), to derive something like f_lower_unaccent().
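
A possible shape for such a combined function is sketched here. The name f_lower_unaccent() comes from the sentence above, while the index name is made up for illustration. The sketch assumes that public.f_unaccent(text) already exists as an IMMUTABLE function, as described in the referenced answer.

```sql
-- Assumption: public.f_unaccent(text) exists and is IMMUTABLE.
CREATE OR REPLACE FUNCTION public.f_lower_unaccent(text)
  RETURNS text
  LANGUAGE sql IMMUTABLE STRICT AS
$func$
SELECT lower(public.f_unaccent($1))
$func$;

-- Expression index on the combined function (hypothetical index name).
CREATE INDEX users_f_lower_unaccent_name_idx
ON users (public.f_lower_unaccent(name));

-- The query must use the same expression as the index to be able to use it.
SELECT *
FROM   users
WHERE  public.f_lower_unaccent(name) = public.f_lower_unaccent('João');
```

The benefit is mostly readability: queries carry one function call instead of two nested ones, which makes it harder to write a predicate that accidentally does not match the index expression.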

Or (especially if you need fuzzy pattern matching anyway) you can use a trigram index, provided by the additional module pg_trgm, building on the above function. It also supports ILIKE. Details:

  • LOWER LIKE vs iLIKE
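
As a rough illustration of the trigram variant, here is a sketch assuming f_unaccent() exists as an IMMUTABLE function and the users table has a text column name. The index name is made up for the example.

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- GIN trigram index on the unaccented name (hypothetical index name).
CREATE INDEX users_unaccent_name_trgm_idx
ON users USING gin (f_unaccent(name) gin_trgm_ops);

-- ILIKE handles case, f_unaccent() handles accents.
-- Caveat: % and _ inside the search term act as wildcards unless escaped.
SELECT *
FROM   users
WHERE  f_unaccent(name) ILIKE ('%' || f_unaccent('joão') || '%');
```

Since ILIKE already ignores case, lower() is not needed in the index expression here. Very short search terms (fewer than three characters) generally benefit little from a trigram index.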

I added a note to the referenced answer.

Or you could use the additional module citext (but I would rather avoid it):

  • Deferrable, case-insensitive unique constraint
Erwin Brandstetter answered Oct 13 '22 08:10