Find possible duplicates in two columns ignoring case and special characters

Question

Query

SELECT COUNT(*), name, number
FROM   tbl
GROUP  BY name, number
HAVING COUNT(*) > 1

It sometimes fails to find duplicates between lower case and upper case.
E.g.: sunny and Sunny don't show up as a duplicates.
So how to find all possible duplicates in PostgreSQL for two columns.

Erwin Brandstetter · Accepted Answer

`lower()`/ `upper()`

Use one of these to fold characters to either lower or upper case. Special characters are not affected:

SELECT count(*), lower(name), number
FROM   tbl
GROUP  BY lower(name), number
HAVING count(*) > 1;

`unaccent()`

If you actually want to ignore diacritic signs, like your comments imply, install the additional module unaccent, which provides a text search dictionary that removes accents and also the general purpose function unaccent():

CREATE EXTENSION unaccent;

Makes it very simple:

SELECT lower(unaccent('Büßercafé'));

Result:

busercafe

This doesn't strip non-letters. Add regexp_replace() like @Craig mentioned for that:

SELECT lower(unaccent(regexp_replace('$s^o&f!t Büßercafé', '\W', '', 'g') ));

Result:

softbusercafe

You can even build a functional index on top of that:

Does PostgreSQL support "accent insensitive" collations?

Palpatim · Answer

PostgreSQL by default is case sensitive. You can force it to be case-insensitive during searches by converting all values to a single case:

SELECT COUNT(*), lower(name), number FROM TABLE 
GROUP BY lower(name), number HAVING COUNT(*) > 1

NOTE: This has not been tested in Postgres

Find possible duplicates in two columns ignoring case and special characters

Tags:

sql

pattern-matching

case-insensitive

duplicates

postgresql

Query

Ghostman

2 Answers

`lower()`/ `upper()`

`unaccent()`

Erwin Brandstetter

Palpatim

Recent Activity

Donate For Us

Find possible duplicates in two columns ignoring case and special characters

Tags:

sql

pattern-matching

case-insensitive

duplicates

postgresql

Query

Ghostman

2 Answers

lower()/ upper()

unaccent()

Erwin Brandstetter

Palpatim

Related questions

Recent Activity

Donate For Us

`lower()`/ `upper()`

`unaccent()`