SELECT COUNT(*), name, number
FROM tbl
GROUP BY name, number
HAVING COUNT(*) > 1
It sometimes fails to find duplicates between lower case and upper case.
E.g.: sunny
and Sunny
don't show up as a duplicates.
So how to find all possible duplicates in PostgreSQL for two columns.
lower()
/ upper()
Use one of these to fold characters to either lower or upper case. Special characters are not affected:
SELECT count(*), lower(name), number
FROM tbl
GROUP BY lower(name), number
HAVING count(*) > 1;
unaccent()
If you actually want to ignore diacritic signs, like your comments imply, install the additional module unaccent
, which provides a text search dictionary that removes accents and also the general purpose function unaccent()
:
CREATE EXTENSION unaccent;
Makes it very simple:
SELECT lower(unaccent('Büßercafé'));
Result:
busercafe
This doesn't strip non-letters. Add regexp_replace()
like @Craig mentioned for that:
SELECT lower(unaccent(regexp_replace('$s^o&f!t Büßercafé', '\W', '', 'g') ));
Result:
softbusercafe
You can even build a functional index on top of that:
PostgreSQL by default is case sensitive. You can force it to be case-insensitive during searches by converting all values to a single case:
SELECT COUNT(*), lower(name), number FROM TABLE
GROUP BY lower(name), number HAVING COUNT(*) > 1
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With