
Find all rows using some Unicode range (such as Cyrillic characters) with PostgreSQL?

How do I find all rows of a PostgreSQL table that contain characters in some Unicode range, such as Cyrillic characters?

Asked by Henrik N on Nov 05 '13 at 13:11




2 Answers

Figured it out! For Cyrillic:

SELECT * FROM "items" WHERE (title SIMILAR TO '%[\u0410-\u044f]%')

I got the range from http://symbolcodes.tlt.psu.edu/bylanguage/cyrillicchart.html. The characters run from А (hex entity &#x0410;) to я (hex entity &#x044F;), and those hex values are the numbers used in the pattern above.
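The range above covers only the basic Russian alphabet, so letters such as Ё (U+0401) and ё (U+0451) fall outside it. As a rough sketch, the same idea can be written with the POSIX regex operator ~ and widened to the whole Cyrillic block (U+0400 to U+04FF). This assumes a UTF-8 database, where \uXXXX escapes in regular expressions correspond to Unicode code points, and standard_conforming_strings = on, so the backslashes reach the regex engine unchanged. The table and column names are the ones from the query above.

```sql
-- Sketch only: assumes a UTF-8 database and standard_conforming_strings = on.
-- "items" and "title" are the table and column from the answer above.
SELECT *
FROM items
WHERE title ~ '[\u0400-\u04ff]';  -- any character in the Cyrillic block U+0400..U+04FF
```

With ~ no surrounding % wildcards are needed, because the regex matches anywhere in the string unless it is anchored.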

Answered by Henrik N on Oct 14 '22 at 21:10


If you install the pgpcre extension, you can use this expression:

SELECT * FROM items WHERE title ~ pcre '\p{Cyrillic}';
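
For context, \p{Cyrillic} is a PCRE Unicode script property, so it matches any character in the Cyrillic script rather than a hand-picked code point range. A possible end-to-end sketch follows. It assumes the pgpcre extension has already been built and installed on the server, and that it is registered under the name pgpcre (an assumption based on the extension's name, not something stated above).

```sql
-- Sketch only: assumes pgpcre is installed on the server and is
-- registered under the extension name "pgpcre" (an assumption).
CREATE EXTENSION IF NOT EXISTS pgpcre;

-- Match rows whose title contains at least one Cyrillic-script character.
SELECT *
FROM items
WHERE title ~ pcre '\p{Cyrillic}';
```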

Answered by Peter Eisentraut on Oct 14 '22 at 20:10