
Postgresql and ActiveRecord where: Regex matching

I wrote this regular expression:

/(first|last)\s(last|first)/i

It matches the first three of these:

first last
Last first
First Last
First name

I am trying to get all the records where full_name matches the regex I wrote. I'm using PostgreSQL:

Person.where("full_name ILIKE ?", "%(first|last)%(last|first)%")

This is my attempt. I also tried SIMILAR TO and ~, with no luck.

asked Apr 02 '14 00:04 by Patrick




1 Answer

Your LIKE query:

full_name ilike '%(first|last)%(last|first)%'

won't work because LIKE doesn't understand regex grouping ((...)) or alternation (|). LIKE only understands _ for a single character (like . in a regex) and % for any sequence of zero or more characters (like .* in a regex).
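To make the LIKE rules concrete, here is a minimal sketch in plain Ruby that imitates them. The helper like_to_regexp is hypothetical (it is not part of ActiveRecord or PostgreSQL) and it ignores LIKE's escape character; it only shows that everything other than % and _ is treated as a literal character.

```ruby
# Hypothetical helper: approximates SQL LIKE / ILIKE with a Ruby Regexp.
# Only % and _ are special; parentheses and | are matched literally.
# LIKE's backslash escape is deliberately not handled in this sketch.
def like_to_regexp(pattern, case_insensitive: false)
  body = pattern.each_char.map do |ch|
    case ch
    when '%' then '.*'            # any sequence of zero or more characters
    when '_' then '.'             # exactly one character
    else Regexp.escape(ch)        # everything else is a literal
    end
  end.join

  options = Regexp::MULTILINE     # let . match newlines too
  options |= Regexp::IGNORECASE if case_insensitive
  Regexp.new("\\A#{body}\\z", options) # LIKE must match the whole value
end

ilike = like_to_regexp('%(first|last)%(last|first)%', case_insensitive: true)

puts ilike.match?('first last')                       # false
puts ilike.match?('x (first|last) y (last|first) z')  # true
```

The second call only matches because the value really contains the characters "(first|last)" and "(last|first)".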

If you hand that pattern to SIMILAR TO then you'll find 'first last' but none of the others due to case problems; however, this:

lower(full_name) similar to '%(first|last)%(last|first)%'

will take care of the case problems and find the same ones as your regex.
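The effect of lower() can be imitated outside the database with a rough sketch. The helper similar_to_regexp is hypothetical and covers only the SIMILAR TO features used here (%, _, grouping, alternation and the basic quantifiers); bracket expressions, {m,n} and escapes are left out.

```ruby
# Hypothetical helper: a simplified imitation of SQL SIMILAR TO.
# % and _ behave as in LIKE; | ( ) * + ? keep their regex meaning;
# every other character is a literal. The whole value must match.
def similar_to_regexp(pattern)
  passthrough = '|()*+?'
  body = pattern.each_char.map do |ch|
    if ch == '%' then '.*'
    elsif ch == '_' then '.'
    elsif passthrough.include?(ch) then ch
    else Regexp.escape(ch)
    end
  end.join
  Regexp.new("\\A(?:#{body})\\z", Regexp::MULTILINE)
end

names = ['first last', 'Last first', 'First Last', 'First name']
pat   = similar_to_regexp('%(first|last)%(last|first)%')

# Case-sensitive comparison, as in a bare SIMILAR TO
case_sensitive = names.select { |n| pat.match?(n) }
# Lowercasing the value first, as in lower(full_name) SIMILAR TO ...
lowered = names.select { |n| pat.match?(n.downcase) }

puts case_sensitive.inspect  # ["first last"]
puts lowered.inspect         # ["first last", "Last first", "First Last"]
```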

If you want to use a regex (which you probably do because LIKE is very limited and cumbersome and SIMILAR TO is, well, a strange product of the fevered minds of some SQL standards subcommittee) then you'll want to use the case-insensitive regex matching operator (~*) with your original regex. Note that I've written \s+ rather than \s, which accepts a run of whitespace instead of exactly one whitespace character:

full_name ~* '(first|last)\s+(last|first)'

That translates to this bit of AR:

Person.where('full_name ~* :pat', :pat => '(first|last)\s+(last|first)')
# or this
Person.where('full_name ~* ?', '(first|last)\s+(last|first)')
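As a quick sanity check that doesn't need a database, the same pattern string can be tried against the sample names from the question using Ruby's own Regexp. This is only an approximation: Ruby's regex engine is not PostgreSQL's, though the two should agree on a pattern this simple.

```ruby
# The pattern string exactly as it is passed to where, in single quotes.
pattern = '(first|last)\s+(last|first)'

# Regexp::IGNORECASE plays the role of the * in PostgreSQL's ~* operator.
regex = Regexp.new(pattern, Regexp::IGNORECASE)

names   = ['first last', 'Last first', 'First Last', 'First name']
matched = names.select { |name| regex.match?(name) }

puts matched.inspect  # ["first last", "Last first", "First Last"]
```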

There's a subtle change in my where calls that you need to take note of: I'm using single quotes for my Ruby strings, whereas you're using double quotes. Backslashes mean more in double quoted strings than they do in single quoted strings, so '\s' and "\s" are different things. Toss in a couple to_sql calls and you might see something interesting:

> puts Person.where('full_name ~* :pat', :pat => 'a\s+b').to_sql
SELECT "people".* FROM "people"  WHERE (full_name ~* 'a\s+b')

> puts Person.where('full_name ~* :pat', :pat => "a\s+b").to_sql
SELECT "people".* FROM "people"  WHERE (full_name ~* 'a +b')

That difference probably isn't causing you any problems but you need to be very careful with your strings when everyone wants to use the same escape character. Personally, I use single quoted strings unless I specifically need the extra escapes and string interpolation functionality of double quoted strings.
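The quoting difference can also be seen in plain Ruby, without ActiveRecord. A small sketch (the variable names are just for illustration):

```ruby
single = 'a\s+b'  # backslash is kept: a, \, s, +, b
double = "a\s+b"  # "\s" is Ruby's escape for a space: a, space, +, b

puts single.length        # 5
puts double.length        # 4
puts double == 'a +b'     # true
puts single == "a\\s+b"   # true, the backslash must be doubled in double quotes

# As regexes they behave differently: \s+ accepts any whitespace,
# while " +" accepts only literal spaces.
puts Regexp.new(single).match?("a \t b")  # true
puts Regexp.new(double).match?("a\tb")    # false
```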

Some demos: http://sqlfiddle.com/#!15/99a2c/6

answered Oct 02 '22 13:10 by mu is too short