 

Postgres regular expressions and regexp_split_to_array

In postgresql, I need to extract the first two words in the value for a given column. So if the value is "hello world moon and stars" or "hello world moon" or even just "hello world", I need "hello world".

I was hoping to use regexp_split_to_array, but it seems I can't both call it and access the elements it returns in the same query?

Do I need to create a function for what I'm trying to do?

asked Mar 20 '11 22:03 by scriptThis


People also ask

How to split a string along its regex matches in PostgreSQL?

PostgreSQL 8.3 and later have two new functions to split a string along its regex matches. regexp_split_to_table(subject, pattern[, flags]) returns the split string as a new table. regexp_split_to_array(subject, pattern[, flags]) returns the split string as an array of text. If the regex finds no matches, regexp_split_to_array returns a one-element array containing the whole subject string, and regexp_split_to_table returns it as a single row.
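A quick sketch of both functions on the same input, including the no-match case just described. The literal strings are only illustrations; the E'' literals follow the quoting style used in the answers below.

```sql
-- Split on runs of whitespace: one row per piece.
SELECT regexp_split_to_table('hello world moon', E'\\s+');
-- three rows: hello / world / moon

-- Same split, returned as a single text[] value.
SELECT regexp_split_to_array('hello world moon', E'\\s+');
-- {hello,world,moon}

-- No whitespace to split on: the whole subject is the only element.
SELECT regexp_split_to_array('hello', E'\\s+');
-- {hello}
```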

Can I use regexp_split_to_array() with PostgreSQL's substring()?

I spent some hours googling this. The problem is that both regexp_split_to_array() and regexp_matches() give the result as an array, even when the regexp matches only a single string. Instead, you can use POSIX regular expressions with PostgreSQL's substring(), which returns plain text.
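A small illustration of that difference, using literal input: substring(... from pattern) returns text (or NULL when nothing matches), while regexp_matches() wraps the same match in an array. The pattern here has no capturing parentheses, so substring() returns the whole match.

```sql
-- substring() with a POSIX regex returns text, not text[].
SELECT substring('hello world moon' from E'^\\S+\\s+\\S+');  -- hello world

-- regexp_matches() returns the same match wrapped in an array.
SELECT regexp_matches('hello world moon', E'^\\S+\\s+\\S+');  -- {"hello world"}
```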

What is regex pattern in PostgreSQL?

A regex is a sequence of characters that defines a pattern that can be used to filter data in PostgreSQL. PostgreSQL uses POSIX ("Portable Operating System Interface for uniX") regular expressions, which are more expressive than the LIKE and SIMILAR TO operators that are also available for pattern matching.
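For example, the ~ operator applies a POSIX regex as a filter. This sketch keeps only values that start with two whitespace-separated runs of word characters, which LIKE's % and _ wildcards cannot express directly. The inline VALUES list is hypothetical sample data.

```sql
-- Keep rows whose value begins with "word whitespace word".
SELECT v
FROM (VALUES ('hello world moon'), ('hello'), ('it''s a nice day')) AS t(v)
WHERE v ~ E'^\\w+\\s+\\w+';
-- returns only: hello world moon
```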

What is the difference between regexp_split_to_array and regexp_split_to_table?

regexp_split_to_array: splits the string according to a regular expression and returns the parts as an array.
regexp_split_to_table: splits the string into pieces according to a regular expression and returns the parts as rows of a table.
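Because regexp_split_to_table returns rows, the position of each piece is not kept unless you ask for it. A sketch that uses the table form to rebuild the first two words, assuming PostgreSQL 9.4 or later for WITH ORDINALITY; the aliases t, word, and n are arbitrary names chosen for illustration.

```sql
-- Number each piece (n), keep pieces 1 and 2, and join them back in order.
SELECT string_agg(word, ' ' ORDER BY n)
FROM regexp_split_to_table('hello world moon', E'\\s+') WITH ORDINALITY AS t(word, n)
WHERE n <= 2;
-- hello world
```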


2 Answers

I can't believe that in 5 years no one noticed that you can access the elements returned by regexp_split_to_array if you surround the function call with parentheses.

I have seen many people try to access the elements of the array like this:

select regexp_split_to_array(my_field, E'my_pattern')[1] from my_table

That returns a syntax error, but the following does not:

select (regexp_split_to_array(my_field, E'my_pattern'))[1] from my_table
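Applied to the original question, parenthesized subscripts can pick out words 1 and 2. This sketch uses a hypothetical inline table (my_table with a column v) in place of the asker's real table and column.

```sql
-- my_table(v) is a hypothetical stand-in for the real table/column.
SELECT (regexp_split_to_array(v, E'\\s+'))[1] || ' ' ||
       (regexp_split_to_array(v, E'\\s+'))[2] AS first_two
FROM (VALUES ('hello world moon and stars'),
             ('hello world moon'),
             ('hello world')) AS my_table(v);
-- every row: hello world
```

Note that for a one-word value, element [2] is NULL, and concatenating NULL with || makes the whole result NULL.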
answered Sep 18 '22 14:09 by VGe0rge


You can use POSIX regular expressions with PostgreSQL's substring():

select substring('hello world moon' from E'^\\w+\\s+\\w+');
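Run over the question's sample values, the \w pattern returns the first two words for every row. A single-word value has no second word, so substring() returns NULL there; falling back to the original value with coalesce() is an assumption about what you would want in that case.

```sql
SELECT v,
       -- NULL when there is no second word, so fall back to the whole value
       coalesce(substring(v from E'^\\w+\\s+\\w+'), v) AS first_two
FROM (VALUES ('hello world moon and stars'),
             ('hello world moon'),
             ('hello world'),
             ('hello')) AS t(v);
-- first_two: hello world, hello world, hello world, hello
```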

Or with a very liberal interpretation of what a word is:

select substring('it''s a nice day' from E'^\\S+\\s+\\S+');

Note the \S (non-whitespace) instead of \w ("word" character, essentially alphanumeric plus underscore).
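To see why the distinction matters, here are both patterns on the same apostrophe-containing input. The apostrophe is not a word character, so \w+ stops at "it" and the anchored match fails.

```sql
SELECT substring('it''s a nice day' from E'^\\w+\\s+\\w+') AS word_chars,  -- NULL
       substring('it''s a nice day' from E'^\\S+\\s+\\S+') AS non_space;   -- it's a
```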

Don't forget all the extra quoting nonsense though:

  • The E'' to tell PostgreSQL that you're using extended escaping.
  • And then double backslashes to get single backslashes past the string parser and into the regular expression parser.
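On PostgreSQL 9.1 and later, standard_conforming_strings is on by default, so backslashes in an ordinary '...' literal are passed through unchanged and the E'' doubling is no longer required. A sketch showing that the two spellings yield the same pattern, assuming the default setting has not been changed on your server:

```sql
SHOW standard_conforming_strings;  -- on (the default since 9.1)

-- Both literals produce the pattern text ^\w+\s+\w+
SELECT E'^\\w+\\s+\\w+' = '^\w+\s+\w+';  -- true

SELECT substring('hello world moon' from '^\w+\s+\w+');  -- hello world
```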

If you really want to use regexp_split_to_array, then you can, but the above quoting issues apply, and I think you'd want to slice off just the first two elements of the array:

select (regexp_split_to_array('hello world moon', E'\\s+'))[1:2];
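The slice is still an array ({hello,world}). To get the plain string "hello world" that the question asks for, one option is to join the slice with array_to_string(). Because an out-of-range slice is simply truncated, this also handles a one-word value, returning just that word.

```sql
-- The slice keeps the result as text[] ...
SELECT (regexp_split_to_array('hello world moon', E'\\s+'))[1:2];  -- {hello,world}

-- ... so join it with a space to get text back.
SELECT array_to_string((regexp_split_to_array('hello world moon', E'\\s+'))[1:2], ' ');  -- hello world

-- One-word input: the [1:2] slice is just {hello}.
SELECT array_to_string((regexp_split_to_array('hello', E'\\s+'))[1:2], ' ');  -- hello
```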

I'd guess that the escaping was causing some confusion; I usually end up adding backslashes until it works, and then I pick it apart until I understand why I needed the number of backslashes I ended up using. Or maybe the extra parentheses and array slicing syntax were an issue (they were for me, but a bit of experimentation sorted it out).

answered Sep 17 '22 14:09 by mu is too short