Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Postgresql - How do I extract the first occurence of a substring in a string using a regular expression pattern?

I am trying to extract a substring from a text column using a regular expression, but in some cases, there are multiple instances of that substring in the string.

In those cases, I am finding that the query does not return the first occurrence of the substring. Does anyone know what I am doing wrong?

For example:

If I have this data:

create table data1
(full_text text, name text);

insert into data1 (full_text)
values ('I 56, donkey, moon, I 92')

I am using

UPDATE data1
SET name = substring(full_text from '%#"I ([0-9]{1,3})#"%' for '#')

and I want to get 'I 56' not 'I 92'

like image 915
Michelle Avatar asked Feb 12 '16 20:02

Michelle


People also ask

Can you use regex in PostgreSQL?

Regex is a sequence of characters that defines a pattern that can filter data in PostgreSQL. The TILDE (~) operator and the wildcard operator “. *” is used to implement PostgreSQL's regular expressions.

How do I get part of a string in PostgreSQL?

PostgreSQL provides a built-in function named SUBSTRING() that extracts a substring from any specific string. The SUBSTRING() function accepts three parameters: a string, starting position, and length. The starting position” and “length” parameters are optional that can be skipped depending on the situation.

How do I run a dynamic query in PostgreSQL?

To execute an SQL statement with a single result row, EXECUTE can be used. To save the result, add an INTO clause. EXEC SQL BEGIN DECLARE SECTION; const char *stmt = "SELECT a, b, c FROM test1 WHERE a > ?"; int v1, v2; VARCHAR v3[50]; EXEC SQL END DECLARE SECTION; EXEC SQL PREPARE mystmt FROM :stmt; ...

What is the string operator to pattern match using regular expressions?

The simplest and very common pattern matching character operators is the . This simply allows for any single character to match where a . is placed in a regular expression. For example /b.t/ can match to bat, bit, but or anything like bbt, bct ....


1 Answers

You can use regexp_matches() instead:

update data1
  set full_text = (regexp_matches(full_text, 'I [0-9]{1,3}'))[1];

As no additional flag is passed, regexp_matches() only returns the first match - but it returns an array so you need to pick the first (and only) element from the result (that's the [1] part)

It is probably a good idea to limit the update to only rows that would match the regex in the first place:

update data1
  set full_text = (regexp_matches(full_text, 'I [0-9]{1,3}'))[1]
where full_text ~ 'I [0-9]{1,3}'
like image 128
a_horse_with_no_name Avatar answered Nov 15 '22 08:11

a_horse_with_no_name