Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Returning Matching Substrings with Postgres Regex Search

I am trying to pull some values from a varchar field in Postgres. The *product_name* field could contain something like 'Big Bag 24-0-3 Twenty Gallons' or 'Small Bag 0-14-40' and the product code is the #-#-#. The numbers in the product code can be 0, or can be one or two digits, but there will always be three numbers separated by two dashes.

I've got the matching products returning correctly, but now I need to get each number in a separate field, so I'd really appreciate a person with a bigger brain than me helping out with the substring returns!

This Regex Match returns the correct products:

select * from products where product_name LIKE '%_-_-_%'

I've tried to return the product code using a substring but it is cutting off products that have a third number with two digits (i.e. 'Big Bag 24-0-32 Foo' would return '24-0-3')

select trim(substring(name from '%#"__-_-_#"%' for '#')),* 
from products where name LIKE '%_-_-_%'

And really, the whole number doesn't do me a lot of good anyway - I really need to extract each of the three numbers in a separate substring.

like image 887
user1681072 Avatar asked Feb 15 '23 18:02

user1681072


2 Answers

One option would be to use regexp_matches to extract the code:

regexp_matches(string text, pattern text [, flags text])
Return all captured substrings resulting from matching a POSIX regular expression against the string.

and then regexp_split_to_array:

regexp_split_to_array(string text, pattern text [, flags text ])
Split string using a POSIX regular expression as the delimiter.

to pull the code apart into the numbers. For example:

=> select regexp_split_to_array((regexp_matches('Big Bag 24-0-3 Twenty Gallons', E'(\\d+-\\d+-\\d+)'))[1], '-');
 regexp_split_to_array 
-----------------------
 {24,0,3}
(1 row)

The {24,0,3} is a three element array containing the three numbers (as strings) you're interested in. There's also regexp_split_to_table if a three row table would be easier to work with than an array:

=> select regexp_split_to_table((regexp_matches('Big Bag 24-0-3 Twenty Gallons', E'(\\d+-\\d+-\\d+)'))[1], '-');
 regexp_split_to_table 
-----------------------
 24
 0
 3
(3 rows)
like image 162
mu is too short Avatar answered Feb 17 '23 09:02

mu is too short


This isn't as regex-y as you were looking for but maybe it will get you closer:

Select substring( arr[ 1 ] from '[0-9][0-9]*' ) as first,
    arr[ 2 ] as second,
    substring( arr[ 3 ] from '[0-9][0-9]*' ) as third
FROM
(
Select string_to_array( d1, '-' ) as arr
from
(
SELECT * FROM ( VALUES
( 1, 'Big Bag 24-0-3 Twenty Gallons' ),
( 2, 'Small Bag 0-14-40' ),
( 3, 'Big Bag 24-0-32 Foo' ),
( 4, 'Other Bag 4-4-24' )
) AS products( id, d1 )
) AS values_table
) AS get_array

There's probably a nicer way to do this in one pass and without all the block AS aliasing but here's the breakdown:

  • VALUES table supplies the test data - d1 is the data to get.
  • This is parsed by - in string_to_array() to get and array with substrings like Big Bag 24, 0 and 3 Twenty Gallons (which are auto-typed)
  • The outer select converts the array values by only picking out the numbers from the first and last array elements.

That kind of work could be put into a function to get each of the numbers for you but should get NULL tests etc.

like image 23
n0741337 Avatar answered Feb 17 '23 09:02

n0741337