Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Extract a substring without using regexes - PostgreSQL

This is based on a question previously asked which was deleted by the OP but it got me thinking and playing round with it, and I managed to do it without using a regex (only (I)LIKE() % or _ allowed).

  • The question is also a lesson in the perils of allowing users to enter free text!

However, my solution is quite cumbersome and I've wracked my brains to see if I could find a more elegant solution, but no joy.

I'm not going to put my own solution forward yet.

A typical string might be like this:

'First order 437.3/10-87 16NY100013XX55 - Return'

So, the only thing you can be sure of is that the string (order_name) starts with '437.' and that it ends with the last digit in the string.

  • N.B. The text following the order is arbitrary: its only constraint is that it can't contain a digit!

So, the result for this particular example would be

'437.3/10-87 16NY100013XX55'

So, my table and data are below and also on the fiddle here:

CREATE TABLE t1 
(
  id INT NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  order_name TEXT NOT NULL
);

strings:

INSERT INTO t1 (order_name) VALUES
('First order 437.8/03-87 22LA190028YV7ER55'),
('Second order 437.8-03-87 22LA19'),
('First order 437.3/10-87 16NY100013XX55 - Return'),
('Order 21.02.2022 437.8/10-87 16WA239766'),
('437.8/10-87 16NY10023456YY78 - Paid'),
('First order (437.8/03-87 22LA190028)'),
('Visit 02.02.2023 Order 437.5/10-87 16DC107765X56 REFUND'),
('Visit 02.02.2023 Order 437.5/10-87 16DC1077657FFR56REFUND'),    
('Visit 02.02.2023 Order 437.5/10-87 16DC107765745 - Reorder');

Desired result:

Order Text
437.8/03-87 22LA190028YV7ER55
437.8-03-87 22LA19
437.3/10-87 16NY100013XX55
437.8/10-87 16WA239766
437.8/10-87 16NY10023456YY78
437.8/03-87 22LA190028
437.5/10-87 16DC107765X56
437.5/10-87 16DC1077657FFR56
437.5/10-87 16DC107765745

The fiddle is PostgreSQL 16, but I've checked and it works back to version 10. The question is PostgreSQL based, but I'm open to elegant solutions from other systems.

like image 911
Vérace Avatar asked Oct 31 '25 03:10

Vérace


2 Answers

Find the positions of '437.' and the last digit in the order_name string (with a scalar subquery) to use with substring function.

select substring(
  order_name, 
  position('437.' in order_name), 
  (
   select max(o)::integer 
   from string_to_table(order_name, null) with ordinality as t(c, o)
   where c between '0' and '9'
  ) - position('437.' in order_name) + 1
) from t1;

Fiddle

like image 199
Stefanov.sm Avatar answered Nov 01 '25 18:11

Stefanov.sm


In order to extract string without using regular expressions, I would use string functions like POSITION, and SUBSTRING. I have two solutions:

Solution 1:

WITH ExtractedOrders AS (
  SELECT
    SUBSTRING(order_name FROM POSITION('437.' IN order_name)) AS raw_order_text
  FROM t1
)
SELECT
  TRIM(
    BOTH ' - ' FROM
    CASE
      WHEN raw_order_text LIKE '%Paid' THEN LEFT(raw_order_text, LENGTH(raw_order_text) - LENGTH('Paid'))
      WHEN raw_order_text LIKE '%REFUND' THEN LEFT(raw_order_text, LENGTH(raw_order_text) - LENGTH('Refund'))
      WHEN raw_order_text LIKE '%Reorder' THEN LEFT(raw_order_text, LENGTH(raw_order_text) - LENGTH('Reorder'))
      WHEN raw_order_text LIKE '%Return' THEN LEFT(raw_order_text, LENGTH(raw_order_text) - LENGTH('Return'))
      WHEN raw_order_text LIKE '%)' THEN LEFT(raw_order_text, LENGTH(raw_order_text) - LENGTH(')'))
      ELSE raw_order_text
    END
  ) AS "Order Text"
FROM ExtractedOrders;

Solution 2:

WITH ExtractedOrders AS (
  SELECT
    id,
    SUBSTRING(order_name FROM POSITION('437.' IN order_name)) AS raw_order_text
  FROM t1
)
SELECT
  TRIM(BOTH ' -' FROM cleaned_order_text) AS "Order Text"
FROM (
  SELECT
    id,
    TRIM(BOTH ' -' FROM
      REPLACE(
        REPLACE(
          REPLACE(
            REPLACE(
              REPLACE(raw_order_text, 'Paid', ''), 
              'REFUND', ''
            ), 
            'Reorder', ''
          ), 
          'Return', ''
        ), 
        ')', ''
      )
    ) AS cleaned_order_text
  FROM ExtractedOrders
) AS cleaned_orders;

Edit

I had understood your question a little differently, my assumption was the ending text is consistent and can be handled via hard code methods.

As per your comment, changed my approach to checked min int index on string reverse and than fetched substring via lengths and indices. Here is the new solution:

WITH RECURSIVE check_integers AS (
  SELECT
    order_name,
    reverse(order_name) AS reversedStr,
    1 AS idx,
    substring(reverse(order_name) from 1 for 1) AS current_char,
    CASE
      WHEN substring(reverse(order_name) from 1 for 1) ~ '\d' THEN true
      ELSE false
    END AS is_int
  FROM t1
  UNION ALL
  SELECT
    order_name,
    reversedStr,
    idx + 1 AS idx,
    substring(reversedStr from idx + 1 for 1) AS current_char,
    CASE
      WHEN substring(reversedStr from idx + 1 for 1) ~ '\d' THEN true
      ELSE false
    END AS is_int
  FROM
    check_integers
  WHERE
    idx < length(reversedStr)
)
SELECT
  substring(a.order_name from POSITION('437.' IN a.order_name) for (length(a.order_name) - POSITION('437.' IN a.order_name) - idx + 2))  as orderText
FROM (  
  SELECT
    min(idx) as idx,
    order_name
  FROM
    check_integers
  WHERE
    is_int = true
  GROUP BY
    order_name
) a;
like image 35
hassaan mustafa Avatar answered Nov 01 '25 18:11

hassaan mustafa