
Oracle regex looking for space or end of string

I am working on a query that validates quarter data in a legal description. Our standard is input like "SE/4" to mark the southeast quarter or "SE/4 NW/4" to mark the southeast quarter of the northwest quarter. I'm struggling with how to structure my regex so that each entry is followed by either a space or the end of the string.

Here is some sample data with my regex so far.

WITH test_data AS (
  SELECT 'NW/4' AS quarter_cd FROM dual UNION ALL --VALID
  SELECT 'E/2 SW/4' FROM dual UNION ALL           --VALID
  SELECT 'W/2' FROM dual UNION ALL                --VALID
  SELECT 'SW/4 NE/4' FROM dual UNION ALL          --VALID
  SELECT 'SW/4 NE/4 NW/4' FROM dual UNION ALL     --VALID, THEY CAN REPEAT AN UNKNOWN NUMBER OF TIMES
  SELECT 'E/2 N/2' FROM dual UNION ALL            --TECHNICALLY VALID BUT WOULD LIKE TO EXCLUDE (1/2 of 1/2 is a 1/4) -> NE/4
  SELECT 'E/2 SW/4, SE/4' FROM dual UNION ALL     --INVALID, HAS A COMMA (TWO QUARTER ENTRIES ON ONE ROW)
  SELECT 'E/2 SW/4 & SE/4' FROM dual UNION ALL    --INVALID, HAS AN AMPERSAND (TWO QUARTER ENTRIES ON ONE ROW)
  SELECT 'E/2 SW/' FROM dual UNION ALL            --INVALID, INCOMPLETE ENTRY
  SELECT 'SE/4SW/4' FROM dual UNION ALL           --INVALID, NO SPACE BETWEEN DEFINITIONS
  SELECT 'SE/2' FROM dual UNION ALL               --INVALID, SOUTHEAST HALF DOES NOT MAKE SENSE
  SELECT 'N/4' FROM dual UNION ALL                --INVALID, NORTH QUARTER DOES NOT MAKE SENSE
  SELECT 'LOT 1' FROM dual                        --INVALID, LOTS WILL BE DEALT WITH SEPARATELY
)
SELECT * FROM test_data 
WHERE regexp_like(quarter_cd, '^([NSEW]/[2]{1}|[NSEW]{2}/[4]{1})+', 'c');

The regex in my code is just one of my many attempts. I've marked in the query which rows should be returned. I'm willing to allow "E/2 N/2" to be returned for simplicity's sake, even though it should ideally be rejected: the east half of the north half is better written as the northeast quarter ("NE/4"). All examples above were pulled from actual entries in my data.

Any help would be appreciated.

sukach asked Dec 28 '25 22:12


2 Answers

Here's my lowly attempt:

select *
  from test_data
 where regexp_like(quarter_cd
        , '^((([NSEW]{1}/2)|[NS]{1}[EW]{1}/4)([[:space:]]|$))+$'
        , 'c');

It does return "E/2 N/2", I'm afraid.

This pattern:

  • Allows one of N, S, E or W followed by /2
  • or one of N or S and then one of E or W, followed by /4
  • Each of these must be followed by whitespace or the end of the string
  • The group may repeat one or more times
  • The whole match must finish at the end of the string

Splitting your [NSEW]{2} into [NS][EW] precludes a match on NS, EW, etc.

Here's a SQL Fiddle to demonstrate. I've added a couple of extra cases on top of your own. The problem with this is that it'll allow all four halves.
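A further caveat, beyond the halves: because the separator group is ([[:space:]]|$), the pattern above also accepts an entry with a trailing space, and it accepts a tab or newline as a separator. The query below is a minimal sketch of a stricter variant that requires exactly one space between entries. The extra_cases rows and the column aliases are made up for illustration and are not from the real data.

```sql
-- Hypothetical rows chosen to show the difference between the two patterns.
WITH extra_cases AS (
  SELECT 'SW/4 NE/4' AS quarter_cd FROM dual UNION ALL  -- well formed
  SELECT 'NW/4 '     FROM dual UNION ALL                -- trailing space
  SELECT 'NS/4'      FROM dual                          -- not a real quarter
)
SELECT quarter_cd,
       -- Pattern from the answer: separator is any whitespace or end of string.
       CASE WHEN REGEXP_LIKE(quarter_cd,
              '^((([NSEW]{1}/2)|[NS]{1}[EW]{1}/4)([[:space:]]|$))+$', 'c')
            THEN 'Y' ELSE 'N' END AS original_pattern,
       -- Stricter sketch: first entry, then zero or more " entry" repeats.
       CASE WHEN REGEXP_LIKE(quarter_cd,
              '^([NSEW]/2|[NS][EW]/4)( ([NSEW]/2|[NS][EW]/4))*$', 'c')
            THEN 'Y' ELSE 'N' END AS stricter_pattern
  FROM extra_cases;
```

With these rows, 'SW/4 NE/4' should pass both patterns, 'NW/4 ' should pass only the original one, and 'NS/4' should fail both. The stricter variant still accepts "E/2 N/2".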

I would seriously consider not using a single regular expression to validate this data. Instead, pass it through a PL/SQL function: split on the space and add up what you have to check that you don't go over the limits. You can then use a smaller regular expression to validate each piece of data between the space delimiters.
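A rough sketch of what such a function could look like follows. The function name is_valid_quarter_cd, the 1/0 return convention and, in particular, the "at most one half per entry" limit are assumptions made for illustration; the real limits would need to come from whoever owns the data.

```sql
-- Hypothetical helper: returns 1 when the description looks valid, else 0.
CREATE OR REPLACE FUNCTION is_valid_quarter_cd (p_quarter_cd IN VARCHAR2)
  RETURN NUMBER
  DETERMINISTIC
IS
  l_token      VARCHAR2(4000);
  l_idx        PLS_INTEGER := 1;
  l_half_count PLS_INTEGER := 0;
BEGIN
  -- Shape check: non-space tokens separated by exactly one space,
  -- so leading, trailing and doubled spaces are rejected up front.
  IF p_quarter_cd IS NULL
     OR NOT REGEXP_LIKE(p_quarter_cd, '^[^ ]+( [^ ]+)*$')
  THEN
    RETURN 0;
  END IF;

  LOOP
    -- Fetch the l_idx-th space-delimited token; NULL means no more tokens.
    l_token := REGEXP_SUBSTR(p_quarter_cd, '[^ ]+', 1, l_idx);
    EXIT WHEN l_token IS NULL;

    IF REGEXP_LIKE(l_token, '^[NSEW]/2$', 'c') THEN
      l_half_count := l_half_count + 1;      -- a half such as E/2
    ELSIF NOT REGEXP_LIKE(l_token, '^[NS][EW]/4$', 'c') THEN
      RETURN 0;                              -- neither a half nor a quarter
    END IF;

    l_idx := l_idx + 1;
  END LOOP;

  -- ASSUMPTION: more than one half in an entry (e.g. E/2 N/2) is rejected.
  RETURN CASE WHEN l_half_count > 1 THEN 0 ELSE 1 END;
END is_valid_quarter_cd;
/

-- Usage: expected to give 1, 0 and 0 for these three rows.
SELECT quarter_cd, is_valid_quarter_cd(quarter_cd) AS is_valid
  FROM (SELECT 'E/2 SW/4' AS quarter_cd FROM dual UNION ALL
        SELECT 'E/2 N/2'  FROM dual UNION ALL
        SELECT 'SE/4SW/4' FROM dual);
```

Keeping the per-token check to two small anchored patterns makes it easy to add further rules later (for example, a different limit on halves) without growing one large regular expression.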

Ben answered Dec 30 '25 10:12


I think something like this will give you what you want:

SELECT * FROM 
  test_data 
WHERE 
  regexp_like(quarter_cd, 
  '^([NSEW]/[2]{1}|[NSEW]{2}/[4]{1})( [NSEW]/[2]{1}| [NSEW]{2}/[4]{1})*$', 'c');

It will match the "E/2 N/2" case, though. If you do this instead:

SELECT * FROM 
  test_data 
WHERE 
  regexp_like(quarter_cd, 
  '^([NSEW]/[2]{1}|[NSEW]{2}/[4]{1})( [NSEW]{2}/[4]{1})*$', 'c');

then it won't match that, but it also won't match any entry that contains [NSEW]/2 after the initial position. So this wouldn't be good if you need to match, say, "NW/4 E/2" (by the convention in the question, the northwest quarter of the east half).
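One possible middle ground, offered only as a sketch, is a pattern that allows a half in any position but never two halves in a row. It uses [NS][EW] for the quarters (as in the other answer) rather than [NSEW]{2}. Whether "no two adjacent halves" is the right rule for this data is an assumption, and the extra_cases rows are illustrative.

```sql
-- Hypothetical rows: three that should pass and one half-of-a-half.
WITH extra_cases AS (
  SELECT 'W/2' AS quarter_cd FROM dual UNION ALL
  SELECT 'E/2 SW/4' FROM dual UNION ALL
  SELECT 'NW/4 E/2' FROM dual UNION ALL
  SELECT 'E/2 N/2'  FROM dual
)
SELECT quarter_cd
  FROM extra_cases
 WHERE REGEXP_LIKE(quarter_cd,
         -- Either a lone half, or: optional leading half, a quarter,
         -- then any mix in which every half directly follows a quarter.
         '^([NSEW]/2|([NSEW]/2 )?[NS][EW]/4( [NSEW]/2)?( [NS][EW]/4( [NSEW]/2)?)*)$',
         'c');
```

With these rows the query should return 'W/2', 'E/2 SW/4' and 'NW/4 E/2', and leave out 'E/2 N/2'.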

Mike answered Dec 30 '25 10:12