SQL similar to regex, equivalents for start ^ and end $ of line

I want to use a regex like (^|\s)1001(\s|$) in a Firebird SIMILAR TO expression.

Examples:

  • abc 1001 abc - true
  • abc 121001 abc - false
  • 1001 abc - true
  • 121001 - false
  • abc 1001 - true

I tried to convert it to a SIMILAR TO pattern in Firebird:

Where COLUMN similar to (^|[:WHITESPACE:])abc 1001 abc($|[:WHITESPACE:]), but ^ (start of line) and $ (end of line) do not work, and the query ends with:

Invalid SIMILAR TO pattern Exception.

I cannot find anything about start and end of line in the Firebird docs at https://firebirdsql.org/refdocs/langrefupd25-similar-to.html

smoothie asked Aug 31 '18 16:08


2 Answers

From the Firebird 2.5 Language Reference, SIMILAR TO documentation:

SIMILAR TO matches a string against an SQL regular expression pattern. Unlike in some other languages, the pattern must match the entire string in order to succeed—matching a substring is not enough.

In other words, the pattern is matched against the entire string, line breaks included. Going by the linked documentation, there are no start/end anchors because they are already implied (for the whole string, not per line): partial matches are simply not supported.

The regular expression implementation in Firebird conforms to the SQL standard, which also doesn't define start / end anchors.
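To illustrate the difference between substring matching and whole-string matching, here is a minimal sketch in Python. It is only an emulation, not Firebird: re.search stands in for the usual regex behaviour, re.fullmatch stands in for how SIMILAR TO behaves according to the quoted documentation, and .* is assumed to play the role of %.

```python
import re

# Python's re module is used only to emulate the two matching modes;
# none of this runs inside Firebird.
pattern = r"1001"
text = "abc 1001 abc"

# Substring semantics: what most regex engines do by default.
partial = re.search(pattern, text) is not None

# Whole-string semantics: the pattern has to cover the entire value,
# which is what the documentation describes for SIMILAR TO.
whole = re.fullmatch(pattern, text) is not None

# Under whole-string semantics the pattern itself must consume the
# surrounding text (.* here, % in a SIMILAR TO pattern).
whole_padded = re.fullmatch(r".*1001.*", text) is not None

print(partial, whole, whole_padded)  # True False True
```

So an anchor has nothing to do in this model: the start and end of the pattern already are the start and end of the string.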

Given your requirements, you probably need something like:

'(abc 1001( %)?)|((% )?1001 abc)'

Where ( %)? means optionally match a space and zero or more of any character. Given the whole string must match, that means it finds either a space or the end of the string, and similar for (% )?.

You may need to add additional terms if you also need to find this in the middle of a string (but none of your examples suggested that).

Or, a direct equivalent of (^|\s)1001(\s|$):

'(%[[:WHITESPACE:]])?1001([[:WHITESPACE:]]%)?'

An earlier version of this answer used (% |) instead of (% )?, but empty terms are neither documented nor part of the standard, so that is possibly an implementation bug or at best an undocumented feature. Use that at your own risk.
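The direct equivalent '(%[[:WHITESPACE:]])?1001([[:WHITESPACE:]]%)?' can be sanity-checked against the examples from the question with a small Python sketch. This is an emulation under stated assumptions, not a Firebird test: % is translated to .* (with DOTALL), [[:WHITESPACE:]] to \s, and re.fullmatch is used to mimic the whole-string requirement. The helper names are made up for illustration.

```python
import re

# Original regex from the question, used with substring search.
ORIGINAL = re.compile(r"(^|\s)1001(\s|$)")

# Hand translation of '(%[[:WHITESPACE:]])?1001([[:WHITESPACE:]]%)?'
# (assumption: % -> .* and [[:WHITESPACE:]] -> \s), used with fullmatch.
WHOLE = re.compile(r"(.*\s)?1001(\s.*)?", re.DOTALL)

# Examples and expected results taken from the question.
samples = {
    "abc 1001 abc": True,
    "abc 121001 abc": False,
    "1001 abc": True,
    "121001": False,
    "abc 1001": True,
}

def matches_original(value):
    """Substring match, as in an ordinary regex engine."""
    return ORIGINAL.search(value) is not None

def matches_whole(value):
    """Whole-string match, emulating SIMILAR TO."""
    return WHOLE.fullmatch(value) is not None

for value, expected in samples.items():
    print(value, matches_original(value), matches_whole(value), expected)
```

Both functions agree with the expected results for all five examples, and both also accept a value that is exactly 1001.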

Mark Rotteveel answered Oct 19 '22 02:10


The pattern (^|\s)1001(\s|$) will not work as written because it relies on partial matching, which is not possible with SIMILAR TO:

SIMILAR TO matches a string against an SQL regular expression pattern. Unlike in some other languages, the pattern must match the entire string in order to succeed—matching a substring is not enough.

Then, (^|\s) means either start of string or whitespace: 1001 either appears at the very start of the string or follows any chars and a whitespace. Likewise, (\s|$) means either whitespace or end of string. That means you need to account for 4 cases:

  • Any chars, whitespace, 1001, whitespace and any chars
  • 1001, whitespace, any chars
  • Any chars, whitespace, 1001
  • 1001 on its own (not among your examples, but the original regex matches it)

You need to use

WHERE col SIMILAR TO '%[[:WHITESPACE:]]1001[[:WHITESPACE:]]%' OR col SIMILAR TO '1001[[:WHITESPACE:]]%' OR col SIMILAR TO '%[[:WHITESPACE:]]1001' OR col = '1001'
Wiktor Stribiżew answered Oct 19 '22 02:10