My Ubuntu Web development system was recently updated from 19.04 to 19.10 that included an update of MySQL, which is now version 8.0.18. The live server has version 5.6.45 (it is a shared server running CentOS so I cannot change it) and I discovered that something in REGEXP has apparently changed with the update.
The following query, when run locally, gives Illegal argument to a regular expression but runs properly on the live server. I am not too familiar with regular expressions much less REGEXP in MySQL so how can I make this work in the newer MySQL without breaking it in the old one?
These are the two REGEXP lines by themselves, followed by the complete query.
REGEXP '[[:<:]][0-9]+(-[0-9]+)*[[:>:]]'
REGEXP '^[0-9]+(-[0-9]+)*$'
SELECT t.ID AS partID, t.partNo,
TRIM(CONCAT(
IF(N.n = 0, '', LEFT(t.Model, 2)),
SUBSTRING_INDEX(SUBSTRING_INDEX(t.Model, '-', N.n + 1), '-', -1)
)) AS modelNo, NULL AS bodyNo, t.xHD AS isRHD
FROM (
SELECT
SUBSTRING_INDEX(SUBSTRING_INDEX(t.part, ' ', N.n + 1), ' ', -1) AS Model,
CASE LEFT(t.part, 3) WHEN 'LHD' THEN 1 WHEN 'RHD' THEN 2 ELSE 0 END AS xHD,
t.ID, t.GroupNumber, t.partNo, t.Models
FROM (
SELECT
LTRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(p.Models, ';', N.n + 1), ';', -1)) part,
p.ID, p.GroupNumber, p.partNo, p.Models
FROM parts_listing AS p CROSS JOIN parts_temp AS N
WHERE Models REGEXP '[[:<:]][0-9]+(-[0-9]+)*[[:>:]]' AND
N.n <= LENGTH(Models) - LENGTH(REPLACE(Models, ';', ''))
) AS t CROSS JOIN parts_temp AS N
WHERE N.n <= LENGTH(t.part) - LENGTH(REPLACE(t.part, ' ', ''))
) AS t CROSS JOIN parts_temp AS N
WHERE t.Model REGEXP '^[0-9]+(-[0-9]+)*$' AND
N.n <= LENGTH(t.model) - LENGTH(REPLACE(t.Model, '-', ''));
MySQL 8.0.4 introduced changed the implementation of its regexes engine from Henry Spencer's implementation to Internation Components for Unicode (ICU). This involves some non-backward compatible changes, which are listed in the documentation. Here is the part that is of interest for your use case:
The Spencer library supports word-beginning and word-end boundary markers (
[[:<:]]and[[:>:]]notation). ICU does not. For ICU, you can use\bto match word boundaries; double the backslash because MySQL interprets it as the escape character within strings.
In other words, the following regexp is invalid in MySQL 8.0.4 onwards: '[[:<:]][0-9]+(-[0-9]+)*[[:>:]]', because it contains Henry Spencer's boudary markers.
how can I make this work in the newer MySQL without breaking it in the old one?
Short answer: you can't. Changes are not backward compatible. You may be able to work around this situation by defining a custom expression (or character class) that represents word word boundaries for your use case. Here is an extremely simplified version, that handles a space or the beginning/end of the string:
'(^|\s)[0-9]+(-[0-9]+)*(\s|$)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With