I have a column of type text, that contains random values. Some of them are numbers, some other texts, and some other mixed text and numbers. I'm trying to discriminate them as:
myfield is_numeric
____________________
-4 true
0004 true
4.00 true
dog false
D04 false
04f false
Whereas others are names and other strings. I was using the regular expression
SELECT id,
myfield
(myfield::varchar~ '^-?[0-9]*.?[0-9]*$') is_numeric
FROM mytable
To tell wether the row contains a valid number or not. However, I noticed that values like D04
and 04c
return true
for that regex, which, for my use case, is a false positive.
Why is this happening? It seems that ^
is matching not necessarily the whole value, but any valid substring of the value. However, values like D04f
do return false
, so even if there's a numeric substring in the field, the combination of ^
and $
operators is doing its job.
I have temporarily resorted to using:
SELECT id,
myfield
(myfield::varchar ~ '^-?[0-9]*.?[0-9]*$'
AND myfield::varchar !~ '[^0-9\-\.]') is_numeric
FROM mytable
But this seem inneficient (and does not rule out double dots), and I'm still wondering why the regex is correctly ruling out strings that start and end with a non-numeric character, while incorrecly returning true for string that contain only a trailing or leading non numeric character.
Would this work for you?
^-?[0-9]+\.?[0-9]*$
I'm asuming -0.07.5
is invalid (double dot present).
D04
will return false as well.
The problem in your original regex is that you're not escaping the dot, so it will match any characher, including D
in your D04
.
Hope it helps.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With