
isnumeric() with PostgreSQL

Tags:

regex

postgresql

I need to determine whether a given string can be interpreted as a number (integer or floating point) in an SQL statement. As in the following:

    SELECT AVG(CASE WHEN x ~ '^[0-9]*.?[0-9]*$' THEN x::float ELSE NULL END)
    FROM test

I found that Postgres' pattern matching can be used for this, so I adapted the statement given in this place to handle floating point numbers. This is my code:

    WITH test(x) AS (
        VALUES (''), ('.'), ('.0'), ('0.'), ('0'), ('1'), ('123'),
        ('123.456'), ('abc'), ('1..2'), ('1.2.3.4'))
    SELECT x
         , x ~ '^[0-9]*.?[0-9]*$' AS isnumeric
    FROM test;

The output:

        x    | isnumeric
    ---------+-----------
             | t
     .       | t
     .0      | t
     0.      | t
     0       | t
     1       | t
     123     | t
     123.456 | t
     abc     | f
     1..2    | f
     1.2.3.4 | f
    (11 rows)

As you can see, the first two items (the empty string '' and the sole period '.') are misclassified as being a numeric type (which they are not). I can't get any closer to this at the moment. Any help appreciated!

Update: Based on this answer (and its comments), I adapted the pattern to:

    WITH test(x) AS (
        VALUES (''), ('.'), ('.0'), ('0.'), ('0'), ('1'), ('123'),
        ('123.456'), ('abc'), ('1..2'), ('1.2.3.4'), ('1x234'), ('1.234e-5'))
    SELECT x
         , x ~ '^([0-9]+[.]?[0-9]*|[.][0-9]+)$' AS isnumeric
    FROM test;

Which gives:

        x     | isnumeric
    ----------+-----------
              | f
     .        | f
     .0       | t
     0.       | t
     0        | t
     1        | t
     123      | t
     123.456  | t
     abc      | f
     1..2     | f
     1.2.3.4  | f
     1x234    | f
     1.234e-5 | f
    (13 rows)

There are still some issues with the scientific notation and with negative numbers, as I see now.

380

asked Apr 24 '13 15:04

moooeeeep

1 Answer

As you may have noticed, a regex-based check is almost impossible to get right. For example, your test says that 1.234e-5 is not a valid number, when it really is. You also missed negative numbers. And what if something looks like a number but causes an overflow when you try to store it?
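To illustrate, here is a sketch of what happens when the pattern from the question is extended with an optional sign and an optional exponent. The extended pattern and the test values are assumptions made for this example, not part of the question. Even this version disagrees with the real cast: it accepts 1e400, which is out of range for float, and it rejects NaN, which the cast accepts.

```sql
-- Hypothetical extension of the question's pattern:
-- optional sign, digits with an optional decimal point, optional exponent.
WITH test(x) AS (
    VALUES ('-1.5'), ('1.234e-5'), ('1e400'), ('NaN'), ('abc'))
SELECT x
     , x ~ '^[+-]?([0-9]+[.]?[0-9]*|[.][0-9]+)([eE][+-]?[0-9]+)?$' AS matches_pattern
FROM test;

-- The pattern and the cast disagree on these two inputs:
--   SELECT '1e400'::float;  -- matches the pattern, but the cast raises an out-of-range error
--   SELECT 'NaN'::float;    -- does not match the pattern, but the cast succeeds
```

Every such mismatch needs another special case in the pattern, which is why mirroring the cast with a regex is so fragile.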

Instead, I would recommend creating a function that actually tries to cast the value to NUMERIC (or FLOAT if your task requires it) and returns TRUE or FALSE depending on whether the cast succeeds.

This code fully simulates an ISNUMERIC() function:

    CREATE OR REPLACE FUNCTION isnumeric(text) RETURNS BOOLEAN AS $$
    DECLARE
        x NUMERIC;
    BEGIN
        -- Attempt the cast; any error falls through to the handler below.
        x = $1::NUMERIC;
        RETURN TRUE;
    EXCEPTION WHEN others THEN
        RETURN FALSE;
    END;
    $$
    STRICT
    LANGUAGE plpgsql IMMUTABLE;

Calling this function on your data gives the following results:

    WITH test(x) AS (
        VALUES (''), ('.'), ('.0'), ('0.'), ('0'), ('1'), ('123'),
        ('123.456'), ('abc'), ('1..2'), ('1.2.3.4'), ('1x234'), ('1.234e-5'))
    SELECT x, isnumeric(x) FROM test;

        x     | isnumeric
    ----------+-----------
              | f
     .        | f
     .0       | t
     0.       | t
     0        | t
     1        | t
     123      | t
     123.456  | t
     abc      | f
     1..2     | f
     1.2.3.4  | f
     1x234    | f
     1.234e-5 | t
    (13 rows)

Not only is it more correct and easier to read, it will also run faster when the data actually is a number.
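If the values are going to be cast to float, as in the query from the question, a float variant of the same idea might look like the sketch below. The name isfloat and the test values are assumptions made for this example. It also narrows the handler to the two error conditions a failed text-to-float cast raises, instead of catching everything with WHEN others.

```sql
-- Hypothetical float variant; the name isfloat is an assumption.
CREATE OR REPLACE FUNCTION isfloat(text) RETURNS BOOLEAN AS $$
DECLARE
    x DOUBLE PRECISION;
BEGIN
    -- Attempt the cast; success means the text is a valid float.
    x := $1::DOUBLE PRECISION;
    RETURN TRUE;
EXCEPTION
    -- Malformed input (e.g. '' or 'abc') or a value out of range (e.g. '1e400').
    WHEN invalid_text_representation OR numeric_value_out_of_range THEN
        RETURN FALSE;
END;
$$ LANGUAGE plpgsql STRICT IMMUTABLE;

-- The aggregate from the question, skipping rows that do not cast cleanly.
WITH test(x) AS (
    VALUES (''), ('.'), ('1'), ('-2.5'), ('1e400'), ('abc'))
SELECT AVG(CASE WHEN isfloat(x) THEN x::float ELSE NULL END) AS avg_x
FROM test;
```

With these sample values only 1 and -2.5 pass the check, so the average is taken over those two rows. On PostgreSQL 16 and later, the built-in pg_input_is_valid(x, 'numeric') performs this kind of check without a user-defined function.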

174

answered Sep 24 '22 11:09

mvp
