
Checking that a value contains only digits, regex or no?

I have a function that is used throughout my code. The function expects that the passed parameter is a positive integer. Since PHP is loosely typed, the data type is unimportant. But it is important that it contain nothing but digits. Currently, I am using a regular expression to check the value before continuing.

Here is a simplified version of my code:

function do_something($company_id) {
    if (preg_match('/\D/', $company_id)) exit('Invalid parameter');
    //do several things that expect $company_id to be an integer
}

I come from a Perl background and tend to reach for regular expressions often. However, I know their usage is controversial.

I considered using intval() or (int) and forcing $company_id to be an integer. However, I could end up with some unexpected values and I want it to fail fast.
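A rough illustration of that concern, using made-up sample values rather than anything from the real code: the cast never fails, it just returns whatever leading digits it can find.

```php
<?php
// Illustrative inputs only; none of these come from the original code.
$samples = ['42', '12abc', 'abc', ' 7', ''];

$converted = [];
foreach ($samples as $raw) {
    // (int) keeps any leading digits and silently discards the rest,
    // so malformed input becomes a plausible-looking ID instead of an error.
    $converted[] = (int) $raw;
}

// '12abc' becomes 12, while 'abc' and '' both become 0.
var_dump($converted);
```

With a cast, '12abc' and '12' are indistinguishable afterwards, which is the opposite of failing fast.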

The other option is:

if (!ctype_digit((string) $company_id)) exit('Invalid parameter');

Is this scenario a valid use of regular expressions? Is one way preferred over the other? If so, why? Are there any gotchas I haven't considered?

asked Dec 08 '12 16:12 by toxalot



1 Answer

The Goal

The original question is about validating a value of unknown data type and rejecting every value that contains anything other than digits. There seem to be only two ways to achieve this.

If the goal is to fail fast, one would want to check for invalid values and then fail rather than checking for valid values and having to wrap all code in an if block.

Option 1 from Question

if (preg_match('/\D/', $company_id)) exit('Invalid parameter');

This uses a regex and fails if the value contains any non-digit character. Con: the regex engine adds some overhead.

Option 2 from Question

if (!ctype_digit((string) $company_id)) exit('Invalid parameter');

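This uses ctype_digit and fails if it returns FALSE. Con: the value must be cast to a string, which is a (small) extra step.

You must cast the value to a string because ctype_digit expects a string and PHP will not cast the parameter for you. If you pass an integer to ctype_digit, you will get unexpected results: an integer from -128 to 255 is interpreted as the ASCII value of a single character, and as of PHP 8.1 passing a non-string argument is deprecated.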

This is documented behaviour. For example:

ctype_digit('42'); // true
ctype_digit(42); // false (ASCII 42 is the * character)
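One way to keep the cast from hiding other surprises is to check the type before calling ctype_digit. The following is only a sketch: is_digits_only is a hypothetical helper name, not something from the question, and the return type declaration assumes PHP 7 or later.

```php
<?php
// Hypothetical helper (not part of the original question or answer).
// Accepts non-negative ints and digit-only strings; rejects everything else.
function is_digits_only($value): bool
{
    if (is_int($value)) {
        // A negative int would contain a '-' sign once written out.
        return $value >= 0;
    }

    // Arrays, floats, booleans and null are rejected without any cast.
    return is_string($value) && ctype_digit($value);
}

var_dump(is_digits_only('42'));  // bool(true)
var_dump(is_digits_only(42));    // bool(true)
var_dump(is_digits_only('4.2')); // bool(false)
var_dump(is_digits_only(null));  // bool(false)
```

Unlike a bare (string) cast, this never passes an integer to ctype_digit and never attempts to convert an array to a string.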

Difference Between Option 1 and 2

Due to the overhead of the regex engine, option two is probably the better choice. However, worrying about the difference between these two options may fall into the premature optimization category.

Note: There is also a functional difference between the two options above. The first option considers NULL and empty strings valid; the second does not (as of PHP 5.1.0). That may make one method more desirable than the other. To make the regex option behave like the ctype_digit version, use this instead:

if (!preg_match('/^\d+$/', $company_id)) exit('Invalid parameter');

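Note: The 'start of string' ^ and 'end of string' $ anchors in the above regex are essential. Without them, abc123def would be considered valid. One difference remains: by default $ also matches just before a trailing newline, so "123\n" passes this regex but fails ctype_digit. Use \z in place of $ (or add the D modifier) if that matters.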
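To see where the three checks agree and disagree, here is a small comparison. The sample inputs are illustrative assumptions, not values from the question.

```php
<?php
// Sample inputs are illustrative assumptions, not taken from the question.
$inputs = ['123', '', 'abc123def', "123\n"];

$results = [];
foreach ($inputs as $value) {
    $results[] = [
        'option1'  => !preg_match('/\D/', $value),          // valid unless a non-digit is found
        'anchored' => (bool) preg_match('/^\d+$/', $value), // one or more digits and nothing else
        'ctype'    => ctype_digit($value),                  // option 2 (already a string here)
    ];
}

foreach ($inputs as $i => $value) {
    // json_encode is used only to print the newline visibly as \n.
    printf(
        "%-12s option1=%-5s anchored=%-5s ctype=%s\n",
        json_encode($value),
        var_export($results[$i]['option1'], true),
        var_export($results[$i]['anchored'], true),
        var_export($results[$i]['ctype'], true)
    );
}
```

The empty string is accepted only by option 1, and the value with a trailing newline is accepted only by the anchored regex.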

Other Options

There are other methods that have been suggested here and in other questions that will not achieve the stated goals, but I think it is important to mention them and explain why they won't work as it might help someone else.

  • is_numeric allows exponential parts and floats, and before PHP 7.0 it also accepted hex strings.

  • is_int checks the data type rather than the value, which is not useful for validation if '1' is to be considered valid. Form input is always a string, and if you aren't sure where the value is coming from, you can't be sure of its data type.

  • filter_var with FILTER_VALIDATE_INT allows negative integers and values such as the float 1.0. It seems like the best function for validating an integer regardless of data type, but it doesn't work if you want only digits. Note: It's important to compare the result with FALSE using === rather than relying on truthiness if 0 is to be considered a valid value.
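The behaviour described in the list above can be checked with a few calls. The input values are made up for illustration.

```php
<?php
// Values below are illustrative; none come from the original question.
$exponent    = is_numeric('1e5'); // true: exponent notation is accepted
$float       = is_numeric('1.5'); // true: floats are accepted
$stringIsInt = is_int('1');       // false: a numeric string is not an int

$negative = filter_var('-5', FILTER_VALIDATE_INT);    // int(-5): sign is allowed
$zero     = filter_var('0', FILTER_VALIDATE_INT);     // int(0): valid, but falsy
$bad      = filter_var('12abc', FILTER_VALIDATE_INT); // false: not an integer

// Compare with === false; a plain !$zero would wrongly reject the valid value 0.
$zeroIsValid = ($zero !== false);

var_dump($exponent, $float, $stringIsInt, $negative, $zero, $bad, $zeroIsValid);
```

None of these reject '-5' and accept '1' at the same time, which is why they do not meet the digits-only requirement on their own.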

answered Oct 06 '22 01:10 by toxalot