
PHP Typecasting - Good or bad?

Tags: types, php

After some work in C and Java, I've become more and more annoyed by the wild-west laws in PHP. What I really feel PHP lacks is strict data types. The fact that (string)'0', (int)0 and (boolean)false all compare equal with == is one example.
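A minimal sketch of the comparisons described above; the variable names are illustrative only. The loose `==` operator treats all three values as equal, while the strict `===` operator also compares types.

```php
<?php

// Illustrative values only: a string, an int, and a boolean.
$s = '0';
$i = 0;
$b = false;

// Loose comparison (==) juggles types before comparing.
var_dump($s == $i);   // bool(true)
var_dump($i == $b);   // bool(true)
var_dump($s == $b);   // bool(true)

// Strict comparison (===) requires the same type and the same value.
var_dump($s === $i);  // bool(false)
var_dump($i === $b);  // bool(false)
var_dump($s === $b);  // bool(false)
```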

You cannot rely on the data type a function returns. Nor can you force the arguments of a function to be of a specific type, which might lead to a non-strict comparison producing something unexpected. All of this can be worked around, but it still opens the door to unexpected bugs.

Is it good or bad practice to typecast arguments received for a method? And is it good to typecast the return?

For example:

public function doo($foo, $bar) {
   $foo = (int)$foo;
   $bar = (float)$bar;
   $result = $bar + $foo;
   return (array)$result;
}

The example is quite stupid and I haven't tested it, but I think everyone gets the idea. Is there any reason for the PHP-god to convert data types as he wants, besides letting people who don't know about data types use PHP?

Anders, asked Sep 12 '10 15:09


1 Answer

For better or worse, loose-typing is "The PHP Way". Many of the built-ins, and most of the language constructs, will operate on whatever types you give them -- silently (and often dangerously) casting them behind the scenes to make things (sort of) fit together.

Coming from a Java/C/C++ background myself, PHP's loose-typing model has always been a source of frustration for me. But through the years I've found that, if I have to write PHP, I can do a better job of it (i.e. cleaner, safer, more testable code) by embracing PHP's "looseness" rather than fighting it, and I end up a happier monkey because of it.

Casting really is fundamental to my technique -- and (IMHO) it's the only way to consistently build clean, readable PHP code that handles mixed-type arguments in a well-understood, testable, deterministic way.

The main point (which you clearly understand as well) is that, in PHP, you cannot simply assume that an argument is the type you expect it to be. Doing so can have serious consequences that you are not likely to catch until after your app has gone to production.

To illustrate this point:

<?php

function displayRoomCount( $numBoys, $numGirls ) {
  // we'll assume both args are int

  // check boundary conditions
  if( ($numBoys < 0) || ($numGirls < 0) ) throw new Exception('argument out of range');

  // perform the specified logic
  $total = $numBoys + $numGirls;
  print( "{$total} people: {$numBoys} boys, and {$numGirls} girls \n" );
}

displayRoomCount(0, 0);   // (ok) prints: "0 people: 0 boys, and 0 girls" 

displayRoomCount(-10, 20);  // (ok) throws an exception

displayRoomCount("asdf", 10);  // (wrong!) prints: "10 people: asdf boys, and 10 girls"

One approach to solving this is to restrict the types that the function can accept, throwing an exception when an invalid type is detected. Others have mentioned this approach already. It appeals well to my Java/C/C++ aesthetics, and I followed this approach in PHP for years and years. In short, there's nothing wrong with it, but it does go against "The PHP Way", and after a while, that starts to feel like swimming up-stream.
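For comparison, the type-restricting approach might look like the sketch below. The function name `displayRoomCountStrict` and the exception message are made up for illustration; `is_int()` and `InvalidArgumentException` are standard PHP.

```php
<?php

// Hypothetical strict variant: rejects anything that is not an int.
function displayRoomCountStrict( $numBoys, $numGirls ) {
  // check the types before doing anything else
  if( !is_int($numBoys) || !is_int($numGirls) ) {
    throw new InvalidArgumentException('both arguments must be integers');
  }

  // check boundary conditions
  if( ($numBoys < 0) || ($numGirls < 0) ) throw new Exception('argument out of range');

  // perform the specified logic
  $total = $numBoys + $numGirls;
  print( "{$total} people: {$numBoys} boys, and {$numGirls} girls \n" );
}

displayRoomCountStrict(3, 4);  // prints: "7 people: 3 boys, and 4 girls"

try {
  displayRoomCountStrict("asdf", 10);
} catch (InvalidArgumentException $e) {
  print( "rejected: " . $e->getMessage() . "\n" );  // prints: "rejected: both arguments must be integers"
}
```

Note that this variant also rejects numeric strings such as "3", so every caller has to convert its values before the call.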

As an alternative, casting provides a simple and clean way to ensure that the function behaves deterministically for all possible inputs, without having to write specific logic to handle each different type.

Using casting, our example now becomes:

<?php

function displayRoomCount( $numBoys, $numGirls ) {
  // we cast to ensure that we have the types we expect
  $numBoys = (int)$numBoys;
  $numGirls = (int)$numGirls;

  // check boundary conditions
  if( ($numBoys < 0) || ($numGirls < 0) ) throw new Exception('argument out of range');

  // perform the specified logic
  $total = $numBoys + $numGirls;
  print( "{$total} people: {$numBoys} boys, and {$numGirls} girls \n" );
}

displayRoomCount("asdf", 10);  // (ok now!) prints: "10 people: 0 boys, and 10 girls"

The function now behaves as expected. In fact, it's easy to show that the function's behavior is now well-defined for all possible inputs. This is because the cast operation is well-defined for all possible inputs; the casts ensure that we're always working with integers; and the rest of the function is written so as to be well-defined for all possible integers.

Rules for type-casting in PHP are documented in the PHP manual (see the type-specific sections, e.g. "Converting to integer").
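As a quick illustration, a few of the integer-conversion rules can be checked directly. This is only a small sample of inputs chosen for this sketch, not a complete list of the rules.

```php
<?php

// A few integer conversions, checked with var_dump.
var_dump( (int)"asdf" );   // int(0)   non-numeric string
var_dump( (int)"12abc" );  // int(12)  leading digits are used
var_dump( (int)"3.99" );   // int(3)   fraction is dropped
var_dump( (int)7.9 );      // int(7)   floats are truncated toward zero
var_dump( (int)-7.9 );     // int(-7)
var_dump( (int)null );     // int(0)
var_dump( (int)true );     // int(1)
var_dump( (int)false );    // int(0)
```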

This approach has the added benefit that the function will now behave in a way that is consistent with other PHP built-ins, and language constructs. For example:

// assume $db_row read from a database of some sort
displayRoomCount( $db_row['boys'], $db_row['girls'] ); 

will work just fine, despite the fact that $db_row['boys'] and $db_row['girls'] are actually strings that contain numeric values. This is consistent with the way that the average PHP developer (who does not know C, C++, or Java) will expect it to work.


As for casting return values: there is very little point in doing so, unless you know that you have a potentially mixed-type variable, and you want to always ensure that the return value is a specific type. This is more often the case at intermediate points in the code, rather than at the point where you're returning from a function.

A practical example:

<?php

function getParam( $name, $idx=0 ) {
  $name = (string)$name;
  $idx = (int)$idx;

  if($name==='') return null;
  if($idx<0) $idx=0;

  // $_REQUEST[$name] could be missing, or string, or array
  // this depends on the web request that came in.  Our use of
  // the array cast here, lets us write generic logic to deal with them all
  // (the isset() check avoids an undefined-index notice for a missing key)
  $param = isset($_REQUEST[$name]) ? (array)$_REQUEST[$name] : array();

  if( count($param) <= $idx) return null;
  return $param[$idx];
}

// here, the cast is used to ensure that we always get a string
// even if "fullName" was missing from the request, the cast will convert
// the returned NULL value into an empty string.
$full_name = (string)getParam("fullName");

You get the idea.
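To try the same idea outside a web request, here is a sketch of a variant that reads from an array passed in by the caller. The name `getParamFrom`, the `$source` parameter, and the sample data are hypothetical and exist only for this illustration.

```php
<?php

// Hypothetical variant of getParam() that reads from a supplied array
// instead of $_REQUEST, so it can be exercised without a web request.
function getParamFrom( $source, $name, $idx=0 ) {
  $source = (array)$source;
  $name = (string)$name;
  $idx = (int)$idx;

  if($name==='') return null;
  if($idx<0) $idx=0;

  // a missing key is treated as "no values"; a string becomes a
  // one-element array; an array is left as it is
  $param = isset($source[$name]) ? (array)$source[$name] : array();

  if( count($param) <= $idx) return null;
  return $param[$idx];
}

// made-up sample data standing in for a request
$request = array(
  'fullName' => 'Ada Lovelace',
  'tags'     => array('php', 'types'),
);

var_dump( getParamFrom($request, 'fullName') );         // string(12) "Ada Lovelace"
var_dump( getParamFrom($request, 'tags', 1) );          // string(5) "types"
var_dump( getParamFrom($request, 'missing') );          // NULL
var_dump( (string)getParamFrom($request, 'missing') );  // string(0) ""
```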


There are a couple of gotchas to be aware of:

  • PHP's casting mechanism is not smart enough to optimize the "no-op" cast. So casting always causes a copy of the variable to be made. In most cases, this is not a problem, but if you regularly use this approach, you should keep it in the back of your mind. Because of this, casting can cause unexpected issues with references and large arrays. See PHP Bug Report #50894 for more details.

  • In PHP, a whole number that is too large (or too small) to represent as an integer type will automatically be represented as a float (PHP has a single, double-precision floating-point type). This means that the result of ($big_int + $big_int) can actually be a float, and if you cast it to an int, the resulting number will be gibberish. So, if you're building functions that need to operate on large whole numbers, you should keep this in mind, and probably consider some other approach.
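The second gotcha can be seen with `PHP_INT_MAX`. The helper `addIntsChecked` below is a hypothetical name and just one possible way to detect the overflow rather than cast the float back to an int.

```php
<?php

// Adding past PHP_INT_MAX silently produces a float, not an int.
$big_int = PHP_INT_MAX;
$sum = $big_int + $big_int;
var_dump( is_int($big_int) );  // bool(true)
var_dump( is_float($sum) );    // bool(true)

// Hypothetical helper: add two ints and refuse to continue on overflow.
function addIntsChecked( $a, $b ) {
  $result = (int)$a + (int)$b;
  // if the sum did not fit in an int, PHP has already turned it into a float
  if( !is_int($result) ) {
    throw new OverflowException('integer overflow: result does not fit in an int');
  }
  return $result;
}

var_dump( addIntsChecked(2, 3) );  // int(5)

try {
  addIntsChecked(PHP_INT_MAX, 1);
} catch (OverflowException $e) {
  print( "caught: " . $e->getMessage() . "\n" );
}
```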


Sorry for the long post, but it's a topic that I've considered in depth, and through the years, I've accumulated quite a bit of knowledge (and opinion) about it. By putting it out here, I hope someone will find it helpful.

Lee, answered Nov 11 '22 21:11