Firehed's Blog

Strong Typing in PHP - A Simple Guide

I love PHP, and use it for almost all of my web projects. It does what I want pretty well for almost everything, and while it certainly has its quirks (like anything else), they're usually insignificant enough not to cause severe problems. More importantly, it originated as a scripting language specifically for web use, as opposed to the numerous other languages that have evolved into desktop/web hybrids.

However, as I do more complex things with PHP, one of the quirks that bothers me is its lack of native strong typing support. This means a ton of very repetitive is_string() (etc.) checks acting as a means of input validation. Ick.

Thankfully, PHP5 added support for type hinting in function and method parameters. Sort of. While you can specify array and class types, there's no native support for the primitive types (int, bool, string, etc.). What the hell good is that? I can force an input of any type except the most important ones? Come on, guys. But with a little creativity, you can use the class type hint support to the same effect.

The following approach keeps things quite simple, and takes PHP about as close to a truly object-oriented language as it will ever get.

```php
<?php
abstract class TypeHint {
    protected $value;

    // Each concrete subclass decides what counts as a valid value.
    abstract public static function validate($value);

    public function __construct() {
        // Optional single argument: set the value immediately if one was given.
        if (func_num_args() == 1) {
            // Before PHP 5.3, func_get_arg() can't be passed directly as a function argument.
            $value = func_get_arg(0);
            $this->set($value);
        }
    } // function __construct

    public function set($value) {
        if ($this->validate($value)) {
            $this->value = $value;
            return $this; // Return the object so calls can be chained
        }
        else {
            throw new UnexpectedValueException('Bad input.');
        }
    } // function set

    public function get() {
        return $this->value;
    } // function get

    // This is optional, and depending on your coding style may be inadvisable to keep.
    final public function __toString() {
        return (string) $this->value; // Cast the value to a string
    } // function __toString

} // class TypeHint
```

Like I said, nothing overly complicated. I've given the base TypeHint constructor an optional argument (PHP has no true method overloading, so it checks func_num_args()), which means you can either set up an empty instance like $someVar = new THString; and fill it in later, or set the value all at once like $someVar = new THString('sample string');. (TypeHint itself is abstract, so you always instantiate one of its subclasses, such as THString below.) The get() and set($value) methods are fairly self-explanatory, with the latter throwing an UnexpectedValueException if the provided value fails validation. Finally, I've added a very basic __toString() magic method so you have the option of retrieving the value (re-cast to a string) without explicitly calling get(). If you're just using the type-hinted classes to validate inputs and are OK with PHP
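A quick sketch of those behaviors, assuming set() returns $this so calls can chain. The base class is condensed and repeated here together with THString so the snippet runs on its own; the variable names are just for illustration.

```php
<?php
// Condensed copy of the TypeHint base class plus THString, so this runs standalone.
abstract class TypeHint {
    protected $value;
    abstract public static function validate($value);

    public function __construct() {
        if (func_num_args() == 1) {
            $value = func_get_arg(0);
            $this->set($value);
        }
    }

    public function set($value) {
        if (!$this->validate($value)) {
            throw new UnexpectedValueException('Bad input.');
        }
        $this->value = $value;
        return $this; // returning the object allows chained calls
    }

    public function get() {
        return $this->value;
    }

    final public function __toString() {
        return (string) $this->value;
    }
}

class THString extends TypeHint {
    public static function validate($value) {
        return is_string($value);
    }
}

// 1. Empty instance, filled in later (set() chains because it returns $this).
$greeting = new THString();
$greeting->set('hello')->set('hello again');

// 2. Value supplied to the constructor.
$name = new THString('sample string');

// 3. A non-string fails validation, so set() throws.
try {
    new THString(42);
    $rejected = false;
} catch (UnexpectedValueException $e) {
    $rejected = true;
}

// 4. __toString() kicks in wherever a string is expected.
$message = 'Name: ' . $name;
```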

Obviously, this isn't the whole deal: it's an abstract class, so it must be extended before it's of any use whatsoever. The basic types (bool, string, float, int) get validation methods as simple as the checks you'd natively use in your code, but without the need to constantly repeat yourself:

```php
<?php
class THInt extends TypeHint {
    public static function validate($value) {
        return is_int($value);
    } // function validate
} // class THInt

class THString extends TypeHint {
    public static function validate($value) {
        return is_string($value);
    } // function validate
} // class THString

// Etc.
```
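The "Etc." would cover the other two basic types. Here is a sketch of THFloat and THBool following the same pattern; those two class names are my own extrapolation from the TH prefix, and the base class is repeated in condensed form so the snippet runs by itself.

```php
<?php
// Condensed TypeHint base (see the full version above).
abstract class TypeHint {
    protected $value;
    abstract public static function validate($value);

    public function __construct() {
        if (func_num_args() == 1) {
            $value = func_get_arg(0);
            $this->set($value);
        }
    }

    public function set($value) {
        if (!$this->validate($value)) {
            throw new UnexpectedValueException('Bad input.');
        }
        $this->value = $value;
        return $this;
    }

    public function get() {
        return $this->value;
    }
}

// Hypothetical completions of the "// Etc" above.
class THFloat extends TypeHint {
    public static function validate($value) {
        return is_float($value); // an int such as 3 is NOT accepted here
    }
}

class THBool extends TypeHint {
    public static function validate($value) {
        return is_bool($value); // only true/false; 0, 1, '' etc. are rejected
    }
}

$price   = new THFloat(19.99);
$enabled = new THBool(false);
```

One design choice to be aware of: is_float() rejects integers, so new THFloat(3) throws. If whole numbers should count, the check could be is_int($value) || is_float($value) instead.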

You'll note that rather than using Int or String, I've prefixed the class names with "TH" (for TypeHint). This was inspired by Objective-C's "NS" (NeXTSTEP) prefix on its Foundation classes (NSString, etc.); it just seemed like a much safer idea than using "string" or "int" as a class name. I'll note that, as of writing (Feb 2009), there's no technical reason you can't do it; it just seems to violate some sort of unspoken best-practice rule.

Another thing worth pointing out is that this method isn't confined to the basic types, nor do you need to create an instance of these classes to use their validation methods. While TypeHint->set() doesn't appear to call validate() statically, that's just a quirk of PHP's syntax: self:: always refers to TypeHint, where validate() is declared abstract, so calling self::validate($value) results in a fatal error, whereas $this->validate($value) dispatches to the subclass's static implementation. Basically, don't worry about it. There are two practical upshots of this:

  1. You can call THString::validate($someValue) anywhere without having to create an instance of THString.
  2. Static methods can run significantly faster than non-static methods (though they're still slower than standard inline code; that's a tradeoff you'll have to be willing to make). Numbers vary widely (I've seen static calls run as much as 4x faster than non-static ones!), but free performance just from adding the word "static" is always a plus in my book.
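Because validate() is static, the classes double as plain validation helpers. A small sketch with THInt and THString copied from above and a trimmed base class (only the abstract validate() matters for static calls) so it runs standalone; the array_filter() call is just one illustration of passing the static method as a callback.

```php
<?php
// Trimmed TypeHint base: constructor, set(), get() and __toString() omitted,
// since none of them are needed for static checks.
abstract class TypeHint {
    protected $value;
    abstract public static function validate($value);
}

class THInt extends TypeHint {
    public static function validate($value) {
        return is_int($value);
    }
}

class THString extends TypeHint {
    public static function validate($value) {
        return is_string($value);
    }
}

// No instances needed: call the checks directly on the class.
$isString      = THString::validate('abc'); // true
$isInt         = THInt::validate(47);       // true
$numericString = THInt::validate('47');     // false: '47' is a string, not an int

// A static method can also be passed as a callback, e.g. to keep only ints.
$onlyInts = array_values(array_filter(array(1, '2', 3.0, 4), array('THInt', 'validate')));
// $onlyInts is array(1, 4)
```

Note the '47' case: that's exactly what form input looks like, which matters later when reading from $_GET and $_POST.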

Remember how I just said that this isn't confined to the basic types? I bet you're pretty damn sick of insane regexes for dealing with email addresses, and validating currency inputs can be almost as tedious. Ready for a hundred lines of fun?

```php
<?php
class THEmail extends THString {
    // code.google.com/p/php-email-address-validation
    // Comments stripped to compact things down a bit
    // Also converted to static methods to increase speed a bit

    public static function validate($value) {
        // Must pass THString's is_string() check before the address checks run
        return parent::validate($value) && self::check_email_address($value);
    } // function validate

    protected static function check_email_address($strEmailAddress) {
        if (preg_match('/[\x00-\x1F\x7F-\xFF]/', $strEmailAddress)) {
            return false;
        }

        $intAtSymbol = strrpos($strEmailAddress, '@');

        if ($intAtSymbol === false) {
            return false;
        }

        $arrEmailAddress[0] = substr($strEmailAddress, 0, $intAtSymbol);
        $arrEmailAddress[1] = substr($strEmailAddress, $intAtSymbol + 1);

        $arrTempAddress[0] = preg_replace('/"[^"]+"/','',$arrEmailAddress[0]);
        $arrTempAddress[1] = $arrEmailAddress[1];
        $strTempAddress = $arrTempAddress[0] . $arrTempAddress[1];

        if (strrpos($strTempAddress, '@') !== false) {
            return false;
        }

        if (!self::check_local_portion($arrEmailAddress[0])) {
            return false;
        }

        if (!self::check_domain_portion($arrEmailAddress[1])) {
            return false;
        }

        return true;
    }

    protected static function check_local_portion($strLocalPortion) {
        if (!self::check_text_length($strLocalPortion, 1, 64)) {
            return false;
        }

        $arrLocalPortion = explode('.', $strLocalPortion);

        for ($i = 0, $max = sizeof($arrLocalPortion); $i < $max; $i++) {
            if (!preg_match('.^(([A-Za-z0-9!#$%&\'*+/=?^_`{|}~-][A-Za-z0-9!#$%&\'*+/=?^_`{|}~-]{0,63})|("[^\\\"]{0,62}"))$.', $arrLocalPortion[$i])) {
                return false;
            }
        }
        return true;
    }

    protected static function check_domain_portion($strDomainPortion) {
        if (!self::check_text_length($strDomainPortion, 1, 255)) {
            return false;
        }

        if (preg_match('/^(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])){3}$/', $strDomainPortion) || preg_match('/^\[(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])){3}\]$/', $strDomainPortion)) {
            return true;
        }
        else {
            $arrDomainPortion = explode('.', $strDomainPortion);

            if (sizeof($arrDomainPortion) < 2) {
                return false;
            }
            for ($i = 0, $max = sizeof($arrDomainPortion); $i < $max; $i++) {
                if (!self::check_text_length($arrDomainPortion[$i], 1, 63)) {
                    return false;
                }
                if (!preg_match('/^(([A-Za-z0-9][A-Za-z0-9-]{0,61}[A-Za-z0-9])|([A-Za-z0-9]+))$/', $arrDomainPortion[$i])) {
                    return false;
                }
            }
        }

        return true;
    }

    protected static function check_text_length($strText, $intMinimum, $intMaximum) {
        $intTextLength = strlen($strText);
        if (($intTextLength < $intMinimum) || ($intTextLength > $intMaximum)) {
            return false;
        }
        else {
            return true;
        }
    }
} // class THEmail
```

The only thing worth pointing out here (other than reiterating that most of the above chunk of code is from a Google Code project; I've just made some modifications, so I can't take credit for the heavy lifting) is that rather than having THEmail extend the base TypeHint class, I've chosen to extend THString instead. Why? For one, email addresses are always going to be strings. Further, you may want a function or method somewhere that accepts a THString for one (or more) of its parameters, and you wouldn't want email addresses to fail that check. Under the hood, a class type hint in PHP works like an automatic instanceof check. In pseudocode, these two functions are equivalent:

```php
<?php
function demo(THString $string) {
    // do stuff
}

// ...behaves like this (renamed so both can be declared side by side):
function demo_unhinted($string) {
    if ($string instanceof THString) {
        // do stuff
    }
    else {
        // Trigger a catchable fatal error:
        // PHP Catchable fatal error: Argument 1 passed to demo() must
        // be an instance of THString, instance of *something* given,
        // called *somewhere* and defined *somewhere else*
    }
}
```

Since THEmail extends THString (which in turn extends TypeHint), every instance of THEmail is also an instance of THString and of TypeHint. As such, it will pass the instanceof check for a function or method with a THString type-hinted parameter. Conversely, an instance of THString is not an instance of THEmail (inheritance only works in one direction, as you would expect), so a function that requires an email address won't accept any old string.
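To see the one-way inheritance in action, here's a sketch with a deliberately simplified stand-in for THEmail: instead of the Google Code validator above, it uses PHP's built-in filter_var($value, FILTER_VALIDATE_EMAIL) to keep the example short. The takesString() and takesEmail() functions are hypothetical, and the sketch assumes PHP 7 or later, where a failed class type hint throws a catchable TypeError (PHP 5 raises the catchable fatal error described above instead).

```php
<?php
// Condensed TypeHint base plus THString, as above.
abstract class TypeHint {
    protected $value;
    abstract public static function validate($value);

    public function __construct() {
        if (func_num_args() == 1) {
            $value = func_get_arg(0);
            $this->set($value);
        }
    }

    public function set($value) {
        if (!$this->validate($value)) {
            throw new UnexpectedValueException('Bad input.');
        }
        $this->value = $value;
        return $this;
    }

    public function get() {
        return $this->value;
    }
}

class THString extends TypeHint {
    public static function validate($value) {
        return is_string($value);
    }
}

// Simplified stand-in for THEmail (NOT the Google Code validator):
// a string that also passes PHP's built-in email filter.
class THEmail extends THString {
    public static function validate($value) {
        return parent::validate($value)
            && filter_var($value, FILTER_VALIDATE_EMAIL) !== false;
    }
}

// Hypothetical functions that type-hint the two classes.
function takesString(THString $s) {
    return 'got ' . $s->get();
}

function takesEmail(THEmail $e) {
    return 'got ' . $e->get();
}

$email  = new THEmail('someone@example.com');
$string = new THString('just some text');

$emailIsString = $email instanceof THString; // true: THEmail extends THString
$stringIsEmail = $string instanceof THEmail; // false: inheritance only runs one way

$viaString = takesString($email); // a THEmail satisfies a THString hint

try {
    takesEmail($string); // a plain THString does not satisfy a THEmail hint
    $rejected = false;
} catch (TypeError $err) {
    $rejected = true;
}
```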

Now for a practical lesson!

All of the actual code that supports this is out of the way, so here's a quick usage example:

```php
<?php
// Include the above TypeHint classes
try {
    $userId = new THInt(47);
    $name = new THString('Eric Stern');
    $email = new THEmail('firehed{at}gmail dot com'); // This will fail the validation!

    // NOTE: the values are type-checked but NOT escaped; see the note below
    $query = 'UPDATE users SET name="' . $name->get() . '", email="' . $email->get() . '" WHERE id=' . $userId->get();
    // Run the query
}
catch (UnexpectedValueException $e) {
    echo 'An invalid value was provided. Please fill out the form again.';
    // Since an exception was thrown on the $email= line, the query will not be run
}
```
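The concatenated query above is exactly where escaping matters. As a sketch of the prepared-statement route, here's the same kind of update with bound parameters. To keep it runnable without a MySQL server, it swaps in PDO with an in-memory SQLite database (this assumes the pdo_sqlite extension is available); the same idea applies to mysqli prepared statements. The users table and the name/email values are made up for the example.

```php
<?php
// Condensed TypeHint base plus THInt and THString, as above.
abstract class TypeHint {
    protected $value;
    abstract public static function validate($value);

    public function __construct() {
        if (func_num_args() == 1) {
            $value = func_get_arg(0);
            $this->set($value);
        }
    }

    public function set($value) {
        if (!$this->validate($value)) {
            throw new UnexpectedValueException('Bad input.');
        }
        $this->value = $value;
        return $this;
    }

    public function get() {
        return $this->value;
    }
}

class THInt extends TypeHint {
    public static function validate($value) { return is_int($value); }
}

class THString extends TypeHint {
    public static function validate($value) { return is_string($value); }
}

// In-memory SQLite database standing in for the real MySQL one (assumption).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)');
$pdo->exec("INSERT INTO users (id, name, email) VALUES (47, 'old name', 'old@example.com')");

try {
    $userId = new THInt(47);
    $name   = new THString("Robert \"Bob\" O'Neil"); // the double quotes would break the concatenated query
    $email  = new THString('someone@example.com');   // plain THString, for brevity

    // Placeholders keep the values out of the SQL text entirely.
    $stmt = $pdo->prepare('UPDATE users SET name = ?, email = ? WHERE id = ?');
    $stmt->bindValue(1, $name->get(), PDO::PARAM_STR);
    $stmt->bindValue(2, $email->get(), PDO::PARAM_STR);
    $stmt->bindValue(3, $userId->get(), PDO::PARAM_INT);
    $stmt->execute();
}
catch (UnexpectedValueException $e) {
    echo 'An invalid value was provided. Please fill out the form again.';
}

$row = $pdo->query('SELECT name, email FROM users WHERE id = 47')->fetch(PDO::FETCH_ASSOC);
```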

Of course that's very basic usage, but it gives you the general idea. Very important: this does not sanitize user data! It merely checks that it's of the correct type. You'll still have to either escape the data with mysql_real_escape_string() or use prepared statements (yes, you can use the value from $typeHintedVar->get() with mysqli_stmt::bind_param(); just assign it to a variable first, since bind_param() binds its arguments by reference). Also worth noting is that all values from $_GET and $_POST arrive as strings (or arrays of strings); you'll have to manually cast them for other types:

```php
<?php
// Note: (int) turns any non-numeric input into 0, which then passes THInt's check
$userId = new THInt((int)$_POST['userId']);
```
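One caveat with the (int) cast: it turns anything non-numeric into 0, and 0 passes THInt's is_int() check, so garbage input slips through. A sketch of an alternative that swaps in PHP's filter_var() with FILTER_VALIDATE_INT (available since PHP 5.2), which returns false for non-integer strings so THInt rejects them. The $post array is a hypothetical stand-in for $_POST.

```php
<?php
// Condensed TypeHint base plus THInt, as above.
abstract class TypeHint {
    protected $value;
    abstract public static function validate($value);

    public function __construct() {
        if (func_num_args() == 1) {
            $value = func_get_arg(0);
            $this->set($value);
        }
    }

    public function set($value) {
        if (!$this->validate($value)) {
            throw new UnexpectedValueException('Bad input.');
        }
        $this->value = $value;
        return $this;
    }

    public function get() {
        return $this->value;
    }
}

class THInt extends TypeHint {
    public static function validate($value) {
        return is_int($value);
    }
}

// Hypothetical request data standing in for $_POST (form values arrive as strings).
$post = array('userId' => '47', 'otherId' => 'abc');

// filter_var() returns the int 47 here...
$userId = new THInt(filter_var($post['userId'], FILTER_VALIDATE_INT));

// ...but false for 'abc', and false fails THInt's is_int() check.
try {
    new THInt(filter_var($post['otherId'], FILTER_VALIDATE_INT));
    $rejected = false;
} catch (UnexpectedValueException $e) {
    $rejected = true;
}

// For comparison, a plain cast silently accepts the same garbage as 0.
$cast = new THInt((int) $post['otherId']);
```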

And that should do it! Questions are best directed to the comments section or @Firehed on Twitter.