Sunday, July 18, 2010

Objects vs. Arrays in PHP

Traditionally, an array is a data structure storing values indexed by an integer. What PHP calls an array is really an ordered map, a data structure that maps a key to a value with some notion of how keys should be ordered when iterating through the map. Keys can be either integers or strings, and values can be any PHP type.

The obvious uses for arrays are when you need to store data indexed either by a number or by a key. The ordered map semantics of PHP arrays also allow them to be used to build arbitrary data structures fairly easily. The array keys act as field names and the array values as field values. New fields can be added at run time, simply by assigning a value to a new key representing the new field name.
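As a minimal sketch of that idea, the snippet below treats an array as a record and adds a field at run time. The `$article` variable and its keys are made up for illustration.

[sourcecode language="php"]
// A hypothetical "article" record built as a plain array.
$article = array(
    'title' => 'Objects vs. Arrays in PHP',
    'tags'  => array('php', 'arrays'),
);

// Adding a new field at run time is just an assignment to a new key.
$article['author'] = 'anonymous';

// Keys come back in insertion order: title, tags, author.
$fields = array_keys($article);
[/sourcecode]

Nothing declares which keys are legal, which is exactly why this style leans on convention and comments.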

This is a wimpy form of object oriented programming (OOP) -- by convention the functions that operate on these array pseudo-objects can be kept in one file, perhaps along with comments that describe which keys could be present. This wimpy OOP saves us the trouble of creating a new class, but relies on convention and extra documentation for clarity.

I have seen quite a bit of code that does this, so it appears to be a common idiom, probably left over from PHP4. PHP5 provides better object oriented support, so when should programmers use an array, and when should they declare a new object?

When To Use Arrays

The built-in array functions in PHP make it easy to use PHP arrays to implement the same functionality as a number of common data structures like queues, stacks, lists, etc. Most of these data structures are similar enough to arrays that it is best to take advantage of all of the built-in array functions, rather than trying to re-implement all that functionality with objects.
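For instance, a stack and a queue each need only a couple of built-in calls. This is a small sketch using `array_push`, `array_pop`, and `array_shift`; the variable names are arbitrary.

[sourcecode language="php"]
// Stack (last in, first out): push and pop at the end of the array.
$stack = array();
array_push($stack, 'a');
array_push($stack, 'b');
$top = array_pop($stack);     // 'b'; $stack is now array('a')

// Queue (first in, first out): append at the end, remove from the front.
$queue = array();
$queue[] = 'first';
$queue[] = 'second';
$next = array_shift($queue);  // 'first'; $queue is now array('second')
[/sourcecode]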

Sometimes, it is pretty clear when an array is called for. If you want a variable number of items, indexed numerically, that calls for an array. For example, if you were writing an RSS feed reader that gets all the articles from a group of sites within the last 3 days, it would make sense to store both the list of sites and the articles returned in arrays. Or, to store definitions for words, it would make sense to use an array with the words as keys and the definitions as values. For example,

[sourcecode language="php"]
$words = array(
'doe' => 'a deer, a female deer',
'ray' => 'a drop of golden sun'
);
[/sourcecode]

Retrieving the definition of 'doe' is as simple as $words['doe'].

Things get a little less clear cut when dealing with more complicated data. In the word definitions example, you might also want to store pronunciation and etymological information. The straightforward incremental change is to make each array value an array holding all of the properties you care about. For example,
[sourcecode language="php"]
$words = array(
    'doe' => array('definition' => 'a deer, a female deer',
                   'pronunciation' => 'do̅'),
    'ray' => array('definition' => 'a drop of golden sun',
                   'pronunciation' => 'rey')
);
[/sourcecode]

This is a common way to handle the issue. As more attributes of a word are needed, they can be added to the array describing each word.

In a language like C or C++ this would typically be done with structs. Passing around data like this in arrays has advantages and disadvantages. The advantage is that you can crank out code pretty quickly because you do not have to design an actual interface -- just put in all the stuff you want for now, and if you have to change it, just add or remove fields later.

Using arrays in this case does have several drawbacks, though. First, there is no easy way to check the type of the array to make sure the right array gets passed to the right place. Passing a variable of the wrong type to a function or method is a fairly common error to make. If words were defined as objects and the parameter were declared with a type hint, the PHP interpreter would catch the error and tell you about it. Any time the interpreter can tell you the precise line of code that has a bug, that is a big time saver.
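With plain arrays, that check has to be written by hand. The sketch below shows roughly what it looks like; `is_word_array` is a hypothetical helper, and the required keys are simply the ones used in the example above.

[sourcecode language="php"]
// Hypothetical helper: returns true only if $word is an array holding
// both keys used in this article's examples.
function is_word_array($word) {
    if (!is_array($word)) {
        return false;
    }
    foreach (array('definition', 'pronunciation') as $key) {
        if (!array_key_exists($key, $word)) {
            return false;
        }
    }
    return true;
}

$doe = array('definition' => 'a deer, a female deer', 'pronunciation' => 'do̅');
$partial = array('definition' => 'a drop of golden sun');

$doe_ok = is_word_array($doe);          // true
$partial_ok = is_word_array($partial);  // false: no 'pronunciation' key
[/sourcecode]

Every function that accepts a word array would need to call something like this, and nothing forces it to.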

The second problem is that not thinking about how code will interface usually results in interfaces that are sloppy and inconsistent. In cases like this, it is fairly common that some array values are initialized and some are not, so the code ends up having to check for every attribute before it can be sure it is there. Often there will be a set of functions that operate on these word arrays. Putting those functions in a word class as methods makes more sense. Sometimes sets of attributes are mutually exclusive depending on the value of another attribute. This is an excellent time to take advantage of inheritance.
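As a sketch of the inheritance point, suppose (purely for illustration) that nouns carry a plural form while verbs carry a past tense. The class names `base_word`, `noun`, and `verb` and their properties are assumptions, not part of the word example in this post.

[sourcecode language="php"]
// Hypothetical base class holding the attributes every word shares.
class base_word {
    public $definition;
    public function __construct($definition) {
        $this->definition = $definition;
    }
}

// Only nouns carry a plural form.
class noun extends base_word {
    public $plural;
    public function __construct($definition, $plural) {
        parent::__construct($definition);
        $this->plural = $plural;
    }
}

// Only verbs carry a past tense.
class verb extends base_word {
    public $past_tense;
    public function __construct($definition, $past_tense) {
        parent::__construct($definition);
        $this->past_tense = $past_tense;
    }
}

$deer = new noun('a hoofed grazing animal', 'deer');
$run = new verb('to move quickly on foot', 'ran');
[/sourcecode]

Each subclass declares only the attributes that apply to it, so code handling a `verb` never has to ask whether a plural is present.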

When To Use Objects

The above example can be recoded using objects like this:
[sourcecode language="php"]
class word {
    public $definition;
    public $pronunciation;

    // Note the argument order: pronunciation first, then definition.
    public function __construct($pronunciation, $definition) {
        $this->definition = $definition;
        $this->pronunciation = $pronunciation;
    }
}

$words = array(
    'doe' => new word('do̅', 'a deer, a female deer'),
    'ray' => new word('rey', 'a drop of golden sun')
);
[/sourcecode]

Though this example is about twice as many lines, the class is declared only once, so the more arrays of words you create, the smaller the relative overhead. We don't have any word operations defined here, but as words are used more and more, some are likely to crop up.

An advantage of the object-based approach is that PHP can check the type of an object passed as a parameter when a function takes a single word, whereas all arrays look the same to it. If you want to make sure an array someone passed is valid, you have to check it explicitly, which is more work for the programmer. Unfortunately, PHP does not provide an easy way to require that a function be passed an array of objects of a particular class.
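Both halves of that point can be sketched briefly. The `word` class below is a shortened stand-in so the snippet runs on its own, and `describe` and `all_words` are hypothetical names.

[sourcecode language="php"]
// Shortened stand-in for the word class, so this snippet runs on its own.
class word {
    public $definition;
    public function __construct($definition) {
        $this->definition = $definition;
    }
}

// The type hint makes PHP reject anything that is not a word. Passing a
// plain array here is a catchable fatal error in PHP 5 and a TypeError
// in PHP 7 and later.
function describe(word $w) {
    return $w->definition;
}

// Hypothetical helper: an array cannot be hinted by element class, so
// each element has to be checked by hand.
function all_words(array $items) {
    foreach ($items as $item) {
        if (!($item instanceof word)) {
            return false;
        }
    }
    return true;
}

$doe = new word('a deer, a female deer');
$text = describe($doe);
$good = all_words(array($doe));
$bad = all_words(array($doe, array('definition' => 'not an object')));
[/sourcecode]

Here `$good` is true and `$bad` is false. The single-object check is handled by the interpreter, while the array check remains the programmer's job.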

Conclusion

If you want a list of items or a hash table or dictionary, then arrays make perfect sense. There are lots of built-in array functions that make operating on arrays fast and easy, so there is no reason to avoid them when they are a logical fit.

When data has well-defined attributes and operations, an object makes more sense. PHP can provide type checking, implementation can be hidden behind interfaces, operations can be coupled with data, inheritance can simplify implementation, and all that other good object oriented stuff.