Ok, obviously you and I both knew that you can turn an array into an object.  You just do a nice and easy for loop around it:

$obj = new stdClass;
foreach ($a as $key => $value) {
	$obj->$key = $value;
}

Or maybe you’re sharper than me and already knew there was a much quicker way of doing this.  Congrats.  In my years of PHP development I never came across this, but evidently PHP allows you to cast an array to an object to accomplish the same thing:

$a = array(
	'name' => 'Jimmy',
	'species' => 'Brown-headed Parrot',
	'temperament' => 'vicious but lovable',
	'angry' => 'usually'
);
print_r((object)$a);

// stdClass Object
// (
//     [name] => Jimmy
//     [species] => Brown-headed Parrot
//     [temperament] => vicious but lovable
//     [angry] => usually
// )

Not too shabby, right?

Short and sweet.
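
One thing worth checking before leaning on the cast: it appears to be shallow.  Here is a minimal sketch, using a made-up $pet array (my own example data, not from above), that shows a nested array staying an array and the object casting back with (array):

```php
<?php
// Hypothetical sample data with one nested array.
$pet = array(
    'name' => 'Jimmy',
    'traits' => array('temperament' => 'vicious but lovable'),
);

// Only the top level becomes object members.
$obj = (object)$pet;
echo $obj->name, "\n";

// The nested value is still a plain array, not a stdClass.
var_dump(is_array($obj->traits));

// Casting back gives an array equal to the one we started with.
$back = (array)$obj;
var_dump($back === $pet);
```

If you need nested objects, you would have to walk the structure yourself.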

Ambiguous controls
Controls are ambiguous. The same button will do 5 different things depending on context. This can be okay when context is clear, but in this game it is far from it. A terrific example of how this hurts gameplay is their delightful choice for the Xbox of using the same button to both grab ledges while falling, and to let go of a ledge while hanging. Press it once, assassin hangs. Press it twice, death.

Illogical control
The buttons are simply clunky to work with. The button to freely sprint is pressed by the same finger that is used to control the camera, so either you can run into walls or you can stop running every few seconds to adjust the camera. Fun fun fun.

Similarly, the finger used to move your character will also be responsible for changing weapons, which is something that needs to happen somewhat frequently during combat.

Synchronizing memories
Really? Is that the best terminology you can come up with for “health”? The terminology of this game needs a ton of work for it to make sense.

Repetition
Me: <saves the day>
Civilian: Thank goodness you arrived when you did. Another minute and they’d have made off with me.

Yes, civilian, I know. I know because every time I save a civilian I’m forced to listen to any one of a whopping 5 different speeches with absolutely no bearing on the story. I can’t skip it, and (my favorite) I can’t even control the camera so I can at least get ready to continue on. I get that I saved a civilian but I get tired of hearing about it after the 50th time. I get tired of having my camera hijacked by you, then by vigilantes that are unnecessary 95% of the time.

Similarly, when I have to repeat a mission 20 times because some stupid beggar (see next section) wants to hump my leg, it would be terrific if I could skip the 30 second intro dialogue. It’s interesting the first time. Maybe the second time, but probably not. Definitely not after that. Then it’s just annoying and time consuming.

Nuisances
Take out the annoying factors. I know why you put them in there. I know why you thought they were a good idea. Trust me, you misjudged. You were wrong.

I’m tired of beggars that follow me for three city blocks because they have no money. I kill them instead, because I’m fed up.

I’m tired of drunks/mentally ill randomly swinging at me when I walk by. Especially because they will home in on me from quite a distance. I kill them, too.

The added challenge is great, but please, please, please make it a challenge that makes sense and doesn’t enrage people.

Assassinations
I saved the best for last. Considering that we are playing a game called Assassin’s Creed, how fun would it be if assassinations (the arguable focal point of the game) were more stealthy and ninja? You can spend all the time in the world planning out how you will kill your mark, but when it’s go-time it doesn’t matter. You are generally forced down to ground level, amongst the crowd, so you can watch helplessly while your mark makes themselves completely vulnerable. Once they are done and have had ample time to retire with their armed guards, you actually get to start moving.

Maybe for the sequel you can actually assassinate somebody while they’re exposed and vulnerable, instead of waiting until they hole up with their guards, guaranteeing that the “assassination” will be you charging in with a two-handed sword screaming “AAAAAAAAAAAAAAAAAAAH!” followed by a 10v1 bloodbath that can be seen across the entire city.

Introduction

In this brief article I will talk about a coding pattern in PHP (or any language with an extract-like function) suitable for highly configurable functions.  When I say “highly configurable”, I’m talking about functions that can behave in many different ways involving many different variables.  For example, a function print($string) that is intended to print $string to standard output is not very configurable because there’s only one variable.  However, a function createUser($username, $password, $age = null, $favorite_color = null, $birth_place = null, $gender = 'N/A', $political_allegience = 'No thank you', $favorite_johnny_depp_movie = 'Edward Scissor Hands') is much more configurable.

Typical solution

The above example can be done exactly as I wrote it above, but it’s clunky.  For example:

<?php
    // No problem here, nice and clean.
    createUser('john', 'doe');

    // Not too bad, but I might have had to look up which positional argument is
    // favorite color, and the default value for any preceding optional arguments
    // (in this case, age which defaults to null).
    createUser('john', 'doe', null, 'blue');

    // Ick.
    createUser('john', 'doe', null, null, null, 'N/A', 'No thank you', 'What\'s Eating Gilbert Grape');
?>

This solution requires you to know the positioning of all of the arguments. When you want to specify an optional argument towards the end, you have to remember (or more likely look up) the arguments that come before it so you can specify the proper default.  This also creates a dependence: if your optional variables ever change their defaults, that last example might not work the way you expect it to any more.  You can alleviate this a bit by having optional arguments always default to null, and within the function itself using logic like this:

($political_allegience === null) && $political_allegience = 'No thank you';

That helps: now you don’t have to remember default values because they’re always null.  You still have to remember the order of arguments, which can be a pain.

Alternate solution

You can work around this like this:

<?php
function createUser($username, $password, $details = array()) {
    $age = null;
    $favorite_color = null;
    $birth_place = null;
    $gender = 'N/A';
    $political_allegience = 'No thank you';
    $favorite_johnny_depp_movie = 'Edward Scissor Hands';
    extract($details, EXTR_IF_EXISTS);
}

?>

In a nutshell, this defines the defaults inside the function and overwrites them using PHP’s built-in extract function.  Extract expects an associative array and turns each $key/$value pair into a variable in the current scope (using $key as the variable name and $value as the variable value).  Thus you can change our “Ick” example from above to a much simpler:

createUser('john', 'doe', array('favorite_johnny_depp_movie' => 'What\'s Eating Gilbert Grape'));

Now you have to remember the variable names, but not their order.  That’s one drawback.  I haven’t tested it, but am pretty confident that using extract is also less efficient so you don’t get the convenience for free.
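
To make the pattern easier to try out, here is a trimmed-down, runnable sketch.  It only carries three of the optional settings, and returning them with compact() is my own addition so the result can be inspected; a real createUser would presumably do something more useful with them.

```php
<?php
// Sketch only: a cut-down createUser that returns its settings for inspection.
function createUser($username, $password, $details = array()) {
    // Defaults live inside the function.
    $age = null;
    $gender = 'N/A';
    $favorite_color = null;

    // Pull in only the keys that match a variable already defined in this scope.
    extract($details, EXTR_IF_EXISTS);

    return compact('username', 'password', 'age', 'gender', 'favorite_color');
}

// Override one setting by name; the rest keep their defaults.
$user = createUser('john', 'doe', array('favorite_color' => 'blue'));
print_r($user);
```

The caller names the one setting it cares about and never has to think about argument order.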

Security concerns

extract is a neat function but easy to abuse.  It’s generally not a good idea to blindly extract an array into the current scope.  If you aren’t familiar with why, Google “php register globals security” for plenty of existing documents on the topic.  If you’re too lazy for that, then just take my word for it: don’t blindly extract dynamic data.

The solution is in the second argument we passed to extract, EXTR_IF_EXISTS.  This tells PHP only to bring key/value pairs into current scope when the variable is already defined.  So if $details included a ‘gender’ key, this would be extracted and would override the $gender variable.  If it included a ‘do_something_evil’ key, this would not be extracted because there currently is no $do_something_evil variable.

The moral of this section: when extracting use EXTR_IF_EXISTS and make sure to define (before your extract call) all variables that you want to be extractable.
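
Here is a small sketch of that behavior.  The function name show_extract_scope and the sample keys are invented for illustration.  Note one catch it demonstrates: function parameters are variables in the current scope too, so a matching key in the array will overwrite them.

```php
<?php
// Hypothetical demo function, not part of any real API.
function show_extract_scope($username, $details = array()) {
    $gender = 'N/A';
    extract($details, EXTR_IF_EXISTS);

    return array(
        'username' => $username,
        'gender' => $gender,
        // True only if extract() created this variable.
        'evil_defined' => isset($do_something_evil),
    );
}

$result = show_extract_scope('john', array(
    'gender' => 'F',                // matches $gender, so it is extracted
    'do_something_evil' => true,    // no such variable, so it is ignored
    'username' => 'mallory',        // matches the parameter, so it is overwritten
));
print_r($result);
```

So EXTR_IF_EXISTS keeps unknown keys out, but anything already defined at the point of the call is fair game, parameters included.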

(Not) Using extract for $_GET and $_POST

The PHP docs mention that extract lends itself very well to handling user input from $_GET or $_POST, and I’m going to talk about why you should never ever follow their advice.  To exemplify, let’s pretend we’re writing a script to handle a simple login form:

<?php

    $username = $password = '';
    extract($_POST, EXTR_IF_EXISTS);

    if (!$username) die ('Username is required.');
    if (!$password) die ('Password is required.');

    // The rest of the script
?>

This looks fine.  But what if we modified it to include config.php on the very first line, where config.php does a bunch of application-wide setup stuff like connecting to a database?  If config.php defines a global $is_admin and sets it according to some session data, a malicious attacker can now pass in $_POST['is_admin'] = 1 and masquerade as an admin without ever authenticating.

You can fix this by never defining variables in config.php, but now you’ve created a not-so-obvious restriction on config.php.  You better hope everybody who ever edits it is aware of this restriction, and adheres to it.  Your QA or code review team better be sure to look out for that in any changes to config.php as well.  The point I’m getting at is that it’s very easy to make a mistake with this setup.

You can also fix this by not including config.php until after your extraction.  This is fine.  I personally don’t like it as I see it as too easy to mess up, but it does resolve this immediate issue.

But finally, there is the fact that PHP supports the (very useful) auto_prepend_file directive, which can be used to automatically include a certain script at the start of all requests.  If that is set up for your application, or gets set up in the future, then all of your extract calls are once more exposing a vulnerability.  This is another example where your actions right now make it easy for a serious vulnerability to be created in your application later without any changes being made to your script itself.
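
What I would rather do is name the fields I expect and copy only those.  A minimal sketch, where pick_fields is a hypothetical helper of my own and $post stands in for $_POST:

```php
<?php
// Hypothetical helper: copy only the named fields out of untrusted input.
function pick_fields(array $input, array $allowed) {
    $out = array();
    foreach ($allowed as $name) {
        // Missing fields default to an empty string.
        $out[$name] = isset($input[$name]) ? $input[$name] : '';
    }
    return $out;
}

// Stand-in for $_POST, including a key an attacker slipped in.
$post = array('username' => 'john', 'password' => 'doe', 'is_admin' => 1);

$fields = pick_fields($post, array('username', 'password'));
print_r($fields);
```

Nothing lands in the current scope at all, so it doesn't matter what config.php or a prepended script has defined.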

EXTR_IF_EXISTS

This will be a quick one, because there’s not much to say that wouldn’t be superfluous.  You’re already a PHP ninja and know all the ninja file manipulation functions like fopen, file, fread, fwrite, and so on.  But did you know you can very easily slap a gzip filter between your script and the file?  Here you go:

<?php

$data = 'Mares eat oats and does eat oats and little lambs eat ivy.  A kid\'ll eat ivy too, wouldn\'t you?';
$gz_file = gzopen('my_file.gz', 'w') or die('Unable to open file.');
gzwrite($gz_file, $data) or die('Unable to write to file.');
// Close the handle so the compressed data is flushed to disk.
gzclose($gz_file);

?>

Mind-blowing, right?  The principles are largely the same, so I’ll let you explore the features.  Generally speaking, if you can do it with regular file I/O, you can do it with gzipped file I/O.  This is great for things like writing logs to files, especially if you don’t plan on reading those logs on a regular basis.  Log files (especially things like SQL queries!) tend to be veeeeeery easy to compress, so this can be a big win for certain situations.

Remember to name your file with a ‘.gz’ extension or you might confuse somebody later when application.error.log isn’t plain text.

For more information, see PHP’s manual on the Zlib functions.
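
Reading works the same way in reverse.  A small sketch, assuming the zlib extension is loaded; the temp file path and sample text are made up for the example:

```php
<?php
// Write a small gzipped file to a temporary location.
$path = tempnam(sys_get_temp_dir(), 'gz');
$gz = gzopen($path, 'w') or die('Unable to open file for writing.');
gzwrite($gz, "line one\nline two\n");
gzclose($gz);

// Read it back; gzread() hands us the uncompressed bytes.
$gz = gzopen($path, 'r') or die('Unable to open file for reading.');
$contents = '';
while (!gzeof($gz)) {
    $contents .= gzread($gz, 4096);
}
gzclose($gz);
unlink($path);

echo $contents;
```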

This post is about my adventures building my own PC.  It’s something my wife and I have done many times (although she has more experience doing it than I), but documenting it is a first.  Inevitably when building a PC, weird things go wrong in weird ways, and it can be a real bitch figuring out why (because this activity isn’t catered to the “average Joe” but to OEM manufacturers).  Anyhow, here are the problems I encountered, and the solutions.

The Specs

I am actually not building a new machine from scratch, just swapping out most parts for new parts.

Motherboard: Gigabyte EP45-UD3P
CPU: Intel Core 2 Quad Q9400
RAM: 2 x OCZ2RPR10664GK (2GB)
PSU: OCZ OCZ600ADJSLI 600W ATX12V 2.01
Video Card: EVGA 8800GTS
Chassis: Thermaltake Armor Series VA8000BWS <3 <3

The Adventure

No POST, no fans, no video.

This was actually on a previous attempt, but I’ll include it here in case it’s helpful to someone.  My wife and I put the whole machine together inside the chassis but upon pushing the magic button, it didn’t go.  We went over everything we could consider, looked for cracks in the PCB, but found nothing.  I called my old man for advice and he suggested taking everything out of the chassis and setting it on a cardboard box, and starting it there.  What a silly old man.

The solution: My old man was right.  The motherboard was shorting out, and we realized this after we removed it from the chassis and started it up sitting on a cardboard box.  It turns out we had screwed in an extra brass stand-off which didn’t match up with any screws on the motherboard.  Whoops.

A whining video card

I don’t know how best to describe the sound.  It wasn’t a beep like you hear during POST.  It was a whine or squeal of non-uniform pitch.  Very odd.  It wasn’t the fan either.

The solution: PSU wasn’t plugged into the video card.  Whoops.  Easy fix.

Continuous short beeps

So the first shot failed with continuous short beeps during POST.  According to their manual, this is a “PSU Failure”. My PSU is one of the components I’m bringing in from the old system, where it works fine.  I tried it in my old system again, and it still worked fine.  During troubleshooting, I stripped everything out of the machine except the necessities (CPU).  My PSU was definitely not failing.

I RMA’d the board and got a replacement and encountered precisely the same behavior.

The solution: It turns out some motherboards (this one) ship with low power settings for the RAM.  My wife put her RAM in, which runs at 0.1V less than mine, and it booted up fine.  Once in, we were able to hop into BIOS and tweak the settings for my motherboard.

To be continued? After configuring proper power settings for my RAM, we put my RAM back in the machine and tried to boot it up.  Back to continuous short beeps.  I’m not sure what the issue is here, and it doesn’t help that I don’t know much about RAM timings and settings.  I rely on the hardware and low-level software to do that for me.  So this may very well be a solvable issue, but at the same time I’m insisting that it’s a “bug” in the hardware or software.  This RAM is (at least was, I don’t know if it still is) listed on Gigabyte’s approved RAM list, so I don’t think I should have to change RAM timings and power settings just for it to boot.  Ugh.

<review>

alias ke='kill $(pidof evolution)'

Stick that in .bashrc.  It will save you much typing over the many many many many many many many many times that you need to ungracefully end Evolution because it’s hung with no useful feedback.

</review>

Preface

This is something I’ve been meaning to test for a long time.  Is it faster to store data in an associative array with keys, or an object with members?  Now sometimes objects are better for the other functionality they bring (methods, magic methods, inheritance, never-ending life).  But often times we have the choice to store data in a plain old object whose only job is to hold keys and values (a fancy associative array, basically).  But which is faster?

I’ll start off with this: there’s a lot of different ways you might be accessing data that might alter the outcome of any tests you run.  So my goal here is not to be comprehensive, but just to save you some trouble of doing this stuff yourself, and maybe giving you some good information in the meantime.

The tests.

My tests follow a similar format.  I capture the time (in seconds, as a float), loop through a number of iterations, stop the clock and measure the difference since I started, then report the number of seconds that passed and the number of iterations it was over.  Then I repeat using a different method to compare the two.  Basic stuff, right?  Note that I use stderr to output results… this is because often my tests involve displaying output, and I want to be able to redirect it all into a black hole without losing the actual results, thus stderr.

I ran these tests on my laptop, which is not a controlled environment.  For each test, A and B were run in immediate succession and my hands were off the machine during the interim to keep things as stable as possible.  I ran each test at least three times to be sure I saw consistent numbers each time.

Core testing code.

<?php

// Fetch UNIX epoch as float.
function get_ms() {
    list($sec, $usec) = explode(' ', microtime());
    return (float)($sec + $usec);
}

$start = 0;
function start() {
    global $start;
    $start = get_ms();
}

function report($str) {
    global $start, $iters;
    err(sprintf("Time passed for %s over %s iterations:\n\t%.2f\n",
        $str, number_format($iters), (float)(get_ms() - $start)));
}

function err($msg) {
    fwrite(STDERR, $msg);
}

?>
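
If you would rather not juggle globals, the same measurement can be wrapped up in one function.  This is just a sketch: time_iterations is a name I made up, it leans on microtime(true) returning the time as a float, and it assumes the CLI (where STDERR is defined).  The closure call adds its own overhead per iteration, so it is only good for comparing like with like.

```php
<?php
// Hypothetical helper: run $fn $iters times and return the elapsed seconds.
function time_iterations($fn, $iters) {
    $start = microtime(true);
    for ($i = 0; $i < $iters; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

$data = array('name' => 'Mark');
$iters = 100000;

$elapsed = time_iterations(function () use ($data) {
    $v = $data['name'];
}, $iters);

fprintf(STDERR, "Time passed over %s iterations:\n\t%.4f\n",
    number_format($iters), $elapsed);
```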

Test data.

I’m using the following test data.  The object and array hold the same information, just in different ways.  Note the order my pets appear in.  This was intentional because Jimmy’s my favorite, and I tell him that frequently.

<?php

$assoc = array(
    'color' => 'blue',
    'name' => 'Mark',
    'job' => 'Wallet Inspector',
    'pets' => array(
        array('type' => 'brown headed parrot', 'name' => 'Jimmy'),
        array('type' => 'green cheeked conure', 'name' => 'Bubba')
    )
);

$obj = new stdClass();
$obj->color = 'blue';
$obj->name = 'Mark';
$obj->job = 'Wallet Inspector';
$obj->pets = array();
$jimmy = new stdClass();
$jimmy->type = 'brown headed parrot';
$jimmy->name = 'Jimmy';
$bubba = new stdClass();
$bubba->type = 'green cheeked conure';
$bubba->name = 'Bubba';
$obj->pets[] = $jimmy;
$obj->pets[] = $bubba;

?>

Test 1: random key lookups.

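// Test 1 -- assocs
$keys = array_keys($assoc);
start();
for ($i = 0; $i < $iters; $i++) {
    // array_rand() returns an index into $keys, so map it back to the key name.
    $key = $keys[array_rand($keys)];
    $v = $assoc[$key];
}
report('Assoc');

// Test 2 -- obj
$keys = array_keys(get_object_vars($obj));
start();
for ($i = 0; $i < $iters; $i++) {
    // Same mapping here so both sides do the same amount of work.
    $key = $keys[array_rand($keys)];
    $v = $obj->$key;
}
report('Object');

Before I start the timer, I fetch an array of keys (or members, in the case of the object).  Each iteration I choose a random key and load it up.  Now, this is a pretty small data set and it’s possible that PHP does some caching of frequently accessed data.  If that’s the case, then I’m only getting a few “cold read” iterations, with the rest of 10m being warm reads.  But do I care?  No.  I think that aside from the number of iterations, this is a realistic representation of in-the-field stuff: reading the same key frequently from an array.

One caveat: array_rand returns a key of the array you hand it, so for the list produced by array_keys it returns an integer index that has to be mapped back through $keys, as above.  The timings below were recorded with a version that passed that index straight to $assoc, so the array side was looking up keys that don’t exist.  Read the Assoc number with that in mind.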
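
Since the whole test hinges on what array_rand hands back, here is a tiny sketch of its behavior, using a cut-down two-key array of my own:

```php
<?php
$assoc = array('color' => 'blue', 'name' => 'Mark');

// array_keys() gives a list: array(0 => 'color', 1 => 'name').
$keys = array_keys($assoc);

// array_rand() returns a key of the array it is given, here an integer index.
$index = array_rand($keys);
var_dump(is_int($index));

// Map the index back to get a key that actually exists in $assoc.
$key = $keys[$index];
var_dump(array_key_exists($key, $assoc));

// Called on the associative array itself, it returns a key name directly.
$direct = array_rand($assoc);
var_dump(array_key_exists($direct, $assoc));
```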

The results were as follows:

$ php assoc_vs_object_lookups1.php
Time passed for Assoc over 10,000,000 iterations:
        18.29
Time passed for Object over 10,000,000 iterations:
        12.32

So you can see that object lookups here performed quite a bit faster.

Test 2: static key lookups.

// Test 1 -- assocs
start();
for ($i = 0; $i < $iters; $i++) {
    $v = $assoc['name'];
}
report('Assoc');

// Test 2 -- obj
$keys = get_object_vars($obj);
start();
for ($i = 0; $i < $iters; $i++) {
    $v = $obj->name;
}
report('Object');

I should have probably done this test first, since it’s simpler.  The results here were:

$ php assoc_vs_object_lookups2.php
Time passed for Assoc over 10,000,000 iterations:
        3.50
Time passed for Object over 10,000,000 iterations:
        3.94

This surprises me quite a bit (I ran this test a number of times with similar results).  Objects performed slower with a fixed key lookup.  This makes me wonder if PHP uses some kind of caching mechanism for associative array lookups that allows it to perform better than object lookups.  Note that the associative arrays only perform a little bit faster.

Test 3: array lookups.

// Test 1 -- assocs
start();
for ($i = 0; $i < $iters; $i++) {
    $v = $assoc['pets'];
}
report('Assoc');

// Test 2 -- obj
$keys = get_object_vars($obj);
start();
for ($i = 0; $i < $iters; $i++) {
    $v = $obj->pets;
}
report('Object');

The only difference here is I was pulling out an array ($assoc[‘pets’] or $obj->pets) instead of a scalar.  I was just curious if I would see a difference.

$ php assoc_vs_object_lookups3.php
Time passed for Assoc over 10,000,000 iterations:
        3.47
Time passed for Object over 10,000,000 iterations:
        3.95

More or less in line with what we saw in test 2.

Test 3.5: wtf is with tests 1 and 2?

At this point I was unhappy with the difference between test 1 (which indicated that an object lookup went a lot faster) and test 2 (which indicated that an array lookup was just a little faster), since the only difference between the two was whether we were looking up a fixed key or a random key.  This made me wonder if it had to do with the type of data being looked up.  On little more than a slightly educated whim, I removed the ‘pets’ member from my array / object and re-ran tests 1 and 2 over 100m iterations.

I saw the same results.  When looking up random keys, it took about 180.11 seconds / 120.95 seconds for arrays / objects.  When looking up ‘name’ over and over, it took about 34.50 seconds / 39.10 seconds for arrays / objects respectively.  I don’t get it, but I’m convinced that the numbers are an accurate representation.

Test 4: value searches.

// Test 1 -- assocs
start();
for ($i = 0; $i < $iters; $i++) {
    $v = in_array('Lauren', $assoc);
}
report('Assoc');

// Test 2 -- obj
start();
for ($i = 0; $i < $iters; $i++) {
    // Walk the members as name => value pairs and compare each value.
    foreach (get_object_vars($obj) as $key => $value) {
        if ($value == 'Lauren') {
            $v = $value;
            break;
        }
    }
}
report('Object');

For this one I had to get creative.  PHP doesn’t support a direct way to search an object’s members for a value (as it shouldn’t, objects aren’t dictionaries).  So I had to implement the simplest and closest version I could, which loops through the object’s members and compares each value to what we’re looking for.  Once found, it assigns it to $v and breaks out.

Note that the loop has to iterate name => value pairs.  The timings below came from a version that iterated the values and then used each one as a member name, so the object side was paying for failed member lookups on top of the search itself.

The results weren’t very surprising:

$ php assoc_vs_object_lookups4.php
Time passed for Assoc over 10,000,000 iterations:
        15.48
Time passed for Object over 10,000,000 iterations:
        113.85

Objects are much much much much slower to search in this way.  That is to be expected, as associative arrays are built for this kind of thing, and searching an object isn’t something you typically do in a real application.
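
If you do find yourself needing to search an object’s values, one option is to let the array functions do it on the result of get_object_vars.  A minimal sketch with a cut-down object of my own (I haven’t timed this against the loop):

```php
<?php
$obj = new stdClass();
$obj->name = 'Mark';
$obj->job = 'Wallet Inspector';

// get_object_vars() returns name => value pairs, which in_array() can search.
$found = in_array('Mark', get_object_vars($obj), true);
$missing = in_array('Lauren', get_object_vars($obj), true);

// array_search() returns the member name holding the value, or false.
$member = array_search('Wallet Inspector', get_object_vars($obj), true);

var_dump($found, $missing, $member);
```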

Test 5: large data sets.

// Prep
$num_to_make = 5000;
$assoc = array();
while (count($assoc) < $num_to_make) {
    $key = uniqid(rand(0, 10000));
    $val = uniqid(rand(0, 10000));
    $assoc[$key ] = $val;
}

$obj = new stdClass;
foreach ($assoc as $key => $val) {
    $obj->$key = $val;
}

err("Created $num_to_make key/value pairs for each test data.\n");

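// Test 1 -- assocs
$keys = array_keys($assoc);
start();
for ($i = 0; $i < $iters; $i++) {
    // array_rand() returns an index into $keys, so map it back to the key name.
    $key = $keys[array_rand($keys)];
    $v = $assoc[$key];
}
report('Assoc');

// Test 2 -- obj
$keys = array_keys(get_object_vars($obj));
start();
for ($i = 0; $i < $iters; $i++) {
    // Same mapping here so both sides do the same amount of work.
    $key = $keys[array_rand($keys)];
    $v = $obj->$key;
}
report('Object');

For this test I wanted to find out how the numbers compared when dealing with much larger data sets.  So I create an array and object, each filled with the same set of 5000 random key/value pairs.  Note that the array and object have the same key/value pairs, even though they’re randomly generated.  As with test 1, the timings below were taken with a version that used the integer index from array_rand directly as the array key, so the array side was looking up keys that don’t exist.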

I ran through this test twice to see what kind of numbers I got each time.  Here are the results:

$ php assoc_vs_object_lookups3.php
Created 5000 key/value pairs for each test data.
Time passed for Assoc over 250,000 iterations:
        58.60
Time passed for Object over 250,000 iterations:
        55.28
$ php assoc_vs_object_lookups3.php
Created 5000 key/value pairs for each test data.
Time passed for Assoc over 250,000 iterations:
        58.06
Time passed for Object over 250,000 iterations:
        55.69

The results were pretty well consistent and show that objects out-perform associative arrays.  As you can see I’m only running 250k iterations, instead of 10m.  I noticed that when dealing with larger data sets both became excruciatingly slow.  I tried 50k records over 1m iterations and after a few hours on my laptop, I gave up.

Conclusion

The conclusion I draw from this is that object lookups are faster than arrays.  However associative arrays must take advantage of some kind of caching that causes frequent lookups for the same value to go faster than with an object.  The speed advantage of arrays over objects when talking about fixed-key lookups is small enough that I might consider using objects going forward.  The last test indicates to me that the larger the data set, the smaller the gap in efficiency, but objects still win.  I’m curious if there is a number of key/value pairs where objects would no longer be faster than arrays, but I will have to test that another day.

Admittedly this probably doesn’t fit in as a “thing you didn’t know”, because most of these concepts are commonly familiar.  But perhaps I can introduce you to some new or interesting applications.  Read on.

PHP is a typeslut or a “loosely typed” scripting language.  That means it doesn’t recognize a variable as a string, or an integer, or a boolean.  Well, it sort of does, but it will turn it into whatever you need it to be (typecast it) without warning, so the relationship between types isn’t enforced.  This can lead to problems like this:

<?php

$str = 'blue is my favorite color';

if (strpos($str, 'blue')) {
    echo 'You like blue?  Me too!';
} else {
    echo 'I guess you aren\'t the bluesy type.';
}

?>

If you run that you’ll see “I guess you aren’t the bluesy type.”.  You probably already knew that, though, because PHP features a prominent warning about it on the strpos page.  What’s happening is we are victimized by loose typing + auto-typecasting + bad function design (returning two answers in the same value: was there a match, and if so where?).
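
For the record, here is the same check written so it behaves.  The only change is comparing the result of strpos against false with !==, so a match at position 0 still counts as a match:

```php
<?php
$str = 'blue is my favorite color';

// strpos() returns the integer position of the match, or false when there is none.
$pos = strpos($str, 'blue');

if ($pos !== false) {
    $message = 'You like blue?  Me too!';
} else {
    $message = 'I guess you aren\'t the bluesy type.';
}

echo $message, "\n";
```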

There are a few means of solving the type juggling problem and making PHP act a bit more like a typed language:

Typeful comparisons.

Use === instead of ==.  This is something you want to do as often as possible, but circumstances don’t always allow it.  The === operator will return true only if both sides share the same value and the same type.  Thus 0 == FALSE would return TRUE whereas 0 === FALSE would return FALSE because they are not the same type.  It’s worth mentioning that because PHP won’t typecast either side of an === comparison, it’s actually slightly more efficient as well (see below).
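
A few quick comparisons to make that concrete (the values are just ones I picked):

```php
<?php
// Loose comparison juggles types before comparing.
var_dump(0 == false);      // bool(true)
var_dump('100' == 100);    // bool(true)

// Strict comparison requires the same type as well as the same value.
var_dump(0 === false);     // bool(false)
var_dump('100' === 100);   // bool(false)
var_dump(100 === 100);     // bool(true)
```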

Type hinting.

Type hinting is something you can do in function or method argument lists that tells PHP you expect a variable to be of a certain type.  Unfortunately, in PHP 5 you can only typehint using array or class names (scalar type declarations such as int and string arrived in PHP 7.0).  For example, function my_sweet_function(array $foo, MySweetClass $bar).  This function would now insist on an array for the first argument, and an instance of MySweetClass (or a subclass of MySweetClass) for the second.  Note that contrary to PHP’s hazardously forgiving stance on argument lists and contrary to the term “hinting”, typehints are strictly enforced.  Your script will error out if you provide the wrong type to a type-hinted function.  In my opinion, this is a very good thing and a nice surprise from PHP.
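
To see the enforcement in action, here is a small sketch.  It assumes PHP 7 or later, where a bad argument throws a TypeError you can catch (PHP 5 raises a catchable fatal error instead); count_items is a made-up function.

```php
<?php
// Hypothetical function that insists on an array.
function count_items(array $items) {
    return count($items);
}

try {
    // A string is not an array, so the call is rejected before the body runs.
    count_items('not an array');
    $outcome = 'accepted';
} catch (TypeError $e) {
    $outcome = 'rejected';
}

echo $outcome, "\n";
echo count_items(array('a', 'b')), "\n";
```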

Type sniffing.

What if you want to type hint an argument as an int, boolean, etc?  Before PHP 7.0, you can’t.  But you can implement the same feature on your own using the is_* family of functions.  For example, you might call is_resource on a variable that you want to use as a file handle or MySQL results resource.  Note that a few of these are not mutually exclusive.  For example, a variable might return true for both is_string and is_callable, which indicates that it is a string containing the name of a function (and thus can be a callback as well).

Type casting.

What if you want to be a little more forgiving?  That is when casting comes into play.  As with most languages, you can manually instruct PHP to convert a value to a certain type.  For example, $a_float = (float)$a_float.  Let’s say $a_float was an untrusted value, but we really want a float.  The operation above will enforce that by forcing it to be a float. Note that you might lose something in the translation this way, and sometimes your value will be rendered useless.  For example, casting the string “3.14” to a float gives you the float 3.14, which is great.  Casting the string “pi” to a float gives you 0.0, because PHP doesn’t know how to do the translation.  Some type casts just don’t make sense at all.  Similarly, (int)3.14 will give you the integer 3, and the .14 is forever lost.
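
Those three casts, written out so you can run them:

```php
<?php
// A numeric string converts cleanly.
var_dump((float)'3.14');   // float(3.14)

// A non-numeric string silently becomes zero.
var_dump((float)'pi');     // float(0)

// Casting a float to int drops the fractional part.
var_dump((int)3.14);       // int(3)
```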

Sometimes this can be a form of easy input validation.  Let’s say, for example, that you want to use $user_id in a MySQL query for an integer column, UserID.  You can safely use $user_id without concern for a SQL injection attack using something like this:

<?php
$query = 'select * from MyTable where UserID = '. (int)$user_id;
?>

Because we’re casting to an int, we know we’re safe.  If $user_id was set to “1 OR 1=1”, the type cast would turn it into 1.  Similarly, if it were NULL or FALSE, it would be cast to 0.  Pay attention to that, because this method can also give you the false impression that your values are valid when they are not.  If $user_id was FALSE due to some earlier bug, but you just happily cast it to (int) and use it, you might not notice the flaw.
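
If you would rather find out that the value was bad than have it quietly turned into a number, one option is filter_var with FILTER_VALIDATE_INT, which returns false for anything that isn’t a clean integer.  A sketch, with check_user_id being a hypothetical wrapper of mine:

```php
<?php
// Hypothetical wrapper: returns an int, or false when the input isn't an integer.
function check_user_id($user_id) {
    return filter_var($user_id, FILTER_VALIDATE_INT);
}

var_dump(check_user_id('42'));         // int(42)
var_dump(check_user_id('1 OR 1=1'));   // bool(false)
var_dump(check_user_id(false));        // bool(false)

// Compare with the cast, which happily produces a number every time.
var_dump((int)'1 OR 1=1');             // int(1)
```

Check the result with === false, since a valid ID of 0 is also falsy.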

Type hinting return values.

You can’t do this in PHP 5 (return type declarations only arrived in PHP 7.0), but you can accomplish much the same thing with type casting.  Just cast your return values on the way out, as in return (bool)$success.  This will remove any doubt that the return value is a boolean, but it’s something you’ll have to do on the code side and is thus open to coder error.

Class sniffing.

You can use is_object() to make sure a variable is an object, but what if you want to be more specific?  Use the instanceof operator.  This will tell you if the left side is an instance of the class named in the right operand (or a subclass of it).  Here you want to use the class identifier, not the class name in a string.  For example $a instanceof MyClass.  This is a great idea to use often when type hinting is insufficient.  For example, I always use this in the get() method of a singleton class to make sure my static instance is an instance of what it should be an instance of (it never hurts to be safe with something so vital).
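
A quick sketch with two throwaway classes (Parrot and BrownHeadedParrot are invented for the example):

```php
<?php
class Parrot {}
class BrownHeadedParrot extends Parrot {}

$jimmy = new BrownHeadedParrot();
$other = new stdClass();

// True for the class itself and for any parent class.
var_dump($jimmy instanceof BrownHeadedParrot);  // bool(true)
var_dump($jimmy instanceof Parrot);             // bool(true)

// An unrelated object is still an object, but not a Parrot.
var_dump(is_object($other));                    // bool(true)
var_dump($other instanceof Parrot);             // bool(false)
```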

Metrics.

And for the number crunchers… I didn’t take the time to do all kinds of useful tests here, but I did a couple tests on how == performs against === in terms of efficiency.  Here’s what I found over 100 million comparisons.  Comparing “100” to the integer 100, it took 37.91 seconds to compare them using ==.  However using === it took (brace yourself) 37.90 seconds.  That’s a .01 second (0.026%) improvement over 100 million iterations… wow!

Yeah so that wasn’t impressive.  I did the same thing, this time comparing “blue” to 3.14.  This time the numbers I got were 31.86 and 28.12 respectively.  Here we see more of a difference, but it’s still so small I would call it inconclusive.  My hunch is that === is slightly more efficient (I’ve heard as much before) but I think the difference is negligible.

Sometimes you want to define a function that accepts an arbitrary list of arguments. A rudimentary example would be a function that you pass strings into, and it combines them with a space in between. In reality, this would be easily accomplished with implode and an array of strings, but let’s walk through our own function to see how it works.  Once we’re done with the basics, I’ll show you a favorite technique of mine that I usually use in all of my projects.

The basic example

<?php

function combine_words() {
	// Get run-time arguments as an array.
	$words = func_get_args();

	// Reinvent the implode() wheel.
	switch(count($words)) {
		case 0:
			return '';

		case 1:
			return $words[0];

		default:
			$str = array_shift($words);
			foreach	($words as $word) {
				$str .= ' '. $word;
			}
			return $str;
	}
}

?>

Let’s break it down one step at a time:

function combine_words() {
There’s nothing fancy to point out here except that we did not define any arguments.  That’s because we want our function to accept an arbitrary number of strings.  Zero, one, or a bunch.

$words = func_get_args();
This is really the only important line.  What this does is grab all of the arguments provided by the client code as an array.  The order is maintained.  Using this method you lose having useful naming conventions in your function arguments, because they aren’t named… they’re just indexes in an array.  What you gain, though, is flexibility.

That’s it.  The rest of the function is inconsequential junk that just does for us what implode() already does, and only included because I like my examples to be runnable.
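For comparison, here’s a sketch of the same idea using the variadic "..." syntax.  This assumes PHP 5.6 or later, and it’s a different mechanism from func_get_args(), though the result is the same: the arguments arrive as an ordered array.  The function name is made up.

```php
<?php

// Assumes PHP 5.6+: "..." collects every argument into the $words array.
function combine_words_variadic(...$words) {
    return implode(' ', $words);
}

echo combine_words_variadic('variable', 'length', 'arguments'), "\n";
// prints "variable length arguments"

var_dump(combine_words_variadic()); // string(0) ""
```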

A more useful example…

Have you used printf before?  Maybe not since your programming days?  Well you should.  It carries a few perks.  First of all, it’s quite efficient.  Second of all, it can make things much easier to read by leaving complex argument wrangling to the end of the printf call, and the actual string template in one nicely readable place.  Third, it does a lot of formatting for you without you having to make extra function calls.  Seriously, you should use printf more often.  Tsk tsk.

Now that I’ve plugged printf I’ll offer a brief background for anyone not familiar with it.  The first argument to printf is a string, and required.  The string may have identifiers inside it which will be substituted with dynamic values.  For example, let’s say the first string is “Hello.  My name is %s and I am %d years old.” Read the PHP manual for printf for specifics, but generally you will use %s, %d, and %f to tell PHP that you want a string, an integer (the d stands for decimal), or a float substituted in.  The rest of the arguments to printf are, in order, the values you want subbed in.  Here’s an example printf call:

<?php
$name = 'Mark';
$age = 'infinity';
printf('Hello.  My name is %s and I am %d years old.', $name, $age);

?>

So as you may have noticed, printf is an example of a variable length argument function.  The first argument, the string, is required.  How many arguments you need after that (if any) is determined by the string itself.  In our case, we needed two more arguments.
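To make the three identifiers a little more concrete, here’s a small sketch using sprintf (printf’s sibling that returns the string instead of printing it).  The values are made up.

```php
<?php

// %s = string, %d = integer, %.2f = float with two decimal places.
$line = sprintf('%s is %d years old and owes $%.2f.', 'Mark', 30, 5.5);
echo $line, "\n"; // Mark is 30 years old and owes $5.50.

// Padding comes for free: %05d zero-pads the integer to five digits.
$padded = sprintf('%05d', 42);
echo $padded, "\n"; // 00042
```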

I find this very useful and in particular I find myself using it all the time with error handling.  In a large project with custom Exceptions, I’ll use this method.  Here’s how I do it.

<?php

/**
 * Our Exception class will just extend Exception and will have
 * printf-style functionality.  Other than that, it does nothing special.
 */
class MyException extends Exception {
    public function __construct() {

        // Make sure we have good arguments.
        $args = func_get_args();
        if (count($args) === 0) {
            trigger_error('Warning: wrong parameter count for MyException, expected 1 or more.',
            E_USER_WARNING);
        }

        // Pull out the string from the rest of the arguments.
        $str = array_shift($args);

        // and printf it
        $msg = vsprintf($str, $args);

        parent::__construct($msg);
    }
}

?>

Now for explanation. We start with MyException which extends Exception (if you’re not familiar with exception-based error handling, hang your head in shame and go read up on it).  Our goal is just to have MyException function just like Exception, but to be able to call it in a way such as throw new MyException('The %s was set to %d!', $foo, $bar);.  You’ll love it, trust me.

class MyException extends Exception {
Custom Exception classes must extend Exception.

public function __construct() {
We’re defining the constructor, the function called when a new instance is created.

$args = func_get_args();
We grab the arguments passed into the constructor as an array.  We don’t know the size of the array, though.

if (count($args) === 0) {
trigger_error('Warning: wrong parameter count for MyException, expected 1 or more.',
E_USER_WARNING);
}
So we verify it.  We want a minimum of 1, which would simply be a static string.  0 arguments should trigger an error, so we do just that.

$str = array_shift($args);
Here we just pull the first argument off the array and save it in $str.

$msg = vsprintf($str, $args);
Here we compile the message by calling vsprintf with the remaining arguments as an array.  vsprintf is the same as printf, except for two key differences:

  1. It takes two arguments: the format string and an array containing values to sub in.  This is as opposed to printf, which takes a format string and then an argument for each sub-in.
  2. It returns the string it creates, instead of printing it out.  This allows us to capture it in the variable $msg.
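A quick sketch of that difference, using sprintf (the returning version of printf) for a fair side-by-side.  The format string and values are just examples.

```php
<?php

$format = 'The %s was set to %d!';
$values = array('limit', 10);

$a = sprintf($format, 'limit', 10); // one argument per placeholder
$b = vsprintf($format, $values);    // all of the values in a single array

var_dump($a === $b); // bool(true)
echo $b, "\n";       // The limit was set to 10!
```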

parent::__construct($msg);
We call Exception’s constructor, passing in the message we constructed, and voilà!  We are done.
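Here’s what calling it looks like in practice.  To keep the snippet runnable on its own I’ve used a condensed stand-in called DemoException (a made-up name) that skips the argument-count check; the variable names are also just for illustration.

```php
<?php

// Condensed, hypothetical stand-in for the class walked through above.
class DemoException extends Exception {
    public function __construct() {
        $args = func_get_args();
        // First argument is the format string, the rest are its values.
        $format = (string)array_shift($args);
        parent::__construct(vsprintf($format, $args));
    }
}

try {
    $setting = 'timeout';
    $value = 0;
    throw new DemoException('The %s was set to %d!', $setting, $value);
} catch (DemoException $e) {
    echo $e->getMessage(), "\n"; // The timeout was set to 0!
}
```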

Some other useful stuff.

func_get_args comes with two friends which aren’t quite as useful, but can be sometimes:

  1. func_num_args lets you see how many arguments were passed into the function call before acquiring them with func_get_args.  Presumably this is a faster operation than using func_get_args() + count(), but I’ve never tested to find out.
  2. func_get_arg, not to be confused with func_get_args, lets you specify which argument you want and returns just that one.  func_num_args() combined with func_get_arg() would let you for-loop through your arguments if you wanted to.  I haven’t come across a reason for using this, but I’m sure one exists.
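For the curious, the for-loop version would look something like this.  sum_all() is a made-up function, purely to show the two friends working together.

```php
<?php

// Hypothetical example: add up however many numbers were passed in.
function sum_all() {
    $total = 0;
    $count = func_num_args();
    for ($i = 0; $i < $count; $i++) {
        // Fetch one argument at a time by its position.
        $total += func_get_arg($i);
    }
    return $total;
}

echo sum_all(1, 2, 3.5), "\n"; // 6.5
echo sum_all(), "\n";          // 0
```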

Technique for making PHP behave like it should out of the box.

Does that heading sound passive-aggressive?  Whoops.  PHP was designed for “rapid application development” or “rapid dissemination of tragically poor coding habits leading to a generation of ‘web developers’ who can’t code their way out of a paper bag”.  The second phrase was too long so they went with the first.

Anyhow, a great example is in PHP’s much-too-forgiving treatment of arguments in a function call.  Consider this:

<?php

function divide($this_one, $by_this_one) {
    return $this_one / $by_this_one;
}
echo divide(4);

?>

Against intuition, this triggers a warning, not an error.  It actually triggers two warnings, both of which should be errors:

  1. Warning: Missing argument 2 for divide(), called in /home/me/- on line 6 and defined in /home/mslade/- on line 3
  2. Warning: Division by zero in /home/me/- on line 4

#2, division by zero, should be an error because you can’t do it.  I’m asking PHP to perform a flawed calculation (my mistake) but PHP is forgiving me and returning its best guess, which happens to be bool(false) if anyone’s curious.  There’s not much room for discussion here: 4 divided by 0 is not “false”, so PHP shouldn’t return it.  It should error out.  But that’s neither here nor there, just me ranting about “rapid application development”.

#1 should be an error because I defined a function that accepts two arguments with no default value for either.  Well as it turns out, PHP treats your arguments list as a suggestion, not a rule.  If you specify that an argument is required but it’s not provided by a client, PHP will trigger a warning and default the value to NULL.  Thus any PHP function you ever write really looks like this: function divide($this_one = NULL, $by_this_one = NULL).  Again not much room for discussion here.  If I define a function that takes two arguments with no default values then it should require two arguments.  Failure to provide them should cause an error, not a warning.

Ok so off the soap-box, let’s talk about a technique that makes PHP a little less ‘rapid’ and a little more ‘good’.  This is a simple technique, but adds to the code and just might not be something you want to use everywhere.  You might, though, want to use it in at least your most mission-critical functions, or any functions which are super-sensitive to having the right information coming in.  It’s just another form of sanitization, which you already do, so it’s nothing special.

<?php

function divide($this_one, $by_this_one) {
    if (func_num_args() !== 2) {
        trigger_error('divide() requires two arguments.', E_USER_ERROR);
    }
    return $this_one / $by_this_one;
}
echo divide(4);

?>

Now, you’ll still get the warning about only providing one argument, but it will be immediately followed by an error that ceases program execution.  Now you can save yourself the embarrassment when somebody reviews the web logs and sees that you tried to divide by NULL.
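A variation on the same check, sketched with an exception instead of trigger_error (my swap, not the approach shown above).  As far as I know, PHP 7.1 and later already throw an ArgumentCountError when a required argument is missing, but extra arguments are still accepted silently, so the func_num_args() test still earns its keep.  The name strict_divide() is made up.

```php
<?php

// Hypothetical variant: refuse anything other than exactly two arguments.
function strict_divide($this_one, $by_this_one) {
    if (func_num_args() !== 2) {
        throw new InvalidArgumentException(sprintf(
            'strict_divide() requires exactly 2 arguments, %d given.',
            func_num_args()
        ));
    }
    return $this_one / $by_this_one;
}

echo strict_divide(8, 2), "\n"; // 4

try {
    strict_divide(8, 2, 1); // one argument too many
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), "\n";
    // strict_divide() requires exactly 2 arguments, 3 given.
}
```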

Metrics for the geeks.

If you’re a true engineer, you’re wondering what’s faster: using an array as your variable list of arguments, or a truly variable list of arguments via func_get_args().  I was too, so I put together a simple benchmark.  The test consisted of two functions:

  1. implode_1(array $args) takes an array of strings and implodes them.
  2. implode_2() uses func_get_args() to fetch the list of arguments as an array and implodes them.

I called each 5 million times with the same list of strings.  For implode_1 I defined the array on the fly each time.  The results kind of surprised me: implode_1 took 36.76 seconds, and implode_2 took 40.54 seconds.  So apparently it’s actually faster to build an array of arguments on the fly and pass it into a function than to pass the values in as separate arguments and have the function sort it out via func_get_args().  I then re-ran the test with 1 million calls, this time with the array for implode_1 predefined (once) and re-used each time… here it took 3.94 seconds against 8.11 for implode_2.
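The benchmark was along these lines.  This is a reconstruction rather than the original script: the separator, the list of strings, and the much smaller iteration count are all assumptions, and the timings will differ on your machine.

```php
<?php

// Takes the strings as a single array argument.
function implode_1(array $args) {
    return implode(' ', $args);
}

// Takes the strings as separate arguments and gathers them itself.
function implode_2() {
    return implode(' ', func_get_args());
}

$iterations = 100000; // assumption: far fewer than the runs described above

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    implode_1(array('the', 'quick', 'brown', 'fox')); // array built on the fly
}
$array_time = microtime(true) - $start;

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    implode_2('the', 'quick', 'brown', 'fox');
}
$varargs_time = microtime(true) - $start;

printf("implode_1: %.4f seconds\n", $array_time);
printf("implode_2: %.4f seconds\n", $varargs_time);
```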

So in summary, the methods described for variable-length arguments above are fun and can save you having to wrap everything in arrays, but are not necessarily more efficient.  However all things considered, the timing’s pretty close unless you have a single array you’re using over and over and over (in which case using a single array argument is much faster than variable arguments).  I would say using one method over another should generally come down to a question of style unless you really need those extra milliseconds, in which case steer clear of func_get_args if you can avoid it.

Pros and cons.

Following that, some brief pros and cons.  Pros:

  • Simpler to call (no need to wrap arguments in an array).
  • Impresses your friends.

Cons:

  • Marginally slower compared to wrapping the arguments in an array and passing that in.  Much slower than re-using a pre-existing array and calling with that.
  • As with default arguments in PHP, your variable length argument list must come after any required variables and you can’t have default values for any other arguments in a variable-argument-length function.  For example, I’d love to write my_trigger_error(…) to accept printf-style strings, but in order to do so I would have to either hard-code it to trigger_error using a specific E_*_ERROR level, or the error level would have to become a required argument in my_trigger_error.  Boo.
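Here’s a sketch of the second option, where the error level becomes a required first argument.  The my_trigger_error() signature below is my own guess at how it might look, and the error handler is only there so the snippet prints something predictable.

```php
<?php

// Hypothetical wrapper: error level first, then a printf-style format
// string, then any number of values to substitute in.
function my_trigger_error($level, $format) {
    $values = array_slice(func_get_args(), 2);
    return trigger_error(vsprintf($format, $values), $level);
}

// Print the message ourselves instead of using PHP's built-in handler.
set_error_handler(function ($level, $message) {
    echo "Caught: $message\n";
    return true;
});

my_trigger_error(E_USER_NOTICE, 'Cache miss for key "%s" after %d tries', 'user:42', 3);
// Caught: Cache miss for key "user:42" after 3 tries

restore_error_handler();
```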

This is intended to be the first in a series of brief posts about the many obscure features built into PHP.  You’ll notice I decided to be different and not number the series, a la “10 Cool Things…”, because I don’t know how many cool things PHP can do.  Historically I’ve had (and still have) a long list of grudges against PHP and I generally don’t think it’s very good.  However I prefer it over Python and don’t like paying the Microsoft tax, and Ruby just happens to be at the bottom of my to-do list.

Anyhow, on to what you actually care about.

Ticks

In computing a tick is somewhat arbitrary but generally means an iteration of some loop.  For example if I wanted an application to track the status of an upload, I would have it check on the bits uploaded once every second or so, and update itself accordingly.  In that case, I’m ticking once per second.  Here’s how you can do it with PHP.

Example

<?php

/**
 * This is the function I want called every tick.  It's going to write a
 * line to a file ('tick.log') containing the timestamp and number of bytes
 * being used.
 */
function tick()
{
    static $fh;

    // Make sure we have our file handle.
    if (!is_resource($fh)) {
        $fh = fopen('tick.log', 'a');
    }

    if ($fh) {
        fwrite($fh, sprintf("[%s] Consuming %d bytes.\n",
            date('m-d-Y H:i:s'), memory_get_usage()));
    }
}

// Register the callback first so that no ticks are missed.
register_tick_function('tick');

// Tell PHP how many low-level instructions should make up a "tick".
declare(ticks=1);

// Now we just loop for a little while, eating up random amounts of
// memory as we go.
$continue = 9;
$memory_monster = array();
while (--$continue) {
    $memory_monster[] = str_repeat('.', rand(1, 500));
    echo "Waiting... $continue\n";
    sleep(1);
}

?>

So what’s going on with that?  Let’s go through it bit by bit.

function tick() {…}
This is a basic function that we’re going to use as a callback every tick.  It just opens a file called tick.log and logs the current time and memory usage.  It’s pretty boring and hopefully self-explanatory.

register_tick_function('tick');
This is where we tell PHP that we want our tick function above to be fired every tick.  If our function was called log_mem_usage then this would instead be register_tick_function('log_mem_usage').  Simple callback stuff.

declare(ticks=1);
Now it’s getting interesting, and it’s time to talk about what a ‘tick’ is in PHP land.  In PHP, a tick is an event that gets triggered every N low-level instructions.  Now keep in mind that unless you’re a core PHP developer with a lot of free time, and maybe not then, you probably have no clue how many low-level instructions it takes to get from point A to point B in your code.  You shouldn’t look at this as a mathematical problem.  It would be more useful to simply say that this is how frequently you want a ‘tick’ to fire, with 1 being the most frequent and higher numbers being much less frequent.

I would also like to point out that I registered my tick function before declaring how many statements make up a tick.  You can reverse the order, but you may not capture some ticks between your declare(…) and your register_tick_function().

I will also mention that you can also wrap your code in a declare block.  For example:

<?php
declare(ticks=5) {
	// code goes here
}
?>

If you do this, your tick callback is only fired for ticks within the { and }.  If you do not use the block notation, PHP will fire ticks for any code after your declare statement.

while (--$continue) {…}
This loop is just for demonstration purposes.  Because of the pre-decrement, $continue counts down from 8 to 1 inside the loop, so it runs 8 times, sleeping for one second each time, for a total of about 8 seconds.  Each iteration I’m sticking a random-length string into the $memory_monster array just to consume bytes and cause memory_get_usage() to produce dynamic output.

Drawbacks

There are two note-worthy drawbacks.

  1. Depending on where you look on PHP’s website, the ticks feature (and related functions) is deprecated as of 5.3 or 6.0.  I can’t imagine why this is getting the axe ahead of the many other horrible PHP features.
  2. This isn’t multi-threading.  This is why ‘tick’ can only be fired every so many instructions instead of firing every so many seconds, which would typically be more useful.  What this means to you, the code monkey, is that the time between ticks is completely unreliable, but perhaps tempting to rely on.  Let’s say you try connecting to a MySQL server that’s not functioning.  Many seconds may pass, but you haven’t executed a single instruction.  Ticks will not fire.  Following our above example, intended to track memory usage across a script, we would see a gap of X seconds during which we tried connecting to MySQL.  In that hypothetical situation, being able to tick every N seconds would’ve been much more useful and reliable.
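To see the second drawback for yourself, here’s a small sketch that measures the time between ticks.  A usleep() call stands in for the slow MySQL connection, and the record_gap() callback and variable names are made up.  The exact number of ticks may vary between PHP versions, so the script only reports the longest gap.

```php
<?php
declare(ticks=1);

$gaps = array();
$last = microtime(true);

// Hypothetical tick callback: records the seconds elapsed since the last tick.
function record_gap() {
    global $gaps, $last;
    $now = microtime(true);
    $gaps[] = $now - $last;
    $last = $now;
}

register_tick_function('record_gap');

$x = 1;         // a cheap statement, followed by a tick
usleep(300000); // one blocking statement: no tick can fire until it returns
$y = 2;

unregister_tick_function('record_gap');

// The longest gap should be at least the 0.3 seconds spent blocked.
printf("Ticks recorded: %d, longest gap: %.2f seconds\n", count($gaps), max($gaps));
```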

Conclusion

Fortunately, there’s another method that works around both drawbacks listed above called forking.  I’ll be talking about that next.