Type Preferences

I hate runtime errors. I love types. This post aims to explain why, and take a dive into my preference around approaches to navigating types. It is definitely worth noting that this is not a suggestion of a “right answer” - in fact I believe quite strongly that one does not exist. My preference is tightly coupled to the context in which I work!

I find that a decent portion of the bugs I encounter are due to “unhandled cases”. Let’s start with the following example:

    
<?php

function randomAnimal(): string
{
    $animals = ['dog', 'cat'];

    return $animals[array_rand($animals)];
}

function noise(string $animal): string
{
    return match($animal) {
        'dog' => 'bark',
        'cat' => 'meow',
        default => throw new Exception('Not an Animal!'),
    };
}

$animal = randomAnimal(); // 'dog'|'cat'
echo noise($animal); // 'bark'|'meow'
Info
You could opt out of the default arm, and live with the unhandled match exception - all follow up points apply the same either way.

Great, we handle Dogs and Cats - but now we need Fish too. Let’s add them in:

    
<?php

function randomAnimal(): string
{
    // added here...
    $animals = ['dog', 'cat', 'fish'];

    return $animals[array_rand($animals)];
}

function noise(string $animal): string
{
    return match($animal) {
        'dog' => 'bark',
        'cat' => 'meow',
        default => throw new Exception('Not an Animal!'),
    };
}

$animal = randomAnimal(); // 'dog'|'cat'|'fish'
echo noise($animal); // 'bark'|'meow'... or an Exception!

Our code is broken - but our static analyser knows no better. This is a great sign that there is room for improvement in our typing. The problem here is that our Animals aren’t actually strings (…duh). By expressing this in our types, our static analyser can report potential issues:

    
<?php

enum Animal {
    case Dog;
    case Cat;
    case Fish;
}

function randomAnimal(): Animal
{
    $animals = Animal::cases();

    return $animals[array_rand($animals)];
}

function noise(Animal $animal): string
{
    return match($animal) {
        Animal::Dog => 'bark',
        Animal::Cat => 'meow',
        // Match expression does not handle remaining value: Animal::Fish
    };
}
Info
Bonus points if you’re currently complaining that I didn’t move the functions inside the enum declaration - I just wanted to keep it as close to the first example as possible.
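
For anyone collecting those bonus points, here is a rough sketch of what moving the functions inside the enum might look like. The method names random() and noise() are my own picks for illustration, and glug is simply the fish noise that shows up in the volume example further down.

```php
<?php

enum Animal
{
    case Dog;
    case Cat;
    case Fish;

    // Pick one of the declared cases at random.
    public static function random(): self
    {
        $animals = self::cases();

        return $animals[array_rand($animals)];
    }

    // Every case is listed here; removing an arm is something
    // a static analyser can report as an unhandled value.
    public function noise(): string
    {
        return match($this) {
            self::Dog => 'bark',
            self::Cat => 'meow',
            self::Fish => 'glug',
        };
    }
}

echo Animal::random()->noise(), PHP_EOL;
```

The behaviour is the same as the standalone functions; the only difference is that the cases and the code that depends on them now live side by side.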

You may be wondering why I’m happy to keep the strings representing animal noises: this leads us to our next section…

Magic Strings

I hate magic strings. A magic string is any string that actually represents something richer. In our earlier example, our Animals were magic strings - so why aren’t the animal noises? I’d argue that because they are only used for output (a context in which any string is valid), they’re not magic. Equally, they could have been:

    
<?php

function volume(string $noise): int
{
    // oh no - back to magic strings!
    return match($noise) {
        'bark' => 10,
        'meow' => 7,
        'glug' => 2,
    };
}

As always - it depends! Correct typing is about the context in which data is used. The previous issue was that the string type allows for more possibilities than we actually want: our type declaration was not strict enough. The stricter your types, the fewer possible states a value can be in. That means less context to keep in your head - which is totally underrated: who wants to think? To really hammer this home - imagine we’re dealing with animal noises again.

    
<?php

function randomAnimalNoise(): string
{
    $animalNoises = ['meow', 'gulp', 'bark'];

    return $animalNoises[array_rand($animalNoises)];
}

You want to add growl as a new Animal Noise. We’ve already established that your type ignorance has left your static analyser without a clue here - so you’re on your own. You decide to find a single item - bark in this case - and add growl to any array containing it.

    
<?php

function randomAnimalNoise(): string
{
    // easy - right? too easy...
    $animalNoises = ['meow', 'gulp', 'bark', 'growl'];

    return $animalNoises[array_rand($animalNoises)];
}

function randomTreePart(): string
{
    // okay - getting boring now...
    $treeParts = ['branch', 'trunk', 'bark', 'growl'];

    return $treeParts[array_rand($treeParts)];
}

Did you spot it? Growl is now a valid Tree Part - oh no. Now imagine a large codebase, with files upon files of magic strings and crossovers. No thanks.
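
One way out, sketched here under a few assumptions, is to give each concept its own backed enum. The enum names AnimalNoise and TreePart (and their case names) are hypothetical and only exist for this illustration. The point is that both can hold a case backed by the text bark without ever being confused for one another.

```php
<?php

enum AnimalNoise: string
{
    case Meow = 'meow';
    case Gulp = 'gulp';
    case Bark = 'bark';
    case Growl = 'growl'; // the new noise is added in exactly one place
}

enum TreePart: string
{
    case Branch = 'branch';
    case Trunk = 'trunk';
    case Bark = 'bark'; // same text, different type
}

function randomAnimalNoise(): AnimalNoise
{
    $animalNoises = AnimalNoise::cases();

    return $animalNoises[array_rand($animalNoises)];
}

function randomTreePart(): TreePart
{
    $treeParts = TreePart::cases();

    return $treeParts[array_rand($treeParts)];
}

// The backing strings match, but the values themselves do not.
var_dump(AnimalNoise::Bark->value === TreePart::Bark->value); // bool(true)
var_dump(AnimalNoise::Bark === TreePart::Bark); // bool(false)

// Growl never became a tree part.
var_dump(TreePart::tryFrom('growl')); // NULL
```

Searching for bark no longer matters: the only place a new animal noise can go is the AnimalNoise declaration.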

The Contract

Type signatures can be thought of as programming contracts. The function signature function foo(string $bar): int reads “a function (named foo) that accepts any string, and returns an integer” - easy right? Signatures concisely describe what inputs a function accepts, and what it will return. If we start doing additional checks inside our function, this contract (and our concise description) is broken. Taking us back a little bit:

    
<?php

// accepts any string, and returns a string
function noise(string $animal): string
{
    // but when we look deeper...
    // we only accept two very specific strings!
    return match($animal) {
        'dog' => 'bark',
        'cat' => 'meow',
    };
}

Not being able to trust type signatures can add significant mental overhead to reading code. Have you ever had to stop and think “can I pass this here?” - your static analyser should do that for you!

Another issue with breaking the type signature contract is the appearance of the resulting errors. Still considering the above function - imagine we pass an invalid string. The error will look something like this:

    
Uncaught UnhandledMatchError: Unhandled match case 'bark'
in function noise

This error is reported as a problem inside the function declaration. The actual issue we’re dealing with is caused by the function call. Consider the equivalent with our Enum implementation instead:

    
<?php

enum Animal: string {
    case Dog = 'dog';
    case Cat = 'cat';
}

function noise(Animal $animal): string
{
    return match($animal) {
        Animal::Dog => 'bark',
        Animal::Cat => 'meow',
    };
}

$animal = Animal::from('bark');
// any other code could be in between these two stages
noise($animal);
    
Uncaught ValueError: "bark" is not a valid backing value for enum Animal
in index.php line: Animal::from('bark')
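
If the string comes from outside (user input, say), you can go one step further and deal with the failure right at that boundary. A minimal sketch, assuming a hypothetical describe() helper of my own invention; Animal::tryFrom() is the built-in counterpart to from() that returns null instead of throwing.

```php
<?php

enum Animal: string
{
    case Dog = 'dog';
    case Cat = 'cat';
}

function noise(Animal $animal): string
{
    return match($animal) {
        Animal::Dog => 'bark',
        Animal::Cat => 'meow',
    };
}

// Hypothetical boundary helper: the untrusted string is narrowed
// to an Animal (or rejected) before noise() is ever called.
function describe(string $input): string
{
    $animal = Animal::tryFrom($input);

    if ($animal === null) {
        return "'{$input}' is not an animal we know about";
    }

    return noise($animal);
}

echo describe('dog'), PHP_EOL;  // bark
echo describe('bark'), PHP_EOL; // 'bark' is not an animal we know about
```

Either way, noise() keeps its promise: everything that passes its type check is something it can handle.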

Let’s dive back into function signatures. The function signature function foo(X $bar): Y reads “a function (named foo) that accepts any X and returns an instance of Y”. I think it is important to understand the difference between “accepts any X” and “returns an instance of Y”.

“Accepts any X” means the function in question should be able to deal with any X (no surprise here, right?) - anything passing this type check should be a valid input for this function. “Returns an instance of Y” is a little trickier. It means that any code calling the function should be capable of dealing with any instance of Y. It may be easier to look at this from the perspective of what happens when we get it wrong. Consider this function:

    
<?php

function concat(mixed $first, mixed $second): mixed
{
    return $first . $second;
}

Alarm bells, right? The concatenation operator (.) is only valid for a handful of types (which we’ll treat as string for now). For others it results in an error: our function does not work for an object without a __toString method, for example, so such values should not be allowed by the signature. Great. So what is wrong with returning mixed here? Let’s look at a usage of this function.

    
<?php

function concat(string $first, string $second): mixed
{
    return $first . $second;
}

$stringOne = ask('Please enter the first string: ');
$stringTwo = ask('Please enter the second string: ');

$result = concat($stringOne, $stringTwo); // mixed
echo $result;

Passing mixed to echo is not type safe - it only accepts strings. So we’d need to narrow it…

    
<?php

$result = concat($stringOne, $stringTwo); // mixed

if(!is_string($result)) {
    throw new Exception('Result is not a string');
}

// $result must be string here

echo $result;

This sucks. Having to assert that a value is not of a certain type is a big red flag that the expression producing it has too wide a return type. We know that concatenating two strings can only return a string - let’s reflect that!

    
<?php

function concat(string $first, string $second): string
{
    return $first . $second;
}

$stringOne = ask('Please enter the first string: ');
$stringTwo = ask('Please enter the second string: ');

$result = concat($stringOne, $stringTwo); // string

echo $result;

Much neater. Setting wide return types leads to unnecessary type detection (or lack of type safety) later on.

Fall Hard and Fall Fast

By forcing data into the narrowest relevant type at the earliest possible time, we limit the amount of back tracing we have to do when a related error occurs. Consider the following example, in which we’re signing up a new user - but require an email confirmation before storing their details.

    
<?php

function registerUser(Request $request, MailService $mailer): Response
{
    // extract user details from the request
    $data = $request->only('name', 'email');

    // send email confirmation
    $verification = $mailer->sendConfirmation($data['email']);

    $verification->onceComplete(function() use ($data) {
        // Create the user account
        $user = new User(...$data);
        $user->save();
    });

    return response(200);
}

Awesome - but our user didn’t enter their name. The application will return a 500, but no big deal right? The user will hopefully just return to the form, see they missed their name and try again. Except the error happened down here…

    
<?php

function registerUser(Request $request, MailService $mailer): Response
{
    // extract user details from the request
    $data = $request->only('name', 'email');

    // send email confirmation
    $verification = $mailer->sendConfirmation($data['email']);

    $verification->onceComplete(function() use ($data) {
        // Create the user account
        $user = new User(...$data);
        // Argument #1 ($name) not passed
        $user->save();
    });

    return response(200);
}

So our email went out, and the user clicked it - so they think they’ve got an account. Now they’re onto our support team… If we’d instead validated our data at the beginning of the process:

    
<?php

function registerUser(Request $request, MailService $mailer): Response
{
    // extract user details from the request
    $data = $request->only('name', 'email');

    // Argument #1 ($name) not passed
    $userData = new UserData(...$data);

    // send email confirmation
    $verification = $mailer->sendConfirmation($userData->email);

    $verification->onceComplete(function() use ($userData) {
        // Create the user account
        $user = $userData->toUser();
        $user->save();
    });

    return response(200);
}

class UserData {
    public function __construct(
        // public (and readonly) so registerUser can read the email
        public readonly string $name,
        public readonly string $email,
    ) {}

    public function toUser(): User
    {
        return new User(
            name: $this->name,
            email: $this->email,
        );
    }
}

We’d still get our error - which probably wants neatening up before being displayed to the user - but crucially it happens before they receive the confirmation email. Our support team is free to deal with someone else… This doesn’t just apply to user entered data either: narrower typing can be especially helpful for functions that transform data.

    
<?php
 
function groupByMonth(array $items): array
{
$grouped = [];
 
foreach($items as $item) {
$month = $item['month'];
 
$grouped[$month][] = $item;
}
 
return $grouped;
}

This function signature reads “a function (named groupByMonth) that accepts any array, and returns an array”. First of all, we’ve got a problem resembling the “magic strings” from earlier - our function accepts any array, but we can actually only handle arrays in which every item contains a month key.

    
<?php

$cats = [
    [
        'name' => 'Cain',
        'age' => 4,
    ],
    [
        'name' => 'Pumpkin',
        'age' => 3,
    ],
];

groupByMonth($cats);
// Undefined array key "month"

Let’s update our signature!

    
<?php

/**
 * @param array<array-key, array{month: string}> $items
 */
function groupByMonth(array $items): array
{
    $grouped = [];

    foreach($items as $item) {
        $month = $item['month'];

        $grouped[$month][] = $item;
    }

    return $grouped;
}

Awesome, now our static analyser will catch this:

    
<?php

$cats = [
    [
        'name' => 'Cain',
        'age' => 4,
    ],
    [
        'name' => 'Pumpkin',
        'age' => 3,
    ],
];

groupByMonth($cats);
// Parameter #1 $items of function groupByMonth expects
// array<array{month: string}>,
// array{array{name: 'Cain', age: 4}, array{name: 'Pumpkin', age: 3}}
// given.

For our return type, we’re taking a quick peek back to the contract. Currently, code calling this function should be able to deal with any array - that represents a real variety of data, quite the responsibility! Much like our mixed type earlier, we can narrow this to reduce the responsibility.

    
<?php

/**
 * @param array<array-key, array{month: string}> $items
 * @return array<string, list<array{month: string}>>
 */
function groupByMonth(array $items): array
{
    $grouped = [];

    foreach($items as $item) {
        $month = $item['month'];

        $grouped[$month][] = $item;
    }

    return $grouped;
}

Now our calling code only has to deal with the specific type of array structure we provide - a much lighter load. You may be concerned at this scary looking type signature - “it’s so complicated” I hear you say. Maybe... until you’ve used them a bit more. But we haven’t actually changed the code, we’ve just added extra information for readers. We’ve highlighted complexity - not added it. Better the devil you know, right?

Info
Bonus points if you’re complaining I haven’t used a generic here. That’s one for another day…
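
If you can’t wait for that day, here is a rough sketch of where it might go. It assumes your static analyser understands @template annotations with an “of” bound (PHPStan and Psalm both do); analysers differ on whether an array shape like this tolerates extra keys, so treat the bound as something to check against your own tooling. The month values and the third cat are invented for the example.

```php
<?php

/**
 * @template T of array{month: string}
 * @param array<array-key, T> $items
 * @return array<string, list<T>>
 */
function groupByMonth(array $items): array
{
    $grouped = [];

    foreach($items as $item) {
        $month = $item['month'];

        $grouped[$month][] = $item;
    }

    return $grouped;
}

// Illustrative data only.
$cats = [
    ['name' => 'Cain', 'month' => 'May'],
    ['name' => 'Pumpkin', 'month' => 'May'],
    ['name' => 'Biscuit', 'month' => 'June'],
];

$grouped = groupByMonth($cats);

echo count($grouped['May']), PHP_EOL;  // 2
echo $grouped['June'][0]['name'], PHP_EOL; // Biscuit
```

The runtime behaviour is unchanged. The difference is what the caller is told: with the generic, the items coming out are the same shape as the items going in, so the name key does not get lost in the return type.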

Conclusion

Thanks for reading. As previously mentioned, these are my preferences - not the answer. I’d be quite surprised if they don’t change over time, as my experience and knowledge grow. As always, any feedback is welcome.