Recently I've found myself quite interested in basic type theory - mostly from seeing type related content online which I didn't understand. After doing some digging, I decided this would be a good topic for my first blog post. Any feedback is extremely welcome!
At the simplest level, a type is a collection of values. For example the type boolean has a very small collection of values: true and false. The type integer (assuming signed, 32bit) has a much larger collection: any whole number between -2,147,483,648 and 2,147,483,647. A typed value cannot be altered to an invalid state (outside its collection), or a fatal type error will occur. This guarantees that whilst a program is executing, every value is valid for its type.
Types also dictate the available operations that can be performed on a value. Leading from the previous example, the type boolean often includes the operations: AND, OR, NOT and XOR, integer often includes: add (+), subtract (-), multiply (*) and divide (/). It wouldn't make sense to multiply two booleans, nor XOR two integers.
A complete set of types is called a type system. Type systems differ greatly, often based on their use cases. Type systems greatly influences how data interactions occur, as well as a program's syntax and semantics. It is worth noting that just because a type appears in two separate type systems, they do not necessarily share the same behaviour.
Interestingly, it has been proposed (by Gilad Bracha) that choice of type system could be made independent of choice of language; that a type system could be a module that can be plugged into a language as needed. The proposition suggests this to be advantageous, describing mandatory type systems as making languages less expressive and code more fragile. The requirement for a type system not to affect the semantics of the language is difficult to fulfill, and is rather uncommon.
Type conversion (or casting, or juggling) is the process of converting data, from one type to another. For example the integer 12 could be cast into a string of value “12” (and vice versa).
Type conversion is not necessarily lossless. Take the example of casting the decimal value 1.4 to an integer of value 1. Attempting to cast the integer 1 back into a decimal, results in decimal 1.0 - unequal to it's initial value. Type conversions are reproducible, meaning converting the same value from one type to another will always yield the same result.
Type conversion can either be explicit, or implicit. Explicit type conversions require additional annotation, whilst implicit type conversions are instigated by the implementation of the type system itself. The criteria for an implicit type conversion differs between type systems.
Type safety is the extent to which a type system prevents type errors, often through its handling of implicit type conversions. For example, a function may take a single string argument. If called with an integer of value 12, what should it do? There are two reasonable options:
Implicit type conversions allows a program to run when it would otherwise fail, without adding extra boilerplate. The cost of this, is that sometimes implicit type conversions may come with unwanted side effects: for example passing two string phone numbers to a function expecting integers which then adds them together (when you'd probably want this to halt execution).
For this reason, different type systems have different views to type conversion. Generally, the stronger a type system is, the less likely it is to implicitly cast data from one type to another.
Type strictness is the extent to which a type system allows data to change type during its lifetime. The simplest example of this, is re-declaring a variable with an entirely unrelated value, of a different (non-castable) type. Strict typing helps ensure data is not being used unexpectedly, and helps prevent unwanted long-lived mutable state. Loose typing allows data to be flexible and concise, allowing rapid development.
A type system may require that each declared value is assigned a type. A type system may not allow users to assign types to their values, and instead infer the type each time. A type system may be capable of both. The ability of a type system to infer types enables programs to be less verbose, at the expense of introducing ambiguity.
There are two ways that a type system can verify that a program fulfills its typesafe rules: statically and dynamically (important distinction here: a type system with dynamic type checks is not necessarily a dynamic programming language). Any type check performed whilst running a program is dynamic, and any type check performed solely from analysing the text content of a program is static.
Static type checking comes with an immediately apparent benefit - the program does not need to be run for insight to be gained. This allows any potential issues to be caught extremely quickly. For complete type safety to be assured by static checks alone, lots of type information regarding the program must be available. The value of static type checks grow exponentially, with the strength of the type system. Programs with this requirement can often end up more verbose.
Dynamic type checks allows some of this information to remain unspecified, as it can be inferred by the running program. Dynamic type checks also allow constructs that static type checking would outright reject, such as functions that execute arbitrary strings.
It's important to understand that a type system needn't only be checked with either static or dynamic typing. In many type systems, it is of benefit for both checks to be made (even if not necessarily in full).
Developers only accustomed to one kind of type checking often find frustrations when running into type errors reported by the other kind. For example developers who predominantly use dynamic type checks often find themselves “fighting the compiler” when dealing with static type checks, whilst developers who predominantly use static type checks often find their dynamically typed programs unexpectedly halting midway through.
The balance between static and dynamic type checking that a type system uses, is often dependent on its balance between flexibility and robustness.
Readers familiar with program translation, may be surprised that the words compiler and interpreter have not yet made much of an appearance. This is to avoid the common misconception that a programming language (or type system) must either be compiled, or interpreted. Whilst it is true that programming languages (or type systems) typically only follow one approach - in principle any language can do either.
Compiled and interpreted programs, do have their own advantages and disadvantages. Compilers often produce programs with a higher performance ceiling (meaning they will usually run faster), at the cost of requiring a build step - making the development process less responsive to change. Compiled programs may also be easier to deploy (and manage the deployment of) due to not needing the translator on the target machine (unlike interpreted programs).
Memory safety is the extent to which a language prevents memory errors, often through the methods of memory access it provides. For example, a language which allows directly accessing memory by address can not be memory safe. Memory safety is a very important part of a programming language.
The most common approach for achieving memory safety, is garbage collection. A garbage collector is a separate program (built into the language) that runs alongside the executing program, deallocating memory that is no longer needed. This relieves the often heavy load of manual memory management from the user, but comes with some overhead as this program needs to calculate when memory can be deallocated, thus requiring system resources.
Type safe type systems lay the foundation for potential memory safety without the overhead of a garbage collector. For example, using of an ownership system: memory is deallocated when it's owner goes out of scope. This prevents the need to manually manage memory, with no execution overhead. It does however, require a greater level of knowledge on how memory management works. The behaviour of a datum's ownership is dependent on its type - specifically whether it's type dictates a fixed or variable amount of memory.
The needs of a type system are highly dependent on its uses. I strongly believe there are valid use cases for all the mentioned options (although probably not all combinations). At the heart of each of these options, is a tradeoff between flexibility and robustness. Which of these is more useful to your type system (or program utilising it), is rather dependent on the program. If an application halting from a type error is greatly destructive, a strongly typed robust type system is very likely for you. If you're building a web app based game, where halting from a type error may be much less destructive (albeit not ideal) - this tradeoff for a faster development experience may be preferable.
Thanks for reading, I hope it was of value. Again, any feedback is greatly appreciated!