Rust Type Metaprogramming - Part 1
Updated: 2024-12-05
Published: 2024-01-04
- Proof carrying code
- code that utilizes mechanisms provided by a compiler to demonstrate and verify its own correctness through type restrictions or requirements the compiler can understand.
- Metaprogramming
- a term used to describe development that involves writing proof carrying code executed at compile-time by the compiler in order to provide less verbose type interfaces.
- Associated type projection
- resolution of a type from a relative type path.
- Type path
- a sequence of type identifiers delimited with `::` in Rust, whose resolution yields a type which is a child element of the initial type structure (e.g. `FromStr::Err`, as shown in the snippet below). Not to be confused with paths in topology.
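For example, here's a small, self-contained illustration of associated type projection through a type path (the alias name is just for illustration):

```rust
use std::str::FromStr;

// `<u32 as FromStr>::Err` projects the associated `Err` type out of
// u32's `FromStr` implementation; it resolves to `std::num::ParseIntError`.
type U32ParseError = <u32 as FromStr>::Err;

fn main() {
    let err: U32ParseError = "abc".parse::<u32>().unwrap_err();
    println!("{err}");
}
```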
Context
This article introduces type metaprogramming with simple type list construction and mapping. Later articles will delve into other operations that are necessary to consume the produced type lists.
It will go over type transformations that are necessary in order to provide end-users with a type-safe interface while reducing the area of library internals they're exposed to.
Rust metaprogramming is generally done with macros, and when people in the Rust community say metaprogramming, they usually mean writing procedural macros.
- Macros do carry proof because code generated by macros will produce compiler errors when type requirements aren't met or the produced syntax is invalid.
- Macros, however, aren't ideal in some scenarios: they allow end-users to violate semantic API requirements, which can in turn cause undefined behavior or error messages that can't be easily controlled by the API author.
Prior Art
After writing the initial version of this post I stumbled upon a conference talk, "Type Theory for the Working Rustacean" by Dan Pittman, which associates the ideas behind this post with formal foundations in type theory.
Other than that, I've only managed to find posts that describe metaprogramming via macros in the context of Rust:
- "Understanding and Implementing Rust's Metaprogramming" by Reintech
- "That's so Rusty: Metaprogramming" by imaculate
- "Metaprogramming with macros" by packt publishing
- "Rust Macros — Advanced Use Cases, Metaprogramming Mastery, and Code Generation" by Alexandra Grosu
- and so on...
Another approach I came across, in the article "Type-directed metaprogramming in Rust" by Will Crichton, is using the rustc (Rust compiler) API directly in order to achieve goals similar to this post's, but the internal rustc API is unstable, which makes this approach both harder to implement and more time-consuming to maintain across versions.
- I find it to be a great introduction to Rust internals and the HIR and AST representations.
- It makes sense to go that route if one is developing something akin to `c2rust`.
In general, type metaprogramming and macros/preprocessors strive to achieve similar goals through different means. Type metaprogramming usually achieves a greater level of integration with the language's type system, at the cost of being somewhat harder to write and debug. Conversely, macros and preprocessors can (sometimes) provide much nicer error messages and are easier to reason about, but are much more verbose to write and don't integrate into the build process quite as nicely.
Tuples As Type Lists
Lists are used to store a varying number of runtime values of the same compile-time type. Tuples are used to group differently typed values, with the tradeoff of being fixed in size. Neither the tuple size constraint nor the fixed typing can be circumvented, as that would void the soundness of the type system. That is, if one manages to circumvent them, they've found a compiler or specification/implementation bug.
| Type  | Type Constraint | Size Constraint |
|-------|-----------------|-----------------|
| Array | Uniform         | Compile-time    |
| List  | Uniform         | Runtime         |
| Tuple | Varying         | Compile-time    |
The fixed size is enforced directly by local type signatures, but it can also be inferred in languages that support one of specialization, templatization or generic types. In other words, we can use a variety of methods to evaluate the final type elements - so long as we keep in mind that they will have to be inferred by the compiler at compile-time.
Rust doesn't (yet) have variable argument generics, which are a well-trodden path for implementing this sort of functionality, but it has one of the best macro systems, which can be used instead.
In order to manipulate a type list or (more correctly) sequence, destructuring into head and tail elements is a necessity.
Using constructive type theory (or Martin-Löf type theory), a tuple, and thereby a `TypeList`, is defined as:
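$$ \mathrm{TypeList} ::= ()\ \mid\ (\mathrm{Head},\ \mathrm{Tail}) \qquad \text{where}\ \mathrm{Tail} : \mathrm{TypeList} $$

(a rough sketch of the inductive definition: either the empty tuple, or a head element paired with a tail that is itself a `TypeList`)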
The `head` element is always the first element of the tuple:
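$$ \operatorname{head}\big((A_1, A_2, \ldots, A_n)\big) = A_1 $$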
Exceptionally, if the tuple is empty, a `head` function doesn't exist. That is, it's not defined for empty tuples. In Rust this can be expressed by projecting into the never type `!` - when it gets added; or by using some sentinel zero-variant enum with the same semantics:
enum Never {}
The `tail` element is always the tuple without its first (`head`) element:
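$$ \operatorname{tail}\big((A_1, A_2, \ldots, A_n)\big) = (A_2, \ldots, A_n), \qquad \operatorname{tail}\big(()\big) = () $$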
Unlike `head`, `tail` can be defined for empty tuples to return another empty tuple (similarly to how containers remain empty even if `pop`/`remove` returns `None`).
Type Sequence Trait
It's a good idea to start by defining a trait that can be used to specify some arbitrary type sequence. This will serve as an important stepping stone and make our later implementation simpler. It's equivalent to the above type theory definition.
Due to the lack of variable argument (vararg) generics, there must exist an upper bound on the supported sequence length, but thanks to macros we can make it arbitrarily large. This is primarily a development inconvenience, as vararg generics would end up producing the same executable code.
We declare our type sequence trait with the previously mentioned fundamental properties attached, as we have limited use for an empty (marker) trait:
pub trait TypeList {
type Head;
type Tail: TypeList;
}
impl TypeList for () {
type Head = Never; // Should be ! type; once it's supported
type Tail = ();
}
impl<A> TypeList for (A,) {
type Head = A;
type Tail = ();
}
impl<A, B> TypeList for (A, B) {
type Head = A;
type Tail = (B,);
}
// and so on...
Now, after seeing the first few cases, it's clear that `TypeList` implementations can be generalized and then generated using macros:
pub trait TypeList {
type Head;
type Tail: TypeList;
}
macro_rules! impl_type_list {
() => {
impl TypeList for () {
type Head = ();
type Tail = ();
}
};
($A:ident) => {
impl<$A> TypeList for ($A,) {
type Head = $A;
type Tail = ();
}
};
($A:ident, $($B:ident),*) => {
impl<$A, $($B),*> TypeList for ($A, $($B),*) {
type Head = $A;
type Tail = ($($B,)*);
}
};
}
macro_rules! smaller_tuples_too {
($m: ident, $ty: ident) => {
$m!{}
$m!{$ty}
};
($m: ident, $ty: ident, $($tt: ident),*) => {
smaller_tuples_too!{$m, $($tt),*}
$m!{$ty, $($tt),*}
};
}
// short for calling smaller_tuples_too with 64 arguments
macro_rules! all_supported_tuples {
($m: ident) => {
#[rustfmt::skip]
smaller_tuples_too!($m,
A, B, C, D, E, F, G, H, I, J,
K, L, M, N, O, P, Q, R, S, T,
U, V, W, X, Y, Z, AA, AB, AC, AD,
AE, AF, AG, AH, AI, AJ, AK, AL, AM, AN,
AO, AP, AQ, AR, AS, AT, AU, AV, AW, AX,
AY, AZ, BA, BB, BC, BD, BE, BF, BG, BH,
BI, BJ, BK, BL
);
};
}
Combining the above two macros allows us to generate 64 implementations of the trait with a single macro call:
all_supported_tuples!(impl_type_list);
And just like that we've built a mechanism for forward iteration. You can give it a spin in the playground (gist).
It's already noticeable how accessing the n-th element is inconvenient, because we need to cast the `Tail` type at each level. The casting is necessary because `Tail` could satisfy multiple different traits with identically named associated types.
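For example, with the impls generated above, reaching the third element of a list already looks like this (the alias names are just for illustration):

```rust
type List = (u8, bool, String);

type First = <List as TypeList>::Head; // u8
type Second = <<List as TypeList>::Tail as TypeList>::Head; // bool
type Third = <<<List as TypeList>::Tail as TypeList>::Tail as TypeList>::Head; // String
```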
This pain point can be alleviated, but I'll address it in a later post as the operations covered in this post don't require indexing. I just wanted to acknowledge and point it out for now.
Type Operations
In order to map types, we have to figure out how to express functions that operate on type values. Compiled languages have a strict hierarchy of type families which encode when the type/data is available and how much information about it is available at each execution stage:
| Name | Description | Type | Size |
|------|-------------|------|------|
| Kind | Structure information of a type | Syntax | Compiler internal |
| Type | Property information of a value | Inheritance and/or composition | Known |
| Trait | Behavior information | Inheritance and/or composition | None |
| Interface | Structure information | Inheritance and/or composition | Lower bound constraint |
| Partial Type (`dyn`) | Partial information of value properties | Partial (structure and/or interface) | Unknown |
| Constant | Immutable value known at compile time | Known | Known |
| Variable | Mutable value known at runtime | Known | Known |
| Pointer (`*T`) | Pointer to location of (first) constant/variable | Known | Known (if not a sequence) |
| Address (`*u8`/`usize`) | Memory address of (first) constant/variable | Erased | Unknown |
In many cases the compiler handles conversion automatically - for example, allowing a `const` function to be called with runtime arguments. For our types to stay in the type space, all conversion must happen inside it, because constant functions deal with discrete values while we want to manipulate types.
Rust allows handling type projection via type aliases much as C++ does. In Rust, however, one relies on traits instead of classes for the same functionality:
trait TypeSpaceFunction<T> {
// not yet valid syntax; see issue #29661
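// `Transformed` is a placeholder for whatever output type the function would produce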
type Value = (Transformed, T);
}
One notable difference is that in Rust, one can associate trait implementations with some other data which simplifies how conditionals are handled:
trait SplitFunction {
type Value;
}
impl<T> SplitFunction for MyType<T> {
type Value = T;
}
impl<T, const N: usize> SplitFunction for [T; N] {
type Value = T;
}
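For instance (assuming `MyType` is some wrapper type defined elsewhere), the array implementation above resolves like this:

```rust
// Projects the element type out of a fixed-size array type.
type Element = <[f32; 4] as SplitFunction>::Value; // f32
```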
In C++ template metaprogramming, ifs or filters have to be used in constant expressions to achieve similar results.
Generic Type Projection
With the former utilities in mind, we want to construct a way of mapping a type sequence `(A, B, C)` into another, more complex, type sequence. Usually, this can be done with a simple projection.
The projection operation we're making a producer for consumes the discriminant `I` and an input type, and produces an output type `O`.
The reason for using a discriminant is that it allows better code reuse. A minimal working version doesn't necessarily need it, but the resulting code would need to be specialized for each use case.
Our type function therefore takes the discriminant and an input type and produces a type - that is to say, it produces a type that depends on its input types. Applied across a whole sequence, we want to get something like:
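$$ (A, B, C) \;\mapsto\; \big(\operatorname{Apply}(I, A),\ \operatorname{Apply}(I, B),\ \operatorname{Apply}(I, C)\big) $$

(written informally, with $\operatorname{Apply}(I, T)$ standing for the type produced by applying the discriminant $I$ to the element type $T$)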
Apply Operator
We start with a simple application trait:
trait Apply {
type Value;
}
and declare the application discriminant ("function name"):
struct ToInputConnector<Model> {
_phantom: std::marker::PhantomData<Model>,
}
The discriminant type (`ToInputConnector`) will never be constructed. Using it allows the same `Apply` trait to be used for multiple different mappings, which makes it more ergonomic to import the `Apply` operation.
I'm using `PhantomData` in `ToInputConnector` because Rust requires the `Model` parameter to actually be used by the struct that declares it. Generally, any types that stay constant across all mapping invocations have to be placed into the `PhantomData` here.
Then method application can be specified (think of it as the body of the method):
impl<Model, T> Apply for (ToInputConnector<Model>, T)
where
Model: 'static,
T: 'static,
{
type Value = (&'static str, &'static dyn Handler<Model, T>);
}
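As a rough usage sketch (assuming some concrete `MyModel` type exists and the `Handler` trait comes from the surrounding example):

```rust
// "Calling" the type-space function: pair the discriminant with the argument
// type and project the associated `Value` out of the `Apply` implementation.
type Entry = <(ToInputConnector<MyModel>, u32) as Apply>::Value;
// Entry = (&'static str, &'static dyn Handler<MyModel, u32>)
```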
You'll notice I'm implementing the method for a tuple of the method and its arguments (in this case a single one); that's because there's no other way of passing compile-time information into `Apply`:
- If we tried storing `T` as a generic on `Apply`, we wouldn't be able to invoke it for different types within the same type list.
- Conversely, if we tried storing it on `ToInputConnector`, then our mapping wouldn't be agnostic with respect to the method it's applying.
To summarize, variable types are stored in the tuple we're calling `Apply` on, and non-variable types (such as `Model` in the example case) should go into the implementation discriminant struct (`ToInputConnector`).
Note that multiple variable arguments will change how you implement `TypeMapping` in the following section.
Apply-to-all
Now that the application part of the mapping is done with, all that's left is coupling together all the operations to apply the mapping to every type in the list and return a new list with the mapping results.
Sadly, this is a place where multiple implementations are necessary again (one for each tuple size). As always, macros come to the rescue:
pub trait TypeMapping<Method> {
type Value;
}
macro_rules! impl_mapping {
() => {
impl<Method> TypeMapping<Method> for ()
{
type Value = ();
}
};
($($A: ident),+) => {
impl<Method, $($A),+> TypeMapping<Method> for ($($A,)+)
where
$((Method, $A): Apply),+
{
type Value = (
$(<(Method, $A) as Apply>::Value,)+
);
}
};
}
all_supported_tuples!(impl_mapping);
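As a rough end-to-end sketch (again assuming a concrete `MyModel` type and the `Handler` trait from the earlier example):

```rust
// Maps every element of the input type list through the ToInputConnector "function".
type Mapped = <(u32, String) as TypeMapping<ToInputConnector<MyModel>>>::Value;
// Mapped = (
//     (&'static str, &'static dyn Handler<MyModel, u32>),
//     (&'static str, &'static dyn Handler<MyModel, String>),
// )
```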
`TypeMapping` implementations here will change somewhat if multiple variable types or lifetimes are needed.
You can try out a simplified version without the constant `M` type (for the model) on the playground (gist).
While writing this article, I also noticed that mapping with lifetimes doesn't seem to work, so I removed the system lifetime from the `ToInputConnector` signature. The compiler seems to fail at normalizing associated types on a trait that is implemented for a struct with a lifetime (minimal reproduction; rustc issue). For future reference (once the issue gets fixed), lifetimes go on the discriminant struct if they don't change across mapping applications, or alternatively, get paired with types in the tuples that `TypeMapping` is implemented for if they do change.
Next posts
We'll deal with the consumption of generated type signatures in a later post, as well as go over how we can write a getter for our type lists and bridge indices from runtime to compile time.