Rust Type Metaprogramming - Part 1

Updated: 2024-12-05

Published: 2024-01-04


Terminology
Proof carrying code
code that utilizes mechanisms provided by a compiler to demonstrate and verify its own correctness through type restrictions or requirements the compiler can understand.
Metaprogramming
a term used to describe development that involves writing proof carrying code executed at compile-time by the compiler in order to provide less verbose type interfaces.
Associated type projection
resolution of a type from relative type path.
Type path

a sequence of delimited type identifiers (with :: in Rust) whose resolution yields a type which is a child element of the initial type structure (e.g. FromStr::Err). Not to be confused with paths in topology.

Context

This article intoduces type metaprogramming with simple type list construction and mapping. Later articles will delve into other operations that will be necessary do consume produced type lists.

It will go over type transformations that are necessary in order to provide end-users with a type safe interface while reducing the area of library internals they're exposed to.

Rust metaprogramming is generally done with macros, and when people in Rust community say metaprogramming they usually refer to writing procedural macros.

  • Macros do carry proof because code generated by macros will produce compiler errors when type requirements aren't met or the produced syntax is invalid.
  • Macros however aren't ideal in some scenarios and allow end-users to violate semantic API requirements which can in turn cause undefined behavior or error messages which can't be easily controlled by API author.

Prior Art

After writing the inital version of this post I stumbled upon a conference talk "Type Theory for the Working Rustacean" by Dan Pittman, which associates the ideas behind this post with formal foundations in type theory.

Other than that, I've only managed to find posts that describe metaprogramming via macros in context of Rust:

Another approach I came across, in article "Type-directed metaprogramming in Rust" by Will Crichton, is using rustc API (rust compiler) directly in order to achieve similar goals as this post, but internal rustc API is unstable which makes this approach both harder to implement and more time consuming to maintain across versions.

  • I find it to be a great introduction to rust internals, HIR and AST representations.
  • It makes sense to go that route it if one is developing something akin to c2rust.

In general, type metaprogramming and macros/preprocessors strive to achieve similar goals but through different means. Type metaprogramming usually achieves greater level of integration with the language type system, at the cost of being somewhat harder to write and debug. Conversely, macros and preprocessors can (sometimes) provide much nicer error messages, are easier to reason about, but are much more verbose to write and don't integrate into the build process quite as nicely.

Tuples As Type Lists

Lists are used to store varying number of runtime values of the same compile-time type. Tuples are used to group differently typed values with a tradeoff of being fixed in size. Tuple size constraint as well as fixed typing can't be circumvented, as that would void soundness of the type system. That is, if one can achieve it, they've found a compiler or specification/implementation bug.

TypeType ConstraintSize Constraint
ArrayUniformCompile-time
ListUniformRuntime
TupleVaryingCompile-time

The fixed size is enforced directly by local type signatures, but it can also be inferred in languages that support one of specialization, templatization or generic types. In other words, we can use a variety of methods to evaluate final type elements - so long as we keep in mind that they will have to be inferred by the compiler at compile-time.

Rust doesn't have variable argument generics (yet) which is a well-trodden path of implementing this sort of functionality, but it's got one of the best macro systems which can be used instead.

In order to manipulate a type list or (more correctly) sequence, destructuring into head and tail elements is a necessity.

Using constructive type theory (or Martin-Löf type theory) a tuple, and thereby a TypeList is defined as:

head element is always the first element of the tuple:

Exceptionally, if a tuple is empty, then a head function doesn't exist. That is, it's not defined for empty tuples. In Rust this can be expressed by projecting into a never type ! - when it gets added; or by using some sentinel zero-variant enum with the same semantics:

rust
enum Never {}
1

tail element is always the tuple without it's first (head) element:

Unlike head, tail can be defined for empty tuples to return another empty tuple (similarly to how containers remain empty even if pop/remove returns None).

Type Sequence Trait

It's a good idea to start by defining a trait that can be used to specify some arbitrary type sequence. This will serve as an important stepping stone and make out later implementation simpler. It's equivalent to above type theory definition.

Due to lack of variable argument (vararg) generics, there must exist an upper bound to supported sequence length, but thanks to macros we can make it arbitrarily large. This is primarily a development inconvenience as vararg generics end up producing the same executable code.

We declare our type sequence trait with previously mentioned fundamental properties attached as we have limited use for an empty (marker) trait:

type_list.rs
pub trait TypeList {
    type Head;
    type Tail: TypeList;
}

impl TypeList for () {
    type Head = Never; // Should be ! type; once it's supported
    type Tail = ();
}
impl<A> TypeList for (A,) {
    type Head = A;
    type Tail = ();
}
impl<A, B> TypeList for (A, B) {
    type Head = A;
    type Tail = (B,);
}

// and so on...
12345678910111213141516171819

Now after seeing the first few cases, it's clearer that TypeList implementations can be generalized and then generated using macros:

type_list.rs
pub trait TypeList {
    type Head;
    type Tail: TypeList;
}

macro_rules! impl_type_list {
    () => {
        impl TypeList for () {
            type Head = ();
            type Tail = ();
        }
    };
    ($A:ident) => {
        impl<$A> TypeList for ($A,) {
            type Head = $A;
            type Tail = ();
        }
    };
    ($A:ident, $($B:ident),*) => {
        impl<$A, $($B),*> TypeList for ($A, $($B),*) {
            type Head = $A;
            type Tail = ($($B,)*);
        }
    };
}

macro_rules! smaller_tuples_too {
    ($m: ident, $ty: ident) => {
        $m!{}
        $m!{$ty}
    };
    ($m: ident, $ty: ident, $($tt: ident),*) => {
        smaller_tuples_too!{$m, $($tt),*}
        $m!{$ty, $($tt),*}
    };
}

// short for calling smaller_tuples_too with 64 arguments
macro_rules! all_supported_tuples {
    ($m: ident) => {
        #[rustfmt::skip]
        smaller_tuples_too!($m,
            A, B, C, D, E, F, G, H, I, J,
            K, L, M, N, O, P, Q, R, S, T,
            U, V, W, X, Y, Z, AA, AB, AC, AD,
            AE, AF, AG, AH, AI, AJ, AK, AL, AM, AN,
            AO, AP, AQ, AR, AS, AT, AU, AV, AW, AX,
            AY, AZ, BA, BB, BC, BD, BE, BF, BG, BH,
            BI, BJ, BK, BL
        );
    };
}
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152

Combining the above two macros allows us to generate 64 implementations of the trait with a single macro call:

type_list.rs
all_supported_tuples!(impl_type_list);
1

And just like that we've built a mechanism for forward iteration. You can give it a spin in the playground (gist).

It's already noticable how accessing the n-th element is inconvenient because we need to cast the Tail type at each level. The casting is neccessary because Tail could satisfy multiple different traits with identically named associated type.
This pain point can be alleviated, but I'll address it in a later post as operations covered in this post don't require indexing. I just wanted to acknowledge and point it out for now.

Type Operations

In order to map types, we have to figure out how to express functions that operate on type values. Compiled languages have a strict hierarchy of type families which encode when the type/data is available and how much information about it is available at each execution stage:

NameDescriptionTypeSize
KindStructure information of a typeSyntaxCompiler internal
TypeProperty information of a valueInheritance and/or compositionKnown
TraitBehavior informationInheritance and/or compositionNone
InterfaceStructure informationInheritance and/or compositionLower bound constraint
Partial Type
(dyn)
Partial information of value propertiesPartial (structure and/or interface)Unknown
ConstantImmutable value known at compile timeKnownKnown
VariableMutable value known at runtimeKnownKnown
Pointer
(*T)
Pointer to location of (first) constant/variableKnownKnown (if not sequence)
Address
(*u8/usize)
Memory address of (first) constant/variableErasedUnknown

In many cases the compiler handles conversion automatically - allowing a const function call with runtime arguments. So in order for our types to stay in the type space, all conversion must happen inside it. That's because constant functions deal with discrete values and we want to manipulate types.

Rust allows handling of type projection via type aliases much as C++ does. In Rust however, one relies on traits instead of classes for the same functionality:

examplerust
trait TypeSpaceFunction<T> {
    // not yet valid syntax; see issue #29661
    type Value = (Transformed, T);
}
1234

One notable difference is that in Rust, one can associate trait implementations with some other data which simplifies how conditionals are handled:

examplerust
trait SplitFunction {
    type Value;
}
impl<T> SplitFunction for MyType<T> {
    type Value = T;
}
impl<T> SplitFunction for [T; N] {
    type Value = T;
}
123456789

In C++ template metaprogramming, ifs or filters have to be used in constant expressions to achieve similar results.

Generic Type Projection

With former utilities in mind we want to construct a way of mapping a type sequence (A, B, C) into another, more complex, type sequence.

Usually, this can be done with a simple projection and it

Projection operation we're making a producer for consumes the discriminant , input reqirements . It is applied to a tuple of discriminant , input type I, and output type O, and produces a type

The reason for using a discriminant is that it allows better code reuse. A minimal working version doesn't neccessarily need it, but the resulting code would need to be specialized for each use case.

Our type function type looks like the following:

that is to say - it produces a type that depends on input types and .

We want to get something like:

Apply Operator

We start with a simple application trait:

mapping.rs
trait Apply {
    type Value;
}
123

and declare the application discriminant ("function name"):

model.rs
struct ToInputConnector<Model> {
    _phantom: std::marker::PhantomData<Model>,
}
123

The discriminant type (ToInputConnector) will never be constructed. Using it allows the same Apply trait to be used for multiple different mappings which makes it more ergonomic to import Apply operation.

I'm using Phantom in ToInputConnector because Rust will require Model to be used in the signature of Apply implementation. Generally, any types that stay constant across all mapping invocations have to be placed into the Phantom here.

Then method application can be specified (think of it as the body of the method):

model.rs
impl<Model, T> Apply for (ToInputConnector<Model>, T)
where
    Model: 'static,
    T: 'static,
{
    type Value = (&'static str, &'static dyn Handler<Model, T>);
}
1234567

You'll notice I'm implementing the method for a tuple of method and arguments (in this case a single one), that's because there's no other way of passing compile time information into Apply:

  • If we tried storing T as a generic on Apply, we wouldn't be able to invoke it for different types within the same type list.
  • Conversly, if we tried storing it on ToInputConnector, then our mapping wouldn't be agnostic with respect to method it's applying.

To summarize, variable types are stored in the tuple we're calling Apply on and non-variable types (such as Model in the example case) should go into the implementation discriminant struct (ToInputConnector).

Note that multiple variable arguments will change how you implement TypeMapping in the following section.

Apply-to-all

Now that application part of mapping is done with, all that's left is coupling together all the operations to apply the mapping for every type in the list and return a new list with mapping results.

Sadly, this is a place multiple implementations are necessary again (for each tuple size). As always, macros come to the rescue:

mapping.rs
pub trait TypeMapping<Method> {
    type Value;
}

macro_rules! impl_mapping {
    () => {
        impl<Method> TypeMapping<Method> for ()
        {
            type Value = ();
        }
    };
    ($($A: ident),+) => {
        impl<Method, $($A),+> TypeMapping<Method> for ($($A,)+)
        where
            $((Method, $A): Apply),+
        {
            type Value = (
                $(<(Method, $A) as Apply>::Value,)+
            );
        }
    };
}

all_supported_tuples!(impl_mapping);
123456789101112131415161718192021222324

TypeMapping implementations here will change somewhat if multiple variable types or lifetimes are needed.

You can try out a simplified version without the constant M type (for model) on the playground(gist).

While writing this article, I also noticed that mapping with lifetimes doesn't seem to work, so I removed the system lifetime from ToInputConnector signature. Compiler seems to fail at normalizing associated types on a trait that is implemented for struct with a lifetime (minimal reproduction; rustc issue). For future reference (once the issue get fixed), lifetimes go on the discriminant struct if they don't change for mapping applications, or alternatively, get paired with types in tuples the TypeMapping is implemented for if they do change.

Next posts

We'll deal with the consumption of generated type signatures in a later post, as well as going over how we can write a getter for our type lists and bridge indices from runtime to compile time.

Comments