Rust enum-match code generation

Matching an enum and associated fields

Enums in Rust are discriminated unions that hold exactly one of several variants at a time. The enum discriminant identifies which variant is currently stored in the union.

The following code shows a simple enum in Rust that represents a generalized Number that can be an Integer, a Float or Complex. Here Number is a container that can store a 64-bit integer (i64), a 64-bit floating-point number (f64), or a complex number (stored in a struct with two f64 fields).

Following the enum declaration, the code declares a function double that takes a Number parameter and returns a Number in which every field of the stored variant has been doubled. The match expression in Rust pattern-matches on the variant, binds its fields, and returns the same variant with the doubled values.

pub enum Number {
    Integer(i64),
    Float(f64),
    Complex { real: f64, imaginary: f64 },
}

pub fn double(num: Number) -> Number {
    match num {
        Number::Integer(n) => Number::Integer(n + n),
        Number::Float(n) => Number::Float(n + n),
        Number::Complex { real, imaginary } => Number::Complex {
            real: real + real,
            imaginary: imaginary + imaginary,
        },
    }
}
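As a quick check of the behavior described above, here is a minimal usage sketch. The derive attributes and the main function are additions made for this illustration only (they let the results be compared and printed); the enum and double are otherwise the same as in the listing.

```rust
// Same enum as in the listing above. The derives are an addition for this
// sketch so that values can be compared with assert_eq! and printed.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Number {
    Integer(i64),
    Float(f64),
    Complex { real: f64, imaginary: f64 },
}

pub fn double(num: Number) -> Number {
    match num {
        Number::Integer(n) => Number::Integer(n + n),
        Number::Float(n) => Number::Float(n + n),
        Number::Complex { real, imaginary } => Number::Complex {
            real: real + real,
            imaginary: imaginary + imaginary,
        },
    }
}

fn main() {
    // Each call returns the same variant it was given, with doubled fields.
    assert_eq!(double(Number::Integer(21)), Number::Integer(42));
    assert_eq!(double(Number::Float(1.5)), Number::Float(3.0));
    assert_eq!(
        double(Number::Complex { real: 1.0, imaginary: -2.0 }),
        Number::Complex { real: 2.0, imaginary: -4.0 }
    );
    println!("{:?}", double(Number::Integer(21)));
}
```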

Memory layout of a Rust enum

Before we proceed any further, let's look at the enum organization in memory. The size of the enum depends upon the largest variant. In this example, a Number::Complex requires two 64-bit floats. The total memory needed for the variant is 16 bytes. The size of the enum is 24 bytes. The extra 8 bytes are used to store a 64-bit discriminant to identify the variant currently saved in the enum.

Byte offset   Integer            Float              Complex
0             Discriminant (0)   Discriminant (1)   Discriminant (2)
8             i64                f64                f64
16            -                  -                  f64

Notes:

  • The Rust language does not specify the memory layout of enums that are not tagged with an explicit representation such as #[repr(C)]. The Rust compiler can choose the memory layout it deems most efficient.
  • A 64-bit discriminant might seem wasteful here. Because the 64-bit fields must be 8-byte aligned, a smaller discriminant would be followed by padding and would not save any memory. Rust switches to a smaller discriminant when doing so actually reduces the size of the enum, for example when the variant fields are themselves smaller.
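The size and alignment discussed above can be inspected from Rust itself. The following is a minimal sketch using std::mem::size_of, std::mem::align_of, and std::mem::discriminant. Since the default layout is unspecified, the printed numbers are whatever the compiler picked for the target; on x86-64 one would expect them to agree with the 24-byte layout in the table, but the sketch does not rely on that.

```rust
use std::mem::{align_of, discriminant, size_of};

// Same enum as in the post; dead_code is allowed because this sketch
// does not construct every variant or read the fields.
#[allow(dead_code)]
pub enum Number {
    Integer(i64),
    Float(f64),
    Complex { real: f64, imaginary: f64 },
}

fn main() {
    // Report the layout the compiler chose for this target.
    println!("size_of::<Number>()  = {} bytes", size_of::<Number>());
    println!("align_of::<Number>() = {} bytes", align_of::<Number>());

    // std::mem::discriminant exposes the tag as an opaque value that can
    // be compared, without revealing its width or numeric value.
    let a = Number::Integer(1);
    let b = Number::Integer(2);
    let c = Number::Float(1.0);
    assert!(discriminant(&a) == discriminant(&b)); // same variant
    assert!(discriminant(&a) != discriminant(&c)); // different variants
}
```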

Overview of the generated code

Before we dig deep into the assembly, let's get an overview of the generated code via the following flowchart. The code branches based on the enum discriminant and handles each variant separately. The doubled fields and the discriminant value are written to the return-value memory whose address the caller provides.

Assembly code overview enum matching

Deep dive into the generated code

Now that we understand the generated code's basic flow, let's analyze the assembly code. The following graph shows the generated assembly's branching structure. The top and middle-right boxes check the discriminant and branch to the appropriate variant handling code (the three leaf boxes).

Branching structure of the generated assembly

Let's now look at each line of the generated assembly. We have annotated the assembly code to help us understand it. The generated code looks at the discriminant and then accesses the fields corresponding to selected variants. The code then doubles the individual fields associated with the variant. The function returns the enum with doubled values. The function also copies the discriminant field to the enum being returned.

; The caller passes the following parameters:
;   🔡 rsi: Address of the enum
;   🔡 rdi: Address of the enum to be returned. 

example::double:
        mov     rax, rdi                   ; rax now contains the address to the return value
        mov     rcx, qword ptr [rsi]       ; Extract the union discriminant
        test    rcx, rcx                   ; Check if the discriminant is 0 (Number::Integer)
        je      .LBB0_5                    ; Jump if the discriminant is 0.
        cmp     ecx, 1                     ; Check if the discriminant is 1 (Number::Float).
        jne     .LBB0_3                    ; Jump if the discriminant is 2 (Number::Complex)

        ; ⭐ Number::Float match processing (discriminant is 1)
        movsd   xmm0, qword ptr [rsi + 8]  ; Move the floating point number in xmm0
        addsd   xmm0, xmm0                 ; Double the number
        movsd   qword ptr [rax + 8], xmm0  ; Save the value in the return value
        mov     qword ptr [rax], rcx       ; Copy the discriminant into the return value
        ret
.LBB0_5:
        ; ⭐ Number::Integer match processing (discriminant is 0)
        mov     rdx, qword ptr [rsi + 8]    ; Move the integer
        add     rdx, rdx                    ; Double the number
        mov     qword ptr [rax + 8], rdx    ; Write the number to the return value
        mov     qword ptr [rax], rcx        ; Write the discriminant to the return value
        ret
.LBB0_3:
        ; ⭐ Number::Complex match processing (discriminant is 2)
        ; The following code performs vector operations on the real and imaginary parts.
        ; The 64-bit real and imaginary parts are processed in parallel
        ; in the xmm0 register.

        movupd  xmm0, xmmword ptr [rsi + 8] ; Read the real and imaginary parts into xmm0
        addpd   xmm0, xmm0                  ; Double the real and imaginary parts 
        movupd  xmmword ptr [rax + 8], xmm0 ; Update the real and imaginary parts
        mov     qword ptr [rax], rcx        ; Set the discriminant in the return value
        ret
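
The assembly above can be paraphrased in Rust as a hand-written tagged union. This is only a sketch to make the load, double, and store steps easier to follow: the names RawNumber, Payload, ComplexParts, and double_raw are hypothetical, and #[repr(C)] is used to pin the tag at offset 0 and the payload at offset 8, which the default enum representation does not guarantee. wrapping_add is used for the integer case to mirror the plain add instruction.

```rust
// Hypothetical names throughout; this is not what rustc generates, just a
// manual rendering of the layout and control flow shown in the assembly.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct ComplexParts {
    real: f64,
    imaginary: f64,
}

#[repr(C)]
pub union Payload {
    integer: i64,
    float: f64,
    complex: ComplexParts,
}

#[repr(C)]
pub struct RawNumber {
    tag: u64,         // offset 0: 0 = Integer, 1 = Float, 2 = Complex
    payload: Payload, // offset 8: 16 bytes, sized for the largest variant
}

/// Doubles `src` into `out`, like the generated code writing through rdi.
///
/// # Safety
/// `src.tag` must name the payload field that was last written.
pub unsafe fn double_raw(src: &RawNumber, out: &mut RawNumber) {
    unsafe {
        match src.tag {
            // Integer: load, add to itself, store.
            0 => out.payload.integer = src.payload.integer.wrapping_add(src.payload.integer),
            // Float: same steps with a scalar floating-point add.
            1 => out.payload.float = src.payload.float + src.payload.float,
            // Complex: both halves are doubled (one packed add in the assembly).
            _ => {
                let c = src.payload.complex;
                out.payload.complex = ComplexParts {
                    real: c.real + c.real,
                    imaginary: c.imaginary + c.imaginary,
                };
            }
        }
    }
    // Every branch ends by copying the discriminant to the return value.
    out.tag = src.tag;
}

fn main() {
    let src = RawNumber {
        tag: 2,
        payload: Payload { complex: ComplexParts { real: 1.5, imaginary: -2.0 } },
    };
    let mut out = RawNumber { tag: 0, payload: Payload { integer: 0 } };
    // SAFETY: tag 2 matches the `complex` field written above.
    unsafe { double_raw(&src, &mut out) };
    let c = unsafe { out.payload.complex };
    assert_eq!(out.tag, 2);
    assert_eq!((c.real, c.imaginary), (3.0, -4.0));
}
```

Unlike this sketch, the real enum needs no unsafe code: the compiler guarantees that the discriminant and the stored fields always agree.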

Key takeaways

Experiment in the Compiler Explorer

Experiment with the code in this post in the Compiler Explorer. For example, change the Number enum to 32-bit types, as shown below. With this change, the enum discriminant field also shrinks to 32 bits.

pub enum Number {
    Integer(i32),
    Float(f32),
    Complex { real: f32, imaginary: f32 },
}
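
To see the effect of this change without reading the assembly, the size of the 32-bit version can be printed with std::mem::size_of. This is a small sketch under the same caveat as before: the layout is unspecified, so the output is whatever the compiler chose. A 32-bit discriminant followed by two 32-bit fields would add up to 12 bytes.

```rust
use std::mem::{align_of, size_of};

// The 32-bit variant of the enum from the experiment above. dead_code is
// allowed because this sketch only inspects the type's layout.
#[allow(dead_code)]
pub enum Number {
    Integer(i32),
    Float(f32),
    Complex { real: f32, imaginary: f32 },
}

fn main() {
    // The largest variant needs 8 bytes; anything beyond that is the
    // discriminant plus any padding the compiler inserted.
    println!("size_of::<Number>()  = {} bytes", size_of::<Number>());
    println!("align_of::<Number>() = {} bytes", align_of::<Number>());
}
```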