Rust assembly generation: Mapping a bool vector to an owned string vector

A previous article examined the assembly code generated for mapping a Rust vector to a vector of string slices. This article examines the assembly code generated when mapping a Rust vector to a vector of owned strings.

Example code: Map a vector of bools to a vector of owned strings

/// Convert a vector of type A into a vector of type B. The user must provide a
/// closure that maps type A to type B. This is a generic function and
/// does not generate any assembly code.
pub fn convert<A,B> (v: Vec<A>, f: impl Fn(A) -> B) -> Vec<B> {
    v.into_iter().map(f).collect()
}

/// Convert a vector of bools into a vector of owned strings.
/// This function uses the convert generic function to perform the conversion.
/// This is a concrete function and generates assembly code.
pub fn convert_bool_vec_to_owned_string_vec(v: Vec<bool>) -> Vec<String> {
    convert(v, |n| (if n {"true"} else {"false"}).to_owned())
}

Visualizing the input and output vectors

Let's look at the input and output vectors of the convert_bool_vec_to_owned_string_vec function. This will help us understand the generated assembly code.

The input vector passed to the convert_bool_vec_to_owned_string_vec function is a vector of bools. As discussed in the vector iteration article, the memory organization of this vector is shown below:

Bool vector

The output vector of the convert_bool_vec_to_owned_string_vec function is a vector of strings. The memory organization of this vector involves two levels of heap allocation. The first level is the vector's element array. The second level is each string's byte array.

Vec<String>

String

String vector
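The two levels of heap allocation described above can be observed from Rust itself. The following is a minimal sketch, assuming a 64-bit target; heap_addresses is a hypothetical helper introduced only for this illustration. It reports the address of the vector's element array and the address of each string's byte buffer.

```rust
use std::mem::size_of;

/// Hypothetical helper: returns the address of the vector's element array
/// and the address of every string's byte buffer.
fn heap_addresses(v: &[String]) -> (usize, Vec<usize>) {
    let array = v.as_ptr() as usize;
    let buffers = v.iter().map(|s| s.as_ptr() as usize).collect();
    (array, buffers)
}

fn main() {
    let v: Vec<String> = vec!["true".to_owned(), "false".to_owned()];

    // A String is three machine words: pointer, capacity and length.
    // On a 64-bit target that is 24 bytes per element of the array.
    println!("size_of::<String>() = {}", size_of::<String>());

    let (array, buffers) = heap_addresses(&v);
    println!("element array (first level) at {array:#x}");
    for (s, addr) in v.iter().zip(&buffers) {
        // Each string owns a separate allocation (second level).
        println!("{s:?}: byte buffer at {addr:#x}, len {}, capacity {}", s.len(), s.capacity());
    }
}
```

The addresses differ from run to run; the point is only that the element array and the byte buffers are separate allocations.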

String vector generation overview

The following figure gives an overview of the generated assembly code for the convert_bool_vec_to_owned_string_vec function. A few key points to note here are:

Overflow and memory allocation failure handling for the output vector

Preparing the output vector with owned strings

Removing the if condition for checking the boolean value in the input vector

The generated code removes the bool if condition from the loop body. This is achieved using the following techniques:

With these changes the compiler eliminates the if condition.
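Going by the annotated assembly in this article, two of the ingredients appear to be computing the string length arithmetically (the bool xor 5) and selecting the source string without a branch (a conditional move). The sketch below mirrors that idea in Rust; the names STRINGS, bool_str_len and bool_str are hypothetical, and the table lookup merely stands in for the conditional move.

```rust
/// Lookup table indexed by the bool value (false = 0, true = 1).
const STRINGS: [&str; 2] = ["false", "true"];

/// Length of the string for `b`, computed without a branch:
/// 5 ^ 1 == 4 (length of "true"), 5 ^ 0 == 5 (length of "false").
fn bool_str_len(b: bool) -> usize {
    5 ^ usize::from(b)
}

/// Select the static string by index instead of by an if/else.
fn bool_str(b: bool) -> &'static str {
    STRINGS[usize::from(b)]
}

fn main() {
    for b in [true, false] {
        let s = bool_str(b);
        // The arithmetic length matches the real length of the selected string.
        assert_eq!(s.len(), bool_str_len(b));
        println!("{b}: {s:?} has length {}", bool_str_len(b));
    }
}
```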

Cleaning up the input vector on exit from the function

The compiler generates a call to __rust_dealloc to free the input vector. Note that the function owns the input vector, so it is responsible for freeing the heap allocation for the input vector.

Flow chart describing the generated assembly code

Preparing a string vector

Annotated assembly code for the bool to string conversion

The function receives a pointer to the input vector of bools in the rsi register and a pointer to the caller-provided slot for the returned vector in the rdi register. It allocates an output array with room for as many strings as the input vector has elements, then iterates over the input vector and converts each bool into an owned string. The resulting vector of owned strings is returned through the slot pointed to by rdi.
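The strategy in the previous paragraph can be written out as an explicit loop. This is a sketch of what the optimized code effectively does, not the code the compiler started from; convert_explicit is a hypothetical name used only here.

```rust
/// Hand-written equivalent of the iterator chain: reserve the output array
/// once, then allocate one owned string per input element.
fn convert_explicit(v: Vec<bool>) -> Vec<String> {
    // First-level allocation: room for v.len() String headers.
    let mut out: Vec<String> = Vec::with_capacity(v.len());
    for b in v {
        let s: &'static str = if b { "true" } else { "false" };
        // Second-level allocation: a byte buffer for this string.
        out.push(s.to_owned());
    }
    // The input vector was consumed by the loop and has been freed by now.
    out
}

fn main() {
    let out = convert_explicit(vec![true, false, true]);
    println!("{out:?}");
}
```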

The assembly code below is annotated with comments that explain each instruction.

example::convert_bool_vec_to_owned_string_vec:
        push    rbp  ; Save rbp to the stack
        push    r15  ; Save r15 to the stack
        push    r14  ; Save r14 to the stack
        push    r13  ; Save r13 to the stack
        push    r12  ; Save r12 to the stack
        push    rbx  ; Save rbx to the stack
        sub     rsp, 104  ; Allocate 104 bytes on the stack

        ; ⭐ Load the input vector's address and length into registers

        mov     rbp, rdi  ; Copy the output vector's address to rbp
        mov     rax, qword ptr [rsi] ; Get the input vector's address from the stack
        mov     rcx, qword ptr [rsi + 8] ; Get the input vector's capacity from the stack
        mov     r15, qword ptr [rsi + 16] ; Get the input vector's length from the stack
        lea     r14, [rax + r15] ; Compute the address past the last element in the input vector
        mov     qword ptr [rsp + 40], rax ; Store the input vector's address in the stack
        mov     qword ptr [rsp + 48], rcx ; Store the input vector's capacity in the stack
        mov     qword ptr [rsp + 8], rax  ; Store the input vector's address in the stack
        mov     qword ptr [rsp + 56], rax ; Store the input vector's address in the stack
        mov     qword ptr [rsp + 64], r14 ; Store the address past the last element in the input vector in the stack
        test    r15, r15 ; Check if the input vector's length is zero
        je      .LBB3_1 ; Jump to the code that will set the output vector's length to zero
        movabs  rax, 384307168202282326 ; Load the value 0x0555_5555_5555_5556 to rax
        xor     r13d, r13d ; Set r13 to zero

        ; ⭐ Compute the size of the output vector's data array
        ; The following code computes the size of the output vector's data array. The size is computed by multiplying 
        ; the string size (24 bytes) with the number of entries. If there is an overflow in the size computation,
        ; the compiler will generate a panic and drop the input vector. This is a very rare situation and most likely due
        ; to a bug in the length.

        cmp     r15, rax ; Compare the input vector's length with 0x0555_5555_5555_5556
        setb    al ; Set al to 1 if the input vector's length is less than 0x0555_5555_5555_5556 (no overflow)
        jae     .LBB3_3 ; If the input vector's length is greater than or equal to 0x0555_5555_5555_5556, jump to the code that reports a capacity overflow.

        ; 0️⃣ Check if the output vector's length is zero

        lea     rdx, [8*r15] ; Multiply the input vector's length by 8 and store it in rdx
        mov     r13b, al ; Copy al as it happens to contain 1 (due to the setb instruction).
        shl     r13, 3 ; Multiply r13 by 8 so that it contains 8 for the alignment of the output vector's data array
        lea     r12, [rdx + 2*rdx] ; Multiply rdx by 3 and store it in r12 (this is the size of the output vector's data array)
        test    r12, r12 ; Check if the size of the output vector's data array is zero
        je      .LBB3_6  ; If the size of the output vector's data array is zero, jump to the code that will set the output vector's length to zero

        ; 📦 Allocate memory for the output vector's data array
        ; The __rust_alloc function allocates memory for the output vector's data array.
        ; It takes two arguments: the allocation's size and the allocation's alignment.
        ; The function returns the address of the allocated memory in rax.

        mov     rdi, r12 ; Copy the size of the output vector's data array to rdi
        mov     rsi, r13 ; Copy the alignment of the output vector's data array to rsi
        mov     rbx, rcx ; Preserve rcx in rbx (it will be used later)
        call    qword ptr [rip + __rust_alloc@GOTPCREL] ; Call the __rust_alloc function to allocate memory for the output vector's data array
        ; The __rust_alloc function returns the address of the allocated memory in rax

        mov     rcx, rbx ; Restore the rcx value from rbx
        mov     rbx, rax ; Copy the address of the output vector's data array to rbx
        test    rbx, rbx ; Check if the address allocation has failed (NULL is returned)
        je      .LBB3_9 ; If the address allocation has failed, jump to the code that will throw an exception for allocation failure
.LBB3_10:

        ; ⭐ Initialize the output vector (data pointer, capacity, length = 0) and save the iterator state on the stack

        mov     qword ptr [rbp], rbx ; Store the address of the output vector's data array in the output vector
        mov     qword ptr [rbp + 8], r15 ; Store the input vector's length in the output vector's capacity
        lea     rax, [rbp + 16] ; Store the address of the output vector's length in rax
        mov     qword ptr [rsp + 16], rax ; Store the address of the output vector's length in the stack
        mov     qword ptr [rsp + 24], rbp ; Store the output vector's address in the stack
        mov     qword ptr [rbp + 16], 0 ; Set the output vector's length to zero
        mov     rax, qword ptr [rsp + 8] ; Get the input vector's address from the stack
        mov     qword ptr [rsp + 72], rax ; Store the input vector's address in the stack
        mov     qword ptr [rsp + 32], rcx ; Store the input vector's capacity in the stack
        mov     qword ptr [rsp + 80], rcx ; Store the input vector's capacity in the stack
        mov     qword ptr [rsp + 96], r14 ; Store the address past the last element in the input vector in the stack
        xor     ebp, ebp ; Set ebp to zero
.LBB3_11:

        ; ⭐ Copying loop begins here
        ; The following code is the loop that will copy the input vector's elements to the output vector's data array

        mov     rax, qword ptr [rsp + 8] ; Get the input vector's address from the stack
        movzx   r13d, byte ptr [rax + rbp] ; Get the next input vector's element 
        mov     r12d, r13d ; Copy the next input vector's element to r12d
        and     r12d, 1 ; Map the bool value to 0 or 1
        xor     r12, 5 ; Set r12 to r12 xor 5 to get the correct length of the string representation of the current element
                       ; 101 xor 001 = 100 (4). This is the length of "true" 
                       ; 101 xor 000 = 101 (5). This is the length of "false"
        mov     esi, 1 ; Byte alignment of the string buffer is set to 1.
        mov     rdi, r12 ; Copy the length of the string representation of the current element to rdi
        call    qword ptr [rip + __rust_alloc@GOTPCREL] ; Call the __rust_alloc function to allocate memory for the string buffer for the current element
        test    rax, rax ; Check if the address allocation has failed (NULL is returned)
        je      .LBB3_12 ; If the address allocation has failed, jump to the code that will throw an exception for allocation failure
        mov     r14, rax ; Copy the address of the string buffer to r14
        test    r13b, 1 ; Check if the current element is true
        lea     rsi, [rip + .L__unnamed_1] ; Get the address of the static string "true"
        lea     rax, [rip + .L__unnamed_2] ; Get the address of the static string "false"
        cmove   rsi, rax ; If the current element is false (zero flag set), replace rsi with the address of the static string "false"
                         ; Otherwise rsi keeps the address of the static string "true"
        mov     rdi, r14 ; Copy the address of the string buffer to rdi
        mov     rdx, r12 ; Copy the length of the string representation of the current element to rdx
        call    qword ptr [rip + memcpy@GOTPCREL] ; Call the memcpy function to copy the string representation of the current element to the string buffer
        mov     qword ptr [rbx], r14 ; Store the address of the string buffer in the output vector's data array
        mov     qword ptr [rbx + 8], r12 ; Store the capacity of the string representation of the current element in the output vector's data array
        mov     qword ptr [rbx + 16], r12 ; Store the length of the string representation of the current element in the output vector's data array
        inc     rbp ; Increment the input vector's index
        add     rbx, 24 ; Increment the output vector's byte index (each entry in the output vector's data array is 24 bytes long)
        cmp     r15, rbp ; Compare the input vector's length with the input vector's index
        jne     .LBB3_11 ; If the index has not reached the length yet, jump for the next iteration of the loop

        ; ⭐ Copying loop ends here

        mov     rbp, qword ptr [rsp + 24] ; Get the output vector's address from the stack
        mov     rcx, qword ptr [rsp + 32] ; Get the input vector's capacity from the stack
        mov     rax, qword ptr [rsp + 16] ; Get the address of the output vector's length from the stack
        mov     qword ptr [rax], r15 ; Store the input vector's length in the output vector's length
        test    rcx, rcx ; Check if the input vector's capacity is zero
        je      .LBB3_17 ; If the input vector's capacity is zero, there is nothing to free: skip the deallocation and jump to the function exit
        
.LBB3_16:

        ; ♻️ Deallocate the input vector's data array

        mov     rdx, rcx ; Copy the input vector's capacity to rdx
        not     rdx ; Flip the bits of the input vector's capacity
                    ; (A capacity never exceeds isize::MAX, so its most significant bit is 0.
                    ; After the flip, the most significant bit is 1)
        shr     rdx, 63 ; Shift the most significant bit down to the least significant bit, leaving rdx = 1
                        ; This sets the alignment to 1
        mov     rdi, qword ptr [rsp + 8] ; Get the input vector's address from the stack
        mov     rsi, rcx ; Copy the input vector's capacity to rsi
        call    qword ptr [rip + __rust_dealloc@GOTPCREL] ; Call the __rust_dealloc function to deallocate the input vector's data array
.LBB3_17:

        mov     rax, rbp ; Copy the output vector's address to rax
        add     rsp, 104 ; Deallocate the stack space that was allocated for temporary variables
        pop     rbx ; Restore the value of rbx
        pop     r12 ; Restore the value of r12
        pop     r13 ; Restore the value of r13
        pop     r14 ; Restore the value of r14
        pop     r15 ; Restore the value of r15
        pop     rbp ; Restore the value of rbp
        ret ; Return from the function
.LBB3_1:
        ; The following code will be executed if the input vector's length is zero
        mov     qword ptr [rbp], 8 ; Set the output vector's data pointer to 8 (a non-null, 8-byte aligned placeholder, as an empty vector owns no heap memory)
        lea     rax, [rbp + 16] ; Move the address of the output vector's length to rax
        xorps   xmm0, xmm0 ; Set xmm0 to 0
        movups  xmmword ptr [rbp + 8], xmm0 ; Vector xmm0 is used to set the output vector's capacity and length to 0
        mov     qword ptr [rax], r15 ; Store the input vector's length in the output vector's length
        test    rcx, rcx ; Check if the input vector's capacity is zero
        jne     .LBB3_16 ; If the input vector's capacity is not zero, jump to the code that will deallocate the input vector's data array
        jmp     .LBB3_17 ; If the input vector's capacity is zero, jump to exit the function

.LBB3_6:
        mov     rbx, r13 ; Copy 8 (the alignment) to be used as the placeholder address of the output vector's data array
        test    rbx, rbx ; Check if the address of the output vector's data array is zero
        jne     .LBB3_10 ; The address is not zero, so jump back to initialize the output vector

.LBB3_9:
; Output vector memory allocation has failed
        mov     rdi, r12 ; Copy the requested allocation size (in bytes) to rdi
        mov     rsi, r13 ; Copy the requested alignment (8) to rsi
        call    qword ptr [rip + alloc::alloc::handle_alloc_error@GOTPCREL] ; Call the handle_alloc_error function to handle the memory allocation failure
        jmp     .LBB3_4 ; Not expected to be reached: handle_alloc_error does not return, and .LBB3_4 traps with ud2

.LBB3_12:
        ; ❌ Memory allocation for the heap-allocated string buffer has failed
        mov     rax, qword ptr [rsp + 8] ; Get the input vector's address from the stack
        add     rax, rbp ; Add the input vector's index to the input vector's address
        inc     rax ; Advance to the address of the next unprocessed element
        mov     qword ptr [rsp + 88], rax ; Store the address of the next unprocessed element in the iterator state on the stack
        mov     esi, 1 ; Set the byte alignment to 1
        mov     rdi, r12 ; Copy the requested string buffer size to rdi
        call    qword ptr [rip + alloc::alloc::handle_alloc_error@GOTPCREL] ; Call the handle_alloc_error function to handle the memory allocation failure
        jmp     .LBB3_4

.LBB3_3:
        ; Handle the case where the output vector's size in bytes would overflow.
        call    qword ptr [rip + alloc::raw_vec::capacity_overflow@GOTPCREL] ; Call the capacity_overflow function to handle the overflow
.LBB3_4:
        ; ❌ Trap, followed by the unwinding landing pads (they free the memory allocated to the input and output vectors)
        ud2   ; Throw an invalid instruction exception (the diverging calls above never return here)
        mov     rbx, rax ; Landing pad: save the exception object pointer received in rax
        lea     rdi, [rsp + 40] ; Get the address of the saved input iterator state from the stack
        call    core::ptr::drop_in_place<core::iter::adapters::map::Map<alloc::vec::into_iter::IntoIter<bool>,example::convert_bool_vec_to_owned_string_vec::{{closure}}>> ; Call the drop_in_place function to drop the input vector
        jmp     .LBB3_19 ; Jump to the code that will resume unwinding
        mov     rbx, rax ; Landing pad: save the exception object pointer received in rax
        mov     rdi, qword ptr [rsp + 16] ; Get the address of the output vector's length from the stack
        mov     rsi, rbp ; Copy the number of completed iterations (the loop index) to rsi
        call    core::ptr::drop_in_place<core::iter::adapters::map::map_fold<bool,alloc::string::String,(),example::convert_bool_vec_to_owned_string_vec::{{closure}},core::iter::traits::iterator::Iterator::for_each::call<alloc::string::String,<alloc::vec::Vec<alloc::string::String> as alloc::vec::spec_extend::SpecExtend<alloc::string::String,core::iter::adapters::map::Map<alloc::vec::into_iter::IntoIter<bool>,example::convert_bool_vec_to_owned_string_vec::{{closure}}>>>::spec_extend::{{closure}}>::{{closure}}>::{{closure}}> ; Call the drop_in_place function to cleanup for_each's state
        lea     rdi, [rsp + 72] ; Get the address of the iterator state saved on the stack
        call    core::ptr::drop_in_place<core::iter::adapters::map::Map<alloc::vec::into_iter::IntoIter<bool>,example::convert_bool_vec_to_owned_string_vec::{{closure}}>> ; Call the drop_in_place function to drop the into_iter adapter
        mov     rdi, qword ptr [rsp + 24] ; Get the output vector's address from the stack
        call    core::ptr::drop_in_place<alloc::vec::Vec<alloc::string::String>> ; Call the drop_in_place function to drop the output vector
.LBB3_19:
        mov     rdi, rbx ; Copy the saved exception object pointer to rdi
        call    _Unwind_Resume@PLT ; Call the _Unwind_Resume function to resume unwinding
        ud2 ; Throw an invalid instruction exception
        call    qword ptr [rip + core::panicking::panic_no_unwind@GOTPCREL] ; Call the panic_no_unwind function to panic
        ud2 ; Throw an invalid instruction exception

.L__unnamed_1:
        .ascii  "true"

.L__unnamed_2:
        .ascii  "false"

DW.ref.rust_eh_personality:
        .quad   rust_eh_personality

Key takeaways

  1. Creating a vector of owned strings from a vector of booleans results in two tiers of memory allocations. The first tier is the allocation of the vector of strings, and the second tier is the allocation of individual strings.
  2. If any of these memory allocations fails, the program calls handle_alloc_error, which by default aborts the process.
  3. The program will panic with a capacity overflow if the output vector's size in bytes would exceed isize::MAX.
  4. When the program panics, the memory allocated to the input and output vectors will be freed.
  5. Considerable code is generated for error handling when allocations are involved. This has performance and memory implications. The additional checks may impact performance in the inner loop of a hot function. The additional code for cleaning up memory will add to code bloat.
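The constant 0x0555_5555_5555_5556 used by the overflow check can be reproduced in Rust. The sketch below assumes a 64-bit target, where a String occupies 24 bytes; first_overflowing_len is a hypothetical helper name. It uses std::alloc::Layout::array to confirm that this is the smallest element count whose array no longer fits in isize::MAX bytes.

```rust
use std::alloc::Layout;
use std::mem::size_of;

/// Hypothetical helper: smallest element count for which an array of
/// String values would need more than isize::MAX bytes.
fn first_overflowing_len() -> usize {
    isize::MAX as usize / size_of::<String>() + 1
}

fn main() {
    let limit = first_overflowing_len();
    // On a 64-bit target this is the constant loaded by the movabs instruction.
    println!("limit = {limit} ({limit:#x})");

    // One element fewer still produces a valid layout...
    assert!(Layout::array::<String>(limit - 1).is_ok());
    // ...while `limit` elements do not, which corresponds to the capacity overflow path.
    assert!(Layout::array::<String>(limit).is_err());
}
```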

Experiment with the Compiler Explorer

The code used in this article is shown in the Compiler Explorer window below. You can experiment with the Rust code in the left pane.

Add the following convert_bool_vec_to_emoji_vec function to the code.

pub fn convert_bool_vec_to_emoji_vec(v: Vec<bool>) -> Vec<char> {
    convert(v, |n| if n {'😀'} else {'😞'})
}

Memory cleanup code

We have looked at the main code. The compiler also generates utility functions for freeing memory in error scenarios.

Deallocate a string vector

This function iterates through the vector and frees each string's buffer. Once the iteration is complete, it also deallocates the array that held the now-freed strings.

The drop_in_place function is used to deallocate the memory for a Vec when it is no longer needed. It does this by deallocating the memory for each element of the Vec and then deallocating the memory for the Vec itself.

An interesting thing to note here is that the compiler uses tail call optimization to avoid the overhead of the last __rust_dealloc call. The code jumps to the last __rust_dealloc instead of making a call. The return from the __rust_dealloc function returns to the caller of the drop_in_place function.
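The deallocation decisions made by this function can be modeled in Rust. The sketch below does not free anything; it only counts how many __rust_dealloc calls the logic described above would make for a given vector. The function name dealloc_calls is hypothetical.

```rust
/// Hypothetical model of drop_in_place::<Vec<String>>: count the
/// deallocations that would be performed.
fn dealloc_calls(v: &Vec<String>) -> usize {
    // One deallocation per string that owns a buffer (capacity != 0).
    let string_buffers = v.iter().filter(|s| s.capacity() != 0).count();
    // One more deallocation for the element array, if the vector has one.
    let element_array = usize::from(v.capacity() != 0);
    string_buffers + element_array
}

fn main() {
    // String::new() does not allocate, so it is skipped by the loop.
    let v = vec!["true".to_owned(), String::new(), "false".to_owned()];
    println!("deallocations for {v:?}: {}", dealloc_calls(&v));

    let empty: Vec<String> = Vec::new();
    println!("deallocations for an empty vector: {}", dealloc_calls(&empty));
}
```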

; rdi contains the address of the output vector.

core::ptr::drop_in_place<alloc::vec::Vec<alloc::string::String>>:
        push    r15             ; Save r15 to stack
        push    r14             ; Save r14 to stack
        push    r13             ; Save r13 to stack
        push    r12             ; Save r12 to stack
        push    rbx             ; Save rbx to stack
        mov     r14, rdi        ; rdi contains the address of the vector that needs to be destroyed.
        mov     rax, qword ptr [rdi + 16]   ; rax contains the length of the vector (number of elements).
        test    rax, rax                    ; Check if the vector length is 0. 
        je      .LBB2_5                     ; Skip the per-string de-allocation loop if the length is zero.
        mov     r12, qword ptr [r14]        ; r12 contains the pointer to the first element of the vector.
        shl     rax, 3                      ; rax contains the vector length multiplied by 8.
        lea     r15, [rax + 2*rax]          ; r15 = 3 * rax = length * 24, the size of the initialized elements in bytes.
        xor     ebx, ebx                    ; set ebx to 0. This is used to calculate the byte index of the vector array.
        mov     r13, qword ptr [rip + __rust_dealloc@GOTPCREL] ; r13 contains the pointer to the de-allocation function.
                                            ; The address of the function is saved in a register to avoid fetching it 
                                            ; repeatedly in a loop.
        jmp     .LBB2_2                     ; Jump to the de-allocation loop.
.LBB2_4:
        add     rbx, 24   ; Add 24 to rbx. This is the size of one vector array element.
        cmp     r15, rbx  ; Check if we have reached the end of the vector array.
        je      .LBB2_5   ; Reached the end of the vector array. Break out from the loop.
.LBB2_2:
        mov     rsi, qword ptr [r12 + rbx + 8]  ; Get the capacity of the string buffer that needs to be deallocated.
        test    rsi, rsi        ; Check if the capacity is zero.
        je      .LBB2_4         ; Skip de-allocation if the capacity is zero.
        mov     rdi, qword ptr [r12 + rbx] ; Get the pointer to the string buffer that needs to be deallocated.
        mov     edx, 1          ; Set the byte alignment to 1 byte.
        call    r13             ; Call the de-allocation function.
        jmp     .LBB2_4         ; Continue the loop.
.LBB2_5:
        mov     rax, qword ptr [r14 + 8] ; Get the capacity of the vector array.
        test    rax, rax        ; Check if the capacity is zero.
        je      .LBB2_7         ; Skip de-allocation if the capacity is zero.
        mov     ecx, 24         ; Set the size of the individual vector array elements to 24 bytes.
        mul     rcx             ; Multiply the size of the individual vector array elements by the capacity.
        test    rax, rax        ; Check if the result is zero.
        je      .LBB2_7         ; Skip de-allocation if the result is zero.
        mov     rdi, qword ptr [r14]    ; Get the vector array pointer.
        mov     edx, 8          ; Set the byte alignment to 8 bytes.
        mov     rsi, rax        ; Set the size of the vector array to rax.
        pop     rbx             ; Restore rbx from stack.
        pop     r12             ; Restore r12 from stack.
        pop     r13             ; Restore r13 from stack.
        pop     r14             ; Restore r14 from stack.
        pop     r15             ; Restore r15 from stack.
        jmp     qword ptr [rip + __rust_dealloc@GOTPCREL] ; Tail call optimized de-allocation function.
.LBB2_7:
        pop     rbx             ; Restore rbx from stack.
        pop     r12             ; Restore r12 from stack.
        pop     r13             ; Restore r13 from stack.
        pop     r14             ; Restore r14 from stack.
        pop     r15             ; Restore r15 from stack.
        ret     ; Return from drop_in_place<alloc::vec::Vec<alloc::string::String>>.

Free the into_iter owned bool vector

The Map iterator is created from a vector of bools and a closure that maps bools to owned strings. The drop_in_place function is used to deallocate the memory for the Map iterator when it is no longer needed.

core::ptr::drop_in_place<core::iter::adapters::map::Map<alloc::vec::into_iter::IntoIter<bool>,example::convert_bool_vec_to_owned_string_vec::{{closure}}>>:
        mov     rsi, qword ptr [rdi + 8] ; Get the capacity of the vector.
        test    rsi, rsi       ; Check if the capacity is zero (no heap allocation).
        je      .LBB0_1       ; Skip de-allocation if there is no heap allocation.
        mov     rdi, qword ptr [rdi] ; Get the pointer to the input vector's data array.
        ; The following three lines are used to set the alignment to 1 byte before the call to the de-allocation function.
        mov     rdx, rsi ; Copy the input vector's capacity to rdx.
        not     rdx; Invert the bits of rdx. The objective is to set the MSB to 1.
        shr     rdx, 63; Shift the bits of rdx to the right by 63 bits. This sets rdx to 1.
        ; The following parameters are passed to the de-allocation function:
        ; rdi = pointer to the input vector's data array. rsi = capacity of the input vector in bytes. rdx = alignment.
        jmp     qword ptr [rip + __rust_dealloc@GOTPCREL] ; Tail call optimized de-allocation function.
.LBB0_1:
        ret
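The not/shr pair that produces the alignment argument can be checked in Rust. The sketch below assumes only what the annotations state: a valid capacity never has its most significant bit set, so inverting it and shifting right by the word width minus one always yields 1. The name align_from_capacity is hypothetical.

```rust
use std::mem::align_of;

/// Hypothetical re-creation of the `not` + `shr 63` sequence:
/// yields 1 whenever the most significant bit of `capacity` is 0.
fn align_from_capacity(capacity: usize) -> usize {
    (!capacity) >> (usize::BITS - 1)
}

fn main() {
    let v: Vec<bool> = Vec::with_capacity(16);
    // The computed value equals the alignment of bool, which is 1.
    println!(
        "computed alignment = {}, align_of::<bool>() = {}",
        align_from_capacity(v.capacity()),
        align_of::<bool>()
    );
}
```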

The following function is called during unwinding when a panic interrupts the iteration. It sets the output vector's length to the number of completed iterations.

core::ptr::drop_in_place<core::iter::adapters::map::map_fold<bool,alloc::string::String,(),example::convert_bool_vec_to_owned_string_vec::{{closure}},core::iter::traits::iterator::Iterator::for_each::call<alloc::string::String,<alloc::vec::Vec<alloc::string::String> as alloc::vec::spec_extend::SpecExtend<alloc::string::String,core::iter::adapters::map::Map<alloc::vec::into_iter::IntoIter<bool>,example::convert_bool_vec_to_owned_string_vec::{{closure}}>>>::spec_extend::{{closure}}>::{{closure}}>::{{closure}}>:
        mov     qword ptr [rdi], rsi ; Set the length of the output vector to the number of completed iterations.
        ret
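The two-instruction function above behaves like a drop guard that writes a locally tracked count back to the vector's length field. The following is a minimal sketch of that pattern; LenGuard and record_completed are hypothetical names for this illustration and are not taken from the generated code.

```rust
/// Hypothetical guard: tracks the number of completed iterations locally
/// and stores it into the real length field when dropped.
struct LenGuard<'a> {
    len: &'a mut usize,
    local_len: usize,
}

impl Drop for LenGuard<'_> {
    fn drop(&mut self) {
        // Runs on normal scope exit and also while unwinding from a panic.
        *self.len = self.local_len;
    }
}

/// Simulate `completed` successful iterations of the fill loop.
fn record_completed(len: &mut usize, completed: usize) {
    let mut guard = LenGuard { len, local_len: 0 };
    for _ in 0..completed {
        guard.local_len += 1;
    }
    // `guard` is dropped here, publishing the count.
}

fn main() {
    let mut len = 0usize;
    record_completed(&mut len, 3);
    println!("published length = {len}");
}
```

Because the length is published only by the guard, a panic in the middle of the loop leaves the output vector describing exactly the strings that were fully constructed, so dropping it frees those and nothing else.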