Rust assembly generation: Mapping a bool vector to an owned string vector

We have examined the generated assembly code for mapping a Rust vector into a string slice vector. In this article, we will examine the assembly code generated when mapping a Rust vector to an owned string vector.

Example code: Map a vector of bools to a vector of owned strings

/// Convert a vector of type A into a vector of type B. The user must provide a
/// closure that maps type A to type B. This is a generic function and
/// does not generate any assembly code.
pub fn convert<A,B> (v: Vec<A>, f: impl Fn(A) -> B) -> Vec<B> {
    v.into_iter().map(f).collect()
}

/// Convert a vector of bools into a vector of owned strings.
/// This function uses the convert generic function to perform the conversion.
/// This is a concrete function and generates assembly code.
pub fn convert_bool_vec_to_owned_string_vec(v: Vec<bool>) -> Vec<String> {
    convert(v, |n| (if n {"true"} else {"false"}).to_owned())
}
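
As a quick check of what the function computes, the following sketch repeats the two functions above so that it is self-contained and wraps a call in a helper. The helper name demo is hypothetical and not part of the original example.

```rust
/// Generic conversion, repeated from the example above.
pub fn convert<A, B>(v: Vec<A>, f: impl Fn(A) -> B) -> Vec<B> {
    v.into_iter().map(f).collect()
}

/// Concrete conversion, repeated from the example above.
pub fn convert_bool_vec_to_owned_string_vec(v: Vec<bool>) -> Vec<String> {
    convert(v, |n| (if n { "true" } else { "false" }).to_owned())
}

/// Hypothetical helper: converts a small input and returns the result.
pub fn demo() -> Vec<String> {
    // The input vector is moved into the function, which then owns it
    // and is responsible for freeing its heap allocation.
    convert_bool_vec_to_owned_string_vec(vec![true, false, true])
}
```

Each element of the result is a separately allocated String, which is why the generated code calls the allocator once per element.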

Visualizing the input and output vectors

Let's understand the input and output vectors of the convert_bool_vec_to_owned_string_vec function. This will aid in understanding the generated assembly code.

The input vector passed to the convert_bool_vec_to_owned_string_vec function is a vector of bools. As discussed in the vector iteration article, the memory organization of a vector is as follows:

Bool vector
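
The three parts of the vector that the assembly reads (data pointer, capacity, and length) can also be inspected from Rust. The sketch below is only illustrative. The function names are hypothetical, and Rust does not guarantee the order of the fields in memory; the offsets 0, 8, and 16 seen in the assembly are what this particular compiler version chose.

```rust
use std::mem::size_of;

/// Returns (data pointer, capacity, length) for a bool vector.
pub fn bool_vec_parts(v: &Vec<bool>) -> (*const bool, usize, usize) {
    (v.as_ptr(), v.capacity(), v.len())
}

/// Size in bytes of the vector header itself (three machine words,
/// i.e. 24 bytes on a 64-bit target).
pub fn vec_header_size() -> usize {
    size_of::<Vec<bool>>()
}

/// Each bool occupies one byte in the data array: 1 for true, 0 for false.
pub fn bool_bytes(v: &[bool]) -> Vec<u8> {
    v.iter().map(|&b| b as u8).collect()
}
```

Because each bool is a single byte, the data array of the input vector has an alignment of 1 and a size in bytes equal to its capacity.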

The output vector of the convert_bool_vec_to_owned_string_vec function is a vector of strings. The memory organization of this vector involves two levels of heap allocation. The first level of heap allocation is for the vector's array of String entries. The second level of heap allocation is for each string's byte array.

String vector (figure showing the Vec<String> and String layouts)
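
The two levels of allocation can be observed from Rust as well. The following sketch uses hypothetical function names and only illustrates the layout. It reports, for every String stored in the vector's array, the pointer to that string's own heap byte array together with its capacity and length.

```rust
use std::mem::size_of;

/// For each String in the slice, returns
/// (pointer to its heap byte array, capacity, length).
pub fn string_parts(v: &[String]) -> Vec<(*const u8, usize, usize)> {
    v.iter()
        .map(|s| (s.as_ptr(), s.capacity(), s.len()))
        .collect()
}

/// Size in bytes of one String entry in the vector's array
/// (three machine words, i.e. 24 bytes on a 64-bit target).
pub fn string_header_size() -> usize {
    size_of::<String>()
}
```

The 24-byte entry size is the element size that the generated code multiplies by the input length when it sizes the output vector.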

String vector generation overview

The following figure gives an overview of the generated assembly code for the convert_bool_vec_to_owned_string_vec function. A few key points to note here are:

Overflow and memory allocation failure handling for the output vector
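
The annotated assembly multiplies the input length by the element size and checks the overflow flag before calling __rust_alloc. A rough Rust equivalent is sketched below. It is not the standard library's actual code, and the function names are hypothetical.

```rust
use std::alloc::Layout;
use std::mem::size_of;

/// Hypothetical helper: number of bytes needed for `len` output elements,
/// or None if the multiplication overflows (the `jo` path in the assembly).
pub fn output_alloc_size(len: usize) -> Option<usize> {
    len.checked_mul(size_of::<String>())
}

/// The same request expressed as a Layout, which carries both the size
/// and the alignment that are passed to the allocator.
pub fn output_layout(len: usize) -> Option<Layout> {
    Layout::array::<String>(len).ok()
}
```

On a 64-bit target the element size is 24 bytes and the alignment is 8, matching the constants loaded into edx and rsi in the listing.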

Preparing the output vector with owned strings

Removing the if condition for checking the boolean value in the input vector

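The generated code removes the bool if condition from the loop body. This is achieved using two techniques, both visible in the annotated assembly: the length of the string is computed arithmetically as (n and 1) xor 5, which gives 4 for "true" and 5 for "false", and the address of the source string is selected with a conditional move (cmove) instead of a jump.

With these changes, the compiler eliminates the if condition.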
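
The xor with 5 and the conditional move seen in the annotated assembly can be mimicked in Rust. This is a sketch of the idea only, with hypothetical function names. The compiler performs the transformation itself, and whether source_str compiles to a conditional move depends on the compiler version and optimization level.

```rust
/// Length of the string for `n`, computed without a branch:
/// (n & 1) ^ 5 gives 4 for true ("true") and 5 for false ("false").
pub fn branchless_len(n: bool) -> usize {
    ((n as usize) & 1) ^ 5
}

/// Source text for `n`. In the listing this selection is done with
/// cmove rather than a jump.
pub fn source_str(n: bool) -> &'static str {
    if n { "true" } else { "false" }
}

/// Builds the owned string the way the loop body does: allocate
/// `branchless_len(n)` bytes, then copy that many bytes from the source.
pub fn to_owned_branchless(n: bool) -> String {
    let len = branchless_len(n);
    let mut s = String::with_capacity(len);
    s.push_str(&source_str(n)[..len]);
    s
}
```

The trick works because the two lengths, 4 (binary 100) and 5 (binary 101), differ only in the lowest bit.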

Cleaning up the input vector on exit from the function

The compiler generates a call to __rust_dealloc to free the input vector. Note that the function owns the input vector, so it is responsible for freeing the heap allocation for the input vector.
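
The size and alignment passed to __rust_dealloc for the input vector can be sketched as follows. The helper name is hypothetical; it only reports the values and does not free anything.

```rust
use std::alloc::Layout;

/// Hypothetical helper: the (size, alignment) pair describing the input
/// vector's heap buffer, or None when the capacity is 0 and nothing was
/// allocated.
pub fn input_buffer_layout(v: &Vec<bool>) -> Option<(usize, usize)> {
    if v.capacity() == 0 {
        // Mirrors the capacity check that skips the __rust_dealloc call.
        return None;
    }
    let layout = Layout::array::<bool>(v.capacity()).ok()?;
    Some((layout.size(), layout.align()))
}
```

For a bool vector the size equals the capacity and the alignment is 1, which is why the listing passes the capacity in rsi and the constant 1 in edx.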

Flow chart describing the generated assembly code

Preparing a string vector

Annotated assembly code for the convert_bool_vec_to_owned_string_vec function

The generated assembly code has been annotated to help understand the mapping from Rust code.

example::convert_bool_vec_to_string_vec:
        push    rbp     ; Save rbp to the stack
        push    r15     ; Save r15 to the stack
        push    r14     ; Save r14 to the stack
        push    r13     ; Save r13 to the stack
        push    r12     ; Save r12 to the stack
        push    rbx     ; Save rbx to the stack
        sub     rsp, 120    ; Reserve space on the stack for local variables
        mov     rbx, rdi    ; Set rbx to the address of the output vector (rdi points to the return value, rsi to the input vector)
        mov     rdi, qword ptr [rsi] ; rdi points to the input vector's heap allocated data array
        mov     rcx, qword ptr [rsi + 8]; rcx now contains the input vector's capacity
        mov     r13, qword ptr [rsi + 16]   ; r13 now contains the input vector's length
        lea     r12, [rdi + r13]    ; r12 points to the input vector's data array + length
                                    ; It points after the last element of the input vector
        mov     edx, 24 ; Set edx to 24 (size of each element in the output vector)
        xor     ebp, ebp ; Set ebp to 0
        mov     rax, r13 ; Set rax to the length of the input vector
        mul     rdx ; Set rax to rax * rdx (total memory to allocate for the output vector). The overflow flag is set on overflow
        mov     r15, rax ; Set r15 to the total memory to allocate for the output vector
        setno   al ; Set al to 1 if no overflow occurred
        mov     qword ptr [rsp + 56], rdi ; Store the input vector's data array address into the stack
        mov     qword ptr [rsp + 64], rcx ; Store the input vector's capacity into the stack
        mov     qword ptr [rsp], rdi    ; Store the input vector's data array address into the stack
        mov     qword ptr [rsp + 72], rdi ; Store the input vector's data array address into the stack
        mov     qword ptr [rsp + 80], r12 ; Store on stack: Pointer just after the last element of the input vector
        jo      .LBB4_1 ; Jump if overflow has occurred.
        mov     bpl, al ; Copy al to bpl (0 if overflow occurred, 1 otherwise)
        shl     rbp, 3; Multiply by 8 
        test    r15, r15; Check if the size of the output vector is 0
        je      .LBB4_4 ; Jump if the size is 0 and no memory allocation is required
        mov     rdi, r15 ; Set rdi to the size of the output vector
        mov     rsi, rbp ; Set rsi to 8 (byte alignment of the output vector)
        mov     r14, rcx ; Save rcx before the call to __rust_alloc
        call    qword ptr [rip + __rust_alloc@GOTPCREL]
        mov     rcx, r14 ; Restore rcx after the call to __rust_alloc
        mov     r14, rax ; Copy the address of the output vector's heap allocated data array to r14
        test    r14, r14 ; Check if the output vector's heap allocated data array is null
        je      .LBB4_7 ; Jump if the output vector's heap allocated data array is null
.LBB4_8:
        mov     qword ptr [rbx], r14 ; Store the output vector's heap allocated data array address into the output vector
        mov     qword ptr [rbx + 8], r13 ; Store the input vector's length as the output vector's capacity
        lea     rdx, [rbx + 16] ; Get the address of the output vector's length into rdx
        mov     qword ptr [rsp + 8], rbx ; Save the output vector's address into a local variable on the stack
        mov     qword ptr [rbx + 16], 0 ; Set the output vector's length to 0
        mov     rax, qword ptr [rsp]; Get the input vector's data array address from the stack
        mov     qword ptr [rsp + 88], rax ; Save the input vector's data array address into a local variable on the stack
        mov     qword ptr [rsp + 24], rcx ; Save the input vector's capacity into a local variable on the stack
        mov     qword ptr [rsp + 96], rcx ; Save the input vector's capacity into a local variable on the stack      
        mov     qword ptr [rsp + 112], r12 ; Store on stack: Pointer just after the last element of the input vector
        mov     qword ptr [rsp + 16], rdx ; Store a pointer to the output vector's length into a local variable on the stack
        mov     qword ptr [rsp + 40], rdx ; Store a pointer to the output vector's length into a local variable on the stack
        test    r13, r13 ; Check if the input vector's length is 0        
        je      .LBB4_13 ; Jump if the input vector's length is 0
        xor     r15d, r15d ; Set r15d to 0 - This is the index of the current element in the input vector
        
.LBB4_10:
        ; == Start of the loop ==
        mov     rax, qword ptr [rsp] ; Get the input vector's data array address from the stack
        movzx   r12d, byte ptr [rax + r15]; Get the current element from the input vector's data array
        mov     ebp, r12d ; Set ebp to the current element
        and     ebp, 1    ; Set ebp to ebp & 1 (0 if the current element is false, 1 otherwise)
        xor     rbp, 5  ; Set ebp to ebp xor 5 to get the correct length of the string representation of the current element 
                        ; 101 xor 001 = 100 = 4. This is the length of "true"
                        ; 101 xor 000 = 101 = 5. This is the length of "false"
        mov     esi, 1  ; Byte alignment is set to 1
        mov     rdi, rbp ; Length of the string representation defines the length of the string to allocate
        call    qword ptr [rip + __rust_alloc@GOTPCREL] ; Allocate memory for the string representation of the current element
                                ; The __rust_alloc returns the heap address in rax
        test    rax, rax ; Check if the memory allocation has failed (returns null)
        je      .LBB4_11        ; Jump if the memory allocation has failed
        mov     rbx, rax        ; Set rbx to the address of the heap allocated string
        test    r12b, 1         ; Check if the current element is true.
        lea     rsi, [rip + .L__unnamed_1] ; Get the address of the static string "true"
        lea     rax, [rip + .L__unnamed_2] ; Get the address of the static string "false"
        cmove   rsi, rax ; If the current element is false (zero flag set), replace rsi with the address of the static string "false"
                          ; Otherwise rsi keeps the address of the static string "true"
        mov     rdi, rbx  ; Set rdi to the address of the heap allocated string
        mov     rdx, rbp ; Set rdx to the length of the string representation of the current element
        call    qword ptr [rip + memcpy@GOTPCREL] ; Copy the string representation of the current element to the heap allocated string

        ; Copy the String into the output vector's data array. Copies three fields:
        ; 1. The heap allocated string address
        ; 2. The capacity of the string
        ; 3. The length of the string
        mov     qword ptr [r14], rbx; Store the heap allocated string address into the output vector's data array 
        mov     qword ptr [r14 + 8], rbp ; Store the capacity of the string into the output vector's data array
        mov     qword ptr [r14 + 16], rbp ; Store the length of the string into the output vector's data array
        add     r15, 1 ; Increment the index of the current element in the input vector
        add     r14, 24 ; Increment the address of the current element in the output vector's data array
        cmp     r13, r15 ; Check if the index of the current element in the input vector is equal to the length of the input vector
        jne     .LBB4_10 ; Continue to loop if the index of the current element in the input vector is not equal to the length of the input vector
        ; == End of the loop ==

.LBB4_13:
        mov     rax, qword ptr [rsp + 16] ; Get the address of the output vector's length from the stack
        mov     qword ptr [rax], r13 ; Store the output vector's length into the output vector's length field
        mov     rsi, qword ptr [rsp + 24] ; Get the input vector's capacity from the stack
        test    rsi, rsi ; Check if the input vector's capacity is 0
        je      .LBB4_15 ; Jump if the input vector's capacity is 0
        mov     edx, 1  ; Set edx to 1 (byte alignment of the input vector's data array)
        mov     rdi, qword ptr [rsp]; Get the input vector's data array address from the stack
        call    qword ptr [rip + __rust_dealloc@GOTPCREL] ; Deallocate the memory allocated for the input vector's data array
.LBB4_15:
        mov     rax, qword ptr [rsp + 8] ; Copy the output vector's address from the stack to rax
        add     rsp, 120 ; Free up the stack allocation for the local variables
        pop     rbx ; Restore rbx
        pop     r12 ; Restore r12
        pop     r13 ; Restore r13
        pop     r14 ; Restore r14
        pop     r15 ; Restore r15
        pop     rbp ; Restore rbp
        ret    ; Return from the function
.LBB4_4:
        ; 0 entries to process
        mov     r14, rbp ; rbp contains 8 at this point
        test    r14, r14 ; Check if rbp is 0
        jne     .LBB4_8  ; Jump if rbp is not 0 (which is the case here)
                         ; This is a jump back to the start and another 0 length check will be performed after this jump

.LBB4_7:
        ; Memory allocation failed for the output vector's data array
        mov     rdi, r15 ; Copy the allocation size request that failed
        mov     rsi, rbp ; Copy the byte alignment for the failed allocation
        call    qword ptr [rip + alloc::alloc::handle_alloc_error@GOTPCREL] ; Handle the memory allocation failure
        jmp     .LBB4_2 ; Jump to the code that will throw an exception

.LBB4_11:
        ; Memory allocation failed for the heap allocated string
        mov     rax, qword ptr [rsp] ; Get the input vector's data array address from the stack
        add     rax, r15 ; Add the index of the current element in the input vector to the input vector's data array address
        add     rax, 1 ; Add 1 to get the address of the next element in the input vector (the iterator's position)
        mov     qword ptr [rsp + 104], rax ; Store the iterator's position in the input vector into the stack
        mov     qword ptr [rsp + 32], r14 ; Store the current write position in the output vector's data array into the stack
        mov     qword ptr [rsp + 48], r15 ; Store the index of the current element in the input vector into the stack
        mov     esi, 1 ; Set the byte alignment to 1
        mov     rdi, rbp ; Set rdi to the size of the string allocation that failed
        call    qword ptr [rip + alloc::alloc::handle_alloc_error@GOTPCREL] ; Handle the memory allocation failure
        jmp     .LBB4_2 ; Jump to the code that will throw an exception
.LBB4_1:
        ; Memory allocation size computation resulted in an overflow
        call    qword ptr [rip + alloc::raw_vec::capacity_overflow@GOTPCREL] ; Handle the memory allocation overflow
.LBB4_2:
        ; Handle memory allocation failure (the code frees memory allocated to the input and output vectors)
        ud2     ; Trap (invalid opcode): the error handlers called above do not return, so execution should never reach here
        mov     rbx, rax ; Copy the exception pointer to rbx
        lea     rdi, [rsp + 56] ; Set rdi to the address of the map iterator stored on the stack (starts at rsp + 56)

        ; Call the drop_in_place function for the map iterator. It will free the input vector's data array
        call    core::ptr::drop_in_place<core::iter::adapters::map::Map<alloc::vec::into_iter::IntoIter<bool>,example::convert_bool_vec_to_string_vec::{{closure}}>> 
        jmp     .LBB4_19 ; Jump to the code that resumes unwinding

        ; The following code is not reached by falling through from the previous jump. It appears to be a separate unwind landing pad entered by the unwinder [TBC]
        ; == Unreachable code begin ==
        mov     rbx, rax 
        lea     rdi, [rsp + 32] ; Set rdi to the address of the map_fold closure state on the stack (the pointer to the output vector's length is at rsp + 40)

        ; Call the drop_in_place function for the map_fold closure. It sets the output vector's length and does not free any memory.
        call    core::ptr::drop_in_place<core::iter::adapters::map::map_fold<bool,alloc::string::String,(),example::convert_bool_vec_to_string_vec::{{closure}},core::iter::traits::iterator::Iterator::for_each::call<alloc::string::String,<alloc::vec::Vec<alloc::string::String> as alloc::vec::spec_extend::SpecExtend<alloc::string::String,core::iter::adapters::map::Map<alloc::vec::into_iter::IntoIter<bool>,example::convert_bool_vec_to_string_vec::{{closure}}>>>::spec_extend::{{closure}}>::{{closure}}>::{{closure}}> 
        lea     rdi, [rsp + 88] ; Set rdi to the address of the IntoIter stored on the stack (starts at rsp + 88)
        ; Call the drop_in_place to free the memory allocated to the input vector's data array
        call    core::ptr::drop_in_place<alloc::vec::into_iter::IntoIter<bool>>

        mov     rdi, qword ptr [rsp + 8]; Get the output vector's address from the stack. Save it to rdi.
        ; Call the drop_in_place function to drop the output vector's string and the array holding the strings
        call    core::ptr::drop_in_place<alloc::vec::Vec<alloc::string::String>> 
        ; == Unreachable code end ==
.LBB4_19:
        mov     rdi, rbx 
        call    _Unwind_Resume@PLT ; Resume the unwind process
        ud2    ; Trap (invalid opcode): _Unwind_Resume does not return
        call    qword ptr [rip + core::panicking::panic_no_unwind@GOTPCREL] ; Call the panic_no_unwind function
        ud2     ; Trap (invalid opcode): panic_no_unwind does not return

.L__unnamed_1:
        .ascii  "true"

.L__unnamed_2:
        .ascii  "false"

DW.ref.rust_eh_personality:
        .quad   rust_eh_personality


Utility code for the convert_bool_vec_to_string_vec function

We have looked at the main code. The compiler also generates utility functions for freeing memory in error scenarios.

De-allocate a string vector

This function iterates through the vector and frees each string's buffer. Once the iteration is complete, it also de-allocates the array that held the now-deleted strings.

; rdi contains the address of the output vector.

core::ptr::drop_in_place<alloc::vec::Vec<alloc::string::String>>:
        push    r15             ; Save r15 to stack
        push    r14             ; Save r14 to stack
        push    r13             ; Save r13 to stack
        push    r12             ; Save r12 to stack
        push    rbx             ; Save rbx to stack
        mov     r14, rdi        ; rdi contains the address of the vector that needs to be destroyed.
        mov     rax, qword ptr [rdi + 16]   ; rax contains the size of the vector.
        test    rax, rax                    ; Check if vector size is 0. 
        je      .LBB3_5                     ; Skip de-allocation if the size is zero.
        mov     r12, qword ptr [r14]        ; r12 contains the pointer to the first element of the vector.
        shl     rax, 3                      ; rax = length * 8.
        lea     r15, [rax + 2*rax]          ; r15 = 3 * rax = length * 24, the size of the vector array in bytes.
        xor     ebx, ebx                    ; set ebx to 0. This is used to calculate the byte index of the vector array.
        mov     r13, qword ptr [rip + __rust_dealloc@GOTPCREL] ; r13 contains the pointer to the de-allocation function.
                                            ; The address of the function is saved in a register to avoid fetching it 
                                            ; repeatedly in a loop.
        jmp     .LBB3_2                     ; Jump to the de-allocation loop.
.LBB3_4:
        add     rbx, 24   ; Add 24 to rbx. This is the size of the vector array element.
        cmp     r15, rbx  ; Check if we have reached the end of the vector array.
        je      .LBB3_5   ; Reached the end of the vector array. Break out from the loop.
.LBB3_2:
        mov     rsi, qword ptr [r12 + rbx + 8]  ; Get the capacity of the string buffer that needs to be de-allocated.
        test    rsi, rsi        ; Check if the capacity is zero.
        je      .LBB3_4         ; Skip de-allocation if the capacity is zero.
        mov     rdi, qword ptr [r12 + rbx] ; Get the pointer to the string buffer that needs to be de-allocated.
        mov     edx, 1          ; Set the byte alignment to 1 byte.
        call    r13             ; Call the de-allocation function.
        jmp     .LBB3_4         ; Continue the loop.
.LBB3_5:
        mov     rax, qword ptr [r14 + 8] ; Get the capacity of the vector array.
        test    rax, rax        ; Check if the capacity is zero.
        je      .LBB3_7         ; Skip de-allocation if the capacity is zero.
        mov     ecx, 24         ; Set the size of the individual vector array elements to 24 bytes.
        mul     rcx             ; Multiply the size of the individual vector array elements by the capacity.
        test    rax, rax        ; Check if the result is zero.
        je      .LBB3_7         ; Skip de-allocation if the result is zero.
        mov     rdi, qword ptr [r14]    ; Get the pointer to the vector array.
        mov     edx, 8          ; Set the byte alignment to 8 bytes.
        mov     rsi, rax        ; Set the size of the vector array to rax.
        pop     rbx             ; Restore rbx from stack.
        pop     r12             ; Restore r12 from stack.
        pop     r13             ; Restore r13 from stack.
        pop     r14             ; Restore r14 from stack.
        pop     r15             ; Restore r15 from stack.
        jmp     qword ptr [rip + __rust_dealloc@GOTPCREL] ; Tail call optimized de-allocation function.
.LBB3_7:
        pop     rbx             ; Restore rbx from stack.
        pop     r12             ; Restore r12 from stack.
        pop     r13             ; Restore r13 from stack.
        pop     r14             ; Restore r14 from stack.
        pop     r15             ; Restore r15 from stack.
        ret     ; Return from drop_in_place<alloc::vec::Vec<alloc::string::String>>.
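
The control flow of the function above can be modeled in Rust. The sketch below is a hypothetical model, not the real drop implementation. It frees nothing and only lists the deallocation requests that would be made, as (size, alignment) pairs, assuming the 64-bit layout used in the listing (24-byte String entries with 8-byte alignment).

```rust
/// One request to the deallocator: (size in bytes, alignment).
pub type Dealloc = (usize, usize);

/// Hypothetical model: lists the deallocation requests made for a vector
/// whose strings have the given capacities and whose own capacity is
/// `vec_cap`. No memory is actually freed here.
pub fn planned_deallocs(string_caps: &[usize], vec_cap: usize) -> Vec<Dealloc> {
    let mut calls = Vec::new();
    // Loop over the String entries: free each byte buffer whose
    // capacity is non-zero (alignment 1).
    for &cap in string_caps {
        if cap != 0 {
            calls.push((cap, 1));
        }
    }
    // Finally free the array of 24-byte String entries (alignment 8),
    // unless its size in bytes is zero.
    let bytes = vec_cap * 24;
    if bytes != 0 {
        calls.push((bytes, 8));
    }
    calls
}
```

For example, a vector with capacity 3 that holds strings with capacities 4, 0, and 5 leads to two string buffer requests followed by one 72-byte request for the array.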

Free the into_iter owned bool vector

Free the heap memory allocated for the into_iter owned bool vector.

core::ptr::drop_in_place<core::iter::adapters::map::Map<alloc::vec::into_iter::IntoIter<bool>,example::convert_bool_vec_to_string_vec::{{closure}}>>:
        mov     rsi, qword ptr [rdi + 8] ; Get the capacity of the input vector's data array.
        test    rsi, rsi       ; Check if the capacity is zero.
        je      .LBB0_1       ; Skip de-allocation if the capacity is zero.
        mov     rdi, qword ptr [rdi] ; Get the pointer to the input vector's data array.
        mov     edx, 1 ; Set the byte alignment to 1 byte.
        jmp     qword ptr [rip + __rust_dealloc@GOTPCREL] ; Tail call optimized de-allocation function.
.LBB0_1:
        ret

drop_in_place for iteration cleanup

This function is called to clean up after an exception during iteration. It sets the length of the output vector to the number of successfully completed iterations.

core::ptr::drop_in_place<core::iter::adapters::map::map_fold<bool,alloc::string::String,(),example::convert_bool_vec_to_string_vec::{{closure}},core::iter::traits::iterator::Iterator::for_each::call<alloc::string::String,<alloc::vec::Vec<alloc::string::String> as alloc::vec::spec_extend::SpecExtend<alloc::string::String,core::iter::adapters::map::Map<alloc::vec::into_iter::IntoIter<bool>,example::convert_bool_vec_to_string_vec::{{closure}}>>>::spec_extend::{{closure}}>::{{closure}}>::{{closure}}>:
        mov     rax, qword ptr [rdi + 8] ; Get a pointer to the length of the output vector. (rsp+40 for the caller)
        mov     rcx, qword ptr [rdi + 16] ; Get the successfully completed iterations. (rsp+48 for the caller)
        mov     qword ptr [rax], rcx ; Set the length of the output vector to the number of successfully completed iterations.
        ret 
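
The behavior of this function resembles a guard object that writes a counter back to the vector's length when it is dropped. The following is a minimal sketch of that pattern. The names LenGuard and run_iterations are hypothetical and are not taken from the standard library.

```rust
/// Hypothetical guard: holds a reference to a length field and the number
/// of completed iterations, and writes the count back when dropped.
pub struct LenGuard<'a> {
    len: &'a mut usize,
    completed: usize,
}

impl<'a> LenGuard<'a> {
    pub fn new(len: &'a mut usize) -> Self {
        let completed = *len;
        LenGuard { len, completed }
    }

    /// Record one more successfully completed iteration.
    pub fn increment(&mut self) {
        self.completed += 1;
    }
}

impl Drop for LenGuard<'_> {
    fn drop(&mut self) {
        // Same effect as the three instructions above: store the number
        // of completed iterations into the length field.
        *self.len = self.completed;
    }
}

/// Runs `n` iterations under the guard and returns the resulting length.
pub fn run_iterations(n: usize) -> usize {
    let mut len = 0usize;
    {
        let mut guard = LenGuard::new(&mut len);
        for _ in 0..n {
            guard.increment();
        }
    } // The guard is dropped here and `len` is updated.
    len
}
```

Because the write-back happens in drop, it also runs during unwinding, so the output vector never reports more elements than were actually written.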

drop_in_place for freeing the input vector

Free the heap memory allocated for the input vector.

core::ptr::drop_in_place<alloc::vec::into_iter::IntoIter<bool>>:
        mov     rsi, qword ptr [rdi + 8] ; Get the capacity of the input vector's data array.
        test    rsi, rsi      ; Check if the capacity is zero.
        je      .LBB2_1      ; Skip de-allocation if the capacity is zero.
        mov     rdi, qword ptr [rdi] ; Get the pointer to the input vector's data array.
        mov     edx, 1 ; Set the byte alignment to 1 byte.
        jmp     qword ptr [rip + __rust_dealloc@GOTPCREL] ; Tail call optimized de-allocation function.
.LBB2_1:
        ret