Nested async/await in Rust: Desugaring and assembly

This article continues the previous article on the desugaring and assembly of async/await in Rust. In this article, we will look at the desugaring and assembly code of nested async/await in Rust. We will also examine how infinite loops are handled in async functions.

Rust Under the Hood

A deep dive into Rust internals and generated assembly

Discover the inner workings of Rust. Dive deep into Rust's core mechanics, compiler optimizations, SIMD, and memory management.

The `patrol` function

Our primary focus will be on desugaring the patrol function, which is a simple function that moves a unit between two positions. The patrol function is defined as follows:

async fn patrol(unit: UnitRef, poses: [i32; 2]) {
    loop {
        goto(unit.clone(), poses[0]).await;
        goto(unit.clone(), poses[1]).await;
    }
}

`patrol` and `goto` functions

Here is the complete code for the patrol and goto functions and the Unit struct. We looked at the goto function in the previous async/await article. Here, we will explore the patrol function in detail.

#[derive(Default)]
struct Unit {
    /// The 1-D position of the unit. The unit can only move along this axis.
    pub pos: i32,
}
type UnitRef = Rc<RefCell<Unit>>;

/// A future that will move the unit towards `target_pos` at each step,
/// and complete when the unit has reached that position.
struct UnitGotoFuture {
    unit: UnitRef,
    target_pos: i32,
}
impl Future for UnitGotoFuture {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _cx: &mut Context) -> Poll<Self::Output> {
        let unit_pos = self.unit.borrow().pos;
        if unit_pos == self.target_pos {
            Poll::Ready(())
        } else {
            self.unit.borrow_mut().pos += (self.target_pos - unit_pos).signum();
            Poll::Pending
        }
    }
}

/// Helper async function to write unit behavior nicely
async fn goto(unit: UnitRef, pos: i32) {
    UnitGotoFuture {
        unit,
        target_pos: pos,
    }
    .await;
}

/// Let a unit go back and forth between two positions
async fn patrol(unit: UnitRef, poses: [i32; 2]) {
    loop {
        goto(unit.clone(), poses[0]).await;
        goto(unit.clone(), poses[1]).await;
    }
}

The above Rust code defines a Unit struct containing a single field, pos, which represents the unit's position.

The code also defines a UnitGotoFuture struct, which represents a future that moves a Unit towards a target position at each step and completes when the unit has reached that position. The UnitGotoFuture struct implements the Future trait, and its poll method moves the unit closer to the target_pos position by adjusting the pos field of the Unit instance. The poll method returns Poll::Pending if the unit has not yet reached the target_pos position or Poll::Ready(()) if it has.

The code also defines two async functions, goto and patrol. The goto function takes a UnitRef and a pos parameter and returns a future that awaits a new instance of the UnitGotoFuture struct. The patrol function loops infinitely, moving the unit between the two positions in poses using the goto function.

Desugaring the `patrol` function

Before digging through the generated assembly, let's first understand how the patrol function could be desugared into a state machine. The patrol function is desugared into a state machine that resembles the following code:

// The state machine enum
enum PatrolFuture {
    // Initial state
    Start(UnitRef, [i32; 2]),
    // Waiting for goto to complete
    WaitingToReachPosition0(UnitRef, [i32; 2], impl Future<Output = ()>),
    WaitingToReachPosition1(UnitRef, [i32; 2], impl Future<Output = ()>),
    // Final state (which is unreachable)
    Done,
}

// Implementing Future for PatrolFuture
impl Future for PatrolFuture {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        loop {
            match *self {
                // In the start state, create a goto future and move to the waiting state
                PatrolFuture::Start(unit, poses) => {
                    let fut = goto(unit.clone(), poses[0]);
                    *self = PatrolFuture::WaitingToReachPosition0(unit, poses, fut);
                }
                // In the waiting state for goto1, poll the goto future and move to the next waiting state if it's ready
                PatrolFuture::WaitingToReachPosition0(unit, poses, ref mut fut) => {
                    match Pin::new(fut).poll(cx) {
                        Poll::Ready(()) => {
                            let fut = goto(unit.clone(), poses[1]);
                            *self = PatrolFuture::WaitingToReachPosition1(unit, poses, fut);
                        }
                        Poll::Pending => return Poll::Pending,
                    }
                }
                // In the waiting state for goto2, poll the goto future and move back to the start state if it's ready
                PatrolFuture::WaitingToReachPosition1(unit, poses, ref mut fut) => {
                    match Pin::new(fut).poll(cx) {
                        Poll::Ready(()) => *self = PatrolFuture::Start(unit.clone(), poses),
                        Poll::Pending => return Poll::Pending,
                    }
                }
                // In the done state (which is unreachable), return ready
                PatrolFuture::Done => return Poll::Ready(()),
            }
        }
    }
}

// The original async function is equivalent to creating a new PatrolFuture instance
fn patrol(unit: UnitRef, pos: [i32; 2]) -> impl Future<Output = ()> {
    PatrolFuture::Start(unit.clone(), pos)
}

The above code shows that the patrol function is desugared into a state machine consisting of three key states: Start, WaitingToReachPosition0, and WaitingToReachPosition1. The Start state is the state machine's initial state, creating a new goto future and moving to the WaitingToReachPosition0 state. The WaitingToReachPosition0 state polls the goto future and moves to the WaitingToReachPosition1 state if the future is ready. The WaitingToReachPosition1 state polls the goto future and moves back to the Start state if the future is ready. The moving back to Start implements the infinite loop behavior of the patrol function.

Rust async under the hood

Closures and state machines

We have looked at the desugared code for the patrol function. The generated assembly of the async code uses closures to implement the state machine. When we looked at the' goto' in the previous async deep dive, we explored the mapping of async functions to closures.

The patrol function is converted into a closure with two inputs: the closure environment and the task context. The closure environment is a struct that holds the state of the state machine. The task context is a struct that holds the state of the task executing the future. The closure returns a Poll value, either Poll::Pending or Poll::Ready. The Poll::Pending value indicates that the future is not ready, and the Poll::Ready value indicates that the future is ready.

The following diagram shows the closure environment for the patrol function. The environment is a struct that contains the local variables of the patrol closure (highlighted in green). The environment also includes the closure environment for the goto closure that will be invoked from the patrol closure. The environment is passed to the closure as a mutable reference.

Async closure environment

The second input to the closure is the task context that is used to schedule waking up the task when the future is ready.

Async context

The closure also keeps track of the state of the state machine. The state is stored in the state field of the closure environment. The state field is updated whenever the state machine moves to a new state. As we saw with the desugared code, the state machine starts in the Start state and moves to the WaitingToReachPosition0 state when the goto future is created. The state machine then moves to the WaitingToReachPosition1 state when the goto future is ready. The state machine then moves back to the Start state when the goto future is ready. The move back to Start implements the infinite loop behavior of the patrol function.

Async closure state machine

Role of the async runtime and infinite async loops

This sequence diagram shows an example of interaction between three participants: an executor, a patrol closure, and a goto closure. The executor is the async runtime that executes the future.

In the first loop iteration, the executor sends a poll message to the patrol closure, which moves to the WaitingToReachPosition0 state. The patrol closure then sends a poll message to the goto closure, which moves the unit to the first position. The goto closure returns a Poll::Pending message to the patrol closure, which returns a Poll::Pending message to the executor. The process is repeated until the unit reaches the first position. At this point, the goto closure returns a Poll::Ready message to the patrol closure, which then sends a poll message to the goto closure to move the unit to the second position. The goto closure returns a Poll::Pending message to the patrol closure, which returns a Poll::Pending message to the executor. The process is repeated until the unit reaches the second position, at this point, the goto closure returns a Poll::Ready message to the patrol closure, which then goes back to the Start state.

In the second loop iteration, the patrol closure sends a poll message to the goto closure to move the unit to the first position, and the sequence repeats. This implements the infinite loop behavior of the patrol function. The point to note here is that the executor does not need to know anything about the infinite loop behavior of the patrol function. The executor sends a poll message to the patrol closure, which handles the infinite loop behavior.

Async closure sequence diagram

Flow chart of the `patrol` function

The following flow chart shows the overall execution flow of the patrol closure. The function uses the closure state to determine the next action when called.

In the 🚀 Start (0) state, the function prepares the goto closure environment and invokes the goto closure. If the goto closure returns Poll::Ready, the function invokes the goto closure again. If both the goto closures return Poll::Ready, the function sets the state to 🚀 Start (0) and jumps to continue the infinite loop. If any of the goto closures returns Poll::Pending, the function returns Poll::Pending after setting the state to 🕓 WaitingToReachPosition0 (3) or 🕞 WaitingToReachPosition1 (4).

The handling in the 🕓 WaitingToReachPosition0 (3) and 🕞 WaitingToReachPosition1 (4) states is a subset of the processing in the 🚀 Start (0).

Async closure flow chart

The following flow chart shows how the goto closure is invoked. The steps are:

The memory needed for the goto closure and the patrol closure environment are allocated.
The goto closure environment is initialized with the unit and the target position.
The goto closure's starting state is set to 🚀 Goto::Start (0).
The goto closure is invoked.

Preparing a goto future

The article Rust async under the hood: goto closure covers the handling of the goto closure.

Deep dive into the generated assembly

We now understand the high-level flow of the async code. Let's examine the generated assembly code.

Input and output registers

This assembly code is generated from a patrol closure that takes two inputs: the closure environment stored in rdi, and the task context stored in rsi. The output is stored in rax and can be Poll::Pending or Poll::Ready.

The code starts by saving the callee-saved registers and then moving the task context and closure environment to r14 and rbx. The state is then loaded from the closure environment, and a jump table is used to determine which block of code to execute based on the state.

State machine code blocks

The first block of code (state 0 - 🚀 Start) is executed before the first future is polled. The second block of code (state 1) throws an exception, and the third block of code (state 2) panics. The fourth block of code (state 3 - 🕓 WaitingToReachPosition0) waits for the first future to resolve, and the fifth block of code (state 4 - 🕞 WaitingToReachPosition1) waits for the second future to resolve.

Async jump table

Switching between blocks is achieved using a jump table indexed by the state of the async functions in the closure environment. The compiler generates the jump table. The patrol function's jump table looks like the following:

.LJTI58_0:
 .long .LBB58_1-.LJTI58_0  ; 🚀 Start (0): Code executed before the first future is polled.
 .long .LBB58_3-.LJTI58_0  ; 🛑 State 1: Throw exception.
 .long .LBB58_2-.LJTI58_0  ; 🛑 State 2: Panic.
 .long .LBB58_6-.LJTI58_0  ; 🕓 WaitingToReachPosition0 (3): Waiting for poses[0]
 .long .LBB58_16-.LJTI58_0 ; 🕞 WaitingToReachPosition1 (4): Waiting for poses[1]

States `🕓WaitingToReachPosition0` and `🕞WaitingToReachPosition1`

When state 3 or 4 is executed, the reference count for the captured unit is incremented. If the reference count is zero, the code jumps to undefined behavior. Otherwise, UnitGotoFuture is prepared, and the address of the goto closure environment is stored in the closure environment. The goto closure is then called, and if the future is not ready, the state is saved, and the function returns. If the future is ready, the goto closure's state is checked, and the reference count for the captured unit is decremented accordingly. If the reference count is zero, the unit is deallocated.

The updated state is stored in the closure environment, and the function returns with the updated state and either Poll::Pending or Poll::Ready in rax.

Annotated patrol closure assembly code

Now we are ready to dig into the generated assembly code. The assembly has been annotated with comments to map it to the Rust code.

; Input:
;   rdi: The closure environment
;   rsi: The task_context
; Output:
;   rax: Poll::Pending or Poll::Ready

playground::patrol::{{closure}}:
 push r15 ; Save the callee-saved registers
 push r14 
 push rbx 
 mov r14, rsi ; Store the task_context in r14. The task_context is passed as the second argument to the closure.
 mov rbx, rdi ; Store the closure environment in rbx. The environment is passed as the first argument to the closure.
 movzx eax, byte ptr [rdi + 32] ; Load the state to determine which block to execute. The state is stored in the 32nd byte of the closure environment.
 lea rcx, [rip + .LJTI58_0] ; Load the address of the jump table rcx. The jump table is a list of offsets from the start of the jump table to each block.
 movsxd rax, dword ptr [rcx + 4*rax] ; Get the jump offset from the entry corresponding to the state. The index in rax is multiplied by 4 because the jump table is an array of 32-bit jump offsets indexed by the state.

 add rax, rcx ; Add the jump offset to the jump table to determine the address of the block to execute.
 jmp rax ; Jump to the correct block. 

.LBB58_1:
; 🚀 Start (0): Code executed before the first future is polled.
; At this point, rbx contains the closure environment, and r14 contains the task_context.
; Block 0: Code executed when the future is first polled.
 mov rcx, qword ptr [rbx] ; Load the captured poses array from the closure environment.
 mov rax, qword ptr [rbx + 24] ; Load the captured Rc<RefCell<Unit>> from the closure environment.
 mov qword ptr [rbx + 8], rax ; Store the captured Rc<RefCell<Unit>> in a local variable saved in the closure environment.
 mov qword ptr [rbx + 16], rcx ; Store the captured poses array in a local variable saved in the closure environment.
 jmp .LBB58_4 ; Jump to the next block.

.LBB58_2:
; 🛑 State 2: Panic.
 lea rdi, [rip + str.1]
 lea rdx, [rip + .L__unnamed_24]
 mov esi, 34
 call qword ptr [rip + core::panicking::panic@GOTPCREL]

.LBB58_3:
; 🛑 State 1: Throw exception.
 ud2

.LBB58_4:

; == 🚀 Start (0): Getting ready to wait for poses[0] ==

 ; At this point, rbx contains the closure environment, and rax points to Rc<RefCell<Unit>>.
 ; Increment the Rc<RefCell<Unit>> reference count
 inc qword ptr [rax] ; Increment the strong reference count for Rc<RefCell<Unit>>. The strong reference count is stored in the first 8 bytes of the Rc<RefCell<Unit>>.
 je .LBB58_28 ; The reference count is 0, jump to undefined behavior. This should never happen.
 mov ecx, dword ptr [rbx + 16] ; Load the poses[0] from the closure environment
 ; Clone the Rc<RefCell<Unit>> to get a new reference to the Unit stored in the closure environment

 ; Prepare UnitGotoFuture. UnitGotoFuture is a future that will wait for the Unit to reach the specified position.
 mov qword ptr [rbx + 56], rax ; Copy the Rc<RefCell<Unit>> to the goto::{{closure}} environment
 mov dword ptr [rbx + 64], ecx ; Save the poses[0] to the goto::{{closure}} environment
 mov byte ptr [rbx + 68], 0; Set the initial state for the goto::{{closure}} to 🚀 Goto::Start (0)

.LBB58_6:

; == 🕓 WaitingToReachPosition0 (3): Waiting for poses[0] ==

 lea r15, [rbx + 40] ; Load the address of the environment for the goto::{{closure}}
 mov rdi, r15 ; Load the address of the environment to the first argument register
 mov rsi, r14 ; Load the task_context to the second argument register
 call playground::goto::{{closure}} ; Call the goto::{{closure}} closure
 test al, al ; Check if the future is ready
 jne .LBB58_25 ; The future is not ready. Save state and return.
 ; The future is ready. Continue executing the patrol::{{closure}} closure
 movzx eax, byte ptr [rbx + 68] ; Load the state of the goto::{{closure}} closure
 test eax, eax ; Check if the goto's state is 0
 je .LBB58_11 ; Goto's state is 0. Proceed to the next goto closure after decrementing the reference count for goto's unit at offset 56 in patrol::{{closure}}'s environment
 cmp eax, 3 ; Check if the goto's state is 3
 jne .LBB58_14 ; The goto's state is not 0 or 3. Proceed to the next goto closure without decrementing the reference count for goto's unit
 ; The goto's state is 3, decrement the reference count using the goto::{{closure}} environment's unit at offset 40 in patrol::{{closure}}'s environment
 mov rdi, qword ptr [r15] ; Load the Rc<RefCell<Unit>> from the goto::{{closure}} environment
 dec qword ptr [rdi] ; Decrement the reference count for unit
 jne .LBB58_14 ; The reference count is not 0. Do not deallocate the unit at offset 40 in patrol::{{closure}}'s environment
 jmp .LBB58_12

.LBB58_11:
 mov rdi, qword ptr [rbx + 56] ; Load Rc<RefCell<Unit>> from the environment
 dec qword ptr [rdi] ; Decrement the reference count for unit
 jne .LBB58_14 ; The reference count is not 0. Do not deallocate the unit

.LBB58_12:
 dec qword ptr [rdi + 8] ; Decrement the reference count for unit
 jne .LBB58_14 ; The reference count is not 0. Jump to the next block
 mov esi, 32 ; Load the size of the allocation
 mov edx, 8 ; Load the alignment of the allocation
 call qword ptr [rip + __rust_dealloc@GOTPCREL] ; Deallocate the allocation

.LBB58_14:
 mov rax, qword ptr [rbx + 8] ; Load the Rc<RefCell<Unit>> from the environment
 inc qword ptr [rax] ; Increment the reference count for unit
 je .LBB58_28 ; The reference count is 0. Jump to undefined behavior. This should never happen.
 ; Prepare to wait for goto::{{closure}} environment
 mov ecx, dword ptr [rbx + 20] ; Load poses[1] from the environment
 mov qword ptr [rbx + 56], rax ; Store the Rc<RefCell<Unit>> in the goto::{{closure}} environment
 mov dword ptr [rbx + 64], ecx ; Store the poses[1] in the goto::{{closure}} environment
 mov byte ptr [rbx + 68], 0 ; Store the goto closure's initial state - 🚀 Goto::Start (0)

.LBB58_16:

; == 🕞 WaitingToReachPosition1 (4): Waiting for poses[1] ==

 lea r15, [rbx + 40] ; Get the address of the goto closure environment
 mov rdi, r15 ; Load the address of the goto closure environment to the first argument register
 mov rsi, r14 ; Load the task_context to the second argument register
 call playground::goto::{{closure}} ; Call the goto closure
 test al, al ; Check if the future is ready
 jne .LBB58_26 ; The future is not ready yet. Save state and return

 movzx eax, byte ptr [rbx + 68] ; Load the goto closure state
 test eax, eax ; Check if the state is 0
 je .LBB58_21 ; The state is 0. Jump to the next block
 cmp eax, 3 ; Check if goto's state is 3
 jne .LBB58_24 ; The state is not 3. Jump to the next block
 mov rdi, qword ptr [r15] ; Load the goto closure environment unit
 dec qword ptr [rdi] ; Decrement the reference count for unit
 jne .LBB58_24 ; The reference count is not 0. Jump to the next block
 jmp .LBB58_22 ; The reference count is 0. Jump to the next block

.LBB58_21:
 mov rdi, qword ptr [rbx + 56] ; Load Rc<RefCell<Unit>> from the environment
 dec qword ptr [rdi] ; Decrement the strong reference count for unit
 jne .LBB58_24 ; The reference count is not 0. Skip deallocating the unit

.LBB58_22:
 ; Strong reference count is 0. Decrement the weak reference count
 dec qword ptr [rdi + 8] ; Decrement the weak reference count for unit
 jne .LBB58_24 ; The weak reference count is not 0. Jump to the next block
 ; The weak reference count is 0. Deallocate the unit
 mov esi, 32 ; Size of unit
 mov edx, 8 ; Alignment of unit
 call qword ptr [rip + __rust_dealloc@GOTPCREL] ; Deallocate the unit

.LBB58_24:
 mov rax, qword ptr [rbx + 8] ; Load the Rc<RefCell<Unit>> from the environment into rax
 jmp .LBB58_4 ; Jump to the beginning of the loop to the code handling 🚀 Start (0)
 ; Note that the compiler does not explicitly update the state variable to 0, 
 ; it just jumps to the beginning of the loop. 

.LBB58_25:
 mov al, 3 ; Set the state to "🕓 WaitingToReachPosition0 (3)" for the next call to the function.
 jmp .LBB58_27 ; Jump to return

.LBB58_26:
 mov al, 4 ; Set the state to "🕞 WaitingToReachPosition1 (4)" for the next call to the function.

.LBB58_27:
    ; The function is about to return, store the updated state 
 ; so that the next time the function is called, it will continue 
 ; from the correct block.
 mov byte ptr [rbx + 32], al ; Store the updated state
 mov al, 1 ; Return Poll::Pending 
 pop rbx
 pop r14
 pop r15
 ret

; The reference count was found to be 0 after an increment, jump to undefined behavior. This should never happen.
.LBB58_28:
 ud2 ; Undefined instruction
 ud2 
 jmp .LBB58_31 ; Drop memory and unwind the stack

.LBB58_31:
 mov r14, rax
 mov rdi, r15
 call core::ptr::drop_in_place<playground::goto::{{closure}}>
 mov rdi, qword ptr [rbx + 8]
 call core::ptr::drop_in_place<playground::UnitGotoFuture>
 mov byte ptr [rbx + 32], 2
 mov rdi, r14
 call _Unwind_Resume@PLT
 ud2

.LJTI58_0:
 .long .LBB58_1-.LJTI58_0  ; 🚀 Start (0): Code executed before the first future is polled.
 .long .LBB58_3-.LJTI58_0  ; 🛑 State 1: Throw exception.
 .long .LBB58_2-.LJTI58_0  ; 🛑 State 2: Panic.
 .long .LBB58_6-.LJTI58_0  ; 🕓 WaitingToReachPosition0 (3): Waiting for poses[0]
 .long .LBB58_16-.LJTI58_0 ; 🕞 WaitingToReachPosition1 (4): Waiting for poses[1]

.LCPI59_0:
 .quad 1
 .quad 1

🗂️ Vtable for the patrol closure.
.L__unnamed_25:
 .quad core::ptr::drop_in_place<playground::patrol::{{closure}}> ; Destructor for the FnOnce trait object
 .asciz "H\000\000\000\000\000\000\000\b\000\000\000\000\000\000" ; Size of object: 72 bytes (H)
                                                                ; Alignment of object: 8 bytes (\b)
 .quad playground::patrol::{{closure}} ; call_once method of the FnOnce trait

Key takeaways

When async functions are nested, the compiler generates a state machine for each function. The state machine is a struct with a state variable and a closure environment. The state variable is used to keep track of the current state of the function. The closure environment stores the arguments and local variables of the function.
The closure environment for the outer async function stores the closure environment for the inner async function. A large number of local variables and arguments can cause the closure environment to grow large.
We see that even an async function with an infinite loop can return Poll::Pending when a future in the loop is pending. When the future becomes ready, the function is called again, and the loop continues when the polled future returns Poll::Ready. The loop continues executing until it reaches a pending future. At this point, the function returns Poll::Pending, and the future is polled again when it becomes ready.

Articles in the async/await series

Desugaring and assembly of async/await in Rust - goto
Nested async/await in Rust: Desugaring and assembly - patrol
Rust async executor - executor