Understanding Async Await in Rust: From State Machines to Assembly Code
Introduction
This article explores the inner workings of async/await in Rust. We will examine how async functions are implemented as state machines and how they are compiled to assembly code. Rust's async functions provide a mechanism for writing asynchronous code in a synchronous style. These functions are implemented as state machines, which are enum-like types implementing the Future trait. The Future trait has a poll method that is called repeatedly until it returns Poll::Ready. Each call to poll moves the state machine between states until it reaches the final state, which returns Poll::Ready. We will use an example async function to illustrate these concepts.
We recommend reading "Rust Closures Under the Hood: Comparing impl Fn and Box<dyn Fn>" to understand the inner workings of closures in Rust. Like closures, async functions capture variables from the enclosing scope in a compiler-generated environment.
Async example
We will use the following async function as an example. The code is taken from our fork of the simple async local executor, a single-threaded polling executor. We will be working with the game-units.rs example.
Example of an async function
We start with the goto function, which moves a unit towards a target position. The function takes a unit reference and a target position. It returns a future that moves the unit towards the target position at each step and completes when the unit has reached that position.
// Await this function until the unit has reached the target position.
async fn goto(unit: UnitRef, pos: i32) {
    UnitGotoFuture {
        unit,
        target_pos: pos,
    }
    .await;
}
With this function, an async caller of the goto function can write code like this:
goto(unit.clone(), 10).await;
// The code here will execute after the unit has reached position 10
The code above will move the unit towards position 10 and wait until the unit has reached position 10 before continuing execution. We will see that this is achieved without blocking the thread.
Example of a future that implements a poll
The poll function is a method defined on the Future trait, implemented here for the UnitGotoFuture struct. An async executor calls the poll function to determine whether the Future is Ready or Pending.
The poll function takes two arguments: self, a pinned mutable reference (Pin<&mut Self>) to the future being polled, and _cx, a mutable reference to a Context object. The Context gives the future access to a Waker, which a future can use to tell the executor that it is ready to be polled again. This implementation does not use the context (hence the leading underscore in _cx) and relies on the polling executor to call poll again.
In this specific implementation of poll, the function first retrieves the current position of the Unit that the future is associated with, by borrowing it immutably with self.unit.borrow().pos. Then, it checks whether the current position of the Unit is equal to the target position that the future is supposed to move towards. If so, the future is considered ready and the Poll::Ready(()) value is returned.
If the current position of the Unit is not equal to the target position, the future updates the position of the Unit by borrowing it mutably with self.unit.borrow_mut().pos and adding or subtracting 1 based on the sign of the difference between the target and current positions. Finally, the future returns Poll::Pending to indicate that it has not yet completed.
Overall, the poll function checks the current state of a future and either returns a value indicating that the future has completed or indicates that it needs to continue executing and should be polled again later.
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::task::{Context, Poll};

#[derive(Default)]
struct Unit {
    /// The 1-D position of the unit. In a real game, it would be 2D or 3D.
    pub pos: i32,
}
type UnitRef = Rc<RefCell<Unit>>;
/// A future that will move the unit towards `target_pos` at each step,
/// and complete when the unit has reached that position.
struct UnitGotoFuture {
    unit: UnitRef,
    target_pos: i32,
}
impl Future for UnitGotoFuture {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Self::Output> {
        let unit_pos = self.unit.borrow().pos;
        if unit_pos == self.target_pos {
            Poll::Ready(())
        } else {
            self.unit.borrow_mut().pos += (self.target_pos - unit_pos).signum();
            Poll::Pending
        }
    }
}
/// Helper async function to write unit behavior nicely
async fn goto(unit: UnitRef, pos: i32) {
    UnitGotoFuture {
        unit,
        target_pos: pos,
    }
    .await;
}
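To make the poll contract concrete, the following sketch drives UnitGotoFuture by hand and records the unit's position after every poll. It repeats the types from the listing above so that it compiles on its own. NoopWake and trace_goto are hypothetical helpers introduced only for this illustration; they are not part of the example repository.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

#[derive(Default)]
struct Unit {
    pos: i32,
}
type UnitRef = Rc<RefCell<Unit>>;

struct UnitGotoFuture {
    unit: UnitRef,
    target_pos: i32,
}

impl Future for UnitGotoFuture {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        let unit_pos = self.unit.borrow().pos;
        if unit_pos == self.target_pos {
            Poll::Ready(())
        } else {
            self.unit.borrow_mut().pos += (self.target_pos - unit_pos).signum();
            Poll::Pending
        }
    }
}

// Hypothetical helper: a waker that does nothing. It is enough here
// because this future never uses its Context.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls the future until it is ready and returns the unit position
/// observed after each poll (hypothetical helper for this sketch).
fn trace_goto(start: i32, target: i32) -> Vec<i32> {
    let unit: UnitRef = Rc::new(RefCell::new(Unit { pos: start }));
    let mut fut = UnitGotoFuture { unit: unit.clone(), target_pos: target };
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut trace = Vec::new();
    loop {
        let done = Pin::new(&mut fut).poll(&mut cx).is_ready();
        trace.push(unit.borrow().pos);
        if done {
            break;
        }
    }
    trace
}

fn main() {
    println!("{:?}", trace_goto(0, 3));
}
```

With a start of 0 and a target of 3, the trace is [1, 2, 3, 3]: three polls move the unit and return Pending, and a fourth poll finds the unit at the target and returns Ready.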
Desugaring the async example
Before we delve into how the goto async function is implemented in assembly, let's look at equivalent non-async Rust code that implements the same functionality. This will make the assembly code easier to understand.
The await on UnitGotoFuture splits the goto function into states. These states can be modeled with an enum that saves the execution point and resumes from the saved point when the executor calls the future's poll function.
// The state machine enum defines three states for the goto function:
// 1. Start: The initial state, where the function is called with the unit and target position.
// 2. Waiting: The state where the function is waiting for the UnitGotoFuture to complete.
// 3. Done: The final state, where the function has been completed.
#[repr(u8)]
enum GotoFuture {
    // Initial state
    Start(UnitRef, i32) = 0,
    // Waiting for UnitGotoFuture to complete
    Waiting(UnitGotoFuture) = 3,
    // Final state
    Done = 1,
}
// Implementing Future for GotoFuture
impl Future for GotoFuture {
    type Output = ();
    // The Future's poll method will be called by the async executor to check if the future is ready and if the execution can continue.
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // The loop is used to transition between states
        loop {
            match &mut *self {
                // Start (0): In the start state, create a UnitGotoFuture and move to the waiting state
                GotoFuture::Start(unit, pos) => {
                    let fut = UnitGotoFuture { unit: unit.clone(), target_pos: *pos };
                    *self = GotoFuture::Waiting(fut);
                }
                // Waiting (3): In the waiting state, poll the UnitGotoFuture and move to the done state if it's ready
                GotoFuture::Waiting(fut) => {
                    match Pin::new(fut).poll(cx) {
                        Poll::Ready(()) => *self = GotoFuture::Done,
                        Poll::Pending => return Poll::Pending,
                    }
                }
                // Done (1): In the done state, return ready
                GotoFuture::Done => return Poll::Ready(()),
            }
        }
    }
}
// The original async function is equivalent to creating a new GotoFuture instance in the start state
fn goto(unit: UnitRef, pos: i32) -> impl Future<Output = ()> {
    GotoFuture::Start(unit, pos)
}
The GotoFuture enum defines three states that correspond to the three stages of the async function's execution:
- Start: The initial state, where the function is called with the unit and target position. It holds the unit reference and target position and is waiting to transition to the next state.
- Waiting: The state where the function is waiting for the UnitGotoFuture to complete. It holds a UnitGotoFuture instance and polls it repeatedly until it returns Poll::Ready(()), indicating that it has completed its work.
- Done: The final state, where the function has been completed. It does not hold any additional information and immediately returns Poll::Ready(()) when polled.
We will see shortly that the compiler-generated code for the goto function is similar to the state machine we just described. It implements the state machine using a closure and tracks its current state using a state variable.
Understanding the generated assembly
Now that we better understand the async function's state machine, let's look at the assembly code generated by the compiler for the goto async function.
Closure state machine
The compiler generates a closure to implement the state machine. The closure is then wrapped in a struct that implements the Future trait. The Future trait's poll method is implemented by calling the closure. The Future struct is then returned by the goto function.
The closure also contains a state variable that tracks the state of the state machine. The state variable is initialized to 0 in the Start state. When the poll method is called, the closure checks the state variable and takes appropriate actions based on the current state. The full state machine is shown in the following state diagram.
The following assembly shows how the closure checks the state variable and uses a jump table to jump to the appropriate block of code based on the current state. The jump table switch happens at the start of the closure.
movzx eax, byte ptr [rdi + 28] ; Load the state to determine which block to execute.
; The state is stored at offset 28 in the closure environment.
lea rcx, [rip + .LJTI57_0] ; Load the address of the jump table into rcx.
; The jump table is a list of offsets from the
; start of the jump table to each block.
movsxd rax, dword ptr [rcx + 4*rax] ; Get the jump offset from the entry
; corresponding to the state. The index in rax is
; multiplied by 4 because the jump table is
; an array of 32-bit jump offsets indexed by the state.
add rax, rcx ; Add the jump offset to the start
; of the jump table to get the address of the block to execute.
jmp rax ; Jump to the block to execute.
Here is the compiler-generated jump table. It contains the offsets from the start of the jump table to each block.
.LJTI57_0:
.long .LBB57_4-.LJTI57_0 ; Start (0): Entry point to the goto closure.
.long .LBB57_3-.LJTI57_0 ; Done (1): Throw a panic if polled after completion of the future.
.long .LBB57_2-.LJTI57_0 ; UnitState2 (2): Throw a panic if polled after an earlier poll panicked.
.long .LBB57_1-.LJTI57_0 ; Waiting (3): Future is pending
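The Done (1) entry of the jump table can be observed from safe Rust. The sketch below polls a compiler-generated future once more after it has completed and catches the resulting panic. The async function answer, the NoopWake waker, and the poll_twice helper are hypothetical stand-ins for this illustration, and the sketch assumes the default panic=unwind setting.

```rust
use std::future::Future;
use std::panic::{catch_unwind, AssertUnwindSafe};
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical helper: a waker that does nothing.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Hypothetical stand-in for an async fn; it completes on the first poll.
async fn answer() -> i32 {
    42
}

/// Polls the future twice. Returns the result of the first poll and
/// whether the second poll panicked.
fn poll_twice() -> (Poll<i32>, bool) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut: Pin<Box<dyn Future<Output = i32>>> = Box::pin(answer());

    // The first poll runs the body to completion and leaves the state
    // machine in its completed state.
    let first = fut.as_mut().poll(&mut cx);

    // The second poll dispatches on the completed state, which panics
    // with "`async fn` resumed after completion".
    let second = catch_unwind(AssertUnwindSafe(|| fut.as_mut().poll(&mut cx)));
    (first, second.is_err())
}

fn main() {
    let (first, panicked) = poll_twice();
    println!("first poll: {:?}, second poll panicked: {}", first, panicked);
}
```

This is one difference from the hand-written GotoFuture, whose Done arm simply returns Poll::Ready(()) again. Well-behaved executors never poll a future after it has returned Ready, so the panic path is not reached in normal operation.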
Wrapping a closure in a future
This section examines the generated assembly for the goto function. The goto function just returns the Future object. Calling the goto function does not execute the async function. The async function is executed when the Future object is awaited.
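This laziness is easy to check. In the sketch below, mark is a hypothetical async function that sets a flag when its body runs; the flag is read once after the call and once after the first poll. NoopWake and observe_laziness are likewise helpers invented for this illustration.

```rust
use std::cell::Cell;
use std::future::Future;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Wake, Waker};

// Hypothetical helper: a waker that does nothing.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Hypothetical stand-in for goto: records that its body has run.
async fn mark(flag: Rc<Cell<bool>>) {
    flag.set(true);
}

/// Returns the flag value before the first poll and after the first poll.
fn observe_laziness() -> (bool, bool) {
    let flag = Rc::new(Cell::new(false));

    // Calling the async fn only builds the future; the body has not run.
    let mut fut = Box::pin(mark(flag.clone()));
    let before = flag.get();

    // Polling the future is what executes the body.
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let _ = fut.as_mut().poll(&mut cx);

    (before, flag.get())
}

fn main() {
    println!("{:?}", observe_laziness());
}
```

The function returns (false, true): the body runs only once the future is polled.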
The following code shows a rough Rust equivalent of the generated assembly for the goto async function. The poll_fn function in std::future creates a future that wraps a closure and uses that closure as the Future trait's poll method.
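fn goto(unit: UnitRef, target_pos: i32) -> impl Future<Output = ()> {
    // goto_closure stands for the compiler-generated goto::{{closure}},
    // whose environment holds unit, target_pos, and the state variable.
    poll_fn(goto_closure)
}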
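For readers who have not used std::future::poll_fn, here is a small self-contained sketch of a closure acting as a future's poll method. The countdown future, the NoopWake waker, and the polls_until_ready helper are hypothetical examples, not code the compiler generates for goto.

```rust
use std::future::{poll_fn, Future};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical helper: a waker that does nothing.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Builds a future from a closure. The captured `remaining` counter plays
/// the role of the closure environment: it persists between polls.
fn countdown(mut remaining: u32) -> impl Future<Output = &'static str> {
    poll_fn(move |cx: &mut Context<'_>| {
        if remaining == 0 {
            Poll::Ready("done")
        } else {
            remaining -= 1;
            // Ask the executor to poll again.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    })
}

/// Counts how many polls `countdown(n)` needs before it is ready.
fn polls_until_ready(n: u32) -> usize {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(countdown(n));
    let mut polls = 0;
    loop {
        polls += 1;
        if fut.as_mut().poll(&mut cx).is_ready() {
            return polls;
        }
    }
}

fn main() {
    println!("countdown(3) needed {} polls", polls_until_ready(3));
}
```

countdown(3) needs four polls: three that decrement the counter and return Pending, and one that returns Ready.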
The goto async function just returns the Future struct that wraps the goto::{{closure}}. The contents of the returned Future are shown below. They are essentially the closure environment of goto::{{closure}}. The returned closure environment contains the captured parameters at offsets 16 and 24. The state variable is stored at offset 28. The closure environment also contains local variables that are used to store intermediate results.
The following function shows the assembly code of the goto async function. We see that the unit and target_pos parameters are stored at offsets 16 and 24, respectively. The state variable is initialized to 0 (Start) at offset 28 in the Future. From the assembly, we see that none of the async function's body has been executed yet.
; Input:
; rdi: goto::{{closure}} environment
; rsi: unit
; rdx: target_pos
; Output:
; rax: goto::{{closure}} environment/Future
playground::goto:
mov rax, rdi ; Return the address of the future in rax
mov qword ptr [rdi + 16], rsi ; Save the unit.
mov dword ptr [rdi + 24], edx ; Save the target_pos
mov byte ptr [rdi + 28], 0 ; Set the state to Start (0)
ret ; Return
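The size of this closure environment can be inspected from Rust with std::mem::size_of_val. The sketch below repeats the article's types so that it compiles on its own; goto_future_size and captured_args_size are hypothetical helpers. The exact number depends on the compiler version and target, so treat the printed value as an observation rather than a guarantee.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::mem::{size_of, size_of_val};
use std::pin::Pin;
use std::rc::Rc;
use std::task::{Context, Poll};

#[derive(Default)]
struct Unit {
    pos: i32,
}
type UnitRef = Rc<RefCell<Unit>>;

struct UnitGotoFuture {
    unit: UnitRef,
    target_pos: i32,
}

impl Future for UnitGotoFuture {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        let unit_pos = self.unit.borrow().pos;
        if unit_pos == self.target_pos {
            Poll::Ready(())
        } else {
            self.unit.borrow_mut().pos += (self.target_pos - unit_pos).signum();
            Poll::Pending
        }
    }
}

async fn goto(unit: UnitRef, pos: i32) {
    UnitGotoFuture { unit, target_pos: pos }.await;
}

/// Size in bytes of the unpolled future returned by `goto`.
fn goto_future_size() -> usize {
    let unit: UnitRef = Rc::new(RefCell::new(Unit::default()));
    size_of_val(&goto(unit, 10))
}

/// Lower bound: the future has to store both captured arguments.
fn captured_args_size() -> usize {
    size_of::<UnitRef>() + size_of::<i32>()
}

fn main() {
    println!("goto future: {} bytes", goto_future_size());
    println!("captured arguments alone: {} bytes", captured_args_size());
}
```

In the build analyzed in this article, the environment is 32 bytes, which matches the size recorded in the vtable at the end of the assembly listing.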
Role of the async executor
Rust requires an async executor to run async functions. The executor is responsible for polling the future returned by the async function. The following sequence diagram shows how the executor polls the future returned by the goto async function. The executor calls the poll method of the Future trait. The poll method calls the goto::{{closure}} closure. The closure checks the state variable and executes the appropriate code block based on the current state. It then updates the state variable and returns the Poll value. The poll method returns the Poll value to the caller.
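The executor's role can be sketched in a few lines. The block_on function below is a hypothetical busy-polling executor written for this illustration; it is not the executor from the example repository. YieldTimes and demo are likewise invented stand-ins. Real executors typically wait for a Waker notification instead of polling in a tight loop.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical helper: a waker that does nothing, since this executor
// polls in a loop instead of waiting to be woken.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls the future until it is ready. Returns the output together with
/// the number of polls that were needed.
fn block_on<F: Future>(fut: F) -> (F::Output, usize) {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return (out, polls);
        }
    }
}

/// Hypothetical leaf future that returns Pending for its first `n` polls.
struct YieldTimes(u32);
impl Future for YieldTimes {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 == 0 {
            Poll::Ready(())
        } else {
            self.0 -= 1;
            Poll::Pending
        }
    }
}

// Hypothetical async fn with one await point, similar in shape to goto.
async fn demo() -> u32 {
    YieldTimes(2).await;
    7
}

fn main() {
    let (value, polls) = block_on(demo());
    println!("value = {}, polls = {}", value, polls);
}
```

Here block_on(demo()) returns (7, 3): the first two polls reach the await point and return Pending, and the third poll resumes after the await and completes.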
Flow chart of the generated assembly of the goto closure
Now that we better understand the async function's state machine and async executors, let's look at a high-level flow chart of the generated assembly code. The compiler has inlined the poll for the UnitGotoFuture future into the goto::{{closure}}. The goto closure can also panic if the RefCell borrowing fails.
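The borrow failure that leads to this panic can be reproduced without panicking by using the non-panicking try_borrow_mut method of RefCell. The borrow_conflicts helper below is a hypothetical example, with a plain i32 standing in for the Unit.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Returns whether a mutable borrow fails while a shared borrow is alive,
/// and whether it still fails after the shared borrow is released.
fn borrow_conflicts() -> (bool, bool) {
    let unit = Rc::new(RefCell::new(0i32));

    // A live shared borrow makes the borrow flag non-zero.
    let guard = unit.borrow();
    let while_shared = unit.try_borrow_mut().is_err();

    // Dropping the guard resets the borrow flag to zero.
    drop(guard);
    let after_release = unit.try_borrow_mut().is_err();

    (while_shared, after_release)
}

fn main() {
    println!("{:?}", borrow_conflicts());
}
```

The result is (true, false). The panicking borrow and borrow_mut methods used in UnitGotoFuture::poll perform the same runtime check and panic where try_borrow_mut returns an error.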
Generated assembly of the goto::{{closure}}
The assembly code below implements goto::{{closure}}, the body of the async function. The function receives a mutable reference to a closure environment and a Context object. The function returns Poll::Pending or Poll::Ready.
The code has been annotated with comments to explain the assembly code. The entry point of each block is labeled with the state it belongs to. The state machine has four states: Start (0), Waiting (3), Done (1), and UnitState2 (2). The Start (0) state is the entry point to the function. The future is pending in the Waiting (3) state. The Done (1) state is where the future is completed. The UnitState2 (2) state marks a future whose poll ended in a panic: the unwind handler stores 2 in the state variable, and polling the future again panics with "`async fn` resumed after panicking".
Here is a high-level description of the generated assembly code. This will help us understand the assembly code better.
The assembly code starts with a dispatch sequence that jumps to the appropriate block of code depending on the current state of the goto::{{closure}}. It loads the current state of the future from the closure environment and uses it to look up the corresponding block of code in a jump table.
The function has three main blocks of code. The Start (0) block loads the captured unit and target_pos from offsets 16 and 24 of the closure environment into registers. It then saves them at offsets 0 and 8, where the inlined UnitGotoFuture keeps its fields, and jumps into the polling code shared with the Waiting (3) state.
The Waiting (3) block loads the unit field from the closure environment and runs the inlined UnitGotoFuture::poll logic. It calculates the difference between target_pos and unit_pos and updates unit_pos accordingly. The function jumps to an error handler if the RefCell borrow check on the unit fails. If the unit has reached the target_pos, it sets the state to Done (1). Otherwise, it sets the state to Waiting (3) and returns Poll::Pending.
On the transition to Done (1), the closure decrements the unit's strong reference count. If the strong reference count reaches zero, it also decrements the weak reference count and, if that reaches zero as well, frees the memory associated with the unit using the __rust_dealloc function. The function returns Poll::Ready to indicate the future has completed.
; Input:
; rdi: Closure environment
; rsi: &mut Context
; Output:
; rax: Poll<()>
playground::goto::{{closure}}:
push rbp
push r15
push r14
push rbx
push rax
mov r15, rdi
movzx eax, byte ptr [rdi + 28] ; Load the state to determine which block to execute.
; The state is stored at offset 28 in the closure environment.
lea rcx, [rip + .LJTI57_0] ; Load the address of the jump table into rcx. The jump table is a list of offsets
; from the start of the jump table to each block.
movsxd rax, dword ptr [rcx + 4*rax] ; Get the jump offset from the entry corresponding to the state.
; The index in rax is multiplied by 4 because the jump table is an array
; of 32-bit jump offsets indexed by the state.
add rax, rcx ; Add the jump offset to the start of the jump table to get the
; address of the block to execute.
jmp rax ; Jump to the block to execute.
; == Start (0) block entry point ==
; The caller of the async function sets the state to 0 and initializes the closure environment.
.LBB57_4:
mov rdi, qword ptr [r15 + 16] ; Load the unit from the closure environment into rdi
mov eax, dword ptr [r15 + 24] ; Load the target_pos from the closure environment into eax
mov qword ptr [r15], rdi ; Save the unit in the closure environment
mov dword ptr [r15 + 8], eax ; Save the target_pos in the closure environment
jmp .LBB57_5
; == Waiting (3) block entry point ==
; Resume the future after a poll that returned Pending
.LBB57_1:
mov rdi, qword ptr [r15] ; Load the unit from the closure environment into rdi
.LBB57_5:
; Inlined call to UnitGotoFuture::poll
mov rax, qword ptr [rdi + 16] ; Load the borrow flag from the unit's RefCell into rax
movabs rcx, 9223372036854775807 ; Load the maximum signed 64-bit integer into rcx
cmp rax, rcx ; Compare the borrow flag (as an unsigned value) with the max signed 64-bit value.
jae .LBB57_6 ; If the flag is greater than or equal to it, jump to .LBB57_6: the unit is
; already mutably borrowed (negative flag) or the shared borrow count cannot be incremented.
; The unit has not been borrowed.
mov ebx, dword ptr [rdi + 24] ; Load the unit_pos from unit into ebx
mov ebp, dword ptr [r15 + 8] ; Load the target_pos from the closure environment into ebp
mov ecx, ebp ; Set ecx to target_pos
sub ecx, ebx ; Subtract unit_pos from target_pos and store the result in ecx
jne .LBB57_8 ; If the difference is not equal to 0, jump as the unit has not reached the target position
dec qword ptr [rdi] ; Decrement the strong reference count of the unit
mov r14b, 1 ; Set the state to Done (1)
jne .LBB57_17 ; Check if the strong reference count is not equal to 0. If it is not equal to 0, jump to .LBB57_17
; Freeing Rc memory as the strong reference count is 0.
dec qword ptr [rdi + 8] ; Decrement the weak reference count of the unit
jne .LBB57_17 ; Jump if the weak reference count is not equal to 0.
mov esi, 32 ; Set the size of the memory to free to 32
mov edx, 8 ; Set the alignment of the memory to free to 8
call qword ptr [rip + __rust_dealloc@GOTPCREL] ; Call __rust_dealloc to free the memory
jmp .LBB57_17
.LBB57_8:
test rax, rax ; Check if the borrow flag is 0.
jne .LBB57_9 ; If the borrow flag is not 0, jump to .LBB57_9 as the unit has already been borrowed.
xor eax, eax ; Set eax to 0.
; signum function inlined - begin
test ecx, ecx ; Test the sign of the difference (target_pos - unit_pos) held in ecx
setg al ; Set the value of al to 1 if target_pos is greater than unit_pos
; Set the value of al to 0 if target_pos is less than unit_pos (they are not equal here)
lea eax, [rbx + 2*rax] ; Add 2 to unit_pos if unit_pos is less than target_pos
dec eax ; Subtract 1 from the result of the previous addition (net change of +1 or -1)
; signum function inlined - end
mov dword ptr [rdi + 24], eax ; Save the new unit_pos in the unit
mov qword ptr [rdi + 16], 0 ; Set the borrow flag to 0.
mov r14b, 3 ; Set the state to Waiting (3)
.LBB57_17:
cmp ebp, ebx ; Compare unit_pos with target_pos
setne al ; Set the value of al to 1 (future not ready) if unit_pos is not equal to target_pos
; Set the value of al to 0 (future ready) if unit_pos is equal to target_pos
mov byte ptr [r15 + 28], r14b ; Save the state in the closure environment
add rsp, 8 ; Free local variables.
pop rbx
pop r14
pop r15
pop rbp
ret ; Return the future status.
.LBB57_6:
; Prepare the panic message.
lea r8, [rip + .L__unnamed_21]
lea rcx, [rip + .L__unnamed_20] ; drop_in_place
mov esi, 24
lea rdi, [rip + .L__unnamed_19] ; Load "already borrowed".
jmp .LBB57_10
.LBB57_2:
lea rdi, [rip + str.1] ; Load address of the string "`async fn` resumed after panicking"
lea rdx, [rip + .L__unnamed_23]
mov esi, 34
call qword ptr [rip + core::panicking::panic@GOTPCREL]
ud2
; == Done (1) block entry point: ==
; This block should never be reached as the future has already been completed.
.LBB57_3:
lea rdi, [rip + str.2] ; Load address of the string "`async fn` resumed after completion"
lea rdx, [rip + .L__unnamed_23]
mov esi, 35
call qword ptr [rip + core::panicking::panic@GOTPCREL]
ud2
.LBB57_9:
lea r8, [rip + .L__unnamed_22]
lea rcx, [rip + .L__unnamed_8]
mov esi, 16
lea rdi, [rip + .L__unnamed_7]
.LBB57_10:
mov rdx, rsp
call qword ptr [rip + core::result::unwrap_failed@GOTPCREL]
ud2
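; Unwind landing pad: reached while a panic is propagating out of the closure.
mov r14, rax ; Save the exception object
mov rdi, qword ptr [r15] ; Load the unit from the closure environment into rdi
call core::ptr::drop_in_place<playground::UnitGotoFuture> ; Drop the inlined UnitGotoFuture
mov byte ptr [r15 + 28], 2 ; Set the state to UnitState2 (2)
mov rdi, r14
call _Unwind_Resume@PLT ; Continue unwinding
ud2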
.LJTI57_0:
.long .LBB57_4-.LJTI57_0 ; Start (0): Entry point to the goto closure.
.long .LBB57_3-.LJTI57_0 ; Done (1): Throw a panic if polled after completion of the future.
.long .LBB57_2-.LJTI57_0 ; UnitState2 (2): Throw a panic if polled after an earlier poll panicked.
.long .LBB57_1-.LJTI57_0 ; Waiting (3): Future is pending.
; Vtable for the goto closure.
.L__unnamed_26:
.quad core::ptr::drop_in_place<playground::goto::{{closure}}> ; Destructor for the Future trait object
.asciz " \000\000\000\000\000\000\000\b\000\000\000\000\000\000" ; Size of object: 32 bytes (Leading space)
; Alignment of the Future trait object: 8 bytes (\b)
.quad playground::goto::{{closure}} ; poll method of the Future trait object
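The setg, lea, and dec sequence marked as the inlined signum in the listing above can be mirrored in Rust to check that it matches the original expression. Both functions below are hypothetical helpers written for this comparison. The branchless form is only valid when the two positions differ, which the assembly guarantees because the equal case branches away earlier.

```rust
/// Mirrors the inlined assembly: valid only when unit_pos != target_pos.
fn step_branchless(unit_pos: i32, target_pos: i32) -> i32 {
    let diff = target_pos - unit_pos; // sub ecx, ebx
    let gt = (diff > 0) as i32; // test ecx, ecx + setg al
    unit_pos + 2 * gt - 1 // lea eax, [rbx + 2*rax] + dec eax
}

/// The expression used in UnitGotoFuture::poll.
fn step_signum(unit_pos: i32, target_pos: i32) -> i32 {
    unit_pos + (target_pos - unit_pos).signum()
}

fn main() {
    for (unit_pos, target_pos) in [(0, 10), (10, 0), (-3, -1)] {
        println!(
            "{} -> {}: branchless = {}, signum = {}",
            unit_pos,
            target_pos,
            step_branchless(unit_pos, target_pos),
            step_signum(unit_pos, target_pos)
        );
    }
}
```

When target_pos is greater, the flag is 1 and the unit moves by 2 - 1 = +1; otherwise the flag is 0 and the unit moves by -1.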
Key takeaways
- An await in an async function typically results in a poll call on the future. If the future is not ready, the poll method returns Poll::Pending. The poll method returns Poll::Ready if the future is ready.
- The async function itself returns a future that wraps a closure. The poll method of the future calls the closure.
- The closure is implemented as a state machine. The await points are represented as state transitions.
- The state is stored in a compiler-generated enum.
- A jump table is used to jump to the appropriate state.
- The state machine is a closure that implements the Future trait.
- Local variables in the async function are stored in the closure environment. Too many local variables can cause the closure environment to be too large.
Articles in the async/await series
- Desugaring and assembly of async/await in Rust - goto
- Nested async/await in Rust: Desugaring and assembly - patrol
- Rust async executor - executor