Understanding Rust's Trait Objects: Vtables, Dynamic Dispatch, and Memory Deallocation
We have looked at static and dynamic dispatch. This article will investigate how Rust handles dynamic dispatch using trait objects and vtables. We will also explore how the Rust compiler can sometimes optimize tail calls in the context of dynamic dispatch. Finally, we will examine how the vtable facilitates freeing memory when using trait objects wrapped in a Box
.
To illustrate these concepts, we will work with the following example code:
pub trait Shape {
type T;
fn area(&self) -> Self::T;
}
pub trait Draw: Shape {
fn draw(&self);
}
// The function operates with a trait object reference
pub fn draw_dynamic(a: &dyn Draw<T = f64>) {
a.draw();
}
// Invokes the draw method and then the area method on the trait object
pub fn draw_and_report_area_dynamic(a: &dyn Draw<T = f64>) -> f64 {
a.draw();
a.area()
}
// The trait object is wrapped in a Box. This means that the function will have
// to free the memory associated with the trait object in the function body.
pub fn draw_and_report_area_dynamic_box(a: Box<dyn Draw<T = f64>>) -> f64 {
a.draw();
a.area()
}
Trait object fat pointer and the vtable
Under the hood, trait objects are implemented using fat pointers. The first pointer points to the data of the concrete type that implements the Draw
trait, and the second pointer points to the vtable (virtual function table) for the concrete type. The vtable is a data structure that stores pointers to the implementations of concrete-type methods. The vtable also stores information on the concrete type destructor, alignment, and size. This information will come in handy when we look at how the compiler frees memory when a trait object is wrapped in a Box
.
The following diagram shows the layout of a trait object fat pointer. The byte offsets of the fat pointer and the vtable are shown in the diagram.
Tail call optimization (TCO)
Before diving deeper, let us understand tail call optimization (TCO). TCO is a powerful compiler optimization technique that allows the last function call in a function body—the tail call—to bypass creating a new stack frame. This optimization is possible when the tail call's return value is also the return value of the calling function.
TCO tricks the tail-called function into returning directly to the original caller rather than the intermediate function. By reorganizing the function's execution flow, the compiler can reuse the existing stack frame and avoid the overhead of additional function calls.
When the compiler applies tail call optimization (TCO), it replaces a function call with a jump (jmp
) to eliminate the need for a new stack frame. By replacing the call
with a jump, the compiler avoids the overhead of setting up and tearing down a stack frame, making the execution more efficient.
Assembly output for draw_dynamic
The generated assembly code for the draw_dynamic
function includes a jump to the vtable address of the draw
function rather than a call. This optimization, known as Tail Call Optimization (TCO), eliminates creating a new stack frame and return address for the call chain's last (tail) function.
In this case, the draw
function returns directly to the caller of draw_dynamic
, using the return address that was already present on the stack. This optimization is possible because the rdi
register already points to the concrete object that implements the Draw
trait, satisfying the self
parameter requirement for the draw
function.
; Inputs:
; rdi: Address of a
; rsi: Address of the vtable associated with a
example::draw_dynamic:
jmp qword ptr [rsi + 32] ; a.draw() is called via the vtable
; and the tail call optimization is
; applied.
Assembly output for draw_and_report_area_dynamic
The generated assembly code for the draw_and_report_area_dynamic
function is more complex than that of draw_dynamic
. It first obtains the addresses of the draw
and area
methods from the vtable using offsets of 32
and 24
, respectively. It then calls the draw
method using a call
instruction, which creates a new stack frame and return address.
However, the area
method uses a jump
instruction, indicating that it is eligible for tail call optimization (TCO). Since the area
method is the last function call in the body of draw_and_report_area_dynamic
, the compiler can optimize it by replacing the call with a jump, eliminating the need for a new stack frame and return address. The return value of the area
method is also the return value of draw_and_report_area_dynamic
, so the optimization has no impact on the correctness of the code.
; Inputs:
; rdi: Address of a
; rsi: Address of the vtable associated with a
; Outputs:
; xmm0: Result of function
example::draw_and_report_area_dynamic:
push r14 ; Save the value of r14
push rbx ; Save the value of rbx
push rax ; Save the value of rax
mov r14, rsi ; Save the address of the vtable
mov rbx, rdi ; Save the address of a
; ⭐ Call a.draw()
; This is not the last function call in the function body, so it cannot
; be optimized away with a tail call.
call qword ptr [rsi + 32] ; a.draw() is called via the vtable
mov rdi, rbx ; Restore the address of a
mov rax, r14 ; Restore the address of the vtable
; ⭐ Pop off the three pushes on the stack in preparation of the tail call
; The stack is left in a clean state so that the jump to a.area() can
; can be followed by a return instruction that will return from
; draw_and_report_area_dynamic.
add rsp, 8 ; Adjust the stack pointer
pop rbx ; Restore the value of rbx
pop r14 ; Restore the value of r14
; ⭐ Tail call a.area()
; This is the last function call in the function body and its return
; value is also the return value of draw_and_report_area_dynamic, so it
; can be optimized away with a tail call.
jmp qword ptr [rax + 24] ; a.area() is called via the vtable
; and the tail call optimization is
; applied.
Assembly output for draw_and_report_area_dynamic_box
The function draw_and_report_area_dynamic_box
is a function that takes a Box
containing a trait object as an argument and returns the result of calling the area
method of the trait object. The function starts by saving the values of the r15
, r14
, and rbx
registers
on the stack, and then adjusts the stack pointer to reserve space for temporary variables.
It then calls the draw
method of the trait object contained in the Box
, followed
by the area
method. The result of the area method is saved on the stack.
The function then calls the destructor of the trait object contained in the Box
and
frees the memory allocated for the trait object. The function then returns the result
of the area
method. There is unreachable code after the ret
instruction.
; Inputs:
; rdi: Address of a
; rsi: Address of the vtable associated with a
; Outputs:
; xmm0: Result of function
example::draw_and_report_area_dynamic_box:
push r15 ; Save the value of r15 on the stack
push r14 ; Save the value of r14 on the stack
push rbx ; Save the value of rbx on the stack
sub rsp, 32 ; Adjust the stack pointer to reserve space for temporary variables
mov rbx, rsi ; Save the address of the vtable into rbx
mov r14, rdi ; Save the address of the Box into r14
mov qword ptr [rsp + 16], rdi ; Save the address of the Box on the stack
mov qword ptr [rsp + 24], rsi ; Save the address of the vtable on the stack
; ⭐ Call the draw method of the trait object contained in the Box
call qword ptr [rsi + 32] ; Call the draw method of the trait object contained in the Box
mov rdi, r14 ; Restore the address of the Box into rdi
; ⭐ Call the area method of the trait object contained in the Box
call qword ptr [rbx + 24] ; Call the area method of the trait object contained in the Box
movsd qword ptr [rsp + 8], xmm0 ; Save the result of the area method on the stack
; ⭐ Call the destructor of the trait object contained in the Box
mov rdi, r14 ; Load the address of the Box into rdi
call qword ptr [rbx] ; Call the destructor of the value contained in the Box
; ♻️ Free the memory allocated for the trait object contained in the Box
mov rsi, qword ptr [rbx + 8] ; Load the size of the value contained in the Box into rsi
test rsi, rsi ; Check if the size of the value contained in the Box is 0
je .LBB5_5 ; If the size of the value contained in the Box is 0, jump to .LBB5_5
mov rdx, qword ptr [rbx + 16] ; Load the alignment of the value contained in the Box into rdx
mov rdi, r14 ; Load the address of the Box into rdi
call qword ptr [rip + __rust_dealloc@GOTPCREL] ; Call the __rust_dealloc function
.LBB5_5:
movsd xmm0, qword ptr [rsp + 8] ; Load the result of the area method into xmm0
add rsp, 32 ; Adjust the stack pointer to free space for temporary variables
pop rbx ; Restore the value of rbx from the stack
pop r14 ; Restore the value of r14 from the stack
pop r15 ; Restore the value of r15 from the stack
ret ; Return the result of the area method
; 💀 UNREACHABLE CODE
; The following code is unreachable as the function has already returned.
mov r15, rax ; Load the address of the Box into r15
mov rsi, qword ptr [rbx + 8] ; Load the size of the value contained in the Box into rsi
mov rdx, qword ptr [rbx + 16] ; Load the alignment of the value contained in the Box into rdx
mov rdi, r14 ; Load the address of the Box into rdi
call alloc::alloc::box_free ; Call the box_free function
jmp .LBB5_7 ; Jump to the end of the function
mov r15, rax ; Load the address of the Box into r15
lea rdi, [rsp + 16] ; Load the address of the Box on the stack into rdi
call core::ptr::drop_in_place<alloc::boxed::Box<dyn example::Draw+T = f64>> ; Call the drop_in_place function
.LBB5_7:
mov rdi, r15 ; Load the address of the Box into rdi
call _Unwind_Resume@PLT ; Call the _Unwind_Resume function
ud2 ; Trap
call qword ptr [rip + core::panicking::panic_no_unwind@GOTPCREL] ; Call the panic_no_unwind function
ud2 ; Trap
DW.ref.rust_eh_personality:
.quad rust_eh_personality
Rectangle - a concrete type implementing the Draw trait
Example of a concrete type implementation of the Draw
trait. The vtable, destructor, area method and draw method implementations are also covered.
// This function invokes the three functions we have been dealing with
// In this case, the compiler can inline the functions and remove
// the dynamic dispatch.
pub fn draw_rectangle(rectangle: Rectangle<f64>) -> (f64, f64) {
draw_dynamic(&rectangle);
let a = draw_and_report_area_dynamic(&rectangle);
let b = draw_and_report_area_dynamic_box(Box::new(rectangle));
(a, b)
}
pub fn draw_rectangle(rectangle: Rectangle<f64>) {
draw_dynamic(&rectangle);
draw_and_report_area_dynamic(&rectangle);
draw_and_report_area_dynamic_box(Box::new(rectangle));
}
pub struct Point<T> {
x: T,
y: T,
}
pub struct Rectangle<T> {
top_left: Point<T>,
bottom_right: Point<T>,
}
impl<T> Shape for Rectangle<T>
where
T: std::ops::Sub<Output = T> + std::ops::Mul<Output = T> + Copy,
{
type T = T;
fn area(&self) -> T {
let width = self.bottom_right.x - self.top_left.x;
let height = self.top_left.y - self.bottom_right.y;
width * height
}
}
impl<T> Draw for Rectangle<T>
where
T: std::ops::Sub<Output = T> + std::ops::Mul<Output = T> + Copy,
{
fn draw(&self) {
println!("Draw a rectangle");
}
}
Vtable for Rectangle
See the definition of the vtable for the Rectangle
type. Compare this definition with the trait object and vtable figure.
.L__unnamed_3:
.quad core::ptr::drop_in_place<example::Rectangle<f64>> ; Destructor
.asciz " \000\000\000\000\000\000\000\b\000\000\000\000\000\000" ; Size: 32 (leading space)
; Alignment: 8 (\b character)
.quad <example::Rectangle<T> as example::Shape>::area ; Address of the area method for Rectangle
.quad <example::Rectangle<T> as example::Draw>::draw ; Address of the draw method for Rectangle
Destructor for Rectangle
The destructor for Rectangle is core::ptr::drop_in_place<example::Rectangle<f64>>
. The defined destructor returns.
core::ptr::drop_in_place<example::Rectangle<f64>>:
ret
Area method for Rectangle
The area
method for Rectangle
is <example::Rectangle<T> as example::Shape>::area
. The area
method is defined as follows:
<example::Rectangle<T> as example::Shape>::area:
movupd xmm1, xmmword ptr [rdi + 8] ; Load the bottom_right.x and bottom_right.y fields into xmm1
movsd xmm0, qword ptr [rdi + 24] ; Load the top_left.x field into xmm0
movhpd xmm0, qword ptr [rdi] ; Load the top_left.y field into higher xmm0
subpd xmm1, xmm0 ; Perform vector subtractions to compute the width and height
movapd xmm0, xmm1 ; Copy the width and height into xmm0
unpckhpd xmm0, xmm1 ; Unpack the width and height into xmm0 and xmm1
mulsd xmm0, xmm1 ; Perform vector multiplication to compute the area
ret ; Return the result in xmm0
Draw method for Rectangle
The draw
method for Rectangle
is <example::Rectangle<T> as example::Draw>::draw
. The draw
method is defined as follows:
<example::Rectangle<T> as example::Draw>::draw:
sub rsp, 56 ; Allocate space for the arguments on the stack
lea rax, [rip + .L__unnamed_1] ; Load the address of the format string into rax
mov qword ptr [rsp + 8], rax ; Store the address of the format string into the stack
mov qword ptr [rsp + 16], 1 ; Store the number of arguments into the stack
mov qword ptr [rsp + 24], 0 ; Store the address of the first argument into the stack
lea rax, [rip + .L__unnamed_2] ; Load the address of the argument into rax
mov qword ptr [rsp + 40], rax ; Store the address of the argument into the stack
mov qword ptr [rsp + 48], 0 ; Store the address of the second argument into the stack
lea rdi, [rsp + 8] ; Load the address of the arguments into rdi
call qword ptr [rip + std::io::stdio::_print@GOTPCREL] ; Call the print function
add rsp, 56 ; Deallocate the space for arguments from the stack
ret
Key takeaways
- Tail Call Optimization (TCO) is a technique compilers use to optimize away the last function (tail function) stack frame and return.
- Reorganizing to prepare the return value from a single function might improve performance due to tail call optimization.
- Vtables store addresses of methods associated with a trait object.
- The rdi and rsi registers are used to pass the address of a trait object and its vtable, respectively.
- When a trait object wrapped in a Box goes out of scope, the compiler frees the memory allocated for the trait object by calling the destructor and the
__rust_dealloc
function.
Experiment with the Compiler Explorer
Edit the code in the Compiler Explorer to see the difference in the assembly output.
-
Change the
T
type toi32
and see the difference in the assembly output. -
Change the
draw_and_report_area_dynamic
function to reverse the order of thearea
anddraw
calls as shown below. The compiler is not able to tail call optimize the function.
pub fn draw_and_report_area_dynamic(a: &dyn Draw<T = f64>) -> f64 {
let area = a.area();
a.draw();
area
}