We often write programs that manage large data structures. On occasion, those data structures are just large, plain structs spanning tens of MiB. Stack space on most OSes is quite constrained, so putting such large objects on the stack will almost certainly overflow it and force the app to exit. Unfortunately, Rust does not have a “safe” solution that is guaranteed to avoid constructing the object on the stack first. All present workarounds are error-prone for deeply nested structures with non-trivial constructors.

This is exactly what happened during my recent simulation project. The data structure I used is roughly sketched in the following pseudo-Rust:

struct SubSub1 {
    t: u32,
    c: i8,
    u: u8,
}

struct SubSub2<const N: usize> {
    table: [i8; N],
}

struct Sub1<const N0: usize, const N: usize, const C: usize> {
    t_0: SubSub2<N0>,
    tables: [[SubSub1; C]; N],
    bitarr: Box<[bool]>,
    l: [usize; N],
    ctr: usize,
    dir: bool,
}

struct Sub2<const L: usize, const N: usize, const M: usize> {
    w: [[[i8; L]; N]; M],
    w_0: [i8; M],
    bitarr: [bool; L],
    table: [u32; L],
}

struct Selector<const N: usize> {
    table: [i8; N],
}

struct Top<...> {
    sub_1: Sub1<N10, N1, C1>,
    sub_2: Sub2<L2, N2, M2>,
    selector: Selector<N_SEL>,
}

In reality, all of these objects have non-trivial initialization behavior. The structs are also heavily parametrized: their exact behavior can be customized via zero-sized function objects.
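To illustrate what customization through a zero-sized function object can look like, here is a minimal sketch. The `Table` type, its `init` field, and its methods are hypothetical and not taken from the project; the only point is that a non-capturing closure occupies no space, so the object stays exactly as large as its data.

```rust
use std::mem::size_of_val;

/// Hypothetical table whose initialization rule is supplied as a type parameter.
struct Table<F, const N: usize>
where
    F: Fn(usize) -> i8,
{
    table: [i8; N],
    init: F, // zero-sized when `F` is a non-capturing closure or a fn item
}

impl<F, const N: usize> Table<F, N>
where
    F: Fn(usize) -> i8,
{
    /// Builds the table by calling `init` once per slot.
    fn new(init: F) -> Self {
        let mut table = [0i8; N];
        for (i, slot) in table.iter_mut().enumerate() {
            *slot = init(i);
        }
        Table { table, init }
    }

    /// Re-runs the same initialization rule over the existing storage.
    fn reset(&mut self) {
        for (i, slot) in self.table.iter_mut().enumerate() {
            *slot = (self.init)(i);
        }
    }

    /// Size of the whole object in bytes; a zero-sized `F` adds nothing to it.
    fn size_in_bytes(&self) -> usize {
        size_of_val(self)
    }
}
```

With a non-capturing closure such as `|i| (i % 3) as i8 - 1`, a `Table<_, 16>` occupies 16 bytes, the same as the bare array.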

But the main idea is this: although the object is quite large and may vary in behavior, its size is known at compile time. So let’s use the following structure to represent this family of objects:

struct VeryLargeButStaticallySizedObject {
    data: [u8; 2 * 1024 * 1024],
}

type MyObject = VeryLargeButStaticallySizedObject;
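Because the size is a compile-time constant, it can be checked before the program ever runs. The sketch below is only an illustration: the `const` assertion and the helper `object_size_mib` are my additions, not part of the project.

```rust
use std::mem::size_of;

struct VeryLargeButStaticallySizedObject {
    data: [u8; 2 * 1024 * 1024],
}

type MyObject = VeryLargeButStaticallySizedObject;

// Evaluated at compile time: the build fails if the size ever changes.
const _: () = assert!(size_of::<MyObject>() == 2 * 1024 * 1024);

/// Hypothetical helper: the object's size in MiB, rounded down.
fn object_size_mib() -> usize {
    size_of::<MyObject>() / (1024 * 1024)
}
```

Here `object_size_mib()` evaluates to 2, so every by-value `MyObject` needs 2 MiB of contiguous stack space.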

So initially I just constructed one MyObject on the stack and let it run.

fn main() {
    let top = MyObject::new(...);
    ...
}

It inevitably failed with a stack overflow on my Windows machine.

PS D:\Workspace\project> cargo run            
   Compiling project v0.1.0 (D:\Workspace\project)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 2.55s
     Running `target\debug\project.exe`

thread 'main' has overflowed its stack
error: process didn't exit successfully: `target\debug\project.exe` (exit code: 0xc00000fd, STATUS_STACK_OVERFLOW)

So there are two obvious solutions:

  1. Increase the stack size.
  2. Construct the object on the heap, not the stack.

Solution 1 is generally unattractive, since the way to adjust the stack size varies from compiler to compiler and from OS to OS. I would have to force my coworkers to type in a long command line to achieve this. Solution 2 would be much better. In languages like C++, this is as simple as one line:

#include <memory>

auto my_obj = std::make_unique<MyObject>(...);

This call allocates heap storage, forwards all parameters to MyObject’s constructor so that the object is built in place, and returns a std::unique_ptr that automatically deallocates the object (in this case, by calling std::default_delete on it). No stack allocation is needed.

An older technique known as placement new is also available in C++. It lets users treat arbitrary memory as the object’s storage and then run the constructor on it. But that kind of low-level operation is generally considered “unsafe” by Rust pros, so we won’t talk about it here.

So I tried to replicate std::make_unique in safe Rust. This is what I wrote at first:

fn main() {
    let top = Box::new(MyObject::new(...));
    ...
}

When I ran it with the “debug” profile, it failed once more. Looking at the generated assembly (Godbolt link), I found the following code:

fun:
        mov     r11, rsp
        sub     r11, 2097152
.LBB8_3:
        sub     rsp, 4096
        mov     qword ptr [rsp], 0
        cmp     rsp, r11
        jne     .LBB8_3
        sub     rsp, 24
        lea     rdi, [rsp + 8]
        call    example::VeryLargeButStaticallySizedObject::new::h94705197ed0c48cf
        mov     edi, 2097152
        mov     esi, 1
        call    alloc::alloc::exchange_malloc::hed5e7d0e2c1c5709
        mov     qword ptr [rsp], rax
        jmp     .LBB8_2
        mov     rcx, rax
        mov     eax, edx
        mov     qword ptr [rsp + 2097160], rcx
        mov     dword ptr [rsp + 2097168], eax
        mov     rdi, qword ptr [rsp + 2097160]
        call    _Unwind_Resume@PLT
.LBB8_2:
        mov     rdi, qword ptr [rsp]
        lea     rsi, [rsp + 8]
        mov     edx, 2097152
        call    memcpy@PLT
        mov     rax, qword ptr [rsp]
        add     rsp, 2097176
        ret

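MyObject is first constructed on the stack, before being moved into the Box! The loop at .LBB8_3 probes and reserves 2 MiB of stack, new writes the object into that space, and only then does memcpy copy it into the heap allocation. Box-ing the object does not eliminate the stack allocation.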

But when I turned on optimizations with the release profile, the stack allocation magically disappeared (Godbolt link):

fun:
        push    rax
        mov     rax, qword ptr [rip + __rust_no_alloc_shim_is_unstable@GOTPCREL]
        movzx   eax, byte ptr [rax]
        mov     edi, 2097152
        mov     esi, 1
        call    qword ptr [rip + __rust_alloc@GOTPCREL]
        test    rax, rax
        je      .LBB0_1
        mov     edx, 2097152
        mov     rdi, rax
        xor     esi, esi
        pop     rax
        jmp     qword ptr [rip + memset@GOTPCREL]
.LBB0_1:
        mov     edi, 1
        mov     esi, 2097152
        call    qword ptr [rip + alloc::alloc::handle_alloc_error::ha0547c441587f574@GOTPCREL]

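This is the behavior I wanted: the object is allocated and zero-filled directly on the heap. But it is only the result of an optimization, not something the language guarantees.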
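For comparison, this is roughly what the unsafe workaround looks like when you do not want to depend on the optimizer. It is a minimal sketch under a strong assumption: the object must be valid when every byte is zero, which holds for this byte array but not for the real structs with their non-trivial constructors. The function name `new_zeroed_on_heap` is hypothetical.

```rust
use std::alloc::{alloc_zeroed, handle_alloc_error, Layout};

struct VeryLargeButStaticallySizedObject {
    data: [u8; 2 * 1024 * 1024],
}

type MyObject = VeryLargeButStaticallySizedObject;

/// Hypothetical helper: allocates a zero-filled `MyObject` directly on the heap,
/// without ever materializing it on the stack.
fn new_zeroed_on_heap() -> Box<MyObject> {
    let layout = Layout::new::<MyObject>();
    // SAFETY:
    // - `layout` has a non-zero size, as `alloc_zeroed` requires.
    // - All-zero bytes are a valid `MyObject`, because it only contains `u8`s.
    //   This is an assumption about the type and is NOT true in general.
    // - The pointer comes from the global allocator with the layout of
    //   `MyObject`, so handing it to `Box::from_raw` is sound.
    unsafe {
        let ptr = alloc_zeroed(layout) as *mut MyObject;
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        Box::from_raw(ptr)
    }
}
```

Every field that needs a non-zero initial value would then have to be written through the Box afterwards, which is exactly where this approach becomes error-prone for deeply nested structures.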

Is there a way to permanently enable this “move elision”? Unfortunately, no. There are many discussions already on Rust forums:

Very interestingly, in the last thread, one user, “khimru”, replied with the following:

“khimru” wrote in Jun 2023:

Is this practical question or an attempt to find an excuse to not use Rust?

Guaranteed copy elision is an interesting property, but it’s C++17 property which means most C++ codebases don’t use it.

Similarly the ability to create object on heap without unsafe code is something that is nice in theory but quite problematic in practice. But we may hope that in year 2034, when Rust would be as old as C++17, that would be a solved problem. We are not there yet, though.

Later, in reply to a user named “kpreid”, “khimru” wrote:

“khimru” wrote in Jun 2023:

That’s why I asked if that’s an attempt to show that Rust is not yet “done” (is there any language which is actually 100% done?) or some kind of practical issue.

Well, here is the practical case you requested. This does happen from time to time, and yes, Rust is surely not yet “done”.

In the end, I gave up and told all my users to adjust the stack size as needed:

RUSTFLAGS="-C link-args=/STACK:4194304" cargo run
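A possible alternative that avoids the linker flag is to run the workload on a thread whose stack size is set in code via std::thread::Builder::stack_size. This is a sketch of that idea, not what the project does: the constructor `MyObject::new(fill)`, the function `run_with_big_stack`, and the 16 MiB figure are all assumptions made for illustration.

```rust
use std::thread;

struct VeryLargeButStaticallySizedObject {
    data: [u8; 2 * 1024 * 1024],
}

type MyObject = VeryLargeButStaticallySizedObject;

impl MyObject {
    /// Hypothetical constructor standing in for the real, non-trivial one.
    fn new(fill: u8) -> Self {
        Self {
            data: [fill; 2 * 1024 * 1024],
        }
    }
}

/// Arbitrary choice: generous headroom for an unoptimized build, which may
/// keep more than one copy of the object on the stack.
const STACK_SIZE: usize = 16 * 1024 * 1024;

/// Runs the workload on a worker thread with an enlarged stack and returns
/// the sum of all bytes in the object.
fn run_with_big_stack() -> usize {
    let worker = thread::Builder::new()
        .stack_size(STACK_SIZE)
        .spawn(|| {
            // The object lives on the worker's stack, not the main thread's.
            let obj = MyObject::new(1);
            obj.data.iter().map(|&b| b as usize).sum::<usize>()
        })
        .expect("failed to spawn worker thread");
    worker.join().expect("worker thread panicked")
}
```

The trade-off is that the whole workload has to move into the closure, and the stack size is still a guess that must be revisited whenever the object grows.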