For five years, Node.js was my default backend choice. Express, then Fastify, then NestJS — each iteration a little more structured, a little more typed, a little closer to a "real" backend language. But in mid-2025, after a particularly painful debugging session involving a memory leak in production, I decided it was time to try something different.

That something was Rust. Six months later, I've rewritten two critical services. Here's an honest account of what changed.

The Trigger

Our main API service — a JSON-over-HTTP gateway handling order processing — was running on Node.js 20 with Fastify. By all accounts, it was fast. P95 latency sat around 12ms under normal load. The problem showed up at scale.

During peak hours, memory usage would creep up steadily, eventually triggering OOM kills in our Kubernetes pods. The garbage collector would pause for 50-80ms at the worst moments. We patched around it with aggressive pod recycling and memory limits, but it felt like duct tape.

When your solution to a memory leak is "restart the process every 4 hours," it's time to reconsider your choices.

Why Rust?

I considered Go seriously. It's simpler, has a gentler learning curve, and the goroutine model is elegant. But several factors pushed me toward Rust:

  • Memory control — Rust's ownership model gives you C-level control without C-level footguns. No GC pauses, no surprise allocations.
  • Type system — Coming from TypeScript, Rust's type system felt like a natural evolution. Enums with data, pattern matching, and trait-based generics are genuinely expressive.
  • Ecosystem maturity — The Tokio async runtime, Axum web framework, and sqlx database driver have matured significantly. The DX in 2025 is vastly better than even 2023.
  • Long-term bet — Rust adoption is accelerating across infrastructure, cloud, and even frontend (via WASM). Learning it now felt like investing in a skill with increasing returns.
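The "natural evolution from TypeScript" point deserves a concrete illustration. A TypeScript discriminated union like `{ kind: "approved", txId: number } | { kind: "declined", reason: string }` maps directly onto a Rust enum with data. A minimal sketch (the `PaymentResult` type and its fields are hypothetical, purely for illustration):

```rust
// Hypothetical domain type: each variant carries its own data,
// much like a TypeScript discriminated union.
#[derive(Debug)]
enum PaymentResult {
    Approved { transaction_id: u64 },
    Declined { reason: String },
    Pending,
}

// Pattern matching destructures the variant's data in place, and
// the compiler rejects this match if any variant is left unhandled.
fn describe(result: &PaymentResult) -> String {
    match result {
        PaymentResult::Approved { transaction_id } => {
            format!("approved (tx {transaction_id})")
        }
        PaymentResult::Declined { reason } => format!("declined: {reason}"),
        PaymentResult::Pending => "pending".to_string(),
    }
}

fn main() {
    let r = PaymentResult::Approved { transaction_id: 42 };
    println!("{}", describe(&r)); // prints "approved (tx 42)"
}
```

Unlike a TypeScript union, there is no runtime tag-checking discipline to maintain by hand: the variant tag and its payload are inseparable by construction.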

The Migration

I didn't rewrite everything at once. Instead, I picked the two services with the clearest performance requirements and migrated them first, while keeping the rest of the stack on Node.js behind an API gateway.

Service 1: Order Gateway

The HTTP gateway that validates, enriches, and routes incoming orders. Here's what the core handler looks like in Axum:

async fn create_order(
    State(ctx): State<AppContext>,
    Json(payload): Json<CreateOrderRequest>,
) -> Result<Json<OrderResponse>, AppError> {
    let order = ctx.order_service
        .create(payload)
        .await
        .map_err(AppError::from)?;

    Ok(Json(OrderResponse::from(order)))
}

If you've used Express or Fastify, this structure will look familiar. Axum's extractor pattern is remarkably ergonomic — it handles deserialization, validation, and error conversion at the type level.
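The "error conversion at the type level" part hinges on Rust's `From` trait: implement `From<ServiceError> for AppError` once, and `map_err(AppError::from)` (or the `?` operator) handles the conversion in every handler. A std-only sketch of that mechanism, with hypothetical `OrderServiceError` and `AppError` types standing in for the real ones (in Axum, `AppError` would additionally implement `IntoResponse` to become an HTTP response):

```rust
use std::fmt;

// Hypothetical service-level error.
#[derive(Debug)]
struct OrderServiceError(String);

// Hypothetical app-level error, mirroring the AppError in the handler above.
#[derive(Debug)]
enum AppError {
    Service(String),
}

impl fmt::Display for AppError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AppError::Service(msg) => write!(f, "service error: {msg}"),
        }
    }
}

// One From impl is all that's needed: map_err(AppError::from) — or a
// bare ? — picks this conversion up automatically.
impl From<OrderServiceError> for AppError {
    fn from(e: OrderServiceError) -> Self {
        AppError::Service(e.0)
    }
}

// Stand-in for ctx.order_service.create(payload).
fn create(fail: bool) -> Result<u64, OrderServiceError> {
    if fail { Err(OrderServiceError("db unavailable".into())) } else { Ok(7) }
}

fn handler(fail: bool) -> Result<u64, AppError> {
    let order_id = create(fail).map_err(AppError::from)?;
    Ok(order_id)
}

fn main() {
    assert_eq!(handler(false).unwrap(), 7);
    println!("{}", handler(true).unwrap_err()); // prints "service error: db unavailable"
}
```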

Service 2: Event Processor

A streaming service that consumes events from Kafka, applies business logic, and writes results to ClickHouse. This was the service with the memory leak. In Rust, the core loop is straightforward:

// Pull messages off the Kafka consumer until the stream ends.
while let Some(msg) = consumer.recv().await {
    // Deserialize the raw payload into a typed event.
    let event: DomainEvent = serde_json::from_slice(msg.payload())?;

    match event {
        DomainEvent::OrderCreated(e) => handle_order_created(&ctx, e).await?,
        DomainEvent::PaymentReceived(e) => handle_payment(&ctx, e).await?,
        DomainEvent::OrderCancelled(e) => handle_cancellation(&ctx, e).await?,
    }

    // Commit only after the handler succeeds, so a crash before this
    // point means the event is redelivered rather than lost.
    consumer.commit_message(&msg).await?;
}

The match expression is one of Rust's killer features. The compiler guarantees that every event variant is handled — you literally cannot forget a case without getting a compile error.
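To make that guarantee concrete, here is a minimal sketch of the event enum the loop matches on (the payload types are hypothetical, and the serde derive used by `from_slice` is omitted to keep it self-contained):

```rust
// Sketch of the DomainEvent enum; real variants carry richer payloads.
#[derive(Debug)]
enum DomainEvent {
    OrderCreated(u64),
    PaymentReceived(u64),
    OrderCancelled(u64),
}

fn route(event: &DomainEvent) -> &'static str {
    // Delete any arm below — or add a fourth variant to the enum
    // without handling it here — and this function stops compiling.
    match event {
        DomainEvent::OrderCreated(_) => "order_created",
        DomainEvent::PaymentReceived(_) => "payment_received",
        DomainEvent::OrderCancelled(_) => "order_cancelled",
    }
}

fn main() {
    assert_eq!(route(&DomainEvent::OrderCreated(1)), "order_created");
}
```

This is what made extending the processor safe: adding a new event type turns every unhandled match site in the codebase into a compile error, which is effectively a to-do list.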

Results

After two months of development and a staged rollout, here are the numbers:

Order Gateway

  • P95 latency: 12ms → 2.1ms (5.7x improvement)
  • P99 latency: 45ms → 4.8ms (9.4x improvement)
  • Memory usage: 380MB → 18MB (21x reduction)
  • CPU utilisation: ~35% → ~8% at equivalent load

Event Processor

  • Throughput: 12k events/sec → 85k events/sec
  • Memory: Stable at 24MB (was 200-800MB and growing)
  • Zero OOM kills in 3 months of production

The tail latency improvement was the most impactful for user experience. Those GC pauses that spiked P99 to 45ms? Gone entirely.

The Hard Parts

It wasn't all smooth sailing. Rust's learning curve is real, and I hit several walls:

  • Lifetimes — The borrow checker's complaints are pedagogically useful but initially maddening. It took about three weeks before I stopped fighting it and started thinking in ownership.
  • Async complexity — Async Rust is powerful but has rough edges. Pinning, Send bounds, and the late arrival of async functions in traits (stabilized in Rust 1.75, still with limitations around dyn dispatch) added friction.
  • Compile times — A clean build of our workspace takes 90 seconds. Incremental builds are fast (2-5s), but a full compile is painful next to Node.js, which has no build step at all.
  • Ecosystem gaps — Some niche libraries simply don't exist yet in Rust. We had to write a custom ClickHouse client adapter because the existing crate didn't support our insert pattern.
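On compile times, the single cheapest win for us was link time. A sketch of a .cargo/config.toml that swaps in the mold linker, assuming a Linux x86-64 target with clang and mold installed; your mileage will vary by platform and workspace shape:

```toml
# .cargo/config.toml — use mold for linking on Linux x86-64.
# Requires clang and mold to be installed on the build machine.
[target.x86_64-unknown-linux-gnu]
linker = "clang"
rustflags = ["-C", "link-arg=-fuse-ld=mold"]
```

This mostly helps incremental builds, where linking dominates; it does little for the cold compile of a large dependency tree.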

Was It Worth It?

Unequivocally, yes — for these specific services. The performance gains are dramatic and the operational burden (no more memory leak firefighting) has dropped significantly. Our infrastructure costs for these two services fell by roughly 60% because we could run fewer, smaller pods.

Would I rewrite everything in Rust? No. Our admin dashboard API, internal tools, and batch jobs are perfectly fine in Node.js. The overhead of Rust's strictness doesn't justify itself for services that aren't performance-critical.

The lesson isn't "Rust is better than Node.js." It's that choosing the right tool for the right job — and being willing to invest in learning — can yield outsized returns.