
The Single-Threaded Reality Check

Context about Node.js (feel free to skip this part if you already know it)

Node.js runs your JavaScript on a single thread and uses an asynchronous event loop to handle operations that would otherwise block. When we talk about the “main thread,” we literally mean one thread, running on one CPU core at a time. Blocking work such as network and file I/O is handed off (to the operating system or libuv's internal thread pool), and its callbacks are queued on the event loop to run later, after all the synchronous code on the call stack has finished, but still on that same thread, ONE AT A TIME. This gives an illusion of concurrency, since we can juggle many tasks, but it isn't true parallelism.
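Here's a tiny sketch of that ordering, written as a CommonJS script (the `order` array and its labels are just illustrative). Synchronous statements finish first, then the queued callbacks run one at a time on the same thread. The promise callback beating the 0 ms timer is a detail of Node's microtask queue, not something the point above depends on.

```javascript
// Everything here runs on the single main thread.
const order = [];

// Queued for the event loop's timers phase: runs only once the call stack is empty.
setTimeout(() => {
  order.push('timer callback');
  console.log(order.join(' -> ')); // sync 1 -> sync 2 -> promise callback -> timer callback
}, 0);

// Queued as a microtask: also waits for the synchronous code to finish.
Promise.resolve().then(() => order.push('promise callback'));

// Synchronous code always runs to completion first.
order.push('sync 1');
order.push('sync 2');
```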

Because of this single-threaded design, compute-intensive tasks block the main thread and become a performance bottleneck. Parsing huge files, image processing, and similar CPU-hungry operations take a long time to run. If such a task hogs the main thread, everything else waits behind it, including I/O-bound tasks that could have been handled in a fraction of the time.
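To feel the bottleneck, here's a rough sketch where a hypothetical `busyWait` helper stands in for CPU-heavy work. A timer that asks to fire after 0 ms can't run until the busy loop gives the thread back:

```javascript
// Hypothetical stand-in for CPU-heavy work: spins on the main thread for `ms` milliseconds.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spinning: nothing else on this thread can run meanwhile
  }
}

const start = Date.now();
let timerDelayMs = -1;

// Asks to run "as soon as possible"...
setTimeout(() => {
  timerDelayMs = Date.now() - start;
  console.log(`0 ms timer actually fired after ${timerDelayMs} ms`);
}, 0);

// ...but the event loop can't get to it until this 200 ms of CPU work is done.
busyWait(200);
```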

This makes a compelling case for using worker threads — one of the two parallel processing options that Node.js provides. But before diving into worker threads, let’s get crystal clear about the difference between the two parallel processing options: clusters and worker_threads.

𐂫 Clusters vs. Worker Threads: Know your armour well

Short explanation first:

Clustering: creates separate processes, typically one per CPU core.

Worker threads: create additional threads within the same process.

📚 The Deep Dive

Clustering involves using the cluster module to fork the Node.js application into multiple child processes. Each process runs the same app.js / main.js and can handle requests independently, making use of all available cores of your machine.

Worker threads, on the other hand, are real multi-threading — threads within the same process.

⚙️ In OS terms:

- Clustering = multi-processing
- Worker threads = multi-threading

🤨 Key differences:

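- Processes have separate memory and separate runtimes.
- Threads live in the same process, but in Node.js each worker thread gets its own V8 isolate (its own JavaScript heap), call stack, and event loop. Memory is shared only when you opt in, for example with a SharedArrayBuffer.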
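A small sketch of that isolation, since in Node.js each worker really does get its own heap. It uses the `eval: true` option so the worker's code can live in a string instead of a separate file. A global assigned inside the worker doesn't touch the main thread's global, while a `SharedArrayBuffer` (the explicit opt-in) is visible to both:

```javascript
const { Worker } = require('node:worker_threads');

// A plain global on the main thread.
globalThis.counter = 0;

// A SharedArrayBuffer is the explicit, opt-in way to share memory between threads.
const shared = new Int32Array(new SharedArrayBuffer(4));

// Worker code inlined as a string ({ eval: true }) so the sketch fits in one file.
const workerSource = `
  const { workerData } = require('node:worker_threads');
  globalThis.counter = 42;                 // changes only the worker's own global
  Atomics.store(workerData.shared, 0, 42); // writes memory the main thread can also see
`;

const done = new Promise((resolve, reject) => {
  const worker = new Worker(workerSource, { eval: true, workerData: { shared } });
  worker.once('error', reject);
  worker.once('exit', () => {
    const result = { counter: globalThis.counter, sharedValue: Atomics.load(shared, 0) };
    console.log(result); // { counter: 0, sharedValue: 42 }
    resolve(result);
  });
});
```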

🚘 When to Use What

If you’re looking to scale your application, this could be your first line of defense:

Clustering shines when you’re dealing with a high-traffic environment and need to leverage all cores of your machine. This approach is usually discouraged for cloud deployments on AWS, K8s, or GCP, since those platforms already handle scaling and core utilization, typically by running more instances of your app.

Worker threads are your go-to when there’s a task that could potentially block the main thread and starve other tasks of resources.

But here’s the caveat: don’t use worker threads for simple I/O-bound tasks like making a bunch of API calls or DB queries (even slow ones). Node already performs that I/O asynchronously without blocking the main thread, and a worker thread may take up to 100ms to spin up, which is often enough time to finish the network task itself.
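Here's a rough illustration of why slow I/O doesn't need a worker. `fakeNetworkCall` is a hypothetical stand-in that simply waits using `setTimeout` from `node:timers/promises`. Waiting doesn't occupy the thread, so three overlapping 100 ms "calls" finish in roughly 100 ms, all on the main thread:

```javascript
const { setTimeout: sleep } = require('node:timers/promises');

// Hypothetical stand-in for an API call or DB query: it waits, but uses no CPU while waiting.
function fakeNetworkCall(id, latencyMs) {
  return sleep(latencyMs, `response ${id}`);
}

async function main() {
  const start = Date.now();
  // Three "slow" calls in flight at the same time, all started from the main thread.
  const responses = await Promise.all([
    fakeNetworkCall(1, 100),
    fakeNetworkCall(2, 100),
    fakeNetworkCall(3, 100),
  ]);
  const elapsedMs = Date.now() - start;
  console.log(responses, `took about ${elapsedMs} ms`); // roughly 100 ms, not 300 ms
  return { responses, elapsedMs };
}

const result = main();
```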

In cricket terms, you use worker threads for a strategic timeout: long and important. You can’t call the support staff onto the ground after every over to discuss strategy; handling every over is the main thread’s job.

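Also, even though worker threads run inside the same process, they do not share the same JavaScript heap: each worker gets its own V8 isolate, heap, and module cache.

This means global variables, module-level state, and class instances from the main thread are not visible inside the worker (it imports whatever modules it needs on its own). Environment secrets are still available, because process.env is copied into the worker by default. Most importantly, all your connections (database, Kafka, and so on) live in that separate memory, so they have to be created inside the worker itself.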

😉 The Worker Thread Syntax: Let’s Get Coding

// This part executes in the main thread (your usual code like controllers/some-controller.js)

import { Worker } from 'worker_threads';
const worker = new Worker(filePath, options);

- filePath is the path to the JS file that should run on this new thread. Remember those caveats before writing this script! A relative path must start with ./ or ../ and is resolved against the process’s current working directory, not against the file that creates the worker.
- options is an optional object whose most prominent key is workerData:
{
  workerData: { /* the data object or Buffer you want to pass to the new child thread */ }
}
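A minimal, runnable sketch of the constructor and `workerData`, as a CommonJS script. The worker source is inlined via the `eval: true` option purely so everything fits in one file; in a real app you'd pass a file path instead.

```javascript
const { Worker } = require('node:worker_threads');

// In a real app this would be its own file (e.g. a hypothetical workers/greet.js) and you'd pass its path;
// { eval: true } tells Worker to treat the first argument as source code instead.
const workerSource = `
  const { workerData, parentPort } = require('node:worker_threads');
  parentPort.postMessage('Hello, ' + workerData.name + '!');
`;

const reply = new Promise((resolve, reject) => {
  const worker = new Worker(workerSource, {
    eval: true,
    workerData: { name: 'Siddharth' }, // cloned into the worker's own memory
  });
  worker.once('message', resolve);
  worker.once('error', reject);
});

reply.then((text) => console.log(text)); // Hello, Siddharth!
```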

The data passed from the parent thread to the child thread is a deep clone (made with the structured clone algorithm), not a reference to the same memory location. Remember, each worker thread has its own memory; the copy lives there.

This has three important implications:
1. Cloning an entire data buffer and passing it from one thread to another is expensive, so use it judiciously and avoid passing large datasets from the main thread to a child thread.
2. Because it’s a deep clone and not passed by reference, the worker thread can’t accidentally modify the original data (isolation and immutability).
3. You can’t pass anything the structured clone algorithm can’t copy, such as functions, or class instances whose methods you need (they arrive as plain objects). Circular references, on the other hand, are copied just fine. So make that database connection in the worker script itself.
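`workerData` is copied with the HTML structured clone algorithm, the same one behind the global `structuredClone()` (available since Node 17), so you can check these rules synchronously without starting a thread. A quick sketch; the sample objects and the `Connection` class are illustrative, and note that circular references do survive the clone:

```javascript
// Same cloning rules as workerData and postMessage, without starting a thread.
const original = { user: { name: 'Siddharth' }, totals: [1, 2, 3] };
const copy = structuredClone(original);

// Mutating the deep copy leaves the original untouched.
copy.user.name = 'changed in the copy';
copy.totals.push(4);
console.log(original.user.name, original.totals); // Siddharth [ 1, 2, 3 ]

// Circular references are preserved, not rejected.
const cyclic = { id: 1 };
cyclic.self = cyclic;
const clonedCyclic = structuredClone(cyclic);
console.log(clonedCyclic.self === clonedCyclic); // true

// Functions cannot be cloned at all.
let cloneError;
try {
  structuredClone({ runQuery: () => 'select 1' });
} catch (err) {
  cloneError = err;
}
console.log(cloneError.name); // DataCloneError

// Class instances arrive as plain objects: their methods are gone.
class Connection {
  query() {
    return 'rows';
  }
}
const clonedConnection = structuredClone(new Connection());
console.log(typeof clonedConnection.query); // undefined
```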

🚅 Show Me the Code Already!

Let’s see a real example:

// Main thread code

// some controller/customerStatementController.ts

import { Worker } from 'worker_threads';

// Resolve the worker file relative to this file, not the process's working directory
const worker = new Worker(new URL('../workers/generateStatement.js', import.meta.url), {
  workerData: {
    name: "Siddharth Upadhyay",
    statementQuery: "select total from transactions where user_id = 12 and start_date = '01-01-2024' and end_date = '31-12-2024';",
  },
});

worker.on('error', (err) => {
  // When the worker throws an uncaught error, it's caught in this block of the main thread
  console.error("Error reported by the worker", err);
});

worker.on('message', (data) => {
  // This is the message object sent by the worker via parentPort.postMessage()
  console.log("Message sent by the worker", data);
});

worker.on('exit', (code) => {
  if (code !== 0) {
    // The worker has stopped, and a non-zero code means it didn't finish cleanly
    console.error(`Worker stopped and exited with exit code ${code}`);
  }
  // Otherwise (exit code 0) we can assume the worker did its job and exited gracefully,
  // and we can proceed with our logic in the main thread
});
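Those three listeners are often wrapped in a Promise so a controller can simply `await` the result. Here's one way to do it, as a sketch: `runWorker` is a hypothetical helper (not part of `worker_threads`), and the worker code is inlined with `{ eval: true }` only to keep the example in one file.

```javascript
const { Worker } = require('node:worker_threads');

// Hypothetical helper: turns the 'message' / 'error' / 'exit' events into one Promise.
function runWorker(source, workerData) {
  return new Promise((resolve, reject) => {
    // { eval: true } treats `source` as code; with a worker file you'd pass its path instead.
    const worker = new Worker(source, { eval: true, workerData });
    worker.once('message', resolve); // first message = the result
    worker.once('error', reject); // uncaught error thrown inside the worker
    worker.once('exit', (code) => {
      // Ignored if the promise already settled via 'message' or 'error'.
      reject(new Error(`Worker exited with code ${code} before sending a result`));
    });
  });
}

// Usage: one worker that succeeds, one that throws.
const succeeding = runWorker(
  `const { workerData, parentPort } = require('node:worker_threads');
   parentPort.postMessage({ total: workerData.amounts.reduce((a, b) => a + b, 0) });`,
  { amounts: [120, 80, 50] },
);
const failing = runWorker(`throw new Error('boom');`, {});

succeeding.then((data) => console.log('result:', data)); // result: { total: 250 }
failing.catch((err) => console.log('caught:', err.message)); // caught: boom
```

Calling `reject` in the 'exit' handler is harmless once a message or error has settled the promise, since a settled promise ignores later calls; it only matters when a worker dies without reporting anything.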

Now for the worker thread script:
// ../workers/generateStatement.js:

import { workerData, parentPort } from 'worker_threads';
// Import your favorite library to make a DB connection, like you do in your normal data-layer files
import { Sequelize } from 'sequelize';
import { DBConfig } from '../config/DBConfig.js';

// workerData (the cloned copy that was sent by the parent) is here:
const { name, statementQuery } = workerData;

// Initialize DB or Kafka connections in this worker script if needed;
// they can't be passed in from the parent thread
let connection;
try {
  connection = new Sequelize({ /* your config, e.g. values from DBConfig */ });
  await connection.authenticate(); // the constructor alone doesn't connect; this fails fast if the DB is unreachable
} catch (dbConnectionError) {
  // You could `throw` here instead (the parent's 'error' handler would receive it),
  // but reporting the failure explicitly is usually cleaner
  parentPort.postMessage({
    success: false,
    message: dbConnectionError.message,
  });
  process.exit(1); // inside a worker this stops only the thread, signalling an error
}

// Free to proceed and do whatever you want.
// If the task needs to inform the parent, you have parentPort.postMessage() to do so.
// Perform the tedious processing and then gracefully exit with:
process.exit(0);
// Or die with some hiccups:
// process.exit(1);
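To see the payoff, here's a sketch that runs a deliberately slow prime count (naive trial division, chosen only to burn CPU) in a worker while the main thread keeps a 10 ms heartbeat going. The heartbeat keeps ticking because the heavy loop lives on another thread; the worker code is inlined with `{ eval: true }` to keep it in one file.

```javascript
const { Worker } = require('node:worker_threads');

// Worker source: a deliberately naive prime count (trial division) to keep a CPU core busy.
const workerSource = `
  const { workerData, parentPort } = require('node:worker_threads');
  let count = 0;
  for (let n = 2; n < workerData.limit; n++) {
    let isPrime = true;
    for (let d = 2; d * d <= n; d++) {
      if (n % d === 0) { isPrime = false; break; }
    }
    if (isPrime) count++;
  }
  parentPort.postMessage(count);
`;

// Meanwhile the main thread keeps doing "other work": a heartbeat every 10 ms.
let ticks = 0;
const heartbeat = setInterval(() => { ticks++; }, 10);

const outcome = new Promise((resolve, reject) => {
  const worker = new Worker(workerSource, { eval: true, workerData: { limit: 1_000_000 } });
  worker.once('message', (primeCount) => {
    console.log(`primes below 1,000,000: ${primeCount}, main-thread ticks meanwhile: ${ticks}`);
    resolve({ primeCount, tickCount: ticks });
  });
  worker.once('error', reject);
  // Stop the heartbeat once the worker is gone so the process can exit.
  worker.once('exit', () => clearInterval(heartbeat));
});
```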

🫡 The Bottom Line

Worker threads in Node.js are powerful tools when used correctly. They allow you to perform CPU-intensive tasks without blocking your main thread, keeping your application responsive. But remember: with great power comes great responsibility. Don’t spin up worker threads for simple I/O tasks — the overhead isn’t worth it.

Use worker threads when you have heavy calculations, complex data transformations, or any task that would make your event loop cry if it had to handle it directly.

Master this pattern, and you’ll unlock a new level of performance for your Node.js applications. Your users (and your server’s CPU) will thank you.

---

If you found this article helpful, consider following me for more deep dives into JavaScript engineering concepts. I’m passionate about reinventing wheels and understanding what’s happening under the hood.


Frontender`s Spectre

Multithreading in Node.js — Worker Threads: The Heroes of JavaScript Performance