Node isn't "just JavaScript on the server." It's V8 + libuv + a massive C++ binding layer that gives you non-blocking I/O, streams, child processes, and worker threads. This page explains how it all fits together.
Text
Your JS Code
     |
┌────▼────┐
│   V8    │  ← Compiles & runs your JavaScript
│ Engine  │
└────┬────┘
     |
┌────▼────────────────────┐
│    Node.js Bindings     │  ← C++ glue code (node_api, internal modules)
│      (C++ / N-API)      │
└────┬───────────┬────────┘
     |           |
┌────▼────┐ ┌────▼────┐
│  libuv  │ │ c-ares  │  ← libuv: async I/O, event loop, thread pool
│         │ │  zlib   │    c-ares: async DNS, zlib: compression
│         │ │ openssl │    openssl: TLS/crypto
└─────────┘ └─────────┘
libuv maintains a thread pool (default 4 threads, configurable via UV_THREADPOOL_SIZE) for blocking ops (fs, DNS lookups, some crypto).
Node's event loop isn't a simple "check for callbacks" loop. It runs through 6 distinct phases in order, each with its own queue.
Text
   ┌───────────────────────────┐
┌─>│           timers          │  ← setTimeout, setInterval callbacks
│  └──────────┬────────────────┘
│  ┌──────────▼────────────────┐
│  │     pending callbacks     │  ← I/O callbacks deferred from previous loop
│  └──────────┬────────────────┘
│  ┌──────────▼────────────────┐
│  │       idle, prepare       │  ← internal use only
│  └──────────┬────────────────┘
│  ┌──────────▼────────────────┐
│  │           poll            │  ← retrieve new I/O events; execute I/O callbacks
│  └──────────┬────────────────┘    (node blocks here when nothing else to do)
│  ┌──────────▼────────────────┐
│  │           check           │  ← setImmediate callbacks
│  └──────────┬────────────────┘
│  ┌──────────▼────────────────┐
│  │      close callbacks      │  ← socket.on('close', ...)
│  └──────────┬────────────────┘
└─────────────┘
After each callback (not just between phases, since Node 11), Node drains two microtask queues: the process.nextTick queue first, then the Promise microtask queue:
JavaScript
setTimeout(() => console.log("1: timer"), 0);
setImmediate(() => console.log("2: immediate"));
process.nextTick(() => console.log("3: nextTick"));
Promise.resolve().then(() => console.log("4: promise"));
// Output: 3: nextTick → 4: promise → then the timer/immediate pair
// (nextTick + promise always run before the timers phase; the relative
// order of setTimeout(0) vs setImmediate is nondeterministic when
// scheduled from the main module -- inside an I/O callback, setImmediate
// always wins)
Use setImmediate() when you want to defer work but still let I/O happen; recursive process.nextTick calls run before the loop continues and can starve I/O.
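One ordering is deterministic: inside an I/O callback, setImmediate always beats a zero-delay timer, because the check phase comes before the loop wraps back around to timers. A small sketch (reading this script's own file is just a trick to get into an I/O callback):

```javascript
import { readFile } from "node:fs";
import { fileURLToPath } from "node:url";

const order = [];
// We're inside a poll-phase I/O callback here, so check (setImmediate)
// runs before the loop cycles back around to the timers phase.
readFile(fileURLToPath(import.meta.url), () => {
  setTimeout(() => order.push("timer"), 0);
  setImmediate(() => order.push("immediate"));
});
setTimeout(() => console.log(order.join(" → ")), 50); // "immediate → timer"
```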
JavaScript
// CJS: synchronous require, module.exports
const fs = require("fs");
const { readFile } = require("fs");
module.exports = { myFunction };
module.exports.myFunction = function() {};
exports.myFunction = function() {};
JavaScript
// ESM: async import, export
import fs from "node:fs";
import { readFile } from "node:fs/promises";
export function myFunction() {}
export default function() {}
Text
.cjs file → always CJS
.mjs file → always ESM
.js  file → check nearest package.json "type" field:
    "type": "module"   → ESM
    "type": "commonjs" → CJS (default if omitted)
node: prefix (e.g., import fs from "node:fs") explicitly tells Node this is a built-in module. It prevents name collisions with npm packages and is the recommended way to import built-ins.
Text
Feature           CJS                  ESM
─────────────────────────────────────────────────────────
Loading           Synchronous          Asynchronous
Top-level await   ❌                   ✅
this at top       module.exports       undefined
__filename        ✅ available         ❌ use import.meta.url
__dirname         ✅ available         ❌ use import.meta.dirname (v21+)
require()         ✅                   ❌ (use createRequire as escape hatch)
import            ❌                   ✅ (static)
import()          ✅ dynamic           ✅ dynamic
Circular deps     Partial exports      Live bindings (resolved)
JavaScript
// Get __filename and __dirname equivalents in ESM
import { fileURLToPath } from "node:url";
import { dirname } from "node:path";
const __filename = fileURLToPath(import.meta.url);
const __dirname = dirname(__filename);
// Node 21+: import.meta.dirname and import.meta.filename
Streams let you process data piece by piece instead of loading everything into memory. There are 4 types.
Text
Type        Description                 Example
─────────────────────────────────────────────────────────
Readable    Data source you read from   fs.createReadStream, http req
Writable    Data sink you write to      fs.createWriteStream, http res
Duplex      Both readable + writable    net.Socket, TCP connection
Transform   Duplex that modifies data   zlib.createGzip, crypto
JavaScript
import { createReadStream, createWriteStream } from "node:fs";
import { createGzip } from "node:zlib";
import { pipeline } from "node:stream/promises";
// Compress a file: read → gzip → write
await pipeline(
createReadStream("input.txt"),
createGzip(),
createWriteStream("input.txt.gz")
);
Use pipeline() instead of .pipe(). The old .pipe() method doesn't handle errors or cleanup properly; pipeline() propagates errors and destroys all streams when done.
JavaScript
import { Transform } from "node:stream";
const upperCase = new Transform({
transform(chunk, encoding, callback) {
this.push(chunk.toString().toUpperCase());
callback();
}
});
process.stdin.pipe(upperCase).pipe(process.stdout);
JavaScript
import { createReadStream } from "node:fs";
const stream = createReadStream("big-file.txt", { encoding: "utf8" });
for await (const chunk of stream) {
console.log(`Got ${chunk.length} chars`);
}
A Buffer is a fixed-size chunk of raw memory, outside the V8 heap. It's how Node handles binary data (files, network packets, images).
JavaScript
// Creating buffers
const b1 = Buffer.alloc(10); // 10 zero-filled bytes
const b2 = Buffer.from("hello"); // from string (UTF-8)
const b3 = Buffer.from([0x48, 0x69]); // from byte array → "Hi"
const b4 = Buffer.from("aGVsbG8=", "base64"); // from base64
// Converting
b2.toString("utf8"); // "hello"
b2.toString("hex"); // "68656c6c6f"
b2.toString("base64"); // "aGVsbG8="
// Buffer is a Uint8Array subclass
b2[0]; // 104 (ASCII 'h')
b2.length; // 5 bytes
b2.subarray(0, 2); // Buffer containing "he" (shares memory! slice() is deprecated)
Buffer.alloc zeroes memory (safe); Buffer.allocUnsafe skips zeroing (faster, but the buffer may contain old data). Use alloc unless you have a measured performance reason and will overwrite every byte.
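A quick comparison sketch of the two allocators:

```javascript
const safe = Buffer.alloc(8);       // guaranteed <Buffer 00 00 00 00 00 00 00 00>
const fast = Buffer.allocUnsafe(8); // uninitialized -- may hold stale memory

// allocUnsafe is only safe when you immediately overwrite every byte:
fast.write("8 bytes!");             // 8 ASCII chars fill all 8 bytes

console.log(safe.every((b) => b === 0)); // true -- alloc always zero-fills
```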
Run external commands or spawn new Node processes. Four main functions.
JavaScript
import { exec, execFile, spawn, fork } from "node:child_process";
// exec: runs in a SHELL, buffers entire output
exec("ls -la | grep .js", (err, stdout, stderr) => {
console.log(stdout);
});
// execFile: no shell, safer, still buffers output
execFile("git", ["status"], (err, stdout) => {
console.log(stdout);
});
// spawn: no shell, STREAMS output (best for large output)
const child = spawn("git", ["log", "--oneline"]);
child.stdout.on("data", (chunk) => console.log(chunk.toString()));
child.on("close", (code) => console.log(`exited ${code}`));
// fork: spawn a new NODE process with IPC channel
const worker = fork("./worker.js");
worker.send({ task: "compute" });
worker.on("message", (result) => console.log(result));
JavaScript
import { execFile } from "node:child_process";
import { promisify } from "node:util";
const execFileAsync = promisify(execFile);
const { stdout } = await execFileAsync("git", ["status"]);
Text
Function    Shell?    Output      IPC?   Best For
─────────────────────────────────────────────────────────
exec        ✅ yes    Buffered    ❌     Quick shell commands
execFile    ❌ no     Buffered    ❌     Running binaries safely
spawn       ❌ no     Streamed    ❌     Large output, long processes
fork        ❌ no     Streamed    ✅     Node-to-Node communication
Worker threads run JavaScript in separate V8 isolates within the same process. Unlike child_process, they can share memory via SharedArrayBuffer.
JavaScript
// main.js
import { Worker } from "node:worker_threads";
const worker = new Worker("./heavy-task.js", {
workerData: { iterations: 1_000_000 }
});
worker.on("message", (result) => console.log("Result:", result));
worker.on("error", (err) => console.error(err));
worker.on("exit", (code) => console.log("Worker exited", code));
JavaScript
// heavy-task.js
import { workerData, parentPort } from "node:worker_threads";
let sum = 0;
for (let i = 0; i < workerData.iterations; i++) {
sum += Math.sqrt(i);
}
parentPort.postMessage(sum);
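To make the shared-memory claim concrete, here's a minimal sketch using an inline (eval'd) worker. A SharedArrayBuffer passed through workerData is shared by reference, not copied, so both threads see the same bytes:

```javascript
import { Worker } from "node:worker_threads";

const shared = new SharedArrayBuffer(4);
const counter = new Int32Array(shared);

// eval'd worker code runs as CommonJS, hence the require()
const worker = new Worker(
  `const { workerData } = require("node:worker_threads");
   Atomics.add(new Int32Array(workerData), 0, 42);`,
  { eval: true, workerData: shared }
);

worker.on("exit", () => {
  console.log(Atomics.load(counter, 0)); // 42 -- written by the worker thread
});
```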
The cluster module forks multiple Node processes that share the same server port. This lets you use all CPU cores.
JavaScript
import cluster from "node:cluster";
import { cpus } from "node:os";
import http from "node:http";
if (cluster.isPrimary) {
const numCPUs = cpus().length;
console.log(`Primary ${process.pid} forking ${numCPUs} workers`);
for (let i = 0; i < numCPUs; i++) {
cluster.fork();
}
cluster.on("exit", (worker) => {
console.log(`Worker ${worker.process.pid} died, restarting...`);
cluster.fork(); // auto-restart
});
} else {
http.createServer((req, res) => {
res.end(`Handled by worker ${process.pid}\n`);
}).listen(3000);
}
In production, a process manager beats hand-rolled cluster logic: pm2 start app.js -i max handles forking, restarting, log management, and zero-downtime reloads.
JavaScript
import { readFile, writeFile, mkdir, readdir, stat, rm } from "node:fs/promises";
// Always use fs/promises (not callback-based fs)
const content = await readFile("./data.json", "utf8");
const data = JSON.parse(content);
await writeFile("./output.json", JSON.stringify(data, null, 2));
await mkdir("./nested/dirs", { recursive: true });
// List directory
const files = await readdir("./src", { withFileTypes: true });
for (const f of files) {
console.log(f.name, f.isDirectory() ? "dir" : "file");
}
// Watch for changes
import { watch } from "node:fs/promises";
for await (const event of watch("./src")) {
console.log(event.eventType, event.filename);
}
JavaScript
import path from "node:path";
path.join("/users", "sean", "docs"); // "/users/sean/docs"
path.resolve("./src", "index.ts"); // absolute path
path.basename("/a/b/file.ts"); // "file.ts"
path.extname("file.ts"); // ".ts"
path.dirname("/a/b/file.ts"); // "/a/b"
path.parse("/a/b/file.ts"); // { root, dir, base, ext, name }
JavaScript
import os from "node:os";
os.cpus().length; // number of CPU cores
os.totalmem(); // total RAM in bytes
os.freemem(); // free RAM
os.homedir(); // "/home/sean"
os.tmpdir(); // "/tmp"
os.platform(); // "linux", "darwin", "win32"
os.hostname(); // machine hostname
Node's built-in http module is what Express, Fastify, and Hono all build on top of.
JavaScript
import http from "node:http";
const server = http.createServer((req, res) => {
// req is a Readable stream
// res is a Writable stream
console.log(req.method, req.url, req.headers);
if (req.method === "POST") {
const chunks = [];
req.on("data", (chunk) => chunks.push(chunk));
req.on("end", () => {
const body = Buffer.concat(chunks).toString();
res.writeHead(200, { "Content-Type": "application/json" });
res.end(JSON.stringify({ received: body }));
});
} else {
res.writeHead(200, { "Content-Type": "text/plain" });
res.end("Hello World");
}
});
server.listen(3000, () => console.log("Listening on :3000"));
JavaScript
// Node 18+: global fetch (uses undici under the hood)
const res = await fetch("https://api.github.com/users/sean");
const data = await res.json();
JavaScript
// Operational errors -- expected failures (file not found, network timeout)
try {
await readFile("missing.txt");
} catch (err) {
if (err.code === "ENOENT") console.log("File not found");
}
// Programmer errors -- bugs (TypeError, null reference)
// These should crash the process (fix the bug, don't catch it)
JavaScript
// Uncaught exception -- something threw and nobody caught it
process.on("uncaughtException", (err) => {
console.error("UNCAUGHT:", err);
process.exit(1); // always exit -- state may be corrupt
});
// Unhandled promise rejection
process.on("unhandledRejection", (reason) => {
console.error("UNHANDLED REJECTION:", reason);
process.exit(1);
});
Bash
# Built-in inspector (open chrome://inspect)
node --inspect src/index.js
# Break on first line
node --inspect-brk src/index.js
# Print memory usage
node -e "console.log(process.memoryUsage())"
JavaScript
const start = process.hrtime.bigint();
// ... do work ...
const end = process.hrtime.bigint();
console.log(`Took ${Number(end - start) / 1e6}ms`);
JavaScript
import { performance, PerformanceObserver } from "node:perf_hooks";
performance.mark("start");
// ... do work ...
performance.mark("end");
performance.measure("my-op", "start", "end");
const [entry] = performance.getEntriesByName("my-op");
console.log(`Took ${entry.duration}ms`);
Bash
# Generate a CPU profile (load it in Chrome DevTools)
node --cpu-prof src/index.js
# Heap profile (for hunting memory leaks)
node --heap-prof src/index.js
# Trace garbage collection
node --trace-gc src/index.js
Avoid synchronous calls (e.g., fs.readFileSync) on the main thread.
Node is single-threaded, but it handles many concurrent I/O operations at once. That's usually a strength -- until you accidentally launch 10,000 file reads or HTTP requests simultaneously and overwhelm the system.
JavaScript
// This looks innocent but will crash on large arrays
const files = await getFileList(); // 10,000 files
const results = await Promise.all(
files.map(f => readFile(f, "utf8"))
); // BOOM: EMFILE -- too many open files
Each operation holds a file descriptor, and thousands of fs.readFile calls queue up massively.
Instead of Promise.all(everything), run at most N operations at a time. When one finishes, start the next. This is concurrency limiting.
The p-limit npm package is the standard tool for this. It wraps async functions so that only N run concurrently.
JavaScript
import pLimit from "p-limit";
import { readFile } from "node:fs/promises";
const limit = pLimit(10); // max 10 concurrent operations
const files = await getFileList(); // 1000 files
// All 1000 are "scheduled" but only 10 run at a time
const results = await Promise.all(
files.map(f => limit(() => readFile(f, "utf8")))
);
Understanding how it works under the hood is more valuable than just using the package. The core is a queue + counter.
JavaScript
function pLimit(concurrency) {
let active = 0;
const queue = [];
function next() {
if (active >= concurrency || queue.length === 0) return;
active++;
const { fn, resolve, reject } = queue.shift();
fn().then(resolve, reject).finally(() => {
active--;
next();
});
}
return function limit(fn) {
return new Promise((resolve, reject) => {
queue.push({ fn, resolve, reject });
next();
});
};
}
// Usage -- identical to the npm package
const limit = pLimit(5);
const results = await Promise.all(
urls.map(url => limit(() => fetch(url)))
);
limit(fn) pushes the task onto a queue and returns a promise
next() starts a queued task only while we're under the concurrency cap
when a task settles (the .finally), it decrements active and calls next() again
JavaScript
import { readFile, readdir } from "node:fs/promises";
import { join } from "node:path";
function pLimit(concurrency) {
let active = 0;
const queue = [];
function next() {
if (active >= concurrency || queue.length === 0) return;
active++;
const { fn, resolve, reject } = queue.shift();
fn().then(resolve, reject).finally(() => { active--; next(); });
}
return (fn) => new Promise((resolve, reject) => {
queue.push({ fn, resolve, reject });
next();
});
}
async function processAllFiles(dir) {
const entries = await readdir(dir);
const limit = pLimit(10); // only 10 files open at a time
const contents = await Promise.all(
entries.map(name =>
limit(() => readFile(join(dir, name), "utf8"))
)
);
console.log(`Read ${contents.length} files`);
return contents;
}
A semaphore is the classic computer science primitive behind concurrency limiting. It's a counter with two operations: acquire() (decrement and wait if zero) and release() (increment and wake up a waiter). This gives you more control than p-limit when you need to hold a resource across multiple async steps.
JavaScript
class Semaphore {
constructor(max) {
this.max = max;
this.active = 0;
this.waiters = [];
}
async acquire() {
if (this.active < this.max) {
this.active++;
return;
}
// At capacity -- wait until someone releases
await new Promise(resolve => this.waiters.push(resolve));
}
release() {
if (this.waiters.length > 0) {
// Wake up the next waiter (they get our slot)
const next = this.waiters.shift();
next();
} else {
this.active--;
}
}
}
// Usage
const sem = new Semaphore(5);
async function doWork(item) {
await sem.acquire();
try {
// ... do async work with the resource ...
await processItem(item);
} finally {
sem.release(); // always release, even on error
}
}
// Launch many tasks -- only 5 run concurrently
await Promise.all(items.map(item => doWork(item)));
The acquire/release pattern gives you explicit control. A resource pool is a pre-allocated set of reusable resources (database connections, worker threads, etc). Instead of creating a new connection for every request (expensive), you check one out from the pool and return it when done.
JavaScript
class Pool {
constructor({ create, destroy, max = 10 }) {
this.create = create; // factory: () => Promise<Resource>
this.destroy = destroy; // cleanup: (resource) => Promise<void>
this.max = max;
this.available = []; // idle resources ready for reuse
this.size = 0; // total created (available + in-use)
this.waiters = []; // callers waiting for a resource
}
async acquire() {
// 1. Reuse an idle resource if available
if (this.available.length > 0) {
return this.available.pop();
}
// 2. Create a new one if under the limit
if (this.size < this.max) {
this.size++;
return await this.create();
}
// 3. At capacity -- wait for one to be released
return new Promise(resolve => this.waiters.push(resolve));
}
release(resource) {
if (this.waiters.length > 0) {
// Hand directly to a waiting caller
const next = this.waiters.shift();
next(resource);
} else {
// No one waiting -- put back in the idle list
this.available.push(resource);
}
}
async drain() {
// Destroy all idle resources (for graceful shutdown)
for (const r of this.available) {
await this.destroy(r);
}
this.available = [];
}
}
JavaScript
const pool = new Pool({
create: async () => {
console.log("Creating new DB connection...");
return await connectToDatabase();
},
destroy: async (conn) => {
await conn.close();
},
max: 20
});
async function handleRequest(req) {
const conn = await pool.acquire();
try {
const rows = await conn.query("SELECT * FROM users WHERE id = $1", [req.userId]);
return rows;
} finally {
pool.release(conn); // always return to pool
}
}
For production, the generic-pool package adds idle timeouts, validation, min/max sizing, and priority queuing.
JavaScript
import { createPool } from "generic-pool";
const pool = createPool({
async create() {
return await connectToDatabase();
},
async destroy(conn) {
await conn.close();
}
}, {
min: 2, // keep 2 idle connections warm
max: 20, // never exceed 20 connections
idleTimeoutMillis: 30000 // destroy idle connections after 30s
});
const conn = await pool.acquire();
try {
await conn.query("...");
} finally {
pool.release(conn);
}
// Graceful shutdown
await pool.drain();
pool.clear();
Most database drivers have built-in pooling. PostgreSQL's pg module is the canonical example.
JavaScript
import pg from "pg";
const pool = new pg.Pool({
host: "localhost",
port: 5432,
database: "myapp",
user: "sean",
password: process.env.DB_PASSWORD,
max: 20, // max connections in pool
idleTimeoutMillis: 30000, // close idle connections after 30s
connectionTimeoutMillis: 2000 // fail if can't connect in 2s
});
// Option 1: auto-acquire and release
const { rows } = await pool.query("SELECT * FROM users WHERE id = $1", [42]);
// Option 2: manual acquire for transactions
const client = await pool.connect();
try {
await client.query("BEGIN");
await client.query("UPDATE accounts SET balance = balance - $1 WHERE id = $2", [100, 1]);
await client.query("UPDATE accounts SET balance = balance + $1 WHERE id = $2", [100, 2]);
await client.query("COMMIT");
} catch (err) {
await client.query("ROLLBACK");
throw err;
} finally {
client.release(); // return to pool, NOT close
}
// Graceful shutdown
await pool.end();
Always pair acquire with try/finally to guarantee release, even when the query throws.
Backpressure is what happens when a producer generates data faster than a consumer can handle it. Without backpressure, the in-between buffer grows without bound and you run out of memory.
Text
Producer (fast)  →  Buffer (growing!)  →  Consumer (slow)
 1000 items/s        [####......]          100 items/s

Without backpressure: buffer grows forever → OOM crash
With backpressure:    producer slows down to match consumer
Node streams have built-in backpressure via highWaterMark. When the internal buffer fills up, the readable stream pauses automatically.
JavaScript
import { createReadStream, createWriteStream } from "node:fs";
const src = createReadStream("huge-file.csv");
const dest = createWriteStream("output.csv");
// .write() returns false when the internal buffer is full
src.on("data", (chunk) => {
const canContinue = dest.write(chunk);
if (!canContinue) {
src.pause(); // stop reading until drain
dest.once("drain", () => src.resume());
}
});
// pipeline() handles all this automatically
import { pipeline } from "node:stream/promises";
await pipeline(src, dest); // backpressure built in
When you're not using streams (e.g., processing API responses, database rows), combine p-limit with controlled batching to create backpressure.
JavaScript
import { createReadStream } from "node:fs";
import { createInterface } from "node:readline";
function pLimit(concurrency) {
let active = 0;
const queue = [];
function next() {
if (active >= concurrency || queue.length === 0) return;
active++;
const { fn, resolve, reject } = queue.shift();
fn().then(resolve, reject).finally(() => { active--; next(); });
}
return (fn) => new Promise((resolve, reject) => {
queue.push({ fn, resolve, reject });
next();
});
}
// Process a huge CSV: read rows as a stream, but limit DB writes
const limit = pLimit(20); // max 20 concurrent DB writes
const promises = [];
const rl = createInterface({
input: createReadStream("million-rows.csv")
});
for await (const line of rl) {
const record = parseCsvLine(line);
promises.push(
limit(() => db.query("INSERT INTO records VALUES ($1, $2)", [record.id, record.name]))
);
}
await Promise.all(promises);
console.log("All rows inserted");
When streams fit the problem, prefer pipeline() and let Node handle backpressure automatically.