I wrote the same email parser three times — and the hard part was making them fail identically
Parsing MIME in Node, Python, and Go is the easy part. Getting all three to break the same way on the same broken email — and to say so with the same hash — is where the real work is. Here's how, and the two lessons that transfer to any polyglot library.
Email is a swamp. Not the protocol — the content. The moment you accept inbound mail from the open internet, you stop receiving the tidy RFC 5322 messages from the spec and start receiving whatever thirty years of mail clients, marketing tools, and misconfigured servers actually emit. Base64 that isn’t padded. Content-Type headers with a charset that doesn’t exist. Boundaries that never close. winmail.dat. 8-bit bytes in a header that swears it’s ASCII.
So you write a MIME parser. Then, because your platform runs email through more than one runtime, you write it again in another language. And that’s where it gets interesting — not because parsing is hard the second time, but because two parsers agreeing on the easy emails is worthless if they quietly disagree on the broken ones.
This is the story of building @mailkite/mail-parse — a streaming MIME parser that now runs as one engine model in Node/TypeScript, Python, and Go — and why the feature I’m proudest of isn’t that all three parse email correctly, but that all three fail correctly, and produce a byte-identical fingerprint when they do.
Why three languages at all
The honest answer: because the email doesn’t get to pick where it lands.
Our inbound path parses raw MIME at the SMTP edge, which is a Node process on a plain VPS — no CPU cap, the natural home for streaming a 20 MB message straight to object storage without buffering it. But the same parsed-message shape also has to be producible inside a Cloudflare Worker (buffered, a different runtime), and the SDKs developers actually call live in a spread of languages.
If those parsers drift, you get the worst class of bug: an email that produces one JSON shape in the Node path and a subtly different one somewhere else. No stack trace. No crash. Just a support ticket that says “the attachment is missing” for one customer and no one else — and no way to reproduce it, because the email that triggered it is gone.
So “three languages” wasn’t a flex. It was a constraint that forced a discipline: there is one parser, expressed three times, and I need a mechanical way to prove they’re the same.
Making success identical: one golden, generated from the reference
The first half is conformance testing, and the trick is to have a single source of truth rather than three hand-written expectation sets that rot independently.
The TypeScript package is the reference implementation. A script runs it over the 15 gold .eml fixtures — the pathological ones, collected from real breakage — and serializes the full parsed result to a parse_golden.json: subject, from, every recipient, the text body, the HTML body, every attachment’s metadata, and the diagnostics.
Then Python and Go each assert field-for-field against that same file:
# tests/test_streaming.py
import json
import unittest

# parse() and read_fixture() are the package's parser and a test helper;
# their imports are omitted here.

class TestTsParity(unittest.TestCase):
    def test_matches_ts_reference(self):
        with open("tests/golden/parse_golden.json") as f:
            golden = json.load(f)
        for name, expected in golden.items():
            with self.subTest(fixture=name):  # report each failing fixture by name
                msg = parse(read_fixture(name))
                self.assertEqual(msg.subject, expected["subject"])
                self.assertEqual(msg.from_.email, expected["from"]["email"])
                self.assertEqual(len(msg.to), len(expected["to"]))
                self.assertEqual([a.filename for a in msg.attachments],
                                 [a["filename"] for a in expected["attachments"]])
                # ...text, html, diagnostics: the whole shape
// parse_test.go
func TestTsParity(t *testing.T) {
	golden := loadGolden(t, "testdata/golden/parse_golden.json")
	for name, want := range golden {
		got := Parse(readFixture(t, name))
		if got.Subject != want.Subject {
			t.Errorf("%s: subject = %v, want %v", name, got.Subject, want.Subject)
		}
		// ...field-for-field over all 15 fixtures
	}
}
The nice property: the golden is generated, not authored. When the reference parser’s behavior changes, the golden is regenerated, and the Python and Go tests fail until they match. The three implementations cannot silently diverge on a compared field while every test suite stays green: the fixtures include TypeScript’s exact 8-bit-header quirks, so “close enough” doesn’t pass.
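One way to keep a field-for-field comparison honest is a recursive diff that names the exact path of every mismatch. The following is a minimal sketch, assuming the parsed message has already been converted to plain dicts and lists; `diff_paths` is a hypothetical helper for illustration, not part of the package.

```python
from typing import Any


def diff_paths(actual: Any, expected: Any, path: str = "$") -> list[str]:
    """Return the path of every place where actual differs from expected."""
    if isinstance(expected, dict) and isinstance(actual, dict):
        diffs = []
        for key in sorted(set(expected) | set(actual)):
            if key not in actual:
                diffs.append(f"{path}.{key}: missing")
            elif key not in expected:
                diffs.append(f"{path}.{key}: unexpected")
            else:
                diffs.extend(diff_paths(actual[key], expected[key], f"{path}.{key}"))
        return diffs
    if isinstance(expected, list) and isinstance(actual, list):
        if len(actual) != len(expected):
            return [f"{path}: length {len(actual)} != {len(expected)}"]
        diffs = []
        for i, (a, e) in enumerate(zip(actual, expected)):
            diffs.extend(diff_paths(a, e, f"{path}[{i}]"))
        return diffs
    # Scalars (and mismatched container types) must match exactly, type
    # included, so 1 and True or "1" and 1 never count as "close enough".
    if type(actual) is not type(expected) or actual != expected:
        return [f"{path}: {actual!r} != {expected!r}"]
    return []
```

A parity test could then assert that `diff_paths(parsed, expected)` is empty and print the returned paths on failure, which points straight at the field that drifted.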
That covers the emails that parse. It’s the emails that don’t that taught me the real lesson.
Making failure identical: a fingerprint, not a stack trace
When a MIME parser hits something it can’t cleanly handle, the naive move is to throw. That’s wrong for inbound email for two reasons: one broken part shouldn’t sink the whole message, and — more subtly — an exception is a terrible unit of aggregation. Ten thousand deployments hitting the same malformed-boundary bug should be one signal, not ten thousand log lines with slightly different byte offsets.
So the parser never throws on bad input. Every degradation emits a typed diagnostic, and — the part I want to show you — a failure signature: a deterministic, PII-free hash of the structural features of what broke.
interface FailureSignature {
  hash: string;                              // = fnv1a(canonicalize(features))
  features: {
    libVersion: string;
    scope: 'envelope' | 'structure' | 'part'; // header? assembly? one leaf?
    stage: Phase;                            // ingest | normalize | decode | structure | extract | enrich
    diagnosticCodes: string[];               // e.g. ["BOUNDARY_NOT_FOUND"]
    contentType?: string;                    // the offending leaf's declared type
    transferEncoding?: string;
    byteSignature?: string;                  // hex magic of the first N bytes: STRUCTURE, never content
    headerNames?: string[];                  // header NAMES present, never their values
    mailerFamily?: string;                   // X-Mailer normalized → "Outlook/16"
    structurePath?: string;                  // "multipart/mixed>multipart/alternative>application/ms-tnef"
  };
}
Notice what’s not in there: no subject, no addresses, no body bytes. The signature describes the shape of a failure — a base64 attachment that won’t decode, an HTML part with a bogus charset, a winmail.dat at a particular position in the tree — using only structural facts. That’s what makes it safe to emit from a library running on other people’s mail: the fingerprint can leave the box; the email never does.
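To make "degrade, don't throw" concrete, here is a minimal sketch of a lenient base64 decoder that reports problems as data. It is an illustration under assumptions, not the package's code: the `Diagnostic` class, the function name, and the two diagnostic codes are hypothetical. Note that each diagnostic carries only a code, a scope, and a stage, and never the bytes that failed.

```python
import base64
from dataclasses import dataclass


@dataclass(frozen=True)
class Diagnostic:
    code: str   # hypothetical code names, chosen for this example
    scope: str  # 'envelope' | 'structure' | 'part'
    stage: str  # e.g. 'decode'


def decode_base64_part(raw: bytes) -> tuple[bytes, list[Diagnostic]]:
    """Decode a base64 body leniently; report problems as data, never raise."""
    diagnostics: list[Diagnostic] = []
    compact = b"".join(raw.split())  # drop line breaks and stray whitespace
    missing = -len(compact) % 4      # '=' characters needed to reach a multiple of 4
    if missing:
        diagnostics.append(Diagnostic("BASE64_PADDING_MISSING", "part", "decode"))
        compact += b"=" * missing
    try:
        return base64.b64decode(compact, validate=True), diagnostics
    except ValueError:  # binascii.Error is a ValueError subclass
        diagnostics.append(Diagnostic("BASE64_UNDECODABLE", "part", "decode"))
        return b"", diagnostics
```

The caller always gets a result plus a list of diagnostics, so one undecodable attachment degrades that part while the rest of the message still parses.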
And because it’s a pure hash of canonicalized structural features, the same broken email produces the same hash everywhere. Which brings us to the second conformance proof — the one that actually matters:
# tests/test_signature.py: hashes captured from the TS reference, asserted in Python
def test_signature_parity(self):
    for case in load_captured_ts_hashes():
        sig = compute_signature(case["features"])
        self.assertEqual(sig.hash, case["expected_hash"])  # byte-identical FNV-1a
I picked FNV-1a deliberately — it’s a handful of lines, no dependencies, and trivially portable, so “compute this hash” means the same thing in three languages without pulling in a crypto library or hoping two implementations of something fancier agree on edge cases. The canonicalization (lowercase types, strip noisy params, bucket byte-signatures to known magic numbers, reduce the mailer to a family) is the real work; the hash is just the cheap, deterministic seal on top.
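For a sense of how small the hash really is, here is a sketch of FNV-1a next to a toy canonicalizer. Several things here are assumptions rather than the package's actual behavior: the 32-bit variant, the hex output format, the JSON canonical form, and the specific canonicalization rules (only two of the steps named above are shown).

```python
import json

FNV_OFFSET_32 = 0x811C9DC5  # standard 32-bit FNV offset basis
FNV_PRIME_32 = 0x01000193   # standard 32-bit FNV prime


def fnv1a_32(data: bytes) -> str:
    """32-bit FNV-1a over raw bytes, returned as 8 lowercase hex digits."""
    h = FNV_OFFSET_32
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME_32) & 0xFFFFFFFF  # wrap to 32 bits, as uint32 math would
    return f"{h:08x}"


def canonicalize(features: dict) -> bytes:
    """Reduce features to one deterministic byte string (assumed rules)."""
    canon = dict(features)
    if "contentType" in canon:
        # Lowercase the type and drop parameters such as charset or boundary.
        canon["contentType"] = canon["contentType"].split(";")[0].strip().lower()
    for key in ("diagnosticCodes", "headerNames"):
        if key in canon:
            canon[key] = sorted(set(canon[key]))  # order must not affect the hash
    # Sorted keys, fixed separators, and ASCII-only escapes keep the bytes stable.
    return json.dumps(canon, sort_keys=True, separators=(",", ":")).encode("ascii")
```

Porting this is mostly a matter of hashing the same bytes with the same integer width. In JavaScript, for example, the multiply needs `Math.imul` (a plain `*` loses precision above 2^53), and the input should be UTF-8 bytes rather than UTF-16 code units.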
The payoff: a malformed email that breaks in the Node edge and the same email replayed through the Python SDK don’t just both fail — they report the same signature, land in the same dedup bucket, and (when a threshold is crossed) file one GitHub issue with a structural description precise enough to write a regression test from. Cross-language observability falls out of cross-language determinism for free.
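The dedup-and-threshold step can be sketched in a few lines. This is an illustrative in-memory version under assumptions: the class name and threshold are hypothetical, and a real collector would need persistent storage and would do the actual issue filing where the comment indicates.

```python
from collections import Counter


class SignatureAggregator:
    """Count failures per signature hash and flag each hash once at a threshold."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.counts = Counter()  # signature hash -> number of reports seen
        self.reported = set()    # hashes that have already been flagged

    def record(self, signature_hash: str) -> bool:
        """Return True exactly once per hash: when its count reaches the threshold."""
        self.counts[signature_hash] += 1
        if (self.counts[signature_hash] >= self.threshold
                and signature_hash not in self.reported):
            self.reported.add(signature_hash)
            return True  # the caller would file one issue here
        return False
```

Because the key is only the hash, reports from the Node edge and from the Python SDK fall into the same bucket without the aggregator knowing which language produced them.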
The thing I’d tell past-me
Two lessons, both slightly counterintuitive:
- Generate your conformance oracle; don’t hand-write it per language. One reference implementation plus a generated golden beats three lovingly maintained expectation files that drift the day you’re not looking. The languages get to have different idioms internally (synchronous middleware in Python and Go, streams in Node) as long as they’re forced through the same external truth.
- Design your failures to be aggregatable, and you get portability and privacy as side effects. The instinct is to make errors rich: full context, the offending bytes, a stack trace. For a library that runs on data you’re not allowed to see, the opposite is right: make errors structural and hashable. A PII-free fingerprint is the thing you can compare across languages, dedup across deployments, and safely emit from someone else’s process. Determinism is what makes it identical across three parsers; structure-only is what makes it safe to emit at all.
The parser is MIT-licensed and lives here — Node/TS, Python, Go. It grew out of building MailKite (inbound email → webhook), which is where the appetite for “the same broken email must behave the same everywhere” came from — but the parser stands on its own if you just need to turn MIME into clean, typed JSON without a service in the loop.
If you’ve fought the email-content swamp, I’d genuinely like to hear which message finally made you write your own parser. Mine was a winmail.dat inside a multipart/mixed that three off-the-shelf libraries each mangled a different way. That’s a fingerprint I’ll never forget.