Karlheinz Brandenburg
The engineer who compressed a revolution into three letters
By VastBlue Editorial · 2026-03-26 · 22 min read
Series: The Inventors · Episode 9
The Size Problem
In 1987, one minute of CD-quality stereo audio required approximately ten megabytes of storage. A typical album — fifty minutes — consumed half a gigabyte. A standard hard drive of the era held twenty to forty megabytes. Internet connections, where they existed, operated at 2,400 to 9,600 bits per second. Transmitting a single song over a modem would have taken several hours. The mathematics was clear: digital audio, in its raw form, was too large to store conveniently and too large to transmit at all.
This was not an abstract problem. The music industry was transitioning from analogue to digital. The compact disc, introduced commercially in 1982, had demonstrated that consumers would embrace digital audio. But CD-quality sound — stereo, sampled at 44,100 times per second, with sixteen bits of resolution per sample — generated data at 1,411 kilobits per second. The fidelity was superb. The data rate was punishing. Anyone who wanted to move audio beyond the physical disc — over a telephone line, across a computer network, onto a hard drive — hit the same wall. The bits were too many.
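The numbers above follow directly from the CD format's parameters. A quick arithmetic check:

```python
SAMPLE_RATE = 44_100      # samples per second, per channel
BITS_PER_SAMPLE = 16
CHANNELS = 2

# The raw CD data rate: 44,100 x 16 x 2
bitrate = SAMPLE_RATE * BITS_PER_SAMPLE * CHANNELS
print(bitrate)            # 1411200 bits per second, i.e. "1,411 kbps"

# One minute of audio, in megabytes (decimal)
one_minute_mb = bitrate * 60 / 8 / 1_000_000
print(round(one_minute_mb, 1))   # 10.6 MB, the "approximately ten megabytes"

# A fifty-minute album, in gigabytes
album_gb = bitrate * 50 * 60 / 8 / 1_000_000_000
print(round(album_gb, 2))        # 0.53 GB, the "half a gigabyte"
```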
At the Fraunhofer Institute for Integrated Circuits (IIS) in Erlangen, a mid-sized Bavarian city better known for its university and its beer than for its contribution to global culture, a team of audio engineers led by Karlheinz Brandenburg was working on this problem. Brandenburg had trained as an electrical engineer and mathematician at the University of Erlangen-Nuremberg, where his doctoral thesis explored the application of psychoacoustic models to digital audio coding. The thesis title was dry. Its implications were not.
The question Brandenburg asked was not how to make audio data smaller through conventional means — not how to find more efficient binary representations or remove statistical redundancies, though those techniques played supporting roles. His question was more radical: what if the audio signal contained information that no human being could hear? And what if you could identify that information precisely, frame by frame, and throw it away?
What You Cannot Hear
Brandenburg's approach was rooted in a field called psychoacoustics — the science of how humans actually perceive sound, as opposed to how sound objectively exists as a physical phenomenon. The human auditory system has well-documented limitations, and Brandenburg's insight was that these limitations could be systematically exploited.
The foundation is the structure of the inner ear itself. The cochlea — the spiral-shaped organ that converts sound vibrations into neural signals — is lined with approximately 3,500 inner hair cells arranged along its length. These cells are not uniformly sensitive. They respond to specific frequency ranges, organised tonotopically: high frequencies near the base, low frequencies near the apex. But the frequency ranges overlap, and the overlapping regions create what psychoacousticians call critical bands — frequency intervals within which the ear integrates sound energy into a single percept. There are roughly twenty-five critical bands spanning the range of human hearing, from about 20 Hz to 20 kHz. The bands are narrow at low frequencies (about 100 Hz wide below 500 Hz) and progressively wider at higher frequencies (several thousand Hz wide above 10 kHz). This structure means the ear has high frequency resolution where it matters most — in the range of human speech and melodic content — and lower resolution at the extremes.
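The critical-band structure has standard closed-form approximations in the psychoacoustics literature. The sketch below uses Traunmüller's (1990) formula for the Bark scale and the Zwicker-Terhardt (1980) bandwidth approximation; both are published simplifications, not the article's own figures, but they reproduce the counts and widths described above:

```python
def bark(f_hz):
    """Traunmueller's approximation of the Bark critical-band scale:
    maps a frequency in Hz to a critical-band number."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def critical_bandwidth(f_hz):
    """Zwicker & Terhardt's approximation of critical bandwidth in Hz,
    narrow at low frequencies and widening toward the top of hearing."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69

print(round(bark(20_000)))               # 24: roughly two dozen bands span audibility
print(round(critical_bandwidth(200)))    # ~100 Hz wide in the low range
print(round(critical_bandwidth(10_000))) # over 2,000 Hz wide at 10 kHz
```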
Within and across these critical bands, the ear exhibits three forms of masking that Brandenburg exploited relentlessly.
First: simultaneous frequency masking. A loud sound at a given frequency raises the hearing threshold for quieter sounds at nearby frequencies. If a cymbal crash at 8 kHz is sufficiently loud, your ear cannot hear a quiet violin note at 7.5 kHz that occurs at the same moment. The violin note is physically present in the audio signal. It is perceptually absent from the listener's experience. The masking curve is asymmetric — a loud tone masks sounds at higher frequencies more effectively than sounds at lower frequencies, creating a skewed shadow. The louder the masking tone, the wider the shadow extends. A practical example: in a full orchestral passage, a fortissimo brass section will mask substantial amounts of string and woodwind detail at nearby frequencies. The masked detail is real sound, faithfully captured by the microphone. But no listener in any concert hall or living room would hear it. It can be discarded without perceptual loss.
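The asymmetric shadow can be sketched numerically with a two-slope spreading function on the Bark scale, a common textbook simplification. The slopes and offset below are illustrative round numbers, not the tuned curves of any production encoder:

```python
def masked_threshold_db(masker_level_db, dz_bark):
    """Level (dB) below which a sound dz_bark critical bands away from a
    loud masker is inaudible. Illustrative two-slope model: masking falls
    off steeply toward lower frequencies (dz < 0) and gently toward
    higher frequencies (dz > 0), producing the skewed shadow."""
    slope = 27.0 if dz_bark < 0 else 10.0   # dB per Bark (illustrative values)
    offset = 10.0                           # minimum masker-to-maskee margin (illustrative)
    return masker_level_db - offset - slope * abs(dz_bark)

# An 80 dB masker hides far more of the spectrum above itself than below:
print(masked_threshold_db(80, +2.0))   # 50.0 -> sounds up to 50 dB are hidden 2 Bark above
print(masked_threshold_db(80, -2.0))   # 16.0 -> much weaker masking 2 Bark below
```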
Second: temporal masking. A loud sound briefly masks quieter sounds that occur immediately before and after it. When a drummer strikes a snare, the sharp transient raises the hearing threshold for several milliseconds afterward — a phenomenon called post-masking or forward masking. The faint room ambience, the sympathetic resonance of other drum heads, the subtle rattle of a nearby cymbal stand: all are rendered inaudible for a brief window after the snare hit. More surprisingly, there is also pre-masking — a quieter sound occurring up to about five milliseconds before a loud transient can be masked retroactively, as the brain requires processing time to register the earlier sound. The practical implication is that around every sharp transient in a piece of music — every drum hit, every plucked guitar string, every consonant in a vocal line — there is a temporal window where the encoder can discard information freely.
Third: the absolute threshold of hearing. At every frequency, there is a minimum sound pressure level below which sound is simply inaudible, regardless of masking. These thresholds have been measured precisely across thousands of human subjects and compiled into standardised equal-loudness contours (the ISO 226 standard). The threshold is lowest — the ear is most sensitive — between 2 kHz and 5 kHz, the frequency range of human speech. It rises steeply at very low and very high frequencies. A 50 Hz tone must be roughly 40 decibels louder than a 3 kHz tone to be perceived at the same loudness. Any audio content below the absolute threshold at any frequency can be discarded with zero perceptual consequence.
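The measured contours also have a widely quoted closed-form approximation, due to Terhardt, which the perceptual-coding literature often uses in place of the ISO tables. It reproduces the figures in the paragraph above:

```python
import math

def threshold_in_quiet_db(f_hz):
    """Terhardt's approximation of the absolute threshold of hearing,
    in dB SPL, as a function of frequency in Hz."""
    khz = f_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)

print(round(threshold_in_quiet_db(3000), 1))  # -4.6 dB: the ear's most sensitive region
print(round(threshold_in_quiet_db(50), 1))    # 40.0 dB: ~45 dB less sensitive at 50 Hz
```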
Inside the Codec
Brandenburg's psychoacoustic model was the conceptual breakthrough. But turning that model into a working codec required solving a cascade of engineering problems — the kind of problems that separate a good idea from a functioning technology.
The encoder's first operation is to divide the incoming audio into short frames — typically 1,152 samples, or about 26 milliseconds of audio at a 44.1 kHz sampling rate. Each frame is then transformed from the time domain (a waveform showing amplitude over time) into the frequency domain (a spectrum showing which frequencies are present and at what energy levels) using the Modified Discrete Cosine Transform, or MDCT. The MDCT is a particular flavour of transform chosen because it avoids blocking artefacts — the audible clicks and discontinuities that occur at frame boundaries when using simpler transforms. It does this by overlapping adjacent frames: each frame shares half its samples with the preceding frame and half with the following frame, and the MDCT guarantees that when the decoder reverses the transform, the overlapping regions cancel perfectly. The result is a seamless reconstruction, frame to frame, with no audible seams.
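The overlap-and-cancel property, known as time-domain aliasing cancellation (TDAC), can be demonstrated with a direct, unoptimised MDCT. This sketch uses the textbook definition at a toy frame size, and omits the analysis/synthesis window a real encoder applies (the cancellation already holds for the unwindowed transform); production codecs use fast algorithms, not this O(N²) sum:

```python
import math

def mdct(x):
    """Forward MDCT: 2N time-domain samples -> N frequency coefficients."""
    N = len(x) // 2
    return [sum(x[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]

def imdct(X):
    """Inverse MDCT: N coefficients -> 2N time-aliased samples."""
    N = len(X)
    return [sum(X[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for k in range(N)) / N
            for n in range(2 * N)]

# Two frames with 50% overlap: frame1 covers samples 0..2N-1,
# frame2 covers samples N..3N-1.
N = 8
signal = [math.sin(0.3 * t) + 0.5 * math.sin(1.1 * t) for t in range(3 * N)]
frame1 = imdct(mdct(signal[0:2 * N]))
frame2 = imdct(mdct(signal[N:3 * N]))

# Each frame alone is aliased; adding the overlapped halves cancels the
# aliasing exactly, reconstructing samples N..2N-1 with no seam.
reconstructed = [frame1[N + i] + frame2[i] for i in range(N)]
for r, s in zip(reconstructed, signal[N:2 * N]):
    assert abs(r - s) < 1e-9
```

Note that each frame of N coefficients stands in for 2N input samples, so the 50% overlap costs nothing in data rate: that is the property that made the MDCT attractive for coding.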
Once the audio is in the frequency domain, the psychoacoustic model goes to work. For each frame, it calculates the masking threshold — a curve across all frequencies showing the minimum sound level that a listener could perceive, given the specific combination of loud and quiet components in that frame. Every frequency component that falls below the masking threshold can be quantised more coarsely or discarded entirely. Every component that rises above the threshold must be encoded with enough precision to avoid audible distortion.
The bit allocation algorithm then distributes the available bits across the frequency bands within each frame. This is where the quality-bitrate tradeoff becomes concrete. At 320 kbps, there are enough bits to encode every perceptually significant component with high precision. The result is, for most listeners on most playback equipment, indistinguishable from the original CD. At 192 kbps, the encoder begins making harder choices — some frequency bands receive fewer bits, introducing quantisation noise that remains below the masking threshold for most material but may become faintly audible on demanding passages. At 128 kbps — the bitrate that became the de facto consumer standard — the encoder is working aggressively: significant quantisation is applied, and the psychoacoustic model must be precise to keep the resulting noise inaudible. At 64 kbps, the quality degrades noticeably — the sound acquires a characteristic watery, swirling quality as the encoder strips away information that the psychoacoustic model predicts is marginal but that trained ears can detect.
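The link between bits and audible noise is the quantiser's roughly 6 dB of signal-to-noise ratio per bit. A toy measurement with a uniform quantiser makes the tradeoff concrete (MP3's actual quantiser is non-uniform, so this is an illustration of the principle, not the codec's arithmetic):

```python
import math

def quantize(samples, bits):
    """Round samples in [-1, 1] to a uniform grid with the given bit depth."""
    levels = 2 ** (bits - 1) - 1
    return [round(s * levels) / levels for s in samples]

def snr_db(original, coded):
    """Signal-to-noise ratio of the coded samples, in dB."""
    signal = sum(s * s for s in original)
    noise = sum((s - c) ** 2 for s, c in zip(original, coded))
    return 10.0 * math.log10(signal / noise)

# One second of a full-scale 440 Hz sine at CD sampling rate.
sine = [math.sin(2 * math.pi * 440 * t / 44_100) for t in range(44_100)]
for bits in (16, 8, 4):
    print(bits, "bits ->", round(snr_db(sine, quantize(sine, bits)), 1), "dB SNR")
# Each bit removed costs about 6 dB of SNR; the psychoacoustic model's job
# is to keep that added noise below the masking threshold.
```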
One of the cleverest engineering features of MP3 is the bit reservoir. Not every frame of audio requires the same number of bits. A passage of solo piano, with its complex harmonics and wide dynamic range, demands more bits than a passage of spoken word or silence. The bit reservoir allows the encoder to bank bits from frames that need fewer — quiet passages, pauses, simple tonal content — and spend them on frames that need more. The result is that the effective bitrate fluctuates frame by frame, even though the average bitrate remains constant. A frame containing a sharp drum transient might briefly draw the equivalent of 180 kbps from a 128 kbps stream, repaid by cheaper frames around it; the reservoir is small, so the borrowing operates over tens of milliseconds rather than whole passages. The mechanism is invisible to the listener but dramatically improves perceived quality on demanding material.
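A toy version of the borrowing logic shows the mechanism. The two constants are real (the per-frame budget at 128 kbps, and the 511-byte cap the MP3 format places on the reservoir); the allocation policy itself is a simplification for illustration:

```python
FRAME_BITS = 3344        # 128,000 bps * 1152 samples / 44,100 Hz, per frame
RESERVOIR_MAX = 4088     # MP3's back-pointer reaches at most 511 bytes (4088 bits)

def allocate(demands):
    """Grant each frame up to its demand, drawing on bits banked by
    earlier, cheaper frames. Simplified policy, not MP3's actual one."""
    reservoir, grants = 0, []
    for need in demands:
        grant = min(need, FRAME_BITS + reservoir)
        reservoir = min(reservoir + FRAME_BITS - grant, RESERVOIR_MAX)
        grants.append(grant)
    return grants

# Two cheap frames bank bits that a demanding transient then spends:
print(allocate([1000, 1000, 8000]))   # [1000, 1000, 7432]
```

The third frame receives more than twice the nominal budget, yet the running total never exceeds what a constant 128 kbps stream would have carried.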
Tom's Diner
Brandenburg's test track — the song he used more than any other to evaluate compression quality — was Suzanne Vega's "Tom's Diner." It was an unusual choice for an audio codec benchmark. No drums, no bass, no loud transients. Just a solo female voice, recorded with minimal processing, in a way that preserved every subtle dynamic: breath sounds, the slight room tone of the recording environment, the microphone's proximity effect on plosive consonants. These are exactly the characteristics that lossy compression damages most visibly.
The choice was technically astute. Dense, loud music is forgiving of compression artefacts — a wall of distorted electric guitars hides a multitude of encoding sins, because the masking is so thorough that the encoder's noise is buried beneath the signal's own energy. A solo a cappella vocal, by contrast, has almost nowhere to hide artefacts. The masking thresholds are low because the signal itself is sparse. Any quantisation noise — the subtle distortion introduced when frequency components are encoded with insufficient precision — sits exposed in the quiet spaces between the vocal's harmonics. If the codec could handle "Tom's Diner" transparently, it could handle anything.
Brandenburg listened to "Tom's Diner" thousands of times. Not casually, but analytically — on studio-grade headphones, in a treated listening room, comparing the original CD audio against compressed versions at various bitrates, adjusting his psychoacoustic model's parameters each time he heard an artefact. A slight warble in the upper harmonics — the encoding was smearing the natural vibrato. A metallic shimmer on the sibilants — the high-frequency quantisation was too coarse for the sharp spectral energy in 's' and 't' sounds. A subtle loss of spatial depth — the stereo encoding was discarding phase information that the ear uses to localise a voice in a room. Each detected artefact sent him back to the model to adjust a threshold, refine a masking curve, or add a special case.
The iterative process was painstaking and, by Brandenburg's own account, obsessive. He would adjust a parameter in the psychoacoustic model, re-encode "Tom's Diner," listen to the result, compare it to the original, identify any remaining artefact, hypothesise which parameter of the model was responsible, adjust it, re-encode, and listen again. Hundreds of cycles. The engineering was not mathematical in the abstract sense — it was mathematical in the empirical sense. The model's parameters were tuned not by theory alone but by the continuous feedback loop between prediction and perception. The human ear was both the specification and the test suite.
I knew we had it right when I could no longer hear the difference between the compressed file and the CD. If I could not hear it, with trained ears in a studio environment, no consumer would hear it either.
Karlheinz Brandenburg, paraphrased from interviews
The Standards War
A codec that works brilliantly in a laboratory is a curiosity. A codec that becomes an international standard is a foundation. Brandenburg understood from the beginning that standardisation was essential — without it, the MP3 would remain a proprietary format used by a handful of Fraunhofer licensees. With it, the format could become the universal language of digital audio. The path to standardisation, however, was political as much as it was technical.
The Moving Picture Experts Group — MPEG — was established under the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) to develop standards for coded representation of audio and video. When MPEG began its work on audio coding in the late 1980s, three competing proposals emerged. Musicam (Masking pattern adapted Universal Subband Integrated Coding And Multiplexing), developed primarily by Philips and the French research institute CCETT, used a simpler subband coding approach — dividing the audio spectrum into thirty-two equal-width frequency bands and encoding each band separately. It was computationally cheaper and was backed by the considerable institutional weight of Philips, which had co-invented the CD and had deep interests in consumer audio hardware. ASPEC (Adaptive Spectral Perceptual Entropy Coding) was a more complex transform-based coder developed jointly by Fraunhofer, AT&T Bell Labs, and Thomson. It delivered better audio quality at lower bitrates but demanded more computational power.
The political dynamics were intense. Philips lobbied aggressively for Musicam, arguing that its lower computational requirements made it more practical for hardware implementation — a genuine concern in an era when dedicated audio decoder chips needed to be cheap enough for consumer devices. The Fraunhofer-AT&T-Thomson coalition argued that ASPEC's superior quality at low bitrates was worth the additional complexity, and that Moore's Law would soon make the computational cost irrelevant. National interests added another layer: Philips was Dutch, CCETT was French, Fraunhofer was German, AT&T and Bell Labs were American. The standard was, in part, a proxy for which national research establishments would shape the future of digital audio.
MPEG resolved the dispute through a compromise that satisfied no one completely but proved remarkably durable. The standard would define three layers of increasing complexity and quality. Layer I, the simplest, was essentially Musicam with minor modifications — suitable for applications where computational simplicity mattered and bitrates were generous. Layer II was a refined version of Musicam that became the standard for digital radio broadcasting (DAB) and Video CD. Layer III — MP3 — incorporated the advanced psychoacoustic modelling and transform coding from ASPEC, merged with elements of Musicam's subband filter architecture. It was the most computationally demanding but delivered the best quality at low bitrates.
The layered compromise had an unintended consequence. Because Layer II was computationally simpler and "good enough" for broadcasting, the broadcast industry adopted it. Layer III was positioned for applications where bitrate was scarce and quality mattered — which, in 1993, seemed like a niche. ISDN lines. Satellite links. Telecommunications. No one on the MPEG committee foresaw that within six years, tens of millions of consumers would be encoding and trading Layer III files on personal computers over the internet. The niche codec became the world's most widely used audio format not because the standards committee chose it, but because the internet created exactly the conditions — scarce bandwidth, abundant computing power — that Layer III was designed for.
The European Patent
The European patent, EP 0402973, was filed on July 14, 1989. The codec was standardised as part of the MPEG-1 standard in 1993 and the MPEG-2 standard in 1995. At 128 kbps — the bitrate that would become the de facto standard for consumer MP3 files — the compression ratio was approximately 11:1. A four-minute song that occupied 40 megabytes as raw CD audio became a 3.5-megabyte MP3 file. At 320 kbps, the quality was, for most listeners on most equipment, indistinguishable from the original CD.
Fraunhofer licensed the MP3 technology to hardware manufacturers, software developers, and device makers worldwide. The licensing structure was tiered and, for an applied research institute, remarkably sophisticated. Encoder licences — the right to build software or hardware that creates MP3 files — cost more than decoder licences, on the reasoning that encoding was the value-creating act while decoding was the consumption act. Per-unit fees were modest: typically $0.75 per unit for decoders in consumer devices, rising to several dollars for professional encoding software. But the volume was extraordinary. Every portable music player, every car stereo that played MP3 files, every software media player, every mobile phone with music playback capability required a licence from Fraunhofer.
The royalties accumulated methodically. By the early 2000s, Fraunhofer IIS was collecting over $100 million per year in MP3 licensing revenue. The technology had cost a few million euros to develop over roughly a decade. The return on investment was staggering by any standard, and extraordinary for a publicly funded research institute. The revenue funded further audio research — including the development of AAC (Advanced Audio Coding), MP3's more sophisticated successor, which became the default format for Apple's iTunes Store and later for most streaming services. When the core MP3 patents expired in 2017, Fraunhofer had collected an estimated $600 million to $1 billion in cumulative royalties. The institute quietly stopped licensing the format and recommended AAC as the preferred codec for new applications.
The Revolution They Did Not Intend
Brandenburg and the Fraunhofer team designed MP3 for professional applications: broadcasting, telecommunications, digital audio workstations. They envisioned it as a tool for radio stations transmitting audio over ISDN lines, for telecommunications companies reducing bandwidth costs, for studios archiving recordings more efficiently. The consumer market was an afterthought.
The first crack in this assumption came not from a corporate strategy but from a student. In 1997, Tomislav Uzelac, a Croatian developer, created the AMP MP3 Playback Engine — one of the first software MP3 players for personal computers. Two young developers, Justin Frankel and Dmitry Boldyrev, took AMP's codebase and built Winamp, released in 1997 with the tagline "it really whips the llama's ass." Winamp was free, fast, skinnable, and had a visualiser that pulsed with the music. It made MP3 playback trivial. Within two years, it had tens of millions of users. The professional codec had become a consumer toy.
Then came Napster. In June 1999, an eighteen-year-old college student named Shawn Fanning released a peer-to-peer file-sharing application that used the MP3 format as its native currency. Napster's design was simple: a central directory of files that users on the network had made available for sharing, combined with a direct file transfer mechanism between users. You searched for a song title, Napster showed you which users had it, and you downloaded it directly from their computer. The central directory made discovery easy. The peer-to-peer transfer made it scalable. The MP3 format made the files small enough to download in minutes rather than hours.
The adoption was explosive. Napster reached 80 million registered users at its peak — a growth rate that would not be matched until Facebook. On college campuses, where high-speed Ethernet connections were available, music sharing became the dominant use of network bandwidth. Entire music libraries were assembled in weeks. Albums that had not yet been released in stores appeared on Napster days after being mastered. The recording industry's business model — predicated on controlling the physical distribution of music through CDs, priced at $15 to $18 per disc — did not collapse gradually. It shattered.
The Recording Industry Association of America (RIAA) responded with litigation. In December 1999, the RIAA sued Napster for contributory and vicarious copyright infringement. The Ninth Circuit Court of Appeals upheld an injunction against Napster in February 2001, and the service was shut down in July 2001. But the legal victory was pyrrhic. By the time Napster closed, its successors — Kazaa, LimeWire, Grokster, BitTorrent — had decentralised file sharing beyond any single point of legal attack. Kazaa, which used a distributed network with no central directory, was far harder to shut down. The RIAA then pivoted to suing individual file sharers, filing over 35,000 lawsuits between 2003 and 2008 against ordinary people — college students, grandmothers, teenagers — who had shared music files. The strategy generated headlines and resentment in equal measure. It did not stop file sharing.
The revenue collapse bottomed out around 2014, when global recorded music revenue hit approximately $14 billion. What reversed the decline was not legal enforcement but a business model innovation: streaming. Spotify, founded in Stockholm in 2006, launched its service in 2008 with a proposition that file sharing could not match — legal access to virtually all recorded music for a monthly subscription fee, with a user experience that was faster and more convenient than piracy. Apple Music followed in 2015. By 2023, global recorded music revenue had recovered to $28.6 billion — surpassing the 1999 peak for the first time — with streaming accounting for 67% of the total. The industry that MP3 nearly destroyed was rebuilt on a foundation that MP3 made possible: the principle that music is a stream of data, not a physical object.
Brandenburg has spoken publicly about the discomfort of watching a technology he created be used to dismantle an industry. He is a music lover. He did not set out to make music free. He set out to make audio files smaller. The distinction illustrates an uncomfortable truth about foundational technology: the inventor does not control how the invention is used. The inventor builds a capability. Society decides what to do with it. Brandenburg built a codec. Society used it to restructure the economics of one of the world's oldest creative industries — first by demolishing the existing model, then by building something new on the rubble.
The European Innovation Pattern
The MP3 is one of the most consequential consumer technologies ever created in Europe. It was developed at a government-funded applied research institute in a mid-sized German city by a team of academic engineers. It was not venture-funded. It was not built in a garage. It was not the product of a "move fast and break things" culture. It was the product of patient, methodical, publicly funded research — the kind of innovation that Europe's institutional research system excels at producing.
The Fraunhofer-Gesellschaft, the parent organisation of the institute where Brandenburg worked, operates seventy-six institutes across Germany, employing over 30,000 people and generating annual revenue exceeding €3 billion. Its mission is applied research — taking basic scientific discoveries and developing them into technologies that can be licensed to industry. The model is distinctly European: the state funds the research infrastructure, industry provides contract research revenue, and licensing generates returns that fund further research. It is, in effect, a publicly capitalised invention factory. The MP3 was its greatest hit, but it was not an outlier — Fraunhofer institutes have contributed foundational work in semiconductor manufacturing, solar cell technology, medical imaging, and dozens of other fields.
But the Fraunhofer model, for all its success in generating inventions and licensing revenue, reveals a structural limitation in European technology commercialisation: it was brilliant at generating revenue and poor at capturing value. The institute collected its royalties — hundreds of millions, perhaps a billion dollars over the life of the patents. But the real commercial value of MP3 — the streaming platforms, the device ecosystems, the cultural transformation — was captured almost entirely by American companies. iTunes was built in Cupertino. The iPod was designed in Cupertino. Spotify was founded in Stockholm but built its platform, its engineering culture, and much of its market capitalisation in the United States.
The pattern repeats with striking consistency across European technology history. ARM was designed in Cambridge, England; the ARM-based smartphone ecosystem was commercialised by Apple, Samsung, and Qualcomm. The World Wide Web was invented at CERN in Geneva; Google, Facebook, and Amazon were built in California. GSM, the mobile communications standard, was a European creation; the dominant mobile platform companies — Apple and Google — are American. In each case, Europe produced the foundational technology and America produced the platform that monetised it at scale. Europe earned licensing fees. America earned market capitalisations.
Why? The explanations are structural, not cultural. European capital markets are fragmented, making it harder to raise the growth-stage funding needed to build platform companies. European labour markets are less fluid, making it harder to assemble and disassemble teams at startup speed. The European single market, despite decades of integration, still presents regulatory and linguistic barriers that make continental-scale customer acquisition harder than it is in the United States. And the Fraunhofer model itself — licensing technology to external companies rather than building companies around it — is optimised for revenue from invention rather than value from commercialisation. Brandenburg built the codec. A licensing office collected the royalties. Nobody at Fraunhofer built the iPod.
The MP3 is the most vivid case study of this pattern — European invention, American commercialisation — because the commercial value was so enormous and so visibly captured elsewhere. It is also, perhaps, the most instructive, because it shows that the problem is not a lack of inventive capacity. Europe does not need to learn how to invent. It needs to learn how to build platforms around its inventions before someone else does. Brandenburg gave the world a codec. The world gave the value to Cupertino.
Sources
- European Patent EP 0402973 B1 — "Digital coding of audio signals" (Brandenburg et al., Fraunhofer, filed 1989) — https://worldwide.espacenet.com/patent/search/family/006384810/publication/EP0402973B1
- Brandenburg, K. "MP3 and AAC Explained." Audio Engineering Society Convention Paper, 2000.
- Witt, S. "How Music Got Free: The End of an Industry, the Turn of the Century, and the Patient Zero of Piracy." Viking/Penguin, 2015.
- Fraunhofer Institute for Integrated Circuits (IIS) — MP3 history page — https://www.iis.fraunhofer.de/en/ff/amm/consumer-electronics/mp3.html
- Sterne, J. "MP3: The Meaning of a Format." Duke University Press, 2012.
- IFPI Global Music Report — recorded music revenue data, 1999-2024 — https://www.ifpi.org/resources/
- ISO/IEC 11172-3:1993 — MPEG-1 Audio Layer III specification
- Pan, D. "A Tutorial on MPEG/Audio Compression." IEEE Multimedia, 1995.
- Fraunhofer-Gesellschaft Annual Report — institute structure, revenue, and applied research model — https://www.fraunhofer.de/en/about-fraunhofer.html