Rethinking compression: why "good enough" quality might actually be better
Navigating the rate-distortion-perception tradeoff with AI.
When you compress data for transmission over a wireless link, you’re typically trying to minimize two things: how many bits you send (rate) and how much the reconstructed signal differs from the original (distortion). But there’s a third factor that matters, especially for images and video: does the output look realistic?
This is the rate-distortion-perception (RDP) tradeoff. A compressed image might score well on traditional distortion metrics like PSNR while still looking wrong to human eyes, with blurry textures or unnatural smoothness. The perception constraint forces the reconstruction’s statistical distribution to stay close to that of natural images.
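For readers who want the formal version, a common way to state the tradeoff is the rate-distortion-perception function of Blau and Michaeli (the paper builds on this style of formulation; details may differ):

$$
R(D, P) \;=\; \min_{p_{\hat{X} \mid X}} \; I(X; \hat{X})
\quad \text{subject to} \quad
\mathbb{E}\big[\Delta(X, \hat{X})\big] \le D,
\qquad
d\big(p_X, \, p_{\hat{X}}\big) \le P,
$$

where $\Delta$ is a per-sample distortion measure (squared error, say) and $d$ is a divergence between distributions (e.g., Wasserstein distance). Setting $P = 0$ demands perfect realism: reconstructions must be statistically indistinguishable from natural images.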
The problem with current neural compressors
Recent neural compression methods have tackled RDP, but they’ve hit some theoretical walls. Classical rate-distortion theory shows that optimal compressors should efficiently pack the representation space, like fitting spheres into a box. RDP theory adds another wrinkle: achieving truly optimal compression may require infinite shared randomness between encoder and decoder.
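To put a number on the packing intuition (these are classical sphere-packing densities, quoted for illustration, not results from the paper): in two dimensions, equal circles centered on the square grid that scalar quantization induces cover about 79% of the plane, while circles on the hexagonal lattice cover about 91%.

```python
import math

# Fraction of the plane covered by equal, non-overlapping circles centered
# on lattice points. Classical values, quoted for intuition only.
square_grid = math.pi / 4                   # Z^2: the scalar-quantization grid
hexagonal = math.pi / (2 * math.sqrt(3))    # A_2: the optimal 2-D lattice

print(f"square grid: {square_grid:.3f}")    # 0.785
print(f"hexagonal:   {hexagonal:.3f}")      # 0.907
```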
That’s not practical. So the question becomes: how do you build compressors that get close to optimal given realistic constraints on complexity and shared randomness?
Lattice coding meets dithering
The approach here combines two ideas: lattice coding for packing efficiency and dithering for controlled randomness.
Standard neural transform coding (NTC) uses scalar quantization in the latent space: each latent coordinate is rounded independently, which amounts to quantizing onto an axis-aligned cubic grid and limits how efficiently you can pack representations. Lattice transform coding (LTC) replaces this with lattice quantization, which packs more efficiently because well-chosen lattices fill space with less wasted volume than the cubic grid.
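As a concrete, deliberately simple sketch of what a lattice quantizer looks like, here is the textbook nearest-point algorithm for the D_n lattice from Conway and Sloane. This illustrates the idea; it is not the specific lattice or code used in the paper.

```python
import numpy as np

def quantize_cubic(x):
    """Scalar quantization: round each latent coordinate independently (Z^n)."""
    return np.round(x)

def quantize_Dn(x):
    """Nearest point in the D_n lattice (integer vectors with an even
    coordinate sum). Classic Conway-Sloane algorithm: round every coordinate,
    then if the sum is odd, re-round the coordinate with the largest rounding
    error in the opposite direction to fix the parity."""
    f = np.round(x)
    if int(f.sum()) % 2 != 0:
        err = x - f
        k = int(np.argmax(np.abs(err)))
        f[k] += 1.0 if err[k] > 0 else -1.0
    return f

# D_3 is the face-centered cubic lattice, the densest packing in 3-D.
y = np.array([0.9, 0.4, -0.2])
print(quantize_cubic(y))  # [ 1.  0. -0.]
print(quantize_Dn(y))     # [ 1.  1. -0.]  (sum forced to be even)
```

The only change relative to scalar quantization is the parity fix at the end, yet the resulting codebook packs space measurably better.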
For the randomness piece, the method uses dithering: adding carefully designed noise before quantization and subtracting the same noise after reconstruction. Encoder and decoder share the same random seed, so they can generate identical dither without sending extra bits. This effectively implements randomized vector quantization, which theory says is necessary for RDP optimality.
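Here is a minimal sketch of subtractive dithered quantization with a shared seed (scalar quantization for simplicity; the `seed` and `step` parameters are illustrative assumptions, not the paper's design):

```python
import numpy as np

def encode(x, seed, step=1.0):
    """Encoder: add shared dither, quantize, send only the integer indices."""
    rng = np.random.default_rng(seed)              # PRNG driven by shared seed
    u = rng.uniform(-step / 2, step / 2, x.shape)  # subtractive dither
    return np.round((x + u) / step).astype(int)    # indices go over the air

def decode(indices, seed, step=1.0):
    """Decoder: regenerate the identical dither from the seed, subtract it."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(-step / 2, step / 2, indices.shape)
    return indices * step - u

x = np.random.randn(4)
x_hat = decode(encode(x, seed=42), seed=42)
# With subtractive dither, x_hat = x + noise that is uniform on
# [-step/2, step/2] and statistically independent of x.
print(np.abs(x_hat - x).max() <= 0.5)  # True
```

Only the integer indices cross the channel; the dither is reproduced locally on both sides, so the randomness costs nothing in bandwidth.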
What this means for telecom
Compression matters everywhere in wireless networks, from video streaming to backhaul optimization. If you can send fewer bits while maintaining perceptual quality, you free up capacity.
The RDP framework is particularly relevant for applications where human perception is the final arbiter. A video call that looks natural is more valuable than one that technically minimizes pixel-level error but produces uncanny artifacts. Similarly, compressed sensor data for AR/VR applications needs to preserve the qualities that make reconstructed scenes feel real.
The shared-randomness approach is interesting for practical systems. If encoder and decoder can agree on a seed (easy over any bidirectional link), you get the benefits of randomized coding without bandwidth overhead.
This is still research-stage work with simulation results. Production deployment would need to address computational cost at both encoder and decoder, and the lattice quantization step is more involved to implement than standard NTC's scalar rounding. But the theoretical grounding is solid, and it suggests that current neural compressors may be leaving performance on the table by ignoring the perception constraint or using suboptimal quantization schemes.
Paper: Optimal Neural Compressors for the Rate-Distortion-Perception Tradeoff
Authors: Eric Lei, Hamed Hassani, Shirin Saeedi Bidokhti
