Fix DPT decoder bugs for Scenic parity by bingyic · Pull Request #31 · google-deepmind/tips

bingyic · 2026-05-11T09:02:51Z

Summary

Fixes multiple bugs in pytorch/decoders.py to achieve numerical parity (max diff < 1e-4) with the Scenic/Flax reference implementation.

- DPTHead: Add output_activation parameter (default False). When True, applies F.relu() after project conv, matching Scenic.
- DepthDecoder: Replace with classification-based depth prediction:
  - nn.Linear(channels, num_depth_bins) head
  - bin_centers buffer via torch.linspace(min_depth, max_depth, num_depth_bins)
  - Forward: relu(logits) + min_depth → normalize → einsum(probs, bin_centers)
- ReassembleBlocks: Use F.gelu(x, approximate='tanh') to match JAX default.
- ConvTranspose kernel: Apply 180° spatial flip during Flax→PyTorch conversion.
- load_decoder_weights(): Unified weight loading from Scenic .zip checkpoints with auto-detection and key remapping for all decoder types: pixel_segmentation, pixel_depth_classif, pixel_normals → head
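The classification-based depth forward pass described for DepthDecoder can be sketched roughly as below. This is an illustration only, not the code from pytorch/decoders.py: the class name DepthBinHead, the channels-last input layout, and the default bin count and depth range are assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DepthBinHead(nn.Module):
    """Hypothetical classification-style depth head (not the PR's DepthDecoder)."""

    def __init__(self, channels, num_depth_bins=256, min_depth=1e-3, max_depth=10.0):
        super().__init__()
        self.min_depth = min_depth
        self.head = nn.Linear(channels, num_depth_bins)
        # Fixed, evenly spaced bin centers; registered as a buffer so they
        # move with the module across devices but are not trained.
        self.register_buffer(
            "bin_centers", torch.linspace(min_depth, max_depth, num_depth_bins)
        )

    def forward(self, feats):
        # feats: (B, H, W, C). Channels-last is an assumption so that
        # nn.Linear acts independently on every pixel.
        logits = self.head(feats)                           # (B, H, W, K)
        scores = F.relu(logits) + self.min_depth            # strictly positive
        probs = scores / scores.sum(dim=-1, keepdim=True)   # normalize over bins
        # Expected depth: probability-weighted sum of the bin centers.
        return torch.einsum("bhwk,k->bhw", probs, self.bin_centers)


torch.manual_seed(0)
model = DepthBinHead(channels=16, num_depth_bins=32)
depth = model(torch.randn(2, 8, 8, 16))  # (2, 8, 8)
```

Because the prediction is a convex combination of the bin centers, every output value lies between min_depth and max_depth.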
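The GELU change matters because F.gelu defaults to the exact erf-based form, while jax.nn.gelu defaults to the tanh approximation. The short sketch below, using an arbitrary input range chosen for illustration, compares the two PyTorch variants against the tanh formula written out by hand.

```python
import math

import torch
import torch.nn.functional as F

x = torch.linspace(-4.0, 4.0, 801, dtype=torch.float64)

# Tanh approximation of GELU written out explicitly.
manual_tanh = 0.5 * x * (
    1.0 + torch.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3))
)

gelu_erf = F.gelu(x)                       # PyTorch default: exact, erf-based
gelu_tanh = F.gelu(x, approximate="tanh")  # tanh approximation

# Largest pointwise gap between the two variants on this range.
gap = (gelu_erf - gelu_tanh).abs().max().item()
print(f"max |erf - tanh| = {gap:.2e}")
```

The gap is small per activation, but it is large enough to matter when the target is a tight end-to-end tolerance.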
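The ConvTranspose kernel flip can be illustrated in pure PyTorch. The sketch assumes the source kernel is stored as (kh, kw, in, out) and applied as a plain cross-correlation over a zero-inserted input, with no spatial flip; that convention is emulated by hand here rather than by calling Flax. The helper names are hypothetical and the kernel size equals the stride to keep the padding simple.

```python
import torch
import torch.nn.functional as F


def flax_to_torch_convtranspose_kernel(w_flax):
    """(kh, kw, in, out) -> (in, out, kh, kw), rotated 180 degrees spatially."""
    return w_flax.permute(2, 3, 0, 1).flip(dims=(2, 3)).contiguous()


def unflipped_transposed_conv(x, w_flax, stride):
    """Emulates the assumed source convention: zero-insert, pad, correlate."""
    n, c, h, w = x.shape
    kh, kw = w_flax.shape[:2]
    # Insert (stride - 1) zeros between neighbouring input pixels.
    dilated = x.new_zeros(n, c, (h - 1) * stride + 1, (w - 1) * stride + 1)
    dilated[:, :, ::stride, ::stride] = x
    # F.pad takes (left, right, top, bottom) for the last two dimensions.
    padded = F.pad(dilated, (kw - 1, kw - 1, kh - 1, kh - 1))
    # F.conv2d is a cross-correlation and expects (out, in, kh, kw).
    return F.conv2d(padded, w_flax.permute(3, 2, 0, 1))


torch.manual_seed(0)
x = torch.randn(1, 3, 5, 5, dtype=torch.float64)
w_flax = torch.randn(2, 2, 3, 4, dtype=torch.float64)  # (kh, kw, in, out)

reference = unflipped_transposed_conv(x, w_flax, stride=2)
# With the 180-degree flip, PyTorch's ConvTranspose matches the reference.
converted = F.conv_transpose2d(
    x, flax_to_torch_convtranspose_kernel(w_flax), stride=2
)
# Changing only the axis layout, without the flip, does not match.
layout_only = F.conv_transpose2d(
    x, w_flax.permute(2, 3, 0, 1).contiguous(), stride=2
)
```

PyTorch's transposed convolution is defined as the gradient of a convolution, which implicitly flips the kernel, so a kernel trained under the unflipped convention has to be rotated before loading.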

All three decoder types were verified for numerical parity against the Scenic reference.
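A parity check of this kind could be sketched as follows. The helper names and the synthetic arrays are assumptions for illustration; this is not the verification script from the commit history, and the 1e-4 tolerance is taken from the summary above.

```python
import numpy as np


def max_abs_diff(a, b):
    """Largest elementwise absolute difference between two same-shaped arrays."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    return float(np.max(np.abs(a - b)))


def check_parity(torch_out, reference_out, tol=1e-4):
    """Returns (passed, diff) for a max-abs-diff tolerance check."""
    diff = max_abs_diff(torch_out, reference_out)
    return diff < tol, diff


# Synthetic stand-ins for real decoder outputs (hypothetical data).
rng = np.random.default_rng(0)
reference_out = rng.standard_normal((2, 8, 8))
torch_out = reference_out + 1e-6

ok, diff = check_parity(torch_out, reference_out)
print(f"parity ok: {ok}, max abs diff: {diff:.2e}")
```

In practice torch_out would come from the PyTorch decoder, for example via tensor.detach().cpu().numpy(), and reference_out from the Scenic model run on the same input.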

bingyic added 3 commits May 11, 2026 01:58

Fix DPT decoder bugs for Scenic parity

75d91dc

Add parity verification script

f2d59b9

Remove verify script (not for release)

bb106fa