Add lw-detr models#332

Closed
srishtiii28 wants to merge 4 commits into
mlverse:mainfrom
srishtiii28:feature/lw-detr

@srishtiii28 srishtiii28 commented Jun 14, 2026

Closes #328

Adds four LW-DETR variants, ported from the Atten4Vis implementation:

  • model_lw_detr_tiny - ViT-Ti (6 layers, embed_dim=192), 100 queries
  • model_lw_detr_small - ViT-Ti (10 layers, embed_dim=192), 300 queries
  • model_lw_detr_medium - ViT-S (10 layers, embed_dim=384), 300 queries
  • model_lw_detr_large - ViT-S (10 layers, embed_dim=384), 2-scale projector, 300 queries

The architecture is a ViT encoder with interleaved window and global attention, a C2f projector borrowed from YOLOv8, and a 3-layer DETR decoder with deformable cross-attention. Two-stage query selection and Group DETR (13 groups) are used during training; only the primary group is used at inference. Pretrained COCO weights are fetched via download_and_cache(), and all four checkpoints load with zero missing or unexpected keys.
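To make the Group DETR point concrete, here is a minimal sketch of how the number of active queries could differ between training and inference. It is written in Python purely for illustration and is not the R code in this PR. The helper names and the assumption that groups are stored contiguously with the primary group first are hypothetical; only the group count (13) and the per-group query counts come from the description above.

```python
NUM_GROUPS = 13  # Group DETR groups used during training (from the PR description)


def active_query_count(queries_per_group: int, training: bool) -> int:
    """Number of decoder queries in use for the given mode."""
    groups = NUM_GROUPS if training else 1
    return queries_per_group * groups


def select_queries(all_queries, queries_per_group: int, training: bool):
    """Keep every group when training, only the primary group otherwise.

    Assumes (hypothetically) that the primary group occupies the first
    `queries_per_group` entries of `all_queries`.
    """
    return all_queries[:active_query_count(queries_per_group, training)]
```

Under these assumptions, a 300-query variant would carry 13 × 300 query embeddings during training and slice down to the first 300 at inference, so the extra groups add no inference cost.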

Since torchvisionlib #25 is still open, deformable cross-attention uses a pure-torch nnf_grid_sample fallback, so there is no CUDA dependency. The CUDA op can be swapped in once that issue is resolved.
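For readers unfamiliar with what a grid-sample fallback computes, the following is a small pure-Python sketch of bilinear sampling at a single normalized location on one feature map. It is illustrative only and not the code in this PR: the function names are hypothetical, and it assumes the usual grid-sample defaults (bilinear interpolation, zero padding outside the map, and the align_corners = FALSE coordinate convention), which this PR does not state explicitly.

```python
import math


def unnormalize(coord: float, size: int) -> float:
    """Map a coordinate in [-1, 1] to pixel space (align_corners=False convention)."""
    return ((coord + 1.0) * size - 1.0) / 2.0


def bilinear_sample(feature, x: float, y: float) -> float:
    """Sample a 2D feature map (list of rows) at normalized location (x, y).

    Neighbours that fall outside the map contribute zero (zero padding).
    """
    h, w = len(feature), len(feature[0])
    px, py = unnormalize(x, w), unnormalize(y, h)
    x0, y0 = math.floor(px), math.floor(py)
    fx, fy = px - x0, py - y0  # fractional offsets inside the cell

    value = 0.0
    for yi, wy in ((y0, 1.0 - fy), (y0 + 1, fy)):
        for xi, wx in ((x0, 1.0 - fx), (x0 + 1, fx)):
            if 0 <= xi < w and 0 <= yi < h:
                value += wy * wx * feature[yi][xi]
    return value
```

Deformable attention applies this kind of lookup at several learned offsets per query and attention head, then takes a weighted sum of the sampled values. Doing so with a generic grid-sample call trades some speed for portability compared with a fused CUDA kernel.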

Input images should be ImageNet-normalized tensors of shape (B, 3, H, W), square, with a side length divisible by 64; 640×640 is recommended. The output is a list with one entry per image, each containing boxes (xyxy, in pixels), labels, and scores.
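The input constraints above can be checked before calling a model. The sketch below is a Python illustration, not part of this PR's R API: all function names are hypothetical, and the mean and standard deviation values are the commonly used ImageNet statistics, assumed here because the description only says "ImageNet-normalized".

```python
import math

# Commonly used ImageNet channel statistics (assumed, not stated in the PR).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def check_input_shape(shape, multiple: int = 64) -> bool:
    """Validate a (B, 3, H, W) shape: RGB, square, side divisible by `multiple`."""
    if len(shape) != 4:
        raise ValueError(f"expected 4 dimensions (B, 3, H, W), got {len(shape)}")
    _, channels, height, width = shape
    if channels != 3:
        raise ValueError(f"expected 3 channels, got {channels}")
    if height != width:
        raise ValueError(f"expected a square image, got {height}x{width}")
    if height % multiple != 0:
        raise ValueError(f"side {height} is not divisible by {multiple}")
    return True


def nearest_valid_side(side: int, multiple: int = 64) -> int:
    """Smallest side length >= `side` that is divisible by `multiple`."""
    return max(multiple, math.ceil(side / multiple) * multiple)


def normalize_pixel(rgb):
    """Normalize one RGB pixel with values in [0, 1] using ImageNet statistics."""
    return tuple((v - m) / s for v, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))
```

For example, a 600×600 image would fail the check because 600 is not a multiple of 64, and `nearest_valid_side(600)` suggests resizing or padding to 640, which matches the recommended resolution.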

@srishtiii28 srishtiii28 marked this pull request as draft June 14, 2026 12:00
@srishtiii28 srishtiii28 marked this pull request as ready for review June 14, 2026 17:56
@srishtiii28 srishtiii28 marked this pull request as draft June 19, 2026 07:08
@srishtiii28 srishtiii28 marked this pull request as ready for review June 19, 2026 07:33
@srishtiii28 srishtiii28 marked this pull request as draft June 21, 2026 21:13

Development

Successfully merging this pull request may close these issues.

[Object Detection Model] Please implement LW-DETR
