Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024
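LayerSkip pairs an early-exit draft pass with verification by the full model (self-speculative decoding). A toy sketch of that draft-then-verify loop, with greedy next-token functions standing in for the real shallow and full passes; all names (`draft_next`, `full_next`, `n_draft`) are illustrative, not the repo's actual API:

```python
def self_speculative_decode(prefix, draft_next, full_next, n_draft=4, n_tokens=8):
    """Draft tokens with a cheap early-exit pass, verify with the full model.

    draft_next(seq) -- next token predicted by the shallow (early-exit) pass
    full_next(seq)  -- next token predicted by the full model
    Accepted tokens match the full model exactly (greedy verification),
    so the output is identical to full-model decoding, just cheaper when
    the draft pass guesses right.
    """
    seq = list(prefix)
    while len(seq) - len(prefix) < n_tokens:
        # 1) draft a short continuation with the cheap early-exit pass
        draft = []
        for _ in range(n_draft):
            draft.append(draft_next(seq + draft))
        # 2) verify the draft left-to-right against the full model
        accepted = 0
        for t in draft:
            if full_next(seq) == t:
                seq.append(t)
                accepted += 1
            else:
                break
        # 3) on a mismatch, fall back to one full-model token and re-draft
        if accepted < len(draft):
            seq.append(full_next(seq))
    return seq[len(prefix):]
```

When draft and full model agree, each iteration accepts `n_draft` tokens for one round of full-model verification; when they disagree, progress degrades gracefully to one verified token per iteration.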
A curated list of early-exit research across LLMs, CV, NLP, and related areas
Artifact for "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving" [SOSP '24]
[NeurIPS'25] FreqExit: Enabling Early-Exit Inference for Visual Autoregressive Models via Frequency-Aware Guidance
Official PyTorch implementation of "LGViT: Dynamic Early Exiting for Accelerating Vision Transformer" (ACM MM 2023)
Code for paper "EdgeKE: An On-Demand Deep Learning IoT System for Cognitive Big Data on Industrial Edge Devices"
A deep learning framework that implements Early Exit strategies in Convolutional Neural Networks (CNNs) using Deep Q-Learning (DQN). This project enhances computational efficiency by dynamically determining the optimal exit point in a neural network for image classification tasks on CIFAR-10.
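The inference-time decision that project learns can be pictured as a tiny policy walk over the network's exit points: at each exit head, the agent observes a confidence signal and picks CONTINUE or EXIT. A minimal sketch with a hand-filled tabular Q-table standing in for the learned DQN (the table values and function names here are illustrative only):

```python
import numpy as np

CONTINUE, EXIT = 0, 1

def choose_exit(confidences, q_table):
    """Walk the network's exit points, letting the policy decide where to stop.

    confidences -- top-class probability observed at each exit head
    q_table     -- q_table[i][bucket] = [Q(continue), Q(exit)] at exit i;
                   in the DQN project this mapping is a learned network,
                   here it is a hand-picked table for illustration
    Returns the index of the chosen exit (last exit if none is chosen).
    """
    for i, conf in enumerate(confidences):
        bucket = min(int(conf * 4), 3)  # discretize confidence into 4 bins
        if int(np.argmax(q_table[i][bucket])) == EXIT:
            return i
    return len(confidences) - 1
```

Training then amounts to rewarding early exits that keep the prediction correct and penalizing ones that flip it, which is where the DQN replaces the fixed table.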
Official repository of Busolin et al., "Learning Early Exit Strategies for Additive Ranking Ensembles", ACM SIGIR 2021.
This repository implements an Early Exit Neural Network (EENN) and includes tooling to profile the model's processing time.
Dynamic per-token early exit for LLM inference: skip the layers a token doesn't need.
This repository collects early-exit papers for self-study, along with relevant code and documentation.
This project aims to experiment with a modular architecture: implementing an early-exit model and testing it using TensorFlow.
Relational Time Engine (RTE): runtime density regulation for compute-efficient transformer inference. Demonstrates up to 75% layer reduction with improved latency and throughput.
This project focuses on the automatic classification of corn leaf diseases using deep neural networks. The dataset includes over 4000 images categorized into four classes: Common Rust, Gray Leaf Spot, Blight, and Healthy. Using Convolutional Neural Networks and advanced techniques, the model achieves a classification accuracy of 91.5%.
C implementation of a SHA-1 cracker with various optimizations
Early exit inference framework for HuggingFace LLMs — skip unnecessary transformer layers when the model is already confident. Supports LLaMA, Mistral, Phi, Gemma, Qwen, Pythia and more.
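Confidence-gated early exit of this kind can be sketched in a few lines: run the layers one at a time, probe an intermediate prediction head, and stop once the top-class probability clears a threshold. All names below (`layers`, `exit_heads`, `threshold`) are illustrative, not the framework's actual API:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def early_exit_forward(h, layers, exit_heads, threshold=0.9):
    """Run layers sequentially; exit once an intermediate head is confident.

    h          -- input hidden state (1-D numpy array)
    layers     -- list of callables, each mapping hidden -> hidden
    exit_heads -- list of callables, each mapping hidden -> class logits
    threshold  -- top-class probability required to stop early
    Returns (class probabilities, number of layers actually run).
    """
    for i, (layer, head) in enumerate(zip(layers, exit_heads), start=1):
        h = layer(h)
        probs = softmax(head(h))
        if probs.max() >= threshold:  # confident enough: skip the remaining layers
            return probs, i
    return probs, len(layers)
```

The threshold trades accuracy for speed: raising it runs more layers on hard inputs, lowering it exits earlier but risks committing to an under-confident prediction.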
CascadeExit: Adaptive early-exit speculative decoding for LLM inference acceleration. 1.76x speedup on Llama-3.2-3B with 0.51% parameter overhead via confidence-calibrated cascade routing