SinkFix is a deployed full-stack app for inspecting attention sinks in BERT-style transformer models.
The project is best described as an attention interpretability tool. It helps answer:
Which tokens receive unusually concentrated attention, and do those tokens look structural, useful, or suspicious?
SinkFix takes a Hugging Face model name and input text, runs the model with attention and hidden-state outputs enabled, and returns a token-level diagnostic report.
SinkFix is deployed at:
SinkFix currently works as a deployed attention diagnostics app:
- Backend API built with FastAPI
- Frontend built with Next.js, React, TypeScript, and Tailwind CSS
- Default model input set to google-bert/bert-base-uncased
- Analysis designed around BERT-style encoder internals
- Results shown as a token-level diagnostic table in the frontend
- Averaged attention heatmap shown on the results page
- JSON and CSV export available from the displayed analysis result
The app focuses on internal attention behavior. It does not claim to fully explain why a model produced a specific final prediction.
For each input, SinkFix returns:
- model tokens from the tokenizer
- normalized attention received by each token
- normalized value-vector norm for each token
- a token classification: beneficial, neutral, or detrimental
- summary counts by classification
- the strongest attention receiver
- the top attention sinks
- a full table of token-level diagnostics
Special tokens such as [CLS] and [SEP] are included in the report. That is intentional in the current version because structural-token behavior is part of what the project is inspecting.
The current backend pipeline is:
- Load the requested model and tokenizer with Hugging Face Transformers.
- Tokenize the input text.
- Run the model with attention and hidden-state outputs enabled.
- Average attention across layers and heads.
- Compute normalized attention received by each token.
- Compute normalized value-vector norms from BERT layer 0.
- Detect tokens above the attention threshold.
- Classify detected sink candidates.
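The numeric steps above can be sketched in plain Python on nested lists. This is an illustration under stated assumptions, not the project's code: the function names are hypothetical, "normalized" is assumed to mean scaling attention scores to sum to 1 and value norms by the largest norm, and the 0.5 threshold is made up for the toy input.

```python
import math
from statistics import fmean


def average_attention(attn):
    """Average attn[layer][head][query][key] over all layers and heads."""
    maps = [head for layer in attn for head in layer]
    n = len(maps[0])
    return [[fmean(m[q][k] for m in maps) for k in range(n)] for q in range(n)]


def attention_received(avg):
    """Sum each key column (attention a token receives), scaled to sum to 1."""
    n = len(avg)
    received = [sum(avg[q][k] for q in range(n)) for k in range(n)]
    total = sum(received)
    return [r / total for r in received]


def value_norms(hidden, weight, bias):
    """L2 norm of v = W h + b for each token, scaled by the largest norm."""
    norms = []
    for h in hidden:
        v = [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(weight, bias)]
        norms.append(math.sqrt(sum(c * c for c in v)))
    top = max(norms)
    return [n / top for n in norms]


def detect_sinks(scores, threshold):
    """Return indices of tokens whose received attention exceeds the threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]


# Toy input: 1 layer, 2 heads, 3 tokens; each row is one query's distribution.
attn = [[
    [[0.8, 0.1, 0.1], [0.6, 0.2, 0.2], [0.7, 0.1, 0.2]],
    [[0.6, 0.3, 0.1], [0.8, 0.1, 0.1], [0.5, 0.3, 0.2]],
]]
scores = attention_received(average_attention(attn))
sinks = detect_sinks(scores, threshold=0.5)  # hypothetical threshold
```

In the toy input every query puts most of its weight on token 0, so token 0 is the only index returned by detect_sinks.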
The classification rule is intentionally simple:
- token index 0, usually [CLS], is classified as beneficial when evaluated at early layer depth
- high attention received with lower value norm is classified as detrimental
- everything else is classified as neutral
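A minimal sketch of this rule follows. The function name, the two threshold defaults, and the early_layer flag are assumptions made for illustration; the actual cutoffs used by the backend are not stated here.

```python
def classify_token(index, attention, value_norm, *,
                   attention_threshold=0.5,  # hypothetical cutoff for "high attention"
                   norm_threshold=0.5,       # hypothetical cutoff for "lower value norm"
                   early_layer=True):        # assumed flag for "early layer depth"
    """Apply the three-way rule to one token."""
    # Position 0 (usually [CLS]) is treated as a structural, beneficial sink.
    if index == 0 and early_layer:
        return "beneficial"
    # Lots of attention flowing to a token that carries little value signal.
    if attention > attention_threshold and value_norm < norm_threshold:
        return "detrimental"
    return "neutral"


labels = [
    classify_token(0, 0.60, 0.10),  # first token
    classify_token(3, 0.60, 0.10),  # high attention, low value norm
    classify_token(3, 0.60, 0.90),  # high attention, high value norm
]
```

Because the position check runs first, a first token with high attention and a low value norm is still labeled beneficial in this sketch.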
The frontend displays the token-level diagnostics as summary cards, top sinks, and a full results table.
SinkFix is:
- not a training or fine-tuning pipeline
- not a model repair system
- not a claim that attention diagnostics fully explain model decisions
- not an ML monitoring system
- not currently designed for autoregressive language models
Backend:
- Python
- FastAPI
- Hugging Face Transformers
- PyTorch
Frontend:
- Next.js
- React
- TypeScript
- Tailwind CSS
backend/
  api/
    main.py            FastAPI app and CORS setup
    routes.py          analysis endpoint
    schemas.py         request and response models
  ml/
    utils.py           model loading and attention extraction
    sink_detector.py   attention sink detection
    classifier.py      sink classification rule
frontend/
  app/                 Next.js routes
  src/features/analysis/
    api/               frontend API request helper
    components/        analysis form and results UI
    types/             TypeScript response types
Install backend dependencies:
pip install -r requirements.txt
If PyTorch is not already installed in your environment, install the build appropriate for your machine from the official PyTorch instructions.
Start the backend:
uvicorn backend.api.main:app --reload
Install frontend dependencies:
cd frontend
npm install
Start the frontend:
npm run dev
Open the frontend at:
http://localhost:3000
The frontend expects the backend at:
http://localhost:8000
Allowed frontend origins can be configured with:
FRONTEND_ORIGINS=https://www.sinkfix.xyz,https://sinkfix.xyz,http://localhost:3000
Call the analysis endpoint:
curl -X POST http://127.0.0.1:8000/api/analyze \
  -H "Content-Type: application/json" \
  -d '{"model_name":"google-bert/bert-base-uncased","text":"The wheels on the bus go round and round."}'
Request body:
{
  "model_name": "google-bert/bert-base-uncased",
  "text": "The wheels on the bus go round and round."
}
Response fields:
- token_list: model tokens
- classifications: one label per token
- att_received_scores: normalized attention received by each token
- value_norms: normalized value-vector norm per token
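The same request can be sent from Python using only the standard library. This is a sketch, not part of the project: the helper names are hypothetical, the URL is the local one from the curl example, and the length check assumes the four response fields are parallel per-token lists.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:8000/api/analyze"  # local backend from the curl example
FIELDS = ("token_list", "classifications", "att_received_scores", "value_norms")


def build_request(model_name, text, url=API_URL):
    """Build the POST request with the documented JSON body."""
    body = json.dumps({"model_name": model_name, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def check_response(result):
    """Raise if the per-token lists do not all have the same length."""
    lengths = {len(result[name]) for name in FIELDS}
    if len(lengths) != 1:
        raise ValueError(f"per-token fields differ in length: {sorted(lengths)}")
    return result


def analyze(model_name, text, url=API_URL, timeout=120):
    """Send the request to a running backend and return the parsed response."""
    request = build_request(model_name, text, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return check_response(json.load(response))


# With the backend running locally:
# result = analyze("google-bert/bert-base-uncased", "The wheels on the bus go round and round.")
```

The generous timeout is a guess that allows for the model being loaded during the request.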
The input page submits a model name and text to the backend. On success, the frontend stores the latest response in sessionStorage and navigates to /results.
The results page reads that stored response and renders:
- total token count
- classification counts
- strongest attention receiver
- top five attention sinks
- averaged attention heatmap
- full token table
- JSON and CSV export actions
Refreshing or opening /results without a stored response shows an empty-state message.
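The summary values on the results page can be reproduced from a saved JSON response. The frontend does this in TypeScript; the Python sketch below only mirrors the idea. The function names, the CSV column names, and the sample values are made up, and "top five sinks" is assumed to mean the five tokens with the highest received attention.

```python
import csv
import io
from collections import Counter


def summarize(result, top_k=5):
    """Compute the token count, label counts, strongest receiver, and top sinks."""
    tokens = result["token_list"]
    scores = result["att_received_scores"]
    order = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    return {
        "total_tokens": len(tokens),
        "classification_counts": dict(Counter(result["classifications"])),
        "strongest_receiver": tokens[order[0]] if order else None,
        "top_sinks": [(tokens[i], scores[i]) for i in order[:top_k]],
    }


def to_csv(result):
    """Render one row per token; the column names here are hypothetical."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["index", "token", "classification", "attention_received", "value_norm"])
    rows = zip(
        result["token_list"],
        result["classifications"],
        result["att_received_scores"],
        result["value_norms"],
    )
    for index, row in enumerate(rows):
        writer.writerow([index, *row])
    return buffer.getvalue()


# Made-up response values, for illustration only.
sample = {
    "token_list": ["[CLS]", "the", "bus", "[SEP]"],
    "classifications": ["beneficial", "neutral", "neutral", "detrimental"],
    "att_received_scores": [0.5, 0.1, 0.1, 0.3],
    "value_norms": [0.9, 0.6, 0.7, 0.2],
}
summary = summarize(sample)
table = to_csv(sample)
```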
Backend syntax check:
python -m compileall backend
Frontend lint:
cd frontend
npm run lint
Frontend production build:
cd frontend
npm run build
Current limitations:
- The value-vector extraction assumes BERT internals at model.encoder.layer[...].
- The model is loaded on every request, which is slow and inefficient.
- Classification thresholds are heuristic and may need validation for broader model coverage.
- The frontend only keeps the latest result in browser sessionStorage.
- Autoregressive language models are not supported.
- The project currently inspects attention behavior, not full causal explanations of model predictions.
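One common way to avoid reloading a model on every request is to memoize the loader by model name. The sketch below shows the idea with functools.lru_cache. It is not how the backend currently works; slow_load is a hypothetical stand-in for the real Hugging Face loading code, and the cache size is arbitrary.

```python
from functools import lru_cache

load_calls = []  # records real loads so the caching effect is visible


def slow_load(model_name):
    """Hypothetical stand-in for the expensive model and tokenizer load."""
    load_calls.append(model_name)
    return {"model_name": model_name}


@lru_cache(maxsize=2)  # arbitrary size; keeps the most recently used models
def get_model(model_name):
    """Return a cached model, loading it only on first use."""
    return slow_load(model_name)


first = get_model("google-bert/bert-base-uncased")
second = get_model("google-bert/bert-base-uncased")  # served from the cache
```

A cache like this trades memory for latency, so its size would need to match the memory available on the deployment host.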