perf(for-you): cap my_saved_artists to 200 most-recent #805
Merged
The similar_artists CTE in /v1/users/{id}/feed/for-you self-joins
saves against my_saved_artists. For long-tenure users with thousands
of saved artists, my_saved_artists explodes and the planner times out
(observed: prod request hangs >60s for users with deep save history).
Replace the unbounded DISTINCT with a GROUP BY on owner_id ordered by
the most-recent save_at, capped at 200. Recency is the right axis —
old saves are a weaker signal of current taste anyway — and 200 artists
still gives the collaborative-filter step enough surface to find
similar-artists candidates.
The shape of the CTE (column `artist_id`) is preserved, so downstream
consumers `t1.owner_id IN (SELECT artist_id FROM my_saved_artists)`
and the `NOT IN` exclusion in similar_artists are unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
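To make the semantics of the cap described above concrete, the following Go sketch models it in memory: one entry per artist, ranked by that artist's most recent save, cut to the first `n`. It is an illustration under assumptions, not the query from this PR. The `save` struct, the `mostRecentArtists` helper, and the Unix-second timestamps are hypothetical stand-ins for the real rows, and the tie-break on artist ID is added only to keep the example deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// save is a hypothetical stand-in for one saved track, already resolved
// to the artist who owns it.
type save struct {
	ArtistID int
	SavedAt  int64 // Unix seconds
}

// mostRecentArtists models
//   GROUP BY artist ORDER BY MAX(saved_at) DESC LIMIT n
// in memory: one entry per artist, newest save first, capped at n.
func mostRecentArtists(saves []save, n int) []int {
	latest := make(map[int]int64)
	for _, s := range saves {
		if t, ok := latest[s.ArtistID]; !ok || s.SavedAt > t {
			latest[s.ArtistID] = s.SavedAt
		}
	}
	ids := make([]int, 0, len(latest))
	for id := range latest {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool {
		if latest[ids[i]] != latest[ids[j]] {
			return latest[ids[i]] > latest[ids[j]]
		}
		return ids[i] < ids[j] // tie-break is an assumption of this sketch
	})
	if n < 0 {
		n = 0
	}
	if len(ids) > n {
		ids = ids[:n]
	}
	return ids
}

func main() {
	saves := []save{
		{ArtistID: 1, SavedAt: 100},
		{ArtistID: 2, SavedAt: 200},
		{ArtistID: 1, SavedAt: 400}, // artist 1 saved again later
		{ArtistID: 3, SavedAt: 300},
	}
	fmt.Println(mostRecentArtists(saves, 2)) // prints [1 3]
}
```

When there are fewer than `n` distinct artists the result is the full set, which is why fixtures below the cap would see unchanged behavior.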
dylanjeffers added a commit that referenced this pull request on May 13, 2026
## Summary

Builds on #805 (which capped `my_saved_artists`). After that fix the endpoint still times out on prod for power users; the remaining unbounded CTEs are scanning the user's full history every request:

1. **`my_artist_affinity`** unions saves + reposts + plays for the caller. `plays` is the biggest table by far: a heavy listener can have hundreds of thousands of rows, all scanned on every request. Cap each source to most recent N: **200 saves, 200 reposts, 500 plays**.
2. **`follow_set`** is every user the caller follows; for a power-user with thousands of follows this becomes a wide hash join against every recent-track upload. Cap to **500 most-recently followed**.

Recency is the right axis on all three: old engagement is a weak signal of current taste, and the bounds match the magnitude of the hidden cost (plays >> saves ≈ reposts).

## Diff (CTEs)

```sql
follow_set AS (
  SELECT followee_user_id AS user_id
  FROM follows
  WHERE follower_user_id = @userid
    AND is_current AND NOT is_delete
  ORDER BY created_at DESC LIMIT 500  -- new
),
my_artist_affinity AS (
  SELECT t.owner_id, LN(1 + COUNT(*)) AS affinity
  FROM (
    (SELECT save_item_id ... ORDER BY created_at DESC LIMIT 200)    -- new
    UNION ALL
    (SELECT repost_item_id ... ORDER BY created_at DESC LIMIT 200)  -- new
    UNION ALL
    (SELECT play_item_id ... ORDER BY created_at DESC LIMIT 500)    -- new
  ) eng
  JOIN tracks t ON ...
  GROUP BY t.owner_id
),
```

## Test plan

- ✅ All 9 `TestV1FeedForYou_*` tests pass locally. Fixtures have <200 saves/<200 reposts/<500 plays/<500 follows so the caps don't kick in and observable behavior is unchanged.
- ✅ `go build ./api/...` / `go vet ./api/...` clean.
- After deploy: `/v1/users/eYZmn/feed/for-you?user_id=eYZmn&limit=5` (notjulian, deep history); currently times out at the Cloudflare upstream (>120s). Target: <2s.

## Follow-ups

Parallel EXPLAIN ANALYZE work happening to verify the bound shifts the cost as expected and to flag any missing indexes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
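The effect of the per-source caps on `my_artist_affinity` can be sketched in Go as an in-memory model: each source is cut to its most recent N rows, the three are concatenated (the `UNION ALL`), and each owner's score is `ln(1 + count)`. This is an assumption-laden illustration, not the handler's code. The `event` struct, the `recent` and `affinity` helpers, and the pre-resolved `OwnerID` (standing in for the join to `tracks`) are all hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// event is a hypothetical stand-in for one save, repost, or play row,
// already resolved to the owner of the track.
type event struct {
	OwnerID   int
	CreatedAt int64 // Unix seconds
}

// recent models ORDER BY created_at DESC LIMIT n on one source.
func recent(events []event, n int) []event {
	sorted := append([]event(nil), events...) // copy; leave the input untouched
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].CreatedAt > sorted[j].CreatedAt
	})
	if n < 0 {
		n = 0
	}
	if len(sorted) > n {
		sorted = sorted[:n]
	}
	return sorted
}

// affinity models the capped UNION ALL followed by
// GROUP BY owner with LN(1 + COUNT(*)).
func affinity(saves, reposts, plays []event, saveCap, repostCap, playCap int) map[int]float64 {
	counts := make(map[int]int)
	sources := [][]event{
		recent(saves, saveCap),
		recent(reposts, repostCap),
		recent(plays, playCap),
	}
	for _, src := range sources {
		for _, e := range src {
			counts[e.OwnerID]++
		}
	}
	out := make(map[int]float64, len(counts))
	for owner, c := range counts {
		out[owner] = math.Log(1 + float64(c))
	}
	return out
}

func main() {
	saves := []event{{OwnerID: 7, CreatedAt: 10}}
	reposts := []event{{OwnerID: 8, CreatedAt: 5}}
	plays := []event{{OwnerID: 7, CreatedAt: 1}, {OwnerID: 7, CreatedAt: 2}, {OwnerID: 9, CreatedAt: 3}}

	// A play cap of 2 drops the oldest play, so owner 7 counts 1 save + 1 play.
	a := affinity(saves, reposts, plays, 200, 200, 2)
	fmt.Printf("%.3f %.3f\n", a[7], a[9]) // prints 1.099 0.693
}
```

Because the score is logarithmic in the count, dropping old rows from a large source shifts an owner's affinity far less than it shrinks the number of rows scanned, which is consistent with the larger cap being reserved for plays.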
dylanjeffers added a commit that referenced this pull request on May 13, 2026
## Summary
Retiring the dedicated For You feed endpoint. The clients are being
switched to use `/v1/users/{id}/recommended-tracks` instead, the same
endpoint that already powers the Explore page's For You section and
works fine in production. See companion PR: AudiusProject/apps#14301.
## Why
The custom `/feed/for-you` endpoint has had repeated issues since it
shipped:
* **Auth gate bug** (fixed in #804): global authMiddleware rejected
unsigned `user_id` requests, making the endpoint unreachable from the
web RC.
* **Perf**: even after #805 and #806 capped the `my_saved_artists`,
`my_artist_affinity`, and `follow_set` CTEs, EXPLAIN on prod showed
the `similar_artists` self-join still produced a 301M-row merge for
power users (and a fixed ~12s `track_trending_scores` scan for *every*
user due to a missing partial index). The endpoint never reliably
completed within Cloudflare's 100s upstream limit for power users.
* **Duplication**: the response shape (ranked track list for the
signed-in user) is already what `/recommended-tracks` returns.
Maintaining two endpoints that solve the same problem isn't worthwhile.
Consolidating on the working endpoint is simpler than continuing to
optimize the custom one.
## Removed
| File | What |
|---|---|
| `api/v1_users_feed_for_you.go` | Handler + the 200-row candidate-pool SQL (4 candidate sources, similar_artists CF, diversity pass) |
| `api/v1_users_feed_for_you_test.go` | 9 unit tests |
| `api/server.go` (1 line) | Route registration |
| `api/auth_middleware.go` (~10 lines) | The `/feed/for-you` exemption from #804, no longer needed |
| `api/swagger/swagger-v1.yaml` (~70 lines) | The endpoint's swagger entry |
## Test plan
- ✅ `go build ./api/...` clean
- ✅ `go vet ./api/...` clean
- ✅ All remaining `TestV1UsersFeed*` / `TestAuth*` tests pass
locally
- After merge + deploy + AudiusProject/apps#14301 deploy: Feed → For You
tab on the web RC should show the same recommended tracks as Explore's
For You section.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Summary

The `similar_artists` CTE in `/v1/users/{id}/feed/for-you` self-joins `saves` against the `my_saved_artists` set. For long-tenure power users (e.g. an account with thousands of saved artists) the join blows up and the planner times out. Observed in prod: the request hangs >60s and returns nothing.

Cap `my_saved_artists` to the 200 most-recently saved artists. Recency is the right axis to cut on: old saves are a weaker signal of current taste, and 200 artists still gives the collaborative-filter step enough surface to find similar-artists candidates.

## Change

CTE output shape (column `artist_id`) is preserved, so the downstream `IN (SELECT artist_id FROM my_saved_artists)` and `NOT IN (...)` consumers in `similar_artists` are unchanged.

## Test plan

- `TestV1FeedForYou_*` tests pass locally against the test DB (fixtures have <200 saved artists, so the cap doesn't kick in and behavior is identical to before for the test cases).
- `go build ./api/...` / `go vet ./api/...` clean.
- `/v1/users/eYZmn/feed/for-you?user_id=eYZmn&limit=5` (notjulian, deep save history) should return HTTP 200 in <2s instead of hanging.

## Follow-ups not in this PR

- `EXPLAIN ANALYZE` instrumentation behind a flag to catch this class of regression earlier next time.
- `my_artist_affinity` CTE also unions saves+reposts+plays for the user unbounded, likely the next-slowest piece for power users. Worth a similar bound in a follow-up if this fix doesn't fully resolve the timeout.

🤖 Generated with Claude Code
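One possible shape for the `EXPLAIN ANALYZE` follow-up is a small helper that wraps a query only when a flag is set. The sketch below is a guess at that idea, not anything from this PR. It assumes a PostgreSQL backend, and the `FEED_EXPLAIN_ANALYZE` environment variable and the `withExplain` and `explainEnabled` helpers are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
)

// explainEnvVar is a hypothetical flag name; the PR does not name one.
const explainEnvVar = "FEED_EXPLAIN_ANALYZE"

// explainEnabled reports whether the (assumed) flag is switched on.
func explainEnabled() bool {
	return os.Getenv(explainEnvVar) == "1"
}

// withExplain prefixes the statement with EXPLAIN (ANALYZE, BUFFERS) when
// enabled and returns it unchanged otherwise. EXPLAIN ANALYZE actually
// executes the statement, so this only suits read-only queries and costs
// at least as much as running the query itself.
func withExplain(query string, enabled bool) string {
	if !enabled {
		return query
	}
	return "EXPLAIN (ANALYZE, BUFFERS) " + query
}

func main() {
	// Placeholder statement; a real caller would pass the feed query text.
	fmt.Println(withExplain("SELECT 1", explainEnabled()))
}
```

Since the wrapped statement returns plan rows rather than result rows, a caller would need to log the output and then run the original query separately, so a flag like this is better suited to a debug path than to every request.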