This document describes the performance benchmark suite for the Lynx URL shortener. The suite validates the caching optimizations and measures the real-world performance characteristics of the service under various load conditions.
The redirect endpoint (GET /:code) has received the most optimization effort:
- Moka read cache: In-memory caching of URL lookups (500k entries, 5-minute TTL)
- Actor pattern write buffering: Zero-lock-contention click counting
- Dual-layer flush system: 100ms actor flush + 5s database flush
- Non-blocking writes: Database operations don't block redirect responses
Target Performance: ~70,000 requests/second at 1000 concurrency, with slight degradation at 5000 concurrency.
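As a concrete illustration (not part of the suite itself), a direct wrk run against the redirect endpoint at 1000 connections could look like the sketch below, assuming the service listens on port 3000 as in the setup later in this document and that a short code `abc123` already exists.

```bash
# Illustrative wrk invocation: 8 threads, 1000 open connections, 30 seconds,
# with the latency histogram enabled. "abc123" is a placeholder short code.
wrk -t 8 -c 1000 -d 30s --latency http://localhost:3000/abc123
```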
Management API endpoints are expected to deliver lower throughput, since each request involves database queries (illustrative curl sketches follow the list):
- `POST /api/urls` - Create short URL (write operation)
- `GET /api/urls/:code` - Get URL details (may benefit from the read cache)
- `GET /api/urls` - List URLs (pagination, database query)
- `PUT /api/urls/:code/deactivate` - Deactivate URL (state change + cache invalidation)
- `PUT /api/urls/:code/reactivate` - Reactivate URL (state change + cache invalidation)
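The following curl sketches illustrate two of these calls; the JSON body for the create request is an assumed shape, since the request schema is not documented here, and `abc123` is a placeholder short code.

```bash
# Hypothetical management API calls against the setup used later in this
# document. The {"url": ...} request body is an assumption.
curl -s -X POST http://localhost:8080/api/urls \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://example.com"}'

curl -s http://localhost:8080/api/urls/abc123
```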
The suite covers five scenarios:

- **Single Hot URL (1000 concurrency)**
  - Best case for cache effectiveness
  - Validates cache hit rate and performance
  - Tests the actor pattern under concentrated load
- **Single Hot URL (5000 concurrency)**
  - Higher-concurrency test
  - Expected slight performance drop
  - Validates system stability under increased load
- **Distributed Load (100 URLs)**
  - Tests the cache with distributed access patterns
  - Validates cache eviction and refresh logic
  - More realistic traffic pattern
- **Extreme Load (10000 connections)**
  - Stress test for the actor pattern
  - Validates the zero-lock-contention design
  - Tests system limits
- **Management Endpoint Tests**
  - Individual tests for each CRUD operation
  - Validates database query performance
  - Measures cache invalidation overhead
The primary tool for performance testing is wrk:
- High-performance, multi-threaded HTTP benchmarking
- Lua scripting support for complex scenarios
- Detailed latency percentiles (p50, p90, p99, p99.9, p99.99)
- Coordinated omission correction for accurate measurements
Installation: Built from source in GitHub Actions
Capabilities:
- Can generate 100k+ requests/second
- Supports custom Lua scripts for complex request patterns
- Provides comprehensive latency histograms
- Thread-safe and efficient
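As a sketch of the Lua scripting capability (hypothetical, not a script shipped with the suite), the following rotates requests across several placeholder short codes, roughly in the spirit of the distributed-load scenario; wrk calls the global `request()` hook to obtain each request.

```bash
# Write a small wrk Lua script and run it. The short codes are placeholders.
cat > /tmp/multi_url.lua <<'EOF'
-- Rotate GET requests across a fixed set of short codes.
local codes = { "abc123", "def456", "ghi789" }
local i = 0

request = function()
  i = i + 1
  return wrk.format("GET", "/" .. codes[(i % #codes) + 1])
end
EOF

wrk -t 4 -c 100 -d 30s -s /tmp/multi_url.lua http://localhost:3000
```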
Fallback tool for simple scenarios:
- Simpler interface
- Good for quick tests
- Limited to single URL per test
- Less accurate for high-performance testing
```bash
# Start PostgreSQL
docker run -d \
  --name postgres \
  -e POSTGRES_USER=lynx \
  -e POSTGRES_PASSWORD=lynx_pass \
  -e POSTGRES_DB=lynx \
  -p 5432:5432 \
  postgres:17-alpine

# Start Lynx with host network (bypasses userland proxy)
docker run -d \
  --name lynx \
  --network host \
  -e DATABASE_BACKEND=postgres \
  -e DATABASE_URL=postgresql://lynx:lynx_pass@localhost:5432/lynx \
  -e DATABASE_MAX_CONNECTIONS=50 \
  -e AUTH_MODE=none \
  -e API_HOST=0.0.0.0 \
  -e API_PORT=8080 \
  -e REDIRECT_HOST=0.0.0.0 \
  -e REDIRECT_PORT=3000 \
  ghcr.io/btreemap/lynx:latest

# Wait for service to be ready
sleep 10

# Run benchmarks
bash tests/benchmark.sh http://localhost:8080 http://localhost:3000 ./results 30s
```

The benchmark runs automatically after the Docker image is built and published:
- Triggered by: `docker-publish` workflow completion
- Can also be triggered manually via workflow dispatch
- Runs on: ubuntu-24.04 with GitHub Actions hosted runners
- Network: uses `--network host` to bypass the Docker userland proxy
- Duration: configurable (default 30s per test)
Manual Trigger:
```bash
# Via GitHub CLI
gh workflow run performance-benchmark.yml

# With custom duration
gh workflow run performance-benchmark.yml -f duration=60s
```

The benchmark uses a production-like configuration:
```bash
DATABASE_BACKEND=postgres
DATABASE_MAX_CONNECTIONS=50
CACHE_MAX_ENTRIES=500000
CACHE_FLUSH_INTERVAL_SECS=5
ACTOR_BUFFER_SIZE=1000000
ACTOR_FLUSH_INTERVAL_MS=100
```

- Default: 30 seconds per test
- Configurable: Via workflow input or script parameter
- Recommendation:
  - 30s for quick validation
  - 60s for accurate measurements
  - 120s+ for stress testing
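For example, a longer stress-oriented run only changes the final duration argument from the setup shown earlier:

```bash
# 120-second per-test duration; the other arguments are unchanged.
bash tests/benchmark.sh http://localhost:8080 http://localhost:3000 ./results 120s
```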
- `benchmark-results-TIMESTAMP.txt`
  - Raw wrk output
  - Detailed latency histograms
  - Request rate and throughput
  - Error counts and types
- `benchmark-results-TIMESTAMP.json`
  - Structured results data
  - Programmatically parseable
  - Suitable for visualization tools
  - Schema documented in GRAPHS.txt
- `PERFORMANCE_REPORT.md`
  - Human-readable summary
  - Configuration details
  - Key findings
  - Comparison with expected performance
- `lynx-logs.txt`
  - Service logs during the benchmark
  - Cache configuration at startup
  - Any warnings or errors
  - Useful for debugging
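After a run, these files can be inspected directly; this assumes the `./results` output directory used in the setup example:

```bash
# List what the run produced and read the human-readable summary.
ls -lh ./results
less ./results/PERFORMANCE_REPORT.md
```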
The repository includes `tests/visualize_benchmarks.py` for automatic graph generation:
```bash
# Install dependencies (if not already installed)
pip3 install matplotlib numpy

# Generate graphs from JSON results
python3 tests/visualize_benchmarks.py benchmark-results-*.json -o ./graphs
```

This creates:
- RPS comparison bar chart
- Latency percentiles grouped chart
- Performance heatmap with normalized metrics
- Text summary with top performers
See BENCHMARK_RESULTS.md for more visualization options.
The following metrics are collected for each test:
- Requests per second (RPS): Total throughput
- Average latency: Mean response time
- Latency percentiles:
  - p50 (median)
  - p90 (90th percentile)
  - p99 (99th percentile)
  - p99.9 (99.9th percentile), when available
  - p99.99 (99.99th percentile), when available
- Error count: Failed requests
- Transfer rate: Data throughput (MB/s)
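A quick way to pull the headline numbers out of the raw text results, assuming wrk's standard output format:

```bash
# Throughput and transfer-rate lines from wrk's text output.
grep -E 'Requests/sec|Transfer/sec' benchmark-results-*.txt

# Latency distribution section (printed when wrk runs with --latency).
grep -A 6 'Latency Distribution' benchmark-results-*.txt
```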
All results are uploaded as GitHub Actions artifacts:
- Retention: 90 days
- Name: `performance-benchmark-results`
- Size: typically < 10 MB
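The artifact can also be fetched locally with the GitHub CLI; `<run-id>` is a placeholder for the workflow run to download from:

```bash
# Download the benchmark artifact from a specific workflow run.
gh run download <run-id> --name performance-benchmark-results --dir ./results
```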
Based on docs/PERFORMANCE_OPTIMIZATIONS.md, the targets for the redirect endpoint are:
| Metric | Target | Notes |
|---|---|---|
| RPS @ 1k concurrency | ~70,000 | Single hot URL, best case |
| RPS @ 5k concurrency | ~60,000+ | Expected slight drop |
| RPS @ 10k concurrency | Stable | Should not crash or degrade severely |
| p50 latency | < 1ms | Cached hits are memory operations |
| p99 latency | < 10ms | Should be very low for cached hits |

Expected throughput for the management endpoints:

| Endpoint | Expected RPS | Notes |
|---|---|---|
| POST /api/urls | 1,000 - 5,000 | Database write + validation |
| GET /api/urls/:code | 5,000 - 20,000 | May benefit from cache |
| GET /api/urls | 500 - 2,000 | Pagination query |
| PUT deactivate/reactivate | 1,000 - 3,000 | State change + cache invalidation |
| GET /api/health | 50,000+ | Simple status check |
Note: These are estimates. Actual performance depends on hardware, database, and configuration.
Docker's userland proxy significantly hurts performance. The benchmark bypasses it by using `--network host`:
- With userland proxy: ~10-20k RPS
- Without userland proxy: ~70k+ RPS
This is a 3-5x performance difference for high-throughput scenarios.
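To reproduce the comparison, the same image can be run behind the default bridge network with published ports, which routes traffic through the userland proxy. The database URL must then point at the host rather than `localhost` inside the container; the sketch below assumes Docker's `host-gateway` alias is available.

```bash
# A/B variant: Lynx behind the userland proxy (published ports).
docker run -d \
  --name lynx-proxied \
  --add-host=host.docker.internal:host-gateway \
  -e DATABASE_BACKEND=postgres \
  -e DATABASE_URL=postgresql://lynx:lynx_pass@host.docker.internal:5432/lynx \
  -e DATABASE_MAX_CONNECTIONS=50 \
  -e AUTH_MODE=none \
  -e API_HOST=0.0.0.0 \
  -e API_PORT=8080 \
  -e REDIRECT_HOST=0.0.0.0 \
  -e REDIRECT_PORT=3000 \
  -p 8080:8080 -p 3000:3000 \
  ghcr.io/btreemap/lynx:latest
```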
Database connection pool:
- Default: 50 connections
- Impact: a larger pool can improve concurrent database operations
- Recommendation:
  - SQLite: keep at 30 or lower (limited concurrent writes)
  - PostgreSQL: scale to 50-100 based on load
Moka read cache:
- Max entries: 500,000 (approximately 100 MB)
- TTL: 5 minutes
- Eviction: LRU (Least Recently Used)
- Thread-safety: Lock-free concurrent access
Actor write buffer:
- Size: 1,000,000 messages
- Flush interval: 100ms (Layer 1 → Layer 2)
- Backpressure: Blocking channel prevents message loss
- Performance: ~500k increments/second with minimal overhead
If throughput is lower than expected:
- Check the userland proxy: use `--network host` to bypass it
- Database connections: increase the pool size if you see connection timeouts
- System resources: ensure sufficient CPU and memory
- File descriptors: check `ulimit -n` (should be > 100,000; see the sketch below)
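For the file-descriptor check, inspecting and raising the limit for the current shell (hard limit permitting) looks like:

```bash
# Show the current soft limit, then raise it for this shell session.
ulimit -n
ulimit -n 1048576
```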
If latency is higher than expected:
- Cold cache: first requests will be slower (cache misses)
- Database slow: Check PostgreSQL performance and indexes
- Network: Ensure low-latency connection to database
- Buffer flushing: Check if database flush is blocking (shouldn't happen)
If requests fail:
- Connection refused: service not ready; increase the wait time
- Timeout: Increase test duration or reduce concurrency
- 502/503 errors: Service overloaded, check logs
- Database errors: Check connection string and credentials
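For the "service not ready" case, a readiness poll is more robust than the fixed `sleep 10` in the setup example; this assumes `GET /api/health` returns a success status once the service is up, as the expected-performance table suggests.

```bash
# Poll the health endpoint until the service responds, up to ~60 seconds.
for _ in $(seq 1 60); do
  curl -sf http://localhost:8080/api/health > /dev/null && break
  sleep 1
done
```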
After running benchmarks several times, establish baseline metrics:
- Record p50, p90, p99 latencies for each test
- Document RPS for redirect endpoint at different concurrency levels
- Note any anomalies or variations
- Update performance targets based on actual results
Monitor for performance regressions:
- Compare benchmark results across commits
- Alert if RPS drops > 10% for redirect endpoint
- Alert if p99 latency increases > 20%
- Investigate any new errors or timeouts
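A minimal sketch of such a comparison, assuming two result directories `baseline/` and `current/` containing raw wrk output from different commits:

```bash
# Compare throughput lines from two runs side by side (wrk text format).
paste \
  <(grep -h 'Requests/sec' baseline/benchmark-results-*.txt) \
  <(grep -h 'Requests/sec' current/benchmark-results-*.txt)
```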
Track trends over time:
- Plot RPS over commits/releases
- Monitor latency percentiles
- Track cache hit rates (from logs)
- Analyze database query patterns
Related documentation:
- Interpreting Benchmark Results - How to read and analyze benchmark output
- Performance Optimizations - Detailed explanation of caching and optimization strategies
- Integration Tests - Functional correctness tests
- Concurrent Tests - Concurrency and data consistency tests
When modifying the benchmark suite:
- Add new tests for new endpoints or features
- Update targets if optimizations improve performance
- Document changes in this file and commit messages
- Validate locally before pushing to CI
- Consider test duration - balance thoroughness with CI time
For questions or issues with the benchmark suite, please open a GitHub issue.