## Summary
We should have an automated tool that periodically spins up a fresh light node on each long-running network (mocha, arabica, mainnet) to verify that new nodes can successfully start, sync, and sample. This would catch startup regressions like the recent tail height overshoot (#4840) before users report them.
## Motivation
In v0.29.1, light nodes on mocha failed to start because the syncer tail height estimation overshot the pruning window by ~3.8 hours. This was only discovered through manual testing. An automated canary would have caught this days earlier.
## Proposed Behavior
- Run on a schedule (e.g., daily or every few hours)
- For each target network, start a fresh light node (clean datastore) and verify:
  - Successful connection to bootstrappers
  - Head header obtained
  - Tail header within the pruning window
  - Initial sync completes (e.g., first 100 headers)
  - DAS sampling begins
- Report results to telemetry (OTLP metrics / Grafana dashboard)
- Alert on failure (e.g., PagerDuty, Slack, or Grafana alerting)
## Implementation Ideas
- Could be a CI cron job, a standalone service, or a cel-shed subcommand
- Could reuse the existing tastora Docker infrastructure
- Metrics to export: startup latency, time-to-first-sample, bootstrapper reachability, sync speed
## Related

- #4840