# Lab Creation Infrastructure

Tools for creating new lab-validation snapshots using
[containerlab](https://containerlab.dev/) on AWS EC2 with Juniper
vJunos-router virtual images.

## Overview

This directory contains everything needed to:

1. Provision an AWS EC2 instance with KVM, Docker, and containerlab
2. Deploy Juniper virtual router topologies
3. Collect device operational data (show command outputs)
4. Package the data as lab-validation snapshots
5. Validate against Batfish

The workflow is designed to be driven by Claude Code or run manually.

## Prerequisites

- **AWS CLI v2.34+** with configured credentials (`aws configure` or
  the `AWS_PROFILE` environment variable)
- **Juniper vJunos-router qcow2 image**: free download from
  [Juniper vJunos Labs](https://www.juniper.net/us/en/dm/vjunos-labs.html)
  (non-production use, no time limit)
- **Batfish** running locally for validation (Docker image or built from
  source)

## One-Time Setup

### 1. Download the Juniper Image

Download `vJunos-router-*.qcow2` from Juniper's website and save it to
`infra/images/` (this directory is gitignored):

```bash
ls infra/images/
# vJunos-router-25.4R1.12.qcow2
```

### 2. Upload to S3

```bash
cd infra
AWS_PROFILE=<profile> ./upload-image.sh
```

This creates an S3 bucket named `lab-validation-images-<account-id>` (if it
doesn't exist) and uploads all qcow2 files from `infra/images/`. The script
is idempotent: it skips files already present in S3.
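
The skip logic behind that idempotency can be sketched as a pure function (illustrative only; `upload-image.sh` implements this in shell against the real bucket listing):

```python
def files_to_upload(local_files: list[str], existing_keys: list[str]) -> list[str]:
    """Return only the local files not already present among the S3 keys.

    Pure sketch of the idempotent-upload check; names are illustrative.
    """
    existing = set(existing_keys)
    return [f for f in local_files if f not in existing]

print(files_to_upload(
    ["vJunos-router-25.4R1.12.qcow2", "vJunos-router-24.2R1.qcow2"],
    ["vJunos-router-24.2R1.qcow2"],
))
# → ['vJunos-router-25.4R1.12.qcow2']
```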

### 3. First Launch: Build the Docker Image

The first EC2 launch finds the qcow2 in S3 but no pre-built Docker image.
The setup script automatically builds the vrnetlab container and uploads the
result to S3 for future launches. The first launch takes ~10 minutes total.

```bash
AWS_PROFILE=<profile> ./ec2-launch.sh
# Wait for setup to complete (~5-10 min)
ssh -i <key> ubuntu@<ip> 'cat /var/log/ec2-setup-complete'
```

### Subsequent Launches

After the Docker image is in S3, new instances load it directly (~2-3 min):

```bash
AWS_PROFILE=<profile> ./ec2-launch.sh
```

## Creating a Lab

### Step 1: Design the Topology

Create a containerlab topology YAML file and Junos configs. Configs must be
in **curly-brace format** (not `set` format) because vrnetlab concatenates
them with its init.conf and loads them as a config disk.
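
For illustration, a minimal curly-brace fragment might look like this (the interface, unit, and address are made-up examples, not taken from the labs in this repo):

```
interfaces {
    ge-0/0/0 {
        unit 0 {
            family inet {
                address 10.0.0.1/30;
            }
        }
    }
}
```

The equivalent `set interfaces ge-0/0/0 unit 0 family inet address 10.0.0.1/30` form would not survive the init.conf concatenation, which is why the curly-brace form is required here.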

See `infra/examples/` for working examples:

- `two-router-ebgp.clab.yml`: minimal 2-router eBGP lab
- `evpn-type5/topology.clab.yml`: 4-node EVPN Type 5 fabric

**Interface mapping**: containerlab `ethN` maps to Junos `ge-0/0/(N-1)`:

| containerlab | Junos             |
| ------------ | ----------------- |
| eth0         | management (auto) |
| eth1         | ge-0/0/0          |
| eth2         | ge-0/0/1          |
| eth3         | ge-0/0/2          |
| ethN         | ge-0/0/(N-1)      |
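
The table above can be expressed as a small helper (an illustrative sketch, not part of lab_builder):

```python
def clab_to_junos(eth: str) -> str:
    """Map a containerlab interface name (ethN) to its Junos name.

    eth0 is the management interface; eth1..ethN map to ge-0/0/(N-1).
    """
    n = int(eth.removeprefix("eth"))
    if n == 0:
        return "management"
    return f"ge-0/0/{n - 1}"

print(clab_to_junos("eth1"))  # → ge-0/0/0
print(clab_to_junos("eth3"))  # → ge-0/0/2
```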

### Step 2: Launch EC2 and Upload

```bash
# Launch instance
AWS_PROFILE=<profile> ./ec2-launch.sh

# Upload topology and configs
IP=<from launch output>
KEY=<from launch output>
ssh -i $KEY ubuntu@$IP 'mkdir -p ~/lab/mylab/configs ~/lab/src'
scp -i $KEY -r src/lab_builder ubuntu@$IP:~/lab/src/
scp -i $KEY topology.clab.yml ubuntu@$IP:~/lab/mylab/
scp -i $KEY configs/*.cfg ubuntu@$IP:~/lab/mylab/configs/
```

### Step 3: Deploy Topology

```bash
ssh -i $KEY ubuntu@$IP \
  'cd ~/lab/mylab && sudo containerlab deploy -t topology.clab.yml'
```

vJunos-router takes 5-10 minutes to boot. Monitor with:

```bash
ssh -i $KEY ubuntu@$IP 'sudo containerlab inspect -t ~/lab/mylab/topology.clab.yml'
```

Wait until all nodes show `(healthy)`.

### Step 4: Health Check

Verify SSH access and routing protocol convergence:

```bash
ssh -i $KEY ubuntu@$IP \
  'cd ~/lab && PYTHONPATH=src python3 -m lab_builder health-check mylab/topology.clab.yml'
```

This waits for SSH on all nodes, then polls BGP/OSPF/ISIS neighbor status
until sessions are established.
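
The convergence wait boils down to a poll-until-true loop. A simplified sketch (the real health-check logic lives in lab_builder and may differ; `bgp_converged` and its input shape are invented for illustration):

```python
import time


def poll_until(check, timeout_s: float = 600, interval_s: float = 15) -> bool:
    """Call check() until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval_s)
    return False


def bgp_converged(neighbor_states: dict) -> bool:
    """Treat convergence as every BGP session reporting Established."""
    return all(state == "Established" for state in neighbor_states.values())


print(poll_until(lambda: bgp_converged({"10.0.0.2": "Established"}),
                 timeout_s=1, interval_s=0.1))
# → True
```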

### Step 5: Collect Show Commands

```bash
ssh -i $KEY ubuntu@$IP \
  'cd ~/lab && PYTHONPATH=src python3 -m lab_builder collect mylab/topology.clab.yml --output-dir /tmp/collected'
```

Collects 9 show commands per Junos node (see "Show Commands Collected" below).
Files are named to match the lab-validation parser conventions.
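
The naming convention, as inferred from the snapshot layout later in this README, replaces spaces in the command with underscores and appends `.txt` (the authoritative logic lives in lab_builder):

```python
def show_command_filename(command: str) -> str:
    """Derive the collected-output filename from a show command.

    Sketch inferred from the snapshot layout, e.g.
    'show route | display json' -> 'show_route_|_display_json.txt'.
    """
    return command.replace(" ", "_") + ".txt"

print(show_command_filename("show route | display json"))
# → show_route_|_display_json.txt
```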

### Step 6: Build Snapshot

```bash
ssh -i $KEY ubuntu@$IP \
  'cd ~/lab && PYTHONPATH=src python3 -m lab_builder build-snapshot mylab/topology.clab.yml --name junos_my_feature --collected-dir /tmp/collected --snapshots-dir /tmp/snapshots'
```

### Step 7: Download Snapshot

```bash
scp -i $KEY -r ubuntu@$IP:/tmp/snapshots/junos_my_feature snapshots/
```

### Step 8: Tear Down

```bash
AWS_PROFILE=<profile> ./ec2-teardown.sh
```

### Step 9: Validate Against Batfish

Locally (requires Batfish to be running):

```bash
pytest lab_tests/test_labs.py --labname=junos_my_feature -v --tb=short
```

### Step 10: Triage Failures

For each test failure, determine the cause:

- **Parser bug**: the lab-validation parsers can't handle the device output.
  Fix the parser and add unit tests.
- **Batfish modeling discrepancy**: Batfish predicts different routes or
  interfaces than the real device. File a GitHub issue in batfish/batfish or
  batfish/lab-validation and add a sickbay entry.
- **Config error in the lab**: fix the config, re-deploy, re-collect.
- **Expected difference**: management interfaces, pseudo-interfaces, and other
  constructs that Batfish intentionally doesn't model. Update the validator's
  exclusion logic.

## Iterating on a Lab

To modify configs without a full redeploy (saves the 5-10 min boot time):

```bash
# Push new config to a node
ssh -i $KEY ubuntu@$IP \
  'cd ~/lab && PYTHONPATH=src python3 -m lab_builder push-config mylab/topology.clab.yml r1 /path/to/new-config.txt'

# Re-collect just that node
ssh -i $KEY ubuntu@$IP \
  'cd ~/lab && PYTHONPATH=src python3 -m lab_builder recollect mylab/topology.clab.yml r1 --output-dir /tmp/collected'
```

Then re-download and re-validate locally.

## Scripts Reference

| Script            | Where it runs | Purpose                                                   |
| ----------------- | ------------- | --------------------------------------------------------- |
| `ec2-launch.sh`   | Local         | Launch EC2 with KVM, Docker, containerlab, images from S3 |
| `ec2-status.sh`   | Local         | Show all lab-validation instances, warn about orphans     |
| `ec2-teardown.sh` | Local         | Terminate instance and clean up                           |
| `upload-image.sh` | Local         | Upload qcow2 images to S3 (idempotent)                    |
| `build-image.sh`  | EC2           | Build vrnetlab Docker image from qcow2, upload to S3      |
| `ec2-setup.sh`    | EC2 (auto)    | Bootstrap script, runs as user-data                       |

## lab_builder CLI Reference

Run on EC2 as `PYTHONPATH=src python3 -m lab_builder <command>`:

| Command                                                                  | Purpose                              |
| ------------------------------------------------------------------------ | ------------------------------------ |
| `deploy <topo.yml>`                                                      | Deploy containerlab topology         |
| `inspect <topo.yml>`                                                     | Show discovered nodes and IPs        |
| `health-check <topo.yml> [--timeout N]`                                  | Wait for SSH + routing convergence   |
| `collect <topo.yml> --output-dir DIR`                                    | Collect show commands from all nodes |
| `recollect <topo.yml> NODE --output-dir DIR`                             | Re-collect one node                  |
| `push-config <topo.yml> NODE FILE`                                       | Push set-format config and commit    |
| `build-snapshot <topo.yml> --name N --collected-dir D --snapshots-dir S` | Package as snapshot                  |
| `destroy <topo.yml>`                                                     | Tear down topology                   |

## Show Commands Collected

For Juniper (vJunos-router), these are collected automatically:

| Command                                   | Goes to           | Purpose              |
| ----------------------------------------- | ----------------- | -------------------- |
| `show configuration \| display set`       | `configs/<node>/` | Device config        |
| `show route \| display json`              | `show/<node>/`    | Main routing table   |
| `show route protocol bgp \| display json` | `show/<node>/`    | BGP routes           |
| `show interfaces \| display json`         | `show/<node>/`    | Interface properties |
| `show route instance \| display json`     | `show/<node>/`    | VRF info             |
| `show version \| display json`            | `show/<node>/`    | Software version     |
| `show bgp neighbor \| display json`       | `show/<node>/`    | BGP peer status      |
| `show ospf neighbor \| display json`      | `show/<node>/`    | OSPF status          |
| `show isis adjacency \| display json`     | `show/<node>/`    | ISIS status          |

## Snapshot Directory Structure

The output matches the lab-validation framework's expected layout:

```
snapshots/<name>/
├── configs/
│   ├── <node1>/
│   │   └── show_configuration_|_display_set.txt
│   └── <node2>/
│       └── show_configuration_|_display_set.txt
├── show/
│   ├── host_nos.txt          # {"node1": "junos", "node2": "junos"}
│   ├── <node1>/
│   │   ├── show_route_|_display_json.txt
│   │   ├── show_route_protocol_bgp_|_display_json.txt
│   │   ├── show_interfaces_|_display_json.txt
│   │   └── ...
│   └── <node2>/
│       └── ...
└── validation/               # optional
    └── sickbay.yaml          # expected failure entries
```
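
For example, the `host_nos.txt` file shown above can be generated like this (a sketch based on the layout, assuming an all-Junos lab; `build-snapshot` does the equivalent internally):

```python
import json
from pathlib import Path


def write_host_nos(show_dir: str, nodes: list[str]) -> None:
    """Write show/host_nos.txt mapping each node name to its NOS."""
    path = Path(show_dir)
    path.mkdir(parents=True, exist_ok=True)
    mapping = {node: "junos" for node in nodes}
    (path / "host_nos.txt").write_text(json.dumps(mapping))


write_host_nos("/tmp/snap/show", ["node1", "node2"])
print(Path("/tmp/snap/show/host_nos.txt").read_text())
# → {"node1": "junos", "node2": "junos"}
```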

## EC2 Instance Details

### Instance Types

The default is **m8i.2xlarge** (8 vCPU, 32 GB RAM). The M8i family supports
nested virtualization via `--cpu-options NestedVirtualization=enabled`, which
vrnetlab needs to run VM-based router images inside Docker containers.

| Instance    | vCPU | RAM   | ~$/hr  | Routers |
| ----------- | ---- | ----- | ------ | ------- |
| m8i.xlarge  | 4    | 16 GB | ~$0.23 | 1-2     |
| m8i.2xlarge | 8    | 32 GB | ~$0.46 | 2-4     |
| m8i.4xlarge | 16   | 64 GB | ~$0.92 | 4-8     |

Each vJunos-router needs ~5 GB RAM and 4 vCPUs.

For spot pricing (~70% cheaper), add `--spot`.

### ec2-launch.sh Options

```
--instance-type TYPE   EC2 instance type (default: m8i.2xlarge)
--key-name NAME        Use existing EC2 key pair (auto-created if omitted)
--timeout-hours N      Auto-terminate after N hours (default: 4)
--spot                 Request spot instance
```

### Cost Safety

- Auto-terminate alarm after 4 hours (configurable)
- `ec2-status.sh` warns about orphaned instances
- The launch script prevents creating multiple tracked instances

### What Gets Installed (ec2-setup.sh)

- Docker CE
- containerlab (from the netdevops apt repo)
- KVM/QEMU tools (qemu-kvm, libvirt)
- Python 3 with netmiko, paramiko, PyYAML, awscli
- Pre-built Docker images from S3 (or builds from qcow2 as a fallback)

## Lab Design Principles

- **Simplicity**: the minimum number of routers needed to demonstrate the
  feature (2-3 is typical)
- **Feature isolation**: one feature per lab
- **Corner cases**: misconfigurations, asymmetric settings, boundary values
- **Reproducibility**: deterministic results; avoid time-dependent behavior
- **Documentation**: a README explaining what the lab tests and why

## Supported Vendor Profiles

| containerlab kind       | Vendor              | Default creds     | Boot time | KVM required |
| ----------------------- | ------------------- | ----------------- | --------- | ------------ |
| `juniper_vjunosrouter`  | Junos (MX)          | admin / admin@123 | 5-10 min  | Yes          |
| `juniper_vjunosevolved` | Junos Evolved (PTX) | admin / admin@123 | ~15 min   | Yes          |
| `juniper_crpd`          | Junos cRPD          | root / clab123    | ~1 min    | No           |

## Troubleshooting

**SSH connection refused after deploy**: vJunos-router takes 5-10 minutes to
boot. Wait for `(healthy)` in `containerlab inspect` output before attempting
SSH.

**Startup config not applied**: configs must be in curly-brace format, not
set format. vrnetlab concatenates the config with its init.conf and mounts
it as a USB config disk.

**KVM not available**: verify that the instance type supports nested
virtualization (M8i/C8i/R8i families) and that
`--cpu-options NestedVirtualization=enabled` was used at launch. Check with
`ls /dev/kvm` on the instance.

**Docker image not loaded**: check `docker images | grep vjunos`. If empty,
the S3 bucket may not have the Docker tarball. The setup script falls back to
building from the qcow2 if available.

**Node names wrong in collected data**: containerlab names containers
`clab-<topology>-<node>`. lab_builder extracts node names by stripping
this prefix. If node names contain hyphens, verify with
`python3 -m lab_builder inspect topology.clab.yml`.
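
The prefix stripping can be sketched as follows (illustrative only; lab_builder's actual implementation may differ). Removing the exact `clab-<topology>-` prefix in one step, rather than splitting on hyphens, is what keeps hyphenated node names intact:

```python
def container_to_node(container: str, topology: str) -> str:
    """Strip the clab-<topology>- prefix from a containerlab container name.

    A single removeprefix preserves hyphens inside the node name itself.
    """
    return container.removeprefix(f"clab-{topology}-")

print(container_to_node("clab-mylab-r1-core", "mylab"))
# → r1-core
```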