
Commit 9e57d91

Add iOS/iPadOS app monitoring via OpenTelemetry Swift SDK (SWIP-11) (#13828)
* Add iOS/iPadOS app monitoring via OpenTelemetry Swift SDK (SWIP-11). Includes the `IOS` layer, `IOSHTTPSpanListener` for outbound HTTP client metrics (supports OTel Swift `.old`/`.stable`/`.httpDup` semantic-convention modes via stable-then-legacy attribute fallback), `IOSMetricKitSpanListener` for daily MetricKit metrics (exit counts split by foreground/background, app-launch / hang-time percentile histograms with finite 30 s overflow ceiling), LAL rules for crash/hang diagnostics, Mobile menu, and iOS dashboards.
* Fix LAL `layer: auto` mode dropping logs after extractor set the layer. Codegen now propagates `layer "..."` assignments to `LogMetadata.layer` so `FilterSpec.doSink()` sees the script-decided layer.
* Fix MetricKit histogram percentile metrics being reported at 1000× their true value — the listener now marks its `SampleFamily` with `defaultHistogramBucketUnit(MILLISECONDS)` so MAL's default SECONDS→MS rescale of `le` labels is not applied.
1 parent 272ba7d commit 9e57d91

44 files changed

Lines changed: 2663 additions & 168 deletions


.claude/skills/package/SKILL.md

Lines changed: 113 additions & 0 deletions
@@ -0,0 +1,113 @@
+---
+name: package
+description: Rebuild the SkyWalking distribution and OAP Docker image after source changes. Use before running e2e tests so the image reflects your code changes. Avoids the "image looks updated but runtime has stale jars" trap.
+argument-hint: "[oap|all|dist-only]"
+---
+
+# Package OAP Distribution & Docker Image
+
+Rebuilding the Docker image after a source-code change has a **two-step dependency**: rebuild the dist tarball, then rebuild the image. Skipping the first step silently produces an image with stale jars — the Docker build still "succeeds," the image has a fresh timestamp, but the embedded jar is the one from your previous build.
+
+## Why this matters
+
+`make docker.oap` in the root `Makefile` is defined as:
+
+```
+docker.% push.docker.%: $(CONTEXT)/$(DIST) $(SW_ROOT)/docker/%/*
+	$(DOCKER_RULE)
+```
+
+It depends on `$(CONTEXT)/$(DIST)` (the dist tarball) **as a file prerequisite**. If that tarball already exists, make does not regenerate it — it just copies whatever is on disk into `dist/docker_build/oap/` and runs `docker buildx build`. There is **no** dependency that rebuilds the tarball from source.
+
+Only `make docker` triggers the full chain (`init → build.all → docker.all`). So:
+
+| Command | Rebuilds source? | Rebuilds tarball? | Rebuilds image? |
+|---------|:--:|:--:|:--:|
+| `./mvnw -pl <module> package` | only that module | no | no |
+| `./mvnw -pl apm-dist -am package -Pbackend,dist` | all backend deps + dist | yes | no |
+| `make docker.oap` | **no** | **no** (uses whatever's on disk) | yes |
+| `make docker` | yes (`build.all`) | yes | yes (`docker.all`) |
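
The file-prerequisite behaviour described above can be checked before building the image. What follows is a minimal sketch and not part of the SkyWalking Makefile: `dist_is_stale` is a hypothetical helper, and the tarball path and `oap-server` search root in the usage comment are assumptions taken from the paths this skill mentions.

```shell
# Hypothetical helper (not in the repo): exit 0 when the dist tarball is
# missing or older than at least one jar under the given directory.
dist_is_stale() {
  tarball=$1
  jar_root=$2
  # No tarball at all: it would have to be built anyway, so treat it as stale.
  [ -f "$tarball" ] || return 0
  # find -newer lists files modified after the reference file.
  newer=$(find "$jar_root" -type f -name '*.jar' -newer "$tarball" | head -n 1)
  [ -n "$newer" ]
}

# Assumed usage from the repository root:
#   if dist_is_stale dist/apache-skywalking-apm-bin.tar.gz oap-server; then
#     echo "module jars are newer than the dist tarball; rebuild the dist first"
#   fi
```

The check only compares modification times, so it is a hint that `make docker.oap` alone would package old jars, not proof of what ends up in the image.
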
+
+## `flatten:flatten` — always run it before `package`/`install`
+
+SkyWalking's poms use `${revision}` as a placeholder version (e.g., `10.5.0-SNAPSHOT`). The `flatten-maven-plugin` resolves `${revision}` into concrete versions and writes a `.flattened-pom.xml`. Without it:
+
+- Installed artifacts carry the literal string `${revision}` in their coordinates and cannot be resolved as dependencies.
+- Downstream modules (including `apm-dist`) see an inconsistent dependency graph.
+- Symptoms are subtle: `-pl <module> -am` may succeed in isolation but fail or pull in stale transitive artifacts when the same module is consumed by another build invocation in the same session.
+
+**Always run `flatten:flatten` in the same goal chain as `package` or `install`.** The CI `dist-tar` job and the `compile` skill both do this. Example:
+
+```bash
+./mvnw clean flatten:flatten package -Pbackend,dist -DskipTests
+```
+
+Not optional — treat it as part of `package`.
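
One way to spot artifacts that were installed without flattening is to search for poms that still contain the literal placeholder. This is a rough sketch under stated assumptions: `find_unflattened_poms` is a hypothetical helper, and the directory in the usage comment assumes the default local repository location.

```shell
# Hypothetical helper: print every .pom under a directory that still
# contains the literal placeholder ${revision}.
find_unflattened_poms() {
  repo_dir=$1
  # -F matches a fixed string, so "$" and "{" are not treated as regex syntax.
  # The single quotes stop the shell from expanding ${revision}.
  # grep exits non-zero when nothing matches; that is not an error here.
  find "$repo_dir" -type f -name '*.pom' -exec grep -lF '${revision}' {} + || true
}

# Assumed usage against the default local Maven repository:
#   find_unflattened_poms ~/.m2/repository/org/apache/skywalking
```

Any path it prints is a candidate for re-running the build with `flatten:flatten` in the goal chain.
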
+
+## Pick the right command
+
+### Changed OAP source → want a new image
+
+```bash
+# Recommended: full chain
+make docker
+```
+
+or, faster and equivalent for backend-only changes (skips UI):
+
+```bash
+./mvnw -pl apm-dist -am -o clean flatten:flatten package -Pbackend,dist -DskipTests -Dcheckstyle.skip=true -Dmaven.javadoc.skip=true
+make docker.oap
+```
+
+**Do not** just run `make docker.oap` after a code edit — the image will look rebuilt but your change will not be in the jars.
+**Do not** skip `flatten:flatten` — see the section above.
+
+### Changed only the dist packaging (e.g., `apm-dist/`, log4j2.xml)
+
+```bash
+./mvnw -pl apm-dist -am -o clean flatten:flatten package -Pbackend,dist -DskipTests
+make docker.oap
+```
+
+### Changed only Dockerfile / entrypoint (`docker/oap/*`)
+
+The tarball is untouched, so `make docker.oap` alone is correct:
+
+```bash
+make docker.oap
+```
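
The choice between these cases can be derived from the list of changed paths. The sketch below is illustrative only: `rebuild_scope` is a hypothetical helper, and it conservatively treats every path outside `docker/oap/` as needing the dist rebuild, which is coarser than the three cases above.

```shell
# Hypothetical helper: read changed paths on stdin (one per line) and print
# which rebuild is needed.
#   image-only  every path is under docker/oap/
#   dist+image  at least one path is outside docker/oap/
#   nothing     no paths were given
rebuild_scope() {
  scope=nothing
  while IFS= read -r path; do
    # Skip empty lines.
    [ -n "$path" ] || continue
    case $path in
      docker/oap/*)
        # Only upgrade from "nothing"; never downgrade "dist+image".
        if [ "$scope" = nothing ]; then scope=image-only; fi
        ;;
      *)
        scope=dist+image
        ;;
    esac
  done
  echo "$scope"
}

# Assumed usage:
#   git diff --name-only | rebuild_scope
```

A result of `dist+image` corresponds to the Maven command followed by `make docker.oap`, and `image-only` to `make docker.oap` alone.
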
+
+## Verify the fix reached the image
+
+Docker's "image created" timestamp updates even when the content is stale (because `--no-cache` is used). **Don't trust the timestamp.** Verify the jar contents directly:
+
+```bash
+# Copy the jar out of the container
+docker cp <container-name>:/skywalking/oap-libs/<module>-<version>.jar /tmp/verify.jar
+
+# Extract the specific class
+cd /tmp && jar -xf verify.jar <path/to/YourClass.class>
+
+# Grep for a string your fix introduced (unique literal, method name, etc.)
+grep -oa "myFixSignature" <path/to/YourClass.class>
+```
+
+If grep finds nothing, the image does not contain your fix — you forgot to rebuild the dist. Re-run with the correct chain.
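
When the fix has no convenient string to grep for, a byte-for-byte comparison is another option. This sketch assumes the dist packaging copies the module jar into `oap-libs/` unmodified, which is not verified here; `jar_matches_build` is a hypothetical helper, and both arguments are paths you supply.

```shell
# Hypothetical helper: report whether the jar copied out of the container is
# byte-identical to the jar just built locally.
jar_matches_build() {
  image_jar=$1   # e.g. /tmp/verify.jar from the docker cp step above
  built_jar=$2   # the freshly built module jar under target/
  # cmp -s is silent and exits 0 only when the files are identical.
  if cmp -s "$image_jar" "$built_jar"; then
    echo "match"
  else
    echo "differs"
    return 1
  fi
}
```

A `differs` result points to the same cause as an empty grep: the image was built from a dist that predates the module rebuild.
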
+
+## Common pitfalls
+
+- **`make docker.oap` after `./mvnw package`**: The module jar is fresh in `oap-server/.../target/`, but the dist tarball at `dist/apache-skywalking-apm-bin.tar.gz` is untouched. Image uses the old jar.
+- **Cancelling a `make docker` mid-flight**: leaves the dist half-rebuilt. Re-run `make docker` or delete `dist/` first.
+- **Stale buildx cache**: not usually the issue (the Makefile passes `--no-cache`), but if you suspect it, run `docker buildx prune`.
+- **Trusting image timestamps**: `docker images` shows when the image was *tagged*, not when its content actually changed. A no-op rebuild still updates the timestamp.
+- **Interactive buildx hangs**: if `docker buildx build` sits silent for minutes with no stdout, kill it and retry. The buildx container driver occasionally stalls; a restart is faster than waiting.
+
+## The golden rule
+
+After any Java source edit that affects code shipped in `oap-libs/`, run one of:
+
+1. `make docker` (full, safest)
+2. `./mvnw -pl apm-dist -am -o flatten:flatten package -Pbackend,dist -DskipTests -Dcheckstyle.skip=true && make docker.oap` (fast)
+
+Then **verify the fix is in the image's jar** before running e2e. Don't assume.

.claude/skills/run-e2e/SKILL.md

Lines changed: 63 additions & 0 deletions
@@ -134,6 +134,69 @@ Do NOT run cleanup immediately. Instead:
 e2e cleanup -c test/e2e-v2/cases/<case-path>/e2e.yaml
 ```

+### 6. Manually fire each verify query (fast triage)
+
+The `e2e verify` retry loop runs in sequence and stops at the first failing case, so a single bad query hides every case after it. When a verify fails, **run each verify case directly against the still-running OAP** before editing anything — you'll see the real error (bad flag, missing data, wrong expected), not the progress spinner. This is also the right way to author new verify cases: craft the query against live OAP, confirm the actual YAML, then write the expected file.
+
+```bash
+# Find the host-side port that infra-e2e bound to OAP's container port 12800.
+# (Each run picks a new random port; the trigger log prints it too.)
+docker ps --filter "name=skywalking_e2e-oap" --format "{{.Ports}}" \
+  | grep -oE "[0-9]+->12800" | head -1
+# => e.g. 56381->12800
+
+URL=http://localhost:56381/graphql
+SWCTL=/tmp/skywalking-infra-e2e/bin/swctl
+
+# Copy the query from e2e.yaml verbatim, then substitute ${oap_host} → localhost
+# and ${oap_12800} → the port you just found:
+$SWCTL --display yaml --base-url=$URL service ly IOS
+$SWCTL --display yaml --base-url=$URL logs list --service-name=MyiOSApp
+$SWCTL --display yaml --base-url=$URL metrics exec --expression=service_cpm --service-name=MyiOSApp
+```
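
The port lookup can be wrapped so the `URL` no longer needs a hand-copied port. This is a small sketch, not part of infra-e2e: `host_port_for` is a hypothetical helper, and it assumes the `Ports` column uses the usual `host:port->container/proto` form.

```shell
# Hypothetical helper: given the Ports column from `docker ps` and a
# container port, print the first host port mapped to it (empty if none).
host_port_for() {
  ports=$1
  container_port=$2
  # The trailing "/" keeps 12800 from also matching a port such as 128000.
  printf '%s\n' "$ports" \
    | grep -oE "[0-9]+->${container_port}/" \
    | head -n 1 \
    | cut -d'-' -f1
}

# Assumed usage with the container name filter from the block above:
#   PORTS=$(docker ps --filter "name=skywalking_e2e-oap" --format "{{.Ports}}")
#   URL=http://localhost:$(host_port_for "$PORTS" 12800)/graphql
```

Passing a different container port, such as `3100`, gives the host port for the non-`swctl` endpoints in the same way.
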
+
+When a `swctl` subcommand rejects a flag (`Incorrect Usage: flag provided but not defined: -layer`), the e2e config is using syntax the pinned `swctl` commit doesn't support. Find the right syntax with `swctl <cmd> --help` and update the e2e config. Common cases encountered:
+
+| Broken flag/form | Working form |
+|---|---|
+| `service ls --layer IOS` | `service ly IOS` |
+| `metrics exec ... --is-normal=true` | drop `--is-normal` (default behavior) |
+
+For queries that *don't* use `swctl` (raw `curl` against `/loki/...`, Zipkin, PromQL), hit the matching exposed port:
+
+```bash
+curl "http://localhost:$(docker ps --filter name=skywalking_e2e-oap --format '{{.Ports}}' | grep -oE '[0-9]+->3100' | head -1 | cut -d'-' -f1)/loki/api/v1/labels"
+```
+
+### 7. UI template changes require a fresh DB
+
+`UITemplateInitializer.initTemplate()` (in `oap-server/server-core`) calls `uiTemplateManagementService.addIfNotExist(setting)` — keyed by the `id` field in each `ui-initialized-templates/**/*.json`. Same ID → skipped. So edits to an *existing* template JSON (adding widgets, relabeling, changing expressions) will **not** be applied on an already-initialized OAP, even after a container restart, because the old copy still lives in storage.
+
+To pick up dashboard JSON changes:
+
+```bash
+# Remove both containers — BanyanDB stores state inside the container FS in the
+# e2e compose (no named volume), so removing the container wipes state cleanly.
+docker rm -f skywalking_e2e-oap-1 skywalking_e2e-banyandb-1
+
+# For compose setups that use a named volume, also:
+# docker volume rm <volume-name>
+
+# Then re-run — OAP sees empty storage, loads the new template JSON.
+e2e run -c test/e2e-v2/cases/<case>/e2e.yaml
+```
+
+Symptom to watch for: you edit the JSON, rebuild, redeploy — dashboard in the UI still shows the pre-edit layout. That's not a caching bug; that's `addIfNotExist` doing exactly what its name says.
+
+### 8. Author the expected YAML from live output
+
+For a new verify case, the workflow is:
+
+1. Fire the query manually (see step 6) and capture the YAML.
+2. Pick which fields are *meaningful domain values* (must match exactly) vs *dynamic runtime values* (`notEmpty` / `gt` / `ge`). See `test/e2e-v2/CLAUDE.md` for the decision guide.
+3. Write the expected file. If the response is a list, wrap the items in `{{- contains . }} ... {{- end }}` so ordering and extra actual items don't fail the match.
+4. Re-run `e2e verify` alone (the containers are still up from the previous run); iterate on the expected file without rebuilding.
+
 ## Common test cases

 | Shorthand | Path |

.github/workflows/skywalking.yaml

Lines changed: 2 additions & 0 deletions
@@ -640,6 +640,8 @@ jobs:
   config: test/e2e-v2/cases/otlp-virtual-genai/e2e.yaml
 - name: Zipkin Virtual GenAI
   config: test/e2e-v2/cases/zipkin-virtual-genai/e2e.yaml
+- name: iOS Monitoring
+  config: test/e2e-v2/cases/ios/e2e.yaml

 - name: Nginx
   config: test/e2e-v2/cases/nginx/e2e.yaml

CLAUDE.md

Lines changed: 1 addition & 0 deletions
@@ -246,6 +246,7 @@ Actions owned by `actions/*` (GitHub), `github/*`, and `apache/*` are always all
 1. **Always check submodules**: Protocol changes may require submodule updates
 2. **Generate sources first**: Run `mvnw compile` before analyzing generated code
 3. **Install package**: Use `mvnw flatten:flatten install` to build the precompiler and export generated classes before running tests. ref to [compile skill doc](.claude/skills/compile/SKILL.md)
+3. **Full rebuild on cross-module changes**: If you changed more than two modules or pulled/rebased code from git remote, run `mvnw clean install` (or `mvnw clean package`) on the **whole project** rather than picking individual modules with `-pl`. Incremental `-pl ... -am` builds can leave stale jars in `.m2` or `oap-libs/` when jar sizes don't change but content does, causing hard-to-debug runtime issues.
 3. **Respect checkstyle**: No System.out, no @author, no Chinese characters
 4. **Follow module patterns**: Use existing modules as templates
 5. **Check multiple storage implementations**: Logic may vary by storage type

dist-material/release-docs/LICENSE

Lines changed: 2 additions & 2 deletions
@@ -531,8 +531,8 @@ The text of each license is also included in licenses/LICENSE-[project].txt.
 https://npmjs.com/package/estree-walker/v/2.0.2 2.0.2 MIT
 https://npmjs.com/package/iconv-lite/v/0.6.3 0.6.3 MIT
 https://npmjs.com/package/is-plain-object/v/5.0.0 5.0.0 MIT
-https://npmjs.com/package/lodash/v/4.17.23 4.17.23 MIT
-https://npmjs.com/package/lodash-es/v/4.17.23 4.17.23 MIT
+https://npmjs.com/package/lodash/v/4.18.1 4.18.1 MIT
+https://npmjs.com/package/lodash-es/v/4.18.1 4.18.1 MIT
 https://npmjs.com/package/lodash-unified/v/1.0.3 1.0.3 MIT
 https://npmjs.com/package/magic-string/v/0.30.21 0.30.21 MIT
 https://npmjs.com/package/memoize-one/v/6.0.0 6.0.0 MIT

docs/en/changes/changes.md

Lines changed: 6 additions & 0 deletions
@@ -31,11 +31,17 @@
 * Add OTLP/HTTP receiver support for traces, logs, and metrics (`/v1/traces`, `/v1/logs`, `/v1/metrics`). Supports both `application/x-protobuf` and `application/json` content types.
 * Fix: TTL query add metadata TTL.
 * Fix: PersistentWorker used wrong TTL for metrics cache if the storage is BanyanDB.
+* Add iOS/iPadOS app monitoring via OpenTelemetry Swift SDK (SWIP-11). Includes the `IOS` layer, `IOSHTTPSpanListener` for outbound HTTP client metrics (supports OTel Swift `.old`/`.stable`/`.httpDup` semantic-convention modes via stable-then-legacy attribute fallback), `IOSMetricKitSpanListener` for daily MetricKit metrics (exit counts split by foreground/background, app-launch / hang-time percentile histograms with finite 30 s overflow ceiling), LAL rules for crash/hang diagnostics, Mobile menu, and iOS dashboards.
+* Fix LAL `layer: auto` mode dropping logs after extractor set the layer. Codegen now propagates `layer "..."` assignments to `LogMetadata.layer` so `FilterSpec.doSink()` sees the script-decided layer.
+* Fix MetricKit histogram percentile metrics being reported at 1000× their true value — the listener now marks its `SampleFamily` with `defaultHistogramBucketUnit(MILLISECONDS)` so MAL's default SECONDS→MS rescale of `le` labels is not applied.

 #### UI
+* Add mobile menu icon and i18n labels for the iOS layer.
+* Fix metric label rendering in multi-expression dashboard widgets.

 #### Documentation
 * Update LAL documentation with `sourceAttribute()` function and `layer: auto` mode.
+* Add iOS app monitoring setup documentation.

 All issues and pull requests are [here](https://github.com/apache/skywalking/issues?q=milestone:10.5.0)

docs/en/security/README.md

Lines changed: 12 additions & 4 deletions
@@ -24,7 +24,15 @@ Remote Code Execution (RCE) issues.
 For some sensitive environment, consider to limit the telemetry report frequency in case of DoS/DDoS for exposed OAP
 and UI services.

-## appendix
-
-The SkyWalking [client-js](https://github.com/apache/skywalking-client-js) agent is always running out of the secured
-environment. Please follow its **security notice** for more details.
+## Client-Side Monitoring
+
+Client-side applications — iOS/iPadOS apps (via OpenTelemetry Swift SDK), browser web apps
+(via [client-js](https://github.com/apache/skywalking-client-js)), and WeChat/Alipay
+mini-programs (via [mini-program-monitor](https://github.com/SkyAPM/mini-program-monitor)) —
+send telemetry data **from the public internet** to OAP endpoints including OTLP/HTTP
+(`/v1/traces`, `/v1/logs`, `/v1/metrics`), SkyWalking native (`/v3/segments`), and browser
+reporting endpoints.
+
+These endpoints accept data from any client without authentication by default. Apply the
+security policies listed above, especially rate limiting, to prevent abuse from untrusted
+client-side sources.
