
Commit 42acfb9

Merge origin/main into codex/docs-sync

2 parents ac36b34 + 922b7db

14 files changed

Lines changed: 1529 additions & 90 deletions

README.md

Lines changed: 104 additions & 36 deletions

# Janus

Janus is a Rust engine for unified historical and live RDF stream processing.

It combines:

- historical window evaluation over segmented RDF storage
- live window evaluation over incoming streams
- a single Janus-QL query model for hybrid queries
- an HTTP/WebSocket API for query lifecycle management and result delivery

The name comes from the Roman deity Janus, associated with transitions and with looking both backward and forward. That dual perspective matches Janus's goal: querying past and live RDF data together.

## What Janus Supports

- Historical windows with `START` / `END`
- Sliding live windows with `RANGE` / `STEP`
- Hybrid queries that mix historical and live windows
- Extension functions for anomaly-style predicates such as thresholds, relative change, z-score, outlier checks, and trend divergence
- Optional baseline bootstrapping for hybrid anomaly queries with `USING BASELINE <window> LAST|AGGREGATE`
- HTTP endpoints for registering, starting, stopping, listing, and deleting queries
- WebSocket result streaming for running queries

## Query Model

Janus uses Janus-QL, a hybrid query language for querying historical and live RDF data in one query.

Example:

```sparql
PREFIX ex: <http://example.org/>
PREFIX janus: <https://janus.rs/fn#>
PREFIX baseline: <https://janus.rs/baseline#>

REGISTER RStream ex:out AS
SELECT ?sensor ?reading
FROM NAMED WINDOW ex:hist ON LOG ex:store [START 1700000000000 END 1700003600000]
FROM NAMED WINDOW ex:live ON STREAM ex:stream1 [RANGE 5000 STEP 1000]
USING BASELINE ex:hist AGGREGATE
WHERE {
  WINDOW ex:hist {
    ?sensor ex:mean ?mean .
    ?sensor ex:sigma ?sigma .
  }
  WINDOW ex:live {
    ?sensor ex:hasReading ?reading .
  }
  ?sensor baseline:mean ?mean .
  ?sensor baseline:sigma ?sigma .
  FILTER(janus:is_outlier(?reading, ?mean, ?sigma, 3))
}
```

`USING BASELINE` is optional. If present, Janus bootstraps baseline values from the named historical window before or during live execution:

- `LAST`: use the final historical window snapshot as the baseline
- `AGGREGATE`: merge the historical window outputs into one compact baseline

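
The `janus:is_outlier` check in the example above is a k-sigma test of a live reading against the bootstrapped baseline. A minimal Rust sketch of that predicate's arithmetic (illustrative only; this is not the engine's actual extension-function API):

```rust
// Sketch of the logic behind a k-sigma outlier predicate: a reading is
// flagged when it deviates from the baseline mean by more than `k`
// standard deviations. Hypothetical free function, not Janus internals.
fn is_outlier(reading: f64, mean: f64, sigma: f64, k: f64) -> bool {
    if sigma <= 0.0 {
        // Degenerate baseline: no spread to compare against.
        return false;
    }
    (reading - mean).abs() > k * sigma
}

fn main() {
    // With mean 20.0 and sigma 2.0, a reading of 27.5 lies beyond 3 sigma.
    assert!(is_outlier(27.5, 20.0, 2.0, 3.0));
    // A reading of 24.0 is within 3 sigma and passes.
    assert!(!is_outlier(24.0, 20.0, 2.0, 3.0));
}
```

In the query above, `?mean` and `?sigma` come from the baseline join, so the filter runs with only the current bindings in scope.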
## Repository Status

The backend repository is active and locally healthy:

This repository is the backend and engine implementation.

The maintained dashboard lives in a separate repository:

- `https://github.com/SolidLabResearch/janus-dashboard`

The `janus-dashboard/` folder in this repository is a lightweight local demo client, not the primary frontend.

## Performance

Janus uses dictionary encoding and segmented storage for high-throughput ingestion and historical reads.

- Write throughput: 2.6-3.14 million quads/sec
- Read throughput: 2.7-2.77 million quads/sec
- Point query latency: 0.235 ms at 1M quads
- Space efficiency: about 40% smaller encoded events

Detailed benchmark data is in [docs/BENCHMARK_RESULTS.md](./docs/BENCHMARK_RESULTS.md).
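
Dictionary encoding replaces repeated IRI and literal strings with fixed-width integer ids, which is where most of the space savings come from. A minimal sketch of the idea (the types and names below are illustrative, not Janus's internal representation):

```rust
use std::collections::HashMap;

/// Minimal term dictionary: interns each distinct RDF term string once
/// and hands out a u64 id for it. Illustrative sketch only.
struct Dictionary {
    to_id: HashMap<String, u64>,
    to_term: Vec<String>,
}

impl Dictionary {
    fn new() -> Self {
        Dictionary { to_id: HashMap::new(), to_term: Vec::new() }
    }

    /// Return the id for `term`, inserting it if unseen.
    fn encode(&mut self, term: &str) -> u64 {
        if let Some(&id) = self.to_id.get(term) {
            return id;
        }
        let id = self.to_term.len() as u64;
        self.to_id.insert(term.to_string(), id);
        self.to_term.push(term.to_string());
        id
    }

    /// Look a term back up by id.
    fn decode(&self, id: u64) -> Option<&str> {
        self.to_term.get(id as usize).map(|s| s.as_str())
    }
}

fn main() {
    let mut dict = Dictionary::new();
    let s = dict.encode("http://example.org/sensor1");
    let p = dict.encode("http://example.org/hasReading");
    // Re-encoding the same term yields the same id: storage pays for the
    // string once, and each stored quad carries only small integers.
    assert_eq!(s, dict.encode("http://example.org/sensor1"));
    assert_eq!(dict.decode(p), Some("http://example.org/hasReading"));
}
```

Because RDF streams repeat the same subjects and predicates constantly, each quad shrinks to a few integers plus a one-time dictionary entry per distinct term.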

## Quick Start

### Prerequisites

- Rust stable
- Cargo
- Docker, if you want to run the local MQTT broker from `docker-compose.yml`

### Build

```bash
make build
make release
```

### Run the HTTP API

```bash
cargo run --bin http_server -- --host 127.0.0.1 --port 8080 --storage-dir ./data/storage
```

Then check the server:

```bash
curl http://127.0.0.1:8080/health
```

### Try the HTTP client example

```bash
cargo run --example http_client_example
```

This example demonstrates:

- query registration
- query start and stop
- query inspection
- replay control
- WebSocket result consumption

## Development

### Common Commands

```bash
make build         # debug build
make release       # optimized build
make test          # full test suite
make test-verbose  # verbose tests
make fmt           # format code
make fmt-check     # check formatting
make lint          # clippy with warnings as errors
make check         # formatting + linting
make ci-check      # local CI script
```

### Examples

The repository includes runnable examples under [`examples/`](./examples), including:

- [`examples/http_client_example.rs`](./examples/http_client_example.rs)

## Documentation

Start here:

- [GETTING_STARTED.md](./GETTING_STARTED.md)
- [START_HERE.md](./START_HERE.md)
- [docs/DOCUMENTATION_INDEX.md](./docs/DOCUMENTATION_INDEX.md)
- [docs/README.md](./docs/README.md)
- [docs/HTTP_API_CURRENT.md](./docs/HTTP_API_CURRENT.md)

## Notes

docs/ANOMALY_DETECTION.md

Lines changed: 84 additions & 0 deletions

# Anomaly Detection

Janus already supports anomaly-oriented extension functions, but they are stateless functions evaluated within one query execution context.

That distinction matters.

## What Extension Functions Are Good At

Current extension functions are sufficient for:

- fixed thresholds
- relative change checks
- z-score style checks when mean and sigma are already present
- simple outlier or divergence predicates over current bindings

This works well when the query already has everything it needs in one evaluation context.
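
A relative change check, for instance, reduces to simple arithmetic over the current bindings. The sketch below is illustrative only and assumes nothing about Janus's internal extension-function API:

```rust
/// Relative change of a current reading against a reference value,
/// the quantity behind threshold / relative-change style predicates.
/// Hypothetical free function, not the engine's actual signature.
fn relative_change(current: f64, reference: f64) -> Option<f64> {
    if reference == 0.0 {
        // Undefined against a zero reference.
        return None;
    }
    Some((current - reference) / reference.abs())
}

fn main() {
    // 25.0 against a reference of 20.0 is a +25% change.
    assert_eq!(relative_change(25.0, 20.0), Some(0.25));
    // A zero reference yields no defined relative change.
    assert_eq!(relative_change(1.0, 0.0), None);
}
```

Everything the predicate needs arrives in one evaluation context, which is exactly the stateless case these functions cover.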

## Where Baselines Help

Baselines help when live anomaly scoring depends on historical context such as:

- deviation from normal behavior
- per-sensor baselines
- volatility comparison
- recent historical trend

In those cases, Janus can bootstrap compact historical values into live static data and let the live query compare current readings against them.

## What Janus Does Not Do

Janus does not currently maintain a full, continuously updated hybrid historical/live relation.

So if you need:

- long-running stateful models
- full seasonal context
- large retained historical buffers inside the engine

you will need one of:

- external model state
- future dedicated baseline refresh logic
- more specialized stateful operators

## Recommended Pattern

For a first anomaly-detection pipeline in Janus:

1. Use a historical query that emits one compact row per anchor.
2. Materialize baseline values such as `mean` and `sigma`.
3. Join those values in the live query using `baseline:*` predicates.
4. Apply extension functions on the live side.

Example:

```sparql
PREFIX ex: <http://example.org/>
PREFIX janus: <https://janus.rs/fn#>
PREFIX baseline: <https://janus.rs/baseline#>

REGISTER RStream ex:out AS
SELECT ?sensor ?reading
FROM NAMED WINDOW ex:hist ON LOG ex:store [START 1700000000000 END 1700003600000]
FROM NAMED WINDOW ex:live ON STREAM ex:stream1 [RANGE 5000 STEP 1000]
USING BASELINE ex:hist LAST
WHERE {
  WINDOW ex:hist {
    ?sensor ex:mean ?mean .
    ?sensor ex:sigma ?sigma .
  }
  WINDOW ex:live {
    ?sensor ex:hasReading ?reading .
  }
  ?sensor baseline:mean ?mean .
  ?sensor baseline:sigma ?sigma .
  FILTER(janus:is_outlier(?reading, ?mean, ?sigma, 3))
}
```

## Choosing LAST vs AGGREGATE

- Use `LAST` when you care about the most recent historical regime before live execution.
- Use `AGGREGATE` when you want a more stable summary across multiple historical sliding windows.
- Prefer fixed historical windows unless you have a clear reason to derive a baseline from many historical subwindows.
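
One way to picture `AGGREGATE` is pooling per-window summaries into a single mean and sigma. The sketch below shows standard pooled-moments arithmetic under the assumption that each historical window reports a count, mean, and sigma; Janus's actual merge strategy may differ:

```rust
/// Per-window baseline summary: sample count, mean, and sigma.
/// Hypothetical types for illustration, not Janus internals.
struct WindowStats { n: u64, mean: f64, sigma: f64 }

/// Merge several window summaries into one pooled baseline.
fn aggregate(windows: &[WindowStats]) -> Option<WindowStats> {
    let n: u64 = windows.iter().map(|w| w.n).sum();
    if n == 0 { return None; }
    let nf = n as f64;
    // Pooled mean: count-weighted average of the per-window means.
    let mean = windows.iter().map(|w| w.n as f64 * w.mean).sum::<f64>() / nf;
    // Pooled variance via E[x^2] - mean^2, recovering E[x^2] per window
    // from sigma^2 + mean^2.
    let ex2 = windows.iter()
        .map(|w| w.n as f64 * (w.sigma * w.sigma + w.mean * w.mean))
        .sum::<f64>() / nf;
    let sigma = (ex2 - mean * mean).max(0.0).sqrt();
    Some(WindowStats { n, mean, sigma })
}

fn main() {
    let windows = [
        WindowStats { n: 100, mean: 20.0, sigma: 2.0 },
        WindowStats { n: 300, mean: 24.0, sigma: 2.0 },
    ];
    let b = aggregate(&windows).unwrap();
    // Pooled mean is pulled toward the larger window:
    // (100 * 20 + 300 * 24) / 400 = 23.
    assert!((b.mean - 23.0).abs() < 1e-9);
    println!("pooled mean {:.1}, sigma {:.3}", b.mean, b.sigma);
}
```

This also shows why `AGGREGATE` is more stable than `LAST`: the pooled sigma absorbs between-window drift instead of reflecting only the final window's regime.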
