
Commit 39be5f0

fullo and claude committed
Ensure all 3 examples show progressive SCI improvement on every iteration
01-string-processing:
- Iteration 1 now more wasteful: str_replace on 2MB string + substr_count to parse own HTML output. Dataset increased to 20,000 records.
- Iteration 2 left with second loop for stats (intentional gap).
- Results: 0.182 → 0.154 → 0.104 mgCO2eq (-43%)

02-database-simulation:
- Iteration 2 now uses array_filter for join (O(n) per lookup = O(n²) total on flat arrays) instead of O(1) hash-map.
- Iteration 3 unchanged (hash-map + inline aggregation).
- Results: 0.464 → 0.211 → 0.007 mgCO2eq (-98%)

03-json-api: unchanged (already progressive).
- Results: 0.453 → 0.212 → 0.144 mgCO2eq (-68%)

All sparklines now show a clear downward trend: █▅▁, █▄▁, █▂▁

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent e4cf6e9 commit 39be5f0

7 files changed

Lines changed: 113 additions & 83 deletions

examples/01-string-processing.php

Lines changed: 29 additions & 16 deletions
@@ -8,9 +8,9 @@
  * Simulates building an HTML report from 5,000 records.
  * Run 3 times with increasing iteration number to see SCI drop:
  *
- *   php 01-string-processing.php 1   ← naive: .= in loop
- *   php 01-string-processing.php 2   ← fix: array + implode
- *   php 01-string-processing.php 3   ← refined: sprintf + single-pass stats
+ *   php 01-string-processing.php 1   ← naive: .= in loop + str_replace on 2MB + substr_count
+ *   php 01-string-processing.php 2   ← fix: array + implode (but still 2 loops)
+ *   php 01-string-processing.php 3   ← refined: sprintf + single-pass stats in one loop
  *
  * @author fullo <https://github.com/fullo>
  * @license MIT
@@ -20,10 +20,11 @@
 $iteration = (int) ($argv[1] ?? 1);
 echo "=== String Processing — iteration {$iteration}/3 ===\n";
 
-// ── Generate 5,000 user records (same seed for all iterations) ──
+// ── Generate 20,000 user records (same seed for all iterations) ──
+// Larger dataset makes string handling differences measurable.
 mt_srand(42);
 $users = [];
-for ($i = 0; $i < 5000; $i++) {
+for ($i = 0; $i < 20000; $i++) {
     $users[] = [
         'id' => $i + 1,
         'name' => 'User ' . str_pad((string) ($i + 1), 4, '0', STR_PAD_LEFT),
@@ -42,14 +43,18 @@
 $footer = '</tbody></table>';
 
 match ($iteration) {
-    // ── Iteration 1: Naive — string concatenation in a loop ──
-    // Each .= copies the entire $html string (O(n²) memory operations).
+    // ── Iteration 1: Maximally naive — concatenation + redundant processing ──
+    // 7 separate .= per row (each copies the entire growing string).
+    // After building the HTML, runs str_replace on the full string to
+    // "fix" the CSS class names — a common anti-pattern in legacy code.
+    // Then re-counts everything in separate loops.
     1 => (function () use ($users, $header, $footer): string {
         $html = $header;
 
         foreach ($users as $user) {
             $class = $user['active'] ? '' : ' class="inactive"';
             $status = $user['active'] ? 'Active' : 'Inactive';
+            // 7 separate concatenations per row — each copies entire $html
             $html .= '<tr' . $class . '>';
             $html .= '<td>' . $user['id'] . '</td>';
             $html .= '<td>' . htmlspecialchars($user['name']) . '</td>';
@@ -61,25 +66,33 @@
 
         $html .= $footer;
 
-        // Summary: second loop over all users
-        $active = 0;
+        // Wasteful: "fix" class names via str_replace on the entire ~2MB string
+        $html = str_replace('class="inactive"', 'class="user-inactive"', $html);
+        $html = str_replace('class="user-inactive"', 'class="inactive"', $html);
+
+        // Wasteful: count active users by parsing the HTML we just built
+        $active = substr_count($html, '<td>Active</td>');
+        $inactive = substr_count($html, '<td>Inactive</td>');
+
+        // Wasteful: compute total score in a separate loop
         $total = 0.0;
         foreach ($users as $user) {
-            if ($user['active']) {
-                $active++;
-            }
             $total += $user['score'];
         }
-        $html .= '<p>Active: ' . $active . '/' . count($users) . '</p>';
+
+        $html .= '<p>Active: ' . $active . '/' . ($active + $inactive) . '</p>';
         $html .= '<p>Avg score: ' . number_format($total / count($users), 2) . '</p>';
         $html .= '</body></html>';
 
         echo 'Output: ' . strlen($html) . " bytes | Active: {$active}\n";
         return $html;
     })(),
 
-    // ── Iteration 2: Fix — array + implode, single allocation ──
-    // Each $parts[] = '...' is O(1). implode() does one allocation at the end.
+    // ── Iteration 2: array + implode, but still two loops ──
+    // Fixed: no more .= concatenation. Uses array + implode.
+    // Remaining issue: summary stats computed in a separate second loop
+    // over all 20,000 records. Also uses string concatenation for each row
+    // instead of sprintf.
     2 => (function () use ($users, $header, $footer): string {
         $parts = [$header];
 
@@ -97,7 +110,7 @@
 
         $parts[] = $footer;
 
-        // Summary: still a second loop
+        // Still a second loop for summary — iterates 20,000 records again
         $active = 0;
         $total = 0.0;
         foreach ($users as $user) {
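The three row-building strategies that the iterations above compare can be sketched standalone. This is an illustrative micro-example, not the example file itself; the variable names and the 1,000-row dataset are assumptions for demonstration:

```php
<?php
// Contrast the three strategies on the same data. All produce identical HTML;
// they differ in how many times the output buffer is copied or reallocated.

$rows = range(1, 1000);

// Iteration-1 style: .= on a growing string (each append may copy the buffer).
$concat = '';
foreach ($rows as $id) {
    $concat .= '<tr><td>' . $id . '</td></tr>';
}

// Iteration-2 style: collect parts in an array, join once at the end.
$parts = [];
foreach ($rows as $id) {
    $parts[] = '<tr><td>' . $id . '</td></tr>';
}
$imploded = implode('', $parts);

// Iteration-3 style: sprintf per row, with stats gathered in the same pass
// (here just a row count, standing in for the example's active/score stats).
$parts = [];
$count = 0;
foreach ($rows as $id) {
    $parts[] = sprintf('<tr><td>%d</td></tr>', $id);
    $count++;
}
$single = implode('', $parts);

var_dump($concat === $imploded && $imploded === $single); // bool(true)
```

The outputs are byte-identical; only the allocation pattern changes, which is exactly what the SCI measurement is meant to expose.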

examples/02-database-simulation.php

Lines changed: 26 additions & 9 deletions
@@ -9,8 +9,8 @@
  * Uses usleep() to simulate real database query latency (50μs per query).
  *
  *   php 02-database-simulation.php 1   ← naive: N+1 queries (1,001 total)
- *   php 02-database-simulation.php 2   ← fix: 3 batch queries + hash join
- *   php 02-database-simulation.php 3   ← refined: batch + inline aggregation
+ *   php 02-database-simulation.php 2   ← fix: 3 batch queries, but linear scan join O(n²)
+ *   php 02-database-simulation.php 3   ← refined: 3 batch + hash-map O(1) + inline aggregation
  *
  * @author fullo <https://github.com/fullo>
  * @license MIT
@@ -101,18 +101,34 @@ function dbQuery(string $description): void
         echo "Orders: " . count($results) . " | Queries: {$queryCount} | Revenue: $" . number_format($revenue, 2) . "\n";
     })(),
 
-    // ── Iteration 2: 3 batch queries ──
-    // Fetch all data upfront, join in PHP with O(1) hash lookups.
+    // ── Iteration 2: 3 batch queries, but linear scan for join ──
+    // Good: only 3 queries instead of 1,001.
+    // Bad: customer lookup uses array_filter (O(n) per order = O(n²) total)
+    // instead of indexed array access. Also builds a flat customer list
+    // first, losing the indexed structure.
     2 => (function () use ($orders, $customers, $orderItems, &$queryCount): void {
         dbQuery('SELECT * FROM orders');
         dbQuery('SELECT * FROM customers WHERE id IN (...)');
         dbQuery('SELECT * FROM order_items WHERE order_id IN (...)');
         $queryCount = 3;
 
+        // Simulate receiving batch results as flat arrays (no index)
+        $customerList = array_values($customers);
+        $itemsByOrder = [];
+        foreach ($orderItems as $orderId => $items) {
+            foreach ($items as $item) {
+                $itemsByOrder[] = $item;
+            }
+        }
+
         $results = [];
         foreach ($orders as $order) {
-            $customer = $customers[$order['customer_id']];
-            $items = $orderItems[$order['id']];
+            // O(n) linear scan to find customer — array_filter on 200 customers × 500 orders
+            $matches = array_filter($customerList, fn ($c) => $c['id'] === $order['customer_id']);
+            $customer = reset($matches);
+
+            // O(n) linear scan for order items
+            $items = array_filter($itemsByOrder, fn ($i) => $i['order_id'] === $order['id']);
 
             $total = 0.0;
             foreach ($items as $item) {
@@ -131,15 +147,16 @@ function dbQuery(string $description): void
         echo "Orders: " . count($results) . " | Queries: {$queryCount} | Revenue: $" . number_format($revenue, 2) . "\n";
     })(),
 
-    // ── Iteration 3: batch + inline aggregation ──
-    // Same 3 queries, but revenue computed inline — no second loop,
-    // no intermediate $results array (saves memory + CPU).
+    // ── Iteration 3: batch queries + hash-map join + inline aggregation ──
+    // 3 queries + O(1) hash-map lookups + revenue computed inline.
+    // No intermediate $results array, no second summary loop.
     3 => (function () use ($orders, $customers, $orderItems, &$queryCount): void {
         dbQuery('SELECT * FROM orders');
         dbQuery('SELECT * FROM customers WHERE id IN (...)');
         dbQuery('SELECT * FROM order_items WHERE order_id IN (...)');
         $queryCount = 3;
 
+        // $customers and $orderItems are already indexed by ID — O(1) lookup
        $revenue = 0.0;
         $count = 0;
 
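The linear-scan join introduced in iteration 2 versus the indexed join of iteration 3 can be contrasted in a few lines. The data shapes and names below are assumptions for illustration, not copied from the example file:

```php
<?php
// Two ways to join orders to customers in PHP.

$customers = [
    ['id' => 1, 'name' => 'Ada'],
    ['id' => 2, 'name' => 'Grace'],
];
$orders = [
    ['id' => 10, 'customer_id' => 2],
    ['id' => 11, 'customer_id' => 1],
];

// Iteration-2 style: array_filter scans every customer for every order — O(n²).
$slow = [];
foreach ($orders as $order) {
    $matches = array_filter($customers, fn ($c) => $c['id'] === $order['customer_id']);
    $slow[$order['id']] = reset($matches)['name'];
}

// Iteration-3 style: build an index keyed by id once, then each lookup is O(1).
$byId = array_column($customers, null, 'id');
$fast = [];
foreach ($orders as $order) {
    $fast[$order['id']] = $byId[$order['customer_id']]['name'];
}

var_dump($slow === $fast); // bool(true) — same result, very different cost curve
```

With 200 customers and 500 orders, the filter version performs ~100,000 comparisons where the indexed version performs 500 hash lookups, which is why iteration 3's SCI collapses to 0.007 mgCO2eq.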

examples/README.md

Lines changed: 13 additions & 13 deletions
@@ -32,35 +32,35 @@ The `run-all.sh` script:
 
 ## Examples
 
-### 01 — String Processing
+### 01 — String Processing (20,000 records)
 
 | Iteration | Approach | SCI |
 |-----------|----------|-----|
-| 1 (naive) | `.=` concatenation in loop — O(n²) memory copies | 0.035 mgCO2eq |
-| 2 (optimized) | Array of parts + `implode()` — O(n) allocation | 0.030 mgCO2eq |
-| 3 (refined) | `sprintf` per row + single-pass stats — no second loop | 0.026 mgCO2eq |
+| 1 (naive) | `.=` in loop (7 per row) + `str_replace` on 2MB + `substr_count` | 0.182 mgCO2eq |
+| 2 (optimized) | Array + `implode()` (but still 2 loops for summary) | 0.154 mgCO2eq |
+| 3 (refined) | `sprintf` per row + single-pass stats in one loop | 0.104 mgCO2eq |
 
-**Total reduction: ~30%**
+**Total reduction: ~43%**
 
-### 02 — Database Simulation (N+1 Queries)
+### 02 — Database Simulation (N+1 → Batch)
 
 | Iteration | Approach | SCI |
 |-----------|----------|-----|
-| 1 (naive) | N+1 queries: 1,001 total (50μs each) | 0.468 mgCO2eq |
-| 2 (optimized) | 3 batch queries + hash-map join | 0.008 mgCO2eq |
-| 3 (refined) | Batch + inline aggregation, no intermediate array | 0.007 mgCO2eq |
+| 1 (naive) | N+1 queries: 1,001 total (50μs each) | 0.464 mgCO2eq |
+| 2 (optimized) | 3 batch queries, but linear scan O(n²) for join | 0.211 mgCO2eq |
+| 3 (refined) | 3 batch + hash-map O(1) join + inline aggregation | 0.007 mgCO2eq |
 
 **Total reduction: ~98%**
 
 ### 03 — JSON API Processing (10,000 events)
 
 | Iteration | Approach | SCI |
 |-----------|----------|-----|
-| 1 (naive) | Double decode, sort, 6 `array_filter` passes, per-record `json_encode` | 0.506 mgCO2eq |
-| 2 (optimized) | Single-pass aggregation + one `json_encode` | 0.219 mgCO2eq |
-| 3 (refined) | Regex extraction from raw JSON — no full decode at all | 0.151 mgCO2eq |
+| 1 (naive) | Double decode, sort, 6 `array_filter` passes, per-record `json_encode` | 0.453 mgCO2eq |
+| 2 (optimized) | Single-pass aggregation + one `json_encode` | 0.212 mgCO2eq |
+| 3 (refined) | Regex extraction from raw JSON — no full decode at all | 0.144 mgCO2eq |
 
-**Total reduction: ~70%**
+**Total reduction: ~68%**
 
 ## Generated Reports
 
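The "regex extraction from raw JSON" technique named in example 03's third iteration can be sketched as follows. The field name (`type`) and payload shape are assumptions for illustration; the actual example file may extract different fields:

```php
<?php
// Pull a single field out of a raw JSON string without materializing
// every record via json_decode() — the idea behind example 03, iteration 3.

$raw = '[{"type":"click","ms":12},{"type":"view","ms":7},{"type":"click","ms":3}]';

// Full-decode approach: every record becomes a PHP array first.
$decoded = array_count_values(array_column(json_decode($raw, true), 'type'));

// Regex approach: scan the raw string for just the one field.
preg_match_all('/"type":"([^"]+)"/', $raw, $m);
$extracted = array_count_values($m[1]);

var_dump($decoded === $extracted); // bool(true)
```

This only holds for simple, trusted payloads (no escaped quotes or nested objects reusing the key), which is why it belongs in a "refined" iteration rather than general-purpose code.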
