Commit 34f5ef4
metal: add per-token flush/op counters for Issue #16 investigation
tq_metal_diag_get() reports how many times tq_metal_batch_flush()
hit the GPU sync path during a run, plus the total ops in those flushes;
tq_metal_diag_reset() clears both counters.
The PPL tool now prints flushes/token + ops/flush at the end of an eval
when TQ_HAS_METAL is set.
This gives empirical answers (instead of guesses) to:
- How often does the dispatch path actually fire?
- How many ops are amortized per flush?
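The two ratios can be derived from a pair of counters roughly as follows. This is a minimal sketch, not the code from this commit: the diff body is not shown here, so every name below (diag_record_flush, diag_get, diag_reset, flushes_per_token, ops_per_flush) is a hypothetical stand-in, and the real signatures of tq_metal_diag_get/reset may differ.

```c
#include <assert.h>

/* Hypothetical counters; the real ones live behind tq_metal_diag_get/reset. */
static int g_flush_count = 0; /* flushes that reached the GPU sync path */
static int g_op_count = 0;    /* total ops submitted across those flushes */

/* Assumed hook: called by the flush path with the number of batched ops.
 * An empty batch is assumed to skip the GPU sync, so it is not counted. */
static void diag_record_flush(int n_ops) {
    assert(n_ops >= 0);
    if (n_ops == 0) return;
    g_flush_count++;
    g_op_count += n_ops;
}

/* Read both counters; either output pointer may be NULL. */
static void diag_get(int *flushes, int *ops) {
    if (flushes) *flushes = g_flush_count;
    if (ops) *ops = g_op_count;
}

/* Zero both counters, e.g. before starting an eval run. */
static void diag_reset(void) {
    g_flush_count = 0;
    g_op_count = 0;
}

/* How often the dispatch path fires per evaluated token. */
static double flushes_per_token(int flushes, int n_tokens) {
    return n_tokens > 0 ? (double)flushes / n_tokens : 0.0;
}

/* How many ops each GPU sync amortizes; 0 when no flush happened. */
static double ops_per_flush(int ops, int flushes) {
    return flushes > 0 ? (double)ops / flushes : 0.0;
}
```

In this sketch a run that never enters batch mode yields 0 flushes/token and 0 ops/flush, since the zero-flush case is guarded against division by zero.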
Used during Issue #16 investigation to confirm the Q8_0 weight path
never enters Metal batch mode (0 flushes/token), narrowing the
slowdown source to Q4_K (gguf_w*) and the fused tq_metal_forward_layer
Q4 path.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 2dcbde4 commit 34f5ef4
2 files changed
Lines changed: 33 additions & 0 deletions
[Diff, file 1: 18 lines added (new lines 670-685 and 696-697); content not captured]
[Diff, file 2: 15 lines added (new lines 513-527); content not captured]