Commit 26bbd49
fix(compute): reuse dst GPU memory instead of allocating per call (#84)

GPU ops (gpuBinaryOp, gpuUnaryOp, gpuScalarOp, Transpose, MatMul, Sum)
allocated fresh device memory via pool.Alloc on every call, even when a
pre-sized dst tensor was provided, and then swapped dst's storage to the
new allocation. The old GPUStorage was orphaned and relied on Go's GC
finalizer to call pool.Free. At large training shapes, with hundreds of
batches and ~20 ops per batch, orphaned allocations piled up faster than
the GC could reclaim them, causing unbounded GPU memory growth and OOM.

Fix: add a tryReuseDstPtr helper that checks whether dst[0] already has
a GPUStorage with sufficient capacity. If so, the kernel writes directly
into the existing device pointer: no pool.Alloc, no orphaned storage,
no GC pressure. When dst is nil or undersized, the existing alloc path
is preserved unchanged.
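
A minimal sketch of the reuse check described above. Only the helper name tryReuseDstPtr comes from this commit; the Tensor, GPUStorage, and DevicePtr types, the field names, and the signature are assumptions made for illustration, and the real ztensor code may differ.

```go
package main

import "fmt"

// DevicePtr is a hypothetical handle for a device address.
type DevicePtr uintptr

// GPUStorage is a hypothetical stand-in for the device-backed storage.
type GPUStorage struct {
	ptr      DevicePtr // start of the device buffer
	capElems int       // capacity in elements
}

// Tensor is a hypothetical minimal tensor with optional GPU storage.
type Tensor struct {
	storage *GPUStorage
}

// tryReuseDstPtr returns dst[0]'s device pointer and true when dst[0]
// exists, has GPU storage, and can hold at least n elements. Otherwise
// it returns false and the caller falls back to the alloc path.
func tryReuseDstPtr(n int, dst ...*Tensor) (DevicePtr, bool) {
	if len(dst) == 0 || dst[0] == nil || dst[0].storage == nil {
		return 0, false
	}
	if dst[0].storage.capElems < n {
		return 0, false
	}
	return dst[0].storage.ptr, true
}

func main() {
	dst := &Tensor{storage: &GPUStorage{ptr: 0x1000, capElems: 1024}}

	if p, ok := tryReuseDstPtr(512, dst); ok {
		fmt.Printf("reuse existing buffer at %#x\n", uintptr(p))
	}
	if _, ok := tryReuseDstPtr(2048, dst); !ok {
		fmt.Println("dst undersized: fall back to alloc")
	}
	if _, ok := tryReuseDstPtr(512); !ok {
		fmt.Println("no dst: fall back to alloc")
	}
}
```

Returning a boolean rather than an error keeps the fallback cheap: an op that cannot reuse dst simply continues down the alloc path it already had.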

Applied to the six hot-path op families that cover PatchTST GPU training:
- gpuBinaryOp (Add, Sub, Mul same-shape)
- gpuUnaryOp (Exp, Log, Sin, Cos, Tanh, Sqrt)
- gpuScalarOp (MulScalar, AddScalar, DivScalar)
- Transpose (gpu_engine_memory.go)
- MatMul standard float32 path (gpu_engine.go)
- Sum/ReduceSum (gpu_kernels.go)

Other ops (broadcast, Q4/Q8/BF16 matmul, fused kernels) continue
using the existing alloc path and can be converted incrementally.

Full ztensor test suite passes on CPU host.

Closes #84
Refs zerfoo/zerfoo#3731

parent 18a53fe commit 26bbd49
3 files changed
Lines changed: 124 additions & 31 deletions