Rocgdb/README.md
31 additions & 10 deletions
@@ -26,22 +26,43 @@ You can see some information on the GPU you will be running on by doing:
 rocm-smi
 ```
 
-To introduce an error in your program, comment out the `hipMalloc` calls at line 71 and 72, then compile with:
+To introduce an error in your program, comment out the `hipMalloc` calls at lines 81 and 82 in `saxpy.hip`. You can do this manually or use the following `sed` commands:
+Since the code uses the `hipCheck` error-checking macro on the `hipMemcpy` calls at lines 83 and 84, you also need to remove these wrappers. Otherwise, the invalid pointer error will be caught at the `hipMemcpy` stage before reaching the kernel launch:
-Running the program, you will see the expected runtime error:
+Running the program, you will see a runtime error. The error message format differs between ROCm versions:
 
+**ROCm 6.x:**
 ```bash
 ./saxpy
 Memory access fault by GPU node-2 (Agent handle: 0x2284d90) on address (nil). Reason: Unknown.
 Aborted (core dumped)
 ```
 
+**ROCm 7.x:**
+```bash
+./saxpy
+GPU API Error - /path/to/saxpy.hip:89: 'an illegal memory access was encountered'
+```
+
 To run the code with the `rocgdb` debugger, do:
 
 ```bash
@@ -60,10 +81,10 @@ For the latter command above, you need to have `cgdb` installed on your system.
 In the debugger, type `run` (or just `r`) and you will get an error similar to this one:
 
 ```bash
-Thread 3 "saxpy" received signal SIGSEGV, Segmentation fault.
-[Switching to thread 3, lane 0 (AMDGPU Lane 1:2:1:1/0 (0,0,0)[0,0,0])]
-0x00007ffff7ec1094 in saxpy() at saxpy.cpp:57
-57 y[i] += a*x[i];
+Thread 5 "saxpy" received signal SIGSEGV, Segmentation fault.
+[Switching to thread 5, lane 0 (AMDGPU Lane 1:1:1:1/0 (0,0,0)[0,0,0])]
+0x00007ffff623198c in saxpy() at saxpy.hip:67
+67 y[i] += a*x[i];
 ```
 
 Note that the cmake build type is set to `RelWithDebInfo` (see line 8 in CMakeLists.txt). With this build type, the debugger will be aware of the debug symbols. If that was not the case (for instance if compiling in `Release` mode), running the code with the debugger you would get an error message ***without*** line info, and also a warning like this one:
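As a minimal sketch of choosing the build type from the command line: the snippet below assumes an out-of-source `build` directory and that `CMakeLists.txt` does not set `CMAKE_BUILD_TYPE` unconditionally (neither is confirmed here). `BUILD_TYPE` is a hypothetical environment variable used only for this illustration. When `cmake` or `CMakeLists.txt` is not found, it prints the command instead of running it.

```bash
# Sketch only: BUILD_TYPE and the build/ directory are assumptions, not part of the repo.
build_type="${BUILD_TYPE:-RelWithDebInfo}"   # RelWithDebInfo keeps debug symbols
configure_cmd="cmake -S . -B build -DCMAKE_BUILD_TYPE=${build_type}"

if command -v cmake >/dev/null 2>&1 && [ -f CMakeLists.txt ]; then
    # Configure, then build, only when cmake and a project file are present.
    ${configure_cmd} && cmake --build build
else
    echo "would run: ${configure_cmd}"
fi
```

Setting `BUILD_TYPE=Release` before running it would reproduce the case without line info described above.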
@@ -81,7 +102,7 @@ th 1
 where
 ```
 
-You can add breakpoints with `break` (or `b`) followed by the line number. For instance to put a breakpoint right after the `hipMalloc` lines do `b 72`.
+You can add breakpoints with `break` (or `b`) followed by the line number. For instance to put a breakpoint right after the `hipMalloc` lines do `b 83`.
 
 When possible, it is also advised to compile without optimization flags (so using `-O0`) to avoid seeing breakpoints placed on lines different than those specified with the breakpoint command.
 
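One way to apply the `-O0` advice when compiling by hand rather than through cmake is sketched below. It assumes `hipcc` is on the `PATH` and that the source file is `saxpy.hip` in the current directory; the `SRC` variable is hypothetical. When either is missing, the command is only printed.

```bash
# Sketch only: SRC is a hypothetical override; the repo's own build uses cmake.
src="${SRC:-saxpy.hip}"
compile_cmd="hipcc -g -O0 -o saxpy ${src}"   # -g: debug symbols, -O0: no optimization

if command -v hipcc >/dev/null 2>&1 && [ -f "${src}" ]; then
    ${compile_cmd}
else
    echo "would run: ${compile_cmd}"
fi
```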
@@ -92,11 +113,11 @@ To list all the breakpoints that have been inserted type `info break` (or `i b`)
 ```bash
 (gdb) i b
 Num     Type           Disp Enb Address            What
-1       breakpoint     keep y   0x000000000020b334 in main() at /HPCTrainingExamples/HIP/saxpy/saxpy.hip:74
-2       breakpoint     keep y   0x000000000020b350 in main() at /HPCTrainingExamples/HIP/saxpy/saxpy.hip:78
+1       breakpoint     keep y   0x000000000020b334 in main() at /HPCTrainingExamples/HIP/saxpy/saxpy.hip:85
+2       breakpoint     keep y   0x000000000020b350 in main() at /HPCTrainingExamples/HIP/saxpy/saxpy.hip:88
 ```
 
-A breakpoint can be removed with `delete <Num>` (or `d <Num>`): note that `<Num>` is the breakpoint ID displayed above. For instance, to remove the breakpoint at line 74, you have to do `d 1`.
+A breakpoint can be removed with `delete <Num>` (or `d <Num>`): note that `<Num>` is the breakpoint ID displayed above. For instance, to remove the breakpoint at line 85, you have to do `d 1`.
 
 To proceed to the next line you can do `next` (or `n`). To step into a function, do `step` (or `s`) and to get out do `finish`. Note that if a breakpoint is at a kernel, doing `n` or `s` will switch between different threads. To avoid this behavior, it is necessary to disable the breakpoint at the kernel with `disable <Num>`.
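The breakpoint and stepping commands described above can also be collected in a gdb command file and replayed in batch mode. This is a sketch rather than part of the repository: the file name `saxpy.gdb` is hypothetical, and the breakpoint location assumes the line numbering used in this README. The debugger is only launched when both `rocgdb` and a `./saxpy` executable are present.

```bash
# Sketch only: saxpy.gdb is a hypothetical file name.
cat > saxpy.gdb <<'EOF'
break saxpy.hip:83
run
info break
next
delete 1
EOF

# -batch exits once the commands finish; -x reads commands from a file.
if command -v rocgdb >/dev/null 2>&1 && [ -x ./saxpy ]; then
    rocgdb -batch -x saxpy.gdb ./saxpy
else
    echo "rocgdb or ./saxpy not found; wrote saxpy.gdb only"
fi
```

If stepping jumps between threads because a breakpoint sits at a kernel, a `disable <Num>` line for that breakpoint can be added to the file before the `next` command.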