Skip to content

Commit ce050f3

Browse files
vldlyrobclark
authored andcommitted
drm/msm/a5xx: fix races in preemption evaluation stage
On A5XX GPUs when preemption is used it's invietable to enter a soft lock-up state in which GPU is stuck at empty ring-buffer doing nothing. This appears as full UI lockup and not detected as GPU hang (because it's not). This happens due to not triggering preemption when it was needed. Sometimes this state can be recovered by some new submit but generally it won't happen because applications are waiting for old submits to retire. One of the reasons why this happens is a race between a5xx_submit and a5xx_preempt_trigger called from IRQ during submit retire. Former thread updates ring->cur of previously empty and not current ring right after latter checks it for emptiness. Then both threads can just exit because for first one preempt_state wasn't NONE yet and for second one all rings appeared to be empty. To prevent such situations from happening we need to establish guarantee for preempt_trigger to make decision after each submit or retire. To implement this we serialize preemption initiation using spinlock. If switch is already in progress we need to re-trigger preemption when it finishes. Fixes: b1fc283 ("drm/msm: Implement preemption for A5XX targets") Signed-off-by: Vladimir Lypak <vladimir.lypak@gmail.com> Patchwork: https://patchwork.freedesktop.org/patch/612045/ Signed-off-by: Rob Clark <robdclark@chromium.org>
1 parent 64fd6d0 commit ce050f3

2 files changed

Lines changed: 23 additions & 2 deletions

File tree

drivers/gpu/drm/msm/adreno/a5xx_gpu.h

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -36,6 +36,7 @@ struct a5xx_gpu {
3636
uint64_t preempt_iova[MSM_GPU_MAX_RINGS];
3737

3838
atomic_t preempt_state;
39+
spinlock_t preempt_start_lock;
3940
struct timer_list preempt_timer;
4041

4142
struct drm_gem_object *shadow_bo;

drivers/gpu/drm/msm/adreno/a5xx_preempt.c

Lines changed: 22 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -97,12 +97,19 @@ void a5xx_preempt_trigger(struct msm_gpu *gpu)
9797
if (gpu->nr_rings == 1)
9898
return;
9999

100+
/*
101+
* Serialize preemption start to ensure that we always make
102+
* decision on latest state. Otherwise we can get stuck in
103+
* lower priority or empty ring.
104+
*/
105+
spin_lock_irqsave(&a5xx_gpu->preempt_start_lock, flags);
106+
100107
/*
101108
* Try to start preemption by moving from NONE to START. If
102109
* unsuccessful, a preemption is already in flight
103110
*/
104111
if (!try_preempt_state(a5xx_gpu, PREEMPT_NONE, PREEMPT_START))
105-
return;
112+
goto out;
106113

107114
/* Get the next ring to preempt to */
108115
ring = get_next_ring(gpu);
@@ -127,9 +134,11 @@ void a5xx_preempt_trigger(struct msm_gpu *gpu)
127134
set_preempt_state(a5xx_gpu, PREEMPT_ABORT);
128135
update_wptr(gpu, a5xx_gpu->cur_ring);
129136
set_preempt_state(a5xx_gpu, PREEMPT_NONE);
130-
return;
137+
goto out;
131138
}
132139

140+
spin_unlock_irqrestore(&a5xx_gpu->preempt_start_lock, flags);
141+
133142
/* Make sure the wptr doesn't update while we're in motion */
134143
spin_lock_irqsave(&ring->preempt_lock, flags);
135144
a5xx_gpu->preempt[ring->id]->wptr = get_wptr(ring);
@@ -152,6 +161,10 @@ void a5xx_preempt_trigger(struct msm_gpu *gpu)
152161

153162
/* And actually start the preemption */
154163
gpu_write(gpu, REG_A5XX_CP_CONTEXT_SWITCH_CNTL, 1);
164+
return;
165+
166+
out:
167+
spin_unlock_irqrestore(&a5xx_gpu->preempt_start_lock, flags);
155168
}
156169

157170
void a5xx_preempt_irq(struct msm_gpu *gpu)
@@ -188,6 +201,12 @@ void a5xx_preempt_irq(struct msm_gpu *gpu)
188201
update_wptr(gpu, a5xx_gpu->cur_ring);
189202

190203
set_preempt_state(a5xx_gpu, PREEMPT_NONE);
204+
205+
/*
206+
* Try to trigger preemption again in case there was a submit or
207+
* retire during ring switch
208+
*/
209+
a5xx_preempt_trigger(gpu);
191210
}
192211

193212
void a5xx_preempt_hw_init(struct msm_gpu *gpu)
@@ -300,5 +319,6 @@ void a5xx_preempt_init(struct msm_gpu *gpu)
300319
}
301320
}
302321

322+
spin_lock_init(&a5xx_gpu->preempt_start_lock);
303323
timer_setup(&a5xx_gpu->preempt_timer, a5xx_preempt_timer, 0);
304324
}

0 commit comments

Comments
 (0)