You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Definitively fix race in periodic job enqueuer (#438)
Related to #409. I think I finally cracked it. For real this time. No
workarounds, no hacks.
If you look at the "after start" tests, they look roughly like this,
start the client, then add some jobs:
startService(t, svc)
svc.Add(
&PeriodicJob{ScheduleFunc: periodicIntervalSchedule(500 * time.Millisecond), ConstructorFunc: jobConstructorFunc("periodic_job_500ms", false)},
)
svc.Add(
&PeriodicJob{ScheduleFunc: periodicIntervalSchedule(500 * time.Millisecond), ConstructorFunc: jobConstructorFunc("periodic_job_500ms_start", false), RunOnStart: true},
)
This would usually work as expected, with the enqueuer starting up, then
jobs added, then those new jobs scheduled.
_However_, what can happen is that the "after start" jobs are actually
added in _before_ the enqueuer has finished entering its run loop. So,
the service is starting up, makes a first call to `timeUntilNextRun`,
and because the newly added periodic jobs have not been assigned a first
run time, the code below would get a `timerUntilNextRun` of zero, which
would cause it to enter the loop and run an insert iteration
immediately, despite the jobs not actually being appropriate for
scheduling as of yet:
timerUntilNextRun := time.NewTimer(s.timeUntilNextRun())
for {
select {
case <-timerUntilNextRun.C:
The fix is to modify `timeUntilNextRun` so that it ignores periodic jobs
that have not yet been scheduled:
for _, periodicJob := range s.periodicJobs {
// Jobs may have been added after service start, but before this
// function runs for the first time. They're not scheduled properly yet,
// but they will be soon, at which point this function will run again.
// Skip them for now.
if periodicJob.nextRunAt.IsZero() {
continue
}
Those jobs will be scheduled soon, then `timeUntilNextRun` called again,
and the expected "time until" values returned.
I'm confident enough in the fix that I've reverted most of #416 so the
5s jobs go back to 500ms.
Fixes#409.
0 commit comments