
Commit d91ccb0

Merge pull request #833 from cderici/handle-allwatcher-task-exceptions
#### Description

This one was a bit tricky. Fixes #829.

The `_all_watcher` task is a coroutine that the AllWatcher runs in the background indefinitely. It contains a while loop controlled manually through flags (asyncio events) such as `_watch_stopping` and `_watch_stopped`. The problem is that when `_all_watcher` raises an exception (or receives one from a call like `get_config()`, as in #829), that exception is thrown somewhere in the event loop and never handled or re-raised. This is because the coroutine is not `await`ed, for good reason: it can never produce a result, since it is supposed to run in the background forever, fetching deltas for us.

As a result, if `_all_watcher` fails, external flags like `_watch_received` are never set, and whoever calls `await self._watch_received.wait()` blocks forever (in this case `_after_connect()`). Similarly, `disconnect()` waits for the `_watch_stopped` flag, which won't be set either, so calling `disconnect()` after the all_watcher has failed also hangs forever.

This change fixes the problem by waiting, at each wait-for-flag spot, for two things: 1) whichever flag we're waiting for, and 2) the `_all_watcher` task becoming `done()`. In the latter case we should expect to see an exception, because that task is not supposed to finish. More importantly, once we see that `_watcher_task.done()` is true, we no longer sit and wait forever for the `_all_watcher` event flags to be set, so we won't hang. A nice side effect is that we should see fewer "Task exception was never retrieved" messages, since we now call `.exception()` on the `_all_watcher` task. We'll probably still get those from tasks like `_pinger` and `_debug_log`, but this is a good first example of how to handle them as well.

#### QA Steps

This should be rigorously tested, as it slightly changes a fundamental mechanism. We need to make sure all the tests pass. For the manual QA, I changed the body of `model.get_config()` to `raise JujuError("FOO")`, artificially inducing an error that appears to come from the API outside of the all_watcher loop. This recreates the exact condition seen in #829. Alternatively, set up two controllers and use pylibjuju while a migration is running in the background, which produces a `migration is in progress` error. Either way, with the error in place, run the following and it should print "Error handled" on stdout:

```python
async def juju_stats():
    m = Model()
    await m.connect()

try:
    asyncio.run(juju_stats())
except JujuError:
    print("Error handled")
```

#### Notes & Discussion

We might also want to get this onto the other branches, after we carefully test and land it on `2.9` as requested in #829.
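The core of the fix (wait on either the flag or the background task, and surface the task's exception) can be sketched with plain `asyncio`. This is a standalone sketch with hypothetical names (`failing_watcher`, `connect_like`, `flag`), not the pylibjuju code, which goes through its `jasyncio` wrapper:

```python
import asyncio

async def failing_watcher(flag: asyncio.Event):
    # Stand-in for _all_watcher: dies before ever setting the flag.
    raise RuntimeError("migration is in progress")

async def connect_like():
    flag = asyncio.Event()  # stand-in for _watch_received
    watcher = asyncio.create_task(failing_watcher(flag))
    waiter = asyncio.create_task(flag.wait())
    # Wait for whichever happens first: the flag is set, or the watcher ends.
    done, pending = await asyncio.wait({waiter, watcher},
                                       return_when=asyncio.FIRST_COMPLETED)
    if watcher in done:
        waiter.cancel()  # don't leak the flag-wait task
        exc = watcher.exception()  # retrieving it also silences the warning
        if exc is not None:
            raise exc
        raise RuntimeError("watcher finished without an exception")

try:
    asyncio.run(connect_like())
except RuntimeError as e:
    print("Error handled:", e)  # prints "Error handled: migration is in progress"
```

Without the `asyncio.wait(..., return_when=FIRST_COMPLETED)` race, `await flag.wait()` would block forever, which is exactly the hang described above.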
2 parents eba8dec + 74fd81d commit d91ccb0

15 files changed

Lines changed: 123 additions & 118 deletions

juju/model.py

Lines changed: 38 additions & 8 deletions
```diff
@@ -759,7 +759,26 @@ async def _after_connect(self):
         # we've received all the model data, which might be
         # a whole load of unneeded data if all the client wants
         # to do is make one RPC call.
-        await self._watch_received.wait()
+        async def watch_received_waiter():
+            await self._watch_received.wait()
+        waiter = jasyncio.create_task(watch_received_waiter())
+
+        # If we just wait for the _watch_received event and the _all_watcher task
+        # fails (e.g. because API fails like migration is in progress), then
+        # we'll hang because the _watch_received will never be set
+        # Instead, we watch for two things, 1) _watch_received, 2) _all_watcher done
+        # If _all_watcher is done before the _watch_received, then we should see
+        # (and raise) an exception coming from the _all_watcher
+        # Otherwise (i.e. _watch_received is set), then we're good to go
+        done, pending = await jasyncio.wait([waiter, self._watcher_task],
+                                            return_when=jasyncio.FIRST_COMPLETED)
+        if self._watcher_task in done:
+            # Cancel the _watch_received.wait
+            waiter.cancel()
+            # If there's no exception, then why did the _all_watcher break its loop?
+            if not self._watcher_task.exception():
+                raise JujuError("AllWatcher task is finished abruptly without an exception.")
+            raise self._watcher_task.exception()
 
         await self.get_info()
         self.uuid = self.info.uuid
@@ -771,6 +790,12 @@ async def disconnect(self):
         if not self._watch_stopped.is_set():
             log.debug('Stopping watcher task')
             self._watch_stopping.set()
+            # If the _all_watcher task is finished,
+            # check to see an exception, if yes, raise,
+            # otherwise we should see the _watch_stopped
+            # flag is set
+            if self._watcher_task.done() and self._watcher_task.exception():
+                raise self._watcher_task.exception()
             await self._watch_stopped.wait()
             self._watch_stopping.clear()
 
@@ -1040,6 +1065,7 @@ def _watch(self):
         See :meth:`add_observer` to register an onchange callback.
 
         """
+
         def _post_step(obj):
             # Once we get the model, ensure we're running in the correct state
             # as a post step.
@@ -1129,7 +1155,7 @@ async def _all_watcher():
         self._watch_received.clear()
         self._watch_stopping.clear()
         self._watch_stopped.clear()
-        jasyncio.ensure_future(_all_watcher())
+        self._watcher_task = jasyncio.create_task(_all_watcher())
 
     async def _notify_observers(self, delta, old_obj, new_obj):
         """Call observing callbacks, notifying them of a change in model state
```
```diff
@@ -2403,7 +2429,7 @@ async def _get_source_api(self, url, controller_name=None):
 
     async def wait_for_idle(self, apps=None, raise_on_error=True, raise_on_blocked=False,
                             wait_for_active=False, timeout=10 * 60, idle_period=15, check_freq=0.5,
-                            status=None, wait_for_units=1, wait_for_exact_units=-1):
+                            status=None, wait_for_units=None, wait_for_exact_units=-1):
         """Wait for applications in the model to settle into an idle state.
 
         :param apps (list[str]): Optional list of specific app names to wait on.
@@ -2452,6 +2478,8 @@ async def wait_for_idle(self, apps=None, raise_on_error=True, raise_on_blocked=F
             warnings.warn("wait_for_active is deprecated; use status", DeprecationWarning)
             status = "active"
 
+        _wait_for_units = wait_for_units if wait_for_units is not None else 1
+
         timeout = timedelta(seconds=timeout) if timeout is not None else None
         idle_period = timedelta(seconds=idle_period)
         start_time = datetime.now()
@@ -2501,12 +2529,14 @@ def _raise_for_status(entities, status):
                                 (wait_for_exact_units, len(app.units)))
                     continue
                 # If we have less # of units then required, then wait a bit more
-                elif len(app.units) < wait_for_units:
+                elif len(app.units) < _wait_for_units:
                     busy.append(app.name + " (not enough units yet - %s/%s)" %
-                                (len(app.units), wait_for_units))
+                                (len(app.units), _wait_for_units))
                     continue
-                elif len(units_ready) >= wait_for_units:
-                    # No need to keep looking, we have the desired number of units ready to go
+                # User wants to see a certain # of units, and we have enough
+                elif wait_for_units and len(units_ready) >= _wait_for_units:
+                    # So no need to keep looking, we have the desired number of units ready to go,
+                    # exit the loop. Don't return, though, we might still have some errors to raise
                     break
                 for unit in app.units:
                     if unit.machine is not None and unit.machine.status == "error":
@@ -2531,7 +2561,7 @@ def _raise_for_status(entities, status):
                         units_ready.add(unit.name)
                     now = datetime.now()
                     idle_start = idle_times.setdefault(unit.name, now)
-                    print(f'unit {unit.name} is waiting since : {idle_start} -- now : {now} -- waiting for : {now - idle_start}')
+
                     if now - idle_start < idle_period:
                         busy.append("{} [{}] {}: {}".format(unit.name,
                                                             unit.agent_status,
```
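The `wait_for_idle` tweak changes the default of `wait_for_units` to `None` so that the early `break` fires only when the caller explicitly asked for a unit count, while the old default of 1 is still used for the "not enough units" check. The decision logic can be isolated like this (a hypothetical standalone helper for illustration, not the pylibjuju API):

```python
def unit_wait_decision(num_units, num_ready, wait_for_units=None):
    """Return 'wait', 'done', or 'check-units' for one app in the idle loop."""
    # Internal count defaults to 1, as in the diff.
    _wait_for_units = wait_for_units if wait_for_units is not None else 1
    if num_units < _wait_for_units:
        return "wait"          # not enough units yet, keep waiting
    if wait_for_units and num_ready >= _wait_for_units:
        return "done"          # caller's requested count is ready; stop looking
    return "check-units"       # fall through to the per-unit status checks

# With the default, we never short-circuit on ready counts:
print(unit_wait_decision(num_units=1, num_ready=1))                    # check-units
# An explicit request short-circuits once satisfied:
print(unit_wait_decision(num_units=3, num_ready=2, wait_for_units=2))  # done
print(unit_wait_decision(num_units=1, num_ready=0, wait_for_units=2))  # wait
```

This preserves the old threshold arithmetic while making the short-circuit opt-in, so callers relying on the full per-unit error checks keep them by default.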

tests/bundle/bundle.yaml

Lines changed: 8 additions & 28 deletions
```diff
@@ -1,32 +1,12 @@
-series: xenial
+series: jammy
 applications:
-  wordpress:
-    charm: "wordpress"
-    series: "xenial"
-    channel: "candidate"
+  grafana:
+    charm: "grafana"
+    channel: stable
     num_units: 1
-    annotations:
-      "gui-x": "339.5"
-      "gui-y": "-171"
-    to:
-      - "0"
-  mysql:
-    charm: "mysql"
-    series: "trusty"
-    channel: "candidate"
+  prometheus:
+    charm: "prometheus"
+    channel: stable
     num_units: 1
-    annotations:
-      "gui-x": "79.5"
-      "gui-y": "-142"
-    to:
-      - "1"
 relations:
-  - - "wordpress:db"
-    - "mysql:db"
-machines:
-  "0":
-    series: xenial
-    constraints: "arch=amd64 cores=1 cpu-power=100 mem=1740 root-disk=8192"
-  "1":
-    series: trusty
-    constraints: "arch=amd64 cores=1 cpu-power=100 mem=1740 root-disk=8192"
+  - ["prometheus:grafana-source", "grafana:grafana-source"]
```

tests/bundle/mini-bundle.yaml

Lines changed: 7 additions & 6 deletions
```diff
@@ -1,11 +1,12 @@
+series: jammy
 applications:
-  ghost:
-    charm: "ghost"
+  grafana:
+    charm: "grafana"
     channel: stable
     num_units: 1
-  mysql:
-    charm: "mysql"
-    channel: candidate
+  prometheus:
+    charm: "prometheus"
+    channel: stable
     num_units: 1
 relations:
-  - ["ghost", "mysql"]
+  - ["prometheus:grafana-source", "grafana:grafana-source"]
```
Lines changed: 11 additions & 13 deletions
```diff
@@ -1,17 +1,15 @@
-series: xenial
 applications:
-  ghost:
-    charm: "ghost"
-    num_units: 1
-  mysql:
-    charm: "mysql"
-    channel: "candidate"
-    series: "trusty"
+  helloa:
+    charm: "hello-juju"
+    name: "helloa"
+    channel: stable
     num_units: 1
     options:
-      max-connections: 2
-      tuning-level: include-base64://config-base64.yaml
+      application-repo: include-base64://config-base64.yaml
+  hellob:
+    charm: "hello-juju"
+    name: "hellob"
+    channel: stable
+    num_units: 1
   test:
-    charm: "../charm"
-relations:
-  - ["ghost", "mysql"]
+    charm: "../charm"
```
Lines changed: 8 additions & 8 deletions
```diff
@@ -1,15 +1,15 @@
 applications:
-  ghost:
-    charm: "ghost"
+  helloa:
+    charm: "hello-juju"
+    name: "helloa"
     channel: stable
     num_units: 1
     options:
       config: include-file://config1.yaml
-  mysql:
-    charm: "mysql"
-    channel: candidate
+  hellob:
+    charm: "hello-juju"
+    name: "hellob"
+    channel: stable
     num_units: 1
   test:
-    charm: "../charm"
-relations:
-  - ["ghost", "mysql"]
+    charm: "../charm"
```
Lines changed: 6 additions & 6 deletions
```diff
@@ -1,14 +1,14 @@
-series: xenial
+series: jammy
 applications:
-  ghost:
-    charm: "ghost"
+  grafana:
+    charm: "grafana"
     channel: stable
     num_units: 1
-  mysql:
-    charm: "mysql"
+  prometheus:
+    charm: "prometheus"
     channel: stable
     num_units: 1
   test:
     charm: "./tests/integration/charm"
 relations:
-  - ["ghost", "mysql"]
+  - ["prometheus:grafana-source", "grafana:grafana-source"]
```
Lines changed: 1 addition & 1 deletion
```diff
@@ -1 +1 @@
-ZmFzdA==
+aHR0cDovL215LWp1anUuY29t
```
Lines changed: 3 additions & 3 deletions
```diff
@@ -1,3 +1,3 @@
-ghost:
-  url: "http://my-ghost.blg"
-  port: 2369
+helloa:
+  application-repo: "http://my-juju.com"
+  port: 666
```

tests/integration/bundle/test-overlays/bundle-with-overlay-multi.yaml

Lines changed: 4 additions & 4 deletions
```diff
@@ -1,14 +1,14 @@
 applications:
   ghost:
-    charm: "ghost"
+    charm: "prometheus"
     channel: stable
     num_units: 1
   mysql:
-    charm: "mysql"
-    channel: candidate
+    charm: "prometheus"
+    channel: stable
     num_units: 1
 relations:
-  - ["ghost", "mysql"]
+  - ["ghost:grafana-source", "mysql:grafana-source"]
 --- # overlay.yaml
 description: Overlay to remove the ghost app and the relation
 applications:
```

tests/integration/bundle/test-overlays/test-multi-overlay.yaml

Lines changed: 2 additions & 2 deletions
```diff
@@ -10,6 +10,6 @@ description: Another overlay for test multi-overlay
 applications:
   memcached:
   mysql:
-    charm: "mysql"
-    channel: candidate
+    charm: "prometheus"
+    channel: stable
     num_units: 1
```
