Development/plugin server service state machine by bramoosterhuis · Pull Request #2162 · rdkcentral/Thunder

bramoosterhuis · 2026-06-23T16:44:37Z

No description provided.

- Introduced a StateMachine class to manage the lifecycle states of plugins, including DEACTIVATED, PRECONDITION, ACTIVATED, HIBERNATED, and UNAVAILABLE. - Defined stable and transient states with clear guarantees for callbacks during state transitions. - Added detailed documentation for state transitions, callback guarantees, reentrancy, and crash handling. - Refactored existing plugin state handling to utilize the new StateMachine, improving clarity and maintainability. - Updated the Service class to integrate the StateMachine and handle state change notifications. - Removed redundant locks and streamlined the Evaluate method to leverage the StateMachine's re-evaluation capabilities.

…chine

…sitionSuspender Replaced the manual lock management inside `TransitionYield` with a private, nested RAII class `TransitionSuspender` within `StateMachine`. - Eliminates potential Undefined Behavior (UB) in Release builds where `Unlock()` could accidentally be called on an unheld mutex when assertions are disabled. - Guarantees proper scope-based lock re-acquisition and thread-ID tracking upon leaving `Evaluate()`, ensuring state machine consistency. - Enhances encapsulation by hiding internal transition-locking mechanics from the outside world.

Move Suspend and Resume handling completely into the StateMachine to ensure proper lifecycle tracking and robust thread synchronization. - Eliminate fragile cross-state delegation from PreconditionState by introducing stateless worker methods RequestSuspend() and RequestResume() on the Service class. - Fix potential distributed deadlocks and threadpool starvation by ensuring Service::Lock() is released BEFORE invoking the blocking out-of-process COM/RPC _stateControl->Request() calls in both suspend and resume flows. - Protect _stateControl lifecycle during RPC calls by capturing a local pointer copy under a short-lived Service::Lock() within the transition thread. - Prevent ERROR_ILLEGAL_STATE regressions by explicitly overriding Resume and Suspend in PreconditionState, ensuring symmetric lifecycle state transitions. - Resolve reentrancy/ASSERT risks in DeactivatedState::Resume by invoking the internal _Activate() trigger directly, keeping the operation safe within the existing transition thread context without re-acquiring _transitionLock. - Enforce architectural invariants by adding ASSERT(IsTransitionThread()) guards to the new worker methods.

Introduce a 'DestroyedState' tombstone to the StateMachine to resolve the lifecycle divergence where services could be destroyed while in an active state. - Implement StateMachine::Tombstone() to perform a lock-safe, atomic transition to the DestroyedState. - Ensure that Destroy() properly awaits in-flight transitions by acquiring the _transitionLock before tombstoning. - Guarantee consistent state synchronization between the internal state machine and the base class enum. - Ensure all service entry points (QueryInterface, Activate, etc.) return safe error codes immediately after destruction.

Enforce consistency by upgrading the atomic load of '_current' from relaxed to acquire before dispatching the Resume trigger. - Eliminate an architectural inconsistency that was misleading to readers, bringing Resume inline with Suspend, Reevaluate, and QueryInterface. - Note: The relaxed ordering was technically thread-safe because this load always follows a release store on the exact same transition thread, but explicit acquire semantics improve code readability and maintainability.

Destroy() is a purely administrative operation — it removes the service from the map without deactivating it. The caller is responsible for deactivating the plugin first. Make that precondition explicit so violations are caught immediately in debug builds.

- Add DESTROYED to the STATES section including tombstone precondition - Add Resume and Suspend to the public triggers list in REENTRANCY - Add ServiceMap::_adminLock as level 0 in the lock ordering, above _transitionLock, documenting the hold during Tombstone() - Remove stale reference to TransitionYield, replaced by TransitionSuspender

Both were left over from the original monolithic Activate/Deactivate methods. Wakeup() is now only reachable via HibernatedState, making the HIBERNATED guard dead code. State(ACTIVATED) was redundant since both callers already call SetState(_stateActivated) after Wakeup() returns. Add ASSERT guards consistent with other work methods.

The re-entrancy guard used a thread id compared against 0 as a "no thread" sentinel. pthread_t has no portable null value, so this relied on a glibc/musl assumption. It also lived behind ASSERT, leaving release builds with no protection at all, while _transitionLock is recursive and does not block same-thread re-entry on its own. Split the guard into an active flag plus an owner id so the id never needs a sentinel, and turn re-entry detection into real control flow: public triggers now return ERROR_ILLEGAL_STATE on same-thread re-entry in every build. Wrap ownership in a TransitionScope RAII helper so the flag is cleared even if a state method throws.

The state checks after HibernateProcess() guarded against a concurrent Wakeup() or Deactivate() changing state mid-hibernation. Under the StateMachine model _transitionLock is held for the whole transition, so no concurrent transition can run and these checks can no longer fire. In Hibernate() the check was a standalone guard, now an ASSERT so the invariant stays verifiable in debug without a dead branch or an empty lock/unlock pair in release. In HibernateChildren() the loop break is kept as a visible abort path but no longer takes Service::Lock() to read State(), consistent with the other lock-free State() reads in this file.

The state classes and their base were public but are never named outside the StateMachine. Make the whole hierarchy private — it is an implementation detail. SetState() is only called by the state classes, so move it to private as well. Drop Current(), which had no callers; state is already reachable through Service::State(). IsTransitionThread() and the triggers stay public: the work methods in the cpp assert on the former and Service drives the latter. No behaviour change.

Configure kernel core dumps to /tmp/cores, extract backtraces with gdb, and upload as artifacts. Use always() condition to capture crashes even when test runner tolerates failures with || true.

When the Controller deactivates during shutdown, the DEACTIVATED callback fires a statechange notification after UnloadPlugin() has already released _handler. Server::Notification then asserted on ClassType<Controller>() being non-null, which is null at that point, aborting the process. The function body already guards on controller != nullptr and treats it as a no-op, so a null handler here is a legal shutdown state, not an error. Drop that half of the assert and keep only the real invariant (_controller.IsValid()). StateControlStateChange is left unchanged: it dereferences controller unconditionally and its callback is unregistered in Detach() before teardown, so a null there would be a genuine bug.

The || true was added to let the backtrace step run while the test was crashing during teardown. With the crash fixed and the backtrace step on always(), it is no longer needed and would mask any future abort by keeping the job green. Drop it so the job fails honestly on a crash.

…factor ActivatedState

…t the current state's

…gin-server-service-state-machine

bramoosterhuis added 29 commits May 12, 2026 21:34

Merge branch 'master' into development/plugin-server-service-state-ma…

ca970cd

…chine

Use RAII to safely yield transition ownership in Evaluate()

1eb1ac3

Merge branch 'master' into development/plugin-server-service-state-ma…

9311f22

…chine

Refactor formatting in PluginServer to improve code readability

15f8485

ci: Enable coredump and backtrace collection for smoke test debugging

2913049

Configure kernel core dumps to /tmp/cores, extract backtraces with gdb, and upload as artifacts. Use always() condition to capture crashes even when test runner tolerates failures with || true.

fix: Correct core pattern configuration for core dumps in smoke test

ed8904c

fix: Update core pattern configuration for core dumps in smoke test

d54e7af

fix: Set state to Deactivated in activation path for PluginServer

8d17011

feat: Add ReentrantControl header for state machine testing

e9e4505

fix regression: Implement QueryInterface for DeactivationState and re…

14d5b7a

…factor ActivatedState

fix regression: evaluate both conditions on every Reevaluate, not jus…

89c97a6

…t the current state's

Disable all hibernation logic if not enabled

3d43fed

Disable all suspend and resume functionality in state machine.

d425116

Tests: add StateMachine lifecycle characterization suite

f568c6d

refactor the cmake coverage logic to modern standards

3c89704

Merge remote-tracking branch 'RDKCentral/master' into development/plu…

bd615b8

…gin-server-service-state-machine

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Development/plugin server service state machine#2162

Development/plugin server service state machine#2162
bramoosterhuis wants to merge 29 commits into
masterfrom
development/plugin-server-service-state-machine

bramoosterhuis commented Jun 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

bramoosterhuis commented Jun 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant