Skip to content

Harden runtime task execution and offline-storage edge cases#1495

Open
bmehta001 wants to merge 4 commits into
microsoft:mainfrom
bmehta001:bhamehta/runtime-correctness-fixes
Open

Harden runtime task execution and offline-storage edge cases#1495
bmehta001 wants to merge 4 commits into
microsoft:mainfrom
bmehta001:bhamehta/runtime-correctness-fixes

Conversation

@bmehta001

Copy link
Copy Markdown
Contributor

Summary

Latent correctness/robustness fixes found during a repo-wide review. Each is small, surgical, and matches existing patterns in the surrounding code.

# Fix Why it matters
1 WorkerThread: wrap (*item)() in try/catch A queued task runs arbitrary work (storage I/O, HTTP encode, user DebugEventListener callbacks). An exception escaping the loop unwinds out of the thread entry function → std::terminate → host process crash. Now contained and logged.
2 TaskDispatcher_CAPI: exception barrier around (*m_task)() Same hazard on the host's external dispatcher thread.
3 capi.cpp mat_open_core: set ctx->handle on the EALREADY path When a guest instance with the same config is already open, the function returned without setting ctx->handle, leaving the caller with an uninitialized/stale handle. Now returns the existing instance's handle.
4 OfflineStorageHandler::Flush: null-guard m_offlineStorageMemory->GetSize() The block right below already null-checks the pointer; reading it first was inconsistent and a potential null deref.
5 OfflineStorageHandler::StoreRecord: per-instance RAM cache limit The cache size limit was held in a function-local static, so the first LogManager's CFG_INT_RAM_QUEUE_SIZE leaked into every other LogManager instance. Now computed once per instance in Initialize() into a member (preserving the original "compute once" intent).

Tests added

  • PalTests.WorkerThreadContainsThrowingTask — tasks throwing std/non-std exceptions do not tear down the worker thread; follow-up tasks still run.
  • TaskDispatcherCAPITests.ExecuteCallbackThatThrowsIsContained — a throwing CAPI task callback does not propagate back into the host dispatcher thread.

Validation

  • WSL (Linux) UnitTests: 525/525 pass, including the two new tests.
  • Targeted suites (PalTests, TaskDispatcherCAPITests, OfflineStorageTests*, MemoryStorage*) all green.

These are latent correctness/robustness fixes found during a repo-wide review.

WorkerThread.cpp: wrap the queued-task invocation `(*item)()` in try/catch.
A task runs arbitrary work (storage I/O, HTTP encode, user DebugEventListener
callbacks); an exception escaping the loop unwinds out of the thread entry
function and calls std::terminate, killing the host process. Contain it and log.

TaskDispatcher_CAPI.cpp: same exception barrier around `(*m_task)()` in
Task_CAPI::OnCallback(), which runs on the host's external dispatcher thread.

capi.cpp (mat_open_core): on the EALREADY path (a guest instance with the same
config is already open) set ctx->handle to the existing instance's handle before
returning, so the caller is not left with an uninitialized/stale handle.

OfflineStorageHandler.cpp (Flush): null-guard m_offlineStorageMemory->GetSize()
read; the subsequent block already null-checks the pointer, so reading it first
was inconsistent and a potential null deref.

OfflineStorageHandler (StoreRecord): the per-instance RAM cache size limit was
held in a function-local `static`, so the first LogManager's CFG_INT_RAM_QUEUE_SIZE
leaked to every other LogManager instance. Compute it once per instance in
Initialize() into a member (preserving the original "compute once" intent).

Tests:
- PalTests.WorkerThreadContainsThrowingTask: a task that throws std/non-std
  exceptions does not tear down the worker thread; follow-up tasks still run.
- TaskDispatcherCAPITests.ExecuteCallbackThatThrowsIsContained: a throwing CAPI
  task callback does not propagate back into the host dispatcher thread.

Files changed:
- lib/pal/WorkerThread.cpp
- lib/pal/TaskDispatcher_CAPI.cpp
- lib/api/capi.cpp
- lib/offline/OfflineStorageHandler.cpp
- lib/offline/OfflineStorageHandler.hpp
- tests/unittests/PalTests.cpp
- tests/unittests/TaskDispatcherCAPITests.cpp

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@bmehta001 bmehta001 requested a review from a team as a code owner June 22, 2026 20:05
@bmehta001 bmehta001 requested a review from Copilot June 22, 2026 21:28

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens task execution and offline-storage edge cases to prevent unhandled exceptions from terminating host processes, fix a C API handle-return correctness gap, and correct per-instance offline RAM cache configuration behavior.

Changes:

  • Add exception containment around queued task execution in WorkerThread and TaskDispatcher_CAPI, with new unit tests covering throwing tasks.
  • Fix mat_open_core to return a usable handle on the EALREADY path.
  • Fix offline storage robustness/config correctness by null-guarding memory storage size access and making RAM cache limits per-instance.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/unittests/TaskDispatcherCAPITests.cpp Adds a unit test ensuring a throwing CAPI-dispatched task callback is contained.
tests/unittests/PalTests.cpp Adds a unit test ensuring worker thread survives tasks that throw std and non-std exceptions.
lib/pal/WorkerThread.cpp Wraps task execution in try/catch and logs exceptions to prevent thread unwind/terminate.
lib/pal/TaskDispatcher_CAPI.cpp Adds an exception barrier around externally-dispatched tasks to prevent propagation into host threads.
lib/offline/OfflineStorageHandler.hpp Introduces a per-instance member to hold the RAM cache size limit.
lib/offline/OfflineStorageHandler.cpp Fixes a potential null deref in Flush() and removes cross-instance leakage of RAM cache limit configuration.
lib/api/capi.cpp Ensures mat_open_core sets ctx->handle when returning EALREADY.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread lib/api/capi.cpp
Comment on lines +94 to 98
// Guest instance with the same config is already open.
// Return its handle so the caller still gets a usable handle
// (rather than leaving ctx->handle uninitialized).
ctx->handle = code;
return EALREADY;
Comment on lines +42 to +46
try {
(*m_task)();
}
catch (...) {
}
capi.cpp (mat_open_core, EALREADY path): also set ctx->result to match the
returned status. The success path sets ctx->result before returning; the
EALREADY early-return set ctx->handle but left ctx->result stale, inconsistent
with the other entrypoints.

TaskDispatcher_CAPI.cpp (Task_CAPI::OnCallback): log the contained exception
instead of swallowing it silently, mirroring WorkerThread, so host apps/SDK logs
can diagnose why a queued task failed. Added <exception> include for
std::exception.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Comment thread lib/pal/WorkerThread.cpp
Comment on lines +248 to +252
catch (const std::exception& ex) {
LOG_ERROR("Unhandled exception in worker task: %s", ex.what());
}
catch (...) {
LOG_ERROR("Unhandled non-standard exception in worker task");
Comment thread lib/api/capi.cpp
Comment on lines +94 to +98
// Guest instance with the same config is already open.
// Return its handle so the caller still gets a usable handle
// (rather than leaving ctx->handle uninitialized), and set
// ctx->result to match the returned status like the other paths.
ctx->handle = code;
WorkerThread.cpp: add explicit #include <exception>; the round-1 catch block
uses std::exception but the TU only relied on transitive includes.

capi.cpp (mat_open_core): the early error returns (invalid config, and the two
HttpClient/TaskDispatcher creation catch blocks) left ctx->result and ctx->handle
untouched. Since the public mat.h inline helpers return ctx.handle after calling
in, callers could observe a stale/uninitialized handle on error. Set
ctx->result = EFAULT and ctx->handle = 0 on all three paths (ctx is guaranteed
non-null by mat_open / mat_open_with_params).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Comment thread lib/api/capi.cpp
Comment on lines 143 to 158
#if !defined (ANDROID) || defined(ENABLE_CAPI_HTTP_CLIENT)
// Create custom HttpClient
if (httpSendFn != nullptr && httpCancelFn != nullptr)
{
try
{
auto http = std::make_shared<HttpClient_CAPI>(httpSendFn, httpCancelFn);
clients[code].http = http;
clients[code].config.AddModule(CFG_MODULE_HTTP_CLIENT, http);
}
catch (...)
{
ctx->result = static_cast<evt_status_t>(EFAULT);
ctx->handle = 0;
return EFAULT;
}
Comment thread lib/api/capi.cpp
Comment on lines 161 to 175
// Create custom worker thread
if (taskDispatcherQueueFn != nullptr && taskDispatcherCancelFn != nullptr && taskDispatcherJoinFn != nullptr)
{
try
{
auto taskDispatcher = std::make_shared<PAL::TaskDispatcher_CAPI>(taskDispatcherQueueFn, taskDispatcherCancelFn, taskDispatcherJoinFn);
clients[code].taskDispatcher = taskDispatcher;
clients[code].config.AddModule(CFG_MODULE_TASK_DISPATCHER, taskDispatcher);
}
catch (...)
{
ctx->result = static_cast<evt_status_t>(EFAULT);
ctx->handle = 0;
return EFAULT;
}
capi.cpp (mat_open_core): if HttpClient_CAPI or TaskDispatcher_CAPI construction
threw, clients[code] had already been populated (config + ctx_data assigned), but
the catch returned EFAULT without removing it. That left an orphaned, half-
initialized entry in the global clients map, so a later open with the same config
would match ctx_data and wrongly return EALREADY. Call remove_client(code) on
both creation-failure paths before returning EFAULT.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants