RDKB-64847: Observed spike in load and CPU due to cpu_telemetry2_0 #384

Open
tabbas651 wants to merge 3 commits into support/1.8 from RDKB-64847_copy

@tabbas651

Reason for change: When DCA top/process marker sampling (processTopPattern) runs concurrently with telemetry HTTP uploads (curl), the two CPU-intensive paths compete for resources, causing WAN timeouts, FD exhaustion, and failed report uploads on resource-constrained devices. This change introduces a CPU contention avoidance window that defers new curl handle acquisitions while DCA top pattern collection is active, so the two CPU-intensive telemetry paths no longer run in parallel.

The upload-side deferral has no hard timeout: DCA always performs finite work and will always release the sampling window. The pool_shutting_down flag serves as the escape hatch for process restart scenarios.

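The upload-side behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code from multicurlinterface.c: the helper name sketch_wait_for_sampling_window() is hypothetical, the state variables are local stand-ins for the pool's real state, and the 100 ms value for POOL_ACQUIRE_RETRY_MS is taken from the change list in this description.

```c
#include <pthread.h>
#include <stdbool.h>
#include <unistd.h>

/* Retry interval; the 100 ms value comes from the PR description. */
#define POOL_ACQUIRE_RETRY_MS 100

/* Local stand-ins for the pool state (assumed layout, not the real one). */
static pthread_mutex_t pool_mutex = PTHREAD_MUTEX_INITIALIZER;
static int  sampling_window_refcount = 0;  /* > 0 while DCA is sampling   */
static bool pool_shutting_down = false;    /* escape hatch on restart     */

/*
 * Hypothetical helper: blocks while a sampling window is open.
 * Returns 0 when no window is open, or -1 if the pool is shutting down.
 * There is deliberately no timeout; only shutdown breaks the loop.
 * A real acquire path would keep pool_mutex held while claiming a handle;
 * this sketch only shows the wait decision.
 */
int sketch_wait_for_sampling_window(void)
{
    for (;;)
    {
        pthread_mutex_lock(&pool_mutex);
        if (pool_shutting_down)
        {
            pthread_mutex_unlock(&pool_mutex);
            return -1;
        }
        if (sampling_window_refcount == 0)
        {
            pthread_mutex_unlock(&pool_mutex);
            return 0;
        }
        /* Window is open: drop the lock before sleeping, then re-check. */
        pthread_mutex_unlock(&pool_mutex);
        usleep(POOL_ACQUIRE_RETRY_MS * 1000);
    }
}
```

Checking pool_shutting_down before the refcount means a shutdown request wins even if a sampling window is never closed, which is the role the description assigns to that flag.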
Changes:

  • Added http_pool_begin_sampling_window() / http_pool_end_sampling_window()
    with refcount-based signaling in multicurlinterface.c
  • Added active_requests tracking with overflow/underflow guards
  • DCA (dca.c) calls begin/end around processTopPattern() marker loop
    with a best-effort 2000ms drain wait for in-flight uploads
  • Upload thread (acquire_pool_handle) defers when refcount > 0, retrying
    every 100ms until DCA completes; there is no timeout and no upload failure

Test Procedure: Performance testing. Triggered simultaneous DCA + upload
with kill -12 && kill -10 && kill -29; verified no CPU spike
(cpu_telemetry2_0=0.0%), no handle acquisition failure, and all reports
uploaded successfully (HTTP 200) across 4 test scenarios.
Risks: Medium
Priority: P0
Signed-off-by: Thamim Razith Abbas Ali <tabbas651@comcast.com>
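For readers following the change list, the DCA-side half (refcounted begin/end, guarded active_requests counting, and the best-effort drain wait) might look roughly like the sketch below. All function names carry a sketch_ prefix because they are hypothetical and not the http_pool_* functions from this PR; the 2000 ms drain budget is from the description, while the 50 ms poll interval is an assumption.

```c
#include <limits.h>
#include <pthread.h>
#include <unistd.h>

#define SAMPLING_DRAIN_WAIT_MS 2000 /* best-effort budget from the PR text */
#define DRAIN_POLL_MS          50   /* assumed poll interval               */

/* Local stand-ins for the pool state (assumed, not the real layout). */
static pthread_mutex_t pool_mutex = PTHREAD_MUTEX_INITIALIZER;
static int sampling_window_refcount = 0;
static int active_requests = 0;

/* Upload path: count an in-flight request, guarding against overflow. */
void sketch_request_started(void)
{
    pthread_mutex_lock(&pool_mutex);
    if (active_requests < INT_MAX)
        active_requests++;
    pthread_mutex_unlock(&pool_mutex);
}

/* Upload path: uncount a request, guarding against underflow. */
void sketch_request_finished(void)
{
    pthread_mutex_lock(&pool_mutex);
    if (active_requests > 0)
        active_requests--;
    pthread_mutex_unlock(&pool_mutex);
}

/* DCA path: open the window, then wait (bounded) for uploads to drain. */
void sketch_begin_sampling_window(void)
{
    int waited_ms = 0;

    pthread_mutex_lock(&pool_mutex);
    if (sampling_window_refcount < INT_MAX)
        sampling_window_refcount++;
    pthread_mutex_unlock(&pool_mutex);

    /* Best effort only: sampling proceeds once the budget is spent. */
    while (waited_ms < SAMPLING_DRAIN_WAIT_MS)
    {
        pthread_mutex_lock(&pool_mutex);
        int in_flight = active_requests;
        pthread_mutex_unlock(&pool_mutex);

        if (in_flight == 0)
            break;
        usleep(DRAIN_POLL_MS * 1000);
        waited_ms += DRAIN_POLL_MS;
    }
}

/* DCA path: close the window; an unbalanced call is ignored. */
void sketch_end_sampling_window(void)
{
    pthread_mutex_lock(&pool_mutex);
    if (sampling_window_refcount > 0)
        sampling_window_refcount--;
    pthread_mutex_unlock(&pool_mutex);
}
```

A caller would wrap the marker loop as sketch_begin_sampling_window(); ...collect top/process markers...; sketch_end_sampling_window();. Using a refcount rather than a boolean lets nested or overlapping sampling passes share one window without the first end call reopening uploads early.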

tabbas651 added 3 commits June 9, 2026 19:24
Reason for change: When DCA top/process marker sampling (processTopPattern)
runs concurrently with telemetry HTTP uploads (curl), both CPU-intensive
paths compete for resources causing WAN timeouts, FD exhaustion, and failed
report uploads on resource-constrained devices. Introduced a CPU contention
avoidance window that defers new curl handle acquisitions while DCA top
pattern collection is active, preventing two CPU-intensive telemetry paths
from running in parallel.
Test Procedure: performance testing
Risks: Medium
Priority: P0
Signed-off-by: Thamim Razith Abbas Ali <tabbas651@comcast.com>
Copilot AI review requested due to automatic review settings June 10, 2026 03:14
@tabbas651 tabbas651 requested a review from a team as a code owner June 10, 2026 03:14

Copilot AI left a comment

Pull request overview

This PR introduces a “CPU contention avoidance window” to prevent DCA top/process sampling from overlapping with telemetry HTTP upload work (curl), reducing CPU spikes and associated upload failures/timeouts on resource-constrained devices.

Changes:

  • Added sampling-window APIs (http_pool_begin_sampling_window / http_pool_end_sampling_window) and refcount signaling to defer new curl handle acquisitions during DCA sampling.
  • Added active_requests underflow protection and sampling-start drain wait (best-effort) before sampling begins.
  • Wrapped DCA processTopPattern() marker processing with sampling-window begin/end calls.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| source/protocol/http/multicurlinterface.h | Exposes new sampling-window APIs for coordinating with DCA sampling. |
| source/protocol/http/multicurlinterface.c | Implements sampling-window refcount + deferral in handle acquisition and improves active_requests guards. |
| source/dcautil/Makefile.am | Adds include path for HTTP protocol headers (but currently missing required link to libhttp.la). |
| source/dcautil/dca.c | Enters/exits sampling window around top/process marker collection to defer concurrent curl work. |

Comment on lines +511 to +517

    if (sampling_window_refcount > 0)
    {
        pthread_mutex_unlock(&pool_mutex);

        usleep(POOL_ACQUIRE_RETRY_MS * 1000);
        continue;
    }

Comment on lines 35 to 39

    -I${top_srcdir}/source/utils \
    -I${top_srcdir}/source/bulkdata \
    -I${top_srcdir}/source/protocol/http \
    -I${PKG_CONFIG_SYSROOT_DIR}$(includedir)/ccsp
