Topic/communication engine as a component #782

Draft
bosilca wants to merge 5 commits into ICLDisco:master from bosilca:topic/communication_engine_as_a_component

bosilca commented May 21, 2026

Here we are, years in the making!

bosilca added 5 commits May 21, 2026 01:49
Introduce a comm MCA framework and move the existing funnelled MPI
communication engine under the new comm/mpi component. Select the
communication backend through MCA while preserving MPI as the default
backend and keeping the existing parsec_comm_engine_t callback interface.
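
As a rough illustration of name-based backend selection with MPI as the default, here is a minimal sketch. Everything prefixed with `sketch_` is hypothetical and exists only for this example; the real MCA framework and `parsec_comm_engine_t` carry far more state than shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a comm component descriptor. */
typedef struct sketch_comm_component_s {
    const char *name;     /* name used for selection, e.g. "mpi" */
    int (*init)(void);    /* returns 0 on success */
} sketch_comm_component_t;

/* Placeholder init functions; a real backend would set up its transport. */
static int sketch_mpi_init(void) { return 0; }
static int sketch_ucx_init(void) { return 0; }

static const sketch_comm_component_t sketch_components[] = {
    { "mpi", sketch_mpi_init },
    { "ucx", sketch_ucx_init },
};

/* Return the component named by `requested`. A NULL request selects the
 * "mpi" default. An unknown name returns NULL instead of silently
 * falling back to another backend. */
const sketch_comm_component_t *sketch_comm_select(const char *requested)
{
    const char *wanted = (NULL != requested) ? requested : "mpi";
    size_t n = sizeof(sketch_components) / sizeof(sketch_components[0]);
    for (size_t i = 0; i < n; i++) {
        assert(NULL != sketch_components[i].init);
        if (0 == strcmp(sketch_components[i].name, wanted)) {
            return &sketch_components[i];
        }
    }
    return NULL;
}
```

Returning NULL for an unknown name is a choice made for this sketch; how the actual framework reports a failed selection is not described in this commit message.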

Rename the remote dependency protocol implementation from remote_dep_mpi.c
to remote_dep_comm.c and remove direct MPI usage from that layer. The
remote dependency code now uses the selected communication engine callbacks
for AM, pack/unpack, memory registration, progress, GET, and PUT operations.

Move MPI-specific startup validation and thread-level capability detection
into the MPI comm backend. Advertise backend multithread support through
parsec_ce.capabilites.multithreaded, and let the remote dependency layer use
that generic capability instead of querying MPI directly.

Update remote dependency comments, debug/profiling names, and protocol helper
names to avoid MPI-specific wording where the logic is transport-neutral.
Keep datatype matching outside of the future datatype engine; parsec_type_match
remains a generic compatibility helper.

Signed-off-by: George Bosilca <gbosilca@nvidia.com>

Introduce a datatype module interface used by the public parsec_type_* API and
install the selected implementation during communication engine initialization.

Move the MPI datatype implementation under the MPI comm component, and keep the
basic no-MPI implementation as a fallback datatype module. Leave
parsec_type_match() as a generic helper outside the datatype backend, since it
only checks compatibility/equality and does not require transport-specific
layout handling.

Update build wiring and datatype documentation accordingly.

Signed-off-by: George Bosilca <gbosilca@nvidia.com>

Add an optional UCX communication engine component bootstrapped through PMIx.
The UCX backend is currently enabled only for non-MPI distributed builds,
because MPI builds still expose MPI_Datatype through the public datatype ABI.

The new backend provides the first CPU-contiguous transport path: PMIx rank and
worker-address exchange, UCX endpoint creation, active messages, CPU memory
registration, rkey exchange, and PUT/GET support. Non-contiguous datatype
movement, reshape, taskpool-id synchronization, and device-memory support remain
explicitly unsupported for now.

Move taskpool-id synchronization behind the communication-engine vtable. The MPI
backend keeps the current MPI_Allreduce implementation, while the UCX backend
returns PARSEC_ERR_NOT_IMPLEMENTED as a placeholder for a future UCX/PMIx
implementation.

Add a UCX set_ctx path that accepts an application-owned UCX context and worker.
PaRSEC does not take ownership of those handles, but performs its late setup on
top of them: worker-address publication, endpoint creation, and active-message
handler registration.

Also extend the basic non-MPI datatype backend so it records simple derived
datatype size, extent, and contiguity information, which gives the UCX backend a
minimal datatype representation for CPU-contiguous paths.
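
To make "size, extent, and contiguity" concrete, the sketch below records those three values for contiguous and vector-style derived types. The `sketch_` names and the struct layout are assumptions for illustration, not the backend's actual representation. The extent formula follows the usual MPI_Type_vector convention for forward strides, with stride and block length counted in elements of the old type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal datatype record. */
typedef struct sketch_dtt_s {
    size_t size;     /* bytes of actual data */
    size_t extent;   /* span from first to last byte, in bytes */
    int contiguous;  /* nonzero when the data has no holes */
} sketch_dtt_t;

/* A basic type of `bytes` bytes is always contiguous. */
sketch_dtt_t sketch_type_basic(size_t bytes)
{
    sketch_dtt_t t = { bytes, bytes, 1 };
    return t;
}

/* `count` copies of `old`, laid out back to back. */
sketch_dtt_t sketch_type_contiguous(size_t count, sketch_dtt_t old)
{
    sketch_dtt_t t;
    t.size = count * old.size;
    t.extent = count * old.extent;
    t.contiguous = old.contiguous && (old.size == old.extent);
    return t;
}

/* `count` blocks of `blocklen` elements, block starts `stride` elements apart. */
sketch_dtt_t sketch_type_vector(size_t count, size_t blocklen,
                                size_t stride, sketch_dtt_t old)
{
    sketch_dtt_t t;
    assert(stride >= blocklen);  /* sketch handles non-overlapping blocks only */
    t.size = count * blocklen * old.size;
    t.extent = (0 == count) ? 0 : ((count - 1) * stride + blocklen) * old.extent;
    /* No holes when there is at most one block or the blocks touch. */
    t.contiguous = old.contiguous && (count <= 1 || stride == blocklen);
    return t;
}
```

A transport limited to CPU-contiguous paths could then check the `contiguous` flag and reject anything else, which matches the restriction described above.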

Signed-off-by: George Bosilca <gbosilca@nvidia.com>

Allow the MPI communication backend to initialize MPI from parsec_init() when
the application has not already done so. Track whether PaRSEC owns that
initialization so parsec_fini() only finalizes MPI when the backend performed
the matching init.
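
The ownership rule can be sketched as follows. A mock runtime stands in for MPI so the example builds anywhere; in the MPI backend the corresponding calls would presumably be MPI_Initialized, MPI_Init_thread, and MPI_Finalize. All `sketch_` and `mock_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Mock process runtime standing in for MPI. */
static int mock_initialized = 0;

void mock_runtime_init(void)           { mock_initialized = 1; }
void mock_runtime_finalize(void)       { mock_initialized = 0; }
int  mock_runtime_is_initialized(void) { return mock_initialized; }

/* Hypothetical backend state: remembers who initialized the runtime. */
typedef struct sketch_backend_s {
    int owns_runtime;
} sketch_backend_t;

/* Initialize the runtime only if the application has not done so. */
void sketch_backend_init(sketch_backend_t *b)
{
    assert(NULL != b);
    b->owns_runtime = 0;
    if (!mock_runtime_is_initialized()) {
        mock_runtime_init();
        b->owns_runtime = 1;
    }
}

/* Finalize the runtime only if this backend performed the matching init. */
void sketch_backend_fini(sketch_backend_t *b)
{
    assert(NULL != b);
    if (b->owns_runtime) {
        mock_runtime_finalize();
        b->owns_runtime = 0;
    }
}
```

With this rule, an application that initialized the runtime itself keeps control of its lifetime, while an application that never touched it gets a matched init and finalize from the backend.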

Populate the context rank and size immediately after MPI becomes available,
including the backend-owned initialization path, and keep PARSEC_CONTEXT_QUERY_NODES
reporting the context value once the communication engine has been initialized.

Document that the selected communication backend may initialize and finalize an
external process runtime on behalf of PaRSEC, and encourage callers to query
rank and size through the PaRSEC context instead of assuming MPI_COMM_WORLD.

Signed-off-by: George Bosilca <gbosilca@nvidia.com>

The test suite was still largely written around the assumption that every
distributed-capable run starts with MPI_Init/MPI_Init_thread and discovers
rank/size directly from MPI_COMM_WORLD.  That makes the tests awkward for the
new communication-engine component work, where MPI remains the default backend
but other backends, such as UCX bootstrapped through PMIx, need to run the same
tests without exposing MPI to test code.

Add a small tests_runtime_common helper library and route common test runtime
operations through it.  The helper initializes PaRSEC with parsec_init(),
retrieves rank and world size through parsec_context_query(), finalizes through
parsec_fini(), and validates requested MPI thread support when the selected
backend is MPI-backed.

Add test wrappers for the remaining small pieces of process-runtime behavior
that tests need: barrier, abort, and allreduce.  The MPI implementation maps
these to MPI_COMM_WORLD collectives.  Non-MPI single-process runs get useful
local behavior where possible, while unsupported multi-process non-MPI paths
return PARSEC_ERR_NOT_IMPLEMENTED instead of silently pretending that a
collective completed.
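
A sketch of the non-MPI side of such wrappers is shown below. The function names, signatures, and error values are hypothetical stand-ins, not the ones in tests_runtime_common, and the MPI-backed path is left out so the example builds without MPI.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in return codes for this sketch only. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_NOT_IMPLEMENTED = -1 };

/* Non-MPI barrier: trivially complete with one process, otherwise
 * report that no implementation exists. */
int sketch_tests_barrier(int world_size)
{
    return (1 == world_size) ? SKETCH_SUCCESS : SKETCH_ERR_NOT_IMPLEMENTED;
}

/* Non-MPI sum allreduce over `count` ints. With a single process the
 * reduction result is the local contribution, so copy it through. With
 * more processes, leave `out` untouched and report the missing support. */
int sketch_tests_allreduce_sum_int(int world_size, const int *in,
                                   int *out, size_t count)
{
    assert(NULL != in && NULL != out);
    if (1 != world_size) {
        return SKETCH_ERR_NOT_IMPLEMENTED;
    }
    for (size_t i = 0; i < count; i++) {
        out[i] = in[i];
    }
    return SKETCH_SUCCESS;
}
```

Returning an error in the multi-process case lets a test fail or skip visibly rather than continue with values that were never reduced.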

Rework tests/tests_timing.h so timing helpers take a PaRSEC context, use the
test barrier wrapper, and no longer override exit() with MPI_Abort.  This keeps
timed tests usable with non-MPI communication backends while preserving real
barriers for MPI-backed distributed runs.

Convert the broad init-only test population away from direct MPI calls.  This
covers API tests, many PTG and DTD tests, application tests, collection tests,
profiling tests, CUDA runtime tests, and scheduling tests.  These tests now
include tests/tests_runtime.h, link against tests_runtime_common, initialize
through parsec_tests_context_init(), and finalize through
parsec_tests_context_fini().

Convert simple MPI collectives in tests to the new wrappers where they do not
depend on MPI-specific communicator behavior.  This includes reductions in PTG
checks, reshape checks, branch/count validation, CUDA best-device validation,
and selected redistribute checks.  The MAXLOC case is represented explicitly as
PARSEC_TESTS_REDUCE_MAXLOC_INT so tests that used MPI_2INT/MPI_MAXLOC keep the
same semantics.
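
For reference, the MPI_2INT/MPI_MAXLOC semantics being preserved are: the larger value wins, and on equal values the smaller index wins. The sketch below applies that rule to an array of (value, index) pairs; the `sketch_` names are hypothetical and say nothing about how PARSEC_TESTS_REDUCE_MAXLOC_INT is implemented.

```c
#include <assert.h>
#include <stddef.h>

/* (value, index) pair, laid out like the pairs used with MPI_2INT. */
typedef struct sketch_int_pair_s {
    int value;
    int index;
} sketch_int_pair_t;

/* Combine two pairs with the MAXLOC rule. */
sketch_int_pair_t sketch_maxloc(sketch_int_pair_t a, sketch_int_pair_t b)
{
    if (a.value != b.value) {
        return (a.value > b.value) ? a : b;
    }
    return (a.index <= b.index) ? a : b;  /* tie: smaller index wins */
}

/* Reduce `n` contributions, one per process in a real run. */
sketch_int_pair_t sketch_maxloc_reduce(const sketch_int_pair_t *contrib, size_t n)
{
    assert(NULL != contrib && n > 0);
    sketch_int_pair_t acc = contrib[0];
    for (size_t i = 1; i < n; i++) {
        acc = sketch_maxloc(acc, contrib[i]);
    }
    return acc;
}
```

Because the rule is associative and commutative, the result does not depend on the order in which contributions are combined.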

Update CMake wiring so all converted tests link with tests_runtime_common.
Keep tests that still genuinely exercise MPI-specific behavior in MPI-only
build/test groups.  In particular, multichain, haar_tree, and redistribute are
only built and tested when MPI_C_FOUND is available, because they still use
MPI communicators or MPI message-passing routines directly.

Replace incidental MPI datatype queries in scheduling test setup with PaRSEC
datatype helpers, so tests that do not communicate data are not tied to MPI just
to compute a datatype extent.

This prepares the test suite for selectable communication backends: ordinary
tests now ask PaRSEC for process identity and synchronization services, while
the few remaining MPI-specific tests are explicitly marked as such.

Signed-off-by: George Bosilca <gbosilca@nvidia.com>
bosilca requested a review from a team as a code owner May 21, 2026 06:41
bosilca marked this pull request as draft May 21, 2026 06:42