Topic/communication engine as a component by bosilca · Pull Request #782 · ICLDisco/parsec

bosilca · 2026-05-21T06:41:21Z

Here we are, years in the making!

Introduce a comm MCA framework and move the existing funnelled MPI communication engine under the new comm/mpi component. Select the communication backend through MCA while preserving MPI as the default backend and keeping the existing parsec_comm_engine_t callback interface.

Rename the remote dependency protocol implementation from remote_dep_mpi.c to remote_dep_comm.c and remove direct MPI usage from that layer. The remote dependency code now uses the selected communication engine callbacks for active messages (AM), pack/unpack, memory registration, progress, GET, and PUT operations.

Move MPI-specific startup validation and thread-level capability detection into the MPI comm backend. Advertise backend multithread support through parsec_ce.capabilites.multithreaded, and let the remote dependency layer use that generic capability instead of querying MPI directly.

Update remote dependency comments, debug/profiling names, and protocol helper names to avoid MPI-specific wording where the logic is transport-neutral. Keep datatype matching outside of the future datatype engine; parsec_type_match remains a generic compatibility helper.

Signed-off-by: George Bosilca <gbosilca@nvidia.com>
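
To make the selection model concrete, here is a minimal C sketch of picking a backend by name with MPI as the default, and of the remote-dependency layer consulting a generic capability flag instead of asking MPI. Every name in it (comm_engine_t, comm_select, remote_dep_use_thread_multiple, the CE_* codes) is hypothetical, and the capability values are chosen only for illustration; it does not reproduce PaRSEC's MCA machinery or the real parsec_comm_engine_t layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical return codes; PaRSEC uses its own PARSEC_SUCCESS / PARSEC_ERR_* values. */
enum { CE_SUCCESS = 0, CE_ERR_NOT_FOUND = -1 };

/* Capability flags advertised by a backend (modelled loosely on parsec_ce.capabilites). */
typedef struct {
    int multithreaded; /* nonzero if the backend may be driven from several threads */
} ce_capabilities_t;

/* Hypothetical, heavily reduced communication-engine descriptor. */
typedef struct {
    const char *name;
    ce_capabilities_t capabilities;
    int (*progress)(void);
} comm_engine_t;

static int mpi_progress(void) { return 0; } /* placeholder progress callbacks */
static int ucx_progress(void) { return 0; }

/* Component table: the first entry is the default. Flag values are illustrative only. */
static const comm_engine_t components[] = {
    { "mpi", { 1 }, mpi_progress },
    { "ucx", { 0 }, ucx_progress },
};

/* Select a component by name; NULL or "" selects the default component. */
static int comm_select(const char *requested, const comm_engine_t **out)
{
    assert(out != NULL);
    if (requested == NULL || requested[0] == '\0') {
        *out = &components[0];
        return CE_SUCCESS;
    }
    for (size_t i = 0; i < sizeof(components) / sizeof(components[0]); i++) {
        if (strcmp(components[i].name, requested) == 0) {
            *out = &components[i];
            return CE_SUCCESS;
        }
    }
    *out = NULL;
    return CE_ERR_NOT_FOUND;
}

/* Remote-dependency side: consult the generic capability, never the transport library. */
static int remote_dep_use_thread_multiple(const comm_engine_t *ce)
{
    assert(ce != NULL);
    return ce->capabilities.multithreaded != 0;
}
```

Keeping the flag in the engine descriptor means the remote-dependency code asks the same question whatever the backend; how the MPI component fills it (for example from the thread level it detects at startup) stays inside that component.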

Introduce a datatype module interface used by the public parsec_type_* API and install the selected implementation during communication engine initialization. Move the MPI datatype implementation under the MPI comm component, and keep the basic no-MPI implementation as a fallback datatype module.

Leave parsec_type_match() as a generic helper outside the datatype backend, since it only checks compatibility/equality and does not require transport-specific layout handling. Update build wiring and datatype documentation accordingly.

Signed-off-by: George Bosilca <gbosilca@nvidia.com>
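
A minimal sketch of the dispatch pattern described here, assuming a module is just a table of function pointers that a public entry point calls through, with a basic module installed by default. All names (type_module_t, type_module_install, my_type_size, my_type_match, MY_TYPE_*) are hypothetical stand-ins, not PaRSEC's actual parsec_type_* signatures.

```c
#include <assert.h>
#include <stddef.h>

enum { TYPE_SUCCESS = 0, TYPE_ERR_BAD_PARAM = -1 };

/* Hypothetical datatype handle: an index into a small set of basic types. */
typedef int my_datatype_t;
enum { MY_TYPE_INT = 0, MY_TYPE_DOUBLE = 1 };

/* Hypothetical datatype module: the operations behind the public API. */
typedef struct {
    const char *name;
    int (*type_size)(my_datatype_t type, size_t *size);
} type_module_t;

/* Basic fallback module that needs no transport library. */
static int basic_type_size(my_datatype_t type, size_t *size)
{
    switch (type) {
    case MY_TYPE_INT:    *size = sizeof(int);    return TYPE_SUCCESS;
    case MY_TYPE_DOUBLE: *size = sizeof(double); return TYPE_SUCCESS;
    default:             return TYPE_ERR_BAD_PARAM;
    }
}
static const type_module_t basic_type_module = { "basic", basic_type_size };

/* Module currently in use; starts as the fallback. */
static const type_module_t *current_type_module = &basic_type_module;

/* Called during communication-engine initialization; NULL restores the fallback. */
static void type_module_install(const type_module_t *module)
{
    current_type_module = (module != NULL) ? module : &basic_type_module;
}

/* Public entry point: always dispatches through the installed module. */
static int my_type_size(my_datatype_t type, size_t *size)
{
    assert(size != NULL);
    return current_type_module->type_size(type, size);
}

/* Hypothetical equality-only compatibility check, deliberately kept outside the
   module, in the same spirit as parsec_type_match() staying generic. */
static int my_type_match(my_datatype_t a, my_datatype_t b)
{
    return (a == b) ? TYPE_SUCCESS : TYPE_ERR_BAD_PARAM;
}
```

Because callers only see the public entry points, a transport-backed module (such as one wrapping MPI datatypes) can be swapped in at engine initialization without touching call sites.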

Add an optional UCX communication engine component bootstrapped through PMIx. The UCX backend is currently enabled only for non-MPI distributed builds, because MPI builds still expose MPI_Datatype through the public datatype ABI.

The new backend provides the first CPU-contiguous transport path: PMIx rank and worker-address exchange, UCX endpoint creation, active messages, CPU memory registration, rkey exchange, and PUT/GET support. Non-contiguous datatype movement, reshape, taskpool-id synchronization, and device-memory support remain explicitly unsupported for now.

Move taskpool-id synchronization behind the communication-engine vtable. The MPI backend keeps the current MPI_Allreduce implementation, while the UCX backend returns PARSEC_ERR_NOT_IMPLEMENTED as a placeholder for a future UCX/PMIx implementation.

Add a UCX set_ctx path that accepts an application-owned UCX context and worker. PaRSEC does not take ownership of those handles, but performs its late setup on top of them: worker-address publication, endpoint creation, and active-message handler registration.

Also extend the basic non-MPI datatype backend so it records simple derived datatype size, extent, and contiguity information, which gives the UCX backend a minimal datatype representation for CPU-contiguous paths.

Signed-off-by: George Bosilca <gbosilca@nvidia.com>
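
The size/extent/contiguity bookkeeping and the "contiguous only, otherwise not implemented" rule can be sketched as follows. This assumes a vector-shaped derived type in the spirit of MPI_Type_vector (count blocks of blocklen elements, stride elements apart) and treats a type as contiguous when its size equals its extent. basic_dt_t, basic_dt_vector, put_prepare, and sync_taskpool_id_placeholder are hypothetical names; the sketch makes no UCX calls.

```c
#include <assert.h>
#include <stddef.h>

enum { DT_SUCCESS = 0, DT_ERR_BAD_PARAM = -1, DT_ERR_NOT_IMPLEMENTED = -2 };

/* Hypothetical record kept by a basic (non-MPI) datatype backend. */
typedef struct {
    size_t size;    /* bytes of actual data */
    size_t extent;  /* bytes spanned from the first to the last byte of data */
    int contiguous; /* nonzero when size == extent, i.e. no gaps */
} basic_dt_t;

/* Describe count blocks of blocklen elements, spaced stride elements apart. */
static int basic_dt_vector(size_t count, size_t blocklen, size_t stride,
                           size_t elem_size, basic_dt_t *dt)
{
    if (dt == NULL || count == 0 || blocklen == 0 || elem_size == 0 || stride < blocklen)
        return DT_ERR_BAD_PARAM;
    dt->size = count * blocklen * elem_size;
    dt->extent = ((count - 1) * stride + blocklen) * elem_size;
    dt->contiguous = (dt->size == dt->extent);
    return DT_SUCCESS;
}

/* Transport-side gate: only contiguous buffers take the PUT/GET path. */
static int put_prepare(const basic_dt_t *dt, size_t count, size_t *nbytes)
{
    assert(dt != NULL && nbytes != NULL);
    if (!dt->contiguous)
        return DT_ERR_NOT_IMPLEMENTED; /* non-contiguous movement: not supported yet */
    *nbytes = count * dt->size;
    return DT_SUCCESS;
}

/* Placeholder vtable entry: taskpool-id synchronization is not implemented yet. */
static int sync_taskpool_id_placeholder(int *taskpool_id)
{
    (void)taskpool_id;
    return DT_ERR_NOT_IMPLEMENTED;
}
```

Returning an explicit "not implemented" code, rather than silently falling back, makes an unsupported layout fail loudly at the call site.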

Allow the MPI communication backend to initialize MPI from parsec_init() when the application has not already done so. Track whether PaRSEC owns that initialization so parsec_fini() only finalizes MPI when the backend performed the matching init. Populate the context rank and size immediately after MPI becomes available, including the backend-owned initialization path, and keep PARSEC_CONTEXT_QUERY_NODES reporting the context value once the communication engine has been initialized.

Document that the selected communication backend may initialize and finalize an external process runtime on behalf of PaRSEC, and encourage callers to query rank and size through the PaRSEC context instead of assuming MPI_COMM_WORLD.

Signed-off-by: George Bosilca <gbosilca@nvidia.com>

The test suite was still largely written around the assumption that every distributed-capable run starts with MPI_Init/MPI_Init_thread and discovers rank/size directly from MPI_COMM_WORLD. That makes the tests awkward for the new communication-engine component work, where MPI remains the default backend but other backends, such as UCX bootstrapped through PMIx, need to run the same tests without exposing MPI to test code.

Add a small tests_runtime_common helper library and route common test runtime operations through it. The helper initializes PaRSEC with parsec_init(), retrieves rank and world size through parsec_context_query(), finalizes through parsec_fini(), and validates requested MPI thread support when the selected backend is MPI-backed.

Add test wrappers for the remaining small pieces of process-runtime behavior that tests need: barrier, abort, and allreduce. The MPI implementation maps these to MPI_COMM_WORLD collectives. Non-MPI single-process runs get useful local behavior where possible, while unsupported multi-process non-MPI paths return PARSEC_ERR_NOT_IMPLEMENTED instead of silently pretending that a collective completed.

Rework tests/tests_timing.h so timing helpers take a PaRSEC context, use the test barrier wrapper, and no longer override exit() with MPI_Abort. This keeps timed tests usable with non-MPI communication backends while preserving real barriers for MPI-backed distributed runs.

Convert the broad init-only test population away from direct MPI calls. This covers API tests, many PTG and DTD tests, application tests, collection tests, profiling tests, CUDA runtime tests, and scheduling tests. These tests now include tests/tests_runtime.h, link against tests_runtime_common, initialize through parsec_tests_context_init(), and finalize through parsec_tests_context_fini().

Convert simple MPI collectives in tests to the new wrappers where they do not depend on MPI-specific communicator behavior. This includes reductions in PTG checks, reshape checks, branch/count validation, CUDA best-device validation, and selected redistribute checks. The MAXLOC case is represented explicitly as PARSEC_TESTS_REDUCE_MAXLOC_INT so tests that used MPI_2INT/MPI_MAXLOC keep the same semantics.

Update CMake wiring so all converted tests link with tests_runtime_common. Keep tests that still genuinely exercise MPI-specific behavior in MPI-only build/test groups. In particular, multichain, haar_tree, and redistribute are only built and tested when MPI_C_FOUND is available, because they still use MPI communicators or MPI message-passing routines directly.

Replace incidental MPI datatype queries in scheduling test setup with PaRSEC datatype helpers, so tests that do not communicate data are not tied to MPI just to compute a datatype extent.

This prepares the test suite for selectable communication backends: ordinary tests now ask PaRSEC for process identity and synchronization services, while the few remaining MPI-specific tests are explicitly marked as such.

Signed-off-by: George Bosilca <gbosilca@nvidia.com>
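
The non-MPI fallback of an allreduce wrapper, and the MAXLOC semantics it must preserve, can be sketched like this. It assumes MPI_MAXLOC's rule (largest value wins; on a tie the smaller location wins) and the MPI_2INT {value, location} pair layout. tests_allreduce_nompi, maxloc_combine, maxloc_reduce_local, t_int_pair_t, and the T_* codes are hypothetical names, not the actual tests_runtime_common API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { T_SUCCESS = 0, T_ERR_BAD_PARAM = -1, T_ERR_NOT_IMPLEMENTED = -2 };

typedef enum { T_REDUCE_SUM_INT, T_REDUCE_MAX_INT, T_REDUCE_MAXLOC_INT } t_reduce_op_t;

/* Pair layout used for MAXLOC, mirroring MPI_2INT: {value, location}. */
typedef struct { int value; int loc; } t_int_pair_t;

/* MPI_MAXLOC semantics: the larger value wins; ties keep the smaller location. */
static t_int_pair_t maxloc_combine(t_int_pair_t a, t_int_pair_t b)
{
    if (a.value > b.value) return a;
    if (b.value > a.value) return b;
    return (a.loc <= b.loc) ? a : b;
}

/* What a real allreduce would compute from every rank's contribution. */
static t_int_pair_t maxloc_reduce_local(const t_int_pair_t *contribs, size_t n)
{
    assert(contribs != NULL && n > 0);
    t_int_pair_t acc = contribs[0];
    for (size_t i = 1; i < n; i++)
        acc = maxloc_combine(acc, contribs[i]);
    return acc;
}

/* Non-MPI fallback of a test allreduce wrapper. */
static int tests_allreduce_nompi(int world_size, const void *in, void *out,
                                 size_t count, t_reduce_op_t op)
{
    size_t elem;
    if (in == NULL || out == NULL) return T_ERR_BAD_PARAM;
    if (world_size != 1) return T_ERR_NOT_IMPLEMENTED; /* never fake a collective */
    elem = (op == T_REDUCE_MAXLOC_INT) ? sizeof(t_int_pair_t) : sizeof(int);
    memmove(out, in, count * elem); /* one process: result is the local contribution */
    return T_SUCCESS;
}
```

With a single process, every reduction is the identity on the local input, which is why the wrapper can return a useful answer there and must refuse everywhere else.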

bosilca added 5 commits May 21, 2026 01:49

bosilca requested a review from a team as a code owner May 21, 2026 06:41

bosilca marked this pull request as draft May 21, 2026 06:42

bosilca wants to merge 5 commits into ICLDisco:master from bosilca:topic/communication_engine_as_a_component