Topic/communication engine as a component #782
Draft
bosilca wants to merge 5 commits into
Introduce a comm MCA framework and move the existing funnelled MPI communication engine under the new comm/mpi component. Select the communication backend through MCA while preserving MPI as the default backend and keeping the existing parsec_comm_engine_t callback interface.

Rename the remote dependency protocol implementation from remote_dep_mpi.c to remote_dep_comm.c and remove direct MPI usage from that layer. The remote dependency code now uses the selected communication engine callbacks for AM, pack/unpack, memory registration, progress, GET, and PUT operations.

Move MPI-specific startup validation and thread-level capability detection into the MPI comm backend. Advertise backend multithread support through parsec_ce.capabilites.multithreaded, and let the remote dependency layer use that generic capability instead of querying MPI directly.

Update remote dependency comments, debug/profiling names, and protocol helper names to avoid MPI-specific wording where the logic is transport-neutral. Keep datatype matching outside of the future datatype engine; parsec_type_match remains a generic compatibility helper.

Signed-off-by: George Bosilca <gbosilca@nvidia.com>
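As a rough illustration of the capability-driven design described in this commit, the sketch below shows a transport-neutral layer choosing its threading mode from a backend-advertised flag rather than from MPI. All `sketch_*` names, the two-mode enum, and the struct layout are hypothetical stand-ins, not the actual parsec_comm_engine_t definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the engine capability block. */
typedef struct sketch_ce_capabilities_s {
    unsigned int multithreaded : 1; /* backend may be called from several threads */
} sketch_ce_capabilities_t;

/* Hypothetical, simplified stand-in for the engine callback table. */
typedef struct sketch_comm_engine_s {
    sketch_ce_capabilities_t capabilities;
    int (*progress)(struct sketch_comm_engine_s *ce);
} sketch_comm_engine_t;

/* Toy backend callback: counts how many times it was progressed. */
static int sketch_progress_calls = 0;

static int sketch_backend_progress(sketch_comm_engine_t *ce)
{
    (void)ce;
    return ++sketch_progress_calls;
}

/* Backend-side setup: thread-level detection lives here and is
 * published only through the generic capability flag. */
void sketch_ce_backend_init(sketch_comm_engine_t *ce, int thread_safe)
{
    assert(NULL != ce);
    ce->capabilities.multithreaded = thread_safe ? 1 : 0;
    ce->progress = sketch_backend_progress;
}

/* Transport-neutral side: pick a threading mode from the capability,
 * without asking the transport directly. */
typedef enum { SKETCH_FUNNELLED, SKETCH_DIRECT } sketch_mode_t;

sketch_mode_t sketch_remote_dep_mode(const sketch_comm_engine_t *ce)
{
    assert(NULL != ce);
    return ce->capabilities.multithreaded ? SKETCH_DIRECT : SKETCH_FUNNELLED;
}
```

The point of the split is that only the backend knows how thread support was detected; the remote dependency layer sees a single bit.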
Introduce a datatype module interface used by the public parsec_type_* API and install the selected implementation during communication engine initialization.

Move the MPI datatype implementation under the MPI comm component, and keep the basic no-MPI implementation as a fallback datatype module. Leave parsec_type_match() as a generic helper outside the datatype backend, since it only checks compatibility/equality and does not require transport-specific layout handling.

Update build wiring and datatype documentation accordingly.

Signed-off-by: George Bosilca <gbosilca@nvidia.com>
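One way to picture the datatype module arrangement in this commit is a function-pointer table that the public API forwards to, with a basic module installed by default. The sketch below is only an illustration under that assumption; the `sketch_*` names, the type handles, and the single `type_size` entry are hypothetical and do not reproduce the real module interface.

```c
#include <assert.h>
#include <stddef.h>

typedef int sketch_datatype_t; /* hypothetical opaque handle */
enum { SKETCH_TYPE_BYTE = 1, SKETCH_TYPE_INT = 2, SKETCH_TYPE_DOUBLE = 3 };

/* Hypothetical module interface: one entry per public datatype call. */
typedef struct sketch_datatype_module_s {
    const char *name;
    int (*type_size)(sketch_datatype_t type, int *size);
} sketch_datatype_module_t;

/* Basic fallback implementation: knows only a few predefined types. */
static int sketch_basic_type_size(sketch_datatype_t type, int *size)
{
    switch (type) {
    case SKETCH_TYPE_BYTE:   *size = 1;                   return 0;
    case SKETCH_TYPE_INT:    *size = (int)sizeof(int);    return 0;
    case SKETCH_TYPE_DOUBLE: *size = (int)sizeof(double); return 0;
    default:                                              return -1;
    }
}

static const sketch_datatype_module_t sketch_basic_module = {
    "basic", sketch_basic_type_size
};

/* The fallback is active until a backend installs its own module. */
static const sketch_datatype_module_t *sketch_current = &sketch_basic_module;

/* Called during communication engine initialization in this sketch;
 * passing NULL restores the basic fallback. */
void sketch_datatype_install(const sketch_datatype_module_t *module)
{
    sketch_current = (NULL != module) ? module : &sketch_basic_module;
}

const char *sketch_datatype_module_name(void)
{
    return sketch_current->name;
}

/* Public entry point: forwards to whichever module is installed. */
int sketch_type_size(sketch_datatype_t type, int *size)
{
    assert(NULL != size);
    return sketch_current->type_size(type, size);
}
```

A helper such as parsec_type_match() would sit outside this table, since per the commit message it only compares types and needs no backend-specific layout knowledge.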
Add an optional UCX communication engine component bootstrapped through PMIx. The UCX backend is currently enabled only for non-MPI distributed builds, because MPI builds still expose MPI_Datatype through the public datatype ABI.

The new backend provides the first CPU-contiguous transport path: PMIx rank and worker-address exchange, UCX endpoint creation, active messages, CPU memory registration, rkey exchange, and PUT/GET support. Non-contiguous datatype movement, reshape, taskpool-id synchronization, and device-memory support remain explicitly unsupported for now.

Move taskpool-id synchronization behind the communication-engine vtable. The MPI backend keeps the current MPI_Allreduce implementation, while the UCX backend returns PARSEC_ERR_NOT_IMPLEMENTED as a placeholder for a future UCX/PMIx implementation.

Add a UCX set_ctx path that accepts an application-owned UCX context and worker. PaRSEC does not take ownership of those handles, but performs its late setup on top of them: worker-address publication, endpoint creation, and active-message handler registration.

Also extend the basic non-MPI datatype backend so it records simple derived datatype size, extent, and contiguity information, which gives the UCX backend a minimal datatype representation for CPU-contiguous paths.

Signed-off-by: George Bosilca <gbosilca@nvidia.com>
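To make the "size, extent, and contiguity" bookkeeping concrete, here is a minimal sketch for a vector-like derived type. It assumes MPI_Type_vector-style semantics (count blocks of blocklen elements, block starts separated by stride elements) with a non-overlapping positive stride; the `sketch_*` names and the descriptor layout are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal descriptor recorded for a derived datatype. */
typedef struct sketch_type_desc_s {
    size_t size;       /* bytes of actual data */
    size_t extent;     /* bytes spanned from first to last element */
    int    contiguous; /* 1 when the data has no holes */
} sketch_type_desc_t;

/* Describe `count` blocks of `blocklen` elements whose starts are
 * `stride` elements apart (assumed: stride >= blocklen, no overlap). */
sketch_type_desc_t sketch_type_vector(size_t count, size_t blocklen,
                                      size_t stride, size_t elem_size)
{
    sketch_type_desc_t d;
    assert(count > 0 && blocklen > 0 && elem_size > 0);
    assert(stride >= blocklen);
    d.size = count * blocklen * elem_size;
    d.extent = ((count - 1) * stride + blocklen) * elem_size;
    /* No holes exactly when the data fills the whole span. */
    d.contiguous = (d.size == d.extent);
    return d;
}

/* A CPU-contiguous-only transport would accept just these types. */
int sketch_contiguous_path_ok(const sketch_type_desc_t *d)
{
    assert(NULL != d);
    return d->contiguous;
}
```

For example, 4 blocks of 2 doubles with stride 2 give size 64 and extent 64 (contiguous), while stride 5 gives size 64 and extent 136 (not contiguous, so it would need the currently unsupported non-contiguous path).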
Allow the MPI communication backend to initialize MPI from parsec_init() when the application has not already done so. Track whether PaRSEC owns that initialization so parsec_fini() only finalizes MPI when the backend performed the matching init.

Populate the context rank and size immediately after MPI becomes available, including the backend-owned initialization path, and keep PARSEC_CONTEXT_QUERY_NODES reporting the context value once the communication engine has been initialized.

Document that the selected communication backend may initialize and finalize an external process runtime on behalf of PaRSEC, and encourage callers to query rank and size through the PaRSEC context instead of assuming MPI_COMM_WORLD.

Signed-off-by: George Bosilca <gbosilca@nvidia.com>
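The ownership rule described in this commit can be sketched without MPI by using a mock runtime: the backend initializes the runtime only if nobody else has, remembers that it did, and finalizes only in that case. Everything named `sketch_*` below is hypothetical; in the real backend the mock calls would correspond to checking whether MPI is initialized, initializing it, and finalizing it.

```c
#include <assert.h>
#include <stddef.h>

/* Mock external process runtime standing in for MPI (hypothetical). */
static int sketch_runtime_up = 0;

void sketch_runtime_init(void)  { sketch_runtime_up = 1; }
void sketch_runtime_fini(void)  { sketch_runtime_up = 0; }
int  sketch_runtime_is_up(void) { return sketch_runtime_up; }

/* Hypothetical backend state: remembers who started the runtime. */
typedef struct sketch_backend_s {
    int owns_runtime;
} sketch_backend_t;

/* Start the runtime only when the application has not done so,
 * and record that the backend owns the matching finalize. */
void sketch_owned_init(sketch_backend_t *b)
{
    assert(NULL != b);
    b->owns_runtime = 0;
    if (!sketch_runtime_is_up()) {
        sketch_runtime_init();
        b->owns_runtime = 1;
    }
}

/* Finalize the runtime only when this backend performed the init. */
void sketch_owned_fini(sketch_backend_t *b)
{
    assert(NULL != b);
    if (b->owns_runtime) {
        sketch_runtime_fini();
        b->owns_runtime = 0;
    }
}
```

With this rule an application that initialized the runtime itself keeps it alive after the backend shuts down, while an application that relied on the backend gets a matching finalize.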
The test suite was still largely written around the assumption that every distributed-capable run starts with MPI_Init/MPI_Init_thread and discovers rank/size directly from MPI_COMM_WORLD. That makes the tests awkward for the new communication-engine component work, where MPI remains the default backend but other backends, such as UCX bootstrapped through PMIx, need to run the same tests without exposing MPI to test code.

Add a small tests_runtime_common helper library and route common test runtime operations through it. The helper initializes PaRSEC with parsec_init(), retrieves rank and world size through parsec_context_query(), finalizes through parsec_fini(), and validates requested MPI thread support when the selected backend is MPI-backed.

Add test wrappers for the remaining small pieces of process-runtime behavior that tests need: barrier, abort, and allreduce. The MPI implementation maps these to MPI_COMM_WORLD collectives. Non-MPI single-process runs get useful local behavior where possible, while unsupported multi-process non-MPI paths return PARSEC_ERR_NOT_IMPLEMENTED instead of silently pretending that a collective completed.

Rework tests/tests_timing.h so timing helpers take a PaRSEC context, use the test barrier wrapper, and no longer override exit() with MPI_Abort. This keeps timed tests usable with non-MPI communication backends while preserving real barriers for MPI-backed distributed runs.

Convert the broad init-only test population away from direct MPI calls. This covers API tests, many PTG and DTD tests, application tests, collection tests, profiling tests, CUDA runtime tests, and scheduling tests. These tests now include tests/tests_runtime.h, link against tests_runtime_common, initialize through parsec_tests_context_init(), and finalize through parsec_tests_context_fini().

Convert simple MPI collectives in tests to the new wrappers where they do not depend on MPI-specific communicator behavior. This includes reductions in PTG checks, reshape checks, branch/count validation, CUDA best-device validation, and selected redistribute checks. The MAXLOC case is represented explicitly as PARSEC_TESTS_REDUCE_MAXLOC_INT so tests that used MPI_2INT/MPI_MAXLOC keep the same semantics.

Update CMake wiring so all converted tests link with tests_runtime_common. Keep tests that still genuinely exercise MPI-specific behavior in MPI-only build/test groups. In particular, multichain, haar_tree, and redistribute are only built and tested when MPI_C_FOUND is available, because they still use MPI communicators or MPI message-passing routines directly.

Replace incidental MPI datatype queries in scheduling test setup with PaRSEC datatype helpers, so tests that do not communicate data are not tied to MPI just to compute a datatype extent.

This prepares the test suite for selectable communication backends: ordinary tests now ask PaRSEC for process identity and synchronization services, while the few remaining MPI-specific tests are explicitly marked as such.

Signed-off-by: George Bosilca <gbosilca@nvidia.com>
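The fallback behavior of the collective wrappers can be illustrated with a small sketch of a MAXLOC reduction over (value, index) pairs. It assumes the usual MPI_MAXLOC rule (largest value wins, ties go to the smallest index); the `sketch_*` names and the error value are hypothetical and are not the actual tests_runtime_common API.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_SUCCESS             0
#define SKETCH_ERR_NOT_IMPLEMENTED (-1) /* hypothetical error value */

/* Same shape as an MPI_2INT element: a value and the index owning it. */
typedef struct sketch_int_pair_s {
    int value;
    int index;
} sketch_int_pair_t;

/* MAXLOC combining rule: the larger value wins, and on a tie the
 * smaller index wins. */
sketch_int_pair_t sketch_maxloc_combine(sketch_int_pair_t a,
                                        sketch_int_pair_t b)
{
    if (a.value > b.value) return a;
    if (b.value > a.value) return b;
    return (a.index <= b.index) ? a : b;
}

/* Non-MPI fallback: with one process the reduction is the identity;
 * with several processes it reports the gap instead of pretending
 * that a collective completed. */
int sketch_allreduce_maxloc_int(int world_size,
                                const sketch_int_pair_t *in,
                                sketch_int_pair_t *out)
{
    assert(NULL != in && NULL != out);
    if (1 == world_size) {
        *out = *in;
        return SKETCH_SUCCESS;
    }
    return SKETCH_ERR_NOT_IMPLEMENTED;
}
```

Returning an explicit error on the multi-process non-MPI path lets a test fail or skip visibly rather than validate against a result that was never reduced.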
Here we are, years in the making!