ch4: separate multi-vci/nic initialization #7255
Open
hzhou wants to merge 25 commits into pmodels:main from hzhou:2412_comm_init
Replace the rather ambiguous field is_tainted with vcis_enabled, thus allowing vci activation on a per-comm basis. A new comm inherits vcis_enabled when it is created from within a parent comm that has vcis_enabled=true. If the parent comm has vcis_enabled=false, then all its descendants will have vcis_enabled=false until they are turned on via separate APIs (to be added in the future). Intercomm and intercomm_merge may include processes outside the originating comm, thus vcis_enabled=false by default. MPI_Comm_create may create an intercomm that inherits vcis_enabled=true. This is an exception because both local processes and remote processes are from within the originating comm that has vcis_enabled. For now, we switch on vcis_enabled in comm_world after post_init. With a future extension, it is possible to allow the user to explicitly set up multiple vcis on a smaller comm than comm world.
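The inheritance rules in this commit message can be sketched roughly as follows. This is a simplified model, not the MPICH code: `sketch_comm_t`, `sketch_derive_t`, and `sketch_inherit_vcis` are hypothetical names introduced only for illustration, and the real communicator struct carries far more state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a communicator (not the MPICH struct). */
typedef struct sketch_comm {
    bool vcis_enabled;
} sketch_comm_t;

/* Hypothetical classification of how a new comm is derived from its parent. */
typedef enum {
    SKETCH_DERIVE_INTRACOMM,          /* new comm created within the parent */
    SKETCH_DERIVE_COMM_CREATE_INTER,  /* MPI_Comm_create producing an intercomm */
    SKETCH_DERIVE_INTERCOMM,          /* may include processes outside the parent */
    SKETCH_DERIVE_INTERCOMM_MERGE     /* may include processes outside the parent */
} sketch_derive_t;

/* Set child->vcis_enabled according to the rules described above. */
void sketch_inherit_vcis(const sketch_comm_t *parent, sketch_comm_t *child,
                         sketch_derive_t how)
{
    assert(parent != NULL && child != NULL);
    switch (how) {
        case SKETCH_DERIVE_INTRACOMM:
        case SKETCH_DERIVE_COMM_CREATE_INTER:
            /* All processes come from the parent, so the flag is inherited. */
            child->vcis_enabled = parent->vcis_enabled;
            break;
        case SKETCH_DERIVE_INTERCOMM:
        case SKETCH_DERIVE_INTERCOMM_MERGE:
            /* Outside processes may be involved: off by default. */
            child->vcis_enabled = false;
            break;
    }
}
```

Under this model, a comm derived from a parent with vcis_enabled=false never ends up with vcis_enabled=true, which matches the statement that descendants stay disabled until a separate API turns them on.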
Rather than initialize per-vci mutexes in ch4 and register with request pools, directly use MPIR-layer request pool mutexes.
Move multiple vci related init/finalize code into ch4_vci.c. Wrap per-vci code into a function and only deal with vci 0 in ch4_init.c and additional vcis in ch4_vci.c.
hzhou force-pushed the 2412_comm_init branch 5 times, most recently on January 6, 2025 17:48, from ab39110 to 111ce24
test:mpich/ch3/most ✔️ |
Gather all multiple vci init code in MPIDI_Comm_set_vcis, so that:
1. The rest of the init code only deals with the root vci.
2. We prepare for future dynamic and per-comm vci.
test:mpich/ch4/ucx ✔️ |
We may relax the av insertion order, which may require the full (vci_local, nic_local, vci_remote, nic_remote) tuple to look up an actual destination address. Add MPIDI_OFI_av_to_phys_root for convenience and to make it easy to survey where we restrict to only the root vci (such as the init and spawn paths). Remove MPIDI_OFI_comm_to_phys and prefer an explicit MPIDIU_comm_rank_to_av followed by MPIDI_OFI_av_to_phys. Refactor MPIDI_OFI_SET_AM_HDR_COMMON in ofi_am_impl.h to directly use dst_addr (as remote_id) rather than recalculating it.
Let MPIDI_OFI_addr_t only contain a field for the root vci address, and only allocate more space for additional addresses when multiple vcis and nics are enabled, potentially at runtime. This avoids wasting memory for multiple vcis unless it is actually needed.
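A minimal sketch of this "root address inline, extra addresses on demand" layout is shown below. It is not the actual MPIDI_OFI_addr_t definition: the `sketch_*` names, the flat `num_slots` sizing, and the `SKETCH_ADDR_UNSET` sentinel are assumptions made for illustration. Only the idea of an `all_dest` pointer that stays NULL until multiple vcis/nics are enabled comes from this PR.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for a fabric address handle (assumption for this sketch). */
typedef uint64_t sketch_fi_addr_t;
#define SKETCH_ADDR_UNSET UINT64_MAX

/* Hypothetical address entry: root address always present, rest optional. */
typedef struct sketch_av {
    sketch_fi_addr_t root_dest;  /* root vci address, set at init */
    sketch_fi_addr_t *all_dest;  /* NULL until multi-vci/nic is enabled */
} sketch_av_t;

/* Init path: only the root address is stored, no extra memory is used. */
void sketch_av_init(sketch_av_t *av, sketch_fi_addr_t root)
{
    av->root_dest = root;
    av->all_dest = NULL;
}

/* Opt-in path: allocate num_slots extra addresses. Returns 0 on success. */
int sketch_av_enable_multi(sketch_av_t *av, size_t num_slots)
{
    assert(num_slots > 0);
    if (av->all_dest != NULL)
        return 0;               /* already enabled, nothing to do */
    av->all_dest = malloc(num_slots * sizeof(*av->all_dest));
    if (av->all_dest == NULL)
        return -1;
    for (size_t i = 0; i < num_slots; i++)
        av->all_dest[i] = SKETCH_ADDR_UNSET;
    return 0;
}

/* Release the optional array and return to the root-only state. */
void sketch_av_free(sketch_av_t *av)
{
    free(av->all_dest);
    av->all_dest = NULL;
}
```

The point of the design is that a process that never opts in pays only for `root_dest` plus one pointer per entry, instead of a full table sized for the maximum number of vcis and nics.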
Move code that is related to multiple-vci setup to ofi_vci.c.
Add the flexibility of performing the multiple-vci address exchange within a comm other than comm world. First, this is important in a session where comm_world may not exist. Second, this provides a mechanism to save resources when applications don't need multiple vcis for the entire comm world.
We exchange non-root endpoints in comm_set_vcis. Because we can't use multiple nics before the address exchange, we can only activate multiple nics in comm_set_vcis.
MPIDI_OFI_global.num_nics affects runtime paths such as ofi progress and large message striping. Only set it in MPIDI_OFI_init_vcis so we won't have complications when multi-nics is not ready.
hzhou force-pushed the 2412_comm_init branch 2 times, most recently on January 8, 2025 20:34, from b5fa39d to f80dcfc
This has been superseded by MPIDI_NM_comm_set_vcis.
Consolidate the shmem allocations in iqueue into 2 slabs: a root slab that is initialized at world_init, and an all_slab for per-vci transport that is initialized at the time of init vcis. The goal is to eventually allow more flexible shm creation, potentially allowing init within a non-world communicator.
Transition from world init to per-comm vci init.
MPIDU_shm_seg_t was used by mpidu_shm_alloc.c, mpidu_init_shm.c, and mpidu_init_shm_alloc.c. However, the usages are all slightly different and some fields are only used in one but not the other. It is simpler to locally define it or, in the case of mpidu_init_shm.c, just use static globals.
Add a routine to support allocating shared memory by a comm, which allows:
* creating shared memory by a smaller comm than comm_world
* attaching to the shared memory by later processes
* potentially allowing shm communication with dynamic processes. For this we need a way to discover and attach to Init_shm (via intercomm), and the initial shared memory needs to be pre-allocated to account for new processes.
For now, we need this to support MPIDI_POSIX_comm_set_vcis.
We need to ensure the extra fields, such as MPIDI_OFI_AV(av, all_dest), are initialized to NULL.
Because we insert all remote endpoints into all local endpoints at the same time, and thus follow the exact same insertion order, they will share the same av table index. The exception is the local root endpoint, because it has already inserted the other remote root endpoints at init time. The av table index from the local root to a remote non-root endpoint will therefore have a fixed offset from that of a local non-root endpoint.
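The fixed-offset argument can be checked with a toy model. The sketch below assumes that an av table hands out indices sequentially in insertion order, and that a local non-root endpoint starts with an empty table while the local root endpoint already holds the entries inserted at init time. These are modeling assumptions, and all `sketch_*` names are hypothetical; none of this is the actual MPICH or libfabric code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy av table: the next insertion receives index next_index. */
typedef struct sketch_av_table {
    size_t next_index;
} sketch_av_table_t;

/* Insert one address and return the table index it was assigned. */
size_t sketch_av_insert(sketch_av_table_t *t)
{
    return t->next_index++;
}

/*
 * Simulate the insertion order described above and return the offset
 * between indices seen by the local root endpoint and those seen by a
 * local non-root endpoint.
 *   num_root_at_init: addresses the root endpoint inserted at init time
 *   num_new:          addresses inserted into every endpoint afterwards
 */
size_t sketch_root_offset(size_t num_root_at_init, size_t num_new)
{
    sketch_av_table_t root = { 0 };
    sketch_av_table_t nonroot = { 0 };
    size_t offset = 0;

    /* Init time: only the local root endpoint inserts remote roots. */
    for (size_t i = 0; i < num_root_at_init; i++)
        (void) sketch_av_insert(&root);

    /* Later: the same addresses go into both tables in the same order. */
    for (size_t k = 0; k < num_new; k++) {
        size_t r = sketch_av_insert(&root);
        size_t n = sketch_av_insert(&nonroot);
        assert(r >= n);
        if (k == 0)
            offset = r - n;
        else
            assert(r - n == offset);  /* the offset never changes */
    }
    return offset;
}
```

In this model the offset equals the number of entries the root endpoint inserted at init time, so a lookup through the local root only needs to add one constant to the index used by the other local endpoints.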
test:mpich/ch4/ofi |
Pull Request Description
There are two goals of this PR:
Separate the setup of multiple vci/nic from the regular initialization. When multiple vci/nic are not enabled by the environment CVARs, the multi-vci/nic components should be skipped completely. This means we should not incur any cost of multiple vci/nic unless the user or application opts in.
Allow per-comm setup of multiple vci/nic. This is necessary to support a true session model where comm_world may never be created or initialized.
[skip warnings]
Author Checklist
Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
Commits are self-contained and do not do two things at once.
Commit message is of the form:
module: short description
Commit message explains what's in the commit.
Whitespace checker. Warnings test. Additional tests via comments.
For non-Argonne authors, check contribution agreement.
If necessary, request an explicit comment from your company's PR approval manager.