Avoid linking to libirc.so in spack (parallel-netcdf), turn off crypt variant for Python, and update Orion site config to fix tar issue #1435

Draft
wants to merge 6 commits into
base: develop


climbfuji
Collaborator

@climbfuji climbfuji commented Dec 24, 2024

Summary

  1. Applications built with the spack-stack packages esmf, parallelio, and parallel-netcdf have libirc.so dynamically linked, and applications linked against libirc.so fail to start up. See "Avoid linking to Intel's libirc.so library (aka bad configure script of package parallel-netcdf)" (#1436). The spack PR that is part of the changes suggested here fixes this by replacing libirc.so with libintlc.so in the parallel-netcdf build. See "Bug fix in parallel-netcdf to avoid linking to libirc.so AND cherry-pick spack develop PR 48251 (conflict Intel Classic with [email protected])" (spack#495).
  2. Turn off the crypt variant for Python. This variant leads to build errors with Intel in py-cryptography unless external curl and openssl are removed, which is itself problematic.
  3. Add external wget on Orion; the latest versions don't build with Intel on that machine.
  4. Also in the spack PR: add a conflict of [email protected] with Intel Classic compilers. See "Bug fix in parallel-netcdf to avoid linking to libirc.so AND cherry-pick spack develop PR 48251 (conflict Intel Classic with [email protected])" (spack#495).
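
One way to check whether a given application is affected by item 1 is to look at its dynamic dependencies. The following is only a minimal sketch, assuming `ldd` is available on the system; the helper name `has_dynamic_dep` and the path `./my_app` are hypothetical and not part of spack-stack.

```shell
# Hypothetical helper (not part of spack-stack): reads an ldd-style listing
# on stdin and succeeds if any line contains the given library name
# as a fixed string.
has_dynamic_dep() {
  grep -qF -- "$1"
}

# Example use (./my_app is a placeholder for a real executable):
#   ldd ./my_app | has_dynamic_dep libirc.so && echo "links to libirc.so"
```

Because the match is a fixed string, a listing that only contains libintlc.so does not count as a hit for libirc.so.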

Testing

Please try to reproduce the problem reported in #1355 with the following environment (I couldn't):

module purge
module use /work/noaa/gmtb/dheinzel/spst-libirc/envs/ue-intel-2021.9.0/install/modulefiles/Core
module load stack-intel/2021.9.0
module load stack-intel-oneapi-mpi/2021.9.0
module load stack-python/3.11.7

In addition to the testing described in JCSDA/spack#495, I built the ufs-weather-model on Orion and ran one of the ATM-only regression tests. It ran to completion, but the results didn't match the baseline (this is expected, since many packages are newer in spack-stack develop than they are in spack-stack-1.6.0, which the UFS still uses).

Applications affected

All

Systems affected

Orion specifically, but basically all that use Intel compilers

Dependencies

Issue(s) addressed

Resolves #1355
Resolves #1436

Checklist

  • This PR addresses one issue/problem/enhancement, or has a very good reason for not doing so.
  • These changes have been tested on the affected systems and applications.
  • All dependency PRs/issues have been resolved and this PR can be merged.

@climbfuji climbfuji changed the title DRAFT Update .gitmodules and submodule pointer for spack for code review an… Avoid linking to libirc.so in spack (parallel-netcdf), update Orion site config Dec 26, 2024
@climbfuji climbfuji changed the title Avoid linking to libirc.so in spack (parallel-netcdf), update Orion site config Avoid linking to libirc.so in spack (parallel-netcdf), update Orion site config to solve tar issue Dec 26, 2024
@climbfuji climbfuji changed the title Avoid linking to libirc.so in spack (parallel-netcdf), update Orion site config to solve tar issue Avoid linking to libirc.so in spack (parallel-netcdf), update Orion site config to fix tar issue Dec 26, 2024
@climbfuji climbfuji force-pushed the feature/libirc_parallel_netcdf_and_scipy branch from 3365b2a to bc40b8f Compare December 26, 2024 23:09
@climbfuji climbfuji changed the title Avoid linking to libirc.so in spack (parallel-netcdf), update Orion site config to fix tar issue Avoid linking to libirc.so in spack (parallel-netcdf), turn off crypt variant for Python, and update Orion site config to fix tar issue Dec 26, 2024
@climbfuji climbfuji self-assigned this Dec 26, 2024
@climbfuji climbfuji force-pushed the feature/libirc_parallel_netcdf_and_scipy branch from bc40b8f to 96de96a Compare December 27, 2024 14:42
@srherbener
Collaborator

I'm still running into the tar issue with an Intel build:

-- [download 99% complete]
-- [download 100% complete]
-- Checking if /work2/noaa/jcsda/herbener/jedi/build/test_data/3.1.1/fix_REL-3.1.1.2 already exists...
-- Untarring the downloaded file (~2 minutes) to /work2/noaa/jcsda/herbener/jedi/build/test_data/3.1.1
tar: Relink `/apps/spack-managed/gcc-11.3.1/intel-oneapi-compilers-2023.1.0-sb753366rvywq75zeg4ml5k5c72xgj72/compiler/2023.1.0/linux/compiler/lib/intel64_lin/libimf.so' with `/usr/lib64/libm.so.6' for IFUNC symbol `sincosf'
CMake Error at crtm/test/CMakeLists.txt:106 (message):
  Failed to untar the file.

I must have something wrong in my environment. I used this to load modules:

SPACK_STACK_INTEL_ENV=/work/noaa/gmtb/dheinzel/spst-libirc/envs/ue-intel-2021.9.0

# load modules
module purge
module use ${SPACK_STACK_INTEL_ENV}/install/modulefiles/Core
module load stack-intel/2021.9.0
module load stack-intel-oneapi-mpi/2021.9.0
module load stack-python/3.11.7

jedi-host-post-load() {
  module swap git-lfs git-lfs/3.1.2
}

# This is a fix for the issue where the spack-stack-1.8.0 udunits
# module does not get loaded properly. Without this workaround, the
# udunits module from the "spack-managed" gets loaded instead and
# ecbuild on jedi-bundle fails.
#
# Setting LMOD_TMOD_FIND_FIRST gets rid of the default marking
# of modules, and the modification of MODULEPATH makes sure
# that spack-stack-1.8.0 modules are found first before same
# named modules in other directories (ie, "spack-managed")
export LMOD_TMOD_FIND_FIRST=yes
module use $SPACK_STACK_INTEL_ENV/install/modulefiles/intel/2021.9.0

...
# Load JEDI modules
module load jedi-fv3-env
module load jedi-mpas-env
module load ewok-env
module load soca-env

# Optional host-specific post-load procedures
[ $(declare -f -F jedi-host-post-load) ] && jedi-host-post-load; unset -f jedi-host-post-load || echo "No post-load procedures"

@climbfuji
Collaborator Author

Ah well, so this is another library (libm) not libirc. I wonder if we have the same problem and solution in this case (i.e. we should link to something else instead).
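
To see which libraries and symbol are involved, the "Relink" message can be picked apart mechanically. This is only a rough sketch that assumes the message format shown in the log above; `parse_relink` is a hypothetical helper name, not an existing tool.

```shell
# Hypothetical helper: for each "Relink" message on stdin, print
# "<library to relink> <library it conflicts with> <IFUNC symbol>".
# Lines that do not match the pattern are ignored.
parse_relink() {
  sed -n "s/.*Relink \`\([^']*\)' with \`\([^']*\)' for IFUNC symbol \`\([^']*\)'.*/\1 \2 \3/p"
}

# Example use in the loaded environment (assumes GNU tar is on PATH):
#   tar --version 2>&1 | parse_relink
```

If the example command prints nothing after the modules are loaded, the message is not being triggered by simply starting tar in that environment.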

@climbfuji
Copy link
Collaborator Author

@srherbener Is the information from my libirc bugfix sufficient for you to look into libm and fix that?

srherbener and others added 3 commits January 14, 2025 11:43
so that both gcc and intel builds will use the external zlib package.
Added config to use the external zlib for the orion Intel build.
Labels
None yet
Projects
Status: In Progress
2 participants