Tests fail when using new ifx compiler #31

Open
eirikurj opened this issue Dec 19, 2024 · 6 comments

eirikurj commented Dec 19, 2024

Description

When compiling with the new ifx compiler, some tests are failing. This is preventing us from updating our docker images as part of https://github.com/mdolab/docker/pull/266.

Steps to reproduce issue

  1. Pull the docker container mdolab/public:u22-intel-impi-latest-amd64-failed (specifically sha256:d318081f9bf4cc2c110685d4592fe6ee2b0f7799aff8cbefe87e138d04b224b7)
  2. In ~/repos/cmplxfoil run testflo -v -n 1 .

Current behavior

When running one of the failed tests, for example, testflo -n 1 -s -v ./tests/test_solver_class.py:TestDerivativesCST.test_alpha_sens on the docker container the following error is printed

mdolabuser@a6661930c69d:~/repos/cmplxfoil$ testflo -n 1 -s -v ./tests/test_solver_class.py:TestDerivativesCST.test_alpha_sens
######## Fitting CST coefficients to coordinates in /home/mdolabuser/repos/cmplxfoil/tests/n0012BluntTE.dat ########
Upper surface
    L2 norm of coordinates in dat file versus fit coordinates: 0.0003504064468410577
    Fit CST coefficients: [0.16601024 0.13092967]
Lower surface
    L2 norm of coordinates in dat file versus fit coordinates: 0.0003504064499657832
    Fit CST coefficients: [-0.16601024 -0.13092967]
+----------------------------------------------------------------------+
|  Switching to Aero Problem: fc                                       |
+----------------------------------------------------------------------+
 LEXITFLAG TRUE, GOING TO 90...
 LEXITFLAG TRUE, GOING TO 90...
 LEXITFLAG TRUE, GOING TO 90...
 LEXITFLAG TRUE, GOING TO 90...
 LEXITFLAG TRUE, GOING TO 90...
 LEXITFLAG TRUE, GOING TO 90...
 LEXITFLAG TRUE, GOING TO 90...
./tests/test_solver_class.py:TestDerivativesCST.test_alpha_sens  ... FAIL (00:00:5.03, 172 MB)
Traceback (most recent call last):
  File "/home/mdolabuser/repos/cmplxfoil/./tests/test_solver_class.py", line 369, in test_alpha_sens
    np.testing.assert_allclose(checkSensFD, actualSensCS, rtol=relTol, atol=absTol)
  File "/home/mdolabuser/.pyenv/versions/3.11.9/lib/python3.11/site-packages/numpy/testing/_private/utils.py", line 1504, in assert_allclose
    assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
  File "/home/mdolabuser/.pyenv/versions/3.11.9/lib/python3.11/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/mdolabuser/.pyenv/versions/3.11.9/lib/python3.11/site-packages/numpy/testing/_private/utils.py", line 797, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=0.002, atol=1e-05

Mismatched elements: 1 / 1 (100%)
Max absolute difference: 10005205.62637494
Max relative difference: inf
 x: array(10005205.626375)
 y: array(0.)



The following tests failed:
test_solver_class.py:TestDerivativesCST.test_alpha_sens


Passed:  0
Failed:  1
Skipped: 0


Ran 1 test using 1 processes
Wall clock time:   00:00:5.76
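As an aside, the "Max relative difference: inf" in the output is expected whenever the reference value is exactly zero: the relative tolerance is measured against the desired values, so only atol provides any slack. A minimal reproduction of the comparison, with the values copied from the failure above:

```python
import numpy as np

# When the desired value is exactly zero, rtol contributes no slack
# (rtol * |desired| == 0), so only atol applies, and the reported
# relative difference is inf.
actual = np.array(10005205.62637494)  # value from the failing test
desired = np.array(0.0)

try:
    np.testing.assert_allclose(actual, desired, rtol=0.002, atol=1e-05)
    raised = False
except AssertionError:
    raised = True
```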

Expected behavior

All tests should pass

Observations

  • The build process looks somewhat messy when using Intel: there seems to be a mix of compilers, with gcc for the interface C code, ifx for compiling the source, and ifort for the library. While this is probably not the cause, we should address it.
  • Since this is F77 code, it is possible we have hit an ifx issue not yet encountered in our other repositories, which are mostly Fortran 90 or newer. The porting guide might help, but it states that F77 support is completely implemented.
  • I did some minor tests, and simply removing optimization, i.e., changing -O2 to -O0 and rebuilding, makes the tests pass. This indicates that some optimization affects the code when using ifx in a way that does not show up with ifort.

I would appreciate it if someone could dig into this and identify the issue and possible solutions.
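For context on what the failing test checks: it compares a finite-difference derivative (checkSensFD) against a complex-step derivative (actualSensCS). The complex-step method relies on exact complex arithmetic, so a miscompilation of complex operations would break it directly. A minimal Python sketch of the technique (illustration only, not CMPLXFOIL's implementation):

```python
import numpy as np

def complex_step_derivative(f, x, h=1e-30):
    # Perturb the input along the imaginary axis; the imaginary part of
    # the result divided by the step gives the derivative with no
    # subtractive cancellation, so h can be tiny.
    return np.imag(f(x + 1j * h)) / h

def finite_difference_derivative(f, x, h=1e-7):
    # Central finite difference, limited by subtractive cancellation.
    return (f(x + h) - f(x - h)) / (2 * h)

f = lambda x: np.exp(x) * np.sin(x)
x0 = 1.5
cs = complex_step_derivative(f, x0)
fd = finite_difference_derivative(f, x0)
# Analytic derivative for comparison: exp(x) * (sin(x) + cos(x))
exact = np.exp(x0) * (np.sin(x0) + np.cos(x0))
```

If the compiler mishandles complex arithmetic under optimization, the complex-step result degenerates (e.g., to zero) while the finite-difference result stays finite, which matches the x/y mismatch in the traceback.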

A-CGray commented Dec 19, 2024

To add to the confusion, if you build cmplxfoil using the gcc config file on these images, the tests still fail, even though every part of the build process is done with a GCC compiler (either gcc or gfortran). See the attached log below.

cmplxfoil-make.log

Given that the tests don't fail on the GCC images, what is different between the Intel and GCC images that could cause gcc-compiled code to behave differently?

Ignore the above: I forgot to pip install cmplxfoil again after rebuilding with gcc. After doing that, the tests pass, so this is just an Intel compiler issue.


A-CGray commented Jan 14, 2025

This is tenuous at best, but it seems that as of 2023, ifx was known not to work well with complex numbers, at least from a performance perspective.

Also, some of the default floating-point arithmetic behaviour differs between ifort and ifx: ifort checks for NaNs by default, while ifx does not.
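Illustration only (Python rather than Fortran): without a runtime NaN check, a NaN generated early simply propagates through later arithmetic and nothing fails until a final result is compared, which is why this default-behaviour difference could plausibly matter here.

```python
import numpy as np

# With no runtime NaN checking (ifx's default, per the comment above),
# an invalid operation produces NaN silently...
x = np.float64(-1.0)
with np.errstate(invalid="ignore"):
    y = np.sqrt(x)          # NaN, no error raised

# ...and the NaN then propagates through all downstream arithmetic.
z = 2.0 * y + 1.0           # still NaN
```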


eirikurj commented Jan 14, 2025

The NaN check might be a problem, since -fp-model=fast is the default. Although I did not report it here, I believe I ran this with precise and strict at some point and it had no effect; we can test this.
The issue with ifx, complex numbers, and optimization seems like a more plausible explanation, since the code does work with optimizations disabled (-O0). It is possible that a newer compiler version will fix this, but we should check the compiler release notes.

eirikurj commented

Did a very quick test with the latest image, trying these combinations of optimization flags and floating-point models. All three floating-point models pass, but only when optimization is turned off:

| Opt flag | fast | precise | strict |
| -------- | ---- | ------- | ------ |
| -O0      | pass | pass    | pass   |
| -O1      | fail | fail    | fail   |
| -O2      | fail | fail    | fail   |
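For reference, a hypothetical shell sketch of the sweep above; the FF_FLAGS variable name is an assumption for illustration, not the repo's actual config variable:

```shell
# Hypothetical sketch: print the build command for each combination of
# optimization level and floating-point model tested above (9 in total).
sweep() {
  for opt in -O0 -O1 -O2; do
    for fp in fast precise strict; do
      echo "make clean && make FF_FLAGS=\"$opt -fp-model=$fp\""
    done
  done
}
sweep
```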


A-CGray commented Jan 14, 2025

Damn, it's not that then. I'm also not fully convinced this is purely a complex-number issue, as some of the failing tests don't involve the complexified code.


A-CGray commented Jan 14, 2025

@eirikurj, as a fallback, and to avoid holding up https://github.com/mdolab/docker/pull/266 any longer, we could change the logic in the Intel config file so that we use ifort -O2 if available and ifx -O0 otherwise?

I've implemented this in #33
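The proposed fallback could look roughly like this as a shell sketch (an illustration of the idea only, not the actual config-file change in #33; the variable names are assumptions):

```shell
# Sketch: prefer ifort with full optimization; fall back to ifx with
# optimization disabled until the ifx miscompilation is resolved.
if command -v ifort >/dev/null 2>&1; then
  FC=ifort
  OPT_FLAGS="-O2"
else
  FC=ifx
  OPT_FLAGS="-O0"
fi
echo "FC=$FC OPT_FLAGS=$OPT_FLAGS"
```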
