Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem with g++ -Ofast #189

Closed
cosurgi opened this issue Jan 17, 2025 · 17 comments
Closed

Problem with g++ -Ofast #189

cosurgi opened this issue Jan 17, 2025 · 17 comments
Assignees
Labels
bug Something isn't working

Comments

@cosurgi
Copy link
Collaborator

cosurgi commented Jan 17, 2025

Hi Chris (@ckormanyos), in our benchmarks in boostorg#648 I was using -O3 flag till now. I wanted to compare cpp_double_double with float128 when using -Ofast flag. But I couldn't because I have discovered a bug which caused yade to exit, making yade -n --quickperformance -j 4 impossible to run.

Try this code with g++

quick_test.cpp:

#include <boost/multiprecision/cpp_double_fp.hpp>
#include <iostream>
#include <string>

int main() {
	using Real = boost::multiprecision::cpp_double_double;
	std::string num{"0.4978165316556861972925864431199"};
	Real a{num};
	Real b{num};
	if(a.str() != num) {
		std::cout << "Error, bad value.\n";
		std::cout << num << "\n";
		std::cout << a.str() << "\n";
		std::cout << "--\n";
	}
	std::cout << std::setprecision(std::numeric_limits<Real>::digits10);
	std::cout << a << "\n";
	std::cout << b << "\n";
	std::cout << a/b << "\n";
}

Now look that with flag -Ofast the number is wrong:

$ g++ quick_test.cpp -o quick_test
$ ./quick_test
0.4978165316556861972925864431199
0.4978165316556861972925864431199
1
k$ g++ quick_test.cpp -O3 -o quick_test
k$ ./quick_test
0.4978165316556861972925864431199
0.4978165316556861972925864431199
1
$ g++ quick_test.cpp -Ofast -o quick_test
$ ./quick_test
Error, bad value.
0.4978165316556861972925864431199
0.4978165316556861785812770904158
--
0.4978165316556861785812770904158
0.4978165316556861785812770904158
1

The printed numbers are incorrect for -Ofast above. The division in this simple example is correct. But it is even more strange that when I put these exact five lines somewhere in yade:

	RealT a{"0.4978165316556861972925864431199324"};
	RealT b{"0.4978165316556861972925864431199324"};
	std::cout << std::setprecision(std::numeric_limits<RealT>::digits10 + 4) << a << "\n";
	std::cout << std::setprecision(std::numeric_limits<RealT>::digits10 + 4) << b << "\n";
	std::cout << std::setprecision(std::numeric_limits<RealT>::digits10 + 4) << a/b << "\n";

I get this output with -Ofast:

0.4978165316556861972925864431199324
0.4978165316556861972925864431199324
1.0000000000000002215576510624637261

The printed numbers are correct, but the division is a strange result. Not the one I would get when dividing 0.4978165316556861972925864431199324/0.4978165316556861785812770904158242 because that produces 1.00000000000000003758675769660005661.

While compiling yade with the -O3 gives the correct results:

0.4978165316556861972925864431199324
0.4978165316556861972925864431199324
1
@ckormanyos
Copy link
Member

Hi Janek, I probably cut a few too many corners in my operator/= and lost the carry bit somewhere down that road. It think a volatile or a separate use of unchacked-fma wil repair this.

I will reproduce the bug (if possible) then report.

-Ofast is one of those things like -ffast-math that still might be less well-checked in this backend.

Thx and great catch!

@ckormanyos ckormanyos self-assigned this Jan 17, 2025
@ckormanyos ckormanyos added the bug Something isn't working label Jan 17, 2025
@cosurgi
Copy link
Collaborator Author

cosurgi commented Jan 17, 2025

I hope you can reproduce it. I just checked and it occurs both for g++ 12.2.0-14 and g++ 14.2.0-14

@ckormanyos
Copy link
Member

It is all going down fast, let me catch up on -Ofast. I have an impression of what might be quirky.

It is really good to find this stuff now rather in the general population later.

Thx Janek (@cosurgi)

@ckormanyos
Copy link
Member

ckormanyos commented Jan 17, 2025

Hi Janek (@cosurgi) on my WSL2, sadly with -Ofast I can NOT reproduce it.

Could you tell me some more specifics about your system/PC, etc.? I don't think this thingy will stop the show, but we need to face it rather sooner than later here in this post.

Thx Janek!

@ckormanyos
Copy link
Member

Wait, Janek, I got the answer $1$ for the division. Is this the expected value? I am getting confused. In your post it seems like the numerator and denominator ar equal? Or is it just old-man blurry in my eyes?

@ckormanyos
Copy link
Member

ckormanyos commented Jan 17, 2025

Oh you are telling me when you put the lines within the larger context of YADE, then the division becomes non equal to $1$ and thus wrong.

Ah that will be a bit more tricky to reproduce indeed.

@cosurgi
Copy link
Collaborator Author

cosurgi commented Jan 17, 2025

Oh you are telling me when you put the lines within the larger context of YADE, then the division becomes non equal to 1 and thus wrong.

yes. But also initialization is wrong. Search in your browser for strings: 4978165316556861785812770904158242 and 4978165316556861972925864431199324 and you will see.

The quick_test.cpp got the initialization wrong and division right. In the larger context of YADE the initialization was correct but division was wrong.

@cosurgi
Copy link
Collaborator Author

cosurgi commented Jan 17, 2025

on my WSL2, sadly with -Ofast I can NOT reproduce it.

But is the printed number exactly the same as the number you initialized a and b with?

@cosurgi
Copy link
Collaborator Author

cosurgi commented Jan 17, 2025

I just realized that in summary I have stumbled on two bugs here:

  1. in quick_test.cpp a simple code, the initialization is incorrect.
  2. in the larger context of YADE the initialization is correct, but division is wrong.

Ah that will be a bit more tricky to reproduce indeed.

Yes :( I need to wrap my head around this. I don't know if I can write a minimal reproducing example.

@ckormanyos
Copy link
Member

ckormanyos commented Jan 17, 2025

Hi Janek,

This one is now clear to me. I ran the backend through our arithmetic tests with -Ofast and a whole bunch of NaN errors and lots of other errors came to arise.

-Ofastactually allows the compiler to assume properties like associativity, commutativity, and other algebraic transformations that are not always valid in floating-point arithmetic.

-Ofast is also allowed to reorder operations to improve performance, ignoring potential small differences due to rounding.

Since our add/sub/mul/div/sqrt (and normalize) require no reordering, -Ofast can't be used.

Now, thinking out loud here, GCC actually offers a line-by-line noreorder in its collection of attributes. So I could define a macro that punches this attribute onto all the relevant lines so that even if clients try to use -Ofast they will still get the right answers.

Cc: @jzmaddock

@ckormanyos
Copy link
Member

Ugghhh but there will still be potential problems with NaN, infinity and the like.

-Ofast on the double-float backend is a no-go

  • until we do something similar to what we did for all the GPU math with Matt recently.

Cc: @jzmaddock and @mborland

@cosurgi
Copy link
Collaborator Author

cosurgi commented Jan 17, 2025

Could you tell me some more specifics about your system/PC, etc.?

So if the issue is clear, you no longer need these specifics? But in general I have tried this on

  • devuan daedalus with g++ 12.2.0-14, and
  • debian sid with g++ 14.2.0-14 (clean install in chroot)

@ckormanyos
Copy link
Member

Somehow I get the feeling this will creep up from time to time. It feels like publishing non-portable code. But I don't at the moment see a clear pathway to supporting -Ofast. Maybe we will come up with a reliable solution in the future.

I'm going to keep this issue open for a while and clear my head on this example and play around with it for a while.

@cosurgi
Copy link
Collaborator Author

cosurgi commented Jan 17, 2025

Ugghhh but there will still be potential problems with NaN, infinity and the like.

NaN, Inf in general does not work with -Ofast and double, so one cannot expect it to suddenly work with cpp_double_fp. So we can go here without expecting them to work. Hence these simple reordering macro guards should do the trick, I think.

@ckormanyos
Copy link
Member

I think we might need full support for something like an unchecked version, easily activate-able. But I have not fully thought through this one yet.

@cosurgi
Copy link
Collaborator Author

cosurgi commented Jan 17, 2025

Yeah. If someone uses -Ofast he basically says that his code is correct, and does not produce NaN or Inf, and he wants it to be fast. This is where your mul_unchecked can shine.

@ckormanyos
Copy link
Member

Closing. Do not use -Ofast with this backend.

We must state this LOUDLY and CLEARLY in the docs (TODO).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants