Problem with g++ `-Ofast` #189

cosurgi · 2025-01-17T16:43:02Z

Hi Chris (@ckormanyos), in our benchmarks in boostorg#648 I was using -O3 flag till now. I wanted to compare cpp_double_double with float128 when using -Ofast flag. But I couldn't because I have discovered a bug which caused yade to exit, making yade -n --quickperformance -j 4 impossible to run.

Try this code with g++

quick_test.cpp:

#include <boost/multiprecision/cpp_double_fp.hpp>
#include <iostream>
#include <string>

int main() {
	using Real = boost::multiprecision::cpp_double_double;
	std::string num{"0.4978165316556861972925864431199"};
	Real a{num};
	Real b{num};
	if(a.str() != num) {
		std::cout << "Error, bad value.\n";
		std::cout << num << "\n";
		std::cout << a.str() << "\n";
		std::cout << "--\n";
	}
	std::cout << std::setprecision(std::numeric_limits<Real>::digits10);
	std::cout << a << "\n";
	std::cout << b << "\n";
	std::cout << a/b << "\n";
}

Now look that with flag -Ofast the number is wrong:

$ g++ quick_test.cpp -o quick_test
$ ./quick_test
0.4978165316556861972925864431199
0.4978165316556861972925864431199
1
k$ g++ quick_test.cpp -O3 -o quick_test
k$ ./quick_test
0.4978165316556861972925864431199
0.4978165316556861972925864431199
1
$ g++ quick_test.cpp -Ofast -o quick_test
$ ./quick_test
Error, bad value.
0.4978165316556861972925864431199
0.4978165316556861785812770904158
--
0.4978165316556861785812770904158
0.4978165316556861785812770904158
1

The printed numbers are incorrect for -Ofast above. The division in this simple example is correct. But it is even more strange that when I put these exact five lines somewhere in yade:

	RealT a{"0.4978165316556861972925864431199324"};
	RealT b{"0.4978165316556861972925864431199324"};
	std::cout << std::setprecision(std::numeric_limits<RealT>::digits10 + 4) << a << "\n";
	std::cout << std::setprecision(std::numeric_limits<RealT>::digits10 + 4) << b << "\n";
	std::cout << std::setprecision(std::numeric_limits<RealT>::digits10 + 4) << a/b << "\n";

I get this output with -Ofast:

0.4978165316556861972925864431199324
0.4978165316556861972925864431199324
1.0000000000000002215576510624637261

The printed numbers are correct, but the division is a strange result. Not the one I would get when dividing 0.4978165316556861972925864431199324/0.4978165316556861785812770904158242 because that produces 1.00000000000000003758675769660005661.

While compiling yade with the -O3 gives the correct results:

0.4978165316556861972925864431199324
0.4978165316556861972925864431199324
1

The text was updated successfully, but these errors were encountered:

ckormanyos · 2025-01-17T17:01:36Z

Hi Janek, I probably cut a few too many corners in my operator/= and lost the carry bit somewhere down that road. It think a volatile or a separate use of unchacked-fma wil repair this.

I will reproduce the bug (if possible) then report.

-Ofast is one of those things like -ffast-math that still might be less well-checked in this backend.

Thx and great catch!

cosurgi · 2025-01-17T17:05:36Z

I hope you can reproduce it. I just checked and it occurs both for g++ 12.2.0-14 and g++ 14.2.0-14

ckormanyos · 2025-01-17T17:19:45Z

It is all going down fast, let me catch up on -Ofast. I have an impression of what might be quirky.

It is really good to find this stuff now rather in the general population later.

Thx Janek (@cosurgi)

ckormanyos · 2025-01-17T17:24:54Z

Hi Janek (@cosurgi) on my WSL2, sadly with -Ofast I can NOT reproduce it.

Could you tell me some more specifics about your system/PC, etc.? I don't think this thingy will stop the show, but we need to face it rather sooner than later here in this post.

Thx Janek!

ckormanyos · 2025-01-17T17:27:34Z

Wait, Janek, I got the answer $1$ for the division. Is this the expected value? I am getting confused. In your post it seems like the numerator and denominator ar equal? Or is it just old-man blurry in my eyes?

ckormanyos · 2025-01-17T17:29:04Z

Oh you are telling me when you put the lines within the larger context of YADE, then the division becomes non equal to $1$ and thus wrong.

Ah that will be a bit more tricky to reproduce indeed.

cosurgi · 2025-01-17T17:33:13Z

Oh you are telling me when you put the lines within the larger context of YADE, then the division becomes non equal to 1 and thus wrong.

yes. But also initialization is wrong. Search in your browser for strings: 4978165316556861785812770904158242 and 4978165316556861972925864431199324 and you will see.

The quick_test.cpp got the initialization wrong and division right. In the larger context of YADE the initialization was correct but division was wrong.

cosurgi · 2025-01-17T17:35:23Z

on my WSL2, sadly with -Ofast I can NOT reproduce it.

But is the printed number exactly the same as the number you initialized a and b with?

cosurgi · 2025-01-17T17:41:59Z

I just realized that in summary I have stumbled on two bugs here:

in quick_test.cpp a simple code, the initialization is incorrect.
in the larger context of YADE the initialization is correct, but division is wrong.

Ah that will be a bit more tricky to reproduce indeed.

Yes :( I need to wrap my head around this. I don't know if I can write a minimal reproducing example.

ckormanyos · 2025-01-17T17:48:28Z

Hi Janek,

This one is now clear to me. I ran the backend through our arithmetic tests with -Ofast and a whole bunch of NaN errors and lots of other errors came to arise.

-Ofastactually allows the compiler to assume properties like associativity, commutativity, and other algebraic transformations that are not always valid in floating-point arithmetic.

-Ofast is also allowed to reorder operations to improve performance, ignoring potential small differences due to rounding.

Since our add/sub/mul/div/sqrt (and normalize) require no reordering, -Ofast can't be used.

Now, thinking out loud here, GCC actually offers a line-by-line noreorder in its collection of attributes. So I could define a macro that punches this attribute onto all the relevant lines so that even if clients try to use -Ofast they will still get the right answers.

Cc: @jzmaddock

ckormanyos · 2025-01-17T17:52:40Z

Ugghhh but there will still be potential problems with NaN, infinity and the like.

-Ofast on the double-float backend is a no-go

until we do something similar to what we did for all the GPU math with Matt recently.

Cc: @jzmaddock and @mborland

cosurgi · 2025-01-17T19:06:55Z

Could you tell me some more specifics about your system/PC, etc.?

So if the issue is clear, you no longer need these specifics? But in general I have tried this on

devuan daedalus with g++ 12.2.0-14, and
debian sid with g++ 14.2.0-14 (clean install in chroot)

ckormanyos · 2025-01-17T20:29:46Z

Somehow I get the feeling this will creep up from time to time. It feels like publishing non-portable code. But I don't at the moment see a clear pathway to supporting -Ofast. Maybe we will come up with a reliable solution in the future.

I'm going to keep this issue open for a while and clear my head on this example and play around with it for a while.

cosurgi · 2025-01-17T20:48:08Z

Ugghhh but there will still be potential problems with NaN, infinity and the like.

NaN, Inf in general does not work with -Ofast and double, so one cannot expect it to suddenly work with cpp_double_fp. So we can go here without expecting them to work. Hence these simple reordering macro guards should do the trick, I think.

ckormanyos · 2025-01-17T20:51:23Z

I think we might need full support for something like an unchecked version, easily activate-able. But I have not fully thought through this one yet.

cosurgi · 2025-01-17T21:00:46Z

Yeah. If someone uses -Ofast he basically says that his code is correct, and does not produce NaN or Inf, and he wants it to be fast. This is where your mul_unchecked can shine.

ckormanyos · 2025-01-18T17:01:11Z

Closing. Do not use -Ofast with this backend.

We must state this LOUDLY and CLEARLY in the docs (TODO).

ckormanyos self-assigned this Jan 17, 2025

ckormanyos added the bug Something isn't working label Jan 17, 2025

ckormanyos closed this as completed Jan 18, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Problem with g++ `-Ofast` #189

Problem with g++ `-Ofast` #189

cosurgi commented Jan 17, 2025 •

edited

Loading

ckormanyos commented Jan 17, 2025

cosurgi commented Jan 17, 2025

ckormanyos commented Jan 17, 2025

ckormanyos commented Jan 17, 2025 •

edited

Loading

ckormanyos commented Jan 17, 2025

ckormanyos commented Jan 17, 2025 •

edited

Loading

cosurgi commented Jan 17, 2025 •

edited

Loading

cosurgi commented Jan 17, 2025 •

edited

Loading

cosurgi commented Jan 17, 2025 •

edited

Loading

ckormanyos commented Jan 17, 2025 •

edited

Loading

ckormanyos commented Jan 17, 2025

cosurgi commented Jan 17, 2025

ckormanyos commented Jan 17, 2025

cosurgi commented Jan 17, 2025

ckormanyos commented Jan 17, 2025

cosurgi commented Jan 17, 2025

ckormanyos commented Jan 18, 2025

Problem with g++ -Ofast #189

Problem with g++ -Ofast #189

Comments

cosurgi commented Jan 17, 2025 • edited Loading

ckormanyos commented Jan 17, 2025

cosurgi commented Jan 17, 2025

ckormanyos commented Jan 17, 2025

ckormanyos commented Jan 17, 2025 • edited Loading

ckormanyos commented Jan 17, 2025

ckormanyos commented Jan 17, 2025 • edited Loading

cosurgi commented Jan 17, 2025 • edited Loading

cosurgi commented Jan 17, 2025 • edited Loading

cosurgi commented Jan 17, 2025 • edited Loading

ckormanyos commented Jan 17, 2025 • edited Loading

ckormanyos commented Jan 17, 2025

cosurgi commented Jan 17, 2025

ckormanyos commented Jan 17, 2025

cosurgi commented Jan 17, 2025

ckormanyos commented Jan 17, 2025

cosurgi commented Jan 17, 2025

ckormanyos commented Jan 18, 2025

Problem with g++ `-Ofast` #189

Problem with g++ `-Ofast` #189

cosurgi commented Jan 17, 2025 •

edited

Loading

ckormanyos commented Jan 17, 2025 •

edited

Loading

ckormanyos commented Jan 17, 2025 •

edited

Loading

cosurgi commented Jan 17, 2025 •

edited

Loading

cosurgi commented Jan 17, 2025 •

edited

Loading

cosurgi commented Jan 17, 2025 •

edited

Loading

ckormanyos commented Jan 17, 2025 •

edited

Loading