
Integrate cpp_double_fp_backend #648

Open · 654 commits into base branch `develop`

Conversation

ckormanyos (Member)

No description provided.

sinandredemption and others added 30 commits August 21, 2021 21:34
check if integer width is adequate in split()
…into gsoc2021_double_float_chris

# Conflicts:
#	.github/workflows/multiprecision_quad_double_only.yml
#	.gitignore
#	performance/performance_test.cpp
#	test/test_arithmetic.hpp

cosurgi commented Jan 16, 2025

Hi Chris (@ckormanyos), this is very hot off the press. I have just managed to reduce the number of `rd_string` calls from 233480 to 48 and the `str` calls from 194842 to 437 in the `yade -n --quickperformance -j 1` benchmark!

I cannot run the timing benchmarks yet, because I need to clean up the code and remove all of these `std::cerr << __PRETTY_FUNCTION__ << "\n";` statements everywhere ;) They slow down the calculations!

Now if we manage to add pow for integer powers we will be good to go!


ckormanyos commented Jan 16, 2025

> Hi Chris (@ckormanyos), this is very hot off the press. I have just managed to reduce the number of `rd_string` calls from 233480 to 48 and the `str` calls from 194842 to 437 in the `./examples/yade -n --quickperformance -j 1` benchmark!

This is great. Nice work.

> Now if we manage to add `pow` for integer powers we will be good to go!

I have not done that, but I verified that we fall through to a reasonably efficient calculation in our existing collection of default functions. If this ends up being one of the "functions to speed up", then I can do that quickly.

Now, we might still face some quirky issue, and the proof will be in your final bench run. If we are speedy, then good. If, for some other reason, we still face slowdowns, then we have two wins even now:

  • YADE is going to get better
  • And if we still need to squeeze more cpp_double_fp functions, then we can find them and push their limits down.

So no matter what actual numbers you get, your work has put us in a stronger position. Let's keep going!

Cc: @sinandredemption and @jzmaddock


cosurgi commented Jan 16, 2025

Wow, with one last small change I reduced the `rd_string` calls to 2!


cosurgi commented Jan 16, 2025

> Now if we manage to add `pow` for integer powers we will be good to go!

> I have not done that, but I verified that we fall through to a reasonably efficient calculation in our existing collection of default functions. If this ends up being one of the "functions to speed up", then I can do that quickly.

Hi Chris (@ckormanyos), what happens if you replace `z = z*z + c;` with `z = pow(z, 2) + c;` in your Mandelbrot benchmark?

@ckormanyos (Member Author)

> Hi Chris (@ckormanyos), what happens if you replace `z = z*z + c;` with `z = pow(z, 2) + c;` in your Mandelbrot benchmark?

It ruins the performance completely. What a great question, Janek. It took so long that I am still waiting for the timing result. I had the real/imag components separated. See the pic below.

In summary, the pow function killed performance on that particular benchmark.

I went from 17 seconds to 170 seconds, a factor of 10.

[screenshot]


ckormanyos commented Jan 16, 2025

Nightmare: timing up by a factor of 10.

Oops, it is time to hand-optimize `pow(x, n)` where $n$ is an integer.

[screenshot]


cosurgi commented Jan 16, 2025

Whew. That's good news for me actually, because yade uses `pow(arg, 2)` 436138 times in the calculations. And I have removed the calls to `rd_string` entirely. Here is the benchmark result (higher iter/sec is better):

| type | calculation speed | factor |
| --- | --- | --- |
| `float128` | 159.4864 iter/sec | 1 |
| `cpp_double_double` | 31.3289 iter/sec | 5.09 |

Meaning that in yade `float128` is still 5 times faster than `cpp_double_double`.

Before removing the calls to `rd_string` it was like this:

| type | calculation speed | factor | commit |
| --- | --- | --- | --- |
| `float128` | 145.3411 iter/sec | 1 | |
| `cpp_double_double` | 30.4207 iter/sec | 4.77 | 9f34658 |

So you may notice that `float128` performance increased by a factor of 1.1 (10% faster) thanks to removing the string streaming. `cpp_double_double` is also faster, but only a tiny bit: by a factor of 1.03 (3% faster). The ratio between the two actually got worse, from 4.77 to 5.09, meaning that `float128` benefited more from the removal of string streaming than `cpp_double_double` did.


cosurgi commented Jan 16, 2025

I am not sure if lines 140 and 141 on that screenshot are correct. You have `zr2 = pow(zr, zr)`;
shouldn't it be `zr2 = pow(zr, 2)`? Or something like this, but not raising $zr^{zr}$.


ckormanyos commented Jan 17, 2025

> I am not sure if lines 140 and 141 on that screenshot are correct. You have `zr2 = pow(zr, zr)`;
> shouldn't it be `zr2 = pow(zr, 2)`?

You are right, Janek. That was a silly, late-evening, hurried blunder.

When I used the proper `pow(zr, 2)`, the timing was worse, $24\,\mathrm{s}$ compared to $17\,\mathrm{s}$, but not as bad as the previous report.

[screenshot]

[screenshot]

@jzmaddock (Collaborator)

Just curious, did we not optimize the default `pow` function for integer exponents?

Also, just FYI, the `boost::math::pow<N>(x)` function is designed to optimise exactly this case: a power with a constant integer exponent. As far as I know, there is no way within the language to detect that `pow(T, int)` is being called with an integer literal.

@ckormanyos (Member Author)

> Just curious, did we not optimize the default `pow` function for integer exponents?

Yes John, you are right. The generic collection of functions in Multiprecision DOES include specializations of `eval_pow` for pure integral powers.

I am experimenting with a local version, but I am not able to get significantly faster than the default version in Multiprecision, maybe only 10 to 20% faster.

At the moment, I do not see any further clear bottlenecks in the overall performance of cpp_double_fp_backend.

@jzmaddock (Collaborator)

@ckormanyos does this PR improve power performance at all: #649 ?


cosurgi commented Jan 17, 2025

Chris (@ckormanyos), can you share your Mandelbrot benchmark code? I want to make sure that I can reproduce your results. Because if I don't, then we know it's not a problem with cpp_double_fp but with my local configuration.


cosurgi commented Jan 17, 2025

Chris (@ckormanyos), in this post with the Mandelbrot benchmark, which g++ version and optimization flags (`-O3`, `-Ofast`?) did you use to compare cpp_double_double with float128?

@ckormanyos (Member Author)

> the Mandelbrot benchmark

See also: BoostGSoC21#190

Hi Janek (@cosurgi), I have made a dedicated issue for this discussion. In that issue, I will provide the benchmark code and, yes, it does offer the ability to compare bin-float, dec-float, float128 and double-double.

Give me a day or so to prepare a branch of the Mandelbrot for your dedicated use.

Cc: @jzmaddock and @sinandredemption


ckormanyos commented Jan 17, 2025

> does this PR improve power performance at all

Hi John (@jzmaddock). In a word, yes. Treating small powers in that super-fast way is something we should probably do.

Another thing I have been playing around with is a more subtle issue. In my recent pushes here, I have introduced a concept called `mul_unchecked`, and this just cycled green.

In `mul_unchecked` I skip the prologue to multiplication which checks for NaN, infinity, zero and the like, in effect making a pure multiplication that is separate from the `eval_mul` operation.

As it turns out, the floating-point-class checks actually do slow down these tiny backends significantly. We also found this to be relevant for the work in decimal. So a bit further down the evolutionary road I will be separating the raw work from the safety checks in the mul/div operations as well.

So if you have already checked the edge cases in a function like `pow` or `exp` or similar, you can squeeze away the further checks on mul/div inside that function's implementation.

As for your changes there, I think they definitely help all of multiprecision, but I still might end up specializing $x^n$ for the double-float backend if that squeezes out $5\%$ or more, as it seems to in my recent studies.

Cc: @cosurgi


cosurgi commented Jan 17, 2025

I posted the latest YADE benchmark results in BoostGSoC21#190; suddenly it starts to look good with clang.
(initially I posted this here, but then I moved this post over there)

@ckormanyos (Member Author)

Note to self (TODO): hit the edge cases of the new `eval_pow` method.

@ckormanyos (Member Author)

Performance of algebraic functions re-affirmed in BoostGSoC21#190


cosurgi commented Jan 18, 2025

OK, so the bad-performance mystery was solved. I ran the YADE benchmark `yade -n --quickperformance -j 4` on a fairly recent Intel i7-14700KF CPU and the results are good. Some are interesting. We can definitely mark the performance problem of the cpp_double_fp_backend as solved. Now only the compiler developers will have something to talk about :)

Here are the results:

cpp_double_double

| type | compiler | calculation speed | factor |
| --- | --- | --- | --- |
| `cpp_double_double` | g++ 12.2 | 449.15 iter/sec | 1 |
| `float128` | g++ 12.2 | 263.15 iter/sec | 1.70 |
| `cpp_bin_float<32>` | g++ 12.2 | 211.81 iter/sec | 2.12 |
| `cpp_dec_float<31>` | g++ 12.2 | 78.15 iter/sec | 5.74 |
| `mpfr_float_backend<31>` | g++ 12.2 | 51.01 iter/sec | 8.80 |

Here we can see that cpp_double_double beats everyone else by over a factor of two.

cpp_double_long_double

| type | compiler | calculation speed | factor |
| --- | --- | --- | --- |
| `cpp_bin_float<39>` | g++ 12.2 | 122.55 iter/sec | 1 |
| `cpp_double_long_double` | clang++ 19.1.4 | 108.79 iter/sec | 1.12 |
| `cpp_bin_float<39>` | clang++ 19.1.4 | 102.19 iter/sec | 1.20 |
| `cpp_dec_float<39>` | g++ 12.2 | 71.42 iter/sec | 1.71 |
| `mpfr_float_backend<39>` | g++ 12.2 | 45.75 iter/sec | 2.67 |
| `cpp_double_long_double` | g++ 12.2 | 14.97 iter/sec | 8.18 |

Here we can see that `cpp_double_long_double` performs very well. But the compiler developers will have a mystery to solve: `cpp_bin_float<39>` under g++ 12.2 is faster than `cpp_double_long_double` under clang++ 19.1.4 by just a little, which in turn is faster than `cpp_double_long_double` under g++ 12.2 by more than a factor of 7.

cpp_double_float128

| type | compiler | calculation speed | factor |
| --- | --- | --- | --- |
| `cpp_bin_float<67>` | g++ 12.2 | 118.43 iter/sec | 1 |
| `mpfr_float_backend<67>` | g++ 12.2 | 43.34 iter/sec | 2.73 |
| `cpp_dec_float<67>` | g++ 12.2 | 40.09 iter/sec | 2.95 |
| `cpp_double_float128` | g++ 12.2 | 14.99 iter/sec | 7.90 |

Here we can see that `cpp_double_float128` has a lot of potential to beat `cpp_bin_float<67>` once the g++ developers sort out the problem seen with `cpp_double_long_double` under g++ 12.2. The performance increase should be roughly a factor of 8 :)

So all is good. I think we can merge this branch once documentation and other small TODOs are complete.


ckormanyos commented Jan 19, 2025

> We can definitely mark the performance problem of the cpp_double_fp_backend as solved.

Thank you, Janek (@cosurgi), that was a big effort, and it really provided a lot of information and clarity.

Some of the results on `cpp_double_long_double`, where `long double` is the 80-bit, 10-byte type, are interesting. That hardware version of the 10-byte floating-point representation runs on the legendary (modernized) descendants of the i387 FPU, the hardware that really put 10-byte floating-point on the map.

The newer i7 processors have extremely powerful 64-bit floating-point hardware operations, and it seems like these are being very well supported nowadays in hardware and software.

Down the road I will be doing some non-x86_64 measurements on M1 and/or M2 and on a few embedded bare-metal controllers, such as an ARM(R) Cortex(R)-M7 with double-precision FPU support.

All in all, I'm somewhat surprised at how fast `cpp_double_double` ended up in certain hardware/software configurations. As mentioned in previous posts, this backend (and of course that type specifically) has lots of room for optimization improvement.

I'm happy enough with it to make a first release out of this state.

Cc: @sinandredemption and @jzmaddock

@jzmaddock (Collaborator)

There might be one more thing to check: that each of the backend/compiler configurations is doing (roughly) the same amount of work. Something that can happen when there is a tolerance set for termination is that you can hit "unfortunate" parameters which cause the code to thrash through many needless iterations that don't actually get you any closer to the end result. I have no idea whether this is the case here, but because they don't behave quite like exactly rounded IEEE types, things like double-double can easily break assumptions present in the code.
