
Author Topic: (Question) 128-bit floating point emulation: multiplication optimizations?  (Read 371 times)


Offline Mr Rebooted

  • Fractal Phenom
  • ****
  • Posts: 48
(Question) 128-bit floating point emulation: multiplication optimizations?
« on: January 20, 2021, 01:58:59 PM »
Is there a way to speed up quad-precision arithmetic simply by optimizing this multiplication routine?
Code: [Select]
dvec2 mul (dvec2 dsa, dvec2 dsb) {
    precise dvec2 dsc;
    precise double c11, c21, c2, e, t1, t2;
    precise double a1, a2, b1, b2, cona, conb, split = 8193.0;
    cona = dsa.x * split;
    conb = dsb.x * split;
    a1 = cona - (cona - dsa.x);
    b1 = conb - (conb - dsb.x);
    a2 = dsa.x - a1;
    b2 = dsb.x - b1;
    c11 = dsa.x * dsb.x;
    c21 = a2 * b2 + (a2 * b1 + (a1 * b2 + (a1 * b1 - c11)));
    c2 = dsa.x * dsb.y + dsa.y * dsb.x;
    t1 = c11 + c2;
    e = t1 - c11;
    t2 = dsa.y * dsb.y + ((c2 - e) + (c11 - (t1 - e))) + c21;
    dsc.x = t1 + t2;
    dsc.y = t2 - (dsc.x - t1);
    return dsc;
}

The picture below shows that the FPS is around 400, but if I scroll around, it drops to 5 FPS.
(GLSL btw.)

Linkback: https://fractalforums.org/programming/11/128-bit-floating-point-emulation-multiplication-optimizations/3995/

Offline marcm200

  • 3c
  • ***
  • Posts: 925
Re: 128-bit floating point emulation: multiplication optimizations?
« Reply #1 on: January 20, 2021, 02:15:22 PM »
One low-level (non-multiplication) optimization could be the following - though it depends on what your compiler already "knows":

For large data types (what counts as "large" depends on the machine and language; with C++ on my AMD I use this from __float128 upward), I do not pass or return objects, but work exclusively with references.

Code: [Select]
int8_t mul_TAB(dvec2& result, dvec2& dsa, dvec2& dsb);
/* return value: success or error */

When passing references, I have control over the allocation and deallocation of temporary objects, so one can optimize manually (especially when using operator overloading).

You might also try putting all the temporary objects of your routine into one globally allocated struct and passing a reference to that. This saves freeing the stack when exiting the routine - but it makes parallelization a bit trickier.



Offline claude

  • 3f
  • ******
  • Posts: 1784
    • mathr.co.uk
Re: 128-bit floating point emulation: multiplication optimizations?
« Reply #2 on: January 20, 2021, 02:32:32 PM »
https://sources.debian.org/src/qd/2.3.22+dfsg.1-3/include/qd/inline.h/#L84 shows you can optimize two_prod (a building block of double-double mul) by using fused multiply-subtract, which avoids the need for the splitting magic (btw, your split magic number is incorrect for double precision: 8193 = 2^13 + 1 comes from single-precision float-float code, while IEEE doubles need 2^27 + 1 = 134217729). GLSL has fused multiply-add (fma), and you can implement fused multiply-subtract as fms(a, b, c) = fma(a, b, -c).

Offline Mr Rebooted

  • Fractal Phenom
  • ****
  • Posts: 48
Re: 128-bit floating point emulation: multiplication optimizations?
« Reply #3 on: January 21, 2021, 01:07:40 AM »
Quote from: claude on January 20, 2021, 02:32:32 PM
https://sources.debian.org/src/qd/2.3.22+dfsg.1-3/include/qd/inline.h/#L84 shows you can optimize two_prod (a building block of double-double mul) by using fused-multiply-subtract which avoids the need for splitting magic (btw your split magic number is incorrect for double precision). GLSL has fused-multiply-add (FMA), and you can implement fused-multiply subtract by fms(a, b, c) = fma(a, b, -c).

What's going on here? I can't really track down what's causing the fuzziness.

Offline marcm200

  • 3c
  • ***
  • Posts: 925
Re: 128-bit floating point emulation: multiplication optimizations?
« Reply #4 on: January 21, 2021, 09:09:39 AM »
As registers are the fastest memory, if your language supports side-effect assignments of variables, you might try this:

Instead of
Code: [Select]
c11 = dsa.x * dsb.x;
c21 = a2 * b2 + (a2 * b1 + (a1 * b2 + (a1 * b1 - c11)));

use
Code: [Select]
c21 = a2 * b2 + (a2 * b1 + (a1 * b2 + (a1 * b1 - (c11 = dsa.x * dsb.x))));

That way, the value of c11 is computed in situ at first usage and might then still be in a register rather than having to be fetched from memory (most probably the L1-cache).

If you find a measurable performance gain, I'd be interested in the timings. In my (limited) experience with low-level optimizations of double-precision control flow, I have not seen a speed-up (probably because I'm interfering with the compiler's own optimizations).


Offline Mr Rebooted

  • Fractal Phenom
  • ****
  • Posts: 48
Re: 128-bit floating point emulation: multiplication optimizations?
« Reply #5 on: January 21, 2021, 02:32:53 PM »
Quote from: marcm200 on January 21, 2021, 09:09:39 AM
As registers are the fastest memory, if your language supports side-effect assignments of variables, you might try this:

Instead of
Code: [Select]
c11 = dsa.x * dsb.x;
c21 = a2 * b2 + (a2 * b1 + (a1 * b2 + (a1 * b1 - c11)));

use
Code: [Select]
c21 = a2 * b2 + (a2 * b1 + (a1 * b2 + (a1 * b1 - (c11 = dsa.x * dsb.x))));

That way, the value of c11 is computed in situ at first usage and might then still be in a register rather than having to be fetched from memory (most probably the L1-cache).

If you find a measurable performance gain, I'd be interested in timings. From my (limited) experience working with low-level optimizations of double precision control flows, I have not seen a speed-up (probably I'm interfering with the compiler's optimization capabilities).

Unfortunately GLSL doesn't support side-effect assignments.

You'll have to think of something else.

Offline 3DickUlus

  • Administrator
  • *******
  • Posts: 2050
    • Digilantism
Re: 128-bit floating point emulation: multiplication optimizations?
« Reply #6 on: January 22, 2021, 02:46:44 AM »
Just my 2 bits' worth... study the assembler output and/or write it in assembler to bypass the compiler's optimizations; without doing this you will never know what's really going on inside. >:D

