Lots of really interesting and good advice in this thread.

I use MPIR, MPFR, and MPREAL (an MPFR C++ wrapper, http://www.holoborodko.com/pavel/mpfr/, as mentioned earlier in the thread) in my fractal explorer, Fractalscope, https://sourceforge.net/projects/fractalscope/ . MPIR is the Windows-compatible fork of GMP. In my experience, MPFR is slower than MPIR, probably because of the extra work it does to guarantee correct rounding.

Fractalscope uses MPREAL for calculating the required precision and precision mode (double or arbitrary) for a particular magnification level, but performs the actual Mandelbrot calculation using MPIR. The C++ operator overloading of MPREAL adds overhead compared to the raw C calls of MPFR and MPIR, so it is slower in the inner loop. Compiling MPIR specifically for your computer should help a bit with the arbitrary precision speed, but yeah, using MPIR is like hitting a wall of jelly compared to double precision :-) The new perturbation technique is a huge leap in speed over conventional arbitrary precision, but the mathematics are beyond me!
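To give a rough idea of what "calculating the required precision" can look like, here is a minimal sketch. It is not Fractalscope's actual rule: the function names, the assumed unmagnified view width of 4, and the 8 guard bits are all illustrative assumptions. The idea is simply that you need enough mantissa bits to tell neighbouring pixels apart, and once that exceeds the 53 bits of a double you have to switch to arbitrary precision.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: estimate how many mantissa bits are needed so that
// adjacent pixels map to distinct coordinates.
// Assumptions (not from Fractalscope): the unmagnified view is 4 units wide,
// coordinates have magnitude up to 2, and guard_bits of headroom are added
// for rounding error that accumulates over many iterations.
int required_bits(double log2_magnification, int image_width, int guard_bits = 8)
{
    assert(image_width > 0);
    // pixel spacing = 4 / (magnification * width)
    // bits needed   = log2(2 / spacing) = log2(magnification) + log2(width) - 1
    const double bits = log2_magnification
                      + std::log2(static_cast<double>(image_width)) - 1.0;
    return static_cast<int>(std::ceil(bits)) + guard_bits;
}

// A double has a 53-bit mantissa; beyond that, arbitrary precision is needed.
bool needs_arbitrary_precision(double log2_magnification, int image_width)
{
    return required_bits(log2_magnification, image_width) > 53;
}
```

The result is in bits, which is also the unit MPFR and MPIR use when setting the precision of a variable, so a value like this could be passed on more or less directly.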

Fractalscope uses SSE2 intrinsics for the Mandelbrot calculation at double precision (I never did get round to writing the SSE code for the other fractals). This enables four pixels to be calculated simultaneously without orbit detection, and two pixels simultaneously with orbit detection. Things have moved on since I wrote that part of the code: new processors have more registers, AVX, etc., which I am not familiar with, so it may be possible to calculate more pixels simultaneously. Bruce Dawson's Fractal eXtreme has very fast arbitrary precision, but I believe it relies on very clever low-level optimizations (and is closed source).
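For anyone who has not used the intrinsics, here is a minimal sketch of the idea, assuming an x86 compiler with SSE2. It is not the Fractalscope code: the function name is made up, and it only handles one pair of pixels (one SSE2 register holds two doubles). Getting to four pixels would presumably mean running a second set of registers alongside, which is not shown here.

```cpp
#include <cassert>
#include <emmintrin.h> // SSE2 intrinsics

// Hypothetical helper: iterate z = z^2 + c for two pixels at once.
// cr/ci hold the real and imaginary parts of the two c values;
// out receives the iteration count of each pixel (max_iter if it never escaped).
void mandel_sse2_pair(const double cr[2], const double ci[2],
                      int max_iter, int out[2])
{
    assert(max_iter > 0);
    const __m128d vcr  = _mm_loadu_pd(cr);
    const __m128d vci  = _mm_loadu_pd(ci);
    const __m128d four = _mm_set1_pd(4.0);
    const __m128d one  = _mm_set1_pd(1.0);
    __m128d zr    = _mm_setzero_pd();
    __m128d zi    = _mm_setzero_pd();
    __m128d count = _mm_setzero_pd(); // per-lane iteration counters

    for (int n = 0; n < max_iter; ++n) {
        const __m128d zr2  = _mm_mul_pd(zr, zr);
        const __m128d zi2  = _mm_mul_pd(zi, zi);
        const __m128d mag2 = _mm_add_pd(zr2, zi2);

        // All-ones mask in each lane whose |z|^2 is still <= 4.
        const __m128d active = _mm_cmple_pd(mag2, four);
        if (_mm_movemask_pd(active) == 0)
            break; // both pixels have escaped

        // Only lanes that are still bounded get their counter incremented.
        count = _mm_add_pd(count, _mm_and_pd(active, one));

        const __m128d zrzi = _mm_mul_pd(zr, zi);
        zi = _mm_add_pd(_mm_add_pd(zrzi, zrzi), vci); // 2*zr*zi + ci
        zr = _mm_add_pd(_mm_sub_pd(zr2, zi2), vcr);   // zr^2 - zi^2 + cr
    }

    double c[2];
    _mm_storeu_pd(c, count);
    out[0] = static_cast<int>(c[0]);
    out[1] = static_cast<int>(c[1]);
}
```

Note that the loop only stops when both lanes have escaped, so a pair containing one inside pixel and one outside pixel costs as much as the slower of the two. An escaped lane keeps iterating and eventually overflows, but the comparison then stays false, so its count is unaffected.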

Orbit detection can give a very large speed-up for areas that feature the inside of the set at high iteration counts, and is well worth including. Solid guessing (filling in a block when its border pixels all have the same iteration count) can also help to speed up areas which feature the inside of the set. However, solid guessing is not very effective outside the set at high magnifications, because the iteration bands become much narrower (or non-existent).
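Orbit detection can be sketched roughly as below. This is one common way of doing it, not necessarily what Fractalscope does: a reference value of z is saved, every later z is compared against it, and the reference is refreshed at doubling intervals so that longer cycles are eventually caught. The function name, the `detected_cycle` out-parameter, and the tolerance `eps` are assumptions made for the example.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: returns the iteration at which the point escaped,
// or max_iter if it is treated as inside the set. If detected_cycle is
// non-null, it reports whether the exit was due to orbit detection.
int mandel_with_orbit_detection(double cr, double ci, int max_iter,
                                bool* detected_cycle = nullptr)
{
    assert(max_iter > 0);
    const double eps = 1e-13; // assumed tolerance for "same point again"
    double zr = 0.0, zi = 0.0;
    double saved_r = 0.0, saved_i = 0.0; // reference point on the orbit
    long long check_interval = 1;        // iterations between reference updates
    long long since_save = 0;
    if (detected_cycle) *detected_cycle = false;

    for (int n = 0; n < max_iter; ++n) {
        const double zr2 = zr * zr, zi2 = zi * zi;
        if (zr2 + zi2 > 4.0)
            return n; // escaped: the point is outside the set

        zi = 2.0 * zr * zi + ci;
        zr = zr2 - zi2 + cr;

        // If the orbit has come back to the saved point, it is cycling and
        // will never escape, so the remaining iterations can be skipped.
        if (std::fabs(zr - saved_r) < eps && std::fabs(zi - saved_i) < eps) {
            if (detected_cycle) *detected_cycle = true;
            return max_iter;
        }

        // Refresh the reference point at doubling intervals.
        if (++since_save == check_interval) {
            saved_r = zr;
            saved_i = zi;
            since_save = 0;
            check_interval *= 2;
        }
    }
    return max_iter;
}
```

For example, c = -1 has the orbit 0, -1, 0, -1, ... and is caught after three iterations instead of running all the way to max_iter. The extra comparisons cost a little on points outside the set, which never trigger the check.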

In addition to the instruction-level optimizations of SSE and AVX, thread-level optimization is also something to look at. Multi-threading gives a sizeable speed increase. Since processors nowadays gain cores rather than MHz, multi-threading is a no-brainer, especially for something like the Mandelbrot set, where every pixel can be calculated independently and which therefore lends itself well to parallel computing.
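A minimal sketch of the multi-threading idea using std::thread is shown below. The names, the fixed view of [-2, 1] x [-1.5, 1.5], and the choice to hand out interleaved rows are all assumptions for illustration, not how Fractalscope divides the work. Interleaving rows is just a simple way to keep the load roughly even, since expensive inside-the-set rows tend to be clustered together.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Plain scalar escape-time loop for one pixel.
int escape_count(double cr, double ci, int max_iter)
{
    double zr = 0.0, zi = 0.0;
    for (int n = 0; n < max_iter; ++n) {
        const double zr2 = zr * zr, zi2 = zi * zi;
        if (zr2 + zi2 > 4.0) return n;
        zi = 2.0 * zr * zi + ci;
        zr = zr2 - zi2 + cr;
    }
    return max_iter;
}

// Hypothetical renderer: thread t computes rows t, t + num_threads, ...
// Each pixel is written by exactly one thread, so no locking is needed.
std::vector<int> render(int width, int height, int max_iter, int num_threads)
{
    assert(width > 0 && height > 0 && max_iter > 0 && num_threads > 0);
    std::vector<int> image(static_cast<std::size_t>(width) * height);

    auto worker = [&](int id) {
        for (int y = id; y < height; y += num_threads) {
            const double ci = -1.5 + 3.0 * y / height; // assumed view
            for (int x = 0; x < width; ++x) {
                const double cr = -2.0 + 3.0 * x / width;
                image[static_cast<std::size_t>(y) * width + x] =
                    escape_count(cr, ci, max_iter);
            }
        }
    };

    std::vector<std::thread> pool;
    for (int t = 0; t < num_threads; ++t)
        pool.emplace_back(worker, t);
    for (auto& th : pool)
        th.join();
    return image;
}
```

Calling `render(w, h, max_iter, std::thread::hardware_concurrency())` would use one thread per logical core, bearing in mind that hardware_concurrency() may return 0 if the value cannot be determined. With GCC or Clang this may need the -pthread flag.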