I wrote my fractal program, FractalWorks (for Mac OS), in Objective-C, which is a superset of C with object-oriented extensions.
The multithreading part uses POSIX threads.
I don't have any code for your OS, or your specific language, but the approach will be similar regardless of those things.
I create as many threads as there are logical cores (on Intel processors with hyperthreading, that is double the number of physical cores).
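As a rough sketch of how the thread count could be chosen on a UNIX-like system, the function below asks the OS for the number of logical cores. The name `logical_core_count` is made up for this example, and `_SC_NPROCESSORS_ONLN` is a widely supported extension (available on Mac OS and Linux) rather than something every POSIX system guarantees.

```c
#include <unistd.h>

/* Hypothetical helper: number of logical cores currently online.
   Falls back to 1 if the system cannot report a value. */
static int logical_core_count(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return (n > 0) ? (int)n : 1;
}
```

On Windows the equivalent information comes from a different API, so treat this as the UNIX half of the idea only.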
I then slice up my fractal into at least twice as many rectangles as there are threads, and create a pool of Job objects that describe those rectangles for rendering.
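Here is a minimal sketch of that slicing step in plain C, cutting the image into horizontal strips. The `Job` struct and `make_jobs` function are illustrative names, not the actual FractalWorks code, and the sketch produces exactly 2 * threads strips (or one per pixel row, if the image is shorter than that).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical job description: one rectangle of the plot, in pixels. */
typedef struct {
    int x, y;          /* top-left corner */
    int width, height; /* size of the rectangle */
} Job;

/* Cuts a width x height image into horizontal strips, aiming for
   2 * threads strips but never more than one strip per pixel row.
   Returns a malloc'd array (caller frees) and stores its length in
   *out_count, or returns NULL if allocation fails. */
static Job *make_jobs(int width, int height, int threads, int *out_count)
{
    assert(width > 0 && height > 0 && threads > 0);

    int count = 2 * threads;
    if (count > height)
        count = height;

    Job *jobs = malloc((size_t)count * sizeof *jobs);
    if (jobs == NULL) {
        *out_count = 0;
        return NULL;
    }

    for (int i = 0; i < count; i++) {
        /* Integer boundaries chosen so the strips tile the image exactly. */
        int top    = (int)((long long)height * i / count);
        int bottom = (int)((long long)height * (i + 1) / count);
        jobs[i] = (Job){ 0, top, width, bottom - top };
    }

    *out_count = count;
    return jobs;
}
```

For an 800 x 600 image and 4 threads this yields 8 strips of 75 rows each.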
I set up a thread-safe Job queue that provides a new job to the rendering threads on demand, and a thread-safe "number of pending jobs" counter that tracks the number of remaining jobs. Each thread asks the job queue for a job to do. It then renders that job from beginning to end, and then tells the job counter to decrement the number of pending jobs, before asking the job queue for another job to do. Once the number of pending jobs drops to zero, the plot is complete.
Google Windows thread-safe queues and thread-safe counters for more information. The specific implementation will differ between UNIX and Windows, but the concepts will be the same.
Since there are at least twice as many jobs to do as threads to do them, threads that are assigned fast-completing tasks are able to ask for another task to do once they complete their work.
On the main thread, I monitor the number of pending jobs, and when it reaches zero, I render the fractal to the screen.
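The queue, the pending-jobs counter, and the main-thread wait described above could look roughly like this with POSIX threads. This is a sketch under assumptions, not the FractalWorks code: all names (`JobQueue`, `queue_take`, `queue_finish`, `render_all`, `tally_pixels`) are invented for the example, a single mutex guards both the queue and the counter, the main thread waits on a condition variable rather than polling, and the "renderer" is a placeholder that only tallies pixels.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

typedef struct { int x, y, width, height; } Job; /* hypothetical job rectangle */

typedef struct {
    Job *jobs;                   /* all jobs for this plot */
    int next;                    /* index of the next job to hand out */
    int count;                   /* total number of jobs */
    int pending;                 /* jobs not yet finished */
    pthread_mutex_t lock;        /* protects next and pending */
    pthread_cond_t done;         /* signalled when pending reaches zero */
    void (*render)(const Job *); /* per-rectangle renderer (placeholder) */
} JobQueue;

/* Hands out the next job, or NULL once the queue is empty. */
static const Job *queue_take(JobQueue *q)
{
    const Job *job = NULL;
    pthread_mutex_lock(&q->lock);
    if (q->next < q->count)
        job = &q->jobs[q->next++];
    pthread_mutex_unlock(&q->lock);
    return job;
}

/* Decrements the pending counter; wakes the main thread after the last job. */
static void queue_finish(JobQueue *q)
{
    pthread_mutex_lock(&q->lock);
    if (--q->pending == 0)
        pthread_cond_signal(&q->done);
    pthread_mutex_unlock(&q->lock);
}

/* Each worker keeps asking for jobs until none are left. */
static void *worker(void *arg)
{
    JobQueue *q = arg;
    const Job *job;
    while ((job = queue_take(q)) != NULL) {
        q->render(job);
        queue_finish(q);
    }
    return NULL;
}

/* Renders every job on up to `threads` workers. Returns 0 on success,
   -1 if no worker thread could be started. */
static int render_all(Job *jobs, int count, int threads,
                      void (*render)(const Job *))
{
    JobQueue q = { .jobs = jobs, .next = 0, .count = count,
                   .pending = count, .render = render };
    pthread_t *tids = malloc((size_t)threads * sizeof *tids);
    if (tids == NULL)
        return -1;
    pthread_mutex_init(&q.lock, NULL);
    pthread_cond_init(&q.done, NULL);

    int started = 0;
    for (int i = 0; i < threads; i++)
        if (pthread_create(&tids[started], NULL, worker, &q) == 0)
            started++;

    if (started > 0) {
        /* Main thread: block until the pending counter drops to zero. */
        pthread_mutex_lock(&q.lock);
        while (q.pending > 0)
            pthread_cond_wait(&q.done, &q.lock);
        pthread_mutex_unlock(&q.lock);
    }

    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    pthread_cond_destroy(&q.done);
    pthread_mutex_destroy(&q.lock);
    free(tids);
    return started > 0 ? 0 : -1;
}

/* Placeholder renderer for the sketch: just counts the pixels it was given. */
static atomic_long pixels_rendered;

static void tally_pixels(const Job *job)
{
    atomic_fetch_add(&pixels_rendered, (long)job->width * job->height);
}
```

Calling `render_all(jobs, count, threads, tally_pixels)` returns once every rectangle has been processed, which is the point at which the finished plot can be drawn to the screen. In a GUI program you would more likely check the counter from a timer or post a notification than block the main thread like this.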
My program renders Mandelbrot and Julia set fractals, and uses boundary following to avoid rendering the interiors of contiguous areas of Mandelbrot points (and other contiguous regions with the same iteration count if it's not calculating fractional iterations or distance estimates). Boundary following is fastest when the area being rendered has a small ratio of perimeter to internal area, so cutting the plot into too many narrow slices ends up being counter-productive. I slice my plot into horizontal slices, which isn't ideal. I should really divide it into smaller and smaller squares, since long skinny rectangles have more perimeter than squares with the same area, but it nevertheless works well.
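To put numbers on the perimeter argument, here is a tiny illustrative helper (the name `perimeter_to_area` is made up for this example). Take a 1000 x 1000 plot cut into 16 pieces of equal area: 16 horizontal strips of 1000 x 62.5 have a perimeter-to-area ratio of about 0.034, while 16 squares of 250 x 250 have a ratio of 0.016, so the strips carry roughly twice as much boundary per unit of area.

```c
/* Perimeter divided by area for a w x h rectangle (same unit for both). */
static double perimeter_to_area(double w, double h)
{
    return 2.0 * (w + h) / (w * h);
}
```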
I recently added a section to the Wikipedia article on the Mandelbrot set on multi-threading and other optimizations. You can read it here. (It starts out discussing border tracing, and then there's a section on more general multi-threading.)
(Note that border tracing makes a dramatically larger difference than multi-threading.)