(Problem) Mandelbulber 2.11 and Dr. Queue


Offline Nakedrabbit

  • *
  • Fractal Freshman
  • *
  • Posts: 8
« on: November 07, 2017, 08:24:45 PM »
Here is an odd question, maybe you can help me with it.

I am running an 8-node render farm entirely on MacOS 10.11 using the flexible but difficult-to-get-going "Dr. Queue."  I had written a job script to render Mandelbulber frames and have had some success with it, using v. 2.06 and maybe 2.08, I can't remember...  The last job I ran was a 3,000-4,000 frame render at 4K.  Not bad.  The HD version is here if anyone cares:

https://youtu.be/KEmiDcsFNuI

I have recently upgraded to the latest Mandelbulber (2.11) and have run into a massive problem I don't seem to be able to get around.  The script I USED to employ successfully now runs into all kinds of snags, and I'm not sure what went wrong!  My success depended on sending each client machine this command-line argument:

$MANDELBULBER_PATH $DRQUEUE_PROJECT K -s $DRQUEUE_FRAME -e $MANDELENDFRAME -f exr -o $OUTPUT_PATH

Where $MANDELBULBER_PATH is the Mandelbulber executable (/Applications/mandelbulber2.app/Contents/MacOS/mandelbulber2),
$DRQUEUE_PROJECT is the pathname of the Mandelbulber project file,
$DRQUEUE_FRAME is the current frame number sent by the render farm manager,
$MANDELENDFRAME is $DRQUEUE_FRAME plus 1, so that each Mandelbulber instance is scoped to exactly one frame, from frame X to frame X+1, and
$OUTPUT_PATH is where the rendered frames go.
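
Tying those variables together, the per-frame part of the job script is roughly the sketch below; the project and output paths are placeholders, $DRQUEUE_FRAME is assumed to be set by Dr. Queue for each task, and the flag is spelled exactly as in my script (note the bare K, which is discussed in the reply below):

#!/bin/bash
# Rough sketch of the per-frame fragment of the job script; paths are
# placeholders, and DRQUEUE_FRAME is assumed to be exported by Dr. Queue
# for each task it hands to a node.
MANDELBULBER_PATH="/Applications/mandelbulber2.app/Contents/MacOS/mandelbulber2"
DRQUEUE_PROJECT="/Volumes/renders/project.fract"   # placeholder project file
OUTPUT_PATH="/Volumes/renders/frames/"             # placeholder shared output location

# Scope each Mandelbulber instance to a single frame: X to X+1.
MANDELENDFRAME=$((DRQUEUE_FRAME + 1))

# Invocation exactly as in the script above (note the bare "K"; see the reply below).
"$MANDELBULBER_PATH" "$DRQUEUE_PROJECT" K -s "$DRQUEUE_FRAME" -e "$MANDELENDFRAME" -f exr -o "$OUTPUT_PATH"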

This script works great!  And nothing has changed on it.  What HAS changed is that previously I was able to get Mandelbulber to render the ONE frame and quit.  Then the render farm manager would hand it a new frame and a new instance would pop up, one per core for each machine.

NOW when I send the job the exact same way, Mandelbulber renders the frame - and not even the RIGHT frame, btw - to the local ~/mandelbulber/images folder and then HANGS, waiting for Mandelbulber's own queue to send it something else!  But I do not want Mandelbulber to queue renders!  Dr. Queue is doing that job fine...

Why not have Mandelbulber do it?  Well, because Dr. Queue, for all its faults, is extremely stable.  You have to kill the machine entirely to stop it.  But Mandelbulber can and sometimes does crash out.  If this happens on a render node, no harm, no foul, and the frame that did not get completed will be assigned to another instance of Mandelbulber on another machine.  If I use Mandelbulber as the render master and it crashes... well, the whole thing stops.  I tend to close the door on the render farm and leave it alone for days at a time; if the main program stops, even for a few hours, I'm out of luck.

So what am I doing wrong?  Mandelbulber USED to obey my commands, rendering JUST the assigned frame X and doing so to a SHARED NETWORK location.  Now it renders the wrong frame (and always the same one, too, on every machine!) to the local folder and then jams up, reporting that there is nothing in the queue for it to do.

Can anyone help?  Thanks!

Offline buddhi

  • *
  • Fractal Phenom
  • ****
  • Posts: 53
    • Mandelbulber GitHub repository
« Reply #1 on: November 07, 2017, 10:59:39 PM »
The syntax is wrong: it should be the option -K instead of K.
Option -K renders the keyframe animation.
When K is given without the minus sign, it is wrongly treated as a filename and queued (multiple files specified). The corrected command:

$MANDELBULBER_PATH $DRQUEUE_PROJECT -K -s $DRQUEUE_FRAME -e $MANDELENDFRAME -f exr -o $OUTPUT_PATH
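
Expanded with concrete values for one task (the frame numbers and paths here are placeholders), the corrected call looks like this:

/Applications/mandelbulber2.app/Contents/MacOS/mandelbulber2 /Volumes/renders/project.fract -K -s 171 -e 172 -f exr -o /Volumes/renders/frames/

With the minus sign, -K is parsed as the keyframe-animation option rather than as a second settings file, so each instance renders only its assigned frame range and exits.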

Offline Nakedrabbit

  • *
  • Fractal Freshman
  • *
  • Posts: 8
« Reply #2 on: November 08, 2017, 07:23:47 AM »
I love you, buddhi.

For everything.

I will try this tomorrow.

Offline Nakedrabbit

  • *
  • Fractal Freshman
  • *
  • Posts: 8
« Reply #3 on: November 09, 2017, 06:58:07 AM »
Further report.  Yes, the addition of a single minus sign to the switch means everything is working properly now!  8K equirectangular stereo Mandelbox in over 37,000 frames, rendering now.  Check back with me in a couple of months.

Thanks so much, again, for everything, not just the last minus sign you gave me.

Offline Nakedrabbit

  • *
  • Fractal Freshman
  • *
  • Posts: 8
« Reply #4 on: November 14, 2017, 08:14:36 PM »
OK actually maybe I've discovered something else...

Mandelbulber v. 2.08 - I rolled back, thinking that 2.11 was the issue on my 8-node mini render farm.  Sent a job that was 37,000 frames.  About 134 frames in, the processes came to a screeching halt - frames that took 15 minutes each were now at 6 days and counting, stuck at 99.95% and not completing.  A memory leak?  I dunno.  Here's the log from Dr. Queue describing the issue.

This is from an 8-core node, which means that 8 separate instances of Mandelbulber are running simultaneously.  In contrast to the frame in this log (frame 171), frame 38 took about 14 minutes, for example.  Note that Mandelbulber gets to 99.95% just under 28 minutes in and then stays there for an astonishing 6 days more, until I shut the process down.  No frame was written to disk.  Exactly the same thing happened on all nodes, even the 4-core ones.

The disk is not full - there are 4.5 TB left on it.  It's an Ethernet shared volume, over SMB, mounted at each local node and addressed with a symlink so that Mandelbulber can find it.  The initial fracas I had was over the exact same file, which got about 92 frames in before it quit.  That one was on 2.11, I think, but I'm not sure.
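
For reference, the per-node setup is roughly the following (server name, share, and paths are placeholders here): mount the shared volume over SMB, then add a symlink so every node addresses the same output location.

mkdir -p /Users/jhvh-1/mnt/renders
mount_smbfs //user@fileserver/renders /Users/jhvh-1/mnt/renders   # shared volume over SMB (placeholder server/share)
ln -s /Users/jhvh-1/mnt/renders /Users/jhvh-1/renders             # stable path that the render script points at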

Any hints?

Hey, why can't I attach a TXT or LOG file to these?  Seems a lot more lightweight than a JPG would be. Anyway, here are the relevant details:


Log started at Wed Nov  8 09:34:09 2017
Detected 8 CPUs
Default data hidden directory: /Users/jhvh-1/.mandelbulber/
Default data public directory: /Users/jhvh-1/mandelbulber/


Initialization: Setting up image buffers [                                    ]

Initialization: Loading textures [                                            ]

Frame 172 of 36750 Done 0.00%, elapsed: 2.7s, estimated to end: n/a [         ]

Initialization: Loading textures [                                            ]

Rendering image: Starting rendering of image [                                ]

Rendering image: Done 0.00%, elapsed: 1.0s, estimated to end: n/a [           ]

Rendering image: Done 0.23%, elapsed: 2.0s, estimated to end: 12m 56.5s [     ]

[snip - goes along just fine and hits the 99.95% mark]

Rendering image: Done 99.86%, elapsed: 27m 47.2s, estimated to end: 3.8s [### ]

Rendering image: Done 99.95%, elapsed: 27m 48.2s, estimated to end: 1.3s [### ]

Rendering image: Done 99.95%, elapsed: 27m 49.2s, estimated to end: 1.3s [### ]

Rendering image: Done 99.95%, elapsed: 27m 50.2s, estimated to end: 1.3s [### ]

Rendering image: Done 99.95%, elapsed: 27m 51.2s, estimated to end: 1.3s [### ]


[snip - stays that way for 6 days, here's the end of the log]

Rendering image: Done 99.95%, elapsed: 6d 0h 45m 48.0s, estimated to end: 1.3s []

Rendering image: Done 99.95%, elapsed: 6d 0h 45m 49.0s, estimated to end: 1.3s []

Rendering image: Done 99.95%, elapsed: 6d 0h 45m 50.0s, estimated to end: 1.3s []


Offline mclarekin

  • *
  • Fractal Furball
  • ***
  • Posts: 206
« Reply #5 on: November 14, 2017, 09:28:18 PM »
Hi Nakedrabbit

It is probably best if you create an Issue here:

https://github.com/buddhi1980/mandelbulber2/issues

You can attach log files and .fract files there.


Offline Nakedrabbit

  • *
  • Fractal Freshman
  • *
  • Posts: 8
« Reply #6 on: November 14, 2017, 09:37:59 PM »
Thanks, mclarekin, but I'm going to do one more test before reporting it as a real problem.  If it's on my end I don't want anyone to have to deal with it.

I did clear out the queue and resubmitted the job, resulting in renders on all 40 cores that got to 99.95% and hovered there for half an hour.  So I stopped the job an hour into it (come on, we know how this is going to end!) and restarted ALL nodes.  If there were any dust and spiders in there, that should have swept them out.

I will reinstall 2.11 and try the same job again, hoping that if there were memory issues, etc., this will take care of them.  If the same issue persists I'll report it properly, because then it will be on the current code base, not on an old version.  After my goof with the switch last time, I did not want to cry wolf.

Offline Nakedrabbit

  • *
  • Fractal Freshman
  • *
  • Posts: 8
« Reply #7 on: November 15, 2017, 05:24:48 PM »
For the record:

Mandelbulber conks out at frame 135 and will not push any frame past 99.95%, even after updating to 2.11, restarting, and changing the script to render to a separate volume (in case permissions or other network protocols were the issue).

The previous run-through would not go past frame 92.  The difference, 43 frames, does not correspond to the number of cores available (40), so it's hard to make any sense out of that, either.  This one is a stumper.

I guess it's possible the .fract file is somehow problematic, but why would the render stop at frame 92 one time and at 135 the second?  I'll fill out a proper bug report on GitHub once I remember my password there.

Offline mclarekin

  • *
  • Fractal Furball
  • ***
  • Posts: 206
« Reply #8 on: November 16, 2017, 08:45:40 AM »
Can you render frames 92, 93 and 135, 136 individually?  Use the "first frame to render" / "last frame to render" controls to get the parameters for the bad frames and see if they render.  I copy the settings to a text editor and close the animation, then copy-load from the clipboard and render normally.  If they render, this rules out the settings/parameters as the cause of the problem.  It is unlikely, but it did happen to me just the other day: I had a situation where a combination of settings created a very long calculation (hours).  It is also possible that the formula has a code mistake (I keep finding them :embarrass:).

And as to why frame 92 the first time and then 135, I have no idea.
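
Alternatively, roughly this sort of command-line test (reusing the invocation style from earlier in the thread; paths are placeholders) would render just the suspect frames in isolation:

MB=/Applications/mandelbulber2.app/Contents/MacOS/mandelbulber2
PROJECT=/Volumes/renders/project.fract   # placeholder project file
mkdir -p /tmp/mb_test
for f in 92 93 135 136; do
    "$MB" "$PROJECT" -K -s "$f" -e $((f + 1)) -f exr -o /tmp/mb_test/
done

If those frames finish normally in isolation, the settings/parameters are probably not the culprit.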

Offline Nakedrabbit

  • *
  • Fractal Freshman
  • *
  • Posts: 8
« Reply #9 on: November 21, 2017, 11:34:27 PM »
Thanks for your help and insight on this, mclarekin.

This render is happening on a 40-core junky render farm I built.  So when I say "92" that would mean the 92nd frame PLUS the next 40 frames on 40 instances of Mandelbulber.  92 plus 40 is 132, which is close enough to that 135 to make you wonder.  But it did those frames and then got stuck on 135-175?  So yes, I wondered about the settings file itself.  It does not SEEM to be the issue.

Running an older, simpler file on exactly the same rig stalled in exactly the same way - all 40 instances frozen at 99.95%.

Took both files home to run on my Mac Pro and a completely different thing happened!  Mandelbulber crashes unless started from the Terminal!  Oy vey, I let the demons out.

Offline mclarekin

  • *
  • Fractal Furball
  • ***
  • Posts: 206
« Reply #10 on: November 22, 2017, 08:03:32 AM »
OK, that's as far as I can help; we need to call in the experts.  I would suggest posting it as an issue on GitHub, and we will see if Buddhi or Zebastian can help you.

Offline Nakedrabbit

  • *
  • Fractal Freshman
  • *
  • Posts: 8
« Reply #11 on: November 22, 2017, 08:56:32 AM »
Thanks, mclarekin, I do appreciate it though.  I'll run through everything on my end and make sure I'm not just making a dumb mistake somewhere...

