"Disconnect Tolerance" variable is useless #4277

thang.dv2 ·

Hello all,
"Specify disconnect tolerance in seconds when the step is marked as failed upon network disconnection. Use 0 to mark step as failed as soon as network disconnection is detected."

  • When we set it to 0, the build is cancelled about 2 minutes after the agent disconnects.
  • But when we set it to a non-zero value (for example, 10), the build runs forever if the agent is disconnected from the server.
    The build hits this problem because the "setLastDisconnectDate" method is never called.
    Please review this. Thank you so much!
robinshen ADMIN ·

Which QB version are you using?

thang.dv2 ·

The latest QB 8 version.

robinshen ADMIN ·
thang.dv2 ·

Thank you so much for your quick support.

thang.dv2 ·
Disconnect Tolerance.PNG

Dear Mr. @robinshen,
I ran some checks and would like to share the results, along with some problems for you to consider:

  • 1: The Disconnect Tolerance variable has to be set above 2:15 (~140s). If the value is below 140s, it does not seem to work.
  • 2: The time to wait for reconnection equals the Disconnect Tolerance value. But the wait only succeeds while the step has not yet finished.
  • 3: If the agent never reconnects, the build finishes as failed after waiting (Disconnect Tolerance + 2:20 (~140s)).
  • 4: It only applies to the master step, not to any child step. Setting this variable on a child step does not work: the step always fails on reconnect or after ~140s.

We are also facing a big problem: the build fails with the error log "Unable to find job on node" when the job on the agent finishes while the master and agent are disconnected and they reconnect afterwards.
I think this case should be improved: if the job has already finished on the agent, it should wait for the period defined by the Disconnect Tolerance value before finishing the job.
Thank you so much!

robinshen ADMIN ·

The disconnect tolerance does not guarantee that a build can tolerate all network disconnections, as that would be very difficult. Instead, it solves the most common issue: when a step runs for a long time, and the network disconnects and later reconnects while the step is still running, QB should ignore network heartbeat failures and should not mark the step as failed.

If the network is disconnected while a child step is communicating with its parent step (this happens when a step is finishing, a child step is starting, etc.), the build will still fail.
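
As a rough illustration of the intended rule only, the decision could be sketched as below. This is not QuickBuild's actual code: the class `DisconnectToleranceSketch` and the method `shouldFail` are hypothetical names, and the sketch assumes the server records the time of the last detected disconnection.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of the tolerance rule; not taken from QuickBuild.
class DisconnectToleranceSketch {

    /**
     * Decides whether a running step should be marked as failed.
     *
     * @param lastDisconnect when the disconnection was detected, or null if connected
     * @param now            the current time
     * @param tolerance      the configured disconnect tolerance
     */
    static boolean shouldFail(Instant lastDisconnect, Instant now, Duration tolerance) {
        if (lastDisconnect == null) {
            return false; // no disconnection detected, nothing to tolerate
        }
        if (tolerance.isZero()) {
            return true; // 0 means: fail as soon as a disconnection is detected
        }
        // Fail only once the outage has lasted longer than the tolerance.
        return Duration.between(lastDisconnect, now).compareTo(tolerance) > 0;
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2020-01-01T00:00:00Z");
        Duration tolerance = Duration.ofSeconds(120);
        System.out.println(shouldFail(null, t0, tolerance));                // false
        System.out.println(shouldFail(t0, t0.plusSeconds(60), tolerance));  // false
        System.out.println(shouldFail(t0, t0.plusSeconds(121), tolerance)); // true
        System.out.println(shouldFail(t0, t0, Duration.ZERO));              // true
    }
}
```

Note that a check like this can only work if the disconnect time is actually recorded, which is the concern raised at the top of the thread about "setLastDisconnectDate".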

thang.dv2 ·

Excuse me, could you share why it fails after ~140s even when the tolerance is set on the child step?

robinshen ADMIN ·

Do you mean if this setting is specified as 140s on a child step, the child step will fail after 140s even if network is not disconnected?

thang.dv2 ·

Sorry, I mean that if Disconnect Tolerance is set to any value other than 0 on a child step, the child step fails after 140s even if the network has reconnected.

robinshen ADMIN ·

I tried the steps below and it works:

  1. Create a test configuration with two steps: a master step that runs on the server, and a child step that simply runs a sleep command for 300 seconds on an agent.
  2. Define the network disconnect tolerance as 120s on the child step.
  3. Run the build, and unplug the agent's network cable once the build has run for 60s.
  4. Wait another 60s, then plug the cable back in.
  5. The build still succeeds when the child step reaches 300s.

Can you please try this setup on your side to see if it works?
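
For reference, the expected outcome of this test can be written down as a small model. It is only a sketch of the intended behaviour as described in this thread, not QuickBuild code: `ToleranceScenario` and `replay` are hypothetical names, all times are seconds since the child step started, and a step that finishes while the network is still down is assumed to fail, following the explanation earlier in the thread.

```java
// Hypothetical model of the test scenario; not taken from QuickBuild.
class ToleranceScenario {

    enum Outcome { SUCCEEDED, FAILED }

    /**
     * Replays one unplug/replug cycle against a running step.
     * All arguments are in seconds, measured from the start of the step.
     */
    static Outcome replay(int stepSeconds, int unplugAt, int replugAt, int toleranceSeconds) {
        if (unplugAt >= stepSeconds) {
            return Outcome.SUCCEEDED; // the step finished before the cable was pulled
        }
        if (replugAt >= stepSeconds) {
            return Outcome.FAILED; // the step finished while still disconnected
        }
        int outage = replugAt - unplugAt;
        // A tolerance of 0 fails on any outage; otherwise the outage must fit within it.
        return outage <= toleranceSeconds ? Outcome.SUCCEEDED : Outcome.FAILED;
    }

    public static void main(String[] args) {
        // The scenario above: 300s step, unplug at 60s, replug at 120s, tolerance 120s.
        System.out.println(replay(300, 60, 120, 120)); // SUCCEEDED
        // Same outage with tolerance 0.
        System.out.println(replay(300, 60, 120, 0));   // FAILED
    }
}
```

If the real build behaves differently from this model for a child step, the difference in timing is worth reporting along with the tolerance value used.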