Your browser was unable to load all of the resources. They may have been blocked by your firewall, proxy or browser configuration.
Press Ctrl+F5 or Ctrl+Shift+R to have your browser try again.

Long Time Updating Agent State #4166

jmuirhead ·

Environment --
QB Version: 8.0.5
Total Agents: ~465
Builds per Day: 25k to 35k

We're seeing our Quickbuild server spend a large percentage of scheduling-time updating the state of build agents with respect to builds in the queue. A server log sample below. One specific piece of code seems to be taking an unusually long time to complete. In the DefaultBuildEngine, the for loop that iterates over all agents, build requests, and steps: for (GridNode node: allNodesAndUserAgents) { .... We're seeing this loop take 20+ seconds to complete.

Other than the obvious large build volume we do, is there some underlying reason why this loop would be taking so long to complete? We're seeing queue times of up to a minute -- from the time the request hits the queue, until build execution actually begins.

jvm 1 | 2019-10-08 09:51:16,019 TRACE - Nodes and resources are prepared...
jvm 1 | 2019-10-08 09:51:16,069 TRACE - Round of request processing finished (candicate checking time: 15, preference checking time: 0, cloud checking time: 0)
jvm 1 | 2019-10-08 09:51:16,070 TRACE - Starting new round of request processing...
jvm 1 | 2019-10-08 09:51:34,760 TRACE - Nodes and resources are prepared...
jvm 1 | 2019-10-08 09:51:34,810 TRACE - Round of request processing finished (candicate checking time: 10, preference checking time: 0, cloud checking time: 0)
jvm 1 | 2019-10-08 09:51:34,810 TRACE - Starting new round of request processing...
jvm 1 | 2019-10-08 09:51:53,264 TRACE - Nodes and resources are prepared...
jvm 1 | 2019-10-08 09:51:53,334 TRACE - Round of request processing finished (candicate checking time: 26, preference checking time: 0, cloud checking time: 0)
jvm 1 | 2019-10-08 09:51:53,334 TRACE - Starting new round of request processing...
jvm 1 | 2019-10-08 09:52:11,703 TRACE - Nodes and resources are prepared...
jvm 1 | 2019-10-08 09:52:11,757 TRACE - Round of request processing finished (candicate checking time: 15, preference checking time: 0, cloud checking time: 0)
jvm 1 | 2019-10-08 09:52:11,757 TRACE - Starting new round of request processing...

  • replies 3
  • views 430
  • stars 0
robinshen ADMIN ·

Please avoid using groovy script in step node selection logic and configuration node assignment logic (node assignment logic is defined in advanced setting of a configuration), as groovy script can be slow for large number of steps and nodes

jmuirhead ·

Thanks Robin. Further investigation shows that the main slowdown is in obtaining the running steps for an agent. At any given point during peak (normal business) hours we will have 1k to 1.5k builds running or queued. Iterating over all the opened build requests to obtain running steps for an agent takes ~80 - 90 milliseconds. Multiply this by ~450 agents in the grid and we our ~40 second delay.

We'll check for groovy code in step and configuration node assignment. Thanks again.

robinshen ADMIN ·

getRunningSteps() might be slow as it needs to lock the steps for a while to get the copy. If possible, avoid calling this in your node selection setting.