trying to throttle concurrent builds #4342

drdt ·

We have a build farm of forty Windows machines, allowing us to run forty builds simultaneously. However, these machines all share a common network space and at times we have found it necessary to limit the number of builds for performance reasons. We would like newly requested builds to sit in the queue and not be assigned a build host until the number of builds drops below a threshold.

We also realized such a solution could be used to stop incoming builds while letting existing builds complete, by setting the concurrent build limit to zero. Any requests made by users would then queue up until we opened the gate again.

I thought I came up with a solution, which worked on my test server, but the same code locked up my production server.

drdt ·

I have two parts to my solution. The first is a script variable, 'countRunningBuilds', which I built from code I found elsewhere on this forum. It iterates through all of the builds (only going back a week for performance reasons) and counts those in a RUNNING state:

${groovy:
  // Calculate the time stamp from a week ago.
  Calendar timeStamp = Calendar.getInstance();
  timeStamp.setTime( new Date() );
  timeStamp.add( Calendar.DATE, -7 );

  // Count running builds newer than the time stamp.
  def runningBuilds = 0;
  def build = configuration.getLatestBuild( RUNNING );
  while (build != null) {
    logger.debug( "Found build {}, {}, {}", build.getId(), build.getVersion(), build.getStatus() );
    // Stop walking back once a build is older than the cutoff.
    if (build.getBeginDate() && (build.getBeginDate() < timeStamp.getTime())) break;
    if (build.isRunning()) runningBuilds++;
    build = build.getPrevious();
  }
  return runningBuilds;
}

drdt ·

The second is a simple pre-build script:

def loopSeconds = 15; // time to wait between retries
while (vars.getValue( "countRunningBuilds" ) >= vars.getValue( "maxRunningBuilds" )) sleep( loopSeconds * 1000 );

maxRunningBuilds is a variable that is changeable by the admin to set the throttle.

drdt ·

In my test environment, I was able to use this method to throttle and queue a simple 'sleep 1 minute' job. As each build finished, the next waiting request would start. All of the waiting builds would show up as "CHECKING_BUILD_CONDITION".

I did see one strange behavior, which is that the finished builds in my queue would still show up as "RUNNING" in the queue after they were done. Perhaps I should have paid more attention to this.

In my live environment, once a build entered the waiting state, all future builds queued up behind it, but no new builds started running until ALL running builds finished. In other words, it would start maxRunningBuilds jobs, wait for them all to finish, and then start another set of maxRunningBuilds jobs. And there was nothing I could do to delete or cancel the queued jobs.

Can I get some guidance as to why this is not working as I expect, and what I can do better?

Thanks.

robinshen ADMIN ·

First of all, please use Thread.sleep instead of sleep in any groovy script, as sleep is not interruptible in groovy and hence the build request cannot be cancelled.
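To illustrate the difference, here is a minimal standalone sketch (plain Groovy, not QuickBuild-specific). The helper name timeInterruptedWorker and the timings are made up for the illustration. It interrupts a worker thread shortly after it starts waiting and measures how long the worker takes to stop:

```groovy
// Hypothetical helper: run body in a worker thread, interrupt it after
// 200 ms, and return how long the worker took to finish (in ms).
long timeInterruptedWorker(Closure body) {
    def worker = new Thread(body)
    long start = System.currentTimeMillis()
    worker.start()
    Thread.sleep(200)
    worker.interrupt()
    worker.join()
    return System.currentTimeMillis() - start
}

// Groovy's sleep(ms) catches the interrupt and keeps sleeping for the
// rest of the requested time.
long groovySleepMs = timeInterruptedWorker { sleep(1500) }

// Thread.sleep throws InterruptedException, so the wait can end early.
long threadSleepMs = timeInterruptedWorker {
    try {
        Thread.sleep(1500)
    } catch (InterruptedException e) {
        // stop waiting; a real script would clean up and exit here
    }
}

println "groovy sleep: ${groovySleepMs} ms, Thread.sleep: ${threadSleepMs} ms"
```

The first worker should run for roughly the full 1500 ms despite the interrupt, while the second should stop shortly after the interrupt arrives.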

Secondly, this approach will result in deadlock. While the pre-build script executes, it holds the configuration lock and waits for other builds to finish, but those builds need the same configuration lock in order to finish. This may not happen when there are few builds in the configuration.
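The shape of that deadlock can be sketched with plain Java concurrency classes. This is only a toy model, not QuickBuild's actual locking code: configLock, runningBuildDone and the timeouts are assumptions chosen so that the example terminates instead of hanging.

```groovy
import java.util.concurrent.CountDownLatch
import java.util.concurrent.TimeUnit
import java.util.concurrent.locks.ReentrantLock

// Toy stand-ins (hypothetical names): one lock for the configuration,
// one latch that opens when the running build has finished.
def configLock = new ReentrantLock()
def runningBuildDone = new CountDownLatch(1)
boolean finisherGotLock = false
boolean waitSatisfied = false

// The "pre-build script" takes the configuration lock first...
configLock.lock()
def runningBuild = null
try {
    // ...while a running build needs the same lock in order to finish.
    runningBuild = Thread.start {
        if (configLock.tryLock(500, TimeUnit.MILLISECONDS)) {
            try {
                finisherGotLock = true
                runningBuildDone.countDown()
            } finally {
                configLock.unlock()
            }
        }
    }
    // The script waits for the running build while still holding the lock.
    // Without the timeout, this wait would never return.
    waitSatisfied = runningBuildDone.await(1, TimeUnit.SECONDS)
} finally {
    configLock.unlock()
}
runningBuild.join()

println "running build finished: ${finisherGotLock}, wait satisfied: ${waitSatisfied}"
```

Neither side makes progress: the running build cannot get the lock, and the waiting script never sees it finish. Only the timeouts let the sketch end.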

To limit build concurrency, the safe way is to use resources. If you want to reduce build concurrency, just remove some machines from the build farm, or change node properties of some of these machines so that they no longer match the desired resource.

And you can change node properties via groovy script running on server:

groovy:

for (eachNode: grid.allNodes) {
  if (eachNode.isBuildAgent()) {
     def attributes = eachNode.getUserAttributes();
     attributes["some attribute"] = "some value";
     eachNode.setUserAttributes(attributes, true);
  }
}

You can then control build concurrency with a groovy script that limits the maximum number of nodes carrying a particular attribute.

drdt ·

Thanks, Robin, for the great insight. I will think about these limitations and try to find a way to proceed.

We actually tried the method, initially, of disabling (un-authorizing) machines to limit concurrency. However, that does not work in our environment because sometimes machines are automatically removed from the build farm after a build finishes for cleanup purposes. Every few hours, an automated script detects machines in this state and recovers them. As a result, in times of heavy load, about half of the enabled machines may be in this state, requiring me to manually enable more. But I can't just double the number of enabled machines, because over time all of them will come back, and then the next morning we will be overloaded.

robinshen ADMIN ·

grid.allNodes will only include active agents, and among them you can control the number of applicable build agents by changing node attributes. This way, no matter how many build agents go offline or come online, the max number of concurrent builds is fixed. Of course, this script needs to run periodically (every 5 minutes, for instance).
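The capping logic might look roughly like the sketch below. It uses plain maps as stand-ins for nodes so it can run outside QuickBuild. The function name assignBuildSlots, the attribute name acceptsBuilds and the map fields are all hypothetical. In a real server script, the loop would go over grid.allNodes and write the value with setUserAttributes, as in the earlier snippet.

```groovy
// Mark at most maxConcurrent build agents as accepting builds and mark
// every other node as not accepting. Returns the number of agents marked.
// Nodes are plain maps here (hypothetical stand-ins for real grid nodes).
int assignBuildSlots(List<Map> nodes, int maxConcurrent, String attrName = "acceptsBuilds") {
    int granted = 0
    nodes.each { node ->
        boolean eligible = node.buildAgent && granted < maxConcurrent
        node.attributes[attrName] = eligible ? "true" : "false"
        if (eligible) granted++
    }
    return granted
}

// Five currently active nodes, one of which is not a build agent.
def nodes = (1..5).collect {
    [name: "agent${it}".toString(), buildAgent: true, attributes: [:]]
}
nodes[1].buildAgent = false

int granted = assignBuildSlots(nodes, 3)
def accepting = nodes.findAll { it.attributes.acceptsBuilds == "true" }*.name
println "granted ${granted} slots to ${accepting}"
```

Because the list is rebuilt from whichever agents are active on each run, the cap holds even as machines drop out and come back. The configuration's resource would then have to require the chosen attribute value.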

drdt ·

Or at the start of each build.

robinshen ADMIN ·

Yes, that will also be a good time.