In the first step, we check whether the agent is active:
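groovy:
import com.pmease.quickbuild.QuickbuildException;

def node = params.get("currentNodeAddress");
// Short host name (text before the first dot); use the full name if there is no dot,
// since substring(0, -1) would throw
def dot = node.indexOf('.');
def trimmedNode = dot > 0 ? node.substring(0, dot) : node;
def running = false
def time = System.currentTimeMillis()
logger.info "Time: " + time;
// Poll the grid for up to 5 minutes (300000 ms)
while (System.currentTimeMillis() < time + 300000) {
    if (grid.getNode(node + ":8811") != null || grid.getNode(trimmedNode + ":8811") != null) {
        logger.info "Node is running..."
        running = true;
        sleep(10*1000)  // workaround: give node selection time to pick up the new agent
        break;
    }
    sleep(5*1000)  // wait between checks instead of spinning in a tight loop
}
if (!running) {
    logger.info "Node is not running..."
    throw new QuickbuildException("The agent could not be found in the Active Nodes list after 5 minutes.")
}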
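A fixed sleep can be replaced by polling a condition with a timeout and a short interval between checks. The sketch below is plain Groovy, not a QuickBuild API: `waitUntil` is a hypothetical helper, and in a real step the closure would wrap whatever visibility check you trust (for example the `grid.getNode(...)` calls above). It only illustrates the polling pattern; on its own it does not make node selection see a new agent any sooner.

```groovy
// Hypothetical helper (not part of QuickBuild): evaluate `condition` repeatedly
// until it returns true or `timeoutMs` elapses, sleeping `intervalMs` between checks.
// Returns true if the condition was met, false on timeout.
boolean waitUntil(long timeoutMs, long intervalMs, Closure<Boolean> condition) {
    long deadline = System.currentTimeMillis() + timeoutMs
    while (true) {
        if (condition.call()) {
            return true
        }
        if (System.currentTimeMillis() >= deadline) {
            return false
        }
        Thread.sleep(intervalMs)
    }
}

// Usage with a stand-in condition that becomes true on the third check.
int checks = 0
boolean found = waitUntil(2000, 10) { ++checks >= 3 }
println "found=${found}, checks=${checks}"   // found=true, checks=3
```

Returning a boolean rather than throwing keeps the helper neutral; the caller can still raise `QuickbuildException` on timeout as the original script does.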
The next step then uses the Node Selection option "on node with specified script evaluating to true" to find it, with this script:
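groovy:logger.info(node.getAddress());
logger.info(node.getHostName());
logger.info(params.get('currentNodeAddress'));
def testNode = params.get("currentNodeAddress");
// Short host name; use the full name if it contains no dot
def dot = testNode.indexOf('.');
def trimmedTestNode = dot > 0 ? testNode.substring(0, dot) : testNode;
logger.info(trimmedTestNode);
// Select this node if its host name matches either the short or the full name
return (node.getHostName() == trimmedTestNode || node.getHostName() == testNode);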
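The matching logic can also compare the short form of both names, ignoring case (DNS names are case-insensitive), so a fully qualified host name still matches a short one and vice versa. This is a minimal sketch; `shortName` and `sameHost` are hypothetical helpers, and it assumes short host names are unique across the grid.

```groovy
// Hypothetical helpers (not part of QuickBuild) for comparing host names.

// Part of a host name before the first dot; the whole name if there is no dot.
String shortName(String host) {
    int dot = host.indexOf('.')
    return dot > 0 ? host.substring(0, dot) : host
}

// True if both names share the same short name, ignoring case.
// Assumes short names are unique on the grid.
boolean sameHost(String a, String b) {
    return a != null && b != null && shortName(a).equalsIgnoreCase(shortName(b))
}

println sameHost("build-agent-07.example.com", "BUILD-AGENT-07")   // true
println sameHost("build-agent-07", "build-agent-7.example.com")    // false
println sameHost("build-agent-07", "build-agent-07")               // true
```

In the selection script this would reduce to `return sameHost(node.getHostName(), params.get("currentNodeAddress"))`, assuming the script field allows helper methods (otherwise inline the logic). Like the original trimming, it would mishandle an IP address such as `10.0.0.5`, whose "short name" would be `10`.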
Sometimes the node found in the first step does not show up in the Node Selection list, and the step then fails because it cannot find a matching node. Increasing the sleep time after the node is found in the first step seems to make this more stable, but I feel there should be no need for a sleep at all.
In the cases where we see it fail, around 14 to 20 new agents may be starting up at the same time. Is node selection simply not refreshed in time? Is there a way to force a refresh?