Is there a way, from a Groovy script, to retrieve the same build grid data that you see on the Queue tab?
I'm trying to debug an issue with our build system that frequently causes build agent deadlocks and zombie build jobs. The zombies are builds that QuickBuild shows as running in the configuration's build history. But if I browse to the agent that the build's master step is running on, it says there are NO running steps on that node. A zombie build has catastrophic consequences down the line because it has pre-build and post-build scripts that mutex-lock that agent. Anything else requesting that agent gets deadlocked behind the zombie.
I first want to find a way to detect this error. The easiest way to do that manually is as follows:
Find all builds in the queue that were triggered by a specific user, fall under a specific build tree (which is massive), and have been running longer than expected. For each of these builds, check the node it runs on and find the mutex file described above. The mutex file contains the build IDs of all builds waiting on that node. The oldest one (the lowest-numbered, which is always the topmost in the file) is the stuck/zombie build. You can go to that build ID in QuickBuild and see it stuck on the master step, but you can never find that build in the build queue, and you can't see that build ID in the node's "running steps" tab.
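To make the last part of that check concrete, here is a rough Groovy sketch of what I'd like to automate. It deliberately uses no QuickBuild API (that is the part I'm missing): the helper name `findZombie` and the `queuedBuildIds` set are hypothetical, and it assumes the mutex file holds one numeric build ID per line, oldest first.

```groovy
// Hypothetical helper, not a QuickBuild API.
// Returns the build ID at the top of the mutex file if that build is
// NOT among the queued/running builds (i.e. a suspected zombie), else null.
// Assumption: the mutex file lists one numeric build ID per line, oldest first.
Long findZombie(List<String> mutexLines, Set<Long> queuedBuildIds) {
    // Drop blank lines and parse the remaining entries as build IDs.
    List<Long> waiting = mutexLines.collect { it.trim() }
                                   .findAll { it }
                                   .collect { it.toLong() }
    if (waiting.isEmpty()) {
        return null
    }
    Long holder = waiting.first()   // topmost entry = oldest build
    return queuedBuildIds.contains(holder) ? null : holder
}

// Example with made-up IDs: 1001 tops the mutex file but is not in the queue.
def mutexLines = ['1001', '1004', '1010']
def queuedBuildIds = [1004L, 1010L] as Set
println findZombie(mutexLines, queuedBuildIds)
```

What I can't fill in is where `queuedBuildIds` would come from, which is exactly the queue data I'm asking about.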
How can I get the queue entries in a script, so that I can start on a workaround for this problem by automating the manual steps above?
Do you know of anything we might be doing wrong on the grid that could result in these kinds of zombies? Our mutexing system hasn't changed in months, perhaps years, but it has recently become a serious problem, hanging nodes roughly 10 to 15 times a day.
Any help or suggestions would be highly appreciated.
Tom Z.