Quite regularly, certain builds terminate with "Error testing job".
What they have in common is that they are long-running tests (several hours).
For some reason, the connection between the agent and the server is lost after the build has been running for about 3 to 6 hours.
However, when I check the agent some time after "Error testing job" occurs, it is very often actually running, shows no errors in the console, and is displayed in the list of Active nodes in Grid. So the connection seems to be re-established automatically later.
The agent log typically contains:
ERROR com.pmease.quickbuild.Quickbuild - Error connecting server.
com.pmease.quickbuild.RemotingException: 500: java.net.SocketException: Connection reset
	at com.caucho.hessian.client.HessianURLConnection.sendRequest(HessianURLConnection.java:165)
	at com.caucho.hessian.client.HessianProxy.sendRequest(HessianProxy.java:300)
	at com.caucho.hessian.client.HessianProxy.invoke(HessianProxy.java:171)
Questions:
- Is there any way to narrow down the root cause of the problem?
- Is there a setting to increase the timeout before the server or the agent decides that the connection is lost? And/or a setting to increase the number of reconnection attempts?
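To make the second question concrete: the two knobs I am asking about (how long to wait, and how many times to try again) correspond to a generic retry-with-backoff loop like the minimal Java sketch below. It is only an illustration under stated assumptions, not QuickBuild's code or configuration. The class `ReconnectWithBackoff` and its `maxAttempts`/`initialDelayMillis` parameters are hypothetical, and a plain `Callable` stands in for the Hessian call that fails with `Connection reset`.

```java
import java.util.concurrent.Callable;

/**
 * Hypothetical helper (not part of QuickBuild): retries an operation
 * with exponential backoff, up to a fixed number of attempts.
 */
public class ReconnectWithBackoff {

    /** Runs the operation, trying at most maxAttempts times in total. */
    public static <T> T callWithRetry(Callable<T> operation, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.call();
            } catch (Exception e) {
                last = e;
                System.err.println("Attempt " + attempt + " of " + maxAttempts + " failed: " + e);
                if (attempt < maxAttempts) {
                    Thread.sleep(delay);
                    delay *= 2; // double the wait before the next attempt
                }
            }
        }
        // Every attempt failed: rethrow the last error so the caller can give up.
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Simulated connection that is reset twice, then succeeds.
        int[] calls = {0};
        String result = callWithRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new java.net.SocketException("Connection reset");
            }
            return "connected";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

In this sketch, `maxAttempts` bounds the number of reconnection attempts and the doubling delay spaces them out, so a short network interruption would not immediately fail a build that has already been running for hours.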
Thank you