I have started to notice that when I restart my server, not all of my agents reconnect. I checked the logs, and the affected agents all show this error at the time I restarted the server, and then nothing more until I trigger an agent restart:
jvm 1 | 2011-05-26 17:52:03,648 ERROR - Error connecting server.
jvm 1 | com.caucho.hessian.client.HessianConnectionException: 500: java.net.SocketException: Connection reset
jvm 1 | at com.caucho.hessian.client.HessianProxy.invoke(HessianProxy.java:195)
jvm 1 | at $Proxy14.connect(Unknown Source)
jvm 1 | at com.pmease.quickbuild.grid.AgentConnectivityTask.run(AgentConnectivityTask.java:55)
jvm 1 | at java.lang.Thread.run(Unknown Source)
jvm 1 | Caused by: java.net.SocketException: Connection reset
jvm 1 | at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
jvm 1 | at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
jvm 1 | at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
When I trigger the restart, I get this in the logs:
wrapper | Stopping the QuickBuild Build Agent service...
jvm 1 | 2011-05-26 18:27:12,634 INFO - Stopping QuickBuild agent...
jvm 1 | 2011-05-26 18:27:12,634 INFO - Disconnecting from server...
jvm 1 | 2011-05-26 18:27:12,666 INFO - Stopping plugin 'com.pmease.quickbuild.plugin.scm.subversion'...
jvm 1 | 2011-05-26 18:27:12,681 INFO - Stopping plugin 'com.pmease.quickbuild.plugin.report.pmd'...
.
.
.
jvm 1 | 2011-05-26 18:27:12,791 INFO - Stopping task scheduler...
wrapper | Waiting to stop...
wrapper | Waiting to stop...
wrapper | Waiting to stop...
.
.
.
wrapper | Shutdown failed: Timed out waiting for the JVM to terminate.
wrapper | Waiting to stop...
wrapper | JVM did not exit on request, terminated
wrapper | <-- Wrapper Stopped
wrapper | QuickBuild Build Agent stopped.
This looks like http://track.pmease.com/browse/QB-840, but that was fixed in version 3.1.23. I am running 3.1.42.
I tested this and it works fine on my side. It is normal for the agent to report a connection exception when the server is restarted, but it recovers automatically once the server comes back. Have you waited long enough (normally within 1 minute) to see the recovery? Also, does the agent work normally if you restart it via the QB grid page?
I waited over an hour, and it did not come back or give any additional log information. It looks like the agent is hanging after it loses server connection.
This only happens on about a third of my agents; the rest all came back within a couple of minutes. And it is not consistent which ones are affected. Perhaps the agent is being caught in the middle of a specific operation and can't recover?
Restarting an agent from the QB grid page works fine; however, the agent is not visible on the grid page if the issue occurs.
I think it would be helpful for the QB server to remember which agents were connected when it went down; that way the problems would be easier to spot.
We do plan to remember all connected agents for easier tracking; however, this is scheduled for the next major release. For the agents that cannot connect, are they still trying to reconnect periodically (this can be checked by examining the console log), or just waiting silently? Also, can you please let me know which platforms the server and the problem agents are running on?
Agent appears to be sitting silently; there is no log activity after the initial failure to connect posted previously.
The agents run on Windows Server 2003 or 2008 and connect to a server on CentOS 5.
This setup should be fine. A probable reason is that the server cannot detect the agent's IP address correctly: if the agent's IP address is not specified, the QB server tries to get it from the REMOTE_ADDR value of the incoming HTTP request, and this might not be correct in some special cases. For each problematic agent, please do the following:
1. Add the following property to "<agent install dir>/conf/node.properties":
ip=<IP address of agent>
2. Make sure the server can connect to that IP address by running the following command on the server:
telnet <IP address of agent> <port of agent>
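If a telnet client is not available on the server, a rough equivalent of step 2 is a plain TCP connect with a timeout. The sketch below assumes a Java runtime on the server host; `AgentPortCheck` is a hypothetical helper, not part of QuickBuild. Like telnet, it only tells you whether a TCP connection to the agent's port can be opened, not whether the agent answers correctly once connected.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/**
 * Hypothetical helper (not part of QuickBuild): checks whether a TCP
 * connection to host:port can be opened, roughly what "telnet host port" shows.
 * Run it on the QuickBuild server host.
 */
public class AgentPortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    public static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            // Covers refused connections, timeouts and unresolvable hosts (all IOExceptions).
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java AgentPortCheck <agent-ip> <agent-port>");
            System.exit(2);
        }
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        boolean ok = canConnect(host, port, 5000); // 5-second connect timeout
        System.out.println(host + ":" + port + (ok ? " is reachable" : " is NOT reachable"));
        System.exit(ok ? 0 : 1); // non-zero exit code makes it easy to script across many agents
    }
}
```

Usage: `java AgentPortCheck <agent-ip> <agent-port>`; the exit code is 0 when the port is reachable, so the same check can be looped over a list of agents from a shell script.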
I will try this and let you know what I find. This will have to be done on all of my agents as it is random which ones are affected.
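Because the `ip` property has to be set on every agent, a small per-agent check may save some time. This is a sketch under two assumptions: `conf/node.properties` uses standard Java properties syntax (as the `ip=...` line suggests), and the program is given the agent install directory as its argument. `NodeIpCheck` is hypothetical, not a QuickBuild tool. It prints the configured `ip` value, if any, plus the host's non-loopback IPv4 addresses as candidates; a host with several network interfaces is one situation where the address the server sees may not be the one it can connect back to.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.Properties;

/** Hypothetical per-agent check (not part of QuickBuild) for the "ip" entry in node.properties. */
public class NodeIpCheck {

    /** Returns the trimmed "ip" value from the given properties file, or null if absent or blank. */
    public static String configuredIp(File nodeProperties) throws IOException {
        Properties props = new Properties();
        try (InputStream in = new FileInputStream(nodeProperties)) {
            props.load(in); // assumes standard Java properties syntax
        }
        String ip = props.getProperty("ip");
        return (ip == null || ip.trim().isEmpty()) ? null : ip.trim();
    }

    /** Lists this host's non-loopback IPv4 addresses on interfaces that are up. */
    public static List<String> localIpv4Addresses() throws SocketException {
        List<String> result = new ArrayList<>();
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        if (nics == null) {
            return result; // older JDKs may return null when no interfaces are found
        }
        for (NetworkInterface nic : Collections.list(nics)) {
            if (!nic.isUp() || nic.isLoopback()) {
                continue;
            }
            for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                if (addr instanceof Inet4Address) {
                    result.add(addr.getHostAddress());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        File installDir = new File(args.length > 0 ? args[0] : ".");
        File file = new File(installDir, "conf" + File.separator + "node.properties");
        if (!file.isFile()) {
            System.out.println("Not found: " + file);
        } else {
            String ip = configuredIp(file);
            System.out.println(ip == null ? "ip is NOT set in " + file : "ip=" + ip);
        }
        System.out.println("Local IPv4 candidates: " + localIpv4Addresses());
    }
}
```

Running it on each agent with the install directory as the argument shows at a glance which agents still lack an explicit `ip` entry.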