
QuickBuild server freezes monthly #4370

JShelton

Our QuickBuild (10.0.30) server has been freezing up at random roughly once a month (at least once, occasionally twice), consistently since earlier this year. Only one of our several QB server instances is affected, even though one of the others was initially set up from the same configuration and database as the freezing server. When the "freeze" occurs, the web browser shows this error:

Proxy Error

The proxy server received an invalid response from an upstream server.
The proxy server could not handle the request GET /.

Reason: Error reading from remote server

When the issue occurs and the above error is shown, the QB server does not respond to script commands (restart or stop), and the running java process has to be killed with "kill -9". After that, the still-running wrapper script restarts QB on its own.
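Since the JVM has to be killed anyway, it may be worth capturing what every thread is doing first. On a HotSpot-based JVM, "kill -3 <pid>" (SIGQUIT) makes the JVM print a thread dump to its standard output without stopping it, which should end up in the wrapper log, and "jstack <pid>" prints one from a separate shell. A couple of dumps taken a few seconds apart, just before the "kill -9", would show which threads are blocked and on which locks. For a dump taken from inside the server process (for example from a scheduled task or plugin, which is an assumption about how it would be wired in), the standard java.lang.management API gives the same information. A minimal sketch:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Sketch: dump every live thread of the JVM this code runs in, with full
// stack traces and lock information. Only useful when running inside the
// QuickBuild JVM; for an external, already frozen process, use
// "jstack <pid>" or "kill -3 <pid>" instead.
public class ThreadDumpSketch {

    public static String dumpAllThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // true, true = also report locked monitors and ownable synchronizers
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState());
            if (info.getLockName() != null) {
                sb.append(" on lock ").append(info.getLockName());
            }
            if (info.getLockOwnerName() != null) {
                sb.append(" held by \"").append(info.getLockOwnerName()).append('"');
            }
            sb.append('\n');
            // Print every frame; ThreadInfo.toString() would cut off after 8
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("\tat ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    // IDs of threads deadlocked on monitors or java.util.concurrent locks
    public static long[] deadlockedThreadIds() {
        long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        return ids == null ? new long[0] : ids;
    }

    public static void main(String[] args) {
        System.out.println(dumpAllThreads());
        System.out.println("Deadlocked threads: " + deadlockedThreadIds().length);
    }
}
```

A non-empty deadlock list would point straight at the cause; an empty one with many threads BLOCKED or WAITING on the same lock or pool is still a useful lead.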

Over the past few months I've collected logs each time this happens to try to determine what is going on. The common pattern is a run of "Active build agent 'agent hostname:port' timed out, removing..." messages, one for each build agent being removed, followed by "Job entry no longer exists at task node 'agent hostname:port', will cancel running job..." (always for a different build agent). The time is different on every occurrence (once it happened in the middle of the day), and these messages are followed by repeating log entries about thread interruption, with the same message repeated for each build agent that was connected:

jvm 1    | 2021-09-07 06:34:05,033 WARN  com.mchange.v2.resourcepool.BasicResourcePool@22715a21 -- an attempt to checkout a resource was interrupted, and the pool is still live: some other thread must have either interrupted the Thread attempting checkout!
jvm 1    |  java.lang.InterruptedException
jvm 1    |  	at java.lang.Object.wait(Native Method)
jvm 1    |  	at com.mchange.v2.resourcepool.BasicResourcePool.awaitAvailable(BasicResourcePool.java:1414)
jvm 1    |  	at com.mchange.v2.resourcepool.BasicResourcePool.prelimCheckoutResource(BasicResourcePool.java:606)
jvm 1    |  	at com.mchange.v2.resourcepool.BasicResourcePool.checkoutResource(BasicResourcePool.java:526)
jvm 1    |  	at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool.checkoutAndMarkConnectionInUse(C3P0PooledConnectionPool.java:755)
jvm 1    |  	at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool.checkoutPooledConnection(C3P0PooledConnectionPool.java:682)
jvm 1    |  	at com.mchange.v2.c3p0.impl.AbstractPoolBackedDataSource.getConnection(AbstractPoolBackedDataSource.java:140)
jvm 1    |  	at org.hibernate.c3p0.internal.C3P0ConnectionProvider.getConnection(C3P0ConnectionProvider.java:90)
jvm 1    |  	at org.hibernate.internal.AbstractSessionImpl$NonContextualJdbcConnectionAccess.obtainConnection(AbstractSessionImpl.java:380)
jvm 1    |  	at org.hibernate.engine.jdbc.internal.LogicalConnectionImpl.obtainConnection(LogicalConnectionImpl.java:228)
jvm 1    |  	at org.hibernate.engine.jdbc.internal.LogicalConnectionImpl.getConnection(LogicalConnectionImpl.java:171)
jvm 1    |  	at org.hibernate.engine.transaction.internal.jdbc.JdbcTransaction.doBegin(JdbcTransaction.java:67)
jvm 1    |  	at org.hibernate.engine.transaction.spi.AbstractTransactionImpl.begin(AbstractTransactionImpl.java:162)
jvm 1    |  	at org.hibernate.internal.SessionImpl.beginTransaction(SessionImpl.java:1471)
jvm 1    |  	at com.pmease.quickbuild.entitymanager.impl.DefaultMeasurementDataManager.save(DefaultMeasurementDataManager.java:94)
jvm 1    |  	at com.pmease.quickbuild.plugin.measurement.core.reporter.MeasurementServerReporter.save(MeasurementServerReporter.java:105)
jvm 1    |  	at com.pmease.quickbuild.plugin.measurement.core.reporter.MeasurementServerReporter.access$100(MeasurementServerReporter.java:34)
jvm 1    |  	at com.pmease.quickbuild.plugin.measurement.core.reporter.MeasurementServerReporter$NodeMetricsSender.execute(MeasurementServerReporter.java:93)
jvm 1    |  	at com.pmease.quickbuild.grid.NodeJobExecuteJob.execute(NodeJobExecuteJob.java:25)
jvm 1    |  	at com.pmease.quickbuild.grid.GridJob.run(GridJob.java:131)
jvm 1    |  	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
jvm 1    |  	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
jvm 1    |  	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
jvm 1    |  	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
jvm 1    |  	at java.lang.Thread.run(Thread.java:748)
jvm 1    | 2021-09-07 06:34:05,034 WARN  SQL Error: 0, SQLState: null
jvm 1    | 2021-09-07 06:34:05,034 ERROR  An SQLException was provoked by the following failure: java.lang.InterruptedException
jvm 1    | 2021-09-07 06:34:06,218 WARN  Job entry no longer exists at task node 'agent hostname:port', will cancel running job...
jvm 1    | 2021-09-07 06:34:06,218 WARN  com.mchange.v2.resourcepool.BasicResourcePool@22715a21 -- an attempt to checkout a resource was interrupted, and the pool is still live: some other thread must have either interrupted the Thread attempting checkout!
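Reading the stack trace above: the interrupted thread is a QuickBuild grid job (MeasurementServerReporter$NodeMetricsSender) that was blocked in c3p0's BasicResourcePool.awaitAvailable, that is, waiting in Object.wait() for a free database connection, when another thread interrupted it. The next log line ("Job entry no longer exists ... will cancel running job...") suggests one plausible sequence: the agents time out, their jobs are cancelled, and the cancellation interrupts workers that were stuck waiting for a connection. If that reading is right, the interruptions are a symptom, and the more useful question is why the pool had no free connection (for example, slow responses from the remote MySQL server, or connections that are never returned). The simplified, hypothetical pool below (not c3p0's actual code) reproduces just that mechanism:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, much simplified pool (NOT c3p0's implementation): checkout()
// waits in Object.wait() while no connection is free, so interrupting the
// waiting thread makes it fail with InterruptedException.
public class TinyPool {
    private final Deque<String> idle = new ArrayDeque<>();

    public TinyPool(int size) {
        for (int i = 0; i < size; i++) {
            idle.push("conn-" + i);
        }
    }

    public synchronized String checkout() throws InterruptedException {
        while (idle.isEmpty()) {
            wait(); // throws InterruptedException if the thread is interrupted
        }
        return idle.pop();
    }

    public synchronized void checkin(String conn) {
        idle.push(conn);
        notifyAll(); // wake threads waiting in checkout()
    }

    // Exhausts the pool, blocks a worker in checkout(), then cancels that
    // worker's job the way a FutureTask-based scheduler would.
    public static String demo() {
        TinyPool pool = new TinyPool(1);
        ExecutorService executor = Executors.newSingleThreadExecutor();
        AtomicReference<String> outcome = new AtomicReference<>("no result");
        CountDownLatch started = new CountDownLatch(1);
        try {
            pool.checkout(); // hold the only connection and never return it
            Future<?> job = executor.submit(() -> {
                started.countDown();
                try {
                    outcome.set("got " + pool.checkout());
                } catch (InterruptedException e) {
                    outcome.set("checkout interrupted");
                }
            });
            started.await();  // the job is running and about to block
            job.cancel(true); // cancel(true) interrupts the running worker
            executor.shutdown();
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            executor.shutdownNow();
        }
        return outcome.get();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "checkout interrupted"
    }
}
```

In this model the waiting job never gets a connection on its own; it only leaves checkout() because of the interrupt, which matches the "some other thread must have either interrupted the Thread attempting checkout" wording in the log.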

Server: CentOS 7
Build agents: Both Linux and Windows
Database: MySQL (database is on another server)

This is a strange issue. Again, our secondary server is set up exactly the same way as this one, with mostly identical configuration, and it has never experienced this issue. Any ideas on how we could narrow down the cause so we can fix it?

robinshen ADMIN

Are you using Oracle JDK or OpenJDK?

JShelton

Server is running on OpenJDK 8.

If it helps, we're using these additional startup args: -XX:+UseG1GC -XX:+UseStringDeduplication -XX:+IgnoreUnrecognizedVMOptions

robinshen ADMIN

Are you able to test on the Oracle HotSpot JDK for some time to see whether the problem still occurs? Several of our customers have experienced memory issues with OpenJDK, and switching to Oracle HotSpot solved the problem for them.
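To test the memory hypothesis before (or alongside) switching JVMs, heap and GC figures can be recorded over time so there is data from the hours leading up to the next freeze. From outside the process, "jstat -gcutil <pid> 60000" prints heap-region utilization and GC counts/times every 60 seconds. The sketch below collects comparable figures via java.lang.management; it only describes the JVM it runs in, so it would have to be invoked inside the QuickBuild process (how to hook it in is left open here and is an assumption):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Sketch: one-line summary of heap usage and cumulative GC activity for the
// JVM this code runs in. Logged periodically, a heap pinned near "max" with
// rapidly growing GC time before a freeze would point at memory pressure;
// flat numbers would argue against it.
public class HeapUsageSketch {

    public static String snapshot() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        StringBuilder sb = new StringBuilder();
        // getMax() returns -1 when no maximum is defined; reported as "undefined"
        sb.append(String.format("heap used=%dMB committed=%dMB max=%s",
                heap.getUsed() >> 20, heap.getCommitted() >> 20,
                heap.getMax() < 0 ? "undefined" : (heap.getMax() >> 20) + "MB"));
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Counts and times are cumulative since JVM start
            sb.append(String.format("; %s count=%d time=%dms",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(snapshot());
    }
}
```

Because the GC counters are cumulative, the interesting signal is how fast they grow between consecutive snapshots rather than their absolute values.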

JShelton

We might be able to test using AdoptOpenJDK, since it offers an explicit download option for the HotSpot JVM. OpenJDK does seem to support HotSpot (https://openjdk.java.net/groups/hotspot/), but I can't find a way to confirm that the OpenJDK build we installed from the CentOS yum repo actually includes HotSpot.
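One way to settle whether the CentOS OpenJDK build is HotSpot-based is to ask the JVM itself. HotSpot-based builds usually report a VM name such as "OpenJDK 64-Bit Server VM" or "Java HotSpot(TM) 64-Bit Server VM", while an OpenJ9 build reports "Eclipse OpenJ9 VM"; the same name appears on the last line of "java -version". The sketch below prints the VM name and vendor plus the arguments the JVM was actually started with. Run as a standalone program it describes its own JVM, so start it with the same java binary QuickBuild uses; run from inside QuickBuild, it would also show whether the -XX flags listed earlier were applied:

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Sketch: report which JVM implementation is running and the options it
// was actually started with.
public class VmInfoSketch {

    public static String describe() {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        return String.join("\n",
                "java.vm.name=" + System.getProperty("java.vm.name"),
                "java.vm.vendor=" + System.getProperty("java.vm.vendor"),
                "java.runtime.version=" + System.getProperty("java.runtime.version"),
                "jvm args=" + jvmArgs);
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```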

Oracle JDK won't be an option due to Oracle's licensing restrictions on Java SE.