QB slows down, then loses gui while consuming 1700% of cpu #4002

waldemar ·

We seem to be having problems with QB getting into a state where it keeps crunching numbers, using 1700% CPU on a 24-CPU machine, and then loses its GUI (nginx reports a bad gateway).

We have 40 agents and some 3000 configurations, of which about 500 are active, but when the problem occurs usage is very light, with maybe 1 or 2 builds running. There appears to be some cumulative effect: it looks as though QB is running out of memory and not garbage collecting. The decline comes quickly. After several days of normal operation, once slowness is noticed, it takes about 15 minutes to lose the GUI.
We have:
wrapper.java.initmemory=2048
wrapper.java.maxmemory=12216

There is 24G of memory on the machine. We could allocate more to QB, but we wonder whether that would only delay the problem without fixing it.
We run QB 8.0.11.
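
For reference, here is a minimal sketch of how the two wrapper settings above relate to the physical memory on the machine. It assumes the values are in megabytes (the usual convention for these Java Service Wrapper properties) and that "24G" means 24 GiB. The helper names `parse_wrapper_memory` and `heap_share` are made up for illustration and are not part of QB.

```python
def parse_wrapper_memory(conf_text):
    """Return (init_mb, max_mb) read from wrapper-style key=value lines."""
    settings = {}
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    init_mb = int(settings["wrapper.java.initmemory"])
    max_mb = int(settings["wrapper.java.maxmemory"])
    return init_mb, max_mb


def heap_share(max_mb, physical_mb):
    """Fraction of physical memory that the maximum heap may occupy."""
    return max_mb / physical_mb


conf = """
wrapper.java.initmemory=2048
wrapper.java.maxmemory=12216
"""
init_mb, max_mb = parse_wrapper_memory(conf)
physical_mb = 24 * 1024  # assumption: "24G" means 24 GiB, values are in MB
print(init_mb, max_mb, round(heap_share(max_mb, physical_mb), 2))
```

Under those assumptions the maximum heap is already about half of physical memory, so there is room to raise it, but that by itself would not tell us whether something is leaking.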

robinshen ADMIN ·

The memory allocation is sufficient and should not be the problem. Can you please take a memory dump when you notice it getting slow?
/path/to/jdk/bin/jmap -dump:format=b,file=<file_name> <pid>
(where <pid> is the process ID of the QB server JVM)

Then please upload the dump to Dropbox or some other file storage so that I can download it and check the issue.
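
If it helps to script the dump so it can be triggered as soon as slowness is noticed, something along these lines might work. This is only a sketch: `build_jmap_dump_command` is a hypothetical helper, and the pid `12345` and the output path `/tmp/qb-heap.hprof` are placeholders, not values from this thread. Note that jmap needs the process ID of the target JVM as its last argument.

```python
def build_jmap_dump_command(jmap_path, pid, dump_file):
    """Build the argv list for a binary heap dump of the JVM with this pid."""
    if pid <= 0:
        raise ValueError("pid must be a positive process ID")
    # jmap syntax: jmap -dump:format=b,file=<file_name> <pid>
    return [jmap_path, f"-dump:format=b,file={dump_file}", str(pid)]


# Placeholder values for illustration only.
cmd = build_jmap_dump_command("/path/to/jdk/bin/jmap", 12345, "/tmp/qb-heap.hprof")
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. jmap generally has to be run as the same user that owns the JVM process, and the dump file can be roughly as large as the heap in use, so make sure the target disk has enough space.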

uartie ·

We often observe similar behavior (i.e. bad gateway while using nginx) when QB is either running a DB backup or when we delete an old configuration from the web UI. We are running with 512m initial heap (Xms) and 4096m max heap (Xmx) on a 4-core/8-thread CPU with 16G of physical memory. We also have 1.5T of disk storage at ~64% usage (mostly QB data). We run QB inside an openjdk:8-jdk Docker container. After a while (10-20 min), the web UI becomes responsive again and CPU usage drops back to normal. We have ~2100 configurations but observe this behavior even if none are active. We have not been able to identify the root cause.

BTW, we are still using QB version 7.0.31 (we plan to upgrade soon).

waldemar ·

In our case, the problem was traced to very poor performance (bugs) in OpenJDK 1.8 under stress. We saw substantial memory leaks and catastrophic file descriptor leaks. The remedy was to switch to Oracle JDK 1.8, which instantly restored the system to stability: no memory leaks, no fd leaks. Yes, the system still became slower under severe stress, but it was not crashing and the GUI was no longer being lost. Many thanks to Robin for helping debug this and for identifying the source of the trouble.
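
For anyone wanting to check for the same kind of fd leak, here is a rough sketch of one way to watch it on Linux. It is not the exact procedure we used. It counts the entries under `/proc/<pid>/fd` and flags a series of samples that only ever grows. The function names, the sample counts, and the `min_growth` threshold are all invented for illustration.

```python
import os


def count_open_fds(pid, proc_root="/proc"):
    """Count open file descriptors of a process via /proc/<pid>/fd (Linux only)."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    return len(os.listdir(fd_dir))


def looks_like_leak(samples, min_growth):
    """True if the counts never decrease and grow by at least min_growth overall."""
    if len(samples) < 2:
        return False
    non_decreasing = all(b >= a for a, b in zip(samples, samples[1:]))
    return non_decreasing and samples[-1] - samples[0] >= min_growth


# Hypothetical fd counts sampled at intervals; not measurements from our server.
print(looks_like_leak([410, 980, 2300, 5100], min_growth=1000))
```

In practice one would call `count_open_fds` on the QB JVM's pid every few minutes, as the same user or as root, and compare the trend against the process's open-file limit (`ulimit -n`). A healthy server's count should go up and down with load rather than climb steadily.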