An improvement related to Memory Management (JVM) in parallel steps. #4569

ngocanhnu ·

Recently we have had some build errors when using parallel steps.
Here is the log that I found on the server:

Error notifying task node of job finishing (job class: $job_class, job id: com.pmease.quickbuild.resource.ResourceAllocateJob, task node: $qbagent_name:$qbagent_port)
 com.pmease.quickbuild.RemotingException: Java heap space

We use the following config on our agent:

# Initial Java Heap Size (in MB)

# Maximum Java Heap Size (in MB)

Because of our privacy policy, I cannot share the actual content of the step, but I think you can reproduce the error as follows.

  1. Create a Java program to simulate java.lang.OutOfMemoryError:
import java.util.Vector;

public class MemoryEater {
    public static void main(String[] args) {
        // Holds every allocated buffer so the garbage collector cannot reclaim it.
        Vector<byte[]> v = new Vector<>();
        // Keep allocating for 10 seconds.
        long endTime = System.currentTimeMillis() + 10000;
        while (System.currentTimeMillis() < endTime) {
            byte[] b = new byte[14856];
            v.add(b); // retain the buffer so heap usage keeps growing
            Runtime rt = Runtime.getRuntime();
            System.out.println("free memory: " + rt.freeMemory());
        }
    }
}
  2. Run a script to make 200 copies of the Java program from step 1:
for i in $(seq 1 $last); do
        cp MemoryEater$
        sed -i "s/MemoryEater/MemoryEater$i/g" MemoryEater$
        javac MemoryEater$
done
  3. Run the 200 parallel steps against the 200 files above with the command:
java -Xmx2048m MemoryEater${params.get("NUMBER")}

The repeat parameter is configured as follows:

Parameter Name : NUMBER
Parameter Values:
def List numbers=[]
for(int i in 1..200) numbers.add(i.toString())
return numbers

  4. Run the build with the maximum value of i set to different values (200/145/140/100); the results differ.
    The suspicious thing is that with fewer than 145 files running in parallel the build succeeds, and when the number is increased the build starts failing.
    Can you try to figure out why the agent can only run this specific number?
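
The copy script in step 2 can also be sketched in Java. This is only a sketch under assumptions: the class name `MemoryEaterCopies`, the helper `renameClass`, and the file names `MemoryEater.java` and `MemoryEater<i>.java` are hypothetical and not taken from the original script. The generated files would still need to be compiled with javac.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: writes numbered copies of a template source file.
final class MemoryEaterCopies {

    // Replaces every occurrence of the base class name with base + index,
    // mirroring the effect of sed "s/MemoryEater/MemoryEater$i/g".
    static String renameClass(String source, String baseName, int index) {
        return source.replace(baseName, baseName + index);
    }

    public static void main(String[] args) throws IOException {
        // Number of copies; 200 matches the thread, but any positive value works.
        int last = args.length > 0 ? Integer.parseInt(args[0]) : 200;
        String baseName = "MemoryEater";               // assumed class name
        Path template = Paths.get(baseName + ".java"); // assumed template file

        if (!Files.exists(template)) {
            System.err.println("Template not found: " + template);
            return;
        }
        String source = new String(Files.readAllBytes(template), StandardCharsets.UTF_8);

        for (int i = 1; i <= last; i++) {
            Path copy = Paths.get(baseName + i + ".java");
            Files.write(copy, renameClass(source, baseName, i).getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("Wrote " + last + " copies");
    }
}
```

Because the replacement is applied to the whole file, any other occurrence of the text MemoryEater in the template is renamed too, which matches the behaviour of the global sed substitution.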

When we change the parallel step to a sequential step, the issue is resolved, but the run is terribly slow compared with the parallel step.
I changed the config to 12 GB, but the build then passes only randomly, so we do not think this is a great solution.

robinshen ADMIN ·

Are the parallel composition step and the script step running on the same node? Also, please show me the full error reported for "com.pmease.quickbuild.RemotingException: Java heap space" (if the build log does not have the detail, please check the console log of the related agent).

ngocanhnu ·

Yes, the parallel composition step and the script step are running on the same node.
The build log:

Step 'master>parallel_composition_steps' is failed.
java.lang.RuntimeException: Error executing step process job.
        at com.pmease.quickbuild.stepsupport.StepProcessTask.reduce(
        at com.pmease.quickbuild.stepsupport.StepProcessTask.reduce(
        at com.pmease.quickbuild.grid.GridTaskFuture.get(
        at com.pmease.quickbuild.grid.GridTaskFuture.get(
        at com.pmease.quickbuild.stepsupport.ParallelStep.triggerChildren(
        at com.pmease.quickbuild.stepsupport.Step.doExecute(
        at com.pmease.quickbuild.stepsupport.Step.execute(
        at com.pmease.quickbuild.stepsupport.StepExecutionJob.executeStepAwareJob(
        at com.pmease.quickbuild.stepsupport.StepAwareJob.executeBuildAwareJob(
        at com.pmease.quickbuild.BuildAwareJob.execute(
        at java.util.concurrent.Executors$
        at java.util.concurrent.ThreadPoolExecutor.runWorker(
        at java.util.concurrent.ThreadPoolExecutor$
    Caused by: java.lang.OutOfMemoryError: Java heap space

The log on agent:

ERROR com.pmease.quickbuild.grid.GridTaskFuture - Unable to find job (job class: com.pmease.quickbuild.resource.ResourceAllocateJob, job id: xxx, build id: N/A, job node: XXX)

robinshen ADMIN ·

What is your agent OS and QB version?

ngocanhnu ·

QB version: 10.0.x
Agent OS: Ubuntu 18.04

robinshen ADMIN ·

Each of the Java programs you created runs until it is out of memory, and running many of them in parallel will cause the OS to run short of memory. It is not surprising that the QB server itself is affected when it requests memory from the OS. The error may occur randomly, depending on when the OS rejects a memory request while overall memory is low.
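
To put rough numbers on this, here is a minimal sketch that computes the worst-case aggregate heap if every one of the parallel JVMs grew to its -Xmx2048m limit. The class and method names are hypothetical, and the result is an upper bound under that assumption, not a measurement from the agent.

```java
// Hypothetical helper for a back-of-the-envelope memory estimate.
final class HeapBudget {

    // Upper bound on total heap (in MB) if every process grows to its -Xmx limit.
    static long worstCaseHeapMb(int processes, int xmxMb) {
        return (long) processes * xmxMb;
    }

    public static void main(String[] args) {
        int xmxMb = 2048; // from "java -Xmx2048m" in the thread
        int[] parallelCounts = {100, 140, 145, 200}; // cases tried in the thread
        for (int n : parallelCounts) {
            long totalMb = worstCaseHeapMb(n, xmxMb);
            System.out.printf("%d steps x %d MB = %d MB (%d GB)%n",
                    n, xmxMb, totalMb, totalMb / 1024);
        }
    }
}
```

For 200 steps the bound is 409600 MB (400 GB), and for 145 steps it is 296960 MB (290 GB). Real usage is normally lower, since a JVM grows its heap only as needed, and non-heap JVM memory is not counted here.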