Your browser was unable to load all of the resources. They may have been blocked by your firewall, proxy or browser configuration.
Press Ctrl+F5 or Ctrl+Shift+R to have your browser try again.

Steps executed even if previous one is failed #3996

kururin ·

Hello

Quickbuild 8.0.20

If you look at the screenshot below, you'll see steps being executed whereas a previous step is failed:
qb8_bug.png

The "revert all" step is set to "Always execute" but the next step ("unshelve files") is set to "If all previous sibling steps are successful".
It always happens in this configuration. Could the problem come from the error returned by the "sync code" step?

11:00:32,740 ERROR - Step 'master>== Targets>== PC Final>== Perforce>Sync code (do not unshelve)' is failed: Failed to run command: p4 -p p4-tgt-codeedge-mtr:1806 -u silex-buildfarm -P ****** -c qb-WIN-7UARQU9O190-412 sync //qb-WIN-7UARQU9O190-412/...@3228179
Command return code: 1
Command error output: unlink: D:\PRESUBMIT\main\source\server\script\matchmaking_rules.py: The system cannot find the file specified.
Fatal client error; disconnecting!
chmod: D:\PRESUBMIT\main\source\server\script\matchmaking_rules.py: The system cannot find the file specified.

Thank you
Mathieu

  • replies 10
  • views 287
  • stars 0
kururin ·
robinshen ADMIN ·

This looks odd. Do you have any customer script running to change the step status unintentionally?

kururin ·

No we don't have script modifying the status.
This problem never happens in other configurations, that's why I think it comes from our "Sync" step...
We are still investigating this perforce error that came out a few days after upgrading to QB8.

robinshen ADMIN ·

How about running this command from console directly to see how it works?

kururin ·

The step fails very rarely, but each time it does it returns this strange error and this weird QB behavior.

robinshen ADMIN ·

There might exist some race conditions. Do you have two or more parallel builds or steps writing/cleaning files under the Perforce workspace while the sync is going on?

rpallares ·

Hi,

I suppose we have the same issue on QuickBuild 8.0.20, but I'll probably add new information on what happened.
Our issue is a whole sequential step executed in parallel with others is executed twice.

Steps to reproduce:

  1. Setup a parallel composite step executed on one machine (not the server to be closer to our real case)
  2. Add several sequential composite steps into
  3. Each sequential reserve a new machine using a unique resource
  4. Do several things on each machine, each step must have a pre and post execute actions that log "Hey, I will execute this step once", "Hey I have executed this step"
  5. Execute this process several times
  6. Observe that sometimes (1/100), a sequetial step is executed twice on the same machine.
    QuickBuild don't add a new column, but if we look at the logs of all steps in this column, all steps are executed twice with ~20ms offset (clearly identified by the pre and post execute steps scripts).

As second symptom, we see like described in this issue (that's why I made the connection) the same issue:

  1. A step fail (due to the concurrent double execution)
  2. The following step with execute condition defined as "If all previous sibling steps are successful" is executed then cancelled immediatly

We suppose that these two symptoms have the same cause, the whole column (or sequential composite setp) is executed twice concurrently that produce our build failures.
As the issue have a low reproducibilty, it will quite hard to reproduce it at your side.
But may be an automated process that will automatically parse the logs and detect the double execution will catch the error quickly.

Can you confirm the issue please ?
Is this issue is known ?
How can we workaround that ?
I suppose that a kind of mutex is missing when resevering several machines by resource into a parallel step.

Do not hesitate to contact me by email if you need more information to reproduce the issue.

Thanks

robinshen ADMIN ·

Thanks for the info. Will set up test env to check this issue...

robinshen ADMIN ·

Having tested this all day generating tens of thousands of builds and no one is failing. I am sending my database backup to you, can you please check if I am missing something?

kururin ·

Just a few more details:

When we say that the steps are duplicated, the duplication is not visible on the build view of QB. You can just see it by opening the log. Here is an example of step log we have:

02:28:30,758 INFO - Checking step execute condition...
02:28:30,758 INFO - Step execute condition satisfied, executing...
02:28:30,880 INFO - Executing pre-execute action...
02:28:30,880 INFO - Running step...
02:28:59,939 INFO - Executing post-execute action...
02:28:27,776 INFO - Checking step execute condition...
02:28:27,776 INFO - Step execute condition satisfied, executing...
02:28:27,898 INFO - Executing pre-execute action...
02:28:27,898 INFO - Running step...
02:28:44,490 INFO - Executing post-execute action...

The step is executed twice but as soon as one of this launch is finished, the next step is triggered without waiting for the end of the second launch of the current step.
That explains why sometimes we have 2 steps failed in a row.

We're still trying to find what these builds have in common that could lead to this kind of problem.
It happens on different configurations, not just one and I'm sure we missed some of them because the builds can finish green even if this problem happens.
The config in the backup you sent is correct even if ours are more complexe (with triggers by example). But because we have no idea what can cause the problem, it's difficult to say what the configuration must look like to reproduce the issue...