[PATCH] dnsapi: Add DnsGetCacheDataTable stub

Sat Aug 31 04:47:42 CDT 2019

On Fri, 30 Aug 2019, Rémi Bernon wrote:

> On 8/30/19 3:03 PM, Marvin wrote:
> > Hi,
> > 
> > While running your changed tests, I think I found new failures.
> > Being a bot and all I'm not very good at pattern recognition, so I might be
> > wrong, but could you please double-check?
> > 
> > Full results can be found at:
> > https://testbot.winehq.org/JobDetails.pl?Key=56052
> > 
> > Your paranoid android.
> > 
> > 
> > === build (build log) ===
> > 
> > Task errors:
> > BotError: The VM is not powered on
> > 
> 
> I did a successful run with the same patch here:
> https://testbot.winehq.org/JobDetails.pl?Key=56051

Yes, here's what happened:
* When it has nothing to do the TestBot picks some VMs that it starts up 
  in advance in the hope they will be needed by the next job.

* Because the build VM is used to provide the Windows binaries for 
  testing on Windows it's needed by almost every job. So it's given a 
  high priority and ends up being prepared in advance and thus is 
  recorded by the TestBot as being in the idle state.

* But then there was a power outage so all the VMs got powered off.

* But the TestBot server is in a separate location and was not powered 
  off so it was not aware that the VMs got powered off. The thing is 
  these days the Engine never uses libvirt because these calls are 
  blocking which means if it tries to communicate with a dead VM host or 
  one where libvirt is hosed, these calls can block for a long time (up 
  to 10 minutes), which would block the Engine for all that time. 
  Instead it assumes the information it has in its database about the VM 
  is accurate and forks a process whenever it needs to perform an 
  operation on a VM, whether that's running a task, shutting it down or 
  reverting it.

* So it just scheduled the task on the build VM as usual. But the 
  child process could not communicate with the VM, checked its state 
  and complained that there was an error because "The VM is not 
  powered on".
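
To illustrate the sequence above, here is a minimal Python sketch (the 
TestBot itself is not written this way). The parent trusts its own record 
of the VM state and starts a child process without waiting on it, so a 
hung VM host can never stall the parent. The names `CHILD_SCRIPT`, 
`start_task` and `reap` are hypothetical, and the power state is passed 
in as an argument instead of being queried from a real hypervisor.

```python
import subprocess
import sys

# Hypothetical stand-in for a per-task script. It is the only place where
# the real VM state is checked; here that state is simply argv[1].
CHILD_SCRIPT = """
import sys
if sys.argv[1] != "on":
    print("The VM is not powered on")
    sys.exit(1)
print("task ran")
"""

def start_task(actual_power_state):
    """Start the child and return at once; the caller never blocks here."""
    return subprocess.Popen(
        [sys.executable, "-c", CHILD_SCRIPT, actual_power_state],
        stdout=subprocess.PIPE,
        text=True,
    )

def reap(child, timeout=30):
    """Collect the child's result later: (exit code, message)."""
    out, _ = child.communicate(timeout=timeout)
    return child.returncode, out.strip()

if __name__ == "__main__":
    # The parent's database still says "idle", but the VM is really off.
    database_state = {"build": "idle"}
    child = start_task("off")
    print(database_state["build"], reap(child))
```

The stale "idle" record is only contradicted once the child reports 
back, which is the point at which the error surfaced in this job.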

What's wrong is that it marked the task as failed. A better recovery 
mechanism would have been to mark the VM as either "dirty" or "offline" 
and put the task back in the queued state so the TestBot tries running 
it again.

The risk is that if the VM is unusable for some reason other than an 
external factor (such as the power outage here), the next round is 
likely to produce the same result, leading the TestBot to try to run the 
same highest priority task again and again on the one borked VM.
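
One way to get the requeue behavior without the endless retry loop would 
be to count VM errors per task and give up after a few attempts. The 
sketch below is in Python purely for illustration; the `Task` and `VM` 
classes, the status strings other than "queued" and "offline", and the 
cap of 3 are all assumptions, not how the TestBot does it.

```python
from dataclasses import dataclass

MAX_VM_ERRORS = 3  # assumed cap, picked arbitrarily for this sketch

@dataclass
class Task:
    status: str = "queued"
    vm_errors: int = 0

@dataclass
class VM:
    status: str = "idle"

def handle_vm_error(task, vm):
    """Called when a task could not run because its VM was unusable."""
    # Take the VM out of rotation so it is checked before being reused.
    vm.status = "offline"
    task.vm_errors += 1
    if task.vm_errors >= MAX_VM_ERRORS:
        # Probably not a transient problem: report it instead of looping.
        task.status = "boterror"
    else:
        # Put the task back so it gets another chance.
        task.status = "queued"
    return task.status
```

With this, a one-off outage costs the task a single retry, while a VM 
that is genuinely broken fails the task after `MAX_VM_ERRORS` attempts 
instead of monopolizing the scheduler.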

Finally the reason why you won't see that job as failed if you look at 
it now is because I restarted it. The user who submitted a job that 
failed due to a TestBot error gets a button to restart it. A user can 
only restart his own jobs and I'm not sure if that would have been 
possible in this case since the job came from a wine-devel email (but 
the administrator gets to restart anyone's jobs ;-).

Anyway I'll see about tweaking the task scripts to avoid this situation 
in the future.

-- 
Francois Gouget <fgouget at codeweavers.com>