TestBot News

Wed May 4 09:11:21 CDT 2022

On Thu, 28 Apr 2022, Zhiyi Zhang wrote:
[...]
> Could you make them go faster? Maybe balancing the load a bit or 
> adding more hardware?

The issue is that a full WineTest run takes a long time, and new jobs 
have to wait for the currently running tasks to complete before they get 
their turn. Once those tasks are done, the new jobs do have priority 
over WineTest.

I collected some data about the WineTest tasks (see attached 
spreadsheet): each one takes roughly 25 minutes on Windows and 35 
minutes on Linux. The main issue here is the VMs that have many test 
configurations, since those configurations must be run sequentially. The 
three VMs with the longest chains are:

  Time  Configs VM
  6.7 h    11   debian11
  6.7 h    16   w1064
  6.8 h    15   w10pro64

What this means is that no amount of rebalancing can get the tests to 
run in less than about 7 hours.
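
To make that bound explicit, here is a minimal sketch using only the 
three chain times from the table above. The variable names are mine and 
not part of the TestBot; the only assumption is that a VM's test 
configurations cannot overlap, so the longest chain sets the floor.

```python
# Per-VM chain times in hours, copied from the table above.
chain_hours = {"debian11": 6.7, "w1064": 6.7, "w10pro64": 6.8}

# A VM runs its configurations one after the other, so the full run
# cannot finish before the slowest chain does.
slowest_vm = max(chain_hours, key=chain_hours.get)
lower_bound = chain_hours[slowest_vm]

print(f"{slowest_vm} sets the floor at {lower_bound} h")
```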

And here are the results at the VM host level:

  Time  Configs Host
  7.2 h    12   vm1
  1.4 h     3   vm2
  7.7 h    18   vm3
 12.1 h    25   vm4

The issue is that vm2 is too slow and old to run most VMs nowadays, so 
it cannot take on more of the load. That means moving some test 
configurations from vm4 to vm1 or vm3 would push those two hosts to 9 or 
10 hours. So I'll restart the process of getting new hardware to replace 
vm2.
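
As a rough check of that estimate, the sketch below spreads the current 
load of vm1, vm3 and vm4 evenly across those three hosts. It assumes 
(optimistically) that test configurations can be moved freely between 
them and that vm2 takes on nothing more; the names are illustrative 
only.

```python
# Per-host WineTest times in hours, copied from the table above.
host_hours = {"vm1": 7.2, "vm2": 1.4, "vm3": 7.7, "vm4": 12.1}

# Assumption: vm2 cannot take more load, so only these hosts share it.
usable = ["vm1", "vm3", "vm4"]

total = sum(host_hours[h] for h in usable)
even_share = total / len(usable)

print(f"best case with even spread: {even_share:.1f} h per host")
```

A perfectly even split is not achievable in practice since whole VMs 
have to be moved, which is why the real figure lands above this.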

The other options:
* Fix the tests that get stuck: they waste 2 minutes each.
  But it looks like there are only two of those left, conhost.exe:tty 
  and wscript.exe:run, so there's not much to gain.

* Speed up the slow tests, potentially by using multi-threading. What 
  sucks is we have no way of tracking which tests are slow, which test 
  configurations are slow, etc. It would be nice to have something like 
  the patterns page but for runtime (and also for the size of the test 
  output).

* Getting hardware with faster single thread performance: over 90% of 
  the tests are single-threaded. vm2 is meant to be the first step 
  towards this.

* Splitting the VMs with many test configurations so the test load can 
  be spread across multiple hosts. That is, instead of having a single 
  VM with 15 test configurations that must run sequentially like 
  w10pro64, have two VMs with 7 and 8 configurations each that can run 
  in parallel. But that makes an extra VM to manage and requires having 
  hosts to spread them to :-(

* Load balancing could help, assuming the TestBot is smart enough.

  That is, if it starts by running the debiant and w7u tasks on vm4, 
  then by the time the other hosts are idle all that's left to run is 
  w10pro64's 15 test configurations that must be run sequentially 
  anyway. So the scheduler must give priority to the VMs with the 
  highest count of pending tasks.

  Load balancing could help reduce the latency by ensuring the builds 
  are done earlier. Here's a worst case scenario right now:
    t=0  vm2 starts a WineTest job
    t=1  Developer submits a job. First comes the build step
    t=25 vm2 completes the WineTest job
    t=25 vm1, vm3 and vm4 each start a new WineTest job
    t=26 vm2 completes the developer's build task
    t=50 vm1, vm3 and vm4 complete their WineTest task
    t=51 vm1, vm3 and vm4 start the developer's Windows tasks
  Having multiple build VMs would make it more likely that the blocking 
  build step is completed before any other WineTest task.
  This is also why it's good that vm2 is not too busy.

* Reducing the number of test configurations :-(
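
To illustrate the load balancing point about giving priority to the VMs 
with the most pending tasks, here is a toy simulation. It is not the 
TestBot's scheduler: it assumes every task takes one time unit, that any 
of the `workers` slots can run any VM, and that a VM runs at most one 
task at a time. The counts for debiant and w7u are made up; only 
w10pro64's 15 configurations come from the numbers above.

```python
def makespan(pending, workers, most_pending_first):
    """Return the number of time steps needed to drain all tasks."""
    pending = dict(pending)  # work on a copy
    sign = -1 if most_pending_first else 1
    steps = 0
    while any(pending.values()):
        # Each VM with work left can contribute one task this step.
        runnable = [vm for vm, n in pending.items() if n > 0]
        # Order by pending count; the VM name breaks ties.
        runnable.sort(key=lambda vm: (sign * pending[vm], vm))
        for vm in runnable[:workers]:
            pending[vm] -= 1
        steps += 1
    return steps


# Hypothetical task counts, except for w10pro64.
tasks = {"w10pro64": 15, "debiant": 5, "w7u": 5}

longest_first = makespan(tasks, workers=2, most_pending_first=True)
shortest_first = makespan(tasks, workers=2, most_pending_first=False)
print(longest_first, shortest_first)
```

In this toy model, starting with the short chains leaves the long 
w10pro64 chain to run alone at the end, while starting with the longest 
chain lets the short ones fill the second slot alongside it.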

-- 
Francois Gouget <fgouget at codeweavers.com>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: WineTest.xls
Type: application/vnd.ms-excel
Size: 20992 bytes
Desc: WineTest.xls
URL: <http://www.winehq.org/pipermail/wine-devel/attachments/20220504/9f2c311e/attachment-0001.xls>