[Bug 48040] New: Allow running more than one VM per host

Sun Nov 3 14:42:43 CST 2019

https://bugs.winehq.org/show_bug.cgi?id=48040

            Bug ID: 48040
           Summary: Allow running more than one VM per host
           Product: Wine-Testbot
           Version: unspecified
          Hardware: x86
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: unknown
          Assignee: wine-bugs at winehq.org
          Reporter: fgouget at codeweavers.com
      Distribution: ---

The TestBot VM hosts have 4 to 8 cores. Wine's tests are essentially single
threaded, typically remain far from maxing out even a single core, and don't
have heavy I/O requirements except for the msi tests. But a number of tests
are timing sensitive (audio tests) in that a delay of a fraction of a second
can make them fail.

It was found a long time ago that running two or more WineTest instances in
separate VMs concurrently would cause extra test failures. These failures were
either timeouts in the msi tests (so slow I/O) or timing-related failures in
the audio tests. But there were even more random test failures at the time,
making assessment tricky.

So while the TestBot can run an arbitrary number of concurrent VMs per host,
its current configuration limits it to just one VM at a time.

There are a number of developments that make this situation less and less
tenable:
* We have more and more Windows configurations to test, whether that's because
of new Windows releases, or new configurations such as dual-screen, locales,
etc.
* Tests on Wine involve longer rebuilds than just building the Windows test
executables and would benefit greatly from more cores.
* Future hosts are more likely to get 8, 12 or 16 core CPUs (+hyperthreading)
with SSDs.

With the current limit, scaling up means adding more underutilized VM hosts. So
this limit should be reevaluated, and a way to lift it found if there are
still issues.

* Find a way to reliably assess whether one configuration provides worse
results than another despite the possible presence of random failures.

* At the time qcow2 disk I/O seemed to have a global lock issue which may have
been responsible for some of the poor I/O performance and scheduling delays.
  -> Check whether that's still the case and if there are workarounds.

* There are two I/O models: native and threaded.
  -> Check if one configuration is better than the other with regard to
scheduling issues and interference across VMs.

* Some gamers report that vcpu pinning can reduce latency variations. Also
tweaking the vcpu topology is said to help sometimes.
  -> This sounds like something that would be beneficial for our audio tests so
investigate it.
     Should the pinning be done statically, or set by the TestBot before
starting up the VM based on the set of already running VMs? In the case of a
static allocation, how should the exclusion patterns be communicated to the
TestBot?
  https://mathiashueber.com/cpu-pinning-on-amd-ryzen/

https://www.reddit.com/r/VFIO/comments/7zcn5g/kvm_windows_10_guest_cpu_pinning_recommended/
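
For the first item (assessing whether one configuration is worse than another
despite random failures), one possible approach is to run the same test many
times under each configuration and compare the failure counts with a one-sided
Fisher exact test. The sketch below is only an illustration: the function name
and the run/failure counts are hypothetical, and it assumes that runs are
independent of each other, which may not hold for the TestBot.

```python
from math import comb


def fisher_one_sided(fail_a, runs_a, fail_b, runs_b):
    """One-sided Fisher exact test on two failure counts.

    Null hypothesis: configurations A and B have the same failure rate.
    Returns the probability that B gets fail_b or more of the observed
    failures purely by chance. A small value suggests B is really worse.
    """
    total_fail = fail_a + fail_b
    total_runs = runs_a + runs_b
    # Number of ways to distribute the failures over all runs.
    denom = comb(total_runs, total_fail)
    # B cannot have more failures than it has runs.
    upper = min(total_fail, runs_b)
    tail = sum(
        comb(runs_b, k) * comb(runs_a, total_fail - k)
        for k in range(fail_b, upper + 1)
    )
    return tail / denom


# Hypothetical numbers: 3 failures in 200 runs with one VM per host (A),
# 12 failures in 200 runs with two VMs per host (B).
p_value = fisher_one_sided(3, 200, 12, 200)
print(f"p = {p_value:.4f}")
```

With this kind of test the background rate of random failures is part of both
samples, so what gets measured is only the difference between the two
configurations.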
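
For the I/O model item, and assuming the VMs are defined through libvirt, the
mode is selected with the io attribute of each disk's <driver> element
('native' or 'threads'). The following is a rough sketch of switching a domain
definition between the two for comparison runs. The helper name is
hypothetical, and forcing cache='none' for native mode reflects my
understanding that QEMU only accepts native AIO with O_DIRECT caching, which
should be double-checked.

```python
import xml.etree.ElementTree as ET


def set_disk_io_mode(domain_xml, io_mode):
    """Return the domain XML with every disk <driver> using io_mode."""
    if io_mode not in ("native", "threads"):
        raise ValueError(f"unsupported io mode: {io_mode}")
    root = ET.fromstring(domain_xml)
    for driver in root.findall("./devices/disk/driver"):
        driver.set("io", io_mode)
        # Assumption: native AIO requires cache='none' or 'directsync'.
        # Note that this changes the caching behavior of the disk.
        if io_mode == "native" and driver.get("cache") not in (
            "none",
            "directsync",
        ):
            driver.set("cache", "none")
    return ET.tostring(root, encoding="unicode")


# Minimal, hypothetical domain definition used only for illustration.
example = (
    "<domain><devices><disk type='file'>"
    "<driver name='qemu' type='qcow2'/>"
    "</disk></devices></domain>"
)
print(set_disk_io_mode(example, "native"))
```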
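
For the pinning item, here is a minimal sketch of what the dynamic option
could look like: before starting a VM, pick host CPUs that no running VM is
pinned to and emit the matching libvirt <cputune> fragment. All names are
hypothetical and this is not how the TestBot works today. It also ignores
hyperthread siblings and cache topology, which probably matter for latency.

```python
def pick_host_cpus(host_cpus, pinned_by_vm, vcpu_count, reserved=frozenset()):
    """Choose host CPUs for a new VM, avoiding those of running VMs.

    host_cpus:    iterable of host CPU ids
    pinned_by_vm: dict mapping VM name -> set of host CPU ids it uses
    reserved:     host CPUs never given to VMs (e.g. kept for the host)
    Returns a list of vcpu_count CPU ids, or None if too few are free.
    """
    busy = set(reserved)
    for cpus in pinned_by_vm.values():
        busy |= set(cpus)
    free = sorted(c for c in host_cpus if c not in busy)
    if len(free) < vcpu_count:
        return None
    return free[:vcpu_count]


def cputune_xml(host_cpus):
    """Render a <cputune> fragment pinning vCPU i to host_cpus[i]."""
    lines = ["<cputune>"]
    for vcpu, cpu in enumerate(host_cpus):
        lines.append(f"  <vcpupin vcpu='{vcpu}' cpuset='{cpu}'/>")
    lines.append("</cputune>")
    return "\n".join(lines)


# Hypothetical 8-core host with CPU 0 kept for the host and two running VMs.
running = {"vm1": {2, 3}, "vm2": {4}}
cpus = pick_host_cpus(range(8), running, 2, reserved={0})
if cpus is not None:
    print(cputune_xml(cpus))
```

In the static case the reserved parameter is one way the exclusion pattern
could be passed in, for instance from a per-host configuration setting.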
