testbot: Rework the job scheduler and add an 'off' VM state.

Francois Gouget fgouget at codeweavers.com
Thu Oct 3 05:23:54 CDT 2013


The old scheduler would limit the number of VMs in the running state to $MaxRunningVMs but would allow dirty base VMs to accumulate without bound. The new scheduler makes sure that no more than $MaxActiveVMs VMs are using memory, CPU or I/O resources, whether they are reverting, sleeping, running or dirty.
Furthermore the new scheduler will power off any dirty VM that it cannot revert right away to make sure it won't waste resources.
The run tasks also no longer power off the VMs when done. They now leave that task to the scheduler which will decide what to do with the VM.
The new scheduler can revert VMs while others are running. This case gets its own limit, $MaxRevertWhileRunningVMs, so the reverts do not disturb the tests.
To better exploit this it can look ahead and prepare the VMs needed for a job's next steps. And like the previous scheduler, when idle it will start a set of VMs for the next job, trying to pick the most likely to be needed: first the build VM, then base VMs, etc.
This can also gets its own limit, $MaxVMsWhenIdle, so developpers can set the TestBot to not use resources unless needed.
This can lead to having idle VMs that turn out to not be needed, for instance if the new job targets non base VMs. When that happens the scheduler can shut down some idle VMs to make room for more useful ones.
---

This patch preserves the old limits on the number of running VMs and how 
many VMs can be reverted simultaneously and when. However, after a 
couple of days during which we can confirm that everything still works 
fine, I think the WineTestBot could switch to the following (still 
pretty conservative) parameters:

$MaxRevertingVMs = 2;
$MaxRevertsWhileRunningVMs = 1;
$MaxActiveVMs = 4;
$MaxVMsWhenIdle = undef;

They should speed things up by removing (or at least opening up) the VM 
revert bottleneck.


 testbot/bin/Engine.pl             |  12 +-
 testbot/bin/WineRunTask.pl        |   2 -
 testbot/ddl/update26.sql          |   4 +
 testbot/lib/WineTestBot/Config.pm |  17 +--
 testbot/lib/WineTestBot/Jobs.pm   | 262 +++++++++++++++++++++++++++-----------
 testbot/lib/WineTestBot/VMs.pm    |  16 ++-
 6 files changed, 221 insertions(+), 92 deletions(-)
 create mode 100644 testbot/ddl/update26.sql

diff --git a/testbot/bin/Engine.pl b/testbot/bin/Engine.pl
index 03575f6..a12f263 100755
--- a/testbot/bin/Engine.pl
+++ b/testbot/bin/Engine.pl
@@ -80,14 +80,14 @@ It has to contend with three main scenarios:
   process matching a task's ChildPid will belong to another user so we don't
   mistake that case for the previous one.
 - A shutdown of the Engine and its tasks / VMs. In this case it's used to
-  kill the running tasks and requeue them, and/or shut down the VMs.
+  kill the running tasks and requeue them, and/or power off the VMs.
 
 In all cases we only trust that a VM status field is still valid if:
 - It is 'running' and used by a task that is still running.
 - It is 'reverting' or 'sleeping', is powered on and the revert process is
   still running.
 - It is 'idle' and powered on.
-In all other cases the VM will be powered off and marked as dirty.
+In all other cases the VM will be powered off and marked as such.
 
 =back
 =cut
@@ -196,16 +196,16 @@ sub Cleanup(;$$)
         LogMsg "$VMKey is being reverted\n";
         next;
       }
-      LogMsg "Marking $VMKey as dirty and shutting it down\n";
+      LogMsg "Powering off $VMKey\n";
       $VM->PowerOff();
     }
     else
     {
-      next if ($VM->Status eq "dirty");
-      LogMsg "Marking $VMKey as dirty\n";
+      next if ($VM->Status eq "off");
+      LogMsg "Marking $VMKey as off\n";
     }
 
-    $VM->Status('dirty');
+    $VM->Status('off');
     $VM->ChildPid(undef);
     $VM->Save();
   }
diff --git a/testbot/bin/WineRunTask.pl b/testbot/bin/WineRunTask.pl
index a06c8e6..3caa673 100755
--- a/testbot/bin/WineRunTask.pl
+++ b/testbot/bin/WineRunTask.pl
@@ -81,7 +81,6 @@ sub FatalError($$$$$)
   $Job->UpdateStatus();
 
   my $VM = $Task->VM;
-  $VM->PowerOff() if ($VM->Role ne "base");
   $VM->Status('dirty');
   $VM->Save();
 
@@ -369,7 +368,6 @@ $Task->Ended(time);
 $Task->Save();
 $Job->UpdateStatus();
 
-$VM->PowerOff() if ($VM->Role ne "base");
 $VM->Status('dirty');
 $VM->Save();
 
diff --git a/testbot/ddl/update26.sql b/testbot/ddl/update26.sql
new file mode 100644
index 0000000..2c459b7
--- /dev/null
+++ b/testbot/ddl/update26.sql
@@ -0,0 +1,4 @@
+USE winetestbot;
+
+ALTER TABLE VMs
+  MODIFY Status ENUM('dirty', 'reverting', 'sleeping', 'idle', 'running', 'off', 'offline', 'maintenance') NOT NULL;
\ No newline at end of file
diff --git a/testbot/lib/WineTestBot/Config.pm b/testbot/lib/WineTestBot/Config.pm
index 6de8b9f..501a530 100644
--- a/testbot/lib/WineTestBot/Config.pm
+++ b/testbot/lib/WineTestBot/Config.pm
@@ -26,9 +26,9 @@ WineTestBot::Config - Site-independent configuration settings
 
 use vars qw (@ISA @EXPORT @EXPORT_OK $UseSSL $LogDir $DataDir $BinDir
              $DbDataSource $DbUsername $DbPassword $MaxRevertingVMs
-             $MaxRunningVMs $MaxNonBasePoweredOnVms $SleepAfterRevert
-             $WaitForToolsInVM $AdminEMail $RobotEMail $WinePatchToOverride
-             $WinePatchCc $SuiteTimeout $SingleTimeout
+             $MaxRevertsWhileRunningVMs $MaxActiveVMs $MaxVMsWhenIdle
+             $SleepAfterRevert $WaitForToolsInVM $AdminEMail $RobotEMail
+             $WinePatchToOverride $WinePatchCc $SuiteTimeout $SingleTimeout
              $BuildTimeout $ReconfigTimeout $OverheadTimeout $TagPrefix
              $ProjectName $PatchesMailingList $LDAPServer
              $LDAPBindDN $LDAPSearchBase $LDAPSearchFilter
@@ -38,9 +38,9 @@ use vars qw (@ISA @EXPORT @EXPORT_OK $UseSSL $LogDir $DataDir $BinDir
 require Exporter;
 @ISA = qw(Exporter);
 @EXPORT = qw($UseSSL $LogDir $DataDir $BinDir
-             $MaxRevertingVMs $MaxRunningVMs $MaxNonBasePoweredOnVms
-             $SleepAfterRevert $WaitForToolsInVM $AdminEMail $RobotEMail
-             $WinePatchToOverride $WinePatchCc $SuiteTimeout
+             $MaxRevertingVMs $MaxRevertsWhileRunningVMs $MaxActiveVMs
+             $MaxVMsWhenIdle $SleepAfterRevert $WaitForToolsInVM $AdminEMail
+             $RobotEMail $WinePatchToOverride $WinePatchCc $SuiteTimeout
              $SingleTimeout $BuildTimeout $ReconfigTimeout $OverheadTimeout
              $TagPrefix $ProjectName $PatchesMailingList
              $LDAPServer $LDAPBindDN $LDAPSearchBase $LDAPSearchFilter
@@ -61,8 +61,9 @@ $DataDir = "$::RootDir/var";
 $BinDir = "$::RootDir/bin";
 
 $MaxRevertingVMs = 1;
-$MaxRunningVMs = 2;
-$MaxNonBasePoweredOnVms = 2;
+$MaxRevertsWhileRunningVMs = 0;
+$MaxActiveVMs = 2;
+$MaxVMsWhenIdle = undef;
 $SleepAfterRevert = 30;
 $WaitForToolsInVM = 30;
 
diff --git a/testbot/lib/WineTestBot/Jobs.pm b/testbot/lib/WineTestBot/Jobs.pm
index 86bb0c3..e9a5119 100644
--- a/testbot/lib/WineTestBot/Jobs.pm
+++ b/testbot/lib/WineTestBot/Jobs.pm
@@ -370,6 +370,13 @@ sub CompareTaskStatus
   return $b->Status cmp $a->Status || $a->No <=> $b->No;
 }
 
+sub min(@)
+{
+  my $m = shift @_;
+  map { $m = $_ if ($_ < $m) } (@_);
+  return $m;
+}
+
 =pod
 =over 12
 
@@ -396,7 +403,8 @@ VM Statuses.
 
 =item *
 
-The number of VMs running on the host must be kept under $MaxRunningVMs. The
+The number of active VMs on the host must be kept under $MaxActiveVMs. This
+includes any VM using resources, including those that are being reverted. The
 rational behind this limit is that the host may not be able to run more VMs
 simultaneously, typically due to memory or CPU constraints. Also note that
 this limit must be respected even if there is more than one hypervisor running
@@ -404,20 +412,15 @@ on the host.
 
 =item *
 
-FIXME: The actual limit on the number of powered on VMs is blurred by the
-$MaxNonBasePoweredOnVms setting and the last loop in ScheduleOnHost().
-
-=item *
-
 The number of VMs being reverted on the host at a given time must be kept under
-$MaxRevertingVMs. This may be set to 1 in case the hypervisor gets confused
-when reverting too many VMs at once.
+$MaxRevertingVMs, or $MaxRevertsWhileRunningVMs if some VMs are currently
+running tests. This may be set to 1 in case the hypervisor gets confused when
+reverting too many VMs at once.
 
 =item *
 
-No Task is started while there are VMs that are being reverted. This is so that
-the tests are not disrupted by the disk or CPU activity caused by reverting a
-VM.
+Once there are no jobs to run anymore the scheduler can prepare up to
+$MaxVMsWhenIdle VMs (or $MaxActiveVMs if not set) for future jobs.
 
 =cut
 
@@ -430,11 +433,51 @@ sub ScheduleOnHost($$)
 
   my $HostVMs = CreateVMs();
   $HostVMs->FilterHypervisor($Hypervisors);
-  my ($RevertingVMs, $RunningVMs) = $HostVMs->CountRevertingRunningVMs();
-  my $PoweredOnNonBaseVMs = $HostVMs->CountPoweredOnNonBaseVMs();
-  my %DirtyVMsBlockingJobs;
 
-  my $DirtyIndex = 0;
+  # Count the VMs that are 'active', that is, that use resources on the host,
+  # and those that are reverting. Also build a prioritized list of those that
+  # are ready to run tests: the idle ones.
+  my ($RevertingCount, $RunningCount, $IdleCount) = (0, 0, 0);
+  my (%VMPriorities, %IdleVMs, @DirtyVMs);
+  foreach my $VM (@{$HostVMs->GetItems()})
+  {
+    my $VMKey = $VM->GetKey();
+    my $VMStatus = $VM->Status;
+    if ($VMStatus eq "reverting")
+    {
+      $RevertingCount++;
+    }
+    elsif ($VMStatus eq "running")
+    {
+      $RunningCount++;
+    }
+    else
+    {
+      my $Priority = $VM->Type eq "build" ? 10 :
+                     $VM->Role ne "base" ? 0 :
+                     $VM->Type eq "win32" ? 1 : 2;
+      $VMPriorities{$VMKey} = $Priority;
+
+      # Consider sleeping VMs to be 'almost idle'. We will check their real
+      # status before starting a job on them anyway. But if there is no such
+      # job, then they are expandable just like idle VMs.
+      if ($VMStatus eq "idle" || $VMStatus eq "sleeping")
+      {
+        $IdleCount++;
+        $IdleVMs{$VMKey} = 1;
+      }
+      elsif ($VMStatus eq "dirty")
+      {
+        push @DirtyVMs, $VMKey;
+      }
+    }
+  }
+
+  # It usually takes longer to revert a VM than to run a test. So readyness
+  # (idleness) trumps the Job priority and thus we start jobs on the idle VMs
+  # right away. Then we build a prioritized list of VMs to revert.
+  my (%VMsToRevert, @VMsNext);
+  my ($RevertNiceness, $SleepingCount) = (0, 0);
   foreach my $Job (@$SortedJobs)
   {
     my $Steps = $Job->Steps;
@@ -444,87 +487,164 @@ sub ScheduleOnHost($$)
     {
       my $Step = $SortedSteps[0];
       $Step->HandleStaging($Job->GetKey());
+      my $PrepareNextStep;
       my $Tasks = $Step->Tasks;
-      $Tasks->AddFilter("Status", ["queued", "running"]);
+      $Tasks->AddFilter("Status", ["queued"]);
       my @SortedTasks = sort CompareTaskStatus @{$Tasks->GetItems()};
       foreach my $Task (@SortedTasks)
       {
         my $VM = $Task->VM;
         my $VMKey = $VM->GetKey();
-        if ($Task->Status eq "queued" && $HostVMs->ItemExists($VMKey))
+        next if (!$HostVMs->ItemExists($VMKey) || exists $VMsToRevert{$VMKey});
+
+        my $VMStatus = $VM->Status;
+        if ($VMStatus eq "idle" &&
+            ($RevertingCount == 0 || $MaxRevertsWhileRunningVMs > 0))
+        {
+          $IdleVMs{$VMKey} = 0;
+          $IdleCount--;
+          $VM->Status("running");
+          my ($ErrProperty, $ErrMessage) = $VM->Save();
+          return $ErrMessage if (defined $ErrMessage);
+
+          $ErrMessage = $Task->Run($Job->Id, $Step->No);
+          return $ErrMessage if (defined $ErrMessage);
+
+          $Job->UpdateStatus();
+          $RunningCount++;
+          $PrepareNextStep = 1;
+        }
+        elsif ($VMStatus eq "sleeping" and $IdleVMs{$VMKey})
         {
-          if ($VM->Status eq "idle" &&
-              $RunningVMs < $MaxRunningVMs &&
-              $RevertingVMs == 0)
-          {
-            $VM->Status("running");
-            my ($ErrProperty, $ErrMessage) = $VM->Save();
-            if (defined($ErrMessage))
-            {
-              return $ErrMessage;
-            }
-            $ErrMessage = $Task->Run($Job->Id, $Step->No);
-            if (defined($ErrMessage))
-            {
-              return $ErrMessage;
-            }
-            $Job->UpdateStatus();
-            $RunningVMs++;
-          }
-          elsif ($VM->Status eq "dirty")
-          {
-            if (! defined($DirtyVMsBlockingJobs{$VMKey}) ||
-                $Job->Priority < $DirtyVMsBlockingJobs{$VMKey})
-            {
-              $DirtyVMsBlockingJobs{$VMKey} = $DirtyIndex;
-              $DirtyIndex++;
-            }
-          }
+          # It's not running jobs yet but soon will be
+          # so it's not a candidate for shutdown.
+          $IdleVMs{$VMKey} = 0;
+          $IdleCount--;
+          $SleepingCount++;
+          $PrepareNextStep = 1;
+        }
+        elsif ($VMStatus eq "off" || $VMStatus eq "dirty")
+        {
+          $RevertNiceness++;
+          $VMsToRevert{$VMKey} = $RevertNiceness;
+        }
+      }
+      if ($PrepareNextStep && @SortedSteps >= 2)
+      {
+        # Build a list of VMs we will need next
+        my $Step = $SortedSteps[1];
+        $Tasks = $Step->Tasks;
+        $Tasks->AddFilter("Status", ["queued"]);
+        @SortedTasks = sort CompareTaskStatus @{$Tasks->GetItems()};
+        foreach my $Task (@SortedTasks)
+        {
+          my $VM = $Task->VM;
+          my $VMKey = $VM->GetKey();
+          push @VMsNext, $VMKey;
+          # Not a candidate for shutdown
+          $IdleVMs{$VMKey} = 0;
         }
       }
     }
   }
 
-  if ($RunningVMs != 0)
+  # Figure out how many VMs we will actually be able to revert now and only
+  # keep the highest priority ones.
+  my @SortedVMsToRevert = sort { $VMsToRevert{$a} <=> $VMsToRevert{$b} } keys %VMsToRevert;
+  # Sleeping VMs will soon be running
+  my $MaxReverts = ($RunningCount > 0) ?
+                   $MaxRevertsWhileRunningVMs : $MaxRevertingVMs;
+  my $ActiveCount = $IdleCount + $SleepingCount + $RunningCount + $RevertingCount;
+  my $RevertableCount = min(scalar(@SortedVMsToRevert),
+                            $MaxReverts - $RevertingCount,
+                            $MaxActiveVMs - ($ActiveCount - $IdleCount));
+  if ($RevertableCount < @SortedVMsToRevert)
   {
-    # We don't revert VMs while jobs are running so we're done
-    return undef;
+    $RevertableCount = 0 if ($RevertableCount < 0);
+    for (my $i = $RevertableCount; $i < @SortedVMsToRevert; $i++)
+    {
+      my $VMKey = $SortedVMsToRevert[$i];
+      delete $VMsToRevert{$VMKey};
+    }
+    splice @SortedVMsToRevert, $RevertableCount;
   }
 
-  # Sort the VMs by decreasing order of priority of the jobs they block
-  my @DirtyVMsByIndex = sort { $DirtyVMsBlockingJobs{$a} <=> $DirtyVMsBlockingJobs{$b} } keys %DirtyVMsBlockingJobs;
-  my $VMKey;
-  foreach $VMKey (@DirtyVMsByIndex)
+  # Power off all the VMs that we won't be reverting now so they don't waste
+  # resources while waiting for their turn.
+  foreach my $VMKey (@DirtyVMs)
   {
-    last if ($RevertingVMs >= $MaxRevertingVMs);
-
-    my $VM = $HostVMs->GetItem($VMKey);
-    if ($VM->Role ne "base")
+    if (!exists $VMsToRevert{$VMKey})
     {
-      if ($PoweredOnNonBaseVMs < $MaxNonBasePoweredOnVms)
-      {
-        $VM->RunRevert();
-        $PoweredOnNonBaseVMs++;
-        $RevertingVMs++;
-      }
+      my $VM = $HostVMs->GetItem($VMKey);
+      my $ErrMessage = $VM->PowerOff();
+      return $ErrMessage if (defined $ErrMessage);
     }
-    else
+  }
+
+  # Power off some idle VMs we don't need immediately so we can revert more
+  # of the VMs we need now.
+  if ($IdleCount > 0 && @SortedVMsToRevert > 0 &&
+      $ActiveCount + @SortedVMsToRevert > $MaxActiveVMs)
+  {
+    # Sort from least important to most important
+    my @SortedIdleVMs = sort { $VMPriorities{$a} <=> $VMPriorities{$b} } keys %IdleVMs;
+    foreach my $VMKey (@SortedIdleVMs)
     {
-      $VM->RunRevert();
-      $RevertingVMs++;
+      my $VM = $HostVMs->GetItem($VMKey);
+      next if (!$IdleVMs{$VMKey});
+
+      my $ErrMessage = $VM->PowerOff();
+      return $ErrMessage if (defined $ErrMessage);
+      $IdleCount--;
+      $ActiveCount--;
+      last if ($ActiveCount + @SortedVMsToRevert <= $MaxActiveVMs);
     }
   }
 
-  # Again for the VMs that don't block any job
-  foreach $VMKey (@{$HostVMs->GetKeys()})
+  # Revert the VMs that are blocking jobs
+  foreach my $VMKey (@SortedVMsToRevert)
+  {
+    my $VM = $HostVMs->GetItem($VMKey);
+    delete $VMPriorities{$VMKey};
+    my $ErrMessage = $VM->RunRevert();
+    return $ErrMessage if (defined $ErrMessage);
+  }
+  $RevertingCount += @SortedVMsToRevert;
+  $ActiveCount += @SortedVMsToRevert;
+
+  # Prepare some VMs for the current jobs next step
+  foreach my $VMKey (@VMsNext)
   {
+    last if ($RevertingCount == $MaxReverts);
+    last if ($ActiveCount == $MaxActiveVMs);
+
     my $VM = $HostVMs->GetItem($VMKey);
-    if (! defined($DirtyVMsBlockingJobs{$VMKey}) &&
-        $RevertingVMs < $MaxRevertingVMs &&
-        $VM->Status eq 'dirty' && $VM->Role eq "base")
+    next if ($VM->Status ne "off");
+
+    my $ErrMessage = $VM->RunRevert();
+    return $ErrMessage if (defined $ErrMessage);
+    $RevertingCount++;
+    $ActiveCount++;
+  }
+
+  # Finally, if we are otherwise idle, prepare some VMs for future jobs
+  my $MaxIdleVMs = defined $MaxVMsWhenIdle ? $MaxVMsWhenIdle : $MaxActiveVMs;
+  if ($ActiveCount - $IdleCount == 0 && $ActiveCount < $MaxIdleVMs)
+  {
+    # Sort from most important to least important
+    my @SortedVMs = sort { $VMPriorities{$b} <=> $VMPriorities{$a} } keys %VMPriorities;
+    foreach my $VMKey (@SortedVMs)
     {
-      $VM->RunRevert();
-      $RevertingVMs++;
+      last if ($RevertingCount == $MaxReverts);
+      last if ($ActiveCount >= $MaxIdleVMs);
+
+      my $VM = $HostVMs->GetItem($VMKey);
+      next if ($VM->Status ne "off");
+
+      my $ErrMessage = $VM->RunRevert();
+      return $ErrMessage if (defined $ErrMessage);
+      $RevertingCount++;
+      $ActiveCount++;
     }
   }
 
diff --git a/testbot/lib/WineTestBot/VMs.pm b/testbot/lib/WineTestBot/VMs.pm
index 8069c90..5913f2c 100644
--- a/testbot/lib/WineTestBot/VMs.pm
+++ b/testbot/lib/WineTestBot/VMs.pm
@@ -179,8 +179,14 @@ The VM is running some task.
 
 =item dirty
 
-The VM has completed the task it was given and thus has been powered off. The
-next step will be to revert it to the idle snapshot so it can be used again.
+The VM has completed the task it was given and must now be reverted to a clean
+state before it can be used again. If it is not needed right away it may be
+powered off instead.
+
+=item off
+
+The VM is not currently needed and has been powered off to free resources for
+the other VMs.
 
 =item offline
 
@@ -272,7 +278,7 @@ sub UpdateStatus($$)
   if ($State == Sys::Virt::Domain::STATE_SHUTDOWN or
       $State == Sys::Virt::Domain::STATE_SHUTOFF)
   {
-    $self->Status("dirty");
+    $self->Status("off");
     $self->Save();
   }
 
@@ -583,7 +589,7 @@ BEGIN
     CreateBasicPropertyDescriptor("SortOrder", "Display order", !1, 1, "N", 3),
     CreateEnumPropertyDescriptor("Type", "Type of VM", !1, 1, ['win32', 'win64', 'build']),
     CreateEnumPropertyDescriptor("Role", "VM Role", !1, 1, ['extra', 'base', 'winetest', 'retired', 'deleted']),
-    CreateEnumPropertyDescriptor("Status", "Current status", !1, 1, ['dirty', 'reverting', 'sleeping', 'idle', 'running', 'offline', 'maintenance']),
+    CreateEnumPropertyDescriptor("Status", "Current status", !1, 1, ['dirty', 'reverting', 'sleeping', 'idle', 'running', 'off', 'offline', 'maintenance']),
     # Note: ChildPid is only valid when Status == 'reverting' or 'sleeping'.
     CreateBasicPropertyDescriptor("ChildPid", "Child process id", !1, !1, "N", 5),
     CreateBasicPropertyDescriptor("VirtURI", "LibVirt URI of the VM", !1, 1, "A", 64),
@@ -680,7 +686,7 @@ sub FilterEnabledStatus($)
 {
   my ($self) = @_;
   # Filter out the disabled VMs, that is the offline and maintenance ones
-  $self->AddFilter("Status", ["dirty", "reverting", "sleeping", "idle", "running"]);
+  $self->AddFilter("Status", ["dirty", "reverting", "sleeping", "idle", "running", "off"]);
 }
 
 sub FilterHypervisor
-- 
1.8.4.rc3



More information about the wine-patches mailing list