[Bug 31787] Run the tests in Wine
wine-bugs at winehq.org
Sat Mar 10 10:05:49 CST 2018
https://bugs.winehq.org/show_bug.cgi?id=31787
--- Comment #3 from François Gouget <fgouget at codeweavers.com> ---
Here is my current plan. It starts with a very basic implementation that falls
far short of what we want to do but gets the ball rolling and provides a
foundation on which we can build. The following starts with this basic
implementation, then expands on it, and takes another look at the roadblocks
on our way to the last features:
(base series)
b1. We need a new VM type so the TestBot knows which VMs to run the Wine
tests on (see VMs.pm, ddl/winetestbot.sql and ddl/update*.sql).
We could call it something like 'unix' or 'wine'.
b2. There are multiple types of Tasks and each one is handled by a
WineRunXxx.pl script. The script being used depends on the Step->Type
field so we will need new Step types (see Steps.pm and the same ddl
files).
b3. CheckForWinetestUpdate will need to be updated to create a job with a
Step of the right type, for instance unixreconfig, to update Wine on
the Unix machine(s) when there is a Git commit in Wine.
b4. To process the tasks in the new unixreconfig step type we will need
two new scripts: WineRunUnixReconfig.pl and build/UnixReconfig.pl.
Note that they should *not* run WineTest: WineRunUnixReconfig.pl will
have to update the VM snapshot so later compilations start from the
new Wine baseline. But running WineTest could ruin the VM's test
environment (crash the X server, etc) which we would not want in the
snapshot we'll use to run later tests.
Now that we have proper dependency support between Steps we could also
add the UnixReconfig Step as an extra Step in the usual "Wine Update"
job and just make sure it does not depend on the classic Reconfig step
so that a failure in one does not cause the other to be skipped. The
choice is a matter of taste.
b5. We will need a new script to run the tests in the Unix VMs. Maybe
call it WineRunUnixTask.pl. Unlike WineRunTask.pl which runs the tests
on Windows, WineRunUnixTask.pl will need to deal with both patches and
executables.
b6. Modify Patches::Submit() to create tasks for the unix/wine VMs.
Currently it only creates jobs for the patches that modify the tests.
If a patch series contains test and non-test parts it combines them
into one job, which suits our purpose just fine. So for a basic
implementation we could keep that as is.
Then Patches::Submit() creates one job per dll that it needs to run
tests on. So if a patch touches the tests of d3d8 and d3d9 that's 2
jobs. In fact this should be changed because it does not mesh well
with https://source.winehq.org/patches/ which expects precisely one
job per patch.
So assuming the above is fixed, if we get a patch that touches the
device and visual unit tests in d3d8 and d3d9 we will currently get a
job that looks something like this (here indentation represents the
dependencies between steps):
1 Build d3d8_test.exe and d3d8_test64.exe
    2 d3d8:device - 32 bit exe - 1 task per 32/64 bit Windows VM
    3 d3d8:device - 64 bit exe - 1 task per 64 bit Windows VM
    4 d3d8:visual - 32 bit exe - 1 task per 32/64 bit Windows VM
    5 d3d8:visual - 64 bit exe - 1 task per 64 bit Windows VM
6 Build d3d9_test.exe and d3d9_test64.exe
    7 d3d9:device - 32 bit exe - 1 task per 32/64 bit Windows VM
    8 d3d9:device - 64 bit exe - 1 task per 64 bit Windows VM
    9 d3d9:visual - 32 bit exe - 1 task per 32/64 bit Windows VM
    10 d3d9:visual - 64 bit exe - 1 task per 64 bit Windows VM
The simplest approach would be to add the unix/wine tests as a single
extra step that does the build and runs all the test units.
1 Build d3d8_test.exe and d3d8_test64.exe
    2 d3d8:device - 32 bit exe - 1 task per 32/64 bit Windows VM
    3 d3d8:device - 64 bit exe - 1 task per 64 bit Windows VM
    4 d3d8:visual - 32 bit exe - 1 task per 32/64 bit Windows VM
    5 d3d8:visual - 64 bit exe - 1 task per 64 bit Windows VM
6 Build d3d9_test.exe and d3d9_test64.exe
    7 d3d9:device - 32 bit exe - 1 task per 32/64 bit Windows VM
    8 d3d9:device - 64 bit exe - 1 task per 64 bit Windows VM
    9 d3d9:visual - 32 bit exe - 1 task per 32/64 bit Windows VM
    10 d3d9:visual - 64 bit exe - 1 task per 64 bit Windows VM
11 All test units - all bitness - 1 task per Unix VM
    `-> run d3d8:device d3d8:visual d3d9:device d3d9:visual
For the unix step the TestBot would either send the patch or test
executable and provide the list of test units to run. Of course that
means only testing the patches that modify the tests. Also you'll
notice there's no mention of the 32 bit vs WoW wineprefix distinction.
We'd just run all the relevant tests, 32 bit, 32 bit WoW, and 64 bit
WoW in the same task.
As I said it's the simplest approach but probably not what we want.
I'll discuss alternatives below.
b7. In addition to the WineRunUnixReconfig.pl changes,
CheckForWinetestUpdate needs to be updated to create
WineRunUnixTask.pl tasks to run the official WineTest executables
just like we currently do on Windows. These could get tacked on the
existing jobs or go into a separate job like the 'Other VMs' job.
b8. Last but not least, create one or more Unix VMs to run the tests on,
with all the development packages and the right window manager and
settings.
Note that this would be separate from the standard build VM in part
because both would need different Type fields. But also the current
build VM that generates the Windows executables for the Windows tests
uses MinGW and does not need any of the native Unix development
libraries. This means it rarely breaks or needs updates when the Wine
build dependencies change, unlike the new unix test VMs which will
more likely need regular updates.
b9. We normally give 2 minutes for the Windows tasks to run. However they
only run a single test unit each whereas the unix tests will need to
run many test units so that 2 minutes will be too short.
- The 2 minutes is in part to make sure the tests don't take too long
to run (that limit matches the WineTest.exe limit), and in part to
make sure nasty patches sent to wine-devel don't get to use our
infrastructure for too long. So there is some value in keeping it as
short as possible.
- For the full test suite we currently have a 30 minute timeout. So
in theory we could simply add 2 minutes per test unit until we reach the
30 minute limit. Past 3 test units that seems overly generous
though. So we could do something a bit more sophisticated like 2
minutes for each of the first 3 test units and 30 seconds for each one
after that, up to the time limit.
- The exact algorithm probably does not matter much as long as we
don't get spurious timeouts. It should also be easy to adjust
independently of the rest of the code.
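As an illustration of the b9 timeout rule, here is a minimal Python sketch (Python is only used for brevity, the TestBot scripts are Perl). It reads the proposal as 2 minutes for each of the first 3 test units, then 30 seconds per extra unit, capped at the 30 minute suite timeout. The function and parameter names are hypothetical and are not existing TestBot settings.

```python
def unix_task_timeout(unit_count, first_units=3, slow_step=120,
                      fast_step=30, suite_timeout=30 * 60):
    """Return a task timeout in seconds for the given number of test units.

    Assumed reading of the proposal: slow_step seconds for each of the
    first `first_units` units, fast_step seconds for each later unit,
    never exceeding suite_timeout.
    """
    slow = min(unit_count, first_units) * slow_step
    fast = max(unit_count - first_units, 0) * fast_step
    return min(slow + fast, suite_timeout)


if __name__ == "__main__":
    for count in (1, 3, 4, 21, 500):
        print(count, "units ->", unix_task_timeout(count), "seconds")
```

Keeping the rule in one small function with its constants as parameters is what would make it easy to adjust independently of the rest of the code. Note that this sketch does not add any time for the build itself.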
Now the above is quite limited and not really what we want so let's
see what's missing and what the impact of adding it has.
First one of the things we want is to check that the test patches sent to
wine-devel compile. This can be built on top of the above.
(compilation series)
c1. The first change is to modify the code that creates the jobs,
Patches::Submit(), so it does not ignore patches that don't touch the
tests.
c2. When a patch touches a test it can work exactly like above, but
in addition it should create jobs for the other patches. These new
jobs would only contain a single unix step of the same form as the
tests above but with an empty list of test units to run.
c3. When given an empty list of test units to run the UnixTask.pl
script should still perform the build but skip running the tests.
c4. The WineRunUnixTask.pl script should not complain if there is only a
build log and no test log. A missing log file may be hard to
distinguish from a bug so this may require putting some special code
like "Test log intentionally left blank" in the log so the script
knows everything is fine.
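The marker idea in c4 could look something like the following Python sketch. The marker string comes from the text above; the function name, the returned labels and the idea of passing the log as a string (None when the file is missing) are assumptions made for the example, not the existing WineRunXxx.pl behavior.

```python
SENTINEL = "Test log intentionally left blank"


def classify_test_log(test_log, requested_units):
    """Classify the test log of a unix task.

    test_log is the log text, or None if no log file was retrieved.
    requested_units is the list of test units the task was asked to run.
    """
    if test_log is None:
        # Nothing came back at all: most likely a bug or a crash.
        return "missing"
    if SENTINEL in test_log:
        # The marker is only legitimate for build-only tasks.
        return "build-only" if not requested_units else "unexpected"
    # No marker: only fine if tests were actually requested.
    return "results" if requested_units else "unexpected"
```

With this, a build-only task that writes the marker is reported as fine, while a log that is absent or empty without the marker is still flagged.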
But what we really want is to also test non-test Wine patches sent to
wine-devel so we can make sure they don't break the tests. Here's a
starting point for that:
(dll tests series)
d1. Whenever a patch touches a dll, rerun all the tests in that dll.
This is a simple extension of the compilation series where instead of
passing an empty test unit list for non-test patches we either pass a
list of all that dll's test units (like we currently do for the non-C
patches), or an empty list if there is no test for that dll.
This will make checking some patches a bit slow, particularly for some
dlls that have a lot of test units like mshtml for instance. But the
TestBot should be able to handle the extra load.
d2. For patches that don't modify a specific dll (or program) we'd just
pass an empty list.
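To make d1 and d2 concrete, here is a rough Python sketch of how the test unit list could be derived from the paths a non-test patch touches. It assumes the paths look like dlls/<name>/... or programs/<name>/..., and that a table of test units per dll or program is available; the function name and the table are hypothetical.

```python
def units_for_patch(paths, tests_by_module):
    """Return the 'module:unit' names to rerun for a non-test patch.

    paths: file paths touched by the patch, relative to the Wine tree.
    tests_by_module: assumed table mapping a dll or program name to the
    list of its test units (no entry if it has no tests).
    """
    units = []
    for path in paths:
        parts = path.split("/")
        if len(parts) >= 3 and parts[0] in ("dlls", "programs"):
            module = parts[1]
            for unit in tests_by_module.get(module, []):
                name = f"{module}:{unit}"
                if name not in units:
                    units.append(name)
    # Patches outside dlls/ and programs/ yield an empty list (d2),
    # which means "build only" as in the compilation series.
    return units
```

For example a patch touching dlls/d3d8/device.c would get every d3d8 test unit, while a patch touching only configure.ac or a dll without tests would get an empty list.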
However a patch that modifies a dll such as ntdll.dll for instance could
essentially break any test. So what we really, really want is to run a
much more complete range of tests for these patches.
(all tests series)
a1. The simplest approach would be to tweak the above code to
systematically rerun every test (or WineTest) for every patch. As
before this would include the 32 bit, 32 bit WoW and 64 bit WoW sets of
tests.
a2. In theory this task's timeout should be 3 * ($ReconfigTimeout +
$SuiteTimeout). With the default values that would be 3 hours which
seems way too large.
a3. A Wine rebuild takes 3.5 minutes on average and running the full
WineTest suite takes about 19 minutes (see the TestBot statistics).
Even going by these averages, a job would take at least 1 hour
(3 * 22.5 minutes). This has to be compared to the rate at which jobs
arrive which, excluding every non-test patch, currently stands at about
1.3 jobs per hour.
So this is very unlikely to be sustainable.
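The arithmetic behind a3 can be spelled out as follows. This Python snippet only uses the figures quoted above (3.5 minutes per rebuild, 19 minutes per WineTest run, 1.3 jobs per hour) and assumes, as in the a2 formula, that a job does three build + test rounds on a single VM. The function names are made up for the example.

```python
def job_minutes(rounds, rebuild_minutes, suite_minutes):
    """Average duration of a job doing `rounds` build + test rounds."""
    return rounds * (rebuild_minutes + suite_minutes)


def jobs_per_hour(minutes_per_job):
    """How many such jobs a single VM can process per hour."""
    return 60 / minutes_per_job


if __name__ == "__main__":
    minutes = job_minutes(3, 3.5, 19)   # 67.5 minutes per job
    capacity = jobs_per_hour(minutes)   # a bit less than 0.9 job per hour
    arrivals = 1.3                      # jobs per hour, test patches only
    print(f"{minutes} min/job, {capacity:.2f} jobs/hour "
          f"vs {arrivals} arriving")
```

So a single Unix VM would already fall behind on the test patches alone, before counting the non-test patches this series adds.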
How to make it possible is not entirely clear and below I'll investigate
various options.
(options)
o1. The simplest option for reducing the load would be to reduce the
number of bitnesses we test. For instance we could drop the 32 bit
tests in favor of the 32 bit WoW ones. Or, in a more radical move,
drop everything but the basic 32 bit tests. This means we would miss
some failures but if that's rare enough it may be acceptable. This
would still leave us with tasks that take about 22 minutes so this may
not be sufficient either.
o2. The approach that BuildBot takes is to test multiple patches together.
If multiple patches arrive in a 5 minute window (or before the
previous testing round is finished) it can put them all together and
only start one new test round.
- We could conceivably do something like this for the TestBot but it
would require pretty big changes to the way it operates. For
instance the 'patches' website assumes there is one job per patch.
But here this would not be the case.
- When a job fails we would not know which patch caused the failure.
We could solve that by doing a 'bisect' of sorts but then this
compounds the complexity so it's probably not the right approach.
- It also would not work for manually submitted jobs.
- And then there's the question of how it would mesh with the
existing Windows tests: bunch them up too? Keep them separate and
end up with multiple jobs per patch?
- So overall this really does not seem like an approach we could use
in the TestBot.
o3. Another option would be to split the work so it can be parallelized:
have one step that only does the 32 bit tests on one of the VMs (1
build + 1 WineTest run so 22 minutes expected run time), and another
step that does the 32 and 64 bit WoW tests in parallel on another VM
(running on another host). But that second VM would still have an
expected run time of about 45 minutes so that would likely be
insufficient.
o4. We could instead separate the build step from the test one(s). This
would allow us to run the 32 bit, 32 bit WoW and 64 bit WoW tests in
parallel in separate VMs, meaning we should be able to handle about 3
jobs per hour.
- This option can be tempting if we have multiple test environments.
For instance it could allow us to build once and then run the tests
in GNOME, KDE and LXDE environments (although none of these have
window managers that are up to the task so maybe a better example
would be to run the same binaries in English, French and Hebrew
locales).
- Also while this would work fine for running the tests multiple times
on the same Linux distribution, we would probably not want to build
on Debian and then run the tests on Red Hat. So this may be of
limited usefulness.
- Furthermore if we have really different types of Unix systems such
as Linux and FreeBSD (or Mac assuming it can fit in this framework
at all), we would need a way to make sure that a binary built on
Linux is then not sent to a FreeBSD test machine. One approach could
be to create more VM types (have linux and freebsd instead of just
unix, and handle both with the same scripts), but that seems to lead
to an explosion in the number of VM types. Especially if we then go
with Debian, Red Hat, Arch, etc. It's certainly possible to find a
solution though (VM subtypes?).
- Another drawback is that this requires transferring the Wine and
test binaries from the build VM to the TestBot server and then to
each of the test VMs. This would be about 20MB compressed per
bitness for every job. This adds to the time spent running each
Task, to the WineHQ.org bandwidth consumption and the TestBot disk
usage.
- An optimisation would be to let the 'Wine update' job catch up all
the test VMs to the latest Wine binaries to establish a new
baseline, and to then only send the binary changes.
- The simplest form of diff would be to only send the modified files
(based on the modified timestamp), but there are probably a lot of
binary diff tools we could also use.
- The main issue with all these 'diff' approaches is keeping the
baseline of the build VM and the test VMs in sync. We run the risk
of having a sequence such as:
1 Generate a binary diff for job 1.
2 Update the build VM.
3 Synchronize the test VM with the new binary baseline.
4 Try to apply the diff generated in 1 to the test VM's new
binaries.
o5. Instead we could replicate the Unix VM(s). So instead of having the
host running the single Unix VM be the bottleneck, we could throw more
hardware at the issue to increase our job processing throughput.
- This assumes that the test VM really behaves exactly the same way no
matter which host it is on otherwise the results could be pretty
confusing. That should already be the case but our hosts are not
entirely identical (3 Intel processors, 1 AMD but most VMs use a
neutral 'kvm32' processor) and this has never really been thoroughly
verified.
- Note that although the job throughput would be increased, the job
latency would still remain at the usual
- See bug 39412 for (upcoming) details on how the TestBot could
implement load balancing between the hosts.
https://bugs.winehq.org/show_bug.cgi?id=39412
- The further benefit is that failover would likely come for free.
This means that with enough VM duplication, when one host freezes
the tasks would automatically be handled by the other hosts which
would mean less (or no) downtime.
o6. A more sophisticated approach would be to analyze the dlls the tests
depend on and only rerun those that can be impacted.
- For instance it looks like modifying the ws2_32 source would only
impact the secur32, webservices, winhttp, ws2_32 and wsdapi tests,
which means running 21 test units instead of the 500+ of the full
suite. Also changing the source in a dll with no test and not used
anywhere else (e.g. hal.dll) means we could skip running the tests
entirely.
- This analysis could be done on the VM right after rebuilding Wine:
see which binaries changed, then look for them in the Makefile
IMPORTS lines. But this would miss tests and dlls that do a
LoadLibrary(). So a more sophisticated analysis may be called for,
or we may need to provide the extra dependency information manually
somehow (either in the Wine source or in some TestBot table).
- This would mean the unix tasks would determine which test units to
run on their own rather than having the TestBot tell them as
proposed in the dll tests series. But the change should be minor.
- It's hard to predict how much this would reduce processing time and
thus whether it would be sufficient. This really depends on the
ratio of patches to low-level headers and dlls versus patches to
high-level dlls that have no tests. So the only way to see if that
works is probably to try it out.
- If not sufficient on its own it can be combined with other options,
particularly the load balancing one (o5). It may also be easier to
implement than o5, depending on how hard the dependency analysis
turns out to be.
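To give an idea of what the IMPORTS-based analysis in o6 could look like, here is a small Python sketch. It assumes each Makefile.in lists its imports on single-line "IMPORTS = ..." lines, and it propagates the impact through dlls that import a changed dll, which is my own assumption about how far the analysis should go. The function names are hypothetical, and as noted above this approach misses anything loaded with LoadLibrary().

```python
import re

# Matches single-line "IMPORTS = a b c" assignments only; backslash
# continuations and other variables are deliberately ignored here.
IMPORTS_RE = re.compile(r"^[ \t]*IMPORTS[ \t]*=[ \t]*(.*)$", re.MULTILINE)


def parse_imports(makefile_text):
    """Collect the names listed on IMPORTS lines of a Makefile.in."""
    names = set()
    for match in IMPORTS_RE.finditer(makefile_text):
        names.update(match.group(1).split())
    return names


def impacted_tests(changed, dll_makefiles, test_makefiles):
    """Return the sorted dll names whose tests should be rerun.

    changed: names of the dlls whose binaries changed.
    dll_makefiles / test_makefiles: Makefile.in text keyed by dll name,
    for the dll itself and for its tests directory respectively.
    """
    imports = {name: parse_imports(text)
               for name, text in dll_makefiles.items()}
    impacted = set(changed)
    grew = True
    while grew:  # propagate until no new dll gets pulled in
        grew = False
        for name, deps in imports.items():
            if name not in impacted and deps & impacted:
                impacted.add(name)
                grew = True
    return sorted(name for name, text in test_makefiles.items()
                  if name in impacted or parse_imports(text) & impacted)
```

Fed with the real makefiles, a change to ws2_32 would select the tests of ws2_32 and of every dll that imports it directly or indirectly, and a change to a dll that nobody imports and that has no tests would select nothing.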
Given the above I think it makes sense to start implementing things
progressively from the base series to the compilation one, to the dll one,
to the all tests series. Each step will be able to build on the previous one and
provide us with new information about the extra TestBot load, how many
jobs we really get per hour, and also whether the test results are
reliable, etc.
Should we approach the TestBot limits at any point we'll be able to stop
expanding the tests and still have more than what we currently have while
we figure out how to proceed further. I also think that no matter what
happens and where we stop, the work done will be reused when proceeding
further.
Further bells and whistles:
(whistles)
w1. The current tasks only do one thing and produce a single log. For
some approaches a single task may do a build, run 32 bit tests, then 64
bit tests. Having all this go into a single log may be confusing when
looking at it on the web site. So it could make sense to create a
separate log for each 'subtask'.
- This would require modifying the WineRunXxx.pl and associated build
scripts of course, as well as the JobDetails.pl web page so it
can show each log; but also the code canceling and restarting Tasks
so they clean the old logs correctly.
- The JobDetails.pl page could have "Show full 32 bit log" and "Show
full 64 bit log" links for instance.
- Note also that we already support showing the results of multiple
test units from a single log for WineTest. So that aspect should not
require any change.
w2. The job submission page lets developers tick a checkbox to run
the 64 bit Windows tests in addition to the 32 bit ones. If we do have
3 types of tests on Unix we may want to expand that. This may also
further require analyzing whether the user picked a Unix VM to
present the right option.
w3. The current Windows tests reset the test environment entirely
between test units. The Unix tests would not. It's not clear that we
should. After all WineTest needs to be able to run all the tests with
no cleanup between them. Still it may make sense to do a cleanup
between bitnesses, at least resetting the WINEPREFIX. But doing a
more thorough cleanup would involve at least restarting the X server
between tests. That could be bothersome.