TestBot News

Zebediah Figura (she/her) zfigura at codeweavers.com
Sat May 7 13:52:57 CDT 2022


On 5/7/22 11:44, Francois Gouget wrote:
> On Thu, 5 May 2022, Zebediah Figura (she/her) wrote:
> [...]
>> Off the top of my head, tests I can think of that inherently can't be
>> parallelized:
> [...]
>> * Tests which change display mode (some ddraw, d3d8, d3d9, dxgi,
>> user32:sysparams). In many cases these are put into test units with other d3d
>> tests which *are* parallelizable, but they could be split out.
> 
> I would add user32:monitor.

Yeah, sorry, that was an approximate list. I should have said that this 
list is incomplete...

> 
> 
> [...]
>> * d3d tests in general are an odd case. We can't parallelize them if we might
>> run out of GPU memory, although that hasn't been a concern yet and it won't be
>> for llvmpipe.
> 
> Do we really use that much GPU memory?

I think not. It occurred to me because we *have* run out of virtual 
address space, but address space is a much scarcer resource than GPU memory.

In concrete terms, with the way things currently are, tests *shouldn't* 
use more than 128 MiB (per thread), so I don't think it's worth worrying 
about.

> 
> 
>> We also can't parallelize them on nouveau because of its threading
>> problems. There are also a few tests that *shouldn't* break other
>> tests but do because of driver bugs.
> 
> The resolution change tests always leave my monitor in a weird
> resolution like 320x200 when it's not 200x320 (portrait mode). It's
> always fun in the morning to find a terminal to issue an xrandr -s 0.
> But I suspect the first WineTest (win32) run may break the next WineTest
> run (wow64) whenever a test tries to open a window that does not fit in
> that weird desktop resolution. I suspect comctl32:combo, header, rebar,
> status and toolbar are among the impacted tests. (so I'm now trying to
> inject an xrandr in between runs)
> 
> All that to say that if the resolution change tests run in parallel or
> at a somewhat random time relative to the other tests that may bring
> more variability and unexpected failures to the results.

Yeah, to be clear, I think it's a good idea to run all of the resolution 
changing tests separately, and not put effort into parallelizing to the 
absolute limit.

There are a lot of cases here where I'm thinking about an "ideal" final 
state (e.g. there's no reason why shlwapi:url can't run at the same time 
as user32:monitor), but that's just brainstorming...

> 
> 
>> * Tests which care about the foreground window. In practice this includes some
>> user32, d3d, dinput tests, probably others. Often it's only a couple of tests
>> functions out of the whole test. (I wonder if we could improve things by
>> creating custom window stations or desktops in many cases?)
>>
>> * Tests which warp the cursor or depend on cursor position. This ends up being
>> about the same set.
> 
> I may be wrong but I suspect this should include most of comctl32,
> comdlg32, user32:edit, and probably others.

I'm not too familiar with the controls tests, but that sounds plausible.

> 
> 
>> A quick skim doesn't bring up any other clear cases of tests that can't be
>> parallelized. There are probably still a lot that need auditing and perhaps
>> extra work to ensure that they can be parallelized, but I think that's work
>> worth doing.
> 
> There's also all the timing issues in sound, locking (timeout aspects)
> and timer tests.

Hmm, I guess that means we should let anything that calls 
timeBeginPeriod() run by itself. In practice that only seems to be 
winmm:timer and mmdevapi:spatialaudio? Maybe you're referring to 
something else I'm missing?

But it's not obvious to me that e.g. quartz:dsoundrender can't run in 
parallel with dsound:*. As far as I understand there's no "exclusive 
access" problems, and they shouldn't mess with each other's timers? I 
don't claim to be that familiar with low-level audio though, so maybe 
there's something I'm missing.

> 
> 
>> There are a decent number of tests that hardcode temporary file paths but
>> could be made to use GetTempFileName() instead. Actually most such tests
>> already use GetTempFileName(), I guess in order to be robust.
> 
> Eh, funny you should say that. I just found out that kernelbase:process
> and lz32:lzexpand_main forgot to do that (bug 52970). But yes, easily
> fixable.
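
As an aside, the robust pattern is trivial. Here is a sketch in Python 
for illustration (the tests themselves would call the Win32 
GetTempFileName() instead):

```python
import os
import tempfile

# Instead of hardcoding a path like C:\mytest.tmp, ask the OS for a
# unique name, so two test runs (or two parallel tests) can never
# collide on the same file.
fd, path = tempfile.mkstemp(prefix="tst", suffix=".tmp")
try:
    os.write(fd, b"test data")
finally:
    os.close(fd)
    os.remove(path)  # clean up, as a well-behaved test should
```
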
> 
> 
> But overall I'm more skeptical about the feasibility of parallelization.
> For instance the w10pro6v2004 and w10pro64 test configurations had a
> background Windows Update causing failures in msi:msi and msi:package.
> So far quite understandable. But that also caused reproducible failures
> in ieframe:webbrowser, kernel32:resource, shell32:shlfileop, urlmon:url,
> wininet:http and wininet:urlcache (bug 52560). That's kind of wide
> ranging and unexpected.
> 
> Maybe it could work by only letting one test run in each of very
> very broad categories (maybe that's similar to your CS idea):
>    * screen : anything that opens a window or modifies the screen
>          d3d*, user32*, gdi32*, etc.
>    * sound : anything that plays or captures sound
>          dsound, winmm, mmdevapi, etc.
>    * timing : anything sensitive to timing
>          dsound, winmm, mmdevapi, kernel32:sync, etc.
>    * install
>          msi*, ntoskrnl*, more?
>    * others : anything not in any of the above categories
> 
> But even such a scheme would probably allow msi:msi to run in parallel
> with urlmon:url and bug 52560 seems to indicate that would not be a good
> idea.

Hmm. Of the eight tests mentioned there, two are related to installers, 
and four are related to... hitting the internet data cap? Which doesn't 
sound related to msi per se. (Not sure what's up with kernel32:resource 
and shell32:shlfileop.)

Ultimately, though, we *are* going to run into spurious and non-obvious 
failures when we run apparently unrelated tests in parallel. That's 
inevitable. I'm optimistic that it won't be that many, although I have 
nothing to base that optimism on. But if it ends up being bad I think we 
can give up on it.
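
For what it's worth, the broad-category scheme you describe is easy 
enough to sketch. Here is a toy Python version, with made-up category 
assignments, just to show that the categories map naturally onto 
per-category locks:

```python
import threading
from collections import defaultdict

# Hypothetical category assignments, following the quoted scheme;
# a real table would have to cover every test unit.
CATEGORIES = {
    "user32:monitor": {"screen"},
    "dsound:dsound8": {"sound", "timing"},
    "msi:msi": {"install"},
    "shlwapi:url": set(),  # "others": needs no exclusive resource
}

_locks = defaultdict(threading.Lock)

def run_test(unit, body):
    # Take the lock for every category the unit belongs to, in sorted
    # order so two multi-category tests can never deadlock.
    held = [_locks[cat] for cat in sorted(CATEGORIES.get(unit, set()))]
    for lock in held:
        lock.acquire()
    try:
        return body()
    finally:
        for lock in reversed(held):
            lock.release()
```

A real scheduler would also need a worker pool and the full category 
table, but the locking core wouldn't get much bigger than this.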

> 
> Also I'm not sure we'd have much parallelism left with such a scheme
> (i.e. too much complexity for too little gain?).

By number of test units, I think there's a lot that can be parallelized. 
That might not translate to time, though. Unfortunately the 
longest-running tests fall mostly in the exceptions listed above (msi, 
d3d, user32...), so maybe it won't help much.

> 
> But also maybe the only way to know is to try.
> 

Indeed :-)

I won't promise that I'll have time to put together a proof of concept, 
since I can't find the time to do much of anything, but I'll at least 
try to make time...
