Hey all,
I thought I would start my annual harangue a little early this year. I'd like to summarize the lens through which I see things, and then see if there is anything constructive we can do now, and again when we meet in person. We'll also have drinks to turn to if we end up with nothing but despair.
The ideal is that this page: http://test.winehq.org/data/ be covered entirely in green. That would indicate that our unit tests ran successfully on all tested systems. A further ideal is that it would have 'Mac' and 'Android' columns that were also green. The holy dream we all crave is that a 'make test' would work in a rational fashion. And the ultimate fantasy is that every patch sent would be tested not only on Windows if the tests change, but on Linux, Mac, and Android as well. (Alexandre does that by hand now, but it'd be nice to automate that test).
A more reasonable ideal is that all of our tests would run successfully on a well curated list of 'rigorous' test machines. I don't believe that we have an official page for that; I maintain an unofficial one here: https://www.winehq.org/~jwhite/latest.html That is all of the 'newtb' Windows VMs excluding Windows 2000, Windows 8, and Windows 10.
That list of failures has fluctuated, starting at about 40 and once getting down to as few as a dozen. It now stands at about 20, where it's been for a while. Nicely (?), we're down to only intermittent failures :-/.
I think the instinct has been to fix all of the Windows tests; that once those are consistently green, then it would make sense to go after a well defined Linux rig and push it to green.
CodeWeavers has a rack of hardware and we're happy to put any flavor of system in there (and have done so, with standard rigs for testing AMD and NVIDIA Linux boxes). A quick scan suggests that those rigs stand at about the same number of failures: low to mid teens.
I think we are lacking several things:
1. Some help for Francois. He's basically doing all of this on his own. We could use some people willing to fight through Perl to help extend our capabilities.
2. The ability (will?) to drive the Windows tests to green. Is it time to articulate a kind of test that is expected to periodically fail? In other words, do we have tests that 'reasonably' fail, and so we should redefine that failure?
Many years in the past, we had constructive sessions where I forcibly prevented people from going to the pub until we flipped a whole lot more bits to green. I can remember times when that worked quite nicely. But we haven't been as productive with that lately. Is it worth dedicating another block of time to it at the Wine conference? Would it work better if we prepared ahead of time?
For example, let me suggest this: that developers start right now running winetest on machines they plan to bring to Wineconf. Then we can have that body of information available to us at the conference, and we can choose to attack issues that show commonality.
Does that make sense?
Any other prep work we should do? (Jacek and Piotr assure me that there will be plenty of beer for drowning sorrows, so that seems to be covered).
Cheers,
Jeremy
On Fri, 13 Oct 2017, Jeremy White wrote: [...]
> A more reasonable ideal is that all of our tests would run successfully on a well curated list of 'rigorous' test machines. I don't believe that we have an official page for that; I maintain an unofficial one here: https://www.winehq.org/~jwhite/latest.html That is all of the 'newtb' Windows VMs excluding Windows 2000, Windows 8, and Windows 10.
I don't think it makes sense to exclude Windows 8 and 10 anymore. It's not like the early days where they had over 50 failures. Nowadays they barely have more failures than Windows 7.
Interestingly, most of the Windows 8 failures are on the real hardware machines: cw1-hd6800 and cw2-gtx560. The TestBot machines only have 3, 2 and 1 respectively.
Windows 10 is a bit higher with 11 and 7 failures, though from what I gather we can ignore the DirectX ones.
On 10/25/2017 06:57 PM, Francois Gouget wrote:
> On Fri, 13 Oct 2017, Jeremy White wrote: [...]
>> A more reasonable ideal is that all of our tests would run successfully on a well curated list of 'rigorous' test machines. I don't believe that we have an official page for that; I maintain an unofficial one here: https://www.winehq.org/~jwhite/latest.html That is all of the 'newtb' Windows VMs excluding Windows 2000, Windows 8, and Windows 10.
> I don't think it makes sense to exclude Windows 8 and 10 anymore. It's not like the early days where they had over 50 failures. Nowadays they barely have more failures than Windows 7.
I have updated my script in several ways:
1. I've included Windows 8 and Windows 10
2. I've added Linux results, although I include only one system (the 32 bit radeon) for now
3. I've made a distinction between 'usually fails' and 'sometimes fails'. We have a lot of sporadic failures where we accept that it's a transient network condition or acceptable race condition. This hopefully screens out the more important failures.
The result is that we now have 12 clearly wrong tests on Windows and 12 clearly wrong tests for Linux. The new Windows failures are mostly ddraw related, but the Linux failures are all over the map and might be easy pickings.
In an ideal world, we would drive all of the 'consistently failing' and 'usually failing' columns to zero.
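To make the 'consistently' / 'usually' / 'sometimes' distinction concrete, here is a minimal Python sketch of how a test could be bucketed by how often it failed across recent runs. The thresholds, function names, and sample data are all hypothetical; the thread does not say which cutoffs the actual script uses or how it is written.

```python
# Hypothetical cutoffs; the real script's thresholds are not stated in this thread.
ALWAYS_THRESHOLD = 1.0
USUALLY_THRESHOLD = 0.5


def classify(failed_runs, total_runs):
    """Bucket one test by the fraction of recent runs in which it failed."""
    if total_runs <= 0:
        raise ValueError("total_runs must be positive")
    rate = failed_runs / total_runs
    if rate >= ALWAYS_THRESHOLD:
        return "consistently fails"
    if rate >= USUALLY_THRESHOLD:
        return "usually fails"
    if rate > 0:
        return "sometimes fails"
    return "passes"


def summarize(history):
    """history maps a test name to a list of booleans (True = failed in that run)."""
    return {name: classify(sum(runs), len(runs)) for name, runs in history.items()}


if __name__ == "__main__":
    # Made-up test names and run histories, purely for illustration.
    history = {
        "example_a:always": [True, True, True, True],
        "example_b:mostly": [True, False, True, True],
        "example_c:flaky": [False, False, True, False],
        "example_d:clean": [False, False, False, False],
    }
    for name, bucket in summarize(history).items():
        print(f"{name}: {bucket}")
```

With cutoffs like these, only the first two buckets would count toward the columns we want to drive to zero, and the sporadic ones stay visible without drowning them out.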
Cheers,
Jeremy
On 11/04/2017 09:35 AM, Jeremy White wrote:
> The result is that we now have 12 clearly wrong tests on Windows and 12 clearly wrong tests for Linux. The new Windows failures are mostly ddraw related, but the Linux failures are all over the map and might be easy pickings.
Summarizing info for that page's benefit:
- d3dcompiler_43:asm is probably showing up as failing because one of the todo messages includes the words "test failed".
- kernel32:console is bug 43952. Alex Henrie has submitted 138219.
- ntdll:info is the PagefileUsage functionality (bug 5657) which requires a newer version of the kernel (I don't remember which specifically), so I think the plan was just to update this?
- secur32:ntlm is bug 42485, which seems to have been fixed in a newer version of winbind. Can we get this updated on those machines?
- user32:msg is a few different failures:
  - Lines 16466-78 is bug 42568 and I've submitted 138212 for it.
  - Line 5090 I think was deemed a bug in fvwm2.
  - I don't know about line 6314. I can't reproduce it on my machine.
- dinput:mouse is bug 42570. I've submitted 138244.
I don't know anything about any of the others.
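On the d3dcompiler_43:asm item above, here is a small Python sketch of how a substring match can produce that kind of false positive, and how anchoring on the line prefix avoids it. The line shapes in the comments, the sample todo message, and both function names are assumptions for illustration; I have not checked how the page's parser actually matches failures.

```python
import re

# Assumed line shapes, for illustration only:
#   "asm.c:123: Test failed: some message"
#   "asm.c:456: Test marked todo: some message"
FAILURE_LINE = re.compile(r"^[\w.]+:\d+: Test failed: ")


def naive_has_failure(lines):
    """Substring match: flags any line that merely mentions 'test failed'."""
    return any("test failed" in line.lower() for line in lines)


def anchored_has_failure(lines):
    """Only flags lines whose prefix matches the assumed failure format."""
    return any(FAILURE_LINE.match(line) is not None for line in lines)


if __name__ == "__main__":
    # Hypothetical todo line whose message happens to contain "test failed".
    log = ["asm.c:10: Test marked todo: shader 3 test failed with error 1"]
    print("naive:", naive_has_failure(log))
    print("anchored:", anchored_has_failure(log))
```

If the parser does work like the naive version, rewording the todo message would be the other obvious way out.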
2017-11-04 20:40 GMT+01:00 Zebediah Figura:
> - ntdll:info is the PagefileUsage functionality (bug 5657) which requires a newer version of the kernel (I don't remember which specifically), so I think the plan was just to update this?
We need the RssAnon value from /proc/self/status, which was added in Linux 4.5. Francois seemed to think that it would not be too difficult to update the virtual machines' kernels.
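For anyone who wants to check whether a given test machine's kernel exposes the field, a quick Python sketch follows. This is not Wine's code, and the helper name is made up; it only reads the "Name: value kB" lines of /proc/self/status and reports whether RssAnon is there.

```python
def read_status_field(field, path="/proc/self/status"):
    """Return the kB value of a numeric 'Name:   1234 kB' line, or None if absent.

    Only meant for numeric fields such as RssAnon or VmRSS.
    """
    try:
        with open(path) as status:
            for line in status:
                name, _, rest = line.partition(":")
                if name == field:
                    parts = rest.split()
                    return int(parts[0]) if parts else None
    except OSError:
        # File missing or unreadable (for example, not running on Linux).
        return None
    return None


if __name__ == "__main__":
    rss_anon = read_status_field("RssAnon")
    if rss_anon is None:
        print("RssAnon not available (kernel older than 4.5, or not Linux)")
    else:
        print(f"RssAnon: {rss_anon} kB")
```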
-Alex
On 11/04/2017 05:14 PM, Alex Henrie wrote:
> 2017-11-04 20:40 GMT+01:00 Zebediah Figura:
>> - ntdll:info is the PagefileUsage functionality (bug 5657) which requires a newer version of the kernel (I don't remember which specifically), so I think the plan was just to update this?
> We need the RssAnon value from /proc/self/status, which was added in Linux 4.5. Francois seemed to think that it would not be too difficult to update the virtual machines' kernels.
Nice analysis, thanks! I've updated the comments on those issues. If folks have other issues they'd like me to update, just let me know.
At some point (a week or two?), I'm hoping to systematically reach out to the last author of each 'always' and 'usually' failure to invite them to look <grin>.
Cheers,
Jeremy
On Sat, 4 Nov 2017, Alex Henrie wrote:
> 2017-11-04 20:40 GMT+01:00 Zebediah Figura:
>> - ntdll:info is the PagefileUsage functionality (bug 5657) which requires a newer version of the kernel (I don't remember which specifically), so I think the plan was just to update this?
> We need the RssAnon value from /proc/self/status, which was added in Linux 4.5. Francois seemed to think that it would not be too difficult to update the virtual machines' kernels.
I upgraded the two test machines (cw1-hd6800 and cw2-gtx560) to Debian 9 Stretch (the new stable) on Monday. The two machines are still missing support for Kerberos and GStreamer for the 32 bit tests which I will add in the coming weeks (the development packages are being troublesome). Both machines are using the open-source graphics drivers.
So I am happy to report that this failure is fixed and the upgrade even seems to have fixed other issues :-)
> So I am happy to report that this failure is fixed and the upgrade even seems to have fixed other issues :-)
Reviewing recent results: https://www.winehq.org/~jwhite/latest.html
We do, in fact, see some encouraging progress. I blame Alex, Zeb and Francois <grin>.
The Linux box is down to 6 clear failures (and has a bunch of intermittents that likely hide real problems too).
Adding Windows 8 and Windows 10 did increase our counts, but our consistent failures are pretty small (about 13), and maybe there is some low hanging fruit in there.... (although it is mostly ddraw type stuff).
Cheers,
Jeremy