fgouget at codeweavers.com
Sun Apr 2 14:51:27 CDT 2017
So I've been working on the test.winehq.org site lately. Here is an
overall description of what I'm trying to achieve:
1) Improve parsing of the WineTest reports
This is what started the ball rolling.
Initially the issue was with detecting the test executable exiting
abruptly after a subprocess had already printed a test summary line.
Pids were introduced into the test reports and, as it turned out,
test.winehq did not need any modifications to cope with them. But
having it take advantage of the pids does require changes. And then
a lot of other parsing improvements turned up.
2) Make it possible to check out the source anywhere and use it
unmodified no matter where your web server files are located.
Currently winetest.conf, which is checked into Git, contains
hardcoded paths to Wine's Git directory and the location of the web
server files. I intend to move those to the web site configuration
file and to the crontab entry.
The plan is to remove $datadir, $queuedir and $gitdir in favor of a
single $workdir directory, and then pass that path to the scripts
that need it. For the CGI scripts this would be done by setting
$workdir in the web server configuration file, and for regular
scripts by passing it on the command line.
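As an illustration only (the option name, the WORKDIR variable and the
fallback are assumptions, not the site's actual interface), a script
could resolve its work directory along these lines:

```shell
# Hypothetical sketch: resolve the work directory from a --workdir
# command-line option, falling back to a WORKDIR environment variable
# that the web server configuration would set for the CGI scripts.
workdir="${WORKDIR:-}"
for arg in "$@"; do
    case "$arg" in
        --workdir=*) workdir="${arg#--workdir=}" ;;
    esac
done
echo "using work directory: ${workdir:-<none specified>}"
```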
3) Get rid of /data in the URLs
The idea is that everything that's supposed to be accessible through
the web server should be in the /data directory. That means moving
the error files and the /builds directory.
Once that's done the web server can be told to only serve files in
$workdir/data, /data becomes redundant in the URL, and redirects can
be put in place to not break the old URLs.
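With Apache, for instance, this could look something like the sketch
below (the paths and the choice of directives are assumptions, not
the site's actual configuration):

```apache
# Hypothetical Apache sketch: serve only the data directory and
# permanently redirect the old /data/ URLs to the new ones.
DocumentRoot "/home/winetest/workdir/data"
<Directory "/home/winetest/workdir/data">
    Require all granted
</Directory>
RedirectMatch permanent "^/data/(.*)$" "/$1"
```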
That leaves /old-data which is currently accessible but need
not be, and /queue which is partly accessible but probably should not
be. After this change neither would be accessible anymore.
There are also a number of scripts that have an unclear purpose and
make very tempting targets for git rm, such as error.cgi and
make-winetest.
As for security, I don't know of any specific bug that needs fixing
and I'm not an
expert in that domain. So I'm just trying to stick to some principles
that I hope will help keep things reasonably secure.
- I'm trying to make it so the web server process has write access to
as few files as possible. In particular it should not have write
access to anything that's executable. Currently the raw reports
remain owned by the web server so it can still modify them. That's
not necessarily an issue, since the reports are never executed, but
maybe it should be changed.
- The same goes for the Perl scripts, which is why I documented a way
  of setting things up based on two accounts: one that owns the source
  and one where the scripts are run (with the web server running under
  a third one).
- Eventually I hope to enable perl's taint mode. This does not
guarantee safety but can help identify places where incoming data
should be checked and sanitized.
4) Miscellaneous issues that would be worth looking at one day.
I don't have specific plans for these (hence the bug entries) but
I'll list them here anyway in case someone volunteers.
- Disk usage seems very high for such a simple site.
The static HTML pages the web site serves are pretty big: I
estimate that for test.winehq they take up 21 GB of disk. And
that's not because the reports themselves are big. On any given day
we get about 60 reports which typically means:
50 MB of raw report files
180 MB of full report HTML files
285 MB of individual test unit HTML files
3 MB of index files
So the raw reports represent under 10% of the disk usage.
Everything else is duplicated data. But fixing this would require
quite a change in the way the site works.
See bug 42756 for more on this part.
Then there are the archives, which take 132 GB!
There are a number of things we could do to reduce that:
. It looks like we keep the old reports forever. Maybe we
could delete old data after a while.
. Maybe archiving just the raw report would be sufficient. After
all, all the data is there. We may not be able to parse the old
reports in the future but we would still be able to read them.
Or we could simply archive the raw report and the single-file
report.html.
. Instead of bz2 we could use other compression algorithms. On a
set of 3 builds I got the following results:
all > bz2 -9 : 22 MB / build -> 41 GB (1)
all > xz -9 : 5.5 MB / build -> 10 GB (1)
raw + html > bz2 -9 : 14 MB / build -> 26 GB (2)
raw + html > xz -9 : 3 MB / build -> 6 GB (2)
raw report > bz2 -9 : 5.5 MB / build -> 10 GB (3)
raw report > xz -9 : 1.1 MB / build -> 2 GB (3)
(1) Archive every single file as we currently do.
(2) Only archive (the raw) report and report.html
(3) Only archive (the raw) report.
. Passing the 64 bit builds through UPX (as we already do for
the 32 bit ones) would compress them by a factor of around 2.9
(one build that took 82 MB got reduced to 28 MB). This would
reduce the size of the /builds directory to around 43 GB.
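The bz2 versus xz comparison above can be reproduced with something
along these lines (the generated file stands in for a real raw
WineTest report; the flags match the table):

```shell
# Create a stand-in for a raw report, then compress it with both
# algorithms at their maximum level and compare the resulting sizes.
seq 1 20000 > report          # placeholder for a real raw report
bzip2 -9 -k report            # -k keeps the original file around
xz -9 -k report
ls -l report report.bz2 report.xz
```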
- Provide per machine results over time.
The index files make it possible to see how the number of failures
evolved over time for a given platform, such as Windows XP. But
anyone running the tests regularly on a machine would appreciate
being able to see how it fared over time. However that's currently
impossible since you only get specific machine results on a
per-build basis. A fix for that would be to add a 'flat all
machines' top-level index, as well as 'flat all machines' indexes
for each test unit.
- Provide some data about the test unit run time.
This would likely take the form of additional per-build index files
showing the test units sorted by run time, together with some min,
max and average statistics. This would help identify which test
units take too much time or which machines are having trouble
keeping up.
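As a sketch of what the sorting itself could look like (the
"unit seconds" input format is made up for the example, not
WineTest's actual output):

```shell
# List hypothetical test units slowest-first by sorting numerically,
# in reverse, on the second field (the run time in seconds).
printf '%s\n' 'kernel32:process 42' 'user32:msg 120' 'd3d9:device 7' |
    sort -k2,2 -rn
```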
- Provide some data about the size of individual test units' output.
The Wine test reports are pretty close to the 1.5 MB limit so
having an idea of which units are the biggest offenders would be
helpful.
Francois Gouget <fgouget at codeweavers.com>