test.winehq refactoring

Sun Apr 2 14:51:27 CDT 2017

So I've been working on the test.winehq.org site lately. Here is an 
overall description of what I'm trying to achieve:

1) Improve parsing of the WineTest reports

   This is what started the ball rolling.

   Initially the issue was with detecting the test executable exiting 
   abruptly after a subprocess had already printed a test summary line. 
   Pids were introduced into the test reports to address this, and 
   test.winehq turned out not to need modifications to cope with them. 
   But having it take advantage of the pids does require changes. And 
   then a lot of other parsing improvements turned up.
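
   To illustrate the problem, here is a minimal Python sketch of 
   pid-aware summary detection (the site itself is written in Perl). 
   The two line formats are simplified assumptions made for this 
   example, not the exact WineTest report syntax, and 
   find_missing_summaries() is a hypothetical helper.

```python
import re

# Assumed, simplified line formats (not the exact WineTest syntax):
#   summary: "<pid>:<test>: <n> tests executed ..."
#   done:    "<unit>:<test>:<pid> done (<exit code>)"
SUMMARY_RE = re.compile(r"^([0-9a-f]+):(\w+): \d+ tests executed")
DONE_RE = re.compile(r"^(\w+):(\w+):([0-9a-f]+) done \((-?\d+)\)")


def find_missing_summaries(lines):
    """Return (unit, test) pairs whose main process printed no summary."""
    summary_pids = {}  # test name -> set of pids that printed a summary
    missing = []
    for line in lines:
        m = SUMMARY_RE.match(line)
        if m:
            summary_pids.setdefault(m.group(2), set()).add(m.group(1))
            continue
        m = DONE_RE.match(line)
        if m:
            unit, test, pid = m.group(1), m.group(2), m.group(3)
            # A summary from a child process does not count: only the pid
            # given on the done line identifies the main test process.
            if pid not in summary_pids.pop(test, set()):
                missing.append((unit, test))
    return missing
```

   Without the pid check, a summary line printed by a subprocess would 
   hide the fact that the main test executable exited before printing 
   its own.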

2) Make it possible to check out the source anywhere and use it 
   unmodified no matter where your web server files are located.

   Currently winetest.conf, which is checked into Git, contains 
   hardcoded paths to Wine's Git directory and the location of the web 
   server files. I intend to move those to the web site configuration 
   file and to the crontab entry.

   The path to that is to remove $datadir, $queuedir and $gitdir in 
   favor of a single $workdir directory; and then pass that path to the 
   scripts that need it. For the CGI scripts this would be done by 
   setting $workdir in the web server configuration file. And for 
   regular scripts by passing it on the command line, starting with the 
   crontab entry.
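
   As a rough Python sketch of that lookup (the real scripts are Perl): 
   the --workdir option, the WINETEST_WORKDIR variable and both helper 
   names are hypothetical, chosen only for illustration, and the 
   subdirectory layout is an assumption.

```python
import os


def get_workdir(argv, environ):
    """Return the work directory from --workdir or from the environment."""
    # Regular scripts: the crontab entry would pass "--workdir PATH".
    if "--workdir" in argv:
        index = argv.index("--workdir")
        if index + 1 < len(argv):
            return os.path.abspath(argv[index + 1])
    # CGI scripts: the web server configuration would export the variable.
    if "WINETEST_WORKDIR" in environ:
        return os.path.abspath(environ["WINETEST_WORKDIR"])
    raise SystemExit("no work directory specified")


def site_paths(workdir):
    """Derive the per-purpose directories from the single work directory."""
    # Assumed layout: one subdirectory per purpose under the work directory.
    return {name: os.path.join(workdir, name) for name in ("data", "queue")}
```

   A script would call get_workdir(sys.argv, os.environ) once at 
   startup, so no path needs to be hardcoded in a file checked into Git.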

3) Get rid of /data in the URLs

   The idea is that everything that's supposed to be accessible through 
   the web server should be in the /data directory. That means moving 
   the error files and the /builds directory.

   Once that's done the web server can be told to only serve files in 
   $workdir/data, /data becomes redundant in the URL, and redirects can 
   be put in place to not break the old URLs.
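
   The redirect rule itself is simple. Here is a Python sketch of the 
   mapping, purely to illustrate it: in practice it would more likely 
   be a rewrite rule in the web server configuration, and 
   redirect_target() is a hypothetical name.

```python
def redirect_target(path):
    """Return the new URL path for an old /data URL, or None if unchanged."""
    prefix = "/data/"
    if path == "/data":
        return "/"
    if path.startswith(prefix):
        # Drop the leading /data component and keep the rest of the path.
        return "/" + path[len(prefix):]
    return None
```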

   That leaves /old-data which is currently accessible but need 
   not be, and /queue which is partly accessible but probably should not 
   be. After this change neither would be accessible anymore.

   There are also a number of scripts that have an unclear purpose and 
   make very tempting targets for git rm: error.cgi, make-winetest, 
   service.cgi, site.

4) Security

   I don't know of any specific bug that needs fixing and I'm not an 
   expert in that domain. So I'm just trying to stick to some principles 
   that I hope will help keep things reasonably secure.

   - I'm trying to make it so the web server process has write access to 
     as few files as possible. In particular it should not have write 
     access to anything that's executable. Currently the raw reports 
     remain owned by the web server so it can still modify them. That's 
     not necessarily an issue since those are not being run but maybe it 
     should be changed.

   - The same goes for the perl scripts, which is why I documented a way 
     of setting things up based on two accounts: one for the source and 
     one where the scripts are run (and the web server runs in a third 
     one of course).

   - Eventually I hope to enable perl's taint mode. This does not 
     guarantee safety but can help identify places where incoming data 
     should be checked and sanitized.

5) Miscellaneous issues that would be worth looking at one day.

   I don't have specific plans for these (hence the bug entries) but 
   I'll list them here anyway in case someone volunteers.

   - Disk usage seems very high for such a simple site.

     The static HTML pages the web site serves are pretty big: I 
     estimate that for test.winehq they take up 21 GB of disk. And 
     that's not because the reports themselves are big. On any given day 
     we get about 60 reports which typically means:
        50 MB of raw report files
       180 MB of full report HTML files
       285 MB of individual test unit HTML files
         3 MB of index files
     So the raw reports represent under 10% of the disk usage. 
     Everything else is duplicated data. But fixing this would require 
     quite a change in the way the site works.

     See bug 42756 for more on this part:
     https://bugs.winehq.org/show_bug.cgi?id=42756
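
     The "under 10%" figure follows directly from the daily numbers 
     above. A short Python check of the arithmetic (the variable names 
     are mine):

```python
# Daily disk usage in MB, taken from the figures listed above.
daily_mb = {
    "raw reports": 50,
    "full report HTML": 180,
    "test unit HTML": 285,
    "index files": 3,
}

total_mb = sum(daily_mb.values())
raw_share = daily_mb["raw reports"] / total_mb

for name, size in daily_mb.items():
    print(f"{name:18} {size:4} MB  {100 * size / total_mb:5.1f}%")
print(f"{'total':18} {total_mb:4} MB")
```

     That gives 518 MB per day, of which the raw reports are a bit 
     under 10%.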

     Then there are the archives, which take 132 GB!
     There are a number of things we could do to reduce that:
     . It looks like we keep the old reports forever. Maybe we 
       could delete old data after a while.
     . Maybe archiving just the raw report would be sufficient. After 
       all, all the data is there. We may not be able to parse the old 
       reports in the future but we would still be able to read them.
       Or we could simply archive the raw report and the single-file 
       full report.
     . Instead of bz2 we could use other compression algorithms. On a 
       set of 3 builds I got the following results:
       all > bz2 -9        : 22 MB / build   -> 41 GB (1)
       all > xz -9         : 5.5 MB / build  -> 10 GB (1)
       raw + html > bz2 -9 : 14 MB / build   -> 26 GB (2)
       raw + html > xz -9  : 3 MB / build    ->  6 GB (2)
       raw report > bz2 -9 : 5.5 MB / build  -> 10 GB (3)
       raw report > xz -9  : 1.1 MB / build  ->  2 GB (3)
         (1) Archive every single file as we currently do.
         (2) Only archive (the raw) report and report.html.
         (3) Only archive (the raw) report.
     . Passing the 64 bit builds through UPX (as we already do for 
       the 32 bit ones) would compress them by a factor of around 2.9 
       (one build that took 82 MB got reduced to 28 MB). This would 
       reduce the size of the /builds directory to around 43 GB.
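
     For anyone who wants to repeat the bz2 versus xz comparison on 
     their own data, here is a minimal Python sketch using the standard 
     bz2 and lzma modules. It compresses a single in-memory buffer, so 
     it only approximates running "bz2 -9" or "xz -9" on a real archive, 
     and compare_compressors() is a hypothetical helper.

```python
import bz2
import lzma


def compare_compressors(data):
    """Return the compressed size in bytes at the highest bz2 and xz levels."""
    return {
        "bz2": len(bz2.compress(data, compresslevel=9)),
        "xz": len(lzma.compress(data, preset=9)),
    }
```

     Feeding it the bytes of a raw report, for instance 
     compare_compressors(open(path, "rb").read()), shows how the two 
     compare for that file.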

   - Provide per machine results over time.

     The index files make it possible to see how the number of failures 
     evolved over time for a given platform, such as Windows XP. But 
     anyone running the tests regularly on a machine would appreciate 
     being able to see how it fared over time. However that's currently 
     impossible since you only get specific machine results on a 
     per-build basis. A fix for that would be to add a 'flat all 
     machines' top-level index, as well as 'flat all machines' indexes 
     for each test unit.

     https://bugs.winehq.org/show_bug.cgi?id=39379
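
     A 'flat all machines' index amounts to regrouping the per-build 
     results by machine. A minimal Python sketch, assuming each result 
     is a (build date, machine tag, failure count) tuple; that layout 
     and the per_machine_history() name are illustrative assumptions, 
     not the site's actual data model:

```python
from collections import defaultdict


def per_machine_history(results):
    """Regroup (build_date, machine, failures) tuples by machine."""
    history = defaultdict(list)
    for build_date, machine, failures in results:
        history[machine].append((build_date, failures))
    # ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
    return {machine: sorted(runs) for machine, runs in history.items()}
```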

   - Provide some data about the test unit run time.

     This would likely take the form of additional per-build index files 
     showing the test units sorted by run time with some min, max and 
     average data.
     This would help identify which test units take too much time or 
     which machines are having trouble keeping up.

     https://bugs.winehq.org/show_bug.cgi?id=42757
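
     The aggregation behind such an index could look like this Python 
     sketch. It assumes the run times of each test unit have already 
     been extracted from the reports into a dict of lists, in seconds; 
     that input format and the runtime_index() name are hypothetical.

```python
from statistics import mean


def runtime_index(runtimes):
    """Return (unit, min, max, average) rows, slowest average first."""
    rows = [
        (unit, min(times), max(times), mean(times))
        for unit, times in runtimes.items()
        if times  # skip units for which no machine reported a run time
    ]
    return sorted(rows, key=lambda row: row[3], reverse=True)
```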

   - Provide some data about the size of individual test units output.

     The Wine test reports are pretty close to the 1.5 MB limit so 
     having an idea of which the biggest offenders are would be helpful.

     https://bugs.winehq.org/show_bug.cgi?id=42758
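
     Ranking the offenders is straightforward once the output of each 
     test unit has been isolated. A Python sketch, assuming a dict 
     mapping each test unit to its output text and taking the 1.5 MB 
     limit to mean 1.5 * 1024 * 1024 bytes (both are assumptions, as is 
     the biggest_units() name):

```python
# Assumption: the 1.5 MB report limit is counted in units of 1024 * 1024 bytes.
REPORT_LIMIT = int(1.5 * 1024 * 1024)


def biggest_units(unit_outputs, count=10):
    """Return the largest (unit, bytes, share of limit) rows, biggest first."""
    sizes = [
        (unit, len(text.encode("utf-8")))
        for unit, text in unit_outputs.items()
    ]
    sizes.sort(key=lambda item: item[1], reverse=True)
    return [(unit, size, size / REPORT_LIMIT) for unit, size in sizes[:count]]
```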

-- 
Francois Gouget <fgouget at codeweavers.com>