[Bug 42756] New: test.winehq.org uses a lot of disk space

wine-bugs at winehq.org
Sun Apr 2 12:40:13 CDT 2017


https://bugs.winehq.org/show_bug.cgi?id=42756

            Bug ID: 42756
           Summary: test.winehq.org uses a lot of disk space
           Product: WineHQ.org
           Version: unspecified
          Hardware: x86
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: www-unknown
          Assignee: wine-bugs at winehq.org
          Reporter: fgouget at codeweavers.com
      Distribution: ---

According to my estimates, the static HTML files for the test.winehq.org site
use around 21.5 GB. That seems like a lot for such a simple website.

This estimate is based on the following data collected from 3 days' worth of
reports:

  8.3MB / report
  60 reports / day
  500MB / day
  44 days history
  -> 21.5 GB total
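
The arithmetic behind that total can be checked with a short sketch. It uses
only the figures listed above; the one assumption is that 1 GB is counted as
1024 MB, which is what makes 500 MB/day over 44 days come out at 21.5 GB.

```python
# Figures taken from the estimate above.
MB_PER_REPORT = 8.3
REPORTS_PER_DAY = 60
DAYS_OF_HISTORY = 44

# 8.3 * 60 is about 498 MB, which the estimate rounds to 500 MB/day.
mb_per_day = MB_PER_REPORT * REPORTS_PER_DAY
total_mb = 500 * DAYS_OF_HISTORY
# Assumption: 1 GB = 1024 MB.
total_gb = total_mb / 1024

print(f"{mb_per_day:.0f} MB/day, {total_gb:.1f} GB total")
```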

It's not because the raw reports are big either. On any given day we get:
   50 MB of raw report files
  180 MB of full report HTML files
  285 MB of individual test unit HTML files
    3 MB of index files

So most of it is caused by the HTML files.

Here are some ways one could minimize the bloat:
1) Split the raw report into one file per test unit plus one index file to know
in which order to reassemble them. These would not use more space than the
original raw report except for the filesystem's internal fragmentation. Then
each file the web server serves would be generated from these. For the raw
report one would just have to concatenate them in the right order. The full
report and the individual test unit files would have to parse the relevant
source text files. A drawback is all the extra parsing required to generate
each page. In addition, some issues in the raw report can only be detected when
parsing the full report, because they only show up in the transition from one
test result to the next.

2) Another option would be to skip generating the full report HTML file and
omit the header and footer of the individual test unit HTML files. Then each
page could be generated by assembling the individual HTML fragments with the
right HTML glue to build either a full HTML report or an individual test unit
HTML file. This would have the advantage of not requiring more parsing to serve
each file.

3) Drop the individual test unit HTML files and always link to the full HTML
report. The drawback is more bandwidth usage, since everyone looking at a test
result would end up downloading the full report (about 3 MB each, given 180 MB
per day for 60 reports) instead of the small individual test unit reports.

4) Drop the full HTML report and only keep the per test unit HTML files. This
saves less disk space but does not force users to download a big file when they
are only interested in the result of one test unit.

5) Use some way to reduce the size of the links to the Git source. Removing the
links altogether saves 191 MB in the test case. Out of the 128 characters of a
typical link, 95 are identical from one link to the next. So some JavaScript
could perhaps be used to factor out most of the repeated part.
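
As a rough illustration of option 5, the sketch below measures how much of a
set of links is a shared prefix that would only need to be stored once. The
URLs are made up for the example (the real links are not quoted in this
report), and os.path.commonprefix is simply a convenient character-wise prefix
helper, not what the site uses.

```python
import os.path

# Hypothetical links, invented for illustration only.
links = [
    "https://source.example.org/git/wine.git/blob/0123abcd:/dlls/kernel32/tests/file.c#l100",
    "https://source.example.org/git/wine.git/blob/0123abcd:/dlls/kernel32/tests/file.c#l245",
    "https://source.example.org/git/wine.git/blob/0123abcd:/dlls/user32/tests/msg.c#l17",
]

# The part shared by every link, which a script could add back at view time.
prefix = os.path.commonprefix(links)
# What would still have to be stored per link.
suffixes = [link[len(prefix):] for link in links]

original_size = sum(len(link) for link in links)
factored_size = len(prefix) + sum(len(suffix) for suffix in suffixes)
saved = original_size - factored_size

print(f"prefix of {len(prefix)} characters, {saved} characters saved")
```

The saving grows with the number of links, since the prefix is stored once
instead of once per link.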


A drawback of options 1 and 2 is that since the files would no longer be static
they could probably not be put up on a content distribution network. But I'm
not sure test.winehq.org uses a CDN now so this probably does not matter.
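
To make the "no longer static" part concrete, here is a minimal sketch of what
option 2 could look like: only headerless per-test-unit fragments are stored,
and a page is built on request by wrapping them in shared HTML glue. The
function name, the fragment contents and the header/footer markup are all
hypothetical.

```python
from html import escape

# Hypothetical shared glue; the real header and footer would differ.
HEADER = "<html><head><title>{title}</title></head><body>\n"
FOOTER = "</body></html>\n"


def build_page(title, fragments):
    """Wrap stored HTML fragments in the shared header and footer."""
    return HEADER.format(title=escape(title)) + "".join(fragments) + FOOTER


# Hypothetical stored fragments, one per test unit, plus their report order.
fragments = {
    "kernel32:file": "<div id='kernel32:file'>...</div>\n",
    "user32:msg": "<div id='user32:msg'>...</div>\n",
}
order = ["kernel32:file", "user32:msg"]

# The full report and a single test unit page come from the same fragments.
full_report = build_page("Full report", [fragments[unit] for unit in order])
single_unit = build_page("user32:msg", [fragments["user32:msg"]])
```

No parsing of the raw report is needed at request time, only concatenation.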

It may still be possible to reduce the processing load introduced by 1 and 2 by
caching the generated files for a few hours or days. All in all this would
increase complexity quite a bit though.
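
The caching idea could be as simple as the following sketch: keep each
generated page in memory with an expiry time and regenerate it only once that
time has passed. The names and the six-hour lifetime are assumptions made for
the example; a real setup might just as well rely on the web server's own
cache.

```python
import time

# "A few hours"; the exact value is an assumption.
CACHE_TTL_SECONDS = 6 * 3600
# Maps a page key to (expiry timestamp, generated HTML).
_cache = {}


def get_page(key, generate, now=time.time):
    """Return a cached page, regenerating it once the entry has expired."""
    entry = _cache.get(key)
    if entry is not None and entry[0] > now():
        return entry[1]
    html = generate(key)
    _cache[key] = (now() + CACHE_TTL_SECONDS, html)
    return html
```

Here generate stands for whatever builds the page from the stored files, and
the now parameter only exists so the clock can be replaced when testing.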

-- 
Do not reply to this email, post in Bugzilla using the
above URL to reply.
You are receiving this mail because:
You are watching all bug changes.


