[00/25] JobDetails performance and new failures patchset
Francois Gouget
fgouget at codeweavers.com
Tue Jan 14 09:40:36 CST 2020
I'm sending this patchset here for review and to provide an overview of
what's to come. It's quite large so it shouldn't be committed in one go
on the TestBot production system (hence the lack of the 'PATCH' prefix
and signing off, those will go with the actual submission of the final
patches).
The main goal of this patchset is to reduce the "new error" false
positives caused by intermittent failures (bug 47998). But fixing this
has a number of impacts which makes the patchset larger.
Some context: the TestBot detects new failures introduced by a patch by
comparing its result to that of the last WineTest run. Any failure not
present in the WineTest run is considered to be caused by the patch.
This breaks down when a failure is intermittent and just happens on the
patch's test run.
To avoid this the pathset modifies the TestBot to use all past WineTest
runs as the comparison point. Only failures not present in any of them
are considered new and attributed to the patch.
But the TestBot does the new failures detection when generating the
JobDetails page (and also when sending the emails). This means parsing
the WineTest results (600KB to 1.5MB) to extract failures, the same for
the test results, and diffing them, for each test configuration
(typically about 20), every time the page is loaded or refreshed. This
results in typical load times between 0.2 and 2 seconds on my test
system (up to 5 seconds on the official TestBot) (bug 48035). Annoying
but not a major issue. But these can put quite a load on the web server
when a rogue spider decides to crawl the TestBot site as happened a few
times in december. Using all WineTest results on record instead of just
the latest would increase the average load times by a factor of 20. This
would make the page load time unacceptable and put too much load on the
web server.
This means the JobDetails performance issues must be tackled first.
Initially my idea was to just cache the errors lists. But in fact it's
just as easy to also cache whether each error is new and thus do the
detection only once. So the patchset introduces .errors files which
contain only the errors extracted from a given task's log, whether each
error is new, and its line number in the full report which allows for
some future enhancements. With this change, generating the JobDetails
page takes 0.01 to 0.15 seconds on my test system.
The structures used to manipulate the error lists were not practical and
hardly extensible. In particular there was not a single list with a flag
denoting new errors, but two lists, one with all the errors and another
with only the new errors (see TagNewErrors()). Furthermore the error
groups and errors in each group were returned separately (see
GetLogErrors()), making them annoying to pass around and even harder to
extend with more information such as line numbers. So the early patches
rework the way errors are stored in memory to simplify usage and make
their representation more extensible.
Before the patchset the WineRun* scripts would parse the test reports
and store the extra validation errors in .err files (e.g. the too much
output errors). But it's just as easy to store those in the .errors
files, making the .err ones redundant. So they are removed on upgrade.
The web server is the wrong place to generate these 'cache' .errors
files, in part because it does not have write access to the jobs/
directory. It's more logical to have each WineRun* script generate those
when it completes. Then the .errors files are available when WineSendLog
and JobDetails need them.
Because the new failures detection was performed each time JobDetails
was run, it could happen long after the job completed. By that time
there could be new WineTest results so simply using the latest ones
would have changed the list of new failures of past jobs which would
have made no sense. With the .errors files the new failures detection is
performed only once when a task completes. So keeping a copy of the
WineTest reference files is not really necessary anymore. But it's still
useful to have them to diagnose issues.
The WineRunBuild script used to copy the latest WineTest reports to each
of the job's step directories when the build completed. Instead it's
much more logical to have each WineRun* script capture the WineTest
reports it needs when it completes its task and to store them in the
task's directory.
The reference WineTest results are stored in the latest/ directory so
each needs a unique name. That name used to be of the form
'<vmname>_<reportname>', for instance debian10_win32_fr_FR.report'. So
this naming scheme was updated to allow for more than one report per
test configuration. Specifically it now includes the job id since
there's only one report per job for a given test configuration and this
makes it easy to know where the reference report is coming from. Also
underscores are allowed in both the VM names and report names which
makes it hard to split the filename into its components. So the
underscore was replaced by dashes. So that gives
'debian10-job63174-win32_fr_FR.report'.
The Engine used to perform the renaming / upgrade operations on startup.
But generating the .errors files for every single job takes several
minutes which makes it inappropriate as an Engine startup task. Also in
order to work on the patchset it is necessary to be able to test code
both before and after files were moved around and .errors files added.
So a new tool, UpdateTaskLogs was added that can rebuild all generated
files from scratch and allows dealing with upgrades and downgrades. It's
introduced before any changes to the way the logs are handled and
updated to perform the required renames and to rebuild the generated
files. This allows upgrading the content of the latest/ and jobs/
directories when needed. Also when reverting to an earlier version of
the code it can mostly restore the content of these directories to their
original state, thus making it possible to test the old code. And should
things go wrong on the production server this will also provide a way to
roll back the state of the TestBot.
The TestBot engine redirects the WineRun* scripts stderr to log.err in
order to capture Perl errors. This makes log.err different from all
other .err files since it cannot be rebuilt from the content of the
corresponding log file. So the patchset renames log and log.err to
task.log and testbot.log respectively. Of course UpdateTaskLogs deals
with the needed renames on upgrade. This is also the reason for the
'mostly' in the previous section: UpdateTaskLogs does not revert these
renames on downgrades (but that's easy to deal with with a find -exec
command).
Another issue is that the WineRun* scripts obviously cannot generate the
.errors file for testbot.log. So instead this file is generated by
WineSendLog which is called as soon as the overall job is complete.
Normally testbot.log is empty and thus ignored anyway. Also the
JobDetails page needs to deal with cases where the .errors file is
missing in any case. So when that happens it simply points the user to
the full log.
Finally, if we have two WineTest reports, one with errors A and B, and
an older one with only error B, then the second report is useless as far
as detecting new failures go. So one can speed up new failure detection
by only keeping the useful reports. With the new/old status being stored
in the .errors file this is not really crucial since it only slows down
the WineRun* scripts. The patchset still includes a patch to prune the
redundant WineTest reports. However it's not very effective because some
tests produce errors that change with every run. Those will clearly need
to be fixed.
Francois Gouget (25):
testbot: Standardize building the reference report filenames.
testbot/LogUtils: Don't rename the old logs on Engine startup.
testbot/UpdateTaskLogs: Delete, rebuild or add reference reports.
testbot/LogUtils: Tag new errors instead of filtering them.
testbot/LogUtils: Return the log errors as a single object.
testbot/LogUtils: Add _AddLogError() and _AddLogGroup().
testbot/LogUtils: Rename the $Summary variables to $LogInfo.
testbot/LogUtils: Tweak ParseTaskLog() to reduce the indentation
level.
testbot/LogUtils: Return the ParseWineTestReport() results as a single
object.
testbot/LogUtils: Add BadRef/NoRef to TagNewErrors().
testbot/LogUtils: Rename the ParseTaskLog() NoLog error field to
BadLog.
testbot/JobDetails: Highlight new errors in the full log too.
testbot/LogUtils: Keep track of the first line of each error group.
testbot/LogUtils: Store the filename in the $LogInfo structures.
testbot: Share code to access the latest reference WineTest reports.
testbot/UpdateTaskLogs: Delete reference reports that are not supposed
to exist.
testbot: Move reference logs to the task and prepare for having more
than one.
testbot: Separate the per-task task and testbot logs.
testbot/LogUtils: Reorder the functions to get a more logical
grouping.
testbot: Cache the log error lists in .errors files.
testbot/JobDetails: Improve the error reporting.
testbot/LogUtils: Share code between _SaveLogErrors() and
_DumpErrors().
testbot/LogUtils: Return the errors in the Parse*() $LogInfo
structure.
testbot: Take into account multiple WineTest reports.
testbot/LogUtils: Deduplicate the latest WineTest reports.
testbot/bin/Engine.pl | 14 -
testbot/bin/Janitor.pl | 9 +-
testbot/bin/UpdateTaskLogs | 559 +++++++++++++
testbot/bin/WineRunBuild.pl | 49 +-
testbot/bin/WineRunReconfig.pl | 24 +-
testbot/bin/WineRunTask.pl | 78 +-
testbot/bin/WineRunWineTest.pl | 118 +--
testbot/bin/WineSendLog.pl | 143 ++--
testbot/lib/WineTestBot/LogUtils.pm | 1161 ++++++++++++++++++---------
testbot/lib/WineTestBot/Tasks.pm | 50 +-
testbot/web/JobDetails.pl | 146 +++-
testbot/web/WineTestBot.css | 2 +
12 files changed, 1700 insertions(+), 653 deletions(-)
create mode 100755 testbot/bin/UpdateTaskLogs
--
2.20.1
More information about the wine-devel
mailing list