[Bug 44688] Detect stuck processes

wine-bugs at winehq.org
Tue Jun 26 20:29:07 CDT 2018


https://bugs.winehq.org/show_bug.cgi?id=44688

François Gouget <fgouget at codeweavers.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #1 from François Gouget <fgouget at codeweavers.com> ---
This is done.
The TestBot now also detects stuck revert processes, retries the revert and
avoids infinite loops.

commit fca14eaa91f0ba811c5b90c028f6e824d07f8742
Author: Francois Gouget <fgouget at codeweavers.com>
Date:   Thu Jun 7 00:29:38 2018 +0200

    testbot: Also mark the VM for maintenance if the reverts get stuck.

    When a VM takes a long time to revert, the LibvirtTool.pl process
    typically remains stuck in the Sys::Virt::DomainSnapshot::revert_to()
    call and cannot enforce the timeout itself, so the timeout is
    detected at the TestBot Engine level instead.

    Signed-off-by: Francois Gouget <fgouget at codeweavers.com>
    Signed-off-by: Alexandre Julliard <julliard at winehq.org>

commit 02681e7b8fd3186f446add2391e69c87ca3a00df
Author: Francois Gouget <fgouget at codeweavers.com>
Date:   Wed May 16 11:22:10 2018 +0200

    testbot: Detect VM revert loops.

    VM revert loops typically happen when a VM is misconfigured such that
    the TestBot fails to access the TestAgent daemon after reverting it.
    The VM is then put offline until it is accessible again through
    Libvirt. Since that is already the case, it is immediately put back
    online and reverted again, leading to a new error.
    With this patch the VM is put in maintenance mode for an administrator
    to look at if it has too many consecutive errors.

    Signed-off-by: Francois Gouget <fgouget at codeweavers.com>
    Signed-off-by: Alexandre Julliard <julliard at winehq.org>
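
The consecutive-error rule described in this commit can be illustrated with a
minimal sketch. This is not the TestBot code (which is written in Perl); the
VM class, the status strings and the threshold of 3 are hypothetical names and
values chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical threshold; the commit message does not state the real value.
MAX_CONSECUTIVE_ERRORS = 3


@dataclass
class VM:
    name: str
    status: str = "idle"
    errors: int = 0  # number of consecutive failed reverts


def record_revert_result(vm: VM, ok: bool) -> None:
    """Update the VM state after one revert attempt."""
    if ok:
        # Any success breaks the streak.
        vm.errors = 0
        vm.status = "idle"
        return
    vm.errors += 1
    if vm.errors >= MAX_CONSECUTIVE_ERRORS:
        # Too many failures in a row: stop retrying and wait for an admin.
        vm.status = "maintenance"
    else:
        # Possibly transient: take the VM offline so it can be retried later.
        vm.status = "offline"
```

Counting only consecutive errors means that an occasional failure does not
take a VM out of service, while a misconfigured VM stops looping after a
bounded number of reverts.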

commit b65f81e0770dab83249bd472c7c5feb3e57267ec
Author: Francois Gouget <fgouget at codeweavers.com>
Date:   Mon May 14 13:21:49 2018 +0200

    testbot: Tweak the 'Putting VM offline' email.

    Emphasize that the TestBot is still monitoring the VM.

    Signed-off-by: Francois Gouget <fgouget at codeweavers.com>
    Signed-off-by: Alexandre Julliard <julliard at winehq.org>

commit 129508e7e60130f979b6715db00114d5100011a4
Author: Francois Gouget <fgouget at codeweavers.com>
Date:   Mon May 14 13:21:30 2018 +0200

    testbot: Requeue the task in case the script gets stuck.

    Count how many times the task has been requeued to avoid infinite loops,
    just like the scripts themselves normally do.

    Signed-off-by: Francois Gouget <fgouget at codeweavers.com>
    Signed-off-by: Alexandre Julliard <julliard at winehq.org>

commit 2cd1475bf6269db4979499e8a63a6873f4e74362
Author: Francois Gouget <fgouget at codeweavers.com>
Date:   Fri May 11 00:15:22 2018 +0200

    testbot: Reschedule at the latest when the next task times out.

    This ensures we catch stuck tasks in a timely fashion.
    Note that we still reschedule every 10 minutes to catch any issues but
    the scheduler handles this itself instead of relying on SafetyNet().

    Signed-off-by: Francois Gouget <fgouget at codeweavers.com>
    Signed-off-by: Alexandre Julliard <julliard at winehq.org>
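
One way to picture "reschedule at the latest when the next task times out" is
the sketch below. It is an illustration only, not the TestBot scheduler: the
function name and the representation of deadlines as absolute timestamps in
seconds are assumptions; only the 10 minute interval comes from the commit
message.

```python
import time
from typing import Iterable, Optional

# The periodic reschedule interval mentioned in the commit message.
RESCHEDULE_INTERVAL = 10 * 60  # seconds


def next_reschedule_delay(task_deadlines: Iterable[float],
                          now: Optional[float] = None) -> float:
    """Return how many seconds to wait before running the scheduler again.

    The result is the time until the earliest task deadline, capped at the
    periodic interval and never negative.
    """
    if now is None:
        now = time.time()
    delay = float(RESCHEDULE_INTERVAL)
    for deadline in task_deadlines:
        # A deadline already in the past means "reschedule right away".
        delay = min(delay, max(0.0, deadline - now))
    return delay
```

With no running task the function falls back to the periodic interval, which
matches the "still reschedule every 10 minutes" note above.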

commit a5d7bc263b1e355ee8b522812c8a0961d1d9d116
Author: Francois Gouget <fgouget at codeweavers.com>
Date:   Fri May 11 00:14:52 2018 +0200

    testbot/Engine: Let event handlers add / remove events.

    This makes it possible to handle events that happen at irregular
    intervals: the event is created as non-repeating and the event handler
    computes when the next event should happen and adds it.

    Signed-off-by: Francois Gouget <fgouget at codeweavers.com>
    Signed-off-by: Alexandre Julliard <julliard at winehq.org>
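
The pattern described here, a non-repeating event whose handler schedules its
own successor, can be sketched as follows. The EventLoop class, its method
names and the simulated clock are hypothetical and are not the TestBot Engine
API.

```python
class EventLoop:
    """Tiny event loop with a simulated clock and one-shot named events."""

    def __init__(self):
        self.now = 0.0
        self._events = {}  # name -> (due time, handler)

    def add_event(self, name, delay, handler):
        # Adding an event under an existing name replaces it.
        self._events[name] = (self.now + delay, handler)

    def remove_event(self, name):
        self._events.pop(name, None)

    def run(self, until):
        while self._events:
            # Pick the event with the earliest due time.
            name, (when, handler) = min(self._events.items(),
                                        key=lambda item: item[1][0])
            if when > until:
                break
            # Events are non-repeating: drop the event before calling the
            # handler, so the handler is free to add or remove events.
            del self._events[name]
            self.now = when
            handler(self)


def make_poller(intervals, fired):
    """Build a handler that re-adds itself at irregular intervals."""
    remaining = list(intervals)

    def handler(loop):
        fired.append(loop.now)
        if remaining:
            # Compute when the next event should happen and add it.
            loop.add_event("poll", remaining.pop(0), handler)

    return handler
```

For example, adding the "poll" event with a delay of 1 and the intervals
[2, 5] makes the handler fire three times, each time deciding for itself when
to run next.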

commit 3ce81c0c6cf9e9829608441ab943da0503492d1d
Author: Francois Gouget <fgouget at codeweavers.com>
Date:   Wed May 9 02:45:31 2018 +0200

    testbot: Detect and kill stuck task scripts.

    The tasks themselves have a timeout which the corresponding scripts
    enforce. However the scripts themselves may get stuck, typically due
    to network problems. When that happens they can end up blocking the
    whole TestBot. So make sure the TestBot engine itself can detect stuck
    scripts and take corrective action.
    Note that the detection is not very timely but will happen at the
    latest in the SafetyNet() function. This means there will be a delay
    of at most 10 minutes.

    Signed-off-by: Francois Gouget <fgouget at codeweavers.com>
    Signed-off-by: Alexandre Julliard <julliard at winehq.org>
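
The idea of an engine-level check that does not rely on the script enforcing
its own timeout can be sketched as below. This is a simplified illustration
using Python's subprocess module, not the TestBot implementation; the function
name, the returned strings and the way deadlines are passed in are
assumptions.

```python
import subprocess
import sys
import time


def reap_or_kill(proc: subprocess.Popen, deadline: float, now: float) -> str:
    """Check one child script and kill it if it outlived its deadline."""
    if proc.poll() is not None:
        # The script exited on its own.
        return "finished"
    if now < deadline:
        return "running"
    # The script is past its deadline and still alive: assume it is stuck.
    proc.kill()
    proc.wait()  # collect the exit status so no zombie is left behind
    return "killed"
```

A supervisor would call such a check for every running script each time it
wakes up, then take corrective action for the ones reported as killed.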

-- 
Do not reply to this email, post in Bugzilla using the
above URL to reply.
You are receiving this mail because:
You are watching all bug changes.