[Bug 44688] New: Detect stuck processes
wine-bugs at winehq.org
Thu Mar 8 14:36:25 CST 2018
https://bugs.winehq.org/show_bug.cgi?id=44688
Bug ID: 44688
Summary: Detect stuck processes
Product: Wine-Testbot
Version: unspecified
Hardware: x86
OS: Linux
Status: NEW
Severity: normal
Priority: P2
Component: unknown
Assignee: wine-bugs at winehq.org
Reporter: fgouget at codeweavers.com
Distribution: ---
Sometimes a TestBot worker process can get stuck.
This can happen to LibvirtTool.pl, particularly when dealing with offline VMs.
But it can also happen to regular scripts like WineRunTask.pl when using
TestAgent to send or retrieve a file.
In both cases the TestBot Engine should have a way to detect stuck processes
and simply kill them.
To detect stuck processes, add two fields to the VM table:
  ChildStarted - The timestamp at which the current child process started.
  ChildTimeout - How long the current child process is allowed to run.
Most of our tasks already have timeouts, so it's just a matter of reusing those
timeouts and adding some leeway. For the revert and offline tasks we could use 5
and 60 minutes respectively. Then the Jobs::_CheckAndClassifyVMs() method can
check those fields and kill the stuck processes. This works because the
Engine's SafetyNet() method schedules jobs every 10 minutes as a fallback, so
the check would run at least that often.
The reason for using two fields instead of a single ChildDeadline one is that
the ChildStarted field could be useful to know which period to analyze when
collecting the Munin statistics (currently we analyze an arbitrary period of
time that's supposed to cover the worst case).
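As a rough illustration of the check described above, here is a minimal sketch in Python (TestBot itself is written in Perl, so this is not the actual implementation). The VM class, the child_pid field, and the kill_stuck_children() helper are hypothetical names invented for this example. Only ChildStarted and ChildTimeout come from the proposal. The sketch assumes that the timeout is stored in seconds, that the leeway is already included in it, and that it runs on a Unix-like system where SIGKILL is available.

```python
import os
import signal
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class VM:
    """Hypothetical stand-in for one row of the VM table."""
    name: str
    child_pid: Optional[int] = None         # assumed: pid of the current child
    child_started: Optional[float] = None   # ChildStarted, seconds since epoch
    child_timeout: Optional[float] = None   # ChildTimeout, seconds, leeway included


def is_stuck(vm: VM, now: float) -> bool:
    """Return True if the VM's child has run strictly longer than allowed."""
    if vm.child_pid is None or vm.child_started is None or vm.child_timeout is None:
        return False  # No child process is being tracked for this VM.
    return now - vm.child_started > vm.child_timeout


def kill_stuck_children(
    vms: List[VM],
    now: Optional[float] = None,
    kill: Callable[[int, int], None] = os.kill,
) -> List[str]:
    """Kill every child past its deadline; return the names of the affected VMs."""
    now = time.time() if now is None else now
    killed = []
    for vm in vms:
        if not is_stuck(vm, now):
            continue
        try:
            kill(vm.child_pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # The child exited on its own in the meantime.
        killed.append(vm.name)
        # Clear the bookkeeping so the same child is not reported again.
        vm.child_pid = vm.child_started = vm.child_timeout = None
    return killed
```

A periodic caller, playing the role that the scheduling pass plays in the proposal, would invoke kill_stuck_children() on every pass. Keeping the start time and the timeout as separate values, rather than a single deadline, leaves the start time available for other uses such as picking the period to analyze.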
--
Do not reply to this email, post in Bugzilla using the
above URL to reply.
You are receiving this mail because:
You are watching all bug changes.