[Bug 48353] Use In-Reply-To when identifying patch series

WineHQ Bugzilla wine-bugs at winehq.org
Sun Mar 8 14:14:24 CDT 2020


https://bugs.winehq.org/show_bug.cgi?id=48353

--- Comment #2 from François Gouget <fgouget at codeweavers.com> ---
Support for versions helps but there can still be cases where it's not
sufficient.

Unfortunately In-Reply-To is tricky to use when preserving compatibility with
unthreaded patch series. Here's a proposed series of assumptions:

* Most developers send threaded patch series but it is still necessary to
  support unthreaded patch series.

* Whether a developer sends threaded or unthreaded patch series depends on his
  toolset and thus a developer is unlikely to mix threaded and unthreaded patch
  series. So it is not worth worrying about them interfering.

* While patch series can be temporally mixed up when the TestBot receives them,
  they have a clean temporal separation when the developer sends them.

* Because unthreaded patch series are less common, it's ok if the TestBot
  sometimes gets confused when they get mixed up on reception.


For the following it may be useful to check the TestBot's database schema as a
reference:
https://source.winehq.org/git/tools.git/blob/HEAD:/testbot/doc/winetestbot-schema.dia

-----

For reference, here's a description of the TestBot's current algorithm for
dealing with patch series:

* When receiving a part N email:
  - Look for a PendingPatchSet object with the same sender, version and total
    part count.
  - If there is no match, create a PendingPatchSet object and a matching
    PendingPatch object.
  - If there is a match and it already contains a PendingPatch object with
    the same part number, throw away the current email.
  - Otherwise create a PendingPatch object for the current email.

Given two patch series (a1, a2, a3) and (b1, b2, b3) where the emails are
received in this order a1, a2, b2, a3, b1, b3; this will result in a viable
(a1, a2, a3) patch series, b2 getting dropped, and an incomplete patch series
(b1, b3). Such 'zombie' patch series will break any attempt to resend the patch
series up to 48 hours later (and any attempt will add another 24/48 hours).
This is the biggest drawback of this algorithm.

This algorithm could also result in broken patch series if the emails are
received in this order for instance: a1, b2, a2, a3, b1, b3. Then it would form
a broken patch series (a1, b2, a3), drop a2, and leave an incomplete zombie
patch series (b1, b3).

-----

And a basic algorithm to deal with both threaded and unthreaded patch series:

1. Modify PendingPatchSets to have a single field:
   PK Id
      - The patch series id.

2. Modify PendingPatches to have the following fields:
   PK/FK PendingPatchSetId
      - The patch series potential thread id, that is In-Reply-To ||
Message-Id.
   PK No
      - The part number (from the Subject field).
   I  FromEmail
      - The email of the sender (i.e. the From field minus the name).
   I  Version
      - Patch series version (from the Subject field).
   I  TotalParts
      - The total number of parts (from the Subject field).
   FK PatchId
     - Identifies the Patch object which itself points to the job testing it.

3. When receiving a part 1 email with In-Reply-To set:
   - This means there is a cover and that the patch series uses threading.
   - Set PendingPatchSetId = In-Reply-To on the PendingPatch object.
   - Create a matching PendingPatchSet object if necessary.

4. When receiving a part N >= 2 email with In-Reply-To set:
   - This means the patch series uses threading.
   - Set PendingPatchSetId = In-Reply-To on the PendingPatch object.
   - Create a matching PendingPatchSet object if necessary.

5. When receiving a part N email with no In-Reply-To field:
   - If it is a part 1 we don't know if the email is part of a threaded patch
     series or not. If it is a part N >= 2, then we know it's an unthreaded
     patch series.
   - Collect all PendingPatch objects with the same sender, version, part
number,
     and total part count. Put their PendingPatchSetId field in the Excluded
     set: this email cannot be part of the same patch series as them.
   - Loop on all PendingPatch objects with the same sender, version and total
     part count, excluding those where PendingPatchSetId is in the Excluded
set.
   - If there is no match, this email is in a new patch series so create a new
     PendingPatch object with PendingPatchSetId = Message-Id and the associated
     PendingPatchSet object.
   - Otherwise pick the first PendingPatch object in that set and create a new
     PendingPatch object with the same PendingPatchSetId.
     Note that this means that for unthreaded patch series, the
PendingPatchSetId
     may not be the Message-Id of part 1. It is instead the Message-Id of the
     first part that was received.

6. In all cases, once the new PendingPatch object has been created, check for
   jobs that can be scheduled in the usual way.

Note that this algorithm does better than the current TestBot algorithm.
Given the same two patch series (a1, a2, a3) and (b1, b2, b3) where the emails
are received in the same order a1, a2, b2, a3, b1, b3; this will either create
the two correct patch series, or two broken ones (a1, a2, b3) and (b1, b2, a3).
But at least it will not create a zombie incomplete patch series to mess things
up for the next 48 hours.

With the alternative a1, b2, a2, a3, b1, b3 order this would still definitely
result in broken patch series: (a1, b2, a3) and (b1, a2, b3), but still no
zombie.

-----

Note that the above algorithm does not actually take the temporal send time
separation into account. One can do slightly better by tweaking this algorithm
as follows:

* Add a SendTime field to the PendingPatch object. Set it based on the email's
Date field.

* Instead of picking the first PendingPatch object in point 6, pick the one
with the closest
  SendTime field: this is the email which is most likely to be part of the same
patch series.

In the same conditions as above this algorithm the SendTime would almost
guarantee that a3 gets assigned to (a1, a2), resulting in two correct patch
series.

The SendTime would however not help if the emails are received in the following
order: a1, b2, a2, a3, b1, b3: this would still result in the broken patch
series: (a1, b2, b3) and (b1, a2, a), but still no zombie.

To improve the results one would have to delay assigning parts to
PendingPatchSet objects until more parts arrive which makes the algorithm even
more complex.

But this only matters for collisions where at least one patch series is
unthreaded and both have the same version which should be pretty rare. So
taking the SendTime into account may not be worth it at all.

-- 
Do not reply to this email, post in Bugzilla using the
above URL to reply.
You are receiving this mail because:
You are watching all bug changes.


More information about the wine-bugs mailing list