[Bug 45023] New: TestBot Object Relational Mapper issues

wine-bugs at winehq.org wine-bugs at winehq.org
Thu Apr 19 09:57:59 CDT 2018


https://bugs.winehq.org/show_bug.cgi?id=45023

            Bug ID: 45023
           Summary: TestBot Object Relational Mapper issues
           Product: Wine-Testbot
           Version: unspecified
          Hardware: x86
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: unknown
          Assignee: wine-bugs at winehq.org
          Reporter: fgouget at codeweavers.com
      Distribution: ---

The TestBot implements its own perl Object Relational Mapper (ORM) to access
the database. It is implemented by the modules in lib/ObjectModel. The are two
main concepts:

* Items
  An Item is an object representing a table row. The classes such as Job, Step
and Task all inherit from Item.

* Collections
  A collection represents a table and is thus a collection of Items. To allow
for fast access the items are stored in a perl hashtable indexed by their
primary key. A Collection may not contain all of the table's rows. For instance
a Step Collection only contains the Steps of the parent Job object used to
create it. So $Job->Steps is a collection that contains only the $Job steps.

  The items in a Collection can be further filtered out by specifying that one
of their property must be in a set of values. For instance
$Jobs->AddFilter("Status", ["queued", "running"]) will cause the $Jobs
collection to only return the queued and running jobs.


Unfortunately this implementation has a number of limitations and design
issues.


1. Care must be taken when creating multiple objects.

  my $Job1 = $Jobs->Add();
  ... Set mandatory fields ...
  my $Job2 = $Jobs->Add();
  ... Set mandatory fields ...
  # Here $Job1 != $Job2
  $Jobs->Save();

Only $Job2 got saved to the database!
The reason is that a Job's primary key is an auto-increment field. So the Job
Id is set when it is saved to the database. Thus when created in memory that
field is unset and an empty string is used when adding it to the $Jobs items
hashtable. Thus $Job2 overwrites $Jobs->{Items}->{""} and is the only one that
gets saved.

Note that there is no such issue for Steps for instance because their key is
not an auto-increment field and is set as soon as the object is created.

Fortunately the workaround is simple: systematically save an object before
creating the next one but the difference in behavior between $Jobs->Add() and
$Steps->Add() is error prone.


2. Auto-increment values on new Items are not ignored

If an auto-increment field is modified on an object, that value is used when
saving the object to the database.

This means fixing the previous issue is not as simple as setting
$Job->Id(--$Unique): handling of auto-increment fields also needs to be changed
when saving new objects.


3. Items retrieved from Detailref collections are missing fields

'Detailref' properties are used to represent 'one to many' relationships.
They are used for instance to access the Steps that belong to a Job:

  my $Step = $Job->Steps->GetItem(1);

Here Steps is a collection that gets created based on the information of the
job's DetailRef property.

However objects gotten in this way don't provide access to all of the
corresponding database fields. In the case of Steps, although the Steps table
has a JobId column, the $Step->JobId property does not exist. This presents
difficulties in a number of cases such as when trying to figure out the name of
the directory containing the Step's files since it is called
"jobs/$JobId/$StepNo" (see Step::GetDir()).

It also means there is no $Step->Job property to get back to the parent Job
object (this would also cause a perl reference loop which would be
troublesome).

The reason for this issue is that when a Collection or Item has a parent object
(called a 'master' object in the TestBot ORM) the parent's foreign key fields
(Step.JobId here) are stored separately in a 'master cols' structure and are
thus not accessible through the normal field methods.


4. "Child" objects can only be accessed from their parent

Any collection referenced through a Detailref property can only be accessed
through the corresponding 'parent' object.

This is the case for the Steps collections. The only way to get a collection of
steps is through $Job->Steps and this only returns the steps that belong to
$Job. So it's not possible to iterate over all the rows in the Steps table.

There is a (somewhat incorrect) workaround for this for the Records table.
This involves having the CreateRecords() method picking one of two
PropertyDescriptors list depending on the presence of a parent object.

However the corresponding Record objects have an extra field compared to the
ones obtained through $RecordGroup->Records: this time the foreign key
identifying the parent object is accessible since it's not not tucked away in
the 'master cols' structure. This is not an issue unless you start mixing both
types of objects, for instance through scope inheritance.


5. The 'master cols' terminology is ambiguous.

It's never clear if a 'master cols' method treats $self as a child of its
parent object, or as the parent to other child objects.

* $Step->GetMasterCols() treats $Step as a child object and returns the key
columns of its $Job master object, that is (JobId).

* Similarly $Step->MasterKeyChanged() means the parent object's key changed, so
here that the value of $JobId changed (which happens on new objects due to the
auto-increment field btw).

* Despite the unchanged 'Master' moniker, $Step->GetMasterKey() considers $Step
to be the master object of the Tasks it contains and thus includes the step's
key in the returned columns. This means $Step->GetMasterKey() returns (JobId,
StepNo).

* Despite not having the 'Master' moniker, $Step->GetFullKey() works the same
as $Step->GetMasterKey() but returns the column values as a single string
'$JobId#@#$StepNo'.


6. Filters on Detailref collections

It's possible to put a filter on a collection to only get a subset of the
corresponding rows. For instance $Job->Steps->AddFilter('Status', ['queued'])
ensures that one will only get the queued steps of $Job.

But Detailref collections are stored as a field of the Item they belong to.
This means $Job->Steps always returns the same collection object. So once a
piece of code has set a filter on it, to only return 'queued' steps for
instance, all other parts of the code will only get the 'queued' steps, no
matter what they need. Also there is no method for cloning a collection (so one
cannot do $Job->Steps->Clone()->AddFilter()), or for removing a filter.

For instance:
   $Job->Steps->AddFilter('Status', ['queued']);
   ...
   # Here UpdateStatus() will only take into account the queued steps!
   $Job->UpdateStatus();

Note also that it's not entirely trivial to work around. It's tempting to do
something like:
   $Job->Steps->AddFilter('Status', ['queued']);
   ...
   # Make sure the database contains an up-to-date view of all the Steps and
   # Tasks in $Job.
   $Job->Save();
   # Then create a new job object from the database
   my $TmpJob = CreateJobs()->GetItem($Job->Id);
   # Now we can update the status without fearing bad filters.
   $TmpJob->UpdateStatus();
   # But here $Job->Status could still be wrong so any code that still uses
   # $Job will be lead astray. So update $Job.
   $Job->Status($TmpJob->Status)
   # But that still leaves out every Step and Task object referenced by $Job
   # Also saving $Job will save $Job->Status, again. What could go wrong?


7. Filters vs. loading from the database

Once a collection has been loaded from the database, adding a filter to it has
no effect. That's because filtering happens by tweaking the SQL query and
changing the filter does not mark the collection for reloading.

Again this issue is made more severe because of the way Detailref collections
are handled. If a piece of code has caused the $Job->Steps collection to be
loaded, then adding a filter elsewhere will have no effect.

For instance:
   $Job->UpdateStatus(); # calls $Job->GetItems(), loading it from DB
   ...
   $Job->Steps->AddFilter('Status', ['queued']);
   foreach my $Step ($Job->Steps->GetItem())
   # Here we will get all steps, not just the queued ones!


8. No "or" operator in filters

Originally the filters only allowed one to request that a field be in a set of
admissible values. This was extended to allow requesting for a field to be less
than, greater than or like some value.

However it is still not possible to use an "or" operator in filters. So for
instance one cannot retrieve all Steps that completed recently or correspond to
a WineTest run (WHERE Ended > 5 minutes ago OR User == 'batch').


9. No Itemref support for multiple key fields.

A Task has a VMName field which is a foreign key referencing a VM. Through the
use of an Itemref property one can directly access the VM with $Task->VM. But
we cannot do the same thing for steps.

The primary key of a step has two fields, (JobId, No). A step also has a
PreviousNo field which identifies the step it depends on. So we would want to
be able to access the previous step through $Step->Previous by using an Itemref
property:

  CreateItemrefPropertyDescriptor("Previous", "Previous step", !1, !1,
                                  \&CreateSteps, ["JobId", "PreviousNo"]),

However this fails with a "Multiple key components not supported" error. Note
that for this to work we would also need a way to tell the Itemref that the
Step.PreviousNo field should be mapped to Step.No. But there is no way to do
so.


10. Collection::GetItem() blindly adds the item to the collection

Collections have a GetItem() method that returns an item when given a string
containing that object's key.

So using the Step example above, if we still have the right $Job->Steps
collection accessible, we can easily get the Previous step through:

  $Job->Steps->GetItem($Step->PreviousNo)

However this also adds $Step->PreviousNo to the $Job->Steps collection,
regardless of what the filter on that collection is. So when the scheduler
analyzes the queued and running steps to figure out whether the one they depend
on, $Step->PreviousNo, has completed, it also unwittingly adds the previous
step to the $Job->Step collection. So the next iteration on the collection may
return completed steps despite expectations.


11. Save order issues

The TestBot ORM automatically takes care of saving parent objects before the
child objects they contain. In practice, if you create a Job > Step > Task
hierarchy you can count on the TestBot ORM to save the Job before saving the
Steps that use the JobId as part of their key and so on.

But this breaks down for $Step->PreviousNo. There you have to make sure to save
the steps in the right order otherwise you get an SQL error.

Similarly deletions must be performed in the right order. Fortunately this is
mostly transparent since we only delete whole Jobs and Job::OnDelete() takes
care of blanking the Step::PreviousNo fields before recursing.


12. No Order-by support

There are a many cases where we retrieve a number of rows from the database and
then do a simple alphabetical or numerical sort on them. That's the kind of
thing that the database would do much faster than Perl. This is mostly an issue
for the activity and statistics web pages because of the number of RecordGroups
they handle.

So it would be nice to be able to specify an Order-By SQL directive when
retrieving the objects. However this would run into the same issues as the
filters for Detailref collections with regards to already loaded collections,
and GetItem() not knowing where to insert freshly loaded objects.


13. No support for joins

There are a few cases where doing a join could be useful.

For instance when reporting the activity what we really want is all the Records
rows corresponding to a RecordGroup that is less than 12 hours old. For now we
have to proceed indirectly: we query all the RecordGroup objects that are less
than 12 hours old, take the lowest GroupId we find and then retrieve all the
Records that have a Group Id greater than that.

A different type of join could also be useful in many other places: currently
we first retrieve the jobs and then we do one more SQL request per job to
retrieve its steps and then one per step to retrieve the tasks. A join could
let us load it all in one request. Fortunately we don't have many jobs so
unlike in the activity case this does not have a significant performance
impact.


14. Performance

Most parts of the TestBot don't have much processing to do so that performance
is not an issue (though there is not really any hard data).

Things are different for the activity and statistics page. The statistics page
is the worst and generating it takes over a dozen seconds. That's annoying
because of the load it puts on the TestBot web server. It's also really long
for a mere couple dozen thousand Records. Not all of it comes from the ORM but
over 50% of it does.

There are some relatively obvious paths for improvement. For instance accessing
a field like $Job->Ended causes the ORM to loop over all of a job's property
descriptors until it finds one called 'Ended', and then it returns the value
from a hash table. It should be possible to make it so that most of the time
the value is returned directly from the hash table. Detailref and Itemref
properties have their own hash tables and so must be handled separately. But
hat could likely also be changed.

-- 
Do not reply to this email, post in Bugzilla using the
above URL to reply.
You are receiving this mail because:
You are watching all bug changes.



More information about the wine-bugs mailing list