[08/12] msxml3: Add an internal IUnknown object for xmlnode
wine at mkarcher.dialup.fu-berlin.de
Mon Dec 1 17:26:38 CST 2008
Am Samstag, den 29.11.2008, 10:52 +0100 schrieb Michael Karcher:
> This makes the xmlnode structures kind-of aggregatable COM objects.
> The explanation of the new reference management is in the rather verbose
> comment at the top of the new node_unk.c file.
For reference, the comment explaining the proposed reference counting
scheme is at the end of this mail. This patch got the comment on IRC:
"julliard: mkarcher: the comment explaining that COM rules are not
followed rings an alarm bell".
So, let me first explain the problem I'm trying to solve.
In libxml2, the application is responsible for lifetime management,
whereas in msxml3, lifetime management is done by reference counting.
The XML documents are highly cyclic data structures: There is one real
root, called the "document node". In XML documents, it has a single
child, the so called "document element". The document element is the
root of the so called DOM tree. Every node in the DOM tree has a
get_parent and a get_ownerDocument method. Also, each non-leaf node has
a get_children method. This is opposed to the standard
container-containee model, where the container owns a reference on the
containees, but the containees do not know about being in a container.
Reference counting in that case is straightforward.
Currently, msxml3 implements reference counting in the following way: As
soon as an interface object for any DOM node is requested, the reference
count of the document is increased by one, and as soon as the interface
looses its final reference, the document's reference count is
decremented. If the document's reference count reaches zero, no pointer
to any object in the DOM tree is out anymore, so the document can be
deleted. This seems to work nicely, but has two major catches, that made
me do this rewrite of the reference management:
a) msxml3 has methods to splice subtrees.
b) msxml3 can create nodes nodes owned by a document, that are not in
the DOM tree
Point a) means that the document a node belongs to is not constant over
the lifetime of the node. As in the current model, the node needs to
take a reference on its current document, reference counts have to be
adjusted. The xmldoc_release and xmldoc_add_ref function calls in
removeChild, insertBefore (and alike) seem to take care of this. But as
the bullet already said, it is not just splicing the node passed into
the splicing function, but a whole subtree! That means that all
interface objects to child nodes have to move their reference to the new
document. Alas, there is *no* *way* in current wine to find all
interface objects for a given node, so there is no way to tell these
interface objects to move their references. As there is also no way just
to know *how* *many* interface objects are out for a given node, let
alone for a whole subtree, reference-correct moving is impossible.
Point b) means that just freeing the nodes in the DOM tree on document
destruction does not get all the nodes belonging to the document. This
is taken care of in the orphan list. The (benign) problem with the
orphan list approach is that the so called orphan subtrees, i.e.
subtrees that are not part of the document get not freed as soon as no
references to the subtree are out anymore, but not until there are no
references to any node that is associated with the document. Think
put_documentElement, that replaces the whole DOM tree, making the old
tree orphan. This might be quite a lot of wasted memory.
Now, lets look at what can be done.
To solve the important issue in part a, there are two possible
a1) Move to a concept that doesn't need the missing information
a2) Make the missing information available.
My patch went the route of a1), but I think a2) is also possible: We
might count reference to a node in the _private element of the node, so
the number of references to a node is at hand. If a subtree is spliced,
we would have to traverse the whole subtree to collect the number of
references in the subtree to move from the old document to the new
document. Another a2) approach is to count the number of references to
the whole subtree inside the _private element. This makes obtaining the
number of references to the subtree an O(1) operation, but updating the
interface object count of one node means the interface count of all
parents has to be updated, too. While this is not as expensive as a
subtree traversal, updating interface object counts is a very common
operation that ought to be O(1) too, and not O(depth).
To solve the issue of b), the same things are imaginable. If a node gets
orphan, check whether the whole subtree is orphan, and nuke it if it is.
If it is not, add it to the orphan list. If later the last reference
into that subtree is freed, one has to detect that the subtree is
orphan. One could do that by checking on every interface release whether
the subtree is now orphan. As this is a "is there any reference to the
subtree" question, the first a2 approach is out immediately. The second
one can do. The up-traversal will stop either at the document element or
at the root of the orphan subtree, and the information whether there is
any reference left is immediately at hand.
Now, there is the a1 thing left. Throw the old concept of "everyone refs
the document" away (which is only band-aided by the orphan list and the
expensive sub-tree counts or up-traversals, IMHO) and try to find a
clean way to get what we need. This is what this patch is about. The
first choice about reference counting in cyclic structures is the choice
of reference direction. The choice that the parent node owns its child
nodes is tempting, as it seems to resemble the simple
container-owns-containee concept. But it breaks down as soon as there
are references to the child nodes out, and the parent node looses its
last reference, as the parent node would get freed in that case. It
would be possible for a parent node to wait for all children to be
released before it gets freed itself. But this again needs non-local
information. A node has to stay alive as long as *any* descendant of
that node, however deep in the hierarchy it might be, is alive. On the
other hand, if the explicit references are from child to parent, the
crucial operation is no longer to find out whether any child node is
still in use (we do have that info in the node's reference count now),
the difficult operation is now to find out whether the parent is still
in use. This operation is unneeded however, as freeing is delegated to
the root node of the subtrees, so a node doesn't care whether there
might be new reference obtained to it by use of its parent node, as it
can be sure, the parent won't destroy it until it is sure that
references can no longer be obtained through it. This finally leads to
the solution I implemented. (I won't explain it here again in detail,
see below the PS for the explanation of that system)
Now, my question is how to proceed. The patches I posted seem to be
working fine and seem to clean up the reference management, for example,
they do away with the orphan list I introduced but still looks like a
hack to me. On the other hand, there are concerns that the new reference
management does not follow COM rules (although I couldn't find
definitive COM rules for cyclic structures, just the usual info that
reference cycles have to be avoided in some way). As I really want
problem a) to get fixed, I ask for advice.
Thanks for reading this quite long mail. I would be grateful for
PS: This mail does not cover the second, independent topic that we need
an IUnknown pointer with node lifetime stability. A basis for that is
also introduced in the new reference counting system. While lifetime
IUnknown and reference counting are two different things, it looked like
a good idea to have both things solved at once. To make use of the
provided lifetime IUnknown, the upper layer classes like XMLDOMNode,
XMLDOMDocument and the like have to be adjusted too, but I postponed
that part of work.
The node_unk object serves two purposes:
a) it provides a lifetime stable IUnknown for an XML node
b) it manages references
each xmlNode can carry a node_unk reference in its _private member.
If there is an interface pointer to the node floating around, the node
does have an associated node_unk object which is created and destroyed
lazily. Lazy allocation means: node_unk objects are created when first
needed, not at XML document creation time. The main advantage of it is
that one does not need to manually create the node_unk objects at each
place a xmlNode could be created.
Lazy destruction means that the node_unk object is *not* freed as soon
as its reference count drops down to zero, because purpose a) wouldn't
be fulfilled that way. A node_unk with no references are said to be in
Reference counting is done this way:
1) Parents do *not* hold references on their children
2a) If the child is not in standby mode, a reference is held in the parent.
2b) If there is no parent, a reference on the document is held instead.
2c) The document node is exempt from rule 2b), it doesn't hold a reference.
Rule 1 is obvious to prevent cyclic references. As children do not
destroy themselves on their own, there is no danger of stale child
Each node needs to indirectly reference its document, as strings like
node names can be stored in per-document dictionaries in libxml2. For nodes
being part of the document tree, this is accomplished by each node
referencing its parent via 2a, up to the root element, whose parent
is the document itself.
There is a complication: A document is fact is not a tree, but a forest!
The forest has one main tree, its root being the "document element", but
can have many further trees consisting of nodes created by createElement,
createText and alike functions, or by removing some subtrees from the
document tree, for example by removeChild or replaceChild. These trees are
called orphan trees, as they don't have a parent, opposed to the "document
element" whose parent is the document itself. The root node of the orphan
tree (having no parent) is holding a reference on the document by rule 2b.
As there is no way to get a reference to a node in an orphan tree except
for having a reference to another node in the same orphan tree, orphan
trees can be destroyed as soon as its root node enters standby mode.
This has the consequence that the document doesn't need to know about
the orphan trees itself, it just needs to know whether there are any of
them to prevent early destruction. This is implemented by the reference
from rule 2b. On the other hand, one can not free the main document tree
as long as any orphan tree is alive, as each orphan tree provides a
reference to the document which in turn provides a reference to its main
Finally, all nodes you can access from the document node are members
of the main document tree, which is kept alive as subtree of the document
node, so the document node does not have to hold the reference on anything.
So, the implementation works like this:
As soon, as the windows program asks for an interface pointer into some
XML document (we don't care about the source here), the node is equipped
with its node_unk structure, and so get all its parents. While it would be
possible to delay the allocation to the AddRef call, the AddRef call has
no way to indicate failure in the out-of-memory case. If the application
asks for further interface pointers into the same document, chains of
node_unk objects are created until a branch is met that already has them;
at that point, the reference count is just increased by one on that level
without further up-propagation.
The document itself (the parent of the document element) keeps out of
standby mode until all interface pointers to the document vanished.
At that point, the document frees itself and all children. Root node
of orphan trees work the same way.
The most interesting points are not covered yet: Subtrees can be spliced
from one document into another one, like with IXMLDOMNode::appendChild or
IXMLDOMNode::replaceChild. In that case, the simple old approach where
each IXMLDOMFoo interface held a reference on its document breaks down, as
it is unknown how many IXMLDOMFoo pointers are out there that get moved to
a different document behind their back. With this model, it is quite
simple. A tree that gets spliced out has either zero or one reference on
its direct parent, depending on whether it is in standby mode or not. A
tree that gets spliced in takes exactly one reference if it is not in
standby mode. As an added bonus, we get orphan tracking and early freeing
of orphans (a feature we didn't have in the old model) for free.
More information about the wine-devel