RFC: Implement XML loading in msxml3 directly from IStream

Michael Karcher wine at mkarcher.dialup.fu-berlin.de
Wed Oct 22 04:50:11 CDT 2008


On Friday, 17.10.2008, at 00:46 +0200, Michael Karcher wrote:
> If libxml2 is new enough, it is possible to parse directly from the
> stream using xmlReadIO to avoid copying the whole stream contents to
> memory.
Testing native, it looks like this is the right approach. Piotr Caban
asked me whether it works with asynchronous streams. MSDN confusingly
describes three types of streams, two of which look the same to the
consumer:
 - Synchronous streams:
  Have low latency read calls, no network access needed after creation.
  Examples are files and memory blocks. No short reads (read returning
  fewer bytes than requested).
 - Blocking asynchronous streams:
  May have high latency read calls, but read never fails with E_PENDING.
  The contents of the stream may not be locally available before Read
  is called on the respective part of data. Examples are downloading
  streams (either a direct http download or a downloaded compound
  document as asynchronous storage).
 - Non-blocking asynchronous streams:
  Have low latency read calls. If no data is available, read fails with
  E_PENDING. They are useless on their own, as the pull model doesn't
  work without polling for data and they don't provide a push model (for
  reading; it is the other way round for writing). They are used in
  asynchronous binding, where an IBindStatusCallback of the consumer
  provides the needed push model.
[see http://msdn.microsoft.com/en-us/library/aa768185(VS.85).aspx,
"Storage of Control persistent Data"]
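
To make the distinction concrete, here is a minimal sketch in plain C.
It does not use real COM: mock_stream, mock_read, read_all and the
read_result codes are names I made up for illustration, and the
"waiting" of a blocking stream is only simulated. It shows that a simple
pull loop works unchanged for the first two stream types and gets stuck
on the third.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative result codes; the comments name the result each one models. */
enum read_result {
    READ_OK,       /* S_OK: some bytes were delivered */
    READ_EOF,      /* end of stream reached */
    READ_PENDING   /* E_PENDING: nothing available right now */
};

/* Hypothetical mock, not a real IStream. */
struct mock_stream {
    const char *data;
    size_t size;       /* total length of the content */
    size_t pos;        /* current read position */
    size_t arrived;    /* bytes that have "arrived" locally so far */
    int non_blocking;  /* nonzero: report READ_PENDING instead of waiting */
};

static enum read_result mock_read(struct mock_stream *s, char *buf,
                                  size_t len, size_t *got)
{
    size_t n;

    assert(s->pos <= s->arrived && s->arrived <= s->size);
    *got = 0;
    if (s->pos == s->size)
        return READ_EOF;
    if (s->pos == s->arrived) {
        if (s->non_blocking)
            return READ_PENDING;
        /* A blocking stream would wait here; the mock just pretends
         * that the rest of the data has arrived. */
        s->arrived = s->size;
    }
    n = s->arrived - s->pos;
    if (n > len)
        n = len;
    memcpy(buf, s->data + s->pos, n);
    s->pos += n;
    *got = n;
    return READ_OK;
}

/* Plain pull-model consumer: read until EOF, a full buffer, or PENDING. */
static enum read_result read_all(struct mock_stream *s, char *out,
                                 size_t cap, size_t *total)
{
    *total = 0;
    for (;;) {
        size_t got;
        enum read_result r = mock_read(s, out + *total, cap - *total, &got);

        *total += got;
        if (r != READ_OK)
            return r;
        if (*total == cap)
            return READ_OK;
    }
}
```

With a synchronous or blocking stream read_all ends with READ_EOF; with
a non-blocking one it stops at READ_PENDING and has no way to learn when
to retry, short of polling.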

The consumer of a stream typically doesn't care whether the stream is
synchronous or blocking asynchronous. The IPersistStream interface of
DOMDocument behaves oddly when given a non-blocking asynchronous
stream: it reads until it gets two consecutive E_PENDING results, then
returns S_OK with readyState set to INTERACTIVE. That is probably the
correct behaviour for non-blocking asynchronous loads, but the native
implementation does not seem to try to obtain any push capability from
the stream, so there seems to be no way to push further data into the
object.

I suspect that this is a kind of "don't do this" undefined behaviour.
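
For reference, the behaviour I observed can be modelled like this. It is
only a sketch of what the tests suggest, not native's actual code:
load_until_double_pending, script_read and the read_result codes are
hypothetical names. I am also assuming that a successful read resets the
E_PENDING counter (that is how I read "consecutive") and that reaching
the end of the stream gives COMPLETE.

```c
#include <assert.h>
#include <stddef.h>

enum read_result { READ_OK, READ_EOF, READ_PENDING };

/* Same numeric values as the READYSTATE enumeration in the Windows headers. */
enum ready_state {
    READYSTATE_LOADING = 1,
    READYSTATE_LOADED = 2,
    READYSTATE_INTERACTIVE = 3,
    READYSTATE_COMPLETE = 4
};

typedef enum read_result (*read_fn)(void *ctx);

/* Read until EOF or until two reads in a row report "pending". */
static enum ready_state load_until_double_pending(read_fn read, void *ctx,
                                                  int *reads)
{
    int pending_in_a_row = 0;

    assert(read != NULL && reads != NULL);
    *reads = 0;
    for (;;) {
        enum read_result r = read(ctx);

        ++*reads;
        if (r == READ_EOF)
            return READYSTATE_COMPLETE;       /* assumption, see above */
        if (r == READ_PENDING) {
            if (++pending_in_a_row == 2)
                return READYSTATE_INTERACTIVE; /* gives up, data incomplete */
        } else {
            pending_in_a_row = 0;              /* assumption: counter resets */
        }
    }
}

/* Hypothetical scripted stream that replays a fixed list of results. */
struct script {
    const enum read_result *steps;
    size_t count;
    size_t next;
};

static enum read_result script_read(void *ctx)
{
    struct script *s = ctx;

    return s->next < s->count ? s->steps[s->next++] : READ_EOF;
}
```

Anything the stream would have delivered after the second E_PENDING is
never requested, which is why the object ends up stuck in INTERACTIVE.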

Another issue is that with asynchronous loading via IPersistMoniker,
native exposes a DOM tree that is filled in progressively while the
document is being loaded. The tree should be read-only during that time
(I haven't tested that yet), and native provides callbacks for the
arrival of further data (I haven't tested those yet either). libxml2
does not seem to provide an incremental tree builder (even with the push
interface), so it looks like we need our own tree-building code based on
libxml2's SAX interface. Any other ideas?
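
A rough sketch of what such a tree builder could look like, in plain C
without libxml2. The callbacks only resemble libxml2's SAX startElement
and endElement: they take const char * instead of xmlChar * and ignore
attributes, and struct node, struct builder and all function names are
made up for illustration. The point is that the tree is linked together
as events arrive, so a reader can look at the partial tree between two
events.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct node {
    char *name;
    struct node *parent, *first_child, *last_child, *next;
    int closed;              /* set once the end tag has been seen */
};

struct builder {
    struct node *root;       /* NULL until the first start tag */
    struct node *current;    /* innermost element that is still open */
    int failed;              /* set on out-of-memory or malformed input */
};

/* Start tag: create the node and link it into the tree right away. */
static void on_start_element(void *ctx, const char *name)
{
    struct builder *b = ctx;
    struct node *n;

    assert(b != NULL && name != NULL);
    if (b->failed)
        return;
    if (!b->current && b->root) {    /* element after the root was closed */
        b->failed = 1;
        return;
    }
    n = calloc(1, sizeof *n);
    if (n)
        n->name = malloc(strlen(name) + 1);
    if (!n || !n->name) {
        free(n);
        b->failed = 1;
        return;
    }
    strcpy(n->name, name);
    n->parent = b->current;
    if (!b->current) {
        b->root = n;
    } else if (b->current->last_child) {
        b->current->last_child->next = n;
        b->current->last_child = n;
    } else {
        b->current->first_child = b->current->last_child = n;
    }
    b->current = n;
}

/* End tag: mark the innermost open element as complete and step up. */
static void on_end_element(void *ctx, const char *name)
{
    struct builder *b = ctx;

    (void)name;              /* the sketch trusts the parser's nesting */
    if (b->failed || !b->current)
        return;
    b->current->closed = 1;
    b->current = b->current->parent;
}

static size_t count_children(const struct node *n)
{
    size_t count = 0;
    const struct node *c;

    for (c = n->first_child; c; c = c->next)
        count++;
    return count;
}

static void free_tree(struct node *n)
{
    struct node *c, *next;

    if (!n)
        return;
    for (c = n->first_child; c; c = next) {
        next = c->next;
        free_tree(c);
    }
    free(n->name);
    free(n);
}
```

In a real implementation the events would presumably come from
libxml2's push parser (xmlCreatePushParserCtxt / xmlParseChunk) being
fed as data arrives; text nodes, attributes and locking against
concurrent readers are left out here.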

> I attached two diff files,
>  - load-via-readio.diff shows the way that calls stream callbacks from
> libxml2
Looks like the way to go.
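
For readers without the diff at hand, the idea is roughly the following.
This is not the code from load-via-readio.diff, just a sketch: the
callbacks have the shape libxml2 expects for xmlReadIO (read returns the
number of bytes, 0 at end of input, -1 on error), but struct fake_stream
and fake_stream_read are hypothetical stand-ins for IStream and
IStream_Read so that the example compiles without Wine or libxml2.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an IStream. */
struct fake_stream {
    const char *data;
    size_t size, pos;
    size_t max_chunk;   /* largest chunk one read returns (short reads) */
    int fail;           /* nonzero: every read reports an error */
};

/* Stand-in for IStream_Read: returns 0 on success, nonzero on failure. */
static int fake_stream_read(struct fake_stream *s, void *buf,
                            unsigned long len, unsigned long *got)
{
    size_t n;

    *got = 0;
    if (s->fail)
        return -1;
    n = s->size - s->pos;
    if (n > len)
        n = len;
    if (n > s->max_chunk)
        n = s->max_chunk;
    memcpy(buf, s->data + s->pos, n);
    s->pos += n;
    *got = (unsigned long)n;
    return 0;
}

/* Read callback in the form of libxml2's xmlInputReadCallback. */
static int stream_read_cb(void *context, char *buffer, int len)
{
    struct fake_stream *s = context;
    unsigned long got = 0;

    assert(s != NULL);
    if (len < 0)
        return -1;
    if (fake_stream_read(s, buffer, (unsigned long)len, &got) != 0)
        return -1;      /* stream error becomes a libxml2 I/O error */
    return (int)got;    /* 0 tells the parser the input has ended */
}

/* Close callback in the form of xmlInputCloseCallback; nothing to do. */
static int stream_close_cb(void *context)
{
    (void)context;
    return 0;
}

/* With libxml2 available, the parse call would look roughly like:
 *   xmlDocPtr doc = xmlReadIO(stream_read_cb, stream_close_cb,
 *                             &stream, NULL, NULL, 0);
 */
```

As far as I can tell, short reads are not a problem for this model,
since the parser simply calls the read callback again until it gets 0.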

>  - load-use-copyto.diff uses the CopyTo function, but still copies.
Bad idea. Some applications might pass in a dumb stream that does not
implement CopyTo.

Regards,
  Michael Karcher



