[PATCH v3 3/3] xmllite: Expand test for any unparsed data at end of XML.

Sat Dec 7 00:15:42 CST 2019

On Fri, Dec 6, 2019 at 4:19 PM Nikolay Sivov <nsivov at codeweavers.com> wrote:
>
> On 12/7/19 12:24 AM, Jeff Smith wrote:
> > On Fri, Dec 6, 2019 at 11:16 AM Nikolay Sivov <nsivov at codeweavers.com> wrote:
> >> On 12/5/19 10:53 PM, Jeff Smith wrote:
> >>> @@ -2662,7 +2663,7 @@ static HRESULT reader_parse_nextnode(xmlreader *reader)
> >>>                hr = reader_parse_misc(reader);
> >>>                if (hr != S_FALSE) return hr;
> >>>
> >>> -            if (*reader_get_ptr(reader))
> >>> +            if (buffer->cur*sizeof(WCHAR) < buffer->written)
> >>>                {
> >>>                    WARN("found garbage in the end of XML\n");
> >>>                    return WC_E_SYNTAX;
> > Hi Nikolay,
> >
> >> That means we don't have enough data,
> > How do you figure that?
> >
> >> it's another change not backed by tests
> > This fixes two tests, and does not break any others.
> >
> >> and potentially depending on current read-ahead buffer size/filled level.
> > I'm pretty sure reader_parse_misc would have read at least one byte
> > ahead, which is all that is required for this to trigger, though I
> > could double-check that.
> > However, to your point made in the patch 2 of the set about not
> > exposing the buffer at this level, I will also consider this something
> > that potentially needs to be handled elsewhere.
> My point is that we should always hit this single invalid syntax/garbage
> at the end condition that we already have,

That garbage-at-the-end condition, as it exists, is explicitly NOT
triggered by a null character, but it should.
So what we have here currently is not sufficient.
While there may be cases that my patch does not cover, based on
existing test cases, it is an improvement.

> instead of doing fixups for specific node types.

On Windows, the context in which the null character is encountered is
significant.
For instance, if any character other than '<' is encountered at the
end of a whitespace sequence, it raises the syntax error without
returning a Whitespace node.
So we need to catch the invalid character and interrupt before a
Whitespace node is returned.