[Bug 29685] Manifest generator (ecmangen) tool from Windows Platform SDK 7.1 crashes due to unhandled facet/regular expression in XML schema (escape sequence)

WineHQ Bugzilla wine-bugs at winehq.org
Wed May 26 10:06:22 CDT 2021


https://bugs.winehq.org/show_bug.cgi?id=29685

Damjan Jovanovic <damjan.jov at gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |damjan.jov at gmail.com

--- Comment #6 from Damjan Jovanovic <damjan.jov at gmail.com> ---
That <xs:pattern> snippet can be used to reproduce this bug with the
command-line "xmllint" tool. Here's a quickly cobbled-together example.

x.xml:

---snip---
<?xml version="1.0"?>
<note
xmlns="https://www.w3schools.com"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="file:///tmp/x.xsd">
  <strTableRef>$(string.a)</strTableRef>
</note>
---snip---

/tmp/x.xsd:
(the uncommented <xs:pattern> line fails; the commented-out one below it works)

---snip---
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="https://www.w3schools.com"
xmlns="https://www.w3schools.com"
elementFormDefault="qualified">
<xs:element name="note">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="strTableRef">
        <xs:simpleType>
           <xs:restriction base="xs:string">
             <xs:pattern value="(\$\([Ss]tring\..*\))|(\$\([Mm][Cc]\..*\))"/>
<!--         <xs:pattern value="($\([Ss]tring\..*\))|($\([Mm][Cc]\..*\))"/> -->
           </xs:restriction>
        </xs:simpleType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>
</xs:schema>
---snip---

And to test:

---snip---
$ xmllint --schema /tmp/x.xsd x.xml --noout
regexp error : failed to compile: Wrong escape sequence, misuse of character '\'
regexp error : failed to compile: internal: no atom generated
regexp error : failed to compile: generate transition: atom == NULL
regexp error : failed to compile: xmlFAParseAtom: expecting ')'
regexp error : failed to compile: xmlFAParseRegExp: extra characters
x.xsd:13: element pattern: Schemas parser error : Element '{http://www.w3.org/2001/XMLSchema}pattern': The value '(\$\([Ss]tring\..*\))|(\$\([Mm][Cc]\..*\))' of the facet 'pattern' is not a valid regular expression.
WXS schema x.xsd failed to compile
---snip---

Swap the commented and uncommented lines in /tmp/x.xsd around, and:

---snip---
$ xmllint --schema /tmp/x.xsd x.xml --noout
x.xml validates
---snip---

Note the problem: MSXML is apparently ok with "\$" in the regex, but libxml2
treats that as an error; it never allows "$" after a "\".

As per Appendix F of
https://www.w3.org/TR/2004/REC-xmlschema-2-20041028/datatypes.html#regexs it
seems that "\$" really shouldn't be allowed, but MSXML allows it anyway; i.e.
MSXML uses a non-conforming regex dialect in which additional escaped
characters are allowed.
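
To make the Appendix F point concrete, here is a rough Python sketch that scans
a pattern for backslash escapes outside the set that grammar lists. The escape
sets are my reading of Appendix F, the function name is made up for
illustration, and I have not checked that libxml2 accepts exactly this set.

```python
# Escapes listed by the XML Schema Part 2 regex grammar (Appendix F),
# as I read it. Whether libxml2 matches this set exactly is an assumption.
SINGLE_CHAR_ESCAPES = set("nrt\\|.?*+(){}-[]^")
MULTI_CHAR_ESCAPES = set("sSiIcCdDwW")
CATEGORY_ESCAPES = set("pP")  # \p{...} and \P{...}


def find_nonconforming_escapes(pattern):
    """Return (offset, escape) pairs for escapes outside the sets above.

    Hypothetical helper, not part of libxml2 or MSXML.
    """
    allowed = SINGLE_CHAR_ESCAPES | MULTI_CHAR_ESCAPES | CATEGORY_ESCAPES
    found = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "\\":
            # A trailing backslash has no following character; flag it too.
            nxt = pattern[i + 1] if i + 1 < len(pattern) else ""
            if nxt not in allowed:
                found.append((i, "\\" + nxt))
            i += 2  # skip the escaped character so "\\" is not re-read
        else:
            i += 1
    return found


# The two <xs:pattern> values from /tmp/x.xsd above.
broken = r"(\$\([Ss]tring\..*\))|(\$\([Mm][Cc]\..*\))"
working = r"($\([Ss]tring\..*\))|($\([Mm][Cc]\..*\))"

print(find_nonconforming_escapes(broken))
print(find_nonconforming_escapes(working))
```

Only the two "\$" escapes in the broken pattern get flagged; "\(", "\." and
"\)" are all in the allowed set.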

Bug #43581 has the same issue with various "\u####" regex sequences and could
be considered a duplicate of this one.

While for this we could write a regex parser that rewrites MSXML's dialect into
libxml2's, regex in XML sadly doesn't end at schema validation, e.g. XSLT 2 uses
it as well (https://www.xml.com/pub/a/2003/06/04/tr.html). If MSXML uses the
same regex dialect for other things, and we can't change that regex in transit
between its origin and libxml2, then we may need to ship a private fork of
libxml2 patched to use MSXML's dialect internally.
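
For illustration only, the rewriting idea could look something like the Python
sketch below (a real fix in Wine would be C). It is not a proposed patch. It
assumes that an unknown escape such as "\$" simply means the literal character,
and that "\u####" in bug #43581 means exactly four hex digits naming a code
point; I have verified neither against MSXML. All names are made up.

```python
import re

# Escapes assumed to be accepted by libxml2 (XML Schema Part 2, Appendix F).
XSD_ESCAPES = set("nrt\\|.?*+(){}-[]^sSiIcCdDwWpP")
# Characters that can be written as "\X" in an XSD regex. A decoded \u####
# that lands on one of these is re-escaped so it stays a literal.
ESCAPABLE = set("\\|.?*+(){}-[]^")
FOUR_HEX = re.compile(r"[0-9A-Fa-f]{4}")


def rewrite_msxml_pattern(pattern):
    """Rewrite an (assumed) MSXML-dialect pattern into XSD regex syntax."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch != "\\" or i + 1 >= len(pattern):
            out.append(ch)  # ordinary character, or a trailing backslash
            i += 1
            continue
        nxt = pattern[i + 1]
        if nxt in XSD_ESCAPES:
            out.append(ch + nxt)  # already valid, keep as-is
            i += 2
        elif nxt == "u" and FOUR_HEX.fullmatch(pattern[i + 2:i + 6]):
            literal = chr(int(pattern[i + 2:i + 6], 16))
            out.append("\\" + literal if literal in ESCAPABLE else literal)
            i += 6
        else:
            # e.g. "\$": drop the backslash. A bare "$" is what the working
            # <xs:pattern> line above uses, and libxml2 accepts it.
            out.append(nxt)
            i += 2
    return "".join(out)


print(rewrite_msxml_pattern(r"(\$\([Ss]tring\..*\))|(\$\([Mm][Cc]\..*\))"))
```

For the broken pattern above this yields the same string as the working
(commented-out) line. It does nothing for callers that hand the regex to
libxml2 without passing through such a hook, which is the case the private
fork would have to cover.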


