<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>On 12/03/18 06:03, Nikolay Sivov wrote:<br>
</p>
<blockquote type="cite"
cite="mid:dd5f8d0b-dd0b-0bb1-e033-b543baa5402c@gmail.com">
<pre wrap="">On 3/12/2018 12:25 PM, Huw Davies wrote:
</pre>
<blockquote type="cite">
<pre wrap="">
</pre>
<blockquote type="cite">
<pre wrap="">+ LPARAM sort_handle)
+{
+
+ DWORD mask = flags;
+
+ TRACE("%s %x %s %d %s %d %p %p %p %ld\n", wine_dbgstr_w(localename), flags,
+ wine_dbgstr_w(src), src_size, wine_dbgstr_w(value), value_size, found,
+ version_info, reserved, sort_handle);
+ FIXME("strings should be normalized once NormalizeString() is implemented\n");
</pre>
</blockquote>
<pre wrap="">
I don't think we want the noise that this FIXME would generate. Just add a comment.
</pre>
</blockquote>
<pre wrap="">
Actually it might be possible that CompareString() handles decomposed
case on its own, I haven't tested that.
</pre>
</blockquote>
<br>
Yeah, you are right Nikolay; I just tested on Windows and it seems
that CompareString() shares the same comparison semantics as
FindNLSStringEx(). On Wine it fails, however, so I guess I'd code
FindNLSStringEx() assuming a working CompareString(), and then see
what is missing there.<br>
I actually had it like this in my first patch, relying on
CompareString() (assuming the shared semantics). In this v2 patch I
wanted to normalize first so that the substring search would be
worst case O(n) instead of O(n&middot;m). However, reading the Unicode
standard, it seems that I can make some assumptions about the
maximum expansion factor under canonical decomposition.<br>
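As a rough sketch of that trade-off: normalize both strings once up
front, then search the normalized copies, so the (potentially
expensive) normalization runs once per string rather than once per
candidate position. This is illustrative code only, not Wine's
implementation; since NormalizeString() is not implemented yet, ASCII
case folding stands in for canonical normalization, and the helper
names are mine:

```c
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

/* Stand-in for canonical normalization: ASCII lower-casing.  Real code
 * would call NormalizeString(); this only illustrates the call pattern. */
static void fold(const char *src, char *dst, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++)
        dst[i] = (char)tolower((unsigned char)src[i]);
    dst[len] = 0;
}

/* Normalize both strings once, then search the folded copies, instead of
 * re-normalizing the candidate substring at every position.
 * Returns the match offset, or -1 if not found / out of memory. */
static int find_folded(const char *hay, const char *needle)
{
    size_t hlen = strlen(hay), nlen = strlen(needle);
    char *hbuf = malloc(hlen + 1), *nbuf = malloc(nlen + 1);
    int ret = -1;

    if (hbuf && nbuf)
    {
        const char *p;
        fold(hay, hbuf, hlen);
        fold(needle, nbuf, nlen);
        p = strstr(hbuf, nbuf);
        if (p) ret = (int)(p - hbuf);
    }
    free(hbuf);
    free(nbuf);
    return ret;
}
```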
<br>
<i>"There is also a Unicode Consortium stability policy that
canonical mappings are always limited in all versions of Unicode,
so that no string when decomposed with NFC expands to more than 3×
in length (measured in code units). This is true whether the text
is in UTF-8, UTF-16, or UTF-32. This guarantee also allows for
certain optimizations in processing, especially in determining
buffer sizes"<br>
<br>
</i>Although the worst possible case is an 18&times; expansion factor
when using normalization form NFKC, these functions appear to test
only for canonical equivalence, so I guess it would be OK to assume
a worst case of 3&times; for the length to keep things O(n).<br>
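If that holds, the 3&times; stability guarantee lets the destination
buffer be sized up front in a single allocation. A hypothetical helper
(the name nfc_buffer_size is mine, not a Win32 or Wine API) might look
like this, with an overflow check on the multiplication:

```c
#include <stddef.h>

/* Per the Unicode stability policy quoted above, canonical (NFC/NFD)
 * normalization expands a string by at most 3x in code units, so a
 * destination of src_len * 3 code units is always large enough.
 * Returns 0 if the multiplication would overflow size_t. */
static size_t nfc_buffer_size(size_t src_len)
{
    if (src_len > (size_t)-1 / 3) return 0; /* would overflow */
    return src_len * 3;
}
```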
<br>
Does this sound right to you?<br>
</body>
</html>