<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>On 12/03/18 06:03, Nikolay Sivov wrote:<br>
</p>
<blockquote type="cite"
cite="mid:dd5f8d0b-dd0b-0bb1-e033-b543baa5402c@gmail.com">
<pre wrap="">On 3/12/2018 12:25 PM, Huw Davies wrote:
</pre>
<blockquote type="cite">
<pre wrap="">
</pre>
<blockquote type="cite">
<pre wrap="">+ LPARAM sort_handle)
+{
+
+ DWORD mask = flags;
+
+ TRACE("%s %x %s %d %s %d %p %p %p %ld\n", wine_dbgstr_w(localename), flags,
+ wine_dbgstr_w(src), src_size, wine_dbgstr_w(value), value_size, found,
+ version_info, reserved, sort_handle);
+ FIXME("strings should be normalized once NormalizeString() is implemented\n");
</pre>
</blockquote>
<pre wrap="">
I don't think we want the noise that this FIXME would generate. Just add a comment.
</pre>
</blockquote>
<pre wrap="">
Actually it might be possible that CompareString() handles decomposed
case on its own, I haven't tested that.
</pre>
</blockquote>
<br>
Yeah, you are right Nikolay; I just tested on Windows and it seems
that CompareString() shares the same comparison semantics as
FindNLSStringEx(). On Wine it fails, however, so I guess I'd code
FindNLSStringEx() assuming a working CompareString(), and then see
what is missing there.<br>
I actually had it like this in my first patch, relying on
CompareString() (assuming the shared semantics). In this v2 patch I
wanted to normalize first so that the substring search would be
worst case O(n) instead of O(n&middot;m). However, reading the Unicode
standard, it seems that I can make some assumptions about the
maximum expansion factor under canonical decomposition.<br>
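As a rough sketch of that trade-off: normalize both strings once up
front, then search the normalized copies, so the (potentially
expensive) normalization runs once per string rather than once per
candidate position. This is illustrative code only, not Wine's
implementation; since NormalizeString() is not implemented yet, ASCII
case folding stands in for canonical normalization, and the helper
names are mine:

```c
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

/* Stand-in for canonical normalization: ASCII lower-casing.  Real code
 * would call NormalizeString(); this only illustrates the call pattern. */
static void fold(const char *src, char *dst, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++)
        dst[i] = (char)tolower((unsigned char)src[i]);
    dst[len] = 0;
}

/* Normalize both strings once, then search the folded copies, instead of
 * re-normalizing the candidate substring at every position.
 * Returns the match offset, or -1 if not found / out of memory. */
static int find_folded(const char *hay, const char *needle)
{
    size_t hlen = strlen(hay), nlen = strlen(needle);
    char *hbuf = malloc(hlen + 1), *nbuf = malloc(nlen + 1);
    int ret = -1;

    if (hbuf && nbuf)
    {
        const char *p;
        fold(hay, hbuf, hlen);
        fold(needle, nbuf, nlen);
        p = strstr(hbuf, nbuf);
        if (p) ret = (int)(p - hbuf);
    }
    free(hbuf);
    free(nbuf);
    return ret;
}
```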
<br>
<i>"There is also a Unicode Consortium stability policy that
canonical mappings are always limited in all versions of Unicode,
so that no string when decomposed with NFC expands to more than 3×
in length (measured in code units). This is true whether the text
is in UTF-8, UTF-16, or UTF-32. This guarantee also allows for
certain optimizations in processing, especially in determining
buffer sizes"<br>
<br>
</i>Although the worst possible case is an 18&times; expansion factor
when using normalization form NFKC, these functions appear to test
only for canonical equivalence, so I guess it would be OK to assume
a worst case of 3&times; for the length to keep things O(n).<br>
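If that holds, the 3&times; stability guarantee lets the destination
buffer be sized up front in a single allocation. A hypothetical helper
(the name nfc_buffer_size is mine, not a Win32 or Wine API) might look
like this, with an overflow check on the multiplication:

```c
#include <stddef.h>

/* Per the Unicode stability policy quoted above, canonical (NFC/NFD)
 * normalization expands a string by at most 3x in code units, so a
 * destination of src_len * 3 code units is always large enough.
 * Returns 0 if the multiplication would overflow size_t. */
static size_t nfc_buffer_size(size_t src_len)
{
    if (src_len > (size_t)-1 / 3) return 0; /* would overflow */
    return src_len * 3;
}
```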
<br>
Does this sound right to you?<br>
</body>
</html>