Debugging string comparison problem

Dmitry Timoshkov dmitry at codeweavers.com
Wed Jun 28 02:44:24 CDT 2006

```
----- Original Message -----
From: "Juan Lang" <juan_lang at yahoo.com>
To: <wine-devel at winehq.org>
Sent: Wednesday, June 28, 2006 12:20 AM
Subject: Debugging string comparison problem

> I'm trying to figure out why CompareStringA returns CSTR_EQUAL for the strings "\1" and "\2".  (See bug 5469, and the
> todo_wine test case in dlls/kernel/tests/locale.c)
>
> CompareStringA does the usual thing, calls MultiByteToWideChar and calls CompareStringW.  So CompareStringW is
> comparing L"\0001" to L"\0002".
>
> CompareStringW calls wine_compare_string, in libs/unicode/sortkey.c.  That calls compare_unicode_weights, which has
> this little bit of code:
>        ce1 = collation_table[collation_table[*str1 >> 8] + (*str1 & 0xff)];
>        ce2 = collation_table[collation_table[*str2 >> 8] + (*str2 & 0xff)];
>
> With the strings L"\0001" and L"\0002", *str1 is 0x0001, and *str2 is 0x0002.  So *str1 >> 8 is 0, and *str2 >> 8 is
> 0.  *str1 & 0xff is 0x01, *str2 & 0xff is 0x02.  So, ce1 == collation_table[1], which is 0x00000300 (in collation.c),
> and ce2 == collation_table[2], which is 0x00000400.
>
> That gets us here:
>        if (ce1 != (unsigned int)-1 && ce2 != (unsigned int)-1)
>            ret = (ce1 >> 16) - (ce2 >> 16);
>        else
>            ret = *str1 - *str2;
>
> Well, 0x00000300 >> 16 is 0, and so is 0x00000400 >> 16, so (ce1 >> 16) - (ce2 >> 16) is 0, and these strings are
> considered equal.  But as the test case shows, they're not supposed to be.
>
> I'm just not sure what to do about it.  Changing collation.c isn't really
> an option, since it's generated.  So there's some flaw in the logic here,
> but I don't understand the meaning of collation_table.  Could someone explain
> to me what it is?

That's really a problem with collation.c, or rather with the way it's been generated
from www.unicode.org/reports/tr10/allkeys.txt. There are a lot of differences
between that file and Microsoft's implementation. We have some hacks in CrossOver
to compensate for them; what I did was simply fix up the allkeys.txt from
unicode.org and regenerate collation.c.

--
Dmitry.

```
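
The arithmetic in the quoted message can be reproduced with a small stand-alone sketch. This is not Wine's code: `toy_table`, `init_toy_table`, `toy_lookup` and `toy_compare_primary` are hypothetical names, and the table layout (256 block offsets followed by 256-entry blocks) is an assumption that only mimics the two-level lookup quoted above. The two entries 0x00000300 and 0x00000400 are the values reported in the message; every other entry is set to `(unsigned int)-1` for illustration.

```c
#include <assert.h>

/* Toy two-level table, NOT Wine's real collation_table (assumed layout):
 *   [0..255]    offset of the 256-entry block for each high byte
 *   [256..511]  block for high byte 0x00
 *   [512..767]  shared "no entry" block for every other high byte
 */
static unsigned int toy_table[768];

static void init_toy_table(void)
{
    int i;

    toy_table[0] = 256;                        /* high byte 0x00 -> first block */
    for (i = 1; i < 256; i++)
        toy_table[i] = 512;                    /* everything else -> empty block */
    for (i = 256; i < 768; i++)
        toy_table[i] = (unsigned int)-1;       /* "no collation element" marker */

    /* The two values reported in the message for U+0001 and U+0002. */
    toy_table[256 + 0x01] = 0x00000300;
    toy_table[256 + 0x02] = 0x00000400;
}

/* Same indexing shape as the quoted ce1/ce2 lines. */
static unsigned int toy_lookup(unsigned short ch)
{
    unsigned int index = toy_table[ch >> 8] + (ch & 0xff);

    assert(index < sizeof(toy_table) / sizeof(toy_table[0]));
    return toy_table[index];
}

/* Same decision as the quoted if/else: compare the upper 16 bits when both
 * characters have an entry, otherwise fall back to the code units. */
static int toy_compare_primary(unsigned short c1, unsigned short c2)
{
    unsigned int ce1 = toy_lookup(c1);
    unsigned int ce2 = toy_lookup(c2);

    if (ce1 != (unsigned int)-1 && ce2 != (unsigned int)-1)
        return (int)(ce1 >> 16) - (int)(ce2 >> 16);
    return (int)c1 - (int)c2;
}
```

After `init_toy_table()`, `toy_compare_primary(0x0001, 0x0002)` returns 0 because both entries have zero in their upper 16 bits, even though the full 32-bit values differ. Characters with no entry in this toy table, such as 0x0041 and 0x0042, take the fallback branch and compare by code unit.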