KERNEL32: add a test case for CompareStringW undocumented flag
wine-devel at shemesh.biz
Mon Nov 22 02:49:28 CST 2004
Dmitry Timoshkov wrote:
>"Mike McCormack" <mike at codeweavers.com> wrote:
>>The flag (0x10000000) passed to CompareString reverses the sort order of
>>a number of Unicode characters. I've got no idea why it would want to
>>do that... maybe somebody can shed some light on what the reason behind
>>this would be?
>Just a shot in the dark: perhaps the flag is supposed to force CompareString
>to perform character reordering first (taking into account bidirectional layout)
>and only then do an actual string comparison?
A. BiDi strings are compared in logical order. Reordering is just about
the last thing you do before display.
B. This changes the greater than/less than semantics, not the order.
Then again, this does not make sense any way you turn it.
The regression test shows that table 1 is lower than table 2 with the
flag, higher without. Let's look at it:
table 1 has three characters in various forms. These are the Arabic
"Shadda" and two CJK marks (the prolonged sound mark, and the iteration
and sound iteration marks).
table 2 has quite a few characters. Taking the range I know well
(Hebrew - U0590-U05FF), it has all the diacritics and "Ta'amim" marks.
This makes no sense. I even asked on the Arabeyes project's IRC channel.
Comparing Arabic and Hebrew is totally meaningless. The other characters
in table 2 belong to the following blocks and scripts:
Spacing modifier letters (U02B0-U02FF), Combining diacritical marks
(U0300-U036F), Greek (three characters, all marked as "reserved"),
Cyrillic, Devanagari, Bengali, Gurmukhi, and that's where I gave up on
looking up the names of languages I didn't even know existed. I will
mention "Combining marks for symbols", though, which I think is crucial
to understanding this.
I will also mention "CJK Symbols and Punctuation", range U3000-U303F,
and the Hiragana range U3040-U309F.
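
A quick way to look at these ranges without a Unicode chart at hand is the
Python sketch below. It is not part of the Wine test and does not know which
code points are actually in table 2; it only walks the ranges that have
numbers attached above and tallies the general category of every assigned
code point. The range labels and the helper name category_counts are made up
for this illustration, and the counts depend on the Unicode version your
Python ships with.

```python
import unicodedata
from collections import Counter

# Ranges named in the post (inclusive). The labels are informal and
# chosen for this sketch only.
RANGES = {
    "Spacing modifier letters": (0x02B0, 0x02FF),
    "Combining diacritical marks": (0x0300, 0x036F),
    "Hebrew": (0x0590, 0x05FF),
    "CJK symbols and punctuation": (0x3000, 0x303F),
    "Hiragana": (0x3040, 0x309F),
}

def category_counts(first, last):
    """Count general categories of assigned code points in [first, last]."""
    counts = Counter()
    for cp in range(first, last + 1):
        cat = unicodedata.category(chr(cp))
        if cat != "Cn":  # "Cn" means unassigned; skip those
            counts[cat] += 1
    return counts

if __name__ == "__main__":
    for label, (first, last) in RANGES.items():
        counts = category_counts(first, last)
        # Categories Mn, Mc and Me are the combining marks.
        marks = sum(n for cat, n in counts.items() if cat.startswith("M"))
        total = sum(counts.values())
        print(f"{label}: {marks} combining marks out of {total} assigned")
```

On a recent Python the Combining diacritical marks range comes out as all
nonspacing marks (Mn), while the Spacing modifier range holds only modifier
letters and symbols (Lm, Sk). So the output says something about the ranges,
not about which characters the table picks out of them.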
Now, here's the thing. ALL the symbols in table 2 are diacritic or
punctuation symbols written either below or above the letter. They are
combining marks which do not change the letter's width. On the other
hand, Shadda in Arabic means to double the pronunciation of the
character it combines with. In other words, this undocumented flag means
that letters that are doubled in Arabic should come after other
languages' diacritics. It's still apples and oranges, but maybe we have
a clue as to "why". Does anyone here know what the CJK marks mean?
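
As a starting point for the CJK question, here is a small Python sketch that
prints the Unicode name, general category and canonical combining class of
Shadda and of three marks that match the descriptions given earlier. The CJK
code points are an assumption: table 1 is not quoted in this mail, so U+30FC,
U+309D and U+309E are guesses based on the names alone. The helper name
describe is made up for this illustration.

```python
import unicodedata

# U+0651 is Arabic Shadda. The three CJK code points are guesses that
# match the names used in the post; table 1 itself is not quoted there.
CANDIDATES = [0x0651, 0x30FC, 0x309D, 0x309E]

def describe(cp):
    """Return (name, general category, canonical combining class)."""
    ch = chr(cp)
    return (unicodedata.name(ch),
            unicodedata.category(ch),
            unicodedata.combining(ch))

if __name__ == "__main__":
    for cp in CANDIDATES:
        name, cat, ccc = describe(cp)
        print(f"U+{cp:04X} {name}: category {cat}, combining class {ccc}")
```

Shadda is reported as a nonspacing mark (Mn) with a non-zero combining class,
whereas the three Japanese candidates are modifier letters (Lm) with
combining class 0. If those really are the characters in table 1, the CJK
marks sit on the baseline as characters of their own rather than attaching
to a base letter the way Shadda does.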
Out of interest, could it be that U0A01 needs to be added to table 2? If
so, we may have a solution to what this flag means. Mike, can you test
Lingnu Open Source Consulting ltd.