KERNEL32: add a test case for CompareStringW undocumented flag 0x10000000

Shachar Shemesh wine-devel at shemesh.biz
Mon Nov 22 02:49:28 CST 2004


Dmitry Timoshkov wrote:

>"Mike McCormack" <mike at codeweavers.com> wrote:
>
>>The flag (0x10000000) passed to CompareString reverses the sort order of 
>>a number of Unicode characters.  I've got no idea why it would want to 
>>do that... maybe somebody can shed some light on what the reason behind 
>>this would be?
>
>Just a shot in the dark: perhaps the flag is supposed to force CompareString
>to perform character reordering first (taking into account bidirectional
>layout) and only then do the actual string comparison?
>
A. BiDi strings are compared in logical order. Reordering is just about 
the last thing you do before display.
B. The flag changes the greater than/less than result of the comparison, 
not the order of the characters.

Then again, this does not make sense any way you turn it.
The regression test shows that the characters in table 1 compare lower 
than those in table 2 with the flag, and higher without it. Let's look 
at the tables:
Table 1 has three characters in various forms. These are the Arabic 
"Shadda" and two CJK marks (the prolonged sound mark, and the iteration 
and sound iteration marks).
Table 2 has quite a few characters. Taking the range I know well 
(Hebrew, U0590-U05FF), it has all the diacritics and "Ta'amim" 
(cantillation) marks.

This makes no sense. I even asked on the Arabeyes project's IRC channel. 
Comparing Arabic and Hebrew is totally meaningless. The other characters 
in table 2 belong to the following scripts and ranges:
Spacing modifier letters (U02B0-U02FF), Combining diacritical marks 
(U0300-U036F), Greek (three characters, all marked as "reserved"), 
Cyrillic, Devanagari, Bengali, Gurmukhi, and that's where I gave up on 
looking up the names of languages I didn't even know existed. I will 
mention "Combining marks for symbols", though, which I think is crucial 
to understanding this.

I will also mention "CJK Symbols and Punctuation", range U3000-U303F, 
and the Hiragana range U3040-U309F.
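
For anyone who wants to repeat the lookup I did by hand, here is a small 
Python sketch. It only knows the ranges quoted in this message, with the 
names as I wrote them; `BLOCKS` and `block_of` are hypothetical helper 
names, and the table is deliberately incomplete (Cyrillic, Devanagari 
and the rest are not listed because I did not quote their ranges).

```python
# Ranges exactly as quoted in this message; this is NOT a full Unicode
# block table, just the handful of ranges mentioned in the text.
BLOCKS = [
    (0x02B0, 0x02FF, "Spacing modifier letters"),
    (0x0300, 0x036F, "Combining diacritical marks"),
    (0x0590, 0x05FF, "Hebrew"),
    (0x3000, 0x303F, "CJK Symbols and Punctuation"),
    (0x3040, 0x309F, "Hiragana"),
]


def block_of(code_point):
    """Return the name of the quoted range containing code_point, or None."""
    for first, last, name in BLOCKS:
        if first <= code_point <= last:
            return name
    return None


if __name__ == "__main__":
    # Sample code points picked for illustration, not copied from the test.
    for cp in (0x02B0, 0x0301, 0x05B0, 0x3005, 0x309D, 0x0041):
        print(f"U+{cp:04X}: {block_of(cp)}")
```

A code point outside the quoted ranges simply comes back as None, which 
is a reminder that the list above is partial.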

Now, here's the thing. ALL the symbols in table 2 are diacritics or 
punctuation marks written either below or above the letter. They are 
combining marks which do not change the letter's width. On the other 
hand, Shadda in Arabic means that the letter it combines with is 
pronounced doubled. In other words, this undocumented flag means 
that letters that are doubled in Arabic should come after other 
languages' diacritics. It's still apples and oranges, but maybe we have 
a clue as to "why". Does anyone here know what the CJK marks mean?
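
One way to check the "combining mark" observation without reading the 
charts by eye is to ask the Unicode character database. The sketch below 
uses Python's standard `unicodedata` module. The sample code points are 
my own picks from the ranges discussed here (Shadda, one Hebrew point, 
one combining accent, and two CJK marks); they are an assumption for 
illustration and are not copied from the tables in the regression test. 
`describe` is a hypothetical helper name.

```python
import unicodedata


def describe(code_point):
    """Return (name, general category, canonical combining class)."""
    ch = chr(code_point)
    return (
        unicodedata.name(ch, "<unnamed>"),
        unicodedata.category(ch),   # e.g. "Mn" = non-spacing mark
        unicodedata.combining(ch),  # 0 means the character does not reorder
    )


# Illustrative samples only; the real table contents live in the patch.
SAMPLES = [0x0651, 0x05B0, 0x0301, 0x30FC, 0x309D]

if __name__ == "__main__":
    for cp in SAMPLES:
        name, category, ccc = describe(cp)
        print(f"U+{cp:04X}  {category}  ccc={ccc:3d}  {name}")
```

Running the same loop over the full contents of both tables would show 
whether the split really follows the general category, or whether 
something else separates table 1 from table 2. The results depend on the 
Unicode version bundled with the Python build.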

Out of interest, could it be that U0A01 needs to be added to table 2? If 
so, we may have a solution to what this flag means. Mike, can you test 
it out?

          Shachar

-- 
Shachar Shemesh
Lingnu Open Source Consulting ltd.
http://www.lingnu.com/



