Help with LCM_Unicode_LUT in ole2nls.c wanted

Ove Kaaven ovehk at ping.uio.no
Sun May 5 11:21:02 CDT 2002


On Sun, 5 May 2002, Uwe Bonnes wrote:

> looking at LCMapStringW, I think we need some table like the
> LCM_Unicode_LUT[] table. However 
> - I don't understand where the values come from. Odd values seem to be a
>   collation of flags, even values to be some character weight and
>   LCM_Diacritic_LUT[] is some weight for the diacritic. 

Pretty much. The first value in the Unicode_LUT pairs seems to be what
I've previously identified (in my reverse engineering of cp_xxx.nls files)
as the sort class, the second as the sort weight, and the Diacritic_LUT is
the diacritic weight. (Case weights also exist; they are not in those tables,
but the case weight is pretty much isupper(x) ? 18 : 2, so no table is
used there.)
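
As a minimal sketch of that case-weight rule (the function name is made up for illustration and is not taken from ole2nls.c; iswupper is used here only because LCMapStringW works on wide characters):

```c
#include <wctype.h>

/* Hypothetical helper, not Wine's actual code: returns the case
 * weight for one character using the approximation given above,
 * 18 for upper case and 2 for everything else. */
unsigned char case_weight(wint_t ch)
{
    return iswupper(ch) ? 18 : 2;
}
```

Since this is only "pretty much" the rule, real data may well contain exceptions that such a function would miss.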

Sort classes I've identified before:
2 = decomposed sort (e.g. "ß" is sorted as "ss", "þ" is sorted as "th")
(sort weight is used as index into decomposition table in cp_xxx.nls)
6 = control characters, hyphens (stuff that's ignored if SORT_STRINGSORT
is not specified)
7 = separators
8 = math symbols
10 = symbols
12 = numbers
14 = letters
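
For reference, the class values above could be written down roughly like this. The enumerator and function names are invented for this sketch; only the numeric values and the SORT_STRINGSORT behaviour of class 6 come from the list above.

```c
#include <stdbool.h>

/* Sort class values as listed above. The names are assumptions made
 * for this sketch; they do not come from Wine or Windows. */
enum sort_class {
    CLASS_DECOMPOSED = 2,   /* sort weight indexes a decomposition table */
    CLASS_IGNORABLE  = 6,   /* control characters, hyphens */
    CLASS_SEPARATOR  = 7,
    CLASS_MATH       = 8,
    CLASS_SYMBOL     = 10,
    CLASS_NUMBER     = 12,
    CLASS_LETTER     = 14
};

/* Returns true if a character of this class is ignored when sorting.
 * string_sort stands for the caller having specified SORT_STRINGSORT. */
bool class_is_skipped(enum sort_class cls, bool string_sort)
{
    return cls == CLASS_IGNORABLE && !string_sort;
}
```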

All weights and classes start at 2 simply because they're used in the sort
keys generated by LCMapString, which are strings where 0 is the
null terminator and 1 is the field separator.
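
To illustrate why the values 0 and 1 have to stay reserved, here is a rough sketch of joining weight fields into such a key. The function name, the signature, and the number and order of fields are assumptions for the example, not the actual LCMapString layout.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: concatenate `count` weight fields into `out`,
 * writing 1 between fields and 0 at the end. Returns the key length
 * (excluding the terminator), or -1 if the buffer is too small.
 * Every weight byte is expected to be >= 2, so it can never be
 * mistaken for the separator or the terminator. */
int build_key(unsigned char *out, size_t out_size,
              const unsigned char *const *fields,
              const size_t *lens, size_t count)
{
    size_t pos = 0;

    for (size_t i = 0; i < count; i++) {
        if (i > 0) {
            if (pos + 1 >= out_size)
                return -1;
            out[pos++] = 1;             /* field separator */
        }
        if (pos + lens[i] >= out_size)  /* keep room for the terminator */
            return -1;
        memcpy(out + pos, fields[i], lens[i]);
        pos += lens[i];
    }
    if (pos >= out_size)
        return -1;
    out[pos] = 0;                       /* null terminator */
    return (int)pos;
}
```

Because no weight is ever 0, the result can be treated as an ordinary C string and two keys can be compared bytewise.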

> - Do the tables in ../wine/unicode somehow contain enough information to
>   generate these tables?

The UnicodeData.txt you can get from ftp.unicode.org contains data that
you can use for the sort class, case weight, and maybe diacritic weight,
but not sort weight, since that's locale-dependent; you need a sort table
for each locale. (I think Windows deals with it by having a big table of
default sort weights, then each locale has a table of "exceptions" that's
patched into the big table at run-time...)
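
If that guess is right, the patching step might look something like the sketch below. Everything here (the struct, the function, the 256-entry table size) is hypothetical and only illustrates the idea of a default table overridden by per-locale exceptions; it says nothing about the real data format.

```c
#include <stddef.h>
#include <string.h>

/* Tiny table for illustration only; a real table would have to
 * cover far more code points. */
#define TABLE_SIZE 256

/* Hypothetical exception record: one character and its
 * locale-specific sort weight. */
struct weight_exception {
    unsigned short code;
    unsigned char  weight;
};

/* Copy the default weights into dst, then overwrite the entries
 * listed in the locale's exception table. Entries outside the
 * table are silently ignored. */
void build_locale_table(unsigned char *dst, const unsigned char *defaults,
                        const struct weight_exception *exc, size_t n_exc)
{
    memcpy(dst, defaults, TABLE_SIZE);
    for (size_t i = 0; i < n_exc; i++) {
        if (exc[i].code < TABLE_SIZE)
            dst[exc[i].code] = exc[i].weight;
    }
}
```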

Unfortunately, I'm not aware of a source for such sort weight data.



