Unicode normalization for Wine

Wed Jul 26 07:38:01 CDT 2017

Hello,

On 7/25/17 4:33 PM, Artur Świgoń wrote:
> Dear All,
> 
> My name is Artur and I'm participating in Google Summer of Code 2017 for Wine.
> Under Nikolay's supervision, I'm working on implementation of Unicode
> normalization. I probably should have introduced myself some time ago to share
> results of my research and my ideas, but I also wanted to wait until I could
> illustrate my points with some code.
> 

Very cool! I ran into this problem with Japanese Unicode string comparisons a while ago, so it is great that it will be addressed! After that, we will have to investigate the behavior of CompareStringW and related functions.

> - Mappings for characters above 0xFFFF are encoded as UTF-16 (using surrogate
>    pairs), but a single codepoint (UTF-32 if you like) is used for table
>    indexing. Setting $utflim in make_unicode to 65536 is the simplest way to
>    disable support for such characters, but supporting surrogate pairs should
>    not affect any text-related Wine component in a negative way.
> 
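For reference, a minimal sketch in C of how a code point above 0xFFFF maps to the UTF-16 surrogate pair stored in such a mapping. This is not taken from make_unicode or Wine. The helper name codepoint_to_surrogates() is hypothetical, and uint16_t/uint32_t stand in for Wine's WCHAR/DWORD.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not Wine code): split a supplementary-plane code point
 * (U+10000..U+10FFFF) into a UTF-16 high/low surrogate pair. */
static void codepoint_to_surrogates(uint32_t cp, uint16_t *high, uint16_t *low)
{
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    cp -= 0x10000;                               /* leaves a 20-bit value */
    *high = (uint16_t)(0xD800 | (cp >> 10));     /* top 10 bits -> D800..DBFF */
    *low  = (uint16_t)(0xDC00 | (cp & 0x3FF));   /* low 10 bits -> DC00..DFFF */
}
```

For example, U+1F600 becomes the pair 0xD83D 0xDE00, while a table indexed by code point would still use the single value 0x1F600.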

There is some super basic work on non-BMP Unicode glyphs and surrogate pairs in Uniscribe (usp10). I wrote a quick decode_surrogate_pair() function there to get a DWORD Unicode value from a surrogate pair, so you can look at that if you are interested!
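The reverse direction, turning a surrogate pair back into a single code point, could look roughly like the sketch below. It is written independently and is not the actual usp10 decode_surrogate_pair(); the name example_decode_pair() and its signature are hypothetical, with uint16_t/uint32_t standing in for WCHAR/DWORD.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the usp10 code): if str[i] and str[i + 1] form a
 * valid high/low surrogate pair, return the decoded code point; otherwise
 * return 0 (U+0000 can never result from a pair, so 0 is a safe sentinel). */
static uint32_t example_decode_pair(const uint16_t *str, size_t i, size_t len)
{
    uint16_t high, low;

    assert(str != NULL);
    if (i + 1 >= len)
        return 0;                          /* pair would run past the end */
    high = str[i];
    low = str[i + 1];
    if (high < 0xD800 || high > 0xDBFF)
        return 0;                          /* not a high surrogate */
    if (low < 0xDC00 || low > 0xDFFF)
        return 0;                          /* unpaired high surrogate */
    /* 10 bits from each half, offset back into the supplementary planes */
    return 0x10000 + (((uint32_t)(high - 0xD800) << 10) | (uint32_t)(low - 0xDC00));
}
```

A caller walking a string would advance by two WCHARs when this returns non-zero and by one otherwise.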

Thanks!
-aric