Is W really UTF-16?

Bill Medland medbi01_1 at accpac.com
Wed Jan 9 16:01:09 CST 2002


Thanks for the response.

"Ove Kaaven" <ovehk at ping.uio.no> wrote in message
news:Pine.LNX.4.21.0201092154570.4461-100000 at mizar.ping.uio.no...
>
> On Wed, 9 Jan 2002, Bill Medland wrote:
>
> > While I was working on the DrawText functions over the past many months I
> > started wondering about when it would fail.  (I'm pedantic and such things
> > fascinate me!).  The main concern I have is how to walk a W string
> > correctly.  For example while "ellipsifying" text we will need to "move the
> > pointer to the previous character" which is currently done by decrementing
> > the pointer by 1.  But from what I currently understand that won't work if
> > there are surrogate pairs.
>
> If you're concerned about that, surrogate pairs are the least of your
> worries. You should also be concerned about Unicode combining (or
> composite) characters. I think they might be identified with ctypes
> C3_NONSPACING and C3_DIACRITIC and that kind of stuff...

Good point.
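
For the surrogate half of the problem, here is a rough sketch of what
"move the pointer to the previous character" might look like.  The names
utf16_unit and utf16_prev are made up for illustration (utf16_unit just
stands in for WCHAR), and it deliberately ignores combining characters,
so it is only part of the answer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t utf16_unit;  /* hypothetical stand-in for WCHAR */

/* High (leading) surrogates occupy 0xD800..0xDBFF. */
static int is_high_surrogate(utf16_unit c) { return (c & 0xFC00) == 0xD800; }

/* Low (trailing) surrogates occupy 0xDC00..0xDFFF. */
static int is_low_surrogate(utf16_unit c) { return (c & 0xFC00) == 0xDC00; }

/* Step back one code point from p, never moving before start.
 * A well-formed pair (high then low) counts as one step; an unpaired
 * surrogate is treated as a single unit.  Combining characters are
 * NOT handled here. */
static const utf16_unit *utf16_prev(const utf16_unit *start,
                                    const utf16_unit *p)
{
    assert(p > start);  /* caller must not already be at the start */
    p--;
    if (p > start && is_low_surrogate(*p) && is_high_surrogate(p[-1]))
        p--;
    return p;
}
```

The extra cost over a plain p-- is one mask-and-compare in the common
case, since the second test only runs when a low surrogate is found.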

>
> > 1. Does anyone know under what circumstances CharNextW isn't +1 (apart from
> > when pointing at the terminating 0)
>
> Have you tried low surrogate followed by high surrogate, on a Microsoft OS
> recent enough that Microsoft *might* have thought about preparing it for
> dealing with surrogates?

Well, that's the complication.  I am lazy, so I don't fancy the work involved
in learning enough to put together a font that actually uses a surrogate
pair so that I can test it with ExtTextOut.  That is why I took the easy
route of assuming that was what CharNextW was for.  I guess I'll have to do
the hard work, since that family of functions is the one I have seen that
seems to suggest it is UTF-16 compatible.

>
> > 2. Is e.g. XP really using UTF-16 or is it actually still UCS2?
>
> I don't know. But it probably ought to be UTF16.
>
> > 3. Have we thought about how we should handle walking along a W string (in a
> > fashion that doesn't reduce the speed to a crawl).  I guess that in the
> > short term I am expecting some sort of macro or inline.
>
> With p++, perhaps? There aren't very many circumstances where that is
> going to be a problem (where unicode composite characters are not also),
> is there?

No, so they should both probably be handled together.
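
As a starting point for the "macro or inline" idea, something along these
lines might do for the forward direction.  Again utf16_unit, utf16_next
and utf16_count are hypothetical names, the terminating-0 behaviour is
chosen to match the CharNextW case mentioned above, and combining
characters are left for later.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t utf16_unit;  /* hypothetical stand-in for WCHAR */

/* Advance one code point in a 0-terminated UTF-16 string.
 * Stays put on the terminating 0.  An unpaired surrogate advances by
 * one unit.  Combining characters are NOT handled here. */
static inline const utf16_unit *utf16_next(const utf16_unit *p)
{
    assert(p != NULL);
    if (*p == 0)
        return p;
    /* p[1] is safe to read: p[0] is non-zero, so p[1] is at worst
     * the terminator. */
    if ((p[0] & 0xFC00) == 0xD800 && (p[1] & 0xFC00) == 0xDC00)
        return p + 2;
    return p + 1;
}

/* Count code points (not code units) by walking with utf16_next. */
static inline size_t utf16_count(const utf16_unit *s)
{
    size_t n = 0;
    while (*s) {
        s = utf16_next(s);
        n++;
    }
    return n;
}
```

For text with no surrogates this costs one extra compare per character
over p++, so it should not reduce the walk to a crawl.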

Ah well, when I am next in there I'll probably just add a FIXME UTF-16
comment or something.

Thanks again






