WM_GETTEXTLENGTH returns double size
Shachar Shemesh
wine-devel at sun.consumer.org.il
Wed Jun 12 08:53:22 CDT 2002
I am not sure about the specific case, but I do have some experience
with handling DBCS in general.
When using TCHAR and defining _MBCS (which is the default with Visual C++,
MS doing something nice for a change), TCHAR is (if my memory serves me
correctly) simply a char. This means that a TCHAR is one byte, the same
size as a regular char, not one character.
The thing to understand when working with MBCS is that one byte does not
necessarily mean one character. You get a stream of bytes in which some
characters take one byte and others take two.
You are guaranteed that NUL and newline bytes are never misrepresented
(they never occur as part of a double-byte character). For that reason
alone, most byte-by-byte processing will work on MBCS without a problem.
If you are doing no string processing at all, you can simply ignore the
MBCS possibility altogether.
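A minimal C sketch of that guarantee, assuming Shift-JIS (code page 932) text: the hypothetical helper count_newlines() below knows nothing about double-byte characters, yet it still counts lines correctly, because a Shift-JIS trail byte is never 0x00 or 0x0A.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count '\n' bytes one byte at a time, with no
 * MBCS awareness at all.  This is safe for Shift-JIS because a trail
 * byte is never 0x00 or 0x0A, so every '\n' byte found really is a
 * newline and the terminating NUL is never hit early. */
static size_t count_newlines(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s != '\0'; s++)
        if (*s == '\n')
            n++;
    return n;
}
```

For example, the Shift-JIS bytes "\x95\x5c\n\x83\x5c\n" (two double-byte characters, each followed by a newline) give a count of 2.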
Things do become messy if you want to do character-based calculations
(e.g. "I have 7 characters in the string, despite it being 10 bytes
long"), if you are looking for a particular character ('\' is a nasty
example), or if you want to traverse the string backwards.
Traversing an MBCS string is akin to using a forward iterator in the STL.
You have a macro (isleadbyte, IIRC) that tells you whether the next byte
stands alone or is the first half of a double-byte character. You are
allowed to save a pointer and return to it later, but when traversing the
string backwards it is very difficult to know whether the previous byte
is a single-byte character or the second half of a double-byte one.
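One safe way to step backwards is to walk forward from the start of the string and remember the last character boundary, as in the C sketch below. Win32 offers CharNextA() and CharPrevA() for this kind of walking; the sketch is a portable illustration, not their implementation. It again assumes Shift-JIS, and is_sjis_lead_byte() and mbcs_prev() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical Shift-JIS (code page 932) lead-byte test. */
static int is_sjis_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Return a pointer to the character that precedes `cur` in the string
 * beginning at `start`.  Looking only at cur[-1] is not enough: 0x5C
 * may be a backslash or the trail byte of a character such as
 * 0x83 0x5C.  Walking forward from `start` always lands on character
 * boundaries, so the last boundary before `cur` is the answer. */
static const char *mbcs_prev(const char *start, const char *cur)
{
    const char *p = start;
    const char *prev = start;
    assert(start != NULL && cur != NULL && start <= cur);
    while (p < cur) {
        prev = p;
        if (is_sjis_lead_byte((unsigned char)*p) && p[1] != '\0')
            p += 2;   /* skip lead and trail byte together */
        else
            p += 1;
    }
    return prev;
}
```

For the bytes "A\x83\x5c" "B", the character before 'B' starts at offset 1 (the lead byte 0x83), not at offset 2 where the 0x5C trail byte sits.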
Another problem is that the second byte of an MBCS character may be
something you would find interesting on its own. As I said before, one
nasty example is parsing a path and looking for '\' separators. There
are some Japanese characters that, when encoded in MBCS, result in two
bytes, the second one being '\'. When the proper locale is loaded,
Windows knows not to treat this '\' as a directory separator, but your
programs may fail to do so (does Wine?).
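In Shift-JIS, for instance, the character 表 is encoded as 0x95 0x5C, and 0x5C is '\'. A naive strrchr(path, '\\') stops on that trail byte. The C sketch below (hypothetical helper names, Shift-JIS lead bytes hard-coded as an assumption) steps over whole characters so that a trail byte is never inspected on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical Shift-JIS (code page 932) lead-byte test. */
static int is_sjis_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Return the last '\\' that is a real path separator, or NULL.
 * Double-byte characters are skipped as a unit, so a 0x5C trail
 * byte is never mistaken for a separator. */
static const char *mbcs_last_separator(const char *path)
{
    const char *p = path;
    const char *last = NULL;
    assert(path != NULL);
    while (*p != '\0') {
        if (is_sjis_lead_byte((unsigned char)*p) && p[1] != '\0') {
            p += 2;   /* lead + trail byte together */
        } else {
            if (*p == '\\')
                last = p;
            p += 1;
        }
    }
    return last;
}
```

For the path "C:\\dir\\\x95\x5c" the sketch returns the separator at offset 6, whereas strrchr() would return offset 8, which is the middle of 表.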
These are the main issues when working with MBCS. I hope I have managed
to help.
Shachar
Andriy Palamarchuk wrote:
>This happens in the code that unmaps a message which was
>mapped from ASCII to Unicode.
>See windows/winproc.c, function
>WINPROC_UnmapMsg32ATo32W:
>
> case WM_GETTEXTLENGTH:
> case CB_GETLBTEXTLEN:
> case LB_GETTEXTLEN:
>     /* there may be one DBCS char for each Unicode char */
>     return result * 2;
>
>What is the correct way to handle double-byte
>characters in this situation?
>How does Windows handle this?
>At the very least, could we return doubled values only
>when the system metric SM_DBCSENABLED is true? We could
>have a switch in the config file for this system metric.
>
>I came across this issue when using the default combo
>box control implementation in Delphi 6.
>I assume the same issue also exists for edit controls.
>The returned length is correct if I comment out the
>code above.
>
>The existing behavior is a possible cause of a bug when
>entering serial numbers, where the cursor jumps to the
>next edit field when only half of the text has been
>entered.
>
>Thanks,
>Andriy
>
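To illustrate why the doubled value confuses callers such as the Delphi combo box, the C sketch below contrasts the worst-case estimate (result * 2, as in the quoted winproc.c code) with an exact byte count. It rests on a deliberately simplified, hypothetical model of a DBCS code page (ASCII takes one byte, every other character takes two); real code pages are not that regular, and real code would obtain the exact size from WideCharToMultiByte() by passing 0 as the output buffer size. ansi_width(), exact_ansi_length() and worst_case_ansi_length() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified DBCS model: ASCII code points take one
 * byte, everything else takes two.  Not a real code page. */
static size_t ansi_width(unsigned int code_point)
{
    return code_point < 0x80 ? 1 : 2;
}

/* Exact number of ANSI bytes needed for `len` characters under the
 * model above. */
static size_t exact_ansi_length(const unsigned int *text, size_t len)
{
    size_t bytes = 0;
    assert(text != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        bytes += ansi_width(text[i]);
    return bytes;
}

/* Worst-case estimate, mirroring "return result * 2;" quoted above. */
static size_t worst_case_ansi_length(size_t len)
{
    return len * 2;
}
```

For a four-character, pure-ASCII serial-number field, the exact length is 4 but the worst-case estimate is 8, so a control that waits for the reported length can appear "full" after only half of the text.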
More information about the wine-devel mailing list