WM_GETTEXTLENGTH returns double size

Shachar Shemesh wine-devel at sun.consumer.org.il
Wed Jun 12 08:53:22 CDT 2002


I am not sure about the specific case, but I do have some experience 
with handling DBCS in general.

When using TCHAR and defining _MBCS (which is the default with VC++ - MS 
doing something nice for a change) the result (if my memory serves me 
correctly) is a plain one-byte char. This means that it is the same size 
as a regular char.

The thing to understand when working with MBCS is that a single byte 
does not necessarily mean a single character. You get a stream of bytes, 
some will be 1 byte/character, and some 2.

You are guaranteed against NUL and newline being misrepresented: those 
byte values never occur as half of a double-byte character. For that 
reason alone most byte-by-byte processing will work on MBCS without a 
problem. If you are doing no string processing at all, you can simply 
ignore the MBCS possibility altogether.

Things do become messy if you want to do character-based calculations 
(e.g. I have 7 characters in the string, despite it being 10 bytes 
long), if you are looking for a particular character ('\' is a nasty 
example), or if you want to traverse the string backwards.
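
To make the "7 characters in 10 bytes" case concrete, here is a minimal 
sketch of a character counter. It is not Wine or Windows code. The 
function names are made up for illustration, and the lead-byte test 
hard-codes the ranges of code page 932 (Shift-JIS) as an assumption so 
that it compiles anywhere; a real program would ask the system instead 
(isleadbyte or IsDBCSLeadByte).

```c
#include <stddef.h>

/* Hypothetical stand-in for IsDBCSLeadByte(): the lead-byte ranges of
 * code page 932 (Shift-JIS), hard-coded for portability. */
int sjis_is_lead_byte(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Count characters (not bytes) in a NUL-terminated DBCS string. */
size_t dbcs_char_count(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;

    while (*p) {
        /* A lead byte followed by a non-NUL byte is one 2-byte character;
         * a dangling lead byte at the end is counted as one character. */
        if (sjis_is_lead_byte(*p) && p[1] != 0)
            p += 2;
        else
            p += 1;
        n++;
    }
    return n;
}
```

For a string made of three double-byte characters followed by four 
ASCII letters, strlen reports 10 while dbcs_char_count reports 7.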

Traversing an MBCS string is akin to a forward iterator in STL. You have 
a macro (isleadbyte, IIRC) that lets you know whether the next byte is 
alone or part of a double byte. You are allowed to save the pointer and 
return to it, but when traversing the string backwards, it is very 
difficult for you to know whether the previous byte is a single-byte 
character or the second half of a double-byte one.
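
One common way around the backwards problem is to never step backwards 
at all: rescan forward from the start of the string and remember the 
last character boundary before the current position. The sketch below 
illustrates that idea only. The names are hypothetical, and the 
lead-byte test again assumes code page 932 (Shift-JIS) rather than 
querying the system.

```c
#include <stddef.h>

/* Hypothetical stand-in for IsDBCSLeadByte(), assuming code page 932. */
int sjis_is_lead_byte(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Return a pointer to the character that precedes `cur`, found by
 * walking forward from `start`. Returns `start` if cur == start. */
const char *dbcs_prev(const char *start, const char *cur)
{
    const char *p = start;
    const char *prev = start;

    while (p < cur) {
        prev = p;
        /* Step over one whole character: 2 bytes after a lead byte. */
        if (sjis_is_lead_byte((unsigned char)*p) && p[1] != '\0')
            p += 2;
        else
            p += 1;
    }
    return prev;
}
```

This costs a scan of the string for every backward step, which is the 
price of the encoding not being self-synchronizing.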

Another problem is that the second byte of an MBCS character may be 
something you will find interesting on its own. Like I said before, one 
nasty example is when parsing a path and looking for '\' separators. 
There are some Japanese characters that, when coded in MBCS, result in 
two bytes, the second one being '\'. When the proper locale is loaded, 
Windows knows not to treat this '\' as a directory separator, but your 
programs may fail to do so (does Wine?).
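
As an illustration of the separator problem, here is a rough sketch of 
a DBCS-aware search for the last '\' in a path. It is not taken from 
Wine. The function names are hypothetical, and the lead-byte ranges 
are those of code page 932 (Shift-JIS), hard-coded as an assumption. 
In that code page the byte pair 0x83 0x5C is a single katakana 
character whose second byte equals '\'.

```c
#include <stddef.h>

/* Hypothetical stand-in for IsDBCSLeadByte(), assuming code page 932. */
int sjis_is_lead_byte(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Find the last real '\' separator, skipping any 0x5C byte that is
 * only the second half of a double-byte character. Returns NULL if
 * the path contains no separator. */
const char *dbcs_find_last_sep(const char *path)
{
    const unsigned char *p = (const unsigned char *)path;
    const char *last = NULL;

    while (*p) {
        if (sjis_is_lead_byte(*p) && p[1] != 0) {
            p += 2;             /* skip both bytes of the character */
            continue;
        }
        if (*p == '\\')
            last = (const char *)p;
        p++;
    }
    return last;
}
```

On the nine-byte path "C:\dir\" followed by 0x83 0x5C, a plain strrchr 
for '\' lands on the final byte (offset 8), in the middle of a 
character, while the function above returns the separator at offset 6.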

These are the main issues when working with MBCS. I hope I have managed 
to help.

                    Shachar


Andriy Palamarchuk wrote:

>This happens in the code that unmaps a message mapped
>from ASCII to Unicode.
>See windows/winproc.c, function
>WINPROC_UnmapMsg32ATo32W:
>
>    case WM_GETTEXTLENGTH:
>    case CB_GETLBTEXTLEN:
>    case LB_GETTEXTLEN:
>        /* there may be one DBCS char for each Unicode char */
>        return result * 2;
>
>What is the correct way to handle double-byte
>characters in this situation?
>How does Windows handle this?
>Can we at least return doubled values only when the
>system metric SM_DBCSENABLED is true? We could have a
>switch in the config file for this system metric.
>
>I came across this issue when using the default combo
>box control implementation in Delphi 6.
>I assume the same issue also exists for edit controls.
>The returned length is correct if I comment out the
>code above.
>
>The existing behavior is a possible cause of the bug
>in entering serial numbers, where the cursor jumps to
>the next edit field when only half of the text has
>been entered.
>
>Thanks,
>Andriy
>




