FW: locales, unicode and ansi with msvcrt (bug 8022)

Eric Pouech eric.pouech at wanadoo.fr
Tue Apr 17 08:32:48 CDT 2007


Ann & Jason Edmeades a écrit :
> (Meant to copy wine-devel as well, in case anyone else had any comments)
>
> Subject: RE: locales, unicode and ansi with msvcrt (bug 8022)
>
>   
>>> My current plan, unless you have strong objections, is to make the
>>>       
> wprintf
>   
>>> msvcrt routines use WideCharToMultiByte on the string into the GetACP
>>> codepage before being written out, and add file tests for this into the
>>> msvcrt testsuite
>>>       
>
>   
>> You shouldn't really guess, you need to investigate how the things are
>>     
> supposed
>   
>> to work in Windows. IMO you are going wrong route by looking how msvcrt
>>     
> works
>   
>> instead of looking how cmd.exe does. Did you try to run native cmd.exe in
>>     
> WIne
>   
>> and see how it handles cp conversions, or under logger.exe in Windows?
>>     
>
> Firstly, leave cmd.exe out of the equation. This has nothing to do with it
> at all. Just to emphasize, I am looking at the part of the bug where xcopy
> was writing out characters padded with gaps (nulls) which is caused by it
> simply calling msvcrt's wprintf. 
>
> I can recreate the problem with a simple, one line console application run
> through wine (not cmd.exe nor wineconsole):
>
> void main() { wprintf(L"test with wprintf"); }
> [program was compiled VC6 : cl /MD test.c ]
>
> wine test >a.a
> od -x a.a
> 0000000 0074 0065 0073 0074 0020 0077 0069 0074
> 0000020 0068 0020 0077 0070 0072 0069 006e 0074
> 0000040 0066
>
> export WINEDLLOVERRIDES=msvcrt=n
> wine test >a.a
> od -x a.a
> 0000000 6574 7473 7720 7469 2068 7077 6972 746e
> 0000020 0066
>
> Shows wine internal msvcrt differs from windows msvcrt and incorrectly
> outputs Unicode
>
>   
>>> I'm also guessing relay and snoop cant see internal dll calls, so that
>>>       
> might
>   
>>> explain the lack of calls. Perhaps something like wcstombs would be the
>>>       
> key
>   
>>> to this
>>>       
>
>   
>> Even if internal dll calls are not logged, wcstombs does an external call
>>     
> to do
>   
>> its job.
>>     
>
> What makes you say that? It potentially doesn't need to, given the internal
> caching it does in process_attach, it could do it all in memory. However...
>
> WINEDEBUG=+all,+relay,+snoop wine test 2>a.a
> a.a (removing Set/GetLastError and Tls call) shows:
>
> 0009:CALL msvcrt.wprintf(00403010) ret=0040100d
> 0009:Call ntdll.RtlAllocateHeap(00410000,00000000,00001000) ret=77c2c3c9
> 0009:trace:heap:RtlAllocateHeap (0x410000,00000002,00001000): returning
> 0x412168
> 0009:Ret  ntdll.RtlAllocateHeap() retval=00412168 ret=77c2c3c9
> 0009:Call kernel32.WriteFile(00000008,0033f9c8,00000011,0033f9ac,00000000)
> ret=77c30218
> 0009:trace:file:WriteFile 0x8 0x33f9c8 17 0x33f9ac (nil)
> 0009:trace:ntdll:NtWriteFile
> (0x8,(nil),(nil),(nil),0x33f8b8,0x33f9c8,0x00000011,(nil),(nil))!
> 0009: get_handle_fd( handle=0x8, access=00000002, cached=1 )
> 0009: get_handle_fd() = 0 { type=1, flags=0 }
> 0009:Ret  kernel32.WriteFile() retval=00000001 ret=77c30218
> 0009:RET  msvcrt.wprintf() retval=00000011 ret=0040100d
>
> Note the writefile call - it passed 0x11 bytes (number of bytes in the
> narrow string)
>
> Logger shows no calls from msvcrt during that time, but I don't trust it. 
>
> Running test program under windbg and trying to follow it through it does
> appear wctomb is called (it does NOT result in WideCharToMultiByte, possibly
> due to the characters or locale I am using?) but it does it on a character
> by character basis (I believe during the formatting). This would be the
> equivalent on wine to calling pf_vsnprintf with out.unicode set to false.
>
> (I don't know why it doesn't come up on snoop - maybe it's a near call (no
> fixups?).
>
> So its back to a necessity to ensure for text streams we only output
> multibyte (not wide).
>
> I can see 2 ways of doing it:
>
> 1. Change MSVCRT_vfwprintf and add a conversion widechartomultibyte for a
> text stream (if (MSVCRT_fdesc[fd].wxflag & WX_TEXT) ...)
> Advantages: Changes self contained, easy
> Disadvantages: Formatting into Unicode buffer, then need to allocate space
> for answer, convert, print, free
>
> 2. Change MSVCRT_vfwprintf to use a new internal function which is identical
> to MSVCRT_vsnwprintf but sets out->Unicode = FALSE and out.buf.A as
> destination string.
> Advantages: Fastest way (conversion is WC->MB as the formatting is
> processed)
> Disadvantage: Needs new internal function
>
> My preference is (2), and I have coded and tested it to confirm it does
> solve both the file i/o tests (included in patch) and the console output
> tests (performed by hand)
>
> Comments?
> Jason
>
>
>   
> ------------------------------------------------------------------------
>
>
>   
what should be done is:
- ensure that all msvcrt's function needing to write unicode strings 
behave as follows:
    + if the output handle is a console handle, use WriteConsoleW -> 
this is the only way to get the proper output
    + if not, convert the unicode string to ansi string and use WriteFile
For the implementation, note that the call to WriteConsoleW will fail if 
the output stream is not a console handle, ie has been redirected to a 
file, pipe... => which makes the coding rather straightforward

the current issue is that all the w-functions end up calling WriteFile 
which is wrong

NB : in your test cases, don't use L"xxx" strings, they won't work when 
your test case is compiled under linux

A+

-- 
Eric Pouech
"The problem with designing something completely foolproof is to underestimate the ingenuity of a complete idiot." (Douglas Adams)





More information about the wine-devel mailing list