[PATCH 1/3] kernel32: Support UTF-7 in MultiByteToWideChar. (try 3)

Tue Oct 14 20:16:25 CDT 2014

2014-10-14 16:12 GMT-06:00 Michael Stefaniuc <mstefani at redhat.com>:
>> @@ -1963,7 +2119,7 @@ BOOL WINAPI EnumSystemCodePagesW( CODEPAGE_ENUMPROCW lpfnCodePageEnum, DWORD fla
>>   *   flags  [I] Character mapping flags
>>   *   src    [I] Source string buffer
>>   *   srclen [I] Length of src (in bytes), or -1 if src is NUL terminated
>> - *   dst    [O] Destination buffer
>> + *   dst    [O] Destination buffer, or NULL to compute the required length
>>   *   dstlen [I] Length of dst (in WCHARs), or 0 to compute the required length
> The only case where dst == NULL is when dstlen == 0 but that already
> implies computing the required length.
> dst == NULL and dstlen != 0 is an error and MultiByteToWideChar() bails
> out.

Good point. I've removed this chunk and its WideCharToMultiByte counterpart.

>>  /***********************************************************************
>> + *              write_to_w_string
>> + *
>> + * Helper for utf7_mbstowcs
>> + *
>> + * RETURNS
>> + *   TRUE on success, FALSE on error
>> + */
>> +static inline BOOL write_to_w_string(WCHAR *dst, int dstlen, int *index, WCHAR character)
> This isn't a good name for the helper as that's not what it does. It is
> more a "count and optionally write". Also prefixing it utf7 would make
> it clear that is just a helper for utf7_mbstowcs and not a generic
> function and thus would remove the need for the comment that it is just
> a helper for utf7_mbstowcs.

I've renamed them to utf7_write_w and utf7_write_c.

> I know you're using this helper to avoid code duplication between the
> length calculation and the conversion unlike wine_utf8_mbstowcs(). But
> it complicates the handling of available space in dst. Without seeing
> the code for both approaches I'm not sure what is best so I'll review
> it as is.
>
>> +{
>> +    if (dst)
> You have to force dst = NULL in utf7_mbstowcs() for this to work.
> Just checking for dstlen > 0 should avoid that.

Good point. I've made this change as you suggested.

>> +/***********************************************************************
>> + *              utf7_mbstowcs
>> + *
>> + * UTF-7 to UTF-16 string conversion, helper for MultiByteToWideChar
>> + *
>> + * RETURNS
>> + *   On success, the number of characters written
>> + *   On dst buffer overflow, -1
> What about the case where it fails to decode a character? E.g.
> wine_utf8_mbstowcs() has -2 for that case which then translates to
> SetLastError(ERROR_NO_UNICODE_TRANSLATION). I looked at the tests but
> you didn't seem to test the GetLastError in the cases with the invalid
> input characters.

The UTF-7 routines in Windows do not check for Unicode compliance.
Still, I've added error code checking to all the tests now, which will
hopefully make that more clear.

>> +    static const WCHAR base64_decoding_table[] = {
> Why is this table an array of WCHAR? The entries aren't WCHARs.

I've changed them to unsigned chars.

>> +        /* \0   */ -1,    /* \x01 */ -1,    /* \x02 */ -1,    /* \x03 */ -1,
> This table is really hard to read even in a editor that does syntax
> highlighting. As you just need 4 characters per column you could have
> sixteen columns and merge the comments to one comment at the end of the
> line.  Something like this would be more readable as well as being more
> compact:
>         -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
>  /* 0x00 - 0x15 */

I've changed it to something similar while still listing every
individual character:

        /* \0     \x01   \x02   \x03   \x04   \x05   \x06   \a   */
           -1,    -1,    -1,    -1,    -1,    -1,    -1,    -1,

>> +    /* MultiByteToWideChar guarantees that srclen > 0 */
>> +    assert(srclen > 0);
> This assert isn't needed. Just rewriting the first do..while loop to a
> while loop avoids it. That eliminates 3 lines of code of which one is a
> comment
> pointing to an unneeded assert.

A while loop adds an unnecessary check, but whatever. You and
Sebastian Lackner both requested it, so I concede.

>> +                    sextet = base64_decoding_table[sextet];
>> +                    if (sextet == (WCHAR)-1)
> Unneeded cast. Using a signed type for sextet also avoids the
> -Wsign-compare compiler warning. Alternatively use ~0 instead of -1 if
> you want to keep an unsigned type.

Using a signed char in an array subscript causes Wchar-subscripts. I'd
just be trading one warning for another.

>> +                        /* -1 means that the next character of src is not part of a base64 sequence */
> This upper part of the comment is better placed in the declaration of
> base64_decoding_table.

I really think it makes more sense where it is.

2014-10-14 16:59 GMT-06:00 Michael Stefaniuc <mstefani at redhat.com>:
>>  dlls/kernel32/tests/codepage.c | 751 ++++++++++++++++++++++++++++++++++
>> +++++++ 1 file changed, 751 insertions(+)
> while more tests are good that's a lot of lines to have for just one
> function. So splitting the WCHAR => UTF-7 from the UTF-7 => WCHAR tests
> should help make it more manageable.

The trouble is that the utf16_to_utf7_test's are reversible, so I
can't cleanly categorize the tests as "encoding tests" or "decoding
tests", because some tests are both.

> Also some of the strings are huge. Most of those seem to be for the
> WCHAR => UTF-7 and could be probably generated by just using the
> directly_encodable_table[] from patch 2.
> If the tests cannot be generated from the table then please split the
> strings into more manageable chunks, as is they are unreadable /
> unmaintainable.

Those strings never need to be changed, ever. It's by far easier to
generate them once and inline them. I'm not sure that splitting them
up will make them much easier to read, either.

> Also the due to the size of the patch it is hard to discern what the
> difference between the different test structs are. Especially as the
> names are basically equivalent in this context:
> - utf16_to_utf7_tests and wcstombs_tests
> - utf7_to_utf16_tests and mbstowcs_tests

Could you suggest better names?

Anyway, thanks for taking the time to review this patch set. You made
a lot of good points. The latest changes (incorporating your
suggestions) are at https://github.com/alexhenrie/wine

-Alex