WineHQ

World Wine News

All the news that fits, we print.

01 May 2000 00:00:00 -0800
by Eric Pouech
Issue: 41

This is the 41st issue of Wine's Kernel Cousin publication. Its main goal is to report widely on what is going on around Wine (the Un*x Windows emulator).

Wine 20000430 has been released. Main changes are:
  • Wine is now distributed under the X11 license.
  • DirectDraw restructuring.
  • Debugger is now an external Winelib program.
  • pthreads emulation for thread-safe glibc routines.
  • On-demand loading of built-in dlls.
  • WININET, URLMON and i18n fixes merged from Corel tree.
  • Lots of bug fixes.

This week, 161 posts consumed 444 K. There were 29 different contributors. 20 (68%) posted more than once. 11 (37%) posted last week too.

The top 5 posters of the week were:

  1. 28 posts in 150K by Patrik Stridvall
  2. 21 posts in 13K by Dimitrie O. Paun
  3. 13 posts in 28K by Alexandre Julliard
  4. 13 posts in 28K by Uwe Bonnes
  5. 13 posts in 33K by gerard patel

Improving wrc

Bertho Stultiens, while preparing a new version of wrc (the Wine resource compiler), had some as yet unanswered questions: According to what I found on the web, resources are always little-endian because MS does not support/write OSes for big-endian processors. There are a couple of questions that go with this:
  • Is it true that MS only has little-endian version?
  • Should I support big-endian at all in wrc? Currently, wrc generates the native endianness of the platform, but it does _not_ convert binary resources (such as bitmaps). It is actually extremely difficult to mix endianness in resources because everything has to be examined and _cannot_ be guaranteed to be correct (such as RCDATA).
  • Should wine only use little-endian in the resources? In my opinion, yes. Let the resources be the same all the time and let the resource-loader take care of conversion. There is a comment in a header about byte-swapping and wrc. I really would prefer to have byte-swapping in wine rather than wrc. Mainly because wine already requires to do the analysis of resource-contents, whereas wrc only packs data (without contextual/semantical knowledge).

Bertho asked for feedback, and for reports from anyone running Wine natively on a big-endian CPU.

Both Alexandre Julliard and Ulrich Weigand answered that all current NT versions run only on little-endian systems, so this question does not seem to have been addressed (it remains open for Windows CE). Alexandre even added a touch of sarcasm: The Windows headers contain a few #ifdef _MAC that attempt to add big-endian support (apparently using a generic #ifdef BIG_ENDIAN was a concept a bit too abstract for Microsoft).

Ulrich went a bit further: I agree, resources should always be treated little-endian.

At the most, we might think about making a distinction between the resource data itself and the 'meta data' surrounding it (resource directory, PE header links ...); it might be easier to have the latter in native byte ordering, especially in the case of the dummy PE headers created for Wine modules (these structures are completely internal between wrc and the Wine loader, so we can use whatever is easiest here, of course).

Every 'external' format, be it .RES file or cursors/icons/etc. imported by or included in RC files, should IMO always be little-endian. The same applies to the raw resource data exchanged between app and Wine, e.g. when using a Create...Indirect routine.
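The "always little-endian" rule for resource data can be honoured without knowing the host byte order, by assembling values one byte at a time. The following is a minimal sketch, not code from wrc or Wine; the helper names read_le16, read_le32 and write_le32 are hypothetical.

```c
#include <stdint.h>

/* Read a 16-bit little-endian value from a byte buffer.
   The bytes are combined arithmetically, so the result is the
   same on little-endian and big-endian hosts. */
uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Read a 32-bit little-endian value from a byte buffer. */
uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Store a 32-bit value in little-endian order, whatever the host is. */
void write_le32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)((v >> 8) & 0xff);
    p[2] = (unsigned char)((v >> 16) & 0xff);
    p[3] = (unsigned char)((v >> 24) & 0xff);
}
```

A loader written this way would never cast a resource pointer to a wider integer type, which is where host endianness would otherwise leak in.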

Ulrich also reported on his successful attempts to run 'hello3' on Solaris (32-bit big-endian), although he never sent the patches because he never finished the cleanup: I decided to have resource contents in little-endian, and meta data (resource directory) in native big-endian format, as this seemed to be the solution requiring the fewest Wine changes. The changes described in the following achieved this result.

Major changes include reading and writing meta-data in wrc (swapping bytes when needed), as well as modifying the way Wine reads resources (same type of swapping). Ulrich also pointed out some less obvious modifications to be made: another problem is in the handling of Unicode strings: wide characters are also endianness-sensitive, of course, so a simple lstrcpyWtoA doesn't do the right thing... and pe_resource.c routines don't work, as they rely on various bit-field structures to break out the 'resource name is string' and 'resource data is directory' bits. This doesn't work, as on Sparc bit-fields are allocated starting from the MSB down, not LSB up as on Intel :-/
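The bit-field problem Ulrich describes can be avoided by testing the flag with explicit masks on a 32-bit value that has already been read in a known byte order. The sketch below assumes the PE resource directory layout, where the top bit of each 32-bit entry field is the flag and the low 31 bits are an offset; the macro and function names are hypothetical and are not the ones used in pe_resource.c.

```c
#include <stdint.h>

/* Top bit of the field: 'name is string' or 'data is directory'. */
#define RES_FLAG_MASK   0x80000000u
/* Remaining 31 bits: offset relative to the resource section. */
#define RES_OFFSET_MASK 0x7fffffffu

/* Non-zero when the entry's name field points to a string. */
int res_name_is_string(uint32_t name_field)
{
    return (name_field & RES_FLAG_MASK) != 0;
}

/* Non-zero when the entry's data field points to a sub-directory. */
int res_data_is_directory(uint32_t data_field)
{
    return (data_field & RES_FLAG_MASK) != 0;
}

/* Offset part of either field, with the flag bit stripped. */
uint32_t res_field_offset(uint32_t field)
{
    return field & RES_OFFSET_MASK;
}
```

Unlike bit-fields, masks and shifts on an integer value give the same answer whichever way the compiler allocates bits within a word.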

Finally, Bertho announced that he would send a new wrc version later this week.

Wine's license

Following the earlier episodes (see "shall we change?" and "vote for a change!"), Alexandre Julliard changed the Wine license to the X11 one. Here are the terms of the new license:

Copyright (c) 1993-2000 the Wine project authors (see the file AUTHORS for a complete list)

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

ANSI and Unicode

Dimitrie Paun was rather unhappy with Wine's current string support. As you may already know, most Win32 APIs come in two flavors: ANSI and Unicode. APIs suffixed with 'A' are ANSI, and those suffixed with 'W' are Unicode. Being ANSI (resp. Unicode) determines how the function must handle any string input or output parameter. So the same function, say CreateWindow, comes in two flavors: CreateWindowA and CreateWindowW.

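Microsoft's headers follow the same convention: a #define UNICODE selects the Unicode mode at compile time, so that the generic name (CreateWindow) maps to the W variant, and to the A variant otherwise.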
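The compile-time selection can be sketched as follows. This is only an illustration of the naming convention, not an excerpt from any real header: TitleLengthA, TitleLengthW, TitleLength and MY_TEXT are hypothetical names, and wchar_t is used as a stand-in for the Windows wide character type (on most Unix systems wchar_t is 4 bytes wide, not 2).

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* ANSI flavor: the string is made of one-byte characters. */
size_t TitleLengthA(const char *title)
{
    return strlen(title);
}

/* Unicode flavor: the string is made of wide characters. */
size_t TitleLengthW(const wchar_t *title)
{
    return wcslen(title);
}

/* The generic name resolves to one flavor at compile time,
   depending on whether UNICODE is defined. */
#ifdef UNICODE
#define TitleLength TitleLengthW
#define MY_TEXT(s)  L##s
#else
#define TitleLength TitleLengthA
#define MY_TEXT(s)  s
#endif
```

With this in place, a call such as TitleLength(MY_TEXT("Wine")) compiles against either flavor without source changes.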

ANSI means a one-byte-per-character encoding, whereas Unicode implies several bytes per character (at least two, and some values are escapes to longer sequences). Although Unicode consumes more memory, it can store strings in many more languages: most languages with non-Latin scripts (Japanese, at least in Kanji, or Chinese; most Cyrillic alphabets, such as Russian...), but also some other European languages with specific diacritics.

Ove Kåven gave an overview of the different encodings:
  • ASCII: 7-bit, one byte per character
  • ISO 8859 encodings, ordinary SBCS codepages: 8-bit (often extended ASCII), one byte per character. (Note: all the ISO Latin-1, Latin-2... encodings follow this scheme.)
  • Asian languages, DBCS codepages: 8-bit; either one or two bytes per character (if the first byte is a "lead byte", it's a two-byte character).
  • UTF-16: Unicode encoding, two bytes per character (preferably big-endian but I doubt MS cares). May employ surrogate pairs (two UTF16 characters in reserved ranges) to encode Unicode characters beyond the first 64K; the surrogate pairs allow access to 1M more characters (may be necessary for very exotic Asian languages, but no such characters are defined yet).
  • UCS2: Unicode encoding, two bytes per character, but not surrogate pairs.
  • UCS4: Unicode encoding, four bytes per character, easily and conveniently encodes the full Unicode set. This is what GNU systems prefer, since they don't want to deal with surrogate pairs.
  • UTF-32: Same as UCS4, just defined by different organizations (UCS4 is ISO, UTF32 is Unicode Consortium, plus the added restriction that no more than 64K+1M different characters may exist in UTF32).
  • UTF-8 (UTF-FSS): Unicode encoding useful for compatibility with software written for 8-bit C strings. Variable-width (between 1 and 6 bytes per character). Lower 128 characters are encoded as plain ASCII.
  • UTF-7: Unicode encoding for compatibility with software written for 7-bit characters (email, news, etc). A hybrid of Base64 and Quoted-Printable.
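To make the difference between UTF-16 surrogate pairs and variable-width UTF-8 concrete, here is a minimal sketch of both encoders. The function names are hypothetical. It deliberately handles only the 64K+1M range reachable through surrogate pairs (up to U+10FFFF), so the UTF-8 encoder emits at most 4 bytes rather than the 6 bytes allowed by the definition quoted above.

```c
#include <stdint.h>

/* Encode one code point as UTF-16 code units.
   Returns the number of units written (1 or 2), or 0 if cp is invalid. */
int utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  /* reserved surrogate range */
    if (cp < 0x10000) {                          /* first 64K: one unit */
        out[0] = (uint16_t)cp;
        return 1;
    }
    if (cp > 0x10FFFF) return 0;                 /* beyond 64K + 1M */
    cp -= 0x10000;                               /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate: low 10 bits */
    return 2;
}

/* Encode one code point as UTF-8.
   Returns the number of bytes written (1 to 4), or 0 if cp is invalid. */
int utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;
    if (cp < 0x80) {                             /* plain ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For instance, U+00E5 (the "å" in Kåven) fits in one UTF-16 unit but takes two UTF-8 bytes, while U+10000 needs a surrogate pair in UTF-16 and four bytes in UTF-8.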

In the rest of this article, W will refer to UTF-16 strings or functions, and U to UTF-8 strings or functions.

Currently, as Dimitrie points out, most of the Wine code is poorly written with regard to Unicode: most of the W functions convert the string into an ANSI one, and then call the A function, implying a loss of information, and some potential bugs.

Dimitrie proposed changing Wine's coding style by providing a single function (say, suffixed by 'X') that would be the workhorse for both the A and W functions.

Dmitry Timoshkov didn't like this proposal, and suggested instead: Wine should have only one functional implementation indeed. I think it should be implemented like in NT: the Unicode version does all the actual work, and the ANSI version simply converts ANSI to Unicode and then calls the Unicode workhorse. But this transition will consume a lot of time and effort.
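The NT-style layering Dmitry describes can be sketched as below. This is an illustration under stated assumptions, not Wine code: CountSpacesA and CountSpacesW are hypothetical functions, WCHAR16 is a hypothetical stand-in for the 16-bit Windows wide character, and the A->W step simply widens each byte (which is only correct for Latin-1 text; a real implementation would go through the current code page).

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint16_t WCHAR16;  /* hypothetical stand-in for a 16-bit WCHAR */

/* Unicode workhorse: all the actual work happens here. */
int CountSpacesW(const WCHAR16 *s)
{
    int n = 0;
    for (; *s; s++)
        if (*s == 0x0020) n++;
    return n;
}

/* ANSI entry point: converts to Unicode, then calls the workhorse.
   Returns -1 if the temporary buffer cannot be allocated. */
int CountSpacesA(const char *s)
{
    size_t len = strlen(s);
    size_t i;
    int n;
    WCHAR16 *w = malloc((len + 1) * sizeof *w);

    if (!w) return -1;
    /* Assumption: Latin-1 input, so each byte maps to the same code point.
       The loop runs up to and including the terminating 0. */
    for (i = 0; i <= len; i++)
        w[i] = (unsigned char)s[i];
    n = CountSpacesW(w);
    free(w);
    return n;
}
```

The cost of this design is visible in the sketch: every A call pays for an allocation and a copy before any real work starts.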

Dimitrie Paun went further with: Somehow, I don't think working with W is the right thing to do in Unix. We have the following situation: we receive strings as arguments; their encoding is not explicit with every string, but rather is implicit by the entry point. Now, we can do two things:
  1. [eager] convert at the entry point to one common format, and carry on internally with that format
  2. [lazy] remember the encoding that the strings are in, and pass that around until we actually need a specific encoding

Anyway, I like 2 better than 1. Not committing to an encoding early in the game is good -- sometimes we need UTF8 (file systems, X), in other cases we need UTF16 (pure Win stuff). Moreover, the thing is scalable -- if another encoding comes along, we could easily support it. And, on top of it all, it should be more efficient.
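Dimitrie's "lazy" variant could look roughly like the following sketch, in which both entry points wrap their argument in a small descriptor that records the encoding, and the shared function only looks at the encoding when it has to. Every name here (xstr_t, encoding_t, TextLengthX and its two entry points) is hypothetical; nothing of the kind existed in Wine at the time.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The encoding is implied by the entry point and remembered here. */
typedef enum { ENC_ANSI, ENC_UTF16 } encoding_t;

typedef struct {
    encoding_t  enc;   /* how to interpret data */
    const void *data;  /* 0-terminated string in that encoding */
} xstr_t;

/* Shared 'X' function: returns the length in code units,
   branching on the encoding only at the point where it matters. */
size_t TextLengthX(const xstr_t *s)
{
    switch (s->enc) {
    case ENC_ANSI:
        return strlen((const char *)s->data);
    case ENC_UTF16: {
        const uint16_t *w = (const uint16_t *)s->data;
        size_t n = 0;
        while (w[n]) n++;
        return n;
    }
    }
    return 0;
}

/* Thin entry points: no conversion, just tagging. */
size_t TextLengthA(const char *s)
{
    xstr_t x = { ENC_ANSI, s };
    return TextLengthX(&x);
}

size_t TextLengthW(const uint16_t *s)
{
    xstr_t x = { ENC_UTF16, s };
    return TextLengthX(&x);
}
```

Neither entry point converts anything, but every shared function needs one branch per supported encoding, which is where the test-coverage concern raised in the discussion comes from.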

After much discussion and contributions from many people, the following table was built:

Option 1: W->A conversion, work internally with A
  Pros:
  • best option for debugging
  • fast for A (common case today)
  • use std. Win API
  Cons:
  • we do NOT support Unicode, we just pretend we do(1)
  • a lot of work, a lot clutter, close to no gain.
  • inefficient for the W case

Option 2: A->W conversion, work internally with W
  Pros:
  • full Unicode support
  • fast for W
  • use of std. Win API
  • part of Wine is already written this way
  Cons:
  • a lot of clutter
  • very inefficient in the A case (A->W->U usually)(2)

Option 3: A,W call onto a X function which carries the encoding around
  Pros:
  • full Unicode support
  • as fast as 1 for A, and as 2 for W (for common code path like display)
  • support for new encodings is trivial
  • not much worse than 2 for debugging
  • maybe a bit less clutter than in 1 or 2 (debatable)
  • easy transition from what we have to this
  Cons:
  • use of non std. Win API: this doesn't work across DLLs (would require new APIs)
  • it is not used in Wine currently
  • test coverage of all possible paths can be huge

Option 4: Write all functions independent of the encoding and recompile to get all encodings (same .c file would generate .Ansi.o, .w.o object files)
  Pros:
  • fastest option for A, W
  • easy to support future encodings
  • use of std. API
  • less clutter (in theory)
  Cons:
  • huge bloat
  • it is not used in Wine currently
  • (maybe) difficult transition path

Notes:
  1. Patrik Stridvall modified his winapi_check tool to list the cases where W->A conversion was used. At least 172 suspect functions have been reported.
  2. Alexandre Julliard pointed out that converting A->W->U for file I/O may seem wasteful but it isn't really since we need to support code pages; you can only do A->U directly for 7-bit ascii which is not enough. And supporting code pages without the Unicode step means N^2 conversion tables instead of 2*N (where N is the number of code pages).
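Alexandre's table-count argument in note 2 can be checked with a little arithmetic. The sketch below uses hypothetical function names and counts one table per ordered pair of distinct code pages for the direct scheme, i.e. N*(N-1), which is on the order of the N^2 he quotes, against one table in each direction per code page when going through Unicode.

```c
/* Direct conversion: one table for each ordered pair of distinct
   code pages, N * (N - 1) in total (on the order of N^2). */
unsigned long direct_tables(unsigned long n)
{
    return n * (n - 1);
}

/* Conversion through Unicode: each code page needs one table to
   Unicode and one table from Unicode, 2 * N in total. */
unsigned long pivot_tables(unsigned long n)
{
    return 2 * n;
}
```

With 20 code pages this gives 380 tables for the direct scheme against 40 with Unicode as the intermediate step.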

Since Alexandre's preferred approach is #2, it was the one chosen. However, the many arguments, mainly between Dimitrie Paun and Patrik Stridvall, flooded wine-devel to such an extent that some readers thought they were reading the linux-kernel mailing list.

Patrik also proposed automating some of the A->W or W->A conversions, so that stubs for some functions could be generated from the .spec file. This didn't work out, because there are many different cases to take care of:
  • a string can be an input, an output, or an input/output parameter
  • a NULL string can be an error or a normal parameter
  • a string can be 0-terminated, of fixed length...
  • in some cases (like resources), strings represent IDs (if the HiWord is 0)
The semantics seemed too complex to provide a really robust framework. In conclusion, Wine's internal string encoding shall (slowly) shift from ANSI to Unicode (UTF-16).
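The last case in the list above, where a "string" parameter is really an ID, illustrates why automatic stubs are hard: a converter must inspect the pointer value before touching the data. A minimal sketch of that check follows; name_is_id and name_to_id are hypothetical helpers written for this example.

```c
#include <stdint.h>

/* Non-zero when the 'string' argument is really a 16-bit ID,
   i.e. when everything above the low word of the pointer value is 0. */
int name_is_id(const void *name)
{
    return ((uintptr_t)name >> 16) == 0;
}

/* Extract the ID from such an argument (low word of the pointer value). */
unsigned name_to_id(const void *name)
{
    return (unsigned)((uintptr_t)name & 0xffff);
}
```

An A->W stub would have to pass such a value through unchanged and convert only when name_is_id returns 0, something a .spec file entry that just says "string" cannot express.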

All Kernel Cousin issues and summaries are copyright their original authors, and distributed under the terms of the
GNU General Public License, version 2.0.