Perl on Redhat 9 switches to character semantics
Shachar Shemesh
wine-devel at shemesh.biz
Fri May 16 18:07:42 CDT 2003
Hans Leidekker wrote:
>Hi,
>
>This is because I have LANG="en_US.UTF-8" as part of my
>environment, and perl will now switch to character semantics
>(as opposed to byte semantics) when it detects a Unicode
>character set. Wine source files contain characters with
>ordinals > 127 (it looks like the Wine sources are ISO_8859-1)
>
No, each string is in the charset of its own locale. In particular, the entire
keyboard code is filled to the brim with strings, each with a different
locale. I'm talking about functional code here, not something which is
only inside comments.
Another place where everything is in a different locale is the resources.
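To illustrate the point, here is a small Python sketch (not Wine code; the byte value and the two charsets are picked only as examples) showing that one and the same byte reads as a different character depending on the charset the string was written in:

```python
# Illustrative only: the same byte means different characters in
# different 8-bit charsets, so no single charset fits every string.
raw = b"\xe5"

as_latin1 = raw.decode("iso8859_1")   # Western European reading
as_hebrew = raw.decode("iso8859_8")   # Hebrew reading

print(repr(as_latin1), repr(as_hebrew))
```

So a file that mixes such strings cannot be labelled with any one 8-bit charset.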
>and of course, these usually don't also form valid UTF-8
>sequences.
>
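A small Python sketch of why that fails, assuming the author name really is stored as ISO 8859-1 (I have not checked the actual bytes in the tree; the helper name is made up):

```python
# Assumption: the name is stored as ISO 8859-1, so "å" is the single byte 0xE5.
latin1_bytes = "Ove K\u00e5ven".encode("iso8859_1")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 0xE5 announces a three-byte UTF-8 sequence, but the bytes that follow
# ("v", "e") are plain ASCII rather than continuation bytes.
print(is_valid_utf8(latin1_bytes))
print(is_valid_utf8(b"plain ASCII"))
```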
>Off hand I see three solutions (in order of increasing
>acceptability):
>
>1. Convert Wine source files to ASCII-7 or UTF-8
>
No can do ASCII. A Hebrew "שלום" will not look good, or at all, for
that matter, in ASCII. UTF-8 may work for resources, if the resource
compiler is adjusted accordingly, but not inside the code, where the
encoding actually matters for the code that parses it.
>2. Set character set to "C" or "ISO8859-1" prior to
> running perl on the sources
>
That sounds better, I think... What does perl do with the sources again?
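In case it helps, a rough sketch of option 2 in Python. The helper names and the script path in the usage note are made up for illustration, and it assumes perl takes its cue from the LC_ALL, LC_CTYPE and LANG environment variables:

```python
import os
import subprocess


def c_locale_env(base_env):
    """Return a copy of base_env with the locale forced to "C"."""
    env = dict(base_env)
    # LC_ALL takes precedence over LC_CTYPE and LANG, so setting it is
    # enough; LANG is reset as well to keep the environment consistent.
    env["LC_ALL"] = "C"
    env["LANG"] = "C"
    return env


def run_in_c_locale(cmd):
    """Run cmd (a list of arguments) with the locale forced to "C"."""
    return subprocess.run(cmd, env=c_locale_env(os.environ), check=True)
```

Usage would be something like run_in_c_locale(["perl", "some_script.pl"]), where some_script.pl stands in for whichever perl script is run on the sources.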
>1. Most non-ASCII-7 characters are in C comments (in
> the names of authors, e.g. Ove Kåven). But there
> are files like dlls/x11drv/keyboard.c that contain
> them as part of a C string. Going this way would
> mean these characters would have to be escaped.
>
I suggested that some time ago. It would also mean that the strings could
be proper Unicode. The general consensus at the time was against it, so
that the maintainer of each language can easily check their layout.
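For reference, the escaping itself is mechanical. A sketch in Python (the function name is made up, and this is not what any Wine tool does today) that turns raw string bytes into the body of an ASCII-only C string literal using three-digit octal escapes:

```python
def escape_c_string(data: bytes) -> str:
    """Return data as the body of an ASCII-only C string literal."""
    out = []
    for byte in data:
        ch = chr(byte)
        if ch in ('"', "\\"):
            out.append("\\" + ch)      # quote and backslash need a backslash
        elif 0x20 <= byte < 0x7F:
            out.append(ch)             # printable ASCII passes through
        else:
            # Always three octal digits, so a digit that follows in the
            # string cannot be absorbed into the escape.
            out.append("\\%03o" % byte)
    return "".join(out)


print(escape_c_string("K\u00e5ven".encode("iso8859_1")))
```

The result is safe for any tool that reads the source, but it is also exactly the form a layout maintainer cannot read at a glance.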
> Converting to UTF-8 seems more promising. C strings
> still need to be escaped but then our Hungarian
> authors can finally have their names spelled properly
> in the sources! Still, there are more programs that
> have to interpret C source files and I estimate that
> most of them do not yet handle UTF-8 properly (though
> vi and emacs are amongst the capable).
>
Plus you have not solved the functional strings problem.
Shachar
--
Shachar Shemesh
Open Source integration consultant
Home page & resume - http://www.shemesh.biz/