Perl on Red Hat 9 switches to character semantics
Hans Leidekker
hans at it.vu.nl
Fri May 16 06:19:56 CDT 2003
Hi,
On Red Hat 9 I get errors like this one when running
'make htmlpages':
Malformed UTF-8 character (unexpected non-continuation byte 0x6e,
immediately after start byte 0xfc) in substitution (s///) at
../../tools/c2man.pl line 313, <SOURCE_FILE> line 2.
This is because I have LANG="en_US.UTF-8" in my
environment, and perl now switches to character semantics
(as opposed to byte semantics) when it detects a UTF-8
locale. Wine source files contain characters with
ordinals > 127 (it looks like the Wine sources are ISO-8859-1),
and of course these bytes usually do not form valid UTF-8
sequences. In the error above, 0xfc is "ü" in ISO-8859-1, but
perl treats it as the start byte of a multi-byte UTF-8 sequence,
so the plain "n" (0x6e) that follows is rejected as a
non-continuation byte.
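A minimal sketch of the mismatch, written in Python rather than perl purely for illustration. The helper name is_valid_utf8 is made up for this example, and Python's decoder reports the problem differently from perl (it rejects 0xfc outright as a start byte), but the conclusion is the same: the two bytes from the error message are fine as ISO-8859-1 and not valid as UTF-8.

```python
def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The two bytes named in the error message: 0xfc followed by 0x6e ("n").
offending = bytes([0xFC, 0x6E])

if __name__ == "__main__":
    # Read as UTF-8, the pair is rejected.
    print(is_valid_utf8(offending))
    # Read as ISO-8859-1, every byte maps to exactly one character.
    text = offending.decode("iso-8859-1")
    print([hex(ord(ch)) for ch in text])
    # Once re-encoded as UTF-8, the same text is accepted.
    print(is_valid_utf8(text.encode("utf-8")))
```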
Offhand I see three solutions (in order of increasing
acceptability):
1. Convert Wine source files to 7-bit ASCII or UTF-8
2. Set the locale to "C" or an ISO-8859-1 locale prior to
running perl on the sources
3. Force perl back into using byte semantics
1. Most non-ASCII characters are in C comments (in
the names of authors, e.g. Ove Kåven). But there
are files like dlls/x11drv/keyboard.c that contain
them as part of a C string. Converting to 7-bit ASCII
would mean these characters would have to be escaped.
Of course this is a step backwards. It would degrade
readability of the sources and probably offend some
awkwardly named authors ;^) It would also require a
change of practice, which is hard to accomplish.
Converting to UTF-8 seems more promising. C strings
still need to be escaped, but then our Hungarian
authors can finally have their names spelled properly
in the sources! Still, there are more programs that
have to interpret C source files, and I estimate that
most of them do not yet handle UTF-8 properly (though
vi and emacs are among the capable).
2. Changing the locale beforehand will get rid of
the errors and is much less controversial than the
above solution ;) But it's still cumbersome to have to
do so.
3. Forcing byte semantics works regardless of the locale
specified in the user's environment. The attached patch does
this for c2man.pl.
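For comparison, the first two options could be sketched roughly as follows. This is an illustration in Python, not part of the attached patch, and the names latin1_to_utf8 and run_in_c_locale are made up for the example. It assumes the sources really are ISO-8859-1, and it relies on LC_ALL taking precedence over LANG. The child process in the demo is a stand-in; a real build would pass the perl command line instead.

```python
import os
import subprocess
import sys


def latin1_to_utf8(data):
    """Option 1: re-encode ISO-8859-1 bytes as UTF-8.

    Pure ASCII input comes back unchanged; bytes above 127 become
    two-byte UTF-8 sequences.
    """
    return data.decode("iso-8859-1").encode("utf-8")


def run_in_c_locale(cmd):
    """Option 2: run cmd with LC_ALL=C layered over the current environment."""
    env = dict(os.environ, LC_ALL="C")
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)


if __name__ == "__main__":
    print(latin1_to_utf8(b"Ove K\xe5ven"))
    # Stand-in child process that just reports the LC_ALL it was given.
    probe = [sys.executable, "-c", "import os; print(os.environ['LC_ALL'])"]
    print(run_in_c_locale(probe).stdout.strip())
```

Neither helper touches C string literals; those would still need escaping by hand under option 1.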
Bye,
-Hans
Changelog:
Force perl to use byte semantics
-------------- next part --------------
A non-text attachment was scrubbed...
Name: c2man-utf8.diff
Type: text/x-diff
Size: 472 bytes
Desc: not available
Url : http://www.winehq.org/pipermail/wine-patches/attachments/20030516/3f5cd04e/c2man-utf8.bin