Perl on Redhat 9 switches to character semantics

Hans Leidekker hans at it.vu.nl
Fri May 16 06:19:56 CDT 2003


Hi,

On Red Hat 9 I get errors like this one when running
'make htmlpages':

Malformed UTF-8 character (unexpected non-continuation byte 0x6e,
immediately after start byte 0xfc) in substitution (s///) at 
../../tools/c2man.pl line 313, <SOURCE_FILE> line 2.

This is because I have LANG="en_US.UTF-8" in my
environment, and perl now switches to character semantics
(as opposed to byte semantics) when it detects a UTF-8
locale. Wine source files contain bytes with values > 127
(the Wine sources appear to be ISO 8859-1), and these
usually do not form valid UTF-8 sequences.
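
To illustrate why ISO 8859-1 text trips a UTF-8 decoder, here is
a minimal Python sketch. Python is used only for illustration and
is not part of the Wine tools; the helper name decodes_as_utf8 is
hypothetical. It takes the two bytes named in the error message
above and reads them under both encodings.

```python
def decodes_as_utf8(data: bytes) -> bool:
    """Return True if data is a well-formed UTF-8 byte sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The two bytes from the error message: 0xfc followed by 0x6e ('n').
sample = b"\xfc\x6e"

# Read as ISO 8859-1, every byte maps to exactly one character,
# so decoding always succeeds.
as_latin1 = sample.decode("iso-8859-1")

# Read as UTF-8, a byte above 127 must belong to a well-formed
# multi-byte sequence; 0xfc followed by plain ASCII 'n' is not one.
is_valid_utf8 = decodes_as_utf8(sample)
```

The same bytes are ordinary text under one encoding and malformed
under the other, which is what the perl warning is reporting.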

Offhand I see three solutions (in order of increasing
acceptability):

1. Convert Wine source files to 7-bit ASCII or UTF-8
2. Set the locale to "C" or an ISO 8859-1 locale prior to
   running perl on the sources
3. Force perl back into using byte semantics

1. Most non-ASCII characters are in C comments (in
   the names of authors, e.g. Ove Kåven). But there
   are files like dlls/x11drv/keyboard.c that contain
   them as part of a C string. Converting to 7-bit ASCII
   would mean these characters would have to be escaped.

   Of course this is a step backwards. It would degrade
   readability of the sources and probably offend some
   awkwardly named authors ;^) It would also require a
   change of practice, which is hard to accomplish.

   Converting to UTF-8 seems more promising. C strings
   still need to be escaped, but then our Hungarian
   authors can finally have their names spelled properly
   in the sources! Still, there are more programs that
   have to interpret C source files, and I suspect that
   most of them do not yet handle UTF-8 properly (though
   vi and emacs are among the capable).

2. Changing the character set beforehand will get rid of
   the errors and is much less controversial than the 
   above solution ;) But it's still cumbersome to have to
   do so.

3. This will work regardless of the character set specified
   in the user's environment. The attached patch does this for
   c2man.pl.
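
For comparison, the two conversions discussed under solution 1
can be sketched in a few lines of Python. This is only an
illustration under stated assumptions: the function names are
hypothetical, the input is assumed to be ISO 8859-1, and the
escaping helper blindly escapes every byte above 127, whereas a
real conversion would only want to touch C string literals.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO 8859-1 bytes as UTF-8."""
    return data.decode("iso-8859-1").encode("utf-8")


def escape_non_ascii(data: bytes) -> bytes:
    """Replace every byte above 127 with a three-digit C octal escape."""
    out = bytearray()
    for byte in data:
        if byte > 127:
            # Three-digit octal escapes are safe in C even when the
            # next character is a digit or a hex letter.
            out += b"\\%03o" % byte
        else:
            out.append(byte)
    return bytes(out)


# An author name from the sources, as ISO 8859-1 bytes.
name = "K\u00e5ven".encode("iso-8859-1")

utf8_name = latin1_to_utf8(name)       # one byte becomes two in UTF-8
escaped_name = escape_non_ascii(name)  # pure 7-bit ASCII, less readable
```

The escaped form shows the readability cost mentioned above: the
name is no longer legible in the source file.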

Bye,

 -Hans

Changelog:
    Force perl to use byte semantics

-------------- next part --------------
A non-text attachment was scrubbed...
Name: c2man-utf8.diff
Type: text/x-diff
Size: 472 bytes
Desc: not available
Url : http://www.winehq.org/pipermail/wine-devel/attachments/20030516/3f5cd04e/c2man-utf8.bin

