Perl on Redhat 9 switches to character semantics

Shachar Shemesh wine-devel at shemesh.biz
Sun May 18 12:23:46 CDT 2003


Hans Leidekker wrote:

>On Sat, 17 May 2003, Shachar Shemesh wrote:
>
>  
>
>>No, they are in whatever locale each string belongs to. In particular, the entire 
>>keyboard code is filled to the brim with strings, each with a different 
>>locale. I'm talking about functional code here, not something which is 
>>only inside comments.
>>    
>>
>
>I know Wine sources are not declared as adhering to any particular
>character set, but when I display them using ISO_8859-1 I see the
>fewest distortions. That's why I said "it looks like" they are 
>ISO_8859-1.
>  
>
That's because people whose names fall outside the 8859-1 charset rarely 
assume that any client will be able to read them, so they write their names 
in Latin letters (what the Japanese call "romaji"). European names, on the 
other hand, rarely get pure-Latin transcriptions, because their letters are 
already so close to Latin. Irony.

>>No can do ASCII. A Hebrew "שלום" will not look good, or at all, for 
>>that matter, in ASCII.
>>
As your locale is UTF-8, you made my string twice as long `-)
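For illustration, here is a minimal sketch of the size difference. It assumes ISO-8859-8 as the single-byte Hebrew encoding, which the thread does not name. In that encoding each Hebrew letter is one byte, while in UTF-8 it is two.

```python
# Compare the encoded size of a Hebrew word in a single-byte legacy
# encoding (ISO-8859-8, assumed here) and in UTF-8.
word = "שלום"  # four letters: shin, lamed, vav, final mem

legacy = word.encode("iso-8859-8")
utf8 = word.encode("utf-8")

print(len(legacy), legacy)  # length 4: one byte per letter
print(len(utf8), utf8)      # length 8: two bytes per letter
```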

>That's obvious. Hebrew won't look good in ISO_8859-1 either.
>
No, but it will at least be preserved. That is not critical for comments, 
but it is critical for non-Latin strings.

> Then,
>like I said, your option is to "escape" characters outside ASCII-7,
>like Germans do with their umlauts.
>
Care to show what you mean?

> If that Hebrew string you presented
>is your name,
>
Nah, far too long for that. My name is just three letters. Get the full 
story at http://www.shemesh.biz/sun.html.

> then "Shachar" could be seen as an escaped ASCII-7 
>notation for it, couldn't it?
>
If you mean that instead of writing "שחר" I should write 
"\xf9\xe7\xf8", then I think you are proposing impractical solutions 
here. It took me less than a second to write the native version: I just 
typed it. It took me almost a minute to write the escaped version, and I 
can only speculate as to whether I got it right. I just redid it, 
because I had, in fact, got it wrong. What CJK people are expected 
to do is not something I would like to contemplate. In addition, 
no one, not even Hebrew speakers, can reasonably be expected to 
understand what is written there. That is a major source of problems.
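A tool can remove the typing burden, but it cannot make the result readable. Here is a minimal sketch, assuming ISO-8859-8 as the target single-byte encoding. The helper names `to_c_escapes` and `from_c_escapes` are hypothetical and do not come from Wine. Under that assumption, "שחר" comes out as `\xf9\xe7\xf8`.

```python
def to_c_escapes(text, encoding="iso-8859-8"):
    """Render text as C-style \\xNN escapes in a single-byte encoding.

    ISO-8859-8 is an assumption for Hebrew; the thread does not name one.
    """
    return "".join(f"\\x{byte:02x}" for byte in text.encode(encoding))


def from_c_escapes(escaped, encoding="iso-8859-8"):
    """Reverse to_c_escapes, to check a hand-written escape sequence."""
    raw = bytes(int(h, 16) for h in escaped.split("\\x")[1:])
    return raw.decode(encoding)


escaped = to_c_escapes("שחר")
print(escaped)                  # \xf9\xe7\xf8
print(from_c_escapes(escaped))  # שחר
```

The round trip only checks that the escape is consistent. A reader of the source still sees `\xf9\xe7\xf8`, not a word.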

Having said that, there is one place where I did exactly this in the Wine 
sources. In dlls/commdlg/font.c, near the beginning of the file, you can 
find a table of the characters that the font dialog should display for 
each locale. The entries in that table are in UTF-16, as I couldn't 
encode each string in a different locale. As a result, they are, 
indeed, unreadable. As these are not true strings, but simply a few 
characters to demonstrate a font, I'm hoping it will not matter much.
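To show why such table entries are unreadable, here is a minimal sketch that formats a string as a C initializer of UTF-16 code units. The helper `wchar_initializer` and the sample "אבג" are hypothetical. They are not the actual contents of the font dialog table.

```python
def wchar_initializer(sample):
    """Format a string as a C array initializer of UTF-16 code units,
    NUL-terminated, e.g. for a hypothetical WCHAR table entry."""
    data = sample.encode("utf-16-le")
    units = [int.from_bytes(data[i:i + 2], "little")
             for i in range(0, len(data), 2)]
    return "{" + ",".join(f"0x{u:04x}" for u in units + [0]) + "}"


print(wchar_initializer("אבג"))  # {0x05d0,0x05d1,0x05d2,0x0000}
```

The entry is encoding-neutral, because it no longer depends on the charset of the source file, but nobody can read it at a glance.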

>>UTF-8 may work for resources, if the resource
>>compiler is adjusted accordingly, but not inside the code, where the 
>>encoding actually matters for the code that parses it.
>>    
>>
>>>2. Set character set to "C" or "ISO8859-1" prior to 
>>>  running perl on the sources
>>>      
>>>
>>That sounds better, I think... What does perl do with the sources again?
>>    
>>
>By Perl I in fact mean any Wine tool that's written in Perl. Mostly
>they run regexps on the sources, I guess.
>  
>
Then I vote for this. Reading the sources as 8859-1 will not distort them, 
since every byte maps to exactly one character, and that is all that is 
really required.
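The property being relied on can be checked directly. Here is a minimal Python sketch of it; the Perl side (locale variables, character semantics) is not shown. ISO-8859-1 maps every byte 0x00-0xFF to one code point, so any file survives a decode/encode round trip. UTF-8 rejects many byte sequences, such as text stored in a single-byte Hebrew encoding. The helper names are hypothetical.

```python
def latin1_roundtrips(data):
    """True if decoding as ISO-8859-1 and re-encoding gives the same bytes."""
    return data.decode("iso-8859-1").encode("iso-8859-1") == data


def is_valid_utf8(data):
    """True if the bytes form a valid UTF-8 sequence."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


all_bytes = bytes(range(256))           # every possible byte value
print(latin1_roundtrips(all_bytes))     # True
print(is_valid_utf8(b"\xf9\xe7\xf8"))   # False: not UTF-8
```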

>>Plus you have not solved the functional strings problem.
>>    
>>
>What do you mean by "functional strings"?
>  
>
I mean strings that actually perform some function, as opposed to 
comments. The most prominent example is keyboard.c, where each string is 
in a different encoding. The code in fontdlg.c is also an example.
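Mixed encodings in one file are a problem because the same byte means different characters depending on the encoding. Here is a minimal sketch with a single byte, 0xE4, decoded under three encodings chosen for illustration. The helper `decode_in_each` is hypothetical.

```python
def decode_in_each(data, encodings):
    """Decode the same bytes under several encodings, keyed by encoding."""
    return {enc: data.decode(enc) for enc in encodings}


results = decode_in_each(b"\xe4", ["iso-8859-1", "iso-8859-8", "cp1251"])
for enc, text in results.items():
    print(enc, text, f"U+{ord(text):04X}")
# iso-8859-1 ä U+00E4
# iso-8859-8 ה U+05D4
# cp1251 д U+0434
```

A tool that reads the whole file in one encoding will therefore show most of these strings as something else.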

>-Hans
>  
>
Much as I like UTF-8, I think it is totally and utterly 
inappropriate for handling the Wine code. Like it or not, MS chose UTF-16 
(actually, they chose UCS-2, and then made it UTF-16 when that was 
invented, IIRC), and that's what Wine must choose as well. Given that 
fact, it makes no sense to have strings inside Wine in UTF-8, as that 
would require runtime conversions. If the strings are not UTF-8, there 
is no reason to make the comments UTF-8 either.
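Here is a minimal Python sketch of the two points above. The helper `utf16_code_units` is hypothetical and stands in for what a wide-character (WCHAR) API would receive. A character in the Basic Multilingual Plane is one 16-bit unit, as in UCS-2. Characters beyond it need the surrogate pairs that UTF-16 added. A string stored as UTF-8 has to be decoded before it can be handed over in that form.

```python
def utf16_code_units(text):
    """Return the UTF-16 code units of text as integers."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


# A BMP character is a single 16-bit unit, exactly as in UCS-2.
print([hex(u) for u in utf16_code_units("\u05e9")])      # ['0x5e9']
# U+1D11E lies outside the BMP and needs a surrogate pair.
print([hex(u) for u in utf16_code_units("\U0001d11e")])  # ['0xd834', '0xdd1e']

# A UTF-8 literal must be converted at runtime before a wide API can use it.
utf8_literal = "שחר".encode("utf-8")
wide = utf16_code_units(utf8_literal.decode("utf-8"))
print([hex(u) for u in wide])  # ['0x5e9', '0x5d7', '0x5e8']
```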

             Shachar

-- 
Shachar Shemesh
Open Source integration consultant
Home page & resume - http://www.shemesh.biz/




