Perl on Redhat 9 switches to character semantics
Shachar Shemesh
wine-devel at shemesh.biz
Sun May 18 12:23:46 CDT 2003
Hans Leidekker wrote:
>On Sat, 17 May 2003, Shachar Shemesh wrote:
>
>>No, they are in whatever locale the string is. In particular, the entire
>>keyboard code is filled to the brim with strings, each with a different
>>locale. I'm talking about functional code here, not something which is
>>only inside comments.
>>
>
>I know Wine sources are not declared as adhering to any particular
>character set, but when I display them using ISO_8859-1 I see the
>least distortions. That's why I said "it looks like" they are
>ISO_8859-1.
>
That's because people whose names fall outside the ISO-8859-1 charset rarely
assume that any client will be able to display their name, so they write it in
Latin letters (the Japanese call this "Romaji"). European names, on the other
hand, rarely get pure-Latin transcriptions, because the letters are too
similar. Irony.
>>No can do ASCII. A Hebrew "שלום" will not look good, or at all, for
>>that matter, in ASCII.
>>
Since your locale is UTF-8, you made my string twice as long (in bytes) `-)
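A quick way to see the "twice as long" effect: Hebrew letters take one byte each in the legacy 8-bit Hebrew charset (ISO-8859-8 is assumed here) but two bytes each in UTF-8, so a byte-oriented tool and a character-oriented tool disagree about the length of the same string. A minimal Python sketch:

```python
# Length of a Hebrew word counted in characters and in bytes.
# ISO-8859-8 is assumed as the legacy 8-bit Hebrew encoding.
word = "שלום"  # "shalom": four Hebrew letters

chars = len(word)                             # character semantics
legacy_bytes = len(word.encode("iso8859_8"))  # one byte per letter
utf8_bytes = len(word.encode("utf-8"))        # two bytes per letter

print(f"characters: {chars}")
print(f"ISO-8859-8 bytes: {legacy_bytes}")
print(f"UTF-8 bytes: {utf8_bytes}")
```

Every letter of the Hebrew alphabet lies between U+05D0 and U+05EA, inside the two-byte UTF-8 range (U+0080 to U+07FF), which is where the doubling comes from.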
>That's obvious. Hebrew won't look good in ISO_8859-1 either.
>
No, but it will at least be preserved. That is not critical for comments,
but it is critical for non-Latin strings.
> Then,
>like I said, your option is to "escape" characters outside ASCII-7,
>like Germans do with their umlauts.
>
Care to show what you mean?
> If that Hebrew string you presented
>is your name,
>
Nah, far too long for that. My name is just three letters. Get the full
story at http://www.shemesh.biz/sun.html.
> then "Shachar" could be seen as an escaped ASCII-7
>notation for it, couldn't it?
>
If you mean that instead of writing "שחר" I should write
"\xfa\xe8\xf9", then I think you are talking about impractical solutions
here. It took me less than a second to write the native version: I just
typed it. It took me almost a minute to write the escaped version, and I
can only speculate as to whether I got it right. I had to redo it,
because I had, in fact, not got it right the first time. What CJK people
are expected to do is not something I would like to contemplate. In
addition, no one, not even Hebrew speakers, can reasonably be expected to
understand what is written there. That is a major source of problems.
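If escaping is unavoidable, the escapes can at least be generated rather than typed by hand. A minimal Python sketch, assuming ISO-8859-8 as the target charset and C-style `\xNN` escapes; `c_escape` is a hypothetical helper, not part of any Wine tool:

```python
import string


def c_escape(text: str, encoding: str = "iso8859_8") -> str:
    """Hypothetical helper: encode text and return the body of a C
    string literal in which every byte that is not safe printable
    ASCII is written as a \\xNN escape."""
    out = []
    prev_escaped = False
    for byte in text.encode(encoding):
        ch = chr(byte)
        printable = 0x20 <= byte < 0x7F and ch not in '\\"'
        # C hex escapes are greedy ("\xf9a" is read as ONE escape), so a
        # hex digit that directly follows an escape must be escaped too.
        if printable and not (prev_escaped and ch in string.hexdigits):
            out.append(ch)
            prev_escaped = False
        else:
            out.append(f"\\x{byte:02x}")
            prev_escaped = True
    return "".join(out)


print(c_escape("שחר"))  # prints \xf9\xe7\xf8
```

The greedy-escape rule in C is one more way hand-written escapes go wrong: a letter from a to f placed right after an escape silently becomes part of it.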
Having said that, there is one place where I did exactly this in the Wine
sources. In dlls/commdlg/font.c, near the beginning of the file, you can
find a table of the characters that the font dialog should display for
each locale. The entries in that table are in UTF-16, as I couldn't encode
each string in a different locale's charset. As a result, they are,
indeed, unreadable. As these are not true strings, but simply a few
characters to demonstrate a font, I hope it will not matter much.
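For comparison, a UTF-16 table entry in C source is just a list of 16-bit code units, which is why it cannot be read at a glance. A sketch that generates such an initializer; the helper, the array name and the sample text are hypothetical and are not copied from the actual font dialog table:

```python
def utf16_initializer(name: str, text: str) -> str:
    """Hypothetical helper: render text as a NUL-terminated C WCHAR
    array initializer made of UTF-16 code units."""
    data = text.encode("utf-16-le")
    units = [int.from_bytes(data[i:i + 2], "little")
             for i in range(0, len(data), 2)]
    units.append(0)  # terminating NUL
    body = ", ".join(f"0x{u:04x}" for u in units)
    return f"static const WCHAR {name}[] = {{{body}}};"


print(utf16_initializer("sample", "אבג"))
# static const WCHAR sample[] = {0x05d0, 0x05d1, 0x05d2, 0x0000};
```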
>>UTF-8 may work for resources, if the resource
>>compiler is adjusted accordingly, but not inside the code, where the
>>encoding actually matters for the code that parses it.
>>
>>>2. Set character set to "C" or "ISO8859-1" prior to
>>> running perl on the sources
>>>
>>That sounds better, I think... What does perl do with the sources again?
>>
>By Perl I in fact mean any Wine tool that's written in Perl. Mostly
>running regexps on the sources is what they do I guess.
>
Then I vote for this. ISO-8859-1 will not distort the sources, since it
maps every byte value to exactly one character and back, and that is all
that is really required.
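The reason ISO-8859-1 is safe for this is that every one of the 256 byte values maps to exactly one character, so a tool that reads a file as ISO-8859-1 and writes it back reproduces it byte for byte, whatever charset the strings were really in. Reading the same file as UTF-8, roughly what a tool does when it trusts a UTF-8 locale, breaks on the first legacy 8-bit byte. A minimal Python sketch with an assumed sample line; `utf8_error_offset` is a hypothetical helper:

```python
import re


def utf8_error_offset(data: bytes):
    """Hypothetical helper: return the offset of the first byte that is
    not valid UTF-8, or None if the whole buffer decodes cleanly."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


# Assumed sample: a C line whose string literal is ISO-8859-8 Hebrew.
raw = b'str = "' + "שלום".encode("iso8859_8") + b'";'

# ISO-8859-1 assigns a character to all 256 byte values, so decoding
# and re-encoding gives back exactly the original bytes.
text = raw.decode("latin-1")
assert text.encode("latin-1") == raw

# Regexps over the ASCII parts still work on the decoded text.
match = re.search(r'(\w+) = "', text)
print("identifier:", match.group(1))

# Treating the same bytes as UTF-8 fails at the first Hebrew byte.
print("first invalid UTF-8 byte at offset:", utf8_error_offset(raw))
```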
>>Plus you have not solved the functional strings problem.
>>
>What do you mean by "functional strings"?
>
I mean strings that actually perform some function, as opposed to
comments. The most prominent example is keyboard.c, where each string is
in a different encoding. The code in fontdlg.c is also an example.
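What makes these strings hard is that a byte in a literal has no fixed meaning: it depends on the code page the string was written for, so no single charset for the whole file can be right for all of them. A small Python illustration; the code pages chosen are examples, not a claim about which ones keyboard.c actually uses:

```python
def decode_each(data: bytes, codepages) -> dict:
    """Decode the same bytes under several code pages (illustrative)."""
    return {cp: data.decode(cp) for cp in codepages}


# One byte, three legacy Windows code pages, three different letters.
for cp, text in decode_each(b"\xe0", ("cp1252", "cp1251", "cp1255")).items():
    print(f"{cp}: U+{ord(text):04X} {text}")
```

The single byte 0xE0 comes out as Latin à (U+00E0), Cyrillic а (U+0430) and Hebrew alef (U+05D0) respectively.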
>-Hans
>
Much though I like UTF-8, I think it is totally and utterly
inappropriate for handling the Wine code. Like it or not, MS chose UTF-16
(actually, they chose UCS-2, and then made it UTF-16 when that was
invented, IIRC), and that's what Wine must choose as well. Given that
fact, it makes no sense to keep strings inside Wine in UTF-8, as that
would require runtime conversions. And if the strings are not UTF-8,
there is no reason to make the comments UTF-8 either.
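To make the "runtime conversions" point concrete: a string kept as UTF-8 would have to be converted to UTF-16 code units whenever it is passed to a wide-character Win32 API, which is roughly the job MultiByteToWideChar does with CP_UTF8. A simplified, hand-written Python sketch (hypothetical helper, well-formed input assumed, no error checking), which also shows the UCS-2 versus UTF-16 difference: characters above U+FFFF need a surrogate pair.

```python
def utf8_to_utf16_units(data: bytes) -> list[int]:
    """Decode UTF-8 bytes into UTF-16 code units.

    Simplified sketch: assumes well-formed input and does not reject
    overlong forms, stray continuation bytes or truncated sequences.
    """
    units = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:              # 1 byte: U+0000..U+007F
            cp, extra = lead, 0
        elif lead >> 5 == 0b110:     # 2 bytes: U+0080..U+07FF
            cp, extra = lead & 0x1F, 1
        elif lead >> 4 == 0b1110:    # 3 bytes: U+0800..U+FFFF
            cp, extra = lead & 0x0F, 2
        else:                        # 4 bytes: U+10000..U+10FFFF
            cp, extra = lead & 0x07, 3
        for k in range(1, extra + 1):
            cp = (cp << 6) | (data[i + k] & 0x3F)
        i += 1 + extra
        if cp < 0x10000:
            units.append(cp)         # one code unit (the UCS-2 range)
        else:
            cp -= 0x10000            # needs a surrogate pair
            units.append(0xD800 | (cp >> 10))
            units.append(0xDC00 | (cp & 0x3FF))
    return units


sample = "שלום \U0001D11E"  # Hebrew word, space, MUSICAL SYMBOL G CLEF
print([hex(u) for u in utf8_to_utf16_units(sample.encode("utf-8"))])
# ['0x5e9', '0x5dc', '0x5d5', '0x5dd', '0x20', '0xd834', '0xdd1e']
```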
Shachar
--
Shachar Shemesh
Open Source integration consultant
Home page & resume - http://www.shemesh.biz/
More information about the wine-devel
mailing list