AUTHORS list and the C locale on Mac OS X

James McKenzie jjmckenzie51 at earthlink.net
Tue Nov 9 21:19:13 CST 2010


On 11/9/10 8:02 PM, Charles Davis wrote:
> On 11/9/10 7:58 PM, James McKenzie wrote:
>> On 11/9/10 3:29 PM, Reece Dunn wrote:
>>> On 9 November 2010 22:13, Charles Davis <cdavis at mymail.mines.edu> wrote:
>>>> On 11/9/10 1:58 PM, James Mckenzie wrote:
>>>>> Charles Davis <cdavis at mymail.mines.edu> wrote:
>>>>>> On 11/9/10 12:13 PM, James Mckenzie wrote:
>>>>>>> No, it is not a bug in GNU sed.  The authors.c file needs to have
>>>>>>> the erroneous characters for the language used by
>>>>>>> MacOSX changed to be acceptable?
>>>>>> That ain't gonna fly. I think we should explicitly use a UTF-8 locale
>>>>>> (like en_US.UTF-8 or some such) instead of the C locale when sed goes
>>>>>> over the AUTHORS file.
>>>>> Don't shoot the messenger.
>>>> Sorry.
>>>>
>>>> The problem with your first idea--removing the bad characters directly
>>>> from the authors.c file--is that we'd need to use a utility like sed or
>>>> awk to implement it automatically--which puts us right back where we
>>>> started. (We could use diff/patch, but is it worth the effort to
>>>> maintain a patch for this? And would AJ let us put the patch file in
>>>> Wine? And if not, where would we put it?)
>>>>> Maybe we can force the use of /usr/bin/sed, if it exists, to get
>>>>> around the 'brokenness' of GNU sed on the Mac?
>>>> Maybe. But that seems like a hack. A better way might be to detect if
>>>> we're on Mac OS and using GNU sed; in that case, we use /usr/bin/sed.
>>>> That's less of a hack, but still a hack.
>>>>> If not, it is a real bear to set the language on a Mac per
>>>>> previous discussions on the Users list.
>>>> That was about setting LANG. Wine always obeys LC_*, and so does sed.
>>>>
>>>> It's not the language that's the problem. It's the encoding. The AUTHORS
>>>> file is encoded in UTF-8, but GNU sed isn't using UTF-8 because we told
>>>> it not to (i.e. we told it to use MacRoman because that's the default
>>>> encoding for the C locale). If we tell it to use UTF-8 (by setting
>>>> LC_ALL to, for example, 'en_US.UTF-8'), it will process the file
>>>> correctly.
>>>>
>>>> Unfortunately, I just remembered that the name of the UTF-8 encoding is
>>>> different on Mac OS ('UTF-8') and Linux ('utf8'). That might prevent us
>>>> from using a single LC_ALL value on both platforms. We might end up
>>>> having to hack around this the way either you or I described.
>>> You could use autoconf to detect:
>>>     1/  broken handling of UTF-8 characters by sed;
>>>     2/  the LC_ALL value (locale name) that handles UTF-8.
>>>
>>> NOTE: You will need to enumerate available locales as the user may not
>>> have en_US present with UTF-8 encoding (e.g. a Spanish-only or
>>> Chinese-only system).
>>>
>>> Something like:
>>>
>>> cat > get_locale.sh << 'EOF'
>>> locale -a | while read -r locale ; do
>>>      if LC_ALL=$locale sed -n p < authors.c > /dev/null 2>&1 ; then
>>>         echo "$locale"
>>>         exit
>>>      fi
>>> done
>>> EOF
>>>
>>> This should print a locale that can process the UTF-8 file. It needs
>>> cleaning up a bit, but that is the basis of it.
>>>
>> Thanks Reece.
>>
>> Charles:  You want to do this?
> I'm on it.
>
> If you have a patch ready, though, go for it.
>
No, I'm stuck with a problem in richedit. Besides, you have more
Mac-specific knowledge than I do, and I'm happy to say that. Although if
you need a test 'victim', I'm here for you.
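
A minimal sketch of the "detect Mac OS X plus GNU sed, then fall back to /usr/bin/sed" idea from earlier in the thread. It assumes that `uname -s` prints `Darwin` on Mac OS X, that GNU sed's `--version` output contains the string `GNU` while the system sed rejects `--version`, and that the system sed lives at /usr/bin/sed. `pick_sed` is a hypothetical helper name, not anything in Wine's build system.

```sh
# Hypothetical helper (not part of Wine's build): print which sed to use.
# On Mac OS X (uname -s prints "Darwin"), if the first sed on PATH is GNU
# sed and /usr/bin/sed is executable, prefer the system /usr/bin/sed.
# Everywhere else, keep whatever "sed" resolves to on PATH.
pick_sed() {
    if [ "$(uname -s)" = Darwin ] &&
        sed --version 2>/dev/null | grep -q GNU &&
        [ -x /usr/bin/sed ]; then
        echo /usr/bin/sed
    else
        echo sed
    fi
}
```

A build script could then run `$(pick_sed)` in place of a bare `sed`. As Charles says, this is still a workaround: it sidesteps the locale problem rather than fixing it.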

James McKenzie
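
A minimal sketch of enumerating the installed locales and picking one whose codeset is UTF-8, so that no particular language (such as en_US) has to be installed. It assumes locale names follow the usual `language_TERRITORY.codeset` form and matches both the 'UTF-8' spelling used on Mac OS X and the 'utf8' spelling listed on Linux. `find_utf8_locale` is a hypothetical helper name.

```sh
# Hypothetical helper: print the first locale reported by `locale -a`
# whose codeset is UTF-8, accepting both the Mac OS X ('UTF-8') and the
# Linux ('utf8') spellings. Returns 1 and prints nothing if none is found.
# Assumes locale names of the form lang_TERRITORY.codeset.
find_utf8_locale() {
    locale -a 2>/dev/null | (
        while read -r loc; do
            case $loc in
                *.UTF-8 | *.utf8 | *.UTF8 | *.utf-8)
                    echo "$loc"
                    exit 0   # leaves the ( ... ) subshell with success
                    ;;
            esac
        done
        exit 1   # no UTF-8 locale was listed
    )
}
```

The result could be passed as `LC_ALL=$loc` on the sed command that processes AUTHORS. Matching on the name alone does not prove that sed then handles the file correctly; Reece's approach of test-running sed under each candidate locale is the stronger check, and the two could be combined.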




More information about the wine-devel mailing list