AUTHORS list and the C locale on Mac OS X
Charles Davis
cdavis at mymail.mines.edu
Tue Nov 9 21:02:31 CST 2010
On 11/9/10 7:58 PM, James McKenzie wrote:
> On 11/9/10 3:29 PM, Reece Dunn wrote:
>> On 9 November 2010 22:13, Charles Davis <cdavis at mymail.mines.edu> wrote:
>>> On 11/9/10 1:58 PM, James Mckenzie wrote:
>>>> Charles Davis <cdavis at mymail.mines.edu> wrote:
>>>>> On 11/9/10 12:13 PM, James Mckenzie wrote:
>>>>>> No, it is not a bug in GNU sed. The authors.c file needs to have
>>>>>> the characters that are invalid for the language used by
>>>>>> Mac OS X changed to acceptable ones.
>>>>> That ain't gonna fly. I think we should explicitly use a UTF-8 locale
>>>>> (like en_US.UTF-8 or some such) instead of the C locale when sed goes
>>>>> over the AUTHORS file.
>>>> Don't shoot the messenger.
>>> Sorry.
>>>
>>> The problem with your first idea--removing the bad characters directly
>>> from the authors.c file--is that we'd need to use a utility like sed or
>>> awk to implement it automatically--which puts us right back where we
>>> started. (We could use diff/patch, but is it worth the effort to
>>> maintain a patch for this? And would AJ let us put the patch file in
>>> Wine? And if not, where would we put it?)
>>>> Maybe we can force the use of the sed in /usr/bin, if it exists,
>>>> to get around the 'brokenness' of GNU sed on the Mac?
>>> Maybe. But that seems like a hack. A better way might be to detect if
>>> we're on Mac OS and using GNU sed; in that case, we use /usr/bin/sed.
>>> That's less of a hack, but still a hack.
>>>> If not, it is a real bear to set the language on a Mac per
>>>> previous discussions on the Users list.
>>> That was about setting LANG. Wine always obeys LC_*, and so does sed.
>>>
>>> It's not the language that's the problem. It's the encoding. The AUTHORS
>>> file is encoded in UTF-8, but GNU sed isn't using UTF-8 because we told
>>> it not to (i.e. we told it to use MacRoman because that's the default
>>> encoding for the C locale). If we tell it to use UTF-8 (by setting
>>> LC_ALL to, for example, 'en_US.UTF-8'), it will process the file
>>> correctly.
>>>
>>> Unfortunately, I just remembered that the name of the UTF-8 encoding is
>>> different on Mac OS ('UTF-8') and Linux ('utf8'). That might prevent us
>>> from setting LC_ALL differently. We might end up having to hack around
>>> this the way either you or I described.
>> You could use autoconf to detect:
>> 1/ broken handling of UTF-8 characters by sed;
>> 2/ the locale name to set in LC_ALL so that sed handles UTF-8.
>>
>> NOTE: You will need to enumerate available locales as the user may not
>> have en_US present with UTF-8 encoding (e.g. a Spanish-only or
>> Chinese-only system).
>>
>> Something like:
>>
>> cat > get_locale.sh << 'EOF'
>> locale -a | while read -r locale ; do
>>     # Accept the first locale under which sed reads authors.c without error
>>     if LC_ALL=$locale sed -n '$p' < authors.c > /dev/null 2>&1 ; then
>>         echo "$locale"
>>         exit
>>     fi
>> done
>> EOF
>>
>> This should print a locale that can process the UTF-8 file. It needs
>> cleaning up a bit, but that is the basis of it.
>>
> Thanks Reece.
>
> Charles: You want to do this?
I'm on it.
If you have a patch ready, though, go for it.
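
For reference, here is a rough sketch of how that detection might look in plain sh. It is not a patch, just an illustration under some assumptions: the helper names (is_utf8_locale_name, sed_roundtrips, find_utf8_locale) are made up for this example, "locale -a" and "cmp" are assumed to be available, and "handles the file" is taken to mean that sed copies it through a match-everything substitution without error and without changing a byte. It accepts both spellings of the encoding name ('UTF-8' and 'utf8').

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not part of Wine's build.

# Succeeds if a locale name ends in a UTF-8 codeset, in either spelling
# ("UTF-8" as seen on Mac OS, "utf8" as seen on Linux).
is_utf8_locale_name() {
    case "$1" in
        *.[Uu][Tt][Ff]-8|*.[Uu][Tt][Ff]8) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeeds if sed, run under locale $1, passes file $2 through a
# match-everything substitution and the output is byte-identical to $2.
sed_roundtrips() {
    LC_ALL="$1" sed -e 's/^\(.*\)$/\1/' "$2" 2>/dev/null | cmp -s - "$2"
}

# Prints the first installed UTF-8 locale under which sed handles file $1.
# Returns non-zero if no such locale is found.
find_utf8_locale() {
    for loc in $(locale -a 2>/dev/null); do
        if is_utf8_locale_name "$loc" && sed_roundtrips "$loc" "$1"; then
            echo "$loc"
            return 0
        fi
    done
    return 1
}
```

Something like "LC_ALL=$(find_utf8_locale AUTHORS)" could then feed the sed invocation, with a fallback for the case where nothing is found. Locale names that carry a modifier after the codeset are not matched by this sketch.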
Chip