AUTHORS list and the C locale on Mac OS X

James McKenzie jjmckenzie51 at earthlink.net
Tue Nov 9 20:58:41 CST 2010


On 11/9/10 3:29 PM, Reece Dunn wrote:
> On 9 November 2010 22:13, Charles Davis <cdavis at mymail.mines.edu> wrote:
>> On 11/9/10 1:58 PM, James Mckenzie wrote:
>>> Charles Davis <cdavis at mymail.mines.edu> wrote:
>>>> On 11/9/10 12:13 PM, James Mckenzie wrote:
>>>>> No, it is not a bug in GNU sed.  Perhaps the characters in the
>>>>> authors.c file that are not valid for the language used by Mac OS X
>>>>> need to be changed to acceptable ones?
>>>> That ain't gonna fly. I think we should explicitly use a UTF-8 locale
>>>> (like en_US.UTF-8 or some such) instead of the C locale when sed goes
>>>> over the AUTHORS file.
>>> Don't shoot the messenger.
>> Sorry.
>>
>> The problem with your first idea--removing the bad characters directly
>> from the authors.c file--is that we'd need to use a utility like sed or
>> awk to implement it automatically--which puts us right back where we
>> started. (We could use diff/patch, but is it worth the effort to
>> maintain a patch for this? And would AJ let us put the patch file in
>> Wine? And if not, where would we put it?)
>>> Maybe we can force the use of the sed in /usr/bin, if it exists, to get
>>> around the 'brokenness' of GNU sed on the Mac?
>> Maybe. But that seems like a hack. A better way might be to detect if
>> we're on Mac OS and using GNU sed; in that case, we use /usr/bin/sed.
>> That's less of a hack, but still a hack.
>>> If not, setting the language on a Mac is a real bear, per previous
>>> discussions on the Users list.
>> That was about setting LANG. Wine always obeys LC_*, and so does sed.
>>
>> It's not the language that's the problem. It's the encoding. The AUTHORS
>> file is encoded in UTF-8, but GNU sed isn't using UTF-8 because we told
>> it not to (i.e. we told it to use MacRoman because that's the default
>> encoding for the C locale). If we tell it to use UTF-8 (by setting
>> LC_ALL to, for example, 'en_US.UTF-8'), it will process the file correctly.
>>
>> Unfortunately, I just remembered that the name of the UTF-8 encoding is
>> different on Mac OS ('UTF-8') and Linux ('utf8'). That might prevent us
>> from setting LC_ALL differently. We might end up having to hack around
>> this the way either you or I described.
> You could use autoconf to detect:
>    1/  broken handling of UTF-8 characters by sed;
>    2/  name of LC_ALL flag that handles UTF-8
>
> NOTE: You will need to enumerate available locales as the user may not
> have en_US present with UTF-8 encoding (e.g. a Spanish-only or
> Chinese-only system).
>
> Something like:
>
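> cat > get_locale.sh << 'EOF'
> #!/bin/sh
> # Print the first installed locale under which sed reads authors.c
> # without reporting an error.
> locale -a | while read -r locale ; do
>     if LC_ALL=$locale sed -n p < authors.c > /dev/null 2>&1 ; then
>        echo "$locale"
>        exit
>     fi
> done
> EOF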
>
> This should print a locale that can process the UTF-8 file. It needs
> cleaning up a bit, but that is the basis of it.
>
Thanks Reece.
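
For reference, here is a rough sketch of the locale search Reece describes.
Note that it swaps in a simpler check: instead of running sed against
authors.c for every locale, it just picks the first locale name ending in
'UTF-8' or 'utf8', so it covers both spellings mentioned above. The function
name first_utf8_locale is made up for illustration and is not anything in the
Wine tree.

```shell
#!/bin/sh
# Sketch only: first_utf8_locale is a hypothetical helper name.
# Reads locale names on stdin, one per line (the format 'locale -a' prints),
# and prints the first one whose codeset suffix is UTF-8 or utf8.
first_utf8_locale() {
    while read -r name; do
        case $name in
            *.[Uu][Tt][Ff]-8|*.[Uu][Tt][Ff]8)
                printf '%s\n' "$name"
                return 0
                ;;
        esac
    done
    return 1   # no UTF-8 locale was listed
}

# Typical use (left as a comment so the sketch has no side effects):
#   utf8_locale=$(locale -a | first_utf8_locale)
#   LC_ALL=$utf8_locale sed ...
```

Matching on the name does not prove that sed handles the file under that
locale; it only avoids hard-coding en_US, which may not be installed on a
Spanish-only or Chinese-only system.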

Charles:  You want to do this?
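
If the autoconf route turns out to be more trouble than it is worth, the
fallback Charles mentioned (use /usr/bin/sed when we are on Mac OS and the
sed on PATH is GNU sed) might look roughly like the sketch below. The names
is_gnu_sed and pick_sed are invented for this example. It assumes GNU sed
answers --version with a line containing 'GNU sed', while the sed shipped
with Mac OS X rejects that option.

```shell
#!/bin/sh
# Sketch only: is_gnu_sed and pick_sed are hypothetical helper names.

# Succeeds if the given sed command identifies itself as GNU sed.
# GNU sed answers --version; the stock Mac OS X sed rejects the option.
is_gnu_sed() {
    "$1" --version 2>/dev/null | grep -q 'GNU sed'
}

# Prints the sed to use: /usr/bin/sed on Mac OS X when the sed found on
# PATH is GNU sed, otherwise plain 'sed' (whatever PATH resolves it to).
pick_sed() {
    if [ "$(uname -s)" = Darwin ] && is_gnu_sed sed && [ -x /usr/bin/sed ]; then
        echo /usr/bin/sed
    else
        echo sed
    fi
}
```

As Charles said, this is still a hack; it only works around the problem
rather than fixing the locale that sed is run under.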

James McKenzie




