Unicode, i18n support

Wed Apr 3 07:54:15 CST 2002

Francois Gouget wrote:
>    So if I understand correctly, Linux does not provide a uniform
> interface to the filesystem. I.e. if I do 'touch ~/foo' where foo
> contains weird characters I must make sure these are the right
> characters for the codepage used by ~, and then if I do 'touch
> /mnt/win98/foo', then I must change 'foo' so that its characters now
> match the 1251 codepage, and I may have to rewrite foo yet again for
> 'touch /zipdrive/foo'.
> 
>    Urgh. This is certainly ugly. I thought that Linux would be taking
> UTF-8 or something like it for all filesystems and then do the codepage
> conversions itself depending on the underlying filesystem. I thought
> that this was the point of having all the codepage information in the
> kernel for fat filesystems.

The native Unix point of view is that a filename is just a string, and
what it means depends on userspace programs (presentation). In Linux one
can easily change the encoding used for presentation; in Poland the
typical one is ISO8859-2. If a user types a name which contains national
characters, that name is stored verbatim on disk. If it is retrieved
later using a different encoding it may appear garbled. A typical Linux
installation will choose a "preferred" encoding and set things up so
that this encoding works well. In particular, the codepages in the kernel
and the mount options allow names on fat filesystems to be translated
from (and to) the "preferred" encoding.
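
To illustrate the "stored verbatim, garbled on retrieval" point, here is
a minimal sketch in Python. The file name is a made-up example, and
ISO8859-1 as the second encoding is only an assumption picked for the
demonstration; nothing here touches a real filesystem.

```python
# Hypothetical file name containing Polish national characters.
name = "żółw.txt"

# A program working in ISO8859-2 hands these bytes to the kernel,
# which stores them on disk without interpreting them.
stored = name.encode("iso8859-2")

# Reading the bytes back with the same encoding gives the original name.
same = stored.decode("iso8859-2")

# Reading the very same bytes with a different 8-bit encoding
# (ISO8859-1 is assumed here) gives a different, garbled name.
garbled = stored.decode("iso8859-1")

print(same == name)     # the round trip is lossless
print(garbled == name)  # the presentation differs
```

The bytes on disk never change; only the interpretation chosen by the
userspace program does.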

When I wrote about a weird setup I meant that it is technically possible
to use different encodings on different filesystems, and I can imagine
various scenarios that do this (basically to work with software that
insists on a specific encoding).

I believe that UTF-8 is the way to go, but it is still the future --
last year, trying to make a UTF-8 system, I found that I needed a bunch
of programs which cannot work with UTF-8 (they work well with any 8-bit
encoding).
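
One plausible reason a program can be fine with any 8-bit encoding and
still break on UTF-8 is the assumption that one byte equals one
character. This is only a guess at the kind of failure involved, not a
description of any particular program; the name below is a made-up
example.

```python
# Hypothetical name with Polish letters.
name = "żółw"

latin2 = name.encode("iso8859-2")  # one byte per character
utf8 = name.encode("utf-8")        # the non-ASCII letters take two bytes each

# A byte-oriented program that keeps "the first 3 characters"
# really keeps the first 3 bytes.
cut_latin2 = latin2[:3]
cut_utf8 = utf8[:3]

# In the 8-bit encoding the cut falls on a character boundary.
print(cut_latin2.decode("iso8859-2"))

# In UTF-8 the cut falls in the middle of a multi-byte sequence.
try:
    cut_utf8.decode("utf-8")
    utf8_cut_ok = True
except UnicodeDecodeError:
    utf8_cut_ok = False
print(utf8_cut_ok)
```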

-- 
                              Waldek Hebisch
hebisch at math.uni.wroc.pl    or hebisch at hera.math.uni.wroc.pl