Unicode, i18n support
hebisch at math.uni.wroc.pl
Wed Apr 3 07:54:15 CST 2002
Francois Gouget wrote:
> So if I understand correctly, Linux does not provide a uniform
> interface to the filesystem. I.e. if I do 'touch ~/foo' where foo
> contains weird characters I must make sure these are the right
> characters for the codepage used by ~, and then if I do 'touch
> /mnt/win98/foo', then I must change 'foo' so that its characters now
> match the 1251 codepage, and I may have to rewrite foo yet again for
> 'touch /zipdrive/foo'.
> Urgh. This is certainly ugly. I thought that Linux would be taking
> UTF-8 or something like it for all filesystems and then do the codepage
> conversions itself depending on the underlying filesystem. I thought
> that this was the point of having all the codepage information in the
> kernel for fat filesystems.
The native Unix point of view is that a filename is a string, and what
it means depends on userspace programs (presentation). In Linux one
can easily change the encoding used for presentation; in Poland the
typical one is ISO8859-2. If a user types a name which contains national
characters, that name is stored verbatim on disk. If it is retrieved later
using a different encoding, it may appear garbled. A typical Linux
installation will choose a "preferred" encoding and set things up so that
this encoding works well. In particular, the codepages in the kernel and
the mount options allow names on a fat filesystem to be translated
from (and to) the "preferred" encoding.
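
To make the above concrete, here is a small userspace sketch in Python.
It is only an illustration, not what the kernel actually does: the
filename is a made-up example, and the choice of cp852 as the codepage on
the fat side is my assumption for a Polish setup.

```python
# A filename is just a sequence of bytes on disk; what it "means"
# depends on the encoding used to present it.
name = "żółw.txt"  # hypothetical filename with Polish characters

# Stored verbatim in the "preferred" encoding (ISO8859-2 here).
stored = name.encode("iso-8859-2")

# Read back with the same encoding: the name survives.
same_encoding = stored.decode("iso-8859-2")

# Read back with a different 8-bit encoding: the same bytes appear garbled.
other_encoding = stored.decode("latin-1")

# Read back as UTF-8: the bytes are not valid UTF-8 at all.
as_utf8 = stored.decode("utf-8", errors="replace")

# Rough analogy of the translation done for fat filesystems: the name is
# kept in a DOS codepage (cp852 assumed) and converted to the preferred
# encoding when presented to userspace.
fat_bytes = name.encode("cp852")
translated = fat_bytes.decode("cp852").encode("iso-8859-2")

print(same_encoding == name)   # name round-trips in the same encoding
print(other_encoding == name)  # differs when decoded as Latin-1
print(translated == stored)    # translation yields the preferred bytes
```

The point is only that the bytes never change; the interpretation does.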
When I wrote about a weird setup I meant that technically it is possible
to use different encodings on different filesystems, and I can imagine
various scenarios where this is done (basically to work with software
that insists on a specific encoding).
I believe that UTF-8 is the way to go, but it is still in the future:
last year, trying to set up a UTF-8 system, I found that I need a bunch
of programs which cannot work with UTF-8 (they work well with any 8-bit
hebisch at math.uni.wroc.pl or hebisch at hera.math.uni.wroc.pl