Probably nearly every Linux distribution would disallow filenames that were not UTF-8. A few people running special-purpose systems might mount their rootfs with more restrictive rulesets. Most system administrators already have an unwritten policy about filenames: they don't create filenames with embedded control characters, crazy stuff like leading dashes, or embedded newlines. Letting system administrators turn their implicit policy into an explicit one would close a lot of security holes.

I wonder if it would be feasible to use the "escaping" option talked about on Wheeler's page. Basically, under this option, the kernel continues to treat filenames as binary blobs on the disk, but when presenting them to userspace it escapes certain characters in a predictable way. I'm not sure whether this is really feasible, but it seems like the best choice if it is.

Having looked into many different encodings, I'd agree with the suggestion to use UTF-8, but in reality systems still need to support legacy 8-bit and 16-bit encodings: there are many filesystems out there with filenames in legacy encodings, and often a mix of encodings. The ability to mix legacy encodings in a single filesystem is sometimes useful for applications, but it creates major data conversion issues when users do this. ISO2022 is a truly horrible encoding that should never be used, and should certainly not be supported: it can embed normal ASCII characters within a "wide" character, making it very difficult to process.
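The ISO2022 complaint is easy to demonstrate with Python's standard-library codec (this demonstration is mine, not from the thread). In ISO-2022-JP, the payload bytes of a double-byte character all fall in the printable ASCII range, so a byte like `/` can legitimately occur inside a single Japanese character:

```python
# The hiragana character く encodes in ISO-2022-JP as ESC $ B (shift to
# JIS X 0208), then the two payload bytes 0x24 0x2F -- which are the
# ASCII characters "$" and "/" -- then ESC ( B to shift back to ASCII.
encoded = "く".encode("iso2022_jp")
print(encoded)  # b'\x1b$B$/\x1b(B'

# A naive byte scan finds a path-separator byte inside one character:
assert b"/" in encoded
# Yet the name round-trips cleanly through the codec:
assert encoded.decode("iso2022_jp") == "く"
```

Any code that scans filenames byte-by-byte for `/`, NUL, or shell metacharacters gets this wrong unless it also tracks the ISO2022 shift state.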
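To make the "escaping" option discussed above more concrete, here is a user-space sketch; the `\xNN` escape syntax and the choice of safe characters are my own illustrative picks, not Wheeler's exact scheme. The on-disk name stays an opaque byte string, and the form shown to users is ASCII-only and exactly reversible:

```python
# Sketch of the escaping idea: keep the on-disk filename as opaque bytes,
# but hand user space a predictable, reversible ASCII-only form.
SAFE = set(range(0x20, 0x7F)) - {ord("\\")}  # printable ASCII minus the escape char

def escape_name(raw: bytes) -> str:
    """On-disk bytes -> display form presented to user space."""
    return "".join(chr(b) if b in SAFE else "\\x%02x" % b for b in raw)

def unescape_name(shown: str) -> bytes:
    """Display form -> original on-disk bytes (exact inverse)."""
    out, i = bytearray(), 0
    while i < len(shown):
        if shown[i] == "\\":
            out.append(int(shown[i + 2:i + 4], 16))  # skip "\x", read two hex digits
            i += 4
        else:
            out.append(ord(shown[i]))
            i += 1
    return bytes(out)

print(escape_name(b"evil\nname"))  # evil\x0aname
# Every possible byte value survives the round trip:
assert all(unescape_name(escape_name(bytes([b]))) == bytes([b]) for b in range(256))
```

Because the escape character itself is escaped, the mapping is unambiguous: userspace can always recover the original blob, and embedded newlines or control characters can never reach a terminal or a shell script unescaped.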
> applications how best to display the name

Well, you could use an extended attribute to represent the encoding of the filename. However, it would be a huge amount of work to change all the applications to check this attribute and act appropriately.

I'm pretty far from being an expert in internationalization, but my understanding is that non-Unicode character encodings are considered deprecated. Based on comments made elsewhere in this thread, MacOS and Windows have already decreed that all filenames should be Unicode. As Linus constantly points out, Linux-specific filesystem interfaces don't get used that much, even when they offer great benefits. So is it really worth rewriting all software that displays filenames in order to better support this legacy stuff? Especially when no other platforms support it at all?

I think I agree with Spudd86's solution: there should be some kind of mount option that puts a ruleset in place for filenames.
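As a rough illustration of what such a mount-time ruleset might enforce, here is a user-space sketch; the function and the specific rules (valid UTF-8, no control characters, no leading dash) are hypothetical, chosen to mirror the "unwritten policy" described elsewhere in the thread:

```python
def filename_allowed(raw: bytes) -> bool:
    """Sketch of a filename ruleset a mount option might enforce.
    Rules here are illustrative: the name must be valid UTF-8, contain
    no control characters (including newline), and not start with '-'."""
    try:
        name = raw.decode("utf-8")
    except UnicodeDecodeError:
        return False          # reject names that are not UTF-8
    if any(ord(c) < 0x20 or ord(c) == 0x7F for c in name):
        return False          # reject embedded control characters
    if name.startswith("-"):
        return False          # reject leading dashes
    return True

assert filename_allowed("report.txt".encode())
assert not filename_allowed(b"-rf")        # looks like a command flag
assert not filename_allowed(b"evil\nname") # embedded newline
```

A distribution's default ruleset would sit at the permissive end (UTF-8 only), while the special-purpose systems mentioned above could layer stricter checks on top.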
> There have been a number of complaints on this thread about filesystems
> that are encoding-aware and the problems that causes. The
> filesystem could carry encoding hints without being encoding-aware itself.
> For example, it could tell user space that a file name is Utf-8 but still
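Consuming such a hint on the read side might look like this sketch (the function name and the fallback policy are my own choices): the filesystem hands user space opaque bytes, and the hint only influences how they are decoded for display.

```python
from typing import Optional

def display_name(raw: bytes, encoding_hint: Optional[str]) -> str:
    """Decode an on-disk filename for display only.
    The filesystem never interprets the bytes; the hint just tells user
    space which codec to try first. Always returns something displayable."""
    for codec in (encoding_hint, "utf-8"):
        if codec is None:
            continue
        try:
            return raw.decode(codec)
        except (UnicodeDecodeError, LookupError):
            continue
    # Last resort: UTF-8 with replacement characters, so display never fails.
    return raw.decode("utf-8", errors="replace")

print(display_name("café".encode("latin-1"), "latin-1"))  # café
print(display_name("café".encode("latin-1"), None))       # caf� (replacement char)
```

The key property is that a wrong or missing hint degrades display but never corrupts the stored name: the bytes on disk are untouched either way.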