HFS+ and Unicode madness
This is an old post that I wrote a while ago but never published. It is here just for historical|nostalgic|unburdening reasons. The original post begins here:
Annoying scenario: a file with Unicode characters in its filename, like Pétalo.java, fails to compile.
Why? That’s easy: Java on Mac OS X expects everything in MacRoman encoding (WTF??). Then I tried the -encoding flag, with tab autocompletion for the filename.
That did not work either. With autocompletion, the filename is represented in NFD, Normalization Form Canonical Decomposition (é => 65 cc81), but the class name in the file uses NFC, Normalization Form Canonical Composition (é => c3a9). What a mess! Do you see the difference?
You can’t see the difference until you see the actual bytes of the string, and that’s why Apple tries to protect us by storing everything in the filesystem with the same representation: NFD. The first echo was done with autocompletion, while the second was typed directly on the keyboard. So if I compile the Java source code by typing the filename in full, it succeeds.
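The byte-level difference can be reproduced outside the shell. This is a minimal sketch in Python, not the original echo session; the helper name `utf8_hex` is made up for the illustration. It uses the standard `unicodedata` module to build both forms of the same name and print their UTF-8 bytes.

```python
import unicodedata

def utf8_hex(text):
    """Return the UTF-8 bytes of text as a hex string (hypothetical helper)."""
    return text.encode("utf-8").hex()

typed = "P\u00e9talo.java"                        # NFC: é is the single code point U+00E9
completed = unicodedata.normalize("NFD", typed)   # NFD: e followed by U+0301 (combining acute)

# Both strings render identically, but they are not equal.
print(typed == completed)

# The bytes show why: c3a9 for the composed é, 65cc81 for the decomposed one.
print(utf8_hex(typed))
print(utf8_hex(completed))

# Normalizing both sides to the same form makes them comparable again.
print(unicodedata.normalize("NFC", completed) == typed)
```

Any tool that compares names byte by byte will treat these as two different files unless it normalizes first.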
What about version control?
This leads to a major problem. When anyone in the world (including you and me) creates a file, they type its name on the keyboard, which, as I have shown, produces NFC on Mac OS X. But on HFS+ the file is stored using NFD, so if you clone a git repo created on another OS, you’ll see untracked files without having modified anything!!
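To make the mismatch concrete, here is a toy model in Python. It is only a sketch of the idea, not how git actually compares paths: the function `untracked` and the two name lists are invented for the example. It assumes the repository recorded NFC names and the filesystem hands back NFD names.

```python
import unicodedata

def nfc(name):
    """Normalize a filename to NFC."""
    return unicodedata.normalize("NFC", name)

def untracked(tracked, on_disk, normalize):
    """Return on-disk names that match no tracked name (hypothetical helper)."""
    key = nfc if normalize else (lambda name: name)
    known = {key(name) for name in tracked}
    return [name for name in on_disk if key(name) not in known]

tracked = ["P\u00e9talo.java", "README"]     # names as committed on another OS (NFC)
on_disk = ["Pe\u0301talo.java", "README"]    # the same names as returned in NFD

# A plain comparison reports the decomposed name as a new, untracked file.
print(untracked(tracked, on_disk, normalize=False))

# Normalizing both sides first makes the phantom file disappear.
print(untracked(tracked, on_disk, normalize=True))
```

The point is that nothing on disk changed; the two sides just disagree about how to spell é.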
UFS to the rescue (kind of…)
Maybe you could say “well, just use another filesystem”, and I’ve done that, but it seems that Apple uses its decomposition algorithms everywhere.
First I tried ZFS, as I thought “That FS should work correctly in OSX, they can’t spoil it”. I installed the read-write implementation from MacOS Forge, but it did the same stupid conversion (UPDATE: ZFS is discontinued in MacOS Forge :( ).
FAT is a no-no since it only accepts ASCII characters for the filename, and I don’t want to start a remote server or a virtual machine every time I have to work with those files. This should work with my beloved OS!! And in fact, I have managed to make everything I’ve needed work on Mac OS, until now :(.
So I needed to try UFS. I looked at Disk Utility, but it doesn’t support UFS, and newfs fails to format my external USB disk.
But since I used hdiutil to create Pallet’s dmg, I thought of having a UFS disk image, and guess what, you can!! This is the magic command:
Now I can mount my dmg, compile and use git!!
But still there are two little problems:
- Snow Leopard doesn’t support UFS. I can’t even mount my UFS disk image :(
- Every app that uses Cocoa to open files can’t read files with NFC filenames. This means “forget about TextMate”. Noooooooo!!
Update
Now I’ve updated to Snow Leopard, and it is very disappointing and sad that I have to use a virtual machine to do my job.
The good thing is that I can work with TextMate via ssh and MacFusion.
Bad things: Mac OS X can’t solve my problem, I can’t open files with NFC names in TextMate (but emacs.app still loves me), and although it seems there are some people working on file systems at Apple, I don’t think this will be solved anytime soon.