I'm struggling to deliver a project to a client. The job is to package files into an archive; simple, right? Well, the files have (and must have) French characters in their names. I'm archiving from the Linux command line; she's opening the archive from the desktop on Windows.
At first I tried 'zip', and it didn't work out. From what I've read here on StackOverflow, character support appears to vary by implementation. After unpacking, the resulting file names didn't look right to me (Ubuntu Archive Manager) or to her (WinZip on Windows).
We next tried tar. Finally, things appear normal for me, but they still don't look right to the client (who tried PeaZip and 7-Zip for Windows).
Going into this, I really didn't expect this to be a problem. French-speaking computer users must archive things; what are they using?
Any insight or assistance with this would be greatly appreciated. Thanks!
ZIP traditionally encodes filenames using IBM437 (code page 437). However, to my knowledge, many tools (incorrectly) use the system's default encoding instead, which is likely to cause problems in a situation like this because the two ends may use different encodings.
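To make the mismatch concrete, here is a minimal Python sketch. It assumes the archiving side wrote the name as UTF-8 and the unpacking side interpreted the same bytes as IBM437; the helper name `misread` and the sample filename are made up for illustration.

```python
def misread(name: str, written_as: str, read_as: str) -> str:
    """Encode a filename with one codec and decode the raw bytes with another."""
    return name.encode(written_as).decode(read_as, errors="replace")


# Hypothetical filename with French accents.
original = "résumé_été.txt"

# Writer used UTF-8, reader assumed IBM437 (cp437): the accents turn into garbage.
garbled = misread(original, "utf-8", "cp437")
print(original, "->", garbled)

# When both ends agree on the encoding, the name survives unchanged.
same = misread(original, "utf-8", "utf-8")
print(original, "->", same)
```

The bytes stored in the archive are identical in both cases; only the reader's assumption about the encoding differs.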
In theory, ZIP also supports UTF-8 by now, which should resolve these problems, but again tool support is the issue. For example, as far as I know, the ZIP support in Windows Explorer can't handle UTF-8 encoded filenames.
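As a rough illustration of the UTF-8 support mentioned here: the ZIP format marks UTF-8 filenames with bit 11 (0x800) of each entry's general purpose flags. The sketch below uses Python's standard zipfile module, which sets that bit when a name is not plain ASCII. The function names and sample filenames are hypothetical, and whether the unpacking tool honours the flag is a separate matter.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # bit 11 of the general purpose bit flag: "filename is UTF-8"


def make_zip(names):
    """Build a small in-memory ZIP containing one dummy entry per name."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"placeholder content")
    return buf.getvalue()


def utf8_flags(data):
    """Map each entry name to whether its UTF-8 filename flag is set."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return {
            info.filename: bool(info.flag_bits & UTF8_FLAG)
            for info in zf.infolist()
        }


flags = utf8_flags(make_zip(["plain.txt", "été.txt"]))
for name, is_utf8 in flags.items():
    print(name, "UTF-8 flag set:", is_utf8)
```

Running the same check on an archive produced by another tool is one way to see whether that tool declared UTF-8 or left the encoding implicit.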
So we end up with this: both ends have to agree on the encoding used for filenames, and you need an encoding that supports all the characters you have (any Unicode encoding will be fine; I'm not sure about IBM437, though). ZIP has been around for a long time, so there are many tools that disagree about encoding. If possible, explicitly specify the encoding to use, and prefer Unicode. In terms of compatibility with arbitrary tools, you might be better off using a newer format that was designed with Unicode in mind.
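Whether a given code page covers the characters in your filenames can be checked directly. The following is a small Python sketch using the built-in `cp437` codec; the helper name `unencodable` and the sample filenames are assumptions for illustration only.

```python
def unencodable(name: str, codec: str) -> list:
    """Return the characters of `name` that `codec` cannot represent."""
    bad = []
    for ch in name:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


# Hypothetical sample filenames with French characters.
for filename in ["été.txt", "cœur.txt"]:
    for codec in ["cp437", "utf-8"]:
        missing = unencodable(filename, codec)
        print(filename, codec, "missing:", missing)
```

An empty list means the codec can represent the whole name; any character that shows up in the list would be lost or replaced if that encoding were used.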
7-Zip has supported UTF-8 filenames since 4.58 beta, according to the change log, but will only use UTF-8 when the local code page doesn't support the required characters. The -mcu command-line switch makes it use UTF-8 for any filename that isn't pure ASCII. The local encodings usually differ only in the non-ASCII character range, so this will most likely do the trick. That is, if the tool used for unpacking also supports UTF-8 (which is more likely for 7-Zip than for ZIP, because the format isn't as old as ZIP and there are fewer unpacking tools).
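If the archive is built from a script, the switch can be passed along like this. This is only a sketch under several assumptions: the executable is called `7z` and is on the PATH (the name varies by platform and package), the switch is spelled `-mcu=on` and combined with `-tzip` to produce a ZIP archive, and both function names are hypothetical. Check the documentation of your 7-Zip version before relying on it.

```python
import shutil
import subprocess


def build_7z_zip_command(archive, files, exe="7z"):
    """Build the argument list for creating a ZIP with UTF-8 filenames.

    Assumed switches: `a` (add), `-tzip` (ZIP format), `-mcu=on` (UTF-8 names).
    """
    return [exe, "a", "-tzip", "-mcu=on", archive, *files]


def create_archive(archive, files, exe="7z"):
    """Run the command; raises if the executable is missing or 7-Zip fails."""
    if shutil.which(exe) is None:
        raise FileNotFoundError(f"{exe!r} not found on PATH")
    subprocess.run(build_7z_zip_command(archive, files, exe), check=True)


# Show the command that would be run, without actually running it.
print(build_7z_zip_command("livraison.zip", ["été.txt", "cœur.txt"]))
```

Passing the arguments as a list avoids going through a shell, so the accented filenames are handed to the program as-is.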
WinRAR might also be worth a try.
Try using an archive program that allows you to specify the character encoding (say, UTF-8), or figure out how to do it with the one you have. This forum thread might help you, because it's similar to what you're asking, albeit in reverse and for German rather than French: http://sourceforge.net/projects/sevenzip/forums/forum/45797/topic/3710172