I'm making a cross-platform application that renames files based on data retrieved online. I'd like to sanitize the strings I get from a web API so that they are valid file names on the current platform.
I know that different platforms have different file-name requirements. Is there a cross-platform way to do this?
Edit: On Windows you cannot have a question mark '?' in a file name, whereas on Linux you can. The strings may contain such characters; platforms that support them should keep them, and the others should strip them out.
Also, I would prefer a standard Java solution that doesn't require third-party libraries.
As suggested elsewhere, this is not usually what you want to do. It is usually best to create a temporary file using a secure method such as File.createTempFile().
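A minimal sketch of the temporary-file approach, assuming the downloaded content is available as a string. The class name `TempFileExample`, the method `writeToTempFile`, and the `download-` prefix and `.tmp` suffix are made up for illustration; only `File.createTempFile()` comes from the answer itself.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class TempFileExample {
    // Writes the content to a newly created temp file and returns it.
    // The prefix and suffix are illustrative; the JVM generates the unique part in between.
    static File writeToTempFile(String content) {
        try {
            File tmp = File.createTempFile("download-", ".tmp");
            Files.write(tmp.toPath(), content.getBytes(StandardCharsets.UTF_8));
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File tmp = writeToTempFile("hello");
        System.out.println("Wrote " + tmp.getAbsolutePath());
        tmp.deleteOnExit(); // clean up when the JVM exits
    }
}
```

The untrusted string never becomes part of the path here, so there is nothing to sanitize. The trade-off is that the file name is no longer meaningful to the user.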
You should not do this with a whitelist that keeps only 'good' characters. If a file name consists only of Chinese characters, a whitelist would strip everything out of it. For this reason we have to use a blacklist instead.
Linux allows almost anything in a file name (only '/' and the NUL character are forbidden), which can be a real pain. I would just apply the Windows restrictions on Linux as well, so you save yourself headaches in the future.
Using this C# snippet on Windows, I produced a list of the characters that are not valid in Windows file names. The list is longer than you might expect (41 characters), so I wouldn't recommend trying to build your own.
    // Prints the numeric code of every character Windows rejects in a file name.
    foreach (char c in Path.GetInvalidFileNameChars())
    {
        Console.Write((int)c);
        Console.Write(",");
    }
Here is a simple Java class which 'cleans' a file name.
    import java.util.Arrays;

    public class FileNameCleaner {
        // Character codes that Windows rejects in file names (output of the C# snippet above).
        static final int[] illegalChars = {34, 60, 62, 124, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 58, 42, 63, 92, 47};

        static {
            // Arrays.binarySearch requires a sorted array.
            Arrays.sort(illegalChars);
        }

        public static String cleanFileName(String badFileName) {
            StringBuilder cleanName = new StringBuilder();
            for (int i = 0; i < badFileName.length(); i++) {
                int c = (int) badFileName.charAt(i);
                // A negative result means the character is not in the illegal list, so keep it.
                if (Arrays.binarySearch(illegalChars, c) < 0) {
                    cleanName.append((char) c);
                }
            }
            return cleanName.toString();
        }
    }
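If you would rather keep characters that the current platform supports, as the question asks, one possible variant switches the blacklist by platform. This is only a sketch: the class `PlatformFileNameCleaner` and its methods are hypothetical, detecting Windows through the `os.name` system property is an assumption, and the non-Windows branch assumes Unix-like rules (only '/' and NUL are rejected).

```java
import java.util.Locale;

class PlatformFileNameCleaner {
    // Printable characters Windows rejects, in addition to the control range 0-31.
    private static final String WINDOWS_ILLEGAL = "\"<>|:*?\\/";

    // Assumption: the os.name system property starts with "Windows" on Windows JVMs.
    static boolean isWindows() {
        return System.getProperty("os.name", "")
                .toLowerCase(Locale.ROOT)
                .startsWith("windows");
    }

    // Removes the characters that are illegal under the selected rule set.
    static String clean(String name, boolean windowsRules) {
        StringBuilder out = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean illegal = windowsRules
                    ? c < 32 || WINDOWS_ILLEGAL.indexOf(c) >= 0
                    : c == 0 || c == '/';
            if (!illegal) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "What? A <title>: part 1/2";
        System.out.println(clean(raw, isWindows()));
    }
}
```

Passing the rule set as a parameter keeps the method testable on any machine. Note that Windows also reserves certain whole names (such as CON and NUL) and rejects trailing dots and spaces, which this sketch does not handle.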
EDIT: As Stephen suggested, you should probably also verify that these file accesses occur only within the directory you allow.
The following answer has sample code for establishing a custom security context in Java and then executing code in that 'sandbox'.
How do you create a secure JEXL (scripting) sandbox?
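Short of a full sandbox, the directory check mentioned above can be approximated with `java.nio.file.Path`. This is a minimal sketch, not the code from the linked answer: the class `DirectoryGuard`, the method `isInside`, and the `downloads` directory are invented for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class DirectoryGuard {
    // Returns true if resolving fileName against baseDir stays inside baseDir.
    static boolean isInside(Path baseDir, String fileName) {
        Path base = baseDir.toAbsolutePath().normalize();
        // normalize() collapses "." and ".." segments so they cannot escape the base.
        Path target = base.resolve(fileName).normalize();
        return target.startsWith(base);
    }

    public static void main(String[] args) {
        Path base = Paths.get("downloads");
        System.out.println(isInside(base, "report.txt"));     // true
        System.out.println(isInside(base, "../secrets.txt")); // false
    }
}
```

`normalize()` works purely on the path string and does not follow symbolic links. If links are a concern, `Path.toRealPath()` on an existing file resolves them before the comparison.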
Or just do this:
    String filename = "A20/B22b#öA\\BC#Ä$%ld_ma.la.xps";
    String sane = filename.replaceAll("[^a-zA-Z0-9\\._]+", "_");
Result: A20_B22b_A_BC_ld_ma.la.xps
Explanation:
[a-zA-Z0-9\\._]
matches a single character that is a letter a-z (lower or upper case), a digit, a dot or an underscore
[^a-zA-Z0-9\\._]
is the inverse, i.e. it matches any single character that does not match the first expression
[^a-zA-Z0-9\\._]+
matches a sequence of one or more characters that do not match the first expression
So every run of characters other than a-z, A-Z, 0-9, '.' or '_' is replaced by a single underscore.
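Keep in mind that this pattern also replaces non-ASCII letters, which is why the ö and Ä disappear in the result above. A possible variant, which is not part of the answer above, uses the Unicode categories `\p{L}` (letters) and `\p{N}` (numbers) so that such letters survive. The class `RegexSanitizer` and its method names are hypothetical.

```java
class RegexSanitizer {
    // The pattern from the answer: keeps only ASCII letters, digits, '.' and '_'.
    static String asciiOnly(String name) {
        return name.replaceAll("[^a-zA-Z0-9._]+", "_");
    }

    // Variant: keeps letters and digits from any script, plus '.' and '_'.
    static String unicodeAware(String name) {
        return name.replaceAll("[^\\p{L}\\p{N}._]+", "_");
    }

    public static void main(String[] args) {
        // \u00f6 is ö and \u00c4 is Ä, escaped so the source file stays ASCII.
        String filename = "A20/B22b#\u00f6A\\BC#\u00c4$%ld_ma.la.xps";
        System.out.println(asciiOnly(filename));
        System.out.println(unicodeAware(filename));
    }
}
```

The variant still drops combining marks (category `\p{M}`), so a name stored in decomposed form would lose its accents unless that category is added to the character class.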