
Is there a cross-platform Java method to remove filename special chars?

I'm making a cross-platform application that renames files based on data retrieved online. I'd like to sanitize the strings I get from a web API so that they are valid file names on the current platform.

I know that different platforms have different file-name requirements, so I was wondering if there's a cross-platform way to do this?

Edit: On Windows you cannot have a question mark '?' in a file name, whereas on Linux you can. The file names may contain such characters, and I would like to keep them on platforms that support them and strip them out otherwise.

Also, I would prefer a standard Java solution that doesn't require third-party libraries.

Ben S asked Jul 20 '09 18:07


2 Answers

As suggested elsewhere, this is not usually what you want to do. It is usually best to create a temporary file using a secure method such as File.createTempFile().
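A minimal sketch of the temporary-file approach, assuming the file only needs some valid, unique name rather than one derived from the API data. The class name, method name and the "dl-" prefix are hypothetical and chosen for illustration only.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

final class TempFileExample {
    // Creates an empty file in the default temporary directory.
    // The part of the name between prefix and suffix is generated by the JVM.
    static File newTempFile(String suffix) {
        try {
            // "dl-" is a hypothetical prefix; createTempFile requires at least 3 characters.
            File tmp = File.createTempFile("dl-", suffix);
            tmp.deleteOnExit(); // ask the JVM to remove the file on normal exit
            return tmp;
        } catch (IOException e) {
            // Wrapped so callers do not have to declare a checked exception.
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling TempFileExample.newTempFile(".xps") returns a File whose name starts with "dl-" and ends with ".xps". The untrusted string never becomes part of the path.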

You should not do this with a whitelist that keeps only 'good' characters. If the file name is made up of only Chinese characters, a whitelist will strip everything out of it. For this reason we have to use a blacklist instead.

Linux allows almost anything in a file name (only '/' and the NUL character are forbidden), which can be a real pain. I would just limit Linux to the same list that you limit Windows to, so you save yourself headaches in the future.

Using this C# snippet on Windows I produced a list of characters that are not valid on Windows. There are quite a few more characters in this list than you may think (41) so I wouldn't recommend trying to create your own list.

    foreach (char c in new string(Path.GetInvalidFileNameChars()))
    {
        Console.Write((int)c);
        Console.Write(",");
    }

Here is a simple Java class which 'cleans' a file name.

    import java.util.Arrays;

    public class FileNameCleaner {
        // Character codes that are not valid in Windows file names.
        static final int[] illegalChars = {34, 60, 62, 124, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 58, 42, 63, 92, 47};

        static {
            // Arrays.binarySearch requires a sorted array.
            Arrays.sort(illegalChars);
        }

        public static String cleanFileName(String badFileName) {
            StringBuilder cleanName = new StringBuilder();
            for (int i = 0; i < badFileName.length(); i++) {
                int c = (int) badFileName.charAt(i);
                // A negative result means c is not in the illegal list, so keep it.
                if (Arrays.binarySearch(illegalChars, c) < 0) {
                    cleanName.append((char) c);
                }
            }
            return cleanName.toString();
        }
    }
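The same blacklist could also be written as a regular expression. This is only a sketch of an alternative, not part of the class above. The class name RegexFileNameCleaner is hypothetical, and the character class is meant to cover the same 41 codes as the illegalChars array (control characters 0 to 31 plus " < > | : * ? \ /).

```java
import java.util.regex.Pattern;

final class RegexFileNameCleaner {
    // Control characters 0-31 plus " < > | : * ? \ /
    // Inside a character class, | : * ? need no escaping; the backslash does.
    private static final Pattern ILLEGAL =
            Pattern.compile("[\\x00-\\x1F\"<>|:*?\\\\/]");

    // Removes every blacklisted character and leaves everything else untouched.
    static String clean(String name) {
        return ILLEGAL.matcher(name).replaceAll("");
    }
}
```

Because this is a blacklist, a name made up of Chinese characters passes through unchanged apart from any blacklisted characters it contains.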

EDIT: As Stephen suggested, you should probably also verify that these file accesses only occur within the directory you allow.

The following answer has sample code for establishing a custom security context in Java and then executing code in that 'sandbox'.

How do you create a secure JEXL (scripting) sandbox?
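Independently of a security sandbox, the directory check mentioned in the edit above can be sketched with java.nio.file. This is a minimal sketch under assumptions: DirectoryGuard and isInside are hypothetical names, and the check is purely lexical, so it does not follow symbolic links.

```java
import java.nio.file.Path;

final class DirectoryGuard {
    // Returns true if fileName, resolved against baseDir, still lies inside baseDir.
    // Lexical check only: symbolic links are not resolved.
    static boolean isInside(Path baseDir, String fileName) {
        Path base = baseDir.toAbsolutePath().normalize();
        // normalize() collapses "." and ".." segments before the comparison.
        Path target = base.resolve(fileName).normalize();
        return target.startsWith(base);
    }
}
```

With a base directory of "downloads", a name such as "report.txt" is accepted while "../secret.txt" is rejected. Note that resolve can throw InvalidPathException if the name contains characters the platform cannot represent in a path, so cleaning the name first still makes sense.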

Sarel Botha answered Oct 12 '22 22:10


Or just do this:

    String filename = "A20/B22b#öA\\BC#Ä$%ld_ma.la.xps";
    String sane = filename.replaceAll("[^a-zA-Z0-9\\._]+", "_");

Result: A20_B22b_A_BC_ld_ma.la.xps

Explanation:

[a-zA-Z0-9\\._] matches a single character that is a letter a-z (lower or upper case), a digit, a dot or an underscore

[^a-zA-Z0-9\\._] is the inverse, i.e. any single character that does not match the first expression

[^a-zA-Z0-9\\._]+ is a sequence of one or more characters that do not match the first expression

So every sequence of characters other than a-z, A-Z, 0-9, '.' and '_' is replaced by a single underscore. As the result shows, this includes non-ASCII letters such as ö and Ä.
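For repeated use, the same expression could be compiled once and wrapped in a helper. This is only a sketch: WhitelistSanitizer and sanitize are hypothetical names, and the dot is written without a backslash because it needs no escaping inside a character class.

```java
import java.util.regex.Pattern;

final class WhitelistSanitizer {
    // One or more characters that are not a-z, A-Z, 0-9, '.' or '_'.
    private static final Pattern NOT_ALLOWED = Pattern.compile("[^a-zA-Z0-9._]+");

    // Replaces each run of disallowed characters with a single underscore.
    static String sanitize(String name) {
        return NOT_ALLOWED.matcher(name).replaceAll("_");
    }
}
```

For the file name used above, sanitize returns A20_B22b_A_BC_ld_ma.la.xps. A name whose stem contains only non-ASCII characters collapses to a single underscore before the extension, which is worth keeping in mind if the input may be in another script.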

D-rk answered Oct 12 '22 23:10