
How can I sanitize a string for use as a filename?

I've got a routine that converts a file into a different format and saves it. The original datafiles were numbered, but my routine gives the output a filename based on an internal name found in the original.

I tried to batch-run it on a whole directory, and it worked fine until I hit one file whose internal name had a slash in it. Oops! And if it does that here, it could easily do it on other files. Is there an RTL (or WinAPI) routine somewhere that will sanitize a string and remove invalid symbols so it's safe to use as a filename?

Mason Wheeler asked Jun 06 '09 23:06




2 Answers

You can use the PathGetCharType function, the PathCleanupSpec function, or the following trick:

  function IsValidFilePath(const FileName: String): Boolean;
  var
    S: String;
    I: Integer;
  begin
    Result := False;
    S := FileName;
    repeat
      I := LastDelimiter('\/', S);
      MoveFile(nil, PChar(S));
      if (GetLastError = ERROR_ALREADY_EXISTS) or
         (
           (GetFileAttributes(PChar(Copy(S, I + 1, MaxInt))) = INVALID_FILE_ATTRIBUTES)
           and
           (GetLastError = ERROR_INVALID_NAME)
         ) then
        Exit;
      if I > 0 then
        S := Copy(S, 1, I - 1);
    until I = 0;
    Result := True;
  end;

This code splits the string at its path delimiters and uses MoveFile to verify each part. MoveFile will fail for invalid characters or reserved file names (like 'COM1') and return success or ERROR_ALREADY_EXISTS for a valid file name.


PathCleanupSpec is in the Jedi Windows API under Win32API/JwaShlObj.pas

Alex answered Nov 07 '22 19:11



As for whether there is an API function that sanitizes a file name (or even checks its validity): there seems to be none. Quoting from the comment on the PathSearchAndQualify() function:

"There does not appear to be any Windows API that will validate a path entered by the user; this is left as an ad hoc exercise for each application."

So you can only consult the rules for file name validity from File Names, Paths, and Namespaces (Windows):

  • Use almost any character in the current code page for a name, including Unicode characters and characters in the extended character set (128–255), except for the following:

    • The following reserved characters are not allowed:
      < > : " / \ | ? *
    • Characters whose integer representations are in the range from zero through 31 are not allowed.
    • Any other character that the target file system does not allow.
  • Do not use the following reserved device names for the name of a file: CON, PRN, AUX, NUL, COM1..COM9, LPT1..LPT9.
    Also avoid these names followed immediately by an extension; for example, NUL.txt is not recommended.

If you know that your program will only ever write to NTFS file systems, you can probably be sure that there are no other characters the file system does not allow. Once all invalid characters have been removed (or replaced, for example by underscores), you would then only have to check that the file name is not too long (use the MAX_PATH constant, keeping in mind that it limits the length of the full path, not just the name).
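The rules quoted above could be applied along these lines. This is only a minimal sketch, not an RTL or WinAPI routine: the names SanitizeFileName, IsReservedDeviceName and the MaxNameLength cap of 200 are assumptions made for illustration, and it handles only the reserved characters, control characters and device names listed above.

```pascal
program SanitizeDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  // Reserved characters from the list above.
  ReservedChars = '<>:"/\|?*';
  // Hypothetical cap for the bare file name (an assumption, not a Windows
  // constant); MAX_PATH applies to the full path including the directory.
  MaxNameLength = 200;

// True if the part of Name before the first dot is a reserved device name
// (CON, PRN, AUX, NUL, COM1..COM9, LPT1..LPT9), ignoring case.
function IsReservedDeviceName(const Name: string): Boolean;
var
  U, Base: string;
  P: Integer;
begin
  U := UpperCase(Name);
  P := Pos('.', U);
  if P > 0 then
    Base := Copy(U, 1, P - 1)
  else
    Base := U;
  Result := (Base = 'CON') or (Base = 'PRN') or (Base = 'AUX') or (Base = 'NUL');
  if (not Result) and (Length(Base) = 4) and
     (Base[4] >= '1') and (Base[4] <= '9') then
    Result := (Copy(Base, 1, 3) = 'COM') or (Copy(Base, 1, 3) = 'LPT');
end;

// Replaces reserved and control characters with underscores, prefixes
// reserved device names (and empty names) with an underscore, and
// truncates the result to MaxNameLength characters.
// Note: truncation may cut off the extension of a very long name.
function SanitizeFileName(const Name: string): string;
var
  I: Integer;
begin
  Result := Name;
  for I := 1 to Length(Result) do
    if (Ord(Result[I]) < 32) or (Pos(Result[I], ReservedChars) > 0) then
      Result[I] := '_';
  if (Result = '') or IsReservedDeviceName(Result) then
    Result := '_' + Result;
  if Length(Result) > MaxNameLength then
    SetLength(Result, MaxNameLength);
end;

begin
  WriteLn(SanitizeFileName('a/b:c?.txt'));  // prints a_b_c_.txt
  WriteLn(SanitizeFileName('NUL.txt'));     // prints _NUL.txt
end.
```

Replacing rather than deleting characters keeps the name length stable and makes it easier to see where the original had something unusual, but it also means two different internal names can map to the same output name.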

A program should also make sure that sanitizing has not led to file name conflicts, so that it does not silently overwrite other files that ended up with the same name.
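One way to guard against such conflicts is to check for an existing file before writing and add a counter to the name if needed. The sketch below assumes a hypothetical helper named UniqueFileName and a " (2)", " (3)" suffix scheme; neither comes from the RTL.

```pascal
program UniqueNameDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Returns the full path Dir + FileName if nothing with that name exists yet;
// otherwise inserts " (2)", " (3)", ... before the extension until the
// path is free.
function UniqueFileName(const Dir, FileName: string): string;
var
  Base, Ext, Prefix: string;
  N: Integer;
begin
  Prefix := IncludeTrailingPathDelimiter(Dir);
  Ext := ExtractFileExt(FileName);
  Base := ChangeFileExt(FileName, '');
  Result := Prefix + FileName;
  N := 1;
  // A directory with the same name would conflict as well.
  while FileExists(Result) or DirectoryExists(Result) do
  begin
    Inc(N);
    Result := Prefix + Format('%s (%d)%s', [Base, N, Ext]);
  end;
end;

begin
  WriteLn(UniqueFileName(GetCurrentDir, 'report.txt'));
end.
```

The check and the later file creation are two separate steps, so another process could still create the same file in between. For a single-user batch conversion that is probably acceptable; otherwise the file would have to be created in a way that fails if it already exists.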

mghie answered Nov 07 '22 19:11
