I need to extract extensions from file names.
I know this can be done for single extensions like .gz or .tar by using filePath.lastIndexOf('.') or using utility methods like FilenameUtils.getExtension(filePath) from Apache commons-io.
But what if I have a file with an extension like .tar.gz? How can I manage files with extensions that contain . characters?
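To make the problem concrete, here is a minimal sketch of the last-dot approach. The class and method names (LastDotExtension, lastDotExtension) are made up for illustration, and the sketch assumes the input is a bare file name rather than a full path with directories.

```java
class LastDotExtension {
    // Returns the text after the last '.', or "" when there is no usable dot.
    // Assumption: fileName is a bare name, with no directory components.
    static String lastDotExtension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        // dot <= 0 covers "no dot at all" and dot-files such as ".bashrc";
        // a trailing dot leaves nothing to return.
        if (dot <= 0 || dot == fileName.length() - 1) {
            return "";
        }
        return fileName.substring(dot + 1);
    }

    public static void main(String[] args) {
        System.out.println(lastDotExtension("archive.gz"));     // prints "gz"
        System.out.println(lastDotExtension("archive.tar.gz")); // prints "gz"
    }
}
```

Both calls give "gz", so the tar part is lost for the compound name.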
If you know what extensions are important, you can simply check for them explicitly. You would have a collection of known extensions, like this:
    List<String> EXTS = Arrays.asList("tar.gz", "tgz", "gz", "zip");
You could get the (first) longest matching extension like this:
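    String getExtension(String fileName) {
        String found = null;
        for (String ext : EXTS) {
            // Match on ".ext" so that "targz" alone does not count as "gz"
            if (fileName.endsWith("." + ext)) {
                // Keep the longest match; on a tie the first one wins
                if (found == null || found.length() < ext.length()) {
                    found = ext;
                }
            }
        }
        // null means no known extension matched
        return found;
    }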
So calling getExtension("file.tar.gz") would return "tar.gz".
If you have mixed-case names, perhaps try changing the check to fileName.toLowerCase().endsWith("." + ext) inside the loop.
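Putting the pieces together, a self-contained sketch of the case-insensitive variant could look like this. The wrapper class name KnownExtensions is only for illustration, and lower-casing with Locale.ROOT is my own choice to avoid locale-specific surprises.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class KnownExtensions {
    // Known extensions, all lower case, without the leading dot.
    static final List<String> EXTS = Arrays.asList("tar.gz", "tgz", "gz", "zip");

    // Returns the longest known extension the name ends with, or null if none matches.
    static String getExtension(String fileName) {
        // Lower-case once so that "Backup.TAR.GZ" also matches.
        String lower = fileName.toLowerCase(Locale.ROOT);
        String found = null;
        for (String ext : EXTS) {
            if (lower.endsWith("." + ext)
                    && (found == null || found.length() < ext.length())) {
                found = ext;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(getExtension("file.tar.gz"));   // prints "tar.gz"
        System.out.println(getExtension("Backup.TAR.GZ")); // prints "tar.gz"
        System.out.println(getExtension("data.gz"));       // prints "gz"
        System.out.println(getExtension("notes.txt"));     // prints "null"
    }
}
```

Note that the returned value comes from the list, so it is always lower case regardless of how the file name is written.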
A file can have only one extension!
If you have a file test.tar.gz, .gz is the extension and test.tar is the basename! .tar in this case is part of the basename, not part of the extension!
If you want a file that is both tar-archived and gzip-compressed, you should call it .tgz. Using .tar.gz is bad practice; if you need to handle these files, you should use a workaround such as renaming the file to test.tgz.
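A rough sketch of that renaming workaround, applied to the name only (it does not touch the file system). The class and method names here (TgzNames, normalize, extensionOf) are hypothetical, and treating the suffix case-insensitively is my own assumption.

```java
import java.util.Locale;

class TgzNames {
    private static final String SUFFIX = ".tar.gz";

    // Maps a trailing ".tar.gz" (any case) to ".tgz"; other names are returned unchanged.
    static String normalize(String fileName) {
        if (fileName.toLowerCase(Locale.ROOT).endsWith(SUFFIX)) {
            return fileName.substring(0, fileName.length() - SUFFIX.length()) + ".tgz";
        }
        return fileName;
    }

    // Single-extension rule: everything after the last dot, or "" if there is none.
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot > 0 ? fileName.substring(dot + 1) : "";
    }

    public static void main(String[] args) {
        String name = normalize("test.tar.gz");
        System.out.println(name);              // prints "test.tgz"
        System.out.println(extensionOf(name)); // prints "tgz"
    }
}
```

After normalizing, the plain last-dot rule is enough to tell a gzipped tar archive apart from a plain .gz file.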