Given these four kinds of file names:
String doubleexsension = "doubleexsension.pdf.pdf";
String noextension = "noextension";
String nameWithDot = "nameWithDot.";
String properName = "properName.pdf";
String extension = "pdf";
My aim is to sanitize all of these and output only a proper filename.filetype.
I wrote a small script to illustrate the problem for this post:
ArrayList<String> app = new ArrayList<String>();
app.add(doubleexsension);
app.add(properName);
app.add(noextension);
app.add(nameWithDot);
System.out.println("------------");
for (String i : app) {
    // Ends with .
    if (i.endsWith(".")) {
        String m = i + extension;
        System.out.println(m);
        break;
    }
    // Double extension
    String p = i.replaceAll("(\\.\\w+)\\1+$", "$1");
    System.out.println(p);
}
This outputs:
------------
doubleexsension.pdf
properName.pdf
noextension
nameWithDot.pdf
I don't know how to handle the noextension case. How can I do it? When there is
no extension, it should take the extension value and append it to the end of
the string.
My desired output would be:
------------
doubleexsension.pdf
properName.pdf
noextension.pdf
nameWithDot.pdf
Thanks in advance.
You may add alternatives to the regex to match all kinds of scenarios:
(?:(\.\w+)\1*|\.|([^.]))$
And replace with $2.pdf. See the regex demo.
EDIT: In case the extensions that can be duplicated are known, you may use the whitelisting approach via an alternation group:
(?:(\.(?:pdf|gif|jpe?g))\1*|\.|([^.]))$
See another regex demo.
Details:
(?: - start of the outer grouping; the $ end of string anchor is applied to all the alternatives below (they must be at the end of string)
(\.\w+)\1* - duplicated (or not) extensions: a . followed by 1+ word chars, captured into Group 1, then that same text repeated zero or more times. With the whitelisting approach, only the indicated extensions are taken into account: (?:pdf|gif|jpe?g) will only match pdf, gif, jpeg and jpg (plus any others, if more alternatives are added)
| - or
\. - a dot
| - or
([^.]) - any char that is not a dot, captured into Group 2
) - end of the outer grouping
$ - end of string.
See Java demo:
List<String> strs = Arrays.asList("doubleexsension.pdf.pdf","noextension","nameWithDot.","properName.pdf");
for (String str : strs) {
    System.out.println(str.replaceAll("(?:(\\.\\w+)\\1*|\\.|([^.]))$", "$2.pdf"));
}
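The whitelisting variant from the EDIT can be tried out the same way. This is only a sketch: the class name WhitelistExtensionDemo, the helper toPdfName and the extra input "report.v2" are made up for illustration and are not part of the original answer. It relies on Java inserting nothing for $2 when Group 2 did not take part in the match.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class WhitelistExtensionDemo {
    // Same alternation as above, but only pdf, gif, jpg and jpeg count as extensions.
    static final Pattern TAIL =
            Pattern.compile("(?:(\\.(?:pdf|gif|jpe?g))\\1*|\\.|([^.]))$");

    // Hypothetical helper: rewrites the tail of the name so that it ends in ".pdf".
    static String toPdfName(String name) {
        // $2 is empty unless the last char was a plain non-dot char (Group 2).
        return TAIL.matcher(name).replaceAll("$2.pdf");
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "doubleexsension.pdf.pdf", "noextension", "nameWithDot.",
                "properName.pdf", "report.v2");
        for (String name : names) {
            System.out.println(name + " -> " + toPdfName(name));
        }
    }
}
```

With the whitelist, "report.v2" keeps its ".v2" part and becomes "report.v2.pdf", whereas the \w+ version would treat ".v2" as an extension and replace it. Note that the replacement always writes ".pdf", so a whitelisted name such as "photo.gif" would come out as "photo.pdf".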
I would avoid the complexity (and reduced readability) of regular expressions:
String m = i;
if (m.endsWith(".")) {
    m = m + extension;
}
if (m.endsWith("." + extension + "." + extension)) {
    m = m.substring(0, m.length() - extension.length() - 1);
}
if (!m.endsWith("." + extension)) {
    m = m + "." + extension;
}
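The checks above remove one duplicated extension. If names such as "a.pdf.pdf.pdf" can occur, the same idea could be wrapped in a method that strips every trailing copy before appending the extension once. This is a sketch under that assumption; the class name FileNameSanitizer and the method sanitize are hypothetical and not from the original answer.

```java
import java.util.Arrays;
import java.util.List;

class FileNameSanitizer {
    // Hypothetical helper: returns the name ending in exactly one ".extension".
    static String sanitize(String name, String extension) {
        String suffix = "." + extension;
        String result = name;
        // Drop a single trailing dot; the suffix is appended at the end anyway.
        if (result.endsWith(".")) {
            result = result.substring(0, result.length() - 1);
        }
        // Strip every trailing copy of the suffix (".pdf.pdf", ".pdf.pdf.pdf", ...).
        while (result.endsWith(suffix)) {
            result = result.substring(0, result.length() - suffix.length());
        }
        return result + suffix;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "doubleexsension.pdf.pdf", "properName.pdf", "noextension", "nameWithDot.");
        for (String name : names) {
            System.out.println(sanitize(name, "pdf"));
        }
    }
}
```

Unlike the regex answers, this leaves other extensions alone: "notes.txt" would become "notes.txt.pdf".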
Easy
if (-1 == i.indexOf('.'))
    System.out.println(i + "." + extension);
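On its own this check only covers the name without a dot. One way it might be combined with the two checks from the question is sketched below; the class NoExtensionCheckDemo and the method sanitizeAll are hypothetical names, and the branches are chained with else if so that no break is needed inside the loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NoExtensionCheckDemo {
    // Hypothetical helper: applies the three checks to every name in the list.
    static List<String> sanitizeAll(List<String> names, String extension) {
        List<String> out = new ArrayList<>();
        for (String i : names) {
            if (i.indexOf('.') == -1) {
                // No dot at all: append "." and the extension.
                out.add(i + "." + extension);
            } else if (i.endsWith(".")) {
                // Ends with a dot: append only the extension.
                out.add(i + extension);
            } else {
                // Collapse a repeated extension such as ".pdf.pdf" into one.
                out.add(i.replaceAll("(\\.\\w+)\\1+$", "$1"));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> app = Arrays.asList(
                "doubleexsension.pdf.pdf", "properName.pdf", "noextension", "nameWithDot.");
        System.out.println("------------");
        for (String name : sanitizeAll(app, "pdf")) {
            System.out.println(name);
        }
    }
}
```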