First of all, I'd like to know if there is an existing library that is similar to SimpleDateFormat but supports wildcard characters? If not, what is the best approach for this?
I need to match and extract the date from a file name, but I can't seem to find the right approach for this scenario. I admit that the scenario below isn't practical at all for a file name, but I've included it anyway as a "WHAT IF".
Filename: 19882012ABCseptemberDEF03HIJ12KLM0156_249.zip, Pattern: yyyyMMMddhhmmss'_.zip'
I see a lot of issues parsing this (e.g. determining the correct year). I hope you guys can shed some light and point me in the right direction.
There is no such thing that I know of in SimpleDateFormat, but you can check with a regular expression whether the input filename matches and, if it does, extract the matched parts to build your date.
This is a quick regex that validates your criteria:
(.*?)([0-9]{4})([^0-9]*?)([a-z]+)(.*?)([0-9]{2})(.*?)([0-9]{2})(.*?)([0-9]{4})_([^.]+)[.]zip
Which means (it's really not that complicated):
(.*?) // anything
([0-9]{4}) // followed by 4 digits
([^0-9]*?) // followed by anything except digits
([a-z]+) // followed by a sequence of lowercase letters
(.*?) // followed by anything
([0-9]{2}) // until it finds 2 digits
(.*?) // followed by anything
([0-9]{2}) // until it finds 2 digits again
(.*?) // followed by anything
([0-9]{4}) // until it finds 4 consecutive digits
_([^.]+) // an underscore followed by anything except a dot '.'
[.]zip // the file extension
You can use it in Java:
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String filename = "19882012ABCseptemberDEF03HIJ12KLM0156_249.zip";
String regex = "(.*?)([0-9]{4})([^0-9]*?)([a-z]+)(.*?)([0-9]{2})(.*?)([0-9]{2})(.*?)([0-9]{4})_([^.]+)[.]zip";
Matcher m = Pattern.compile(regex).matcher(filename);
if (m.matches()) {
    // m.group(2); // the year
    // m.group(4); // the month
    // m.group(6); // the day
    // m.group(8); // the hour
    // m.group(10); // the minutes & seconds
    String dateString = m.group(2) + "-" + m.group(4) + "-" + m.group(6) + " " + m.group(8) + m.group(10);
    // Locale.ENGLISH so the English month name parses whatever the default locale is.
    // Note that parse() throws a checked ParseException.
    Date date = new SimpleDateFormat("yyyy-MMM-dd HHmmss", Locale.ENGLISH).parse(dateString);
    // here you go with your date
}
Runnable sample on ideone: http://ideone.com/GBDEJ
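As a side note, the piece-by-piece breakdown of the regex can live in the Java source itself. With the Pattern.COMMENTS flag, whitespace in the pattern is ignored and # starts a comment that runs to the end of the line. The sketch below is only an illustration of that flag applied to the same expression; the class name CommentedRegex and the constant FILE_NAME are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentedRegex {
    // Same expression as in the answer, one piece per line.
    // Each line ends with \n so that the # comment stops there.
    static final Pattern FILE_NAME = Pattern.compile(
          "(.*?)       # anything\n"
        + "([0-9]{4})  # 4 digits: the year\n"
        + "([^0-9]*?)  # anything except digits\n"
        + "([a-z]+)    # lowercase letters: the month\n"
        + "(.*?)       # anything\n"
        + "([0-9]{2})  # 2 digits: the day\n"
        + "(.*?)       # anything\n"
        + "([0-9]{2})  # 2 digits: the hour\n"
        + "(.*?)       # anything\n"
        + "([0-9]{4})  # 4 digits: minutes and seconds\n"
        + "_([^.]+)    # underscore, then anything except a dot\n"
        + "[.]zip      # the file extension\n",
        Pattern.COMMENTS);

    public static void main(String[] args) {
        Matcher m = FILE_NAME.matcher("19882012ABCseptemberDEF03HIJ12KLM0156_249.zip");
        if (m.matches()) {
            // prints "2012 september 03 12 0156"
            System.out.println(String.join(" ",
                m.group(2), m.group(4), m.group(6), m.group(8), m.group(10)));
        }
    }
}
```

The group numbers are unchanged, so the rest of the code works as before. The one thing to watch for is that a literal space or # in the pattern has to be escaped once this flag is on.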
Edit:
You can avoid capturing what you don't want by removing the parentheses around the parts you don't care about. The regular expression then becomes .*?([0-9]{4})[^0-9]*?([a-z]+).*?([0-9]{2}).*?([0-9]{2}).*?([0-9]{4})_[^.]+[.]zip and the matched groups become:
group(1): the year
group(2): the month
group(3): the day
group(4): the hour
group(5): the minutes & seconds
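Putting the shorter expression to work, here is a minimal self-contained sketch. The class name FileNameDate and the method extract are hypothetical, and two choices in it are assumptions rather than part of the answer: it passes Locale.ENGLISH so the month name parses whatever the default locale is, and it wraps the checked ParseException in an IllegalArgumentException.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FileNameDate {
    // Only the five date parts are captured; everything else is just matched.
    private static final Pattern FILE_NAME = Pattern.compile(
        ".*?([0-9]{4})[^0-9]*?([a-z]+).*?([0-9]{2}).*?([0-9]{2}).*?([0-9]{4})_[^.]+[.]zip");

    /** Returns the date found in the file name, or null if the name does not match. */
    static Date extract(String filename) {
        Matcher m = FILE_NAME.matcher(filename);
        if (!m.matches()) {
            return null;
        }
        // year-month-day hour+minutes+seconds, e.g. "2012-september-03 120156"
        String dateString = m.group(1) + "-" + m.group(2) + "-" + m.group(3)
                + " " + m.group(4) + m.group(5);
        try {
            // Assumption: month names are English.
            return new SimpleDateFormat("yyyy-MMM-dd HHmmss", Locale.ENGLISH).parse(dateString);
        } catch (ParseException e) {
            // The regex matched but the text is not a date (e.g. an unknown month name).
            throw new IllegalArgumentException("Not a date: " + dateString, e);
        }
    }

    public static void main(String[] args) {
        Date date = extract("19882012ABCseptemberDEF03HIJ12KLM0156_249.zip");
        // prints "2012-09-03 12:01:56"
        System.out.println(new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(date));
    }
}
```

Because the leading .*? is lazy and the year must be followed by non-digits and then lowercase letters, the match settles on 2012 rather than 1988 for this file name.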