Given an arbitrary string, for example "I'm going to play croquet next Friday" or "Gadzooks, is it 17th June already?", how would you go about extracting the dates from there?
If this looks like a good candidate for the too-hard basket, perhaps you could suggest an alternative. I want to be able to parse Twitter messages for dates. The tweets I'd be looking at would be ones that users are directing at this service, so they could be coached into using an easier format; however, I'd like it to be as transparent as possible. Is there a good middle ground you could think of?
PHP date_parse() function. Return value: an associative array containing information about the parsed date. Errors/exceptions: if the date string contains an error, the error messages are listed in the array's "errors" element.
The strtotime() function parses an English textual datetime into a Unix timestamp (the number of seconds since January 1 1970 00:00:00 GMT), or returns false if it cannot parse the text. Note: if the year is specified in a two-digit format, values between 00-69 are mapped to 2000-2069 and values between 70-99 are mapped to 1970-1999.
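A short sketch of the strtotime() behaviour described above: a recognisable phrase yields a timestamp, gibberish yields false, and two-digit years follow the 00-69 / 70-99 split. The timezone is pinned to UTC here only so the printed dates are predictable; that choice is an assumption of the example, not something the text requires. It also assumes a 64-bit PHP build, since 2069 is outside the 32-bit timestamp range.

```php
<?php
// Pin the timezone so the output does not depend on php.ini.
date_default_timezone_set('UTC');

// A recognisable phrase gives a timestamp...
$ts = strtotime('17 June 2024');
echo date('Y-m-d', $ts), "\n";              // 2024-06-17

// ...while text it cannot parse gives false.
var_dump(strtotime('Gadzooks'));            // bool(false)

// Two-digit years (here in m/d/y form): 00-69 -> 20xx, 70-99 -> 19xx.
echo date('Y', strtotime('1/1/69')), "\n";  // 2069
echo date('Y', strtotime('1/1/70')), "\n";  // 1970
```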
The date_parse() function returns an associative array with detailed information about a specified date.
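One reason date_parse() can be handy for tweets: unlike strtotime(), which fills any missing parts from the current date, it reports only the components the text actually contained, with absent ones set to false. A minimal sketch, using the "17th June" phrase from the question:

```php
<?php
// Components that were not mentioned come back as false.
$parts = date_parse('17th June');
printf(
    "day=%d month=%d year=%s\n",
    $parts['day'],
    $parts['month'],
    var_export($parts['year'], true)
);
// day=17 month=6 year=false

// Unparseable input is reported through error_count / errors.
$bad = date_parse('Gadzooks');
echo $bad['error_count'] > 0 ? "not a date\n" : "date\n";   // not a date
```

Knowing that the year was absent lets the service decide for itself (for example, "the next 17th June") instead of silently accepting whatever strtotime() assumes.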
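We can compare dates with a simple comparison operator if the given dates are in the same format. This works for Y-m-d strings because they are zero-padded with the most significant part first, so string order matches date order:

```php
<?php
$date1 = "2018-11-24";
$date2 = "2019-03-26";

// Plain string comparison; valid only because both dates use the Y-m-d format.
if ($date1 > $date2) {
    echo "$date1 is later than $date2";
} elseif ($date1 < $date2) {
    echo "$date1 is earlier than $date2";
} else {
    echo "$date1 is the same date as $date2";
}
?>
```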
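When the formats differ, string comparison silently gives wrong answers. A hedged sketch of the safer route: parse both strings first and compare the resulting DateTime objects, which PHP supports with the ordinary comparison operators. `is_later_than()` is a hypothetical helper name, and note that the DateTime constructor throws an exception on text it cannot parse.

```php
<?php
// Lexicographic comparison goes wrong once formats differ:
// after the shared '2', '4' sorts after '0', so the 2018 date "wins".
var_dump('24 November 2018' > '2019-03-26');   // bool(true)

// Hypothetical helper: parse both strings, then compare DateTime objects.
function is_later_than(string $a, string $b): bool
{
    return new DateTime($a) > new DateTime($b);
}

var_dump(is_later_than('24 November 2018', '2019-03-26'));   // bool(false)
```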
If you have the horsepower, you could try the following algorithm. I'm showing an example, and leaving the tedious work up to you :)
```php
<?php
// Attempt to perform strtotime() on each contiguous subset of words...
// 1st iteration
strtotime("Gadzooks, is it 17th June already");
strtotime("is it 17th June already");
strtotime("it 17th June already");
strtotime("17th June already");
strtotime("June already");
strtotime("already");
// 2nd iteration
strtotime("Gadzooks, is it 17th June");
strtotime("is it 17th June");
strtotime("it 17th June");
strtotime("17th June"); // date!
strtotime("June");      // date!
// 3rd iteration
strtotime("Gadzooks, is it 17th");
strtotime("is it 17th");
strtotime("it 17th");
strtotime("17th");      // date!
// 4th iteration
strtotime("Gadzooks, is it");
// etc.
```
And we can assume that strtotime("17th June") is more accurate than strtotime("17th") simply because it contains more words; e.g. "next Friday" will always be more accurate than "Friday".
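The iteration and the "more words wins" rule can be sketched as a function that collects every contiguous run of words the check accepts, then orders them by word count. `date_candidates()` is a hypothetical name, and the check is passed in as a callable so the ranking logic can be exercised independently of strtotime()'s exact parsing rules.

```php
<?php
/**
 * Hypothetical helper: return every contiguous run of words accepted by
 * $isDate, ordered so that runs with more words come first.
 */
function date_candidates(string $text, callable $isDate): array
{
    // Split on whitespace; PREG_SPLIT_NO_EMPTY drops empty pieces.
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $n = count($words);
    $found = [];

    // Every start position, every length that still fits.
    for ($start = 0; $start < $n; $start++) {
        for ($len = 1; $start + $len <= $n; $len++) {
            $phrase = implode(' ', array_slice($words, $start, $len));
            if ($isDate($phrase)) {
                $found[] = ['phrase' => $phrase, 'words' => $len];
            }
        }
    }

    // More words first: "17th June" outranks "17th" and "June".
    usort($found, static fn(array $a, array $b): int => $b['words'] <=> $a['words']);
    return array_column($found, 'phrase');
}

// Using strtotime() as the check, as in the answer above.
$viaStrtotime = static fn(string $p): bool => strtotime($p) !== false;
print_r(date_candidates('Gadzooks, is it 17th June already?', $viaStrtotime));
```

Keeping all candidates (rather than only the best one) leaves room for a second pass, for example preferring a candidate that date_parse() shows contains a day and a month.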
I would do it this way:
First check if the entire string is a valid date with strtotime(). If so, you're done.
If not, determine how many words are in your string (split on whitespace, for example). Let this number be n.
Loop over every combination of n-1 consecutive words and use strtotime() to see if the phrase is a valid date. If so, you've found the longest valid date string within your original string.
If not, loop over every combination of n-2 consecutive words and use strtotime() to see if the phrase is a valid date. If so, you've found the longest valid date string within your original string.
...and so on until you've found a valid date string or searched every individual word. By finding the longest matches, you'll get the most informed dates (if that makes sense). Since you're dealing with tweets, your strings will never be huge.
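The steps above can be sketched as a longest-first search that stops at the first hit. This assumes "combination" means consecutive words (a substring of the tweet); `longest_date_phrase()` is a hypothetical name, and the optional `$isDate` parameter is an addition for testing that defaults to the strtotime() check the answer describes.

```php
<?php
/**
 * Hypothetical helper: try the whole string, then every run of n-1
 * consecutive words, then n-2, ... and return the first (therefore
 * longest) run that $isDate accepts, or null if none is accepted.
 */
function longest_date_phrase(string $text, ?callable $isDate = null): ?string
{
    // Default check: strtotime() returns false for text it cannot parse.
    $isDate ??= static fn(string $p): bool => strtotime($p) !== false;

    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $n = count($words);

    // Window length n covers step 1 (the entire string).
    for ($len = $n; $len >= 1; $len--) {
        for ($start = 0; $start + $len <= $n; $start++) {
            $phrase = implode(' ', array_slice($words, $start, $len));
            if ($isDate($phrase)) {
                return $phrase; // nothing longer is valid, so stop here
            }
        }
    }
    return null;
}

var_dump(longest_date_phrase('next Friday'));   // string(11) "next Friday"
```

In the worst case this makes n(n+1)/2 strtotime() calls (210 for a 20-word tweet), which supports the point that tweets are small enough for brute force. strtotime() is permissive, so a stricter check (for example requiring a digit, month name or weekday in the phrase) may be worth adding before trusting a match.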