
Regex to extract time (e.g. 7:30pm, 8 pm, 9.05) from string

Tags: regex

I'm working on a Rails application that uses an external feed for some event data and, annoyingly, the feed only provides the time embedded in a free-text string. For example:

Doors open at 7:30pm, show starts at 9pm

I'm aiming to extract the first time from these strings and put it into a datetime field. The system needs to capture the following kinds of values:

  1. 11 am
  2. 12pm
  3. 1pm
  4. 2:15pm
  5. 3.30pm
  6. 4.45
  7. 5:30
  8. 06:15
  9. 07:30pm
  10. 8:30 pm
  11. 9.15 pm

But not these ones:

  1. 105
  2. 2 50
  3. 305pm
  4. 4 15pm
  5. 74pm
  6. 840am

I figure the best way to do this is with a regex, and after some searching (particularly an existing SO question) I've got the following:

[0-9]{1,2}(:|.)??[0-9]{0,2}\s?(am|pm|AM|PM)

It partly works, but it doesn't exclude any of the values I don't want, and it seems to capture only the first character of am/pm for items 2 and 3 of the first list.

Is this possible with regex?

Thanks!

samlester asked Sep 16 '25 09:09


2 Answers

\b((?:0?[1-9]|1[0-2])(?!\d| (?![ap]))[:.]?(?:(?:[0-5][0-9]))?(?:\s?[ap]m)?)\b

It doesn't support 24-hour format, but it does enforce valid 12-hour times. Add a case-insensitive flag in your regex engine, whatever language it may be, or wrap the regex in (?i: ) if that is supported.
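Since the question mentions Rails, here is a minimal Ruby sketch of how the pattern above might be applied with the case-insensitive flag. The constant `TIME_PATTERN` and the helper `first_time` are hypothetical names chosen for illustration; they are not part of the answer.

```ruby
# Pattern copied from the answer above; the trailing /i makes "pm" and "PM" equivalent.
TIME_PATTERN = /\b((?:0?[1-9]|1[0-2])(?!\d| (?![ap]))[:.]?(?:(?:[0-5][0-9]))?(?:\s?[ap]m)?)\b/i

# Hypothetical helper: returns the first time-like substring, or nil if none is found.
def first_time(text)
  match = TIME_PATTERN.match(text)
  match && match[1]
end

first_time("Doors open at 7:30pm, show starts at 9pm") # => "7:30pm"
first_time("Doors 8 PM")                               # => "8 PM"
first_time("305pm")                                    # => nil
first_time("4 15pm")                                   # => nil
```

The captured string would still need to be converted before it can go into a datetime field; that step is outside what the regex does.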


Regex:

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    (?:                      group, but do not capture:
--------------------------------------------------------------------------------
      0?                       '0' (optional (matching the most
                               amount possible))
--------------------------------------------------------------------------------
      [1-9]                    any character of: '1' to '9'
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      1                        '1'
--------------------------------------------------------------------------------
      [0-2]                    any character of: '0' to '2'
--------------------------------------------------------------------------------
    )                        end of grouping
--------------------------------------------------------------------------------
    (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
      \d                       digits (0-9)
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
                               ' '
--------------------------------------------------------------------------------
      (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
        [ap]                     any character of: 'a', 'p'
--------------------------------------------------------------------------------
      )                        end of look-ahead
--------------------------------------------------------------------------------
    )                        end of look-ahead
--------------------------------------------------------------------------------
    [:.]?                    any character of: ':', '.' (optional
                             (matching the most amount possible))
--------------------------------------------------------------------------------
    (?:                      group, but do not capture (optional
                             (matching the most amount possible)):
--------------------------------------------------------------------------------
      (?:                      group, but do not capture:
--------------------------------------------------------------------------------
        [0-5]                    any character of: '0' to '5'
--------------------------------------------------------------------------------
        [0-9]                    any character of: '0' to '9'
--------------------------------------------------------------------------------
      )                        end of grouping
--------------------------------------------------------------------------------
    )?                       end of grouping
--------------------------------------------------------------------------------
    (?:                      group, but do not capture (optional
                             (matching the most amount possible)):
--------------------------------------------------------------------------------
      \s?                      whitespace (\n, \r, \t, \f, and " ")
                               (optional (matching the most amount
                               possible))
--------------------------------------------------------------------------------
      [ap]                     any character of: 'a', 'p'
--------------------------------------------------------------------------------
      m                        'm'
--------------------------------------------------------------------------------
    )?                       end of grouping
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
dee-see answered Sep 18 '25 23:09


Perhaps something like this:

^[01]?[0-9]([:.][0-9]{2})?(\s?[ap]m)?$


Note that this will not handle 24-hour time, and it is not strict about 12-hour time either: for example, it would match 19pm.
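To make that caveat concrete, here is a small Ruby sketch that checks a few inputs against the simple pattern. The constant name `LOOSE_TIME` is an assumption for illustration, and the `/i` flag is added so that uppercase AM/PM is accepted as well.

```ruby
# The first pattern from this answer, with /i added for case-insensitivity.
LOOSE_TIME = /^[01]?[0-9]([:.][0-9]{2})?(\s?[ap]m)?$/i

LOOSE_TIME.match?("7:30pm")               # => true
LOOSE_TIME.match?("19pm")                 # => true  (not a real 12-hour time)
LOOSE_TIME.match?("305pm")                # => false
LOOSE_TIME.match?("Doors open at 7:30pm") # => false (the anchors require the time to be the whole line)
```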

If you want to be more specific, you might try:

^((0?[0-9]|1[012])([:.][0-9]{2})?(\s?[ap]m)|([01]?[0-9]|2[0-3])([:.][0-9]{2})?)$
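As a rough check of the stricter pattern in Ruby, the sketch below runs it against a few sample inputs. `STRICT_TIME` is a hypothetical constant name, and the `/i` flag is an addition for case-insensitivity.

```ruby
# The stricter pattern from this answer: a 12-hour time with am/pm, or a 24-hour time without.
STRICT_TIME = /^((0?[0-9]|1[012])([:.][0-9]{2})?(\s?[ap]m)|([01]?[0-9]|2[0-3])([:.][0-9]{2})?)$/i

STRICT_TIME.match?("7:30pm") # => true
STRICT_TIME.match?("12 am")  # => true
STRICT_TIME.match?("19:30")  # => true  (24-hour branch)
STRICT_TIME.match?("19pm")   # => false (19 is not a valid 12-hour value)
```

One Ruby-specific detail: `^` and `$` anchor to line boundaries in Ruby, so for input that may contain newlines, `\A` and `\z` would be the whole-string equivalents.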


Or to try to match it as part of a larger portion of text, you might use something like this:

\b((0?[1-9]|1[012])([:.][0-5][0-9])?(\s?[ap]m)|([01]?[0-9]|2[0-3])([:.][0-5][0-9]))\b
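For the string from the question, the word-boundary version could be used in Ruby roughly as follows. `EMBEDDED_TIME` and `times_in` are hypothetical names, and the `/i` flag is an addition. Because the outermost group wraps the whole alternation, the first capture of each `scan` result is the full matched time.

```ruby
# The word-boundary pattern from this answer, with /i added for case-insensitivity.
EMBEDDED_TIME = /\b((0?[1-9]|1[012])([:.][0-5][0-9])?(\s?[ap]m)|([01]?[0-9]|2[0-3])([:.][0-5][0-9]))\b/i

# Hypothetical helper: returns every time-like substring found in the text.
def times_in(text)
  # scan yields one array of captures per match; capture 1 is the whole time.
  text.scan(EMBEDDED_TIME).map(&:first)
end

times_in("Doors open at 7:30pm, show starts at 9pm")       # => ["7:30pm", "9pm"]
times_in("Doors open at 7:30pm, show starts at 9pm").first # => "7:30pm"
times_in("4 15pm")                                         # => []
```

Note that this pattern needs either an am/pm suffix or a minutes part, so a bare hour such as "9" on its own is not matched.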


p.s.w.g answered Sep 19 '25 00:09