
Grouping timestamps with descriptions

I have data for tasks that were recorded with a time sheet app. I'm trying to parse the breaks for each task.

An example break string attached to a task can look like this:

1:19pm – 10:33pm ate tacos 10:35pm – 11:38pm 12:40am – 1:24am took a nap

I need to group this into time stamps with their associated descriptions. The above should be grouped like:

1:19pm – 10:33pm ate tacos

10:35pm – 11:38pm

12:40am – 1:24am took a nap

The description for a break interval can have basically any characters or be any length. Some intervals don't have descriptions.

I figure regex would be the simplest way to get an array of intervals with their descriptions (if they have one).

So far I have:

\d{1,2}:\d{2}[ap]m\s–\s\d{1,2}:\d{2}[ap]m

which matches the time stamps 1:19pm – 10:33pm, 10:35pm – 11:38pm, and 12:40am – 1:24am.

I am using JavaScript, and the match function, to parse this data. I want to make a regular expression that will match the time stamp and everything that follows it until the next time stamp.
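
For reference, here is a minimal sketch of what the pattern so far returns when passed to match with the g flag. The variable names `str` and `stamps` are only illustrative.

```javascript
const str = '1:19pm – 10:33pm ate tacos 10:35pm – 11:38pm 12:40am – 1:24am took a nap';

// With the g flag, match() returns every matched substring (or null if none match).
const stamps = str.match(/\d{1,2}:\d{2}[ap]m\s–\s\d{1,2}:\d{2}[ap]m/g);

console.log(stamps);
```

This gives the three intervals only; the descriptions between them are dropped, which is the part still missing.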

I'm a beginner with regex, so go easy on me. I've been at this for hours, watched several videos, read tutorial blogs, and experimented with regex101. Anchors and lookaheads/lookbehinds are confusing, and I can't seem to get anything to do what I want. I'm not looking to become an expert in writing regular expressions, but I would really like to learn something new that can be directly applied to what I'm doing.

jgoodhcg asked Dec 11 '15 19:12


3 Answers

You can use the following regex:

(\d{1,2}:\d{2}[ap]m\s*–\s*\d{1,2}:\d{2}[ap]m)(\D*(?:\d(?!\d?:\d{2}[ap]m\s)\D*)*)


The problem you face is matching a text that does not match a specific pattern. This can be achieved either with a tempered greedy token or an unroll-the-loop technique. The latter is preferable since it involves less backtracking. My regex is based on that technique.
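
For comparison, here is a minimal sketch of the two techniques side by side. The tempered greedy token form, `(?:(?!…).)*`, tests the lookahead before every character, while the unrolled form only tests it after a digit. The names `interval`, `tempered`, `unrolled`, and `collect` are illustrative and not part of the original answer.

```javascript
// Sample break string from the question.
const str = '1:19pm – 10:33pm ate tacos 10:35pm – 11:38pm 12:40am – 1:24am took a nap';

// Pattern source for an interval such as "1:19pm – 10:33pm".
const interval = '\\d{1,2}:\\d{2}[ap]m\\s*–\\s*\\d{1,2}:\\d{2}[ap]m';

// Tempered greedy token: consume one character at a time, as long as a
// new time stamp does not start at that position.
const tempered = new RegExp('(' + interval + ')((?:(?!\\d{1,2}:\\d{2}[ap]m\\s).)*)', 'gi');

// Unrolled loop: consume runs of non-digits, and check the lookahead
// only after a digit.
const unrolled = new RegExp('(' + interval + ')(\\D*(?:\\d(?!\\d?:\\d{2}[ap]m\\s)\\D*)*)', 'gi');

// Collect [period, description] pairs for a given pattern.
function collect(re) {
  return Array.from(str.matchAll(re), (m) => [m[1], m[2].trim()]);
}

console.log(collect(tempered));
console.log(collect(unrolled));
```

Both patterns should produce the same pairs on this input; the difference is in how much work the regex engine does to get there.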

Here is the regex explanation:

  • (\d{1,2}:\d{2}[ap]m\s*–\s*\d{1,2}:\d{2}[ap]m) - matches the time period and captures it into Group #1. This is your regex; I only added the outer parentheses and the * quantifiers on the \s classes, so I won't go into detail.
  • (\D*(?:\d(?!\d?:\d{2}[ap]m\s)\D*)*) - an unrolled version of the .*?(?=\d{1,2}:\d{2}[ap]m\s) construct, matching anything up to the first \d{1,2}:\d{2}[ap]m\s pattern. It is captured into Group #2.
    • \D* - 0 or more characters other than a digit
    • (?:\d(?!\d?:\d{2}[ap]m\s)\D*)* - 0 or more sequences of...
      • \d(?!\d?:\d{2}[ap]m\s) - a digit (\d) that is not followed by an optional digit, then :, then 2 digits, then a or p, then m, and then a whitespace character; that is, a digit that does not start a new time stamp
      • \D* - again, 0 or more characters other than a digit.
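
To put the two groups to use, a small helper could turn a break string into an array of objects. This is only a sketch: the function name `parseBreaks` and the `period`/`description` property names are assumptions made for illustration, and it relies on `String.prototype.matchAll`, which needs a reasonably modern JavaScript engine.

```javascript
// Group 1: the interval. Group 2: everything up to the next time stamp.
const BREAK_RE = /(\d{1,2}:\d{2}[ap]m\s*–\s*\d{1,2}:\d{2}[ap]m)(\D*(?:\d(?!\d?:\d{2}[ap]m\s)\D*)*)/gi;

// Hypothetical helper: returns one { period, description } object per break.
function parseBreaks(text) {
  return Array.from(text.matchAll(BREAK_RE), (m) => ({
    period: m[1],
    description: m[2].trim(), // empty string when the break has no description
  }));
}

const breaks = parseBreaks('1:19pm – 10:33pm ate tacos 10:35pm – 11:38pm 12:40am – 1:24am took a nap');
console.log(breaks);
```

Because the lookahead only rejects digits that start a time stamp, a description such as "ate 2 tacos" should stay in one piece.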

JS demo:

var re = /(\d{1,2}:\d{2}[ap]m\s*–\s*\d{1,2}:\d{2}[ap]m)(\D*(?:\d(?!\d?:\d{2}[ap]m\s)\D*)*)/ig;
var str = '1:19pm – 10:33pm ate tacos 10:35pm – 11:38pm 12:40am – 1:24am took a nap';
var m;

// With the g flag, each exec() call resumes after the previous match
while ((m = re.exec(str)) !== null) {
    console.log("Period: " + m[1]);              // Group 1: the time interval
    console.log("Description: " + m[2].trim());  // Group 2: text up to the next interval
}
Wiktor Stribiżew answered Nov 07 '22 20:11


I'm sure this can be simplified, but the following regular expression seems to work:


/(\d{1,2}:\d{2}[ap]m\s–\s\d{1,2}:\d{2}[ap]m(?:.(?!\d{1,2}:\d{2}[ap]m))*)/g

var input = '1:19pm – 10:33pm ate tacos 10:35pm – 11:38pm 12:40am – 1:24am took a nap';
// match() returns null when nothing matches, so fall back to an empty array
var matches = input.match(/(\d{1,2}:\d{2}[ap]m\s–\s\d{1,2}:\d{2}[ap]m(?:.(?!\d{1,2}:\d{2}[ap]m))*)/g) || [];

for (var i = 0; i < matches.length; i++) {
  console.log(matches[i]);
}

Output:

1:19pm – 10:33pm ate tacos

10:35pm – 11:38pm

12:40am – 1:24am took a nap
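
As one possible simplification (not the original author's pattern), the character-by-character lookahead can be swapped for a lazy quantifier that stops at the next time stamp or at the end of the string. A sketch, assuming the descriptions contain no line breaks; the names `input`, `re`, and `matches` are illustrative:

```javascript
const input = '1:19pm – 10:33pm ate tacos 10:35pm – 11:38pm 12:40am – 1:24am took a nap';

// .*? expands one character at a time until the lookahead succeeds:
// either optional whitespace plus a new time stamp, or the end of the input.
const re = /\d{1,2}:\d{2}[ap]m\s–\s\d{1,2}:\d{2}[ap]m.*?(?=\s*\d{1,2}:\d{2}[ap]m|$)/g;

const matches = input.match(re) || []; // match() returns null when nothing matches
matches.forEach((m) => console.log(m));
```

Like the pattern above, this treats any time-like text inside a description (for example "woke at 3:00pm") as the start of a new interval.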

Josh Crozier answered Nov 07 '22 19:11


Hope this helps:

https://regex101.com/r/dV7vY5/1

(\d{1,2}:\d{2}[ap]m) – (\d{1,2}:\d{2}[ap]m)([\s|a-z|A-Z]+)

output:

1:19pm – 10:33pm ate tacos

10:35pm – 11:38pm

12:40am – 1:24am took a nap

and you can access each captured group:

 $1 - first hour  (1:19pm)
 $2 - second hour (10:33pm)
 $3 - string      ( ate tacos)
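
The same groups can also be read without replace. A minimal sketch using `RegExp.prototype.exec`, where indexes 1 to 3 of each match correspond to $1 to $3; the `rows` array and its property names are illustrative assumptions:

```javascript
const text = '1:19pm – 10:33pm ate tacos 10:35pm – 11:38pm 12:40am – 1:24am took a nap';
const re = /(\d{1,2}:\d{2}[ap]m) – (\d{1,2}:\d{2}[ap]m)([\s|a-z|A-Z]+)/gi;

const rows = [];
let m;
// With the g flag, each exec() call continues from the end of the previous match.
while ((m = re.exec(text)) !== null) {
  rows.push({ start: m[1], end: m[2], description: m[3].trim() });
}

console.log(rows);
```

Note that the third group only accepts letters, whitespace, and the | character, so a description containing digits or punctuation would be cut off at the first such character.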

example below:

var string = '1:19pm – 10:33pm ate tacos 10:35pm – 11:38pm 12:40am – 1:24am took a nap';
var regex = /(\d{1,2}:\d{2}[ap]m) – (\d{1,2}:\d{2}[ap]m)([\s|a-z|A-Z]+)/gi;
var eachMatch = string.match(regex) || [];  // match() returns null when nothing matches

for (var i = 0; i < eachMatch.length; i++) {
  console.log(eachMatch[i]);
  // Re-apply the regex to each matched substring to pull out the individual groups
  console.log('period : ' + eachMatch[i].replace(regex, '$1') + ' - ' + eachMatch[i].replace(regex, '$2'));
  console.log('description : ' + eachMatch[i].replace(regex, '$3'));
}
Alvaro Joao answered Nov 07 '22 21:11