How to exclude occurrences after a positive lookbehind?

Question

Suppose I have the following markdown list items:

- [x] Example of a completed task.
- [x] ! Example of a completed task.
- [x] ? Example of a completed task.

I would like to parse each item with a regex and extract the following capture groups:

$1: the left [ and the right ] brackets when the symbol x is in-between
$2: the symbol x in between the brackets [ and ]
$3: the modifier ! that follows after [x]
$4: the modifier ? that follows after [x]
$5: the text that follows [x] without a modifier, e.g., [x] This is targeted.
$6: the text that follows [x] !
$7: the text that follows [x] ?

After a lot of trial-and-error using online parsers, I came up with the following:

((?<=x)\]|\[(?=x]))|((?<=\[)x(?=\]))|((?<=\[x\]\s)!(?=\s))|((?<=\[x\]\s)\?(?=\s))|((?<=\[x\]\s)[^!?].*)|((?<=\[x\]\s!\s).*)|((?<=\[x\]\s\?\s).*)

To make the regex above more readable, these are the capture groups listed one by one:

$1: ((?<=x)\]|\[(?=x]))
$2: ((?<=\[)x(?=\]))
$3: ((?<=\[x\]\s)!(?=\s))
$4: ((?<=\[x\]\s)\?(?=\s))
$5: ((?<=\[x\]\s)[^!?].*)
$6: ((?<=\[x\]\s!\s).*)
$7: ((?<=\[x\]\s\?\s).*)

This is most likely not the best way to do it, but at least it seems to capture what I want:

[Image: matches for the example list items]
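
To make the grouping concrete, here is a minimal sketch that runs the seven alternatives against the three list items. It assumes Ruby, whose built-in engine (Onigmo) belongs to the Oniguruma family; the names GROUP_SOURCES, LIST_PATTERN and list_tokens are illustrative and not part of the question.

```ruby
# The seven alternatives from the question, one per capture group.
# Single-quoted strings keep the backslashes exactly as written.
GROUP_SOURCES = [
  '((?<=x)\]|\[(?=x\]))',    # group 1: a bracket next to x
  '((?<=\[)x(?=\]))',        # group 2: the x between the brackets
  '((?<=\[x\]\s)!(?=\s))',   # group 3: the ! modifier
  '((?<=\[x\]\s)\?(?=\s))',  # group 4: the ? modifier
  '((?<=\[x\]\s)[^!?].*)',   # group 5: text without a modifier
  '((?<=\[x\]\s!\s).*)',     # group 6: text after [x] !
  '((?<=\[x\]\s\?\s).*)'     # group 7: text after [x] ?
].freeze

LIST_PATTERN = Regexp.new(GROUP_SOURCES.join('|'))

# Hypothetical helper: returns [group_number, matched_text] pairs for one line.
def list_tokens(line)
  line.scan(LIST_PATTERN).map do |captures|
    index = captures.index { |capture| !capture.nil? }
    [index + 1, captures[index]]
  end
end

[
  '- [x] Example of a completed task.',
  '- [x] ! Example of a completed task.',
  '- [x] ? Example of a completed task.'
].each { |line| p list_tokens(line) }
# [[1, "["], [2, "x"], [1, "]"], [5, "Example of a completed task."]]
# [[1, "["], [2, "x"], [1, "]"], [3, "!"], [6, "Example of a completed task."]]
# [[1, "["], [2, "x"], [1, "]"], [4, "?"], [7, "Example of a completed task."]]
```

Each match fills exactly one group, so the group number alone tells a consumer (for example a syntax highlighter) which kind of token it has found.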

I would like to extend that regex to capture lines in a markdown table that looks like this:

|       | Task name                               |    Plan     |   Actual    |      File      |
| :---- | :-------------------------------------- | :---------: | :---------: | :------------: |
| [x]   | Task one with a reasonably long name.   | 08:00-08:45 | 08:00-09:00 |  [[task-one]]  |
| [x] ! | Task two with a reasonably long name.   | 09:00-09:30 |             |  [[task-two]]  |
| [x] ? | Task three with a reasonably long name. | 11:00-13:00 |             | [[task-three]] |

More specifically, I am interested in having the same group captures as above, but I would like to exclude the table grid (i.e., the |). So, groups $1 to $4 should stay the same, but groups $5 to $7 should capture the text, excluding the |, e.g., like in the selection below:

[Image: matches for the example table]

Do you have any ideas on how I can adjust, for example, the regex for group $5 to exclude the |? I have tried all sorts of negations (e.g., [^\|]) without success. I am using Oniguruma regular expressions.

nps · Accepted Answer

Inspired by Wiktor's answer, here is a shorter regex:

(?:\G(?<!\A)\||(?:\[x]\s[?!]?\s*\|?))\K([^|\n]*)

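Explanation of the regex above:

\G(?<!\A)\|

\G asserts the position at the end of the previous match (or at the start of the string for the first match). The negative lookbehind (?<!\A) then rules out the start of the string, because \A asserts the position at the start of the string. Finally, \| matches a literal | character. Together, this alternative matches only a | that immediately follows the previous match.

(?:\[x]\s[?!]?\s*\|?)

A non-capturing group that matches [x], one whitespace character (\s), an optional ? or ! ([?!]?), zero or more whitespace characters (\s*), and an optional | (\|?).

\K

\K resets the starting point of the reported match, so everything matched before it is left out of the result.

([^|\n]*)

A capturing group that matches zero or more characters other than | and \n (newline).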
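
As a rough check of the explanation above, the following sketch applies the pattern to one table row from the question. It assumes Ruby (Onigmo); the names CELL_PATTERN, row and cells are illustrative, and the ] is escaped in the literal, which does not change what it matches.

```ruby
# The short pattern from this answer, with ] escaped for the Ruby literal.
CELL_PATTERN = /(?:\G(?<!\A)\||(?:\[x\]\s[?!]?\s*\|?))\K([^|\n]*)/

row = '| [x] ! | Task two with a reasonably long name.   | 09:00-09:30 |             |  [[task-two]]  |'

# scan returns one single-element array per match because the pattern has
# one capture group; flatten and strip turn that into trimmed cell texts.
cells = row.scan(CELL_PATTERN).flatten.map(&:strip)

p cells.first(4)
# ["Task two with a reasonably long name.", "09:00-09:30", "", "[[task-two]]"]
```

Only the first four entries are shown because this pattern does not require a closing | after the captured text, so it can also report an empty match after the last | of the row.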

Wiktor Stribiżew · Answer

You can use

((?<=x)]|\[(?=x]))|((?<=\[)x(?=]))|((?<=\[x]\s)!(?=\s))|(?<=\[x]\s)(\?)(?=\s)|(?:\G(?!\A)\||(?<=\[x]\s[?!\s]\s\|))\K([^|\n]*)(?=\|)

See the regex101 PCRE demo and the Ruby (Onigmo/Oniguruma) demo.

What is added? The (?:\G(?!\A)\||(?<=\[x]\s[?!\s]\s\|))\K([^|\n]*)(?=\|) part:

(?: - start of a non-capturing group (a custom boundary here, we'll match...)
- \G(?!\A)\| - either the end of the previous match and a | char (i.e. | must immediately follow the previous match),
- |(?<=\[x]\s[?!\s]\s\|) - or a location that is immediately preceded with [x] + a whitespace + a ?, ! or whitespace + a whitespace and | char
) - end of the group
\K - match reset operator that removes the text matched so far from the overall match memory buffer
([^|\n]*) - zero or more chars other than | and a line feed char
(?=\|) - a | char must appear immediately to the right of the current location.
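
The breakdown above can be tried out with a short sketch, assuming Ruby (Onigmo) and the table from the question. The names TABLE_PATTERN, TABLE and parse_row are hypothetical, and each ] is escaped in the literal without changing the pattern's meaning.

```ruby
# The full pattern from this answer: groups 1-4 as before, group 5 for cells.
TABLE_PATTERN = /((?<=x)\]|\[(?=x\]))|((?<=\[)x(?=\]))|((?<=\[x\]\s)!(?=\s))|(?<=\[x\]\s)(\?)(?=\s)|(?:\G(?!\A)\||(?<=\[x\]\s[?!\s]\s\|))\K([^|\n]*)(?=\|)/

TABLE = <<~'MARKDOWN'
  |       | Task name                               |    Plan     |   Actual    |      File      |
  | :---- | :-------------------------------------- | :---------: | :---------: | :------------: |
  | [x]   | Task one with a reasonably long name.   | 08:00-08:45 | 08:00-09:00 |  [[task-one]]  |
  | [x] ! | Task two with a reasonably long name.   | 09:00-09:30 |             |  [[task-two]]  |
  | [x] ? | Task three with a reasonably long name. | 11:00-13:00 |             | [[task-three]] |
MARKDOWN

# Hypothetical helper: the modifier (group 3 or 4) and the trimmed
# group 5 cells of a single row.
def parse_row(row)
  matches = row.scan(TABLE_PATTERN)
  modifier = matches.map { |m| m[2] || m[3] }.compact.first
  cells = matches.map { |m| m[4] }.compact.map(&:strip)
  { modifier: modifier, cells: cells }
end

# Rows are scanned one at a time, so \G never carries over between lines.
TABLE.each_line(chomp: true) { |row| p parse_row(row) }
```

The header and separator rows yield no cells because neither contains [x]. For the three task rows, group 5 holds the cell texts without any | characters, and the trailing (?=\|) ensures that only cells closed by a | are reported.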

Tags: regex, regex-group, visual-studio-code, syntax-highlighting

Mihai
