Suppose I have the following markdown list items:
- [x] Example of a completed task.
- [x] ! Example of a completed task.
- [x] ? Example of a completed task.
I would like to parse such an item with a regex and extract the following capture groups:
- $1: the left [ and the right ] brackets when the symbol x is in between
- $2: the symbol x in between the brackets [ and ]
- $3: the modifier ! that follows [x]
- $4: the modifier ? that follows [x]
- $5: the text that follows [x] without a modifier, e.g., [x] This is targeted.
- $6: the text that follows [x] !
- $7: the text that follows [x] ?
After a lot of trial and error using online parsers, I came up with the following:
((?<=x)\]|\[(?=x]))|((?<=\[)x(?=\]))|((?<=\[x\]\s)!(?=\s))|((?<=\[x\]\s)\?(?=\s))|((?<=\[x\]\s)[^!?].*)|((?<=\[x\]\s!\s).*)|((?<=\[x\]\s\?\s).*)
To make the regex above more readable, these are the capture groups listed one by one:
- $1: ((?<=x)\]|\[(?=x]))
- $2: ((?<=\[)x(?=\]))
- $3: ((?<=\[x\]\s)!(?=\s))
- $4: ((?<=\[x\]\s)\?(?=\s))
- $5: ((?<=\[x\]\s)[^!?].*)
- $6: ((?<=\[x\]\s!\s).*)
- $7: ((?<=\[x\]\s\?\s).*)
This is most likely not the best way to do it, but at least it seems to capture what I want.
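To make the behaviour of the seven alternatives concrete, here is a minimal Ruby sketch (Ruby uses Onigmo, a fork of Oniguruma) that scans each list item and records which numbered group produced each match. The constant `TASK_PATTERN` and the helper `group_hits` are names made up for this illustration, and the one unescaped `]` in the first alternative is escaped here purely to avoid a Ruby warning; it should not change what the pattern matches.

```ruby
# The pattern from the question, with every literal ] escaped.
TASK_PATTERN = /((?<=x)\]|\[(?=x\]))|((?<=\[)x(?=\]))|((?<=\[x\]\s)!(?=\s))|((?<=\[x\]\s)\?(?=\s))|((?<=\[x\]\s)[^!?].*)|((?<=\[x\]\s!\s).*)|((?<=\[x\]\s\?\s).*)/

# Hypothetical helper: returns { group_number => [matched texts] } for one line.
def group_hits(line, pattern = TASK_PATTERN)
  hits = {}
  line.scan(pattern) do |captures|
    captures.each_with_index do |text, index|
      next if text.nil?               # this alternative did not take part in the match
      (hits[index + 1] ||= []) << text
    end
  end
  hits
end

plain    = group_hits("- [x] Example of a completed task.")
urgent   = group_hits("- [x] ! Example of a completed task.")
question = group_hits("- [x] ? Example of a completed task.")

# plain:    groups 1 ("[" and "]"), 2 ("x") and 5 (the text)
# urgent:   groups 1, 2, 3 ("!") and 6 (the text)
# question: groups 1, 2, 4 ("?") and 7 (the text)
[plain, urgent, question].each { |hits| p hits }
```

Because every alternative is anchored by a lookbehind rather than by consuming the prefix, each line yields several short matches instead of one match with several groups filled in.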

I would like to extend that regex to capture lines in a markdown table that
looks like this:
| | Task name | Plan | Actual | File |
| :---- | :-------------------------------------- | :---------: | :---------: | :------------: |
| [x] | Task one with a reasonably long name. | 08:00-08:45 | 08:00-09:00 | [[task-one]] |
| [x] ! | Task two with a reasonably long name. | 09:00-09:30 | | [[task-two]] |
| [x] ? | Task three with a reasonably long name. | 11:00-13:00 | | [[task-three]] |
More specifically, I am interested in having the same group captures as above, but I would like to exclude the table grid (i.e., the |). So, groups $1 to $4 should stay the same, but groups $5 to $7 should capture the text of the cells, excluding the |.

Do you have any ideas on how I can adjust, for example, the regex for group $5 to exclude the |? I have tried all sorts of negations (e.g., [^\|]) without success. I am using Oniguruma regular expressions.
Inspired by Wiktor's answer, here is a regex that is quite short:
(?:\G(?<!\A)\||(?:\[x]\s[?!]?\s*\|?))\K([^|\n]*)
Explanation of the parts:
1. \G(?<!\A)\|
- \G asserts the position at the end of the previous match, or at the start of the string for the first match.
- (?<!\A) is a negative lookbehind; \A asserts the position at the start of the string, so together they exclude the start-of-string case of \G.
- \| matches the character |.
2. (?:\[x]\s[?!]?\s*\|?) is a non-capturing group. It matches [x], then \s (one whitespace character), [?!]? (an optional ? or !), \s* (zero or more whitespace characters) and \|? (an optional |).
3. \K resets the starting point of the reported match.
4. ([^|\n]*) captures zero or more characters other than | and \n (newline).
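The role of \K is easiest to see in isolation. The sketch below is not the full regex from this answer: it keeps only the checkbox alternative and drops the \G branch, so it returns just the first piece of text after the checkbox. `PREFIX_THEN_TEXT` and `text_after_checkbox` are hypothetical names used for illustration.

```ruby
# Checkbox prefix, then \K so that only what follows is reported as the match.
PREFIX_THEN_TEXT = /\[x\]\s[?!]?\s*\|?\K[^|\n]*/

# Returns the text right after "[x]" (and an optional modifier and |),
# or nil when the line has no checked box.
def text_after_checkbox(line)
  line[PREFIX_THEN_TEXT]
end

from_list  = text_after_checkbox("- [x] ! Example of a completed task.")
from_table = text_after_checkbox("| [x] ? | Task three with a reasonably long name. | 11:00-13:00 |")

puts from_list   # the list item's text, without "[x] ! "
puts from_table  # the first cell after the checkbox, still padded with spaces
```

The prefix is still matched and consumed; \K only moves the reported start of the match, which is why no lookbehind is needed here.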
You can use
((?<=x)]|\[(?=x]))|((?<=\[)x(?=]))|((?<=\[x]\s)!(?=\s))|(?<=\[x]\s)(\?)(?=\s)|(?:\G(?!\A)\||(?<=\[x]\s[?!\s]\s\|))\K([^|\n]*)(?=\|)
See the regex101 (PCRE) and Ruby (Onigmo/Oniguruma) demos.
What is added? The (?:\G(?!\A)\||(?<=\[x]\s[?!\s]\s\|))\K([^|\n]*)(?=\|) part:
- (?: - start of a non-capturing group (a custom boundary here, we'll match...)
- \G(?!\A)\| - either the end of the previous match and a | char (i.e. | must immediately follow the previous match),
- | - or
- (?<=\[x]\s[?!\s]\s\|) - a location that is immediately preceded by [x] + a whitespace + a ?, ! or whitespace + a whitespace + a | char
- ) - end of the group
- \K - match reset operator that removes the text matched so far from the overall match memory buffer
- ([^|\n]*) - zero or more chars other than | and a line feed char
- (?=\|) - a | char must appear immediately to the right of the current location.
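As a rough check of the pieces described above, here is a small Ruby sketch that runs the extended pattern over table rows and collects the hits per group. `ROW_PATTERN` and `parse_row` are hypothetical names, the literal ] characters are escaped only to avoid Ruby warnings, and the cell text is stripped of padding for display. One assumption matters: the lookbehind needs [x] to be followed by exactly three characters before the |, so a row without a modifier is assumed to be padded to the column width (`[x]   |`).

```ruby
# Extended pattern from this answer; group 5 now captures table cells.
ROW_PATTERN = /((?<=x)\]|\[(?=x\]))|((?<=\[)x(?=\]))|((?<=\[x\]\s)!(?=\s))|(?<=\[x\]\s)(\?)(?=\s)|(?:\G(?!\A)\||(?<=\[x\]\s[?!\s]\s\|))\K([^|\n]*)(?=\|)/

# Hypothetical helper: { group_number => [matched texts] } for one table row.
def parse_row(row, pattern = ROW_PATTERN)
  hits = {}
  row.scan(pattern) do |captures|
    captures.each_with_index do |text, index|
      next if text.nil?
      (hits[index + 1] ||= []) << text.strip   # strip cell padding for readability
    end
  end
  hits
end

# Assumption: the first column is padded so that "[x]" is followed by three spaces.
row_one = parse_row("| [x]   | Task one with a reasonably long name. | 08:00-08:45 | 08:00-09:00 | [[task-one]] |")
row_two = parse_row("| [x] ! | Task two with a reasonably long name. | 09:00-09:30 | | [[task-two]] |")

puts row_one[5].inspect  # the four cells after the checkbox column
puts row_two[5].inspect  # same, with an empty string for the blank "Actual" cell
```

With the padding collapsed to a single space (`| [x] | Task one ...`), the lookbehind no longer applies and group 5 stays empty for that row, so the alignment of the first column is part of what the pattern relies on.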