I'm currently developing a little tool that allows me to convert GitHub wikis to GitHub Pages. Now I'm trying to add proper support for Markdown tables (not supported by the parser I'm using).
At the moment I hook into the parser's lexer, extend it with various GitHub-wiki-specific tweaks (i.e. links), and then pass the modified tokens back to the parser. Tables should fit this scheme as well. My tweaks use various regex patterns and regex replaces to perform the modifications I need.
I'm a bit stuck with the complicated table syntax, though. You can find an example of that here and here. As you can see there's some structure but some parts are entirely optional.
I've given it some thought, and I think I would like a regex that gives me the header (first line), the column alignment data (second line) and the actual content as separate groups. It should require at least one content line in order to match. The header and alignment data also have to obey certain rules, as seen in the examples.
How would you approach building a regex such as this? Better yet, can someone provide a starting point to build upon? It's possible my approach is misguided (perhaps regex can be avoided?). If so, any ideas that lead to the same result more easily are appreciated.
I need a regex solution to the same problem. Here's what I've got so far; I will update it as I am able to improve it:
\|(?:([^\r\n|]*)\|)+\r?\n\|(?:(:?-+:?)\|)+\r?\n(\|(?:([^\r\n|]*)\|)+\r?\n)+
Debuggex Demo
Tested with JavaScript.
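To see what this pattern actually captures, here is a small Python check. It is only a sketch and not part of the original answer: the sample table and the name TABLE_RE are made up for illustration, and the pattern is written with its leading pipe escaped so that it matches a literal "|". One thing it makes visible is that a capture group inside a repeated (?: ... )+ keeps only its last repetition, so the individual cells still have to be split out after the match.

```python
import re

# Pattern from the answer, leading pipe escaped; split up only for the comments.
TABLE_RE = re.compile(
    r"\|(?:([^\r\n|]*)\|)+\r?\n"       # header row
    r"\|(?:(:?-+:?)\|)+\r?\n"          # alignment row
    r"(\|(?:([^\r\n|]*)\|)+\r?\n)+"    # one or more body rows
)

# Hypothetical sample input; every row, including the last, ends with a newline.
sample = (
    "| Tables | Are | Cool |\n"
    "|:------:|:---:|-----:|\n"
    "| col 3 is | r-l | $1600 |\n"
    "| col 2 is | centered | $12 |\n"
)

match = TABLE_RE.search(sample)
print(match.group(0) == sample)  # the whole table is matched
print(repr(match.group(1)))      # only the last header cell survives
print(repr(match.group(2)))      # only the last alignment cell survives
```

In practice that means the regex is useful for locating a table, while splitting rows into cells is easier to do with ordinary string handling on the matched text.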
I had the same problem and, never having found a suitable answer, I eventually came up with the following.
^(\|[^\n]+\|\r?\n)((?:\|:?[-]+:?)+\|)(\n(?:\|[^\n]+\|\r?\n?)*)?$
Flags are "global" and "multiline".
Although it's not really based on Sean's answer, it did end up being rather similar, with a few notable differences: it is a little shorter, it completes in fewer steps (59 vs. 126, according to regex101.com), and it has arguably more "sensible" capturing groups. It also allows "incomplete" tables, i.e. tables with no body. (The reason I'm adding it as a separate answer is that I really do find it more useful, plus my ego would not allow me to do otherwise. ;))
In a nutshell: each line must begin and end with a | character, and the "cell alignments" line must be properly formatted.
Tested in Java (Android), on Regex101, and in a Debuggex demo.
Hope it helps someone. :)
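As an illustration of how the three groups could be used, here is a minimal Python sketch. It is not from the original answer: the helper names split_row, alignment and parse_first_table and the sample text are hypothetical, and it assumes plain "\n" line endings after the alignment row. The pattern itself is the one above, compiled with the multiline flag.

```python
import re

# Pattern from the answer, compiled with the "multiline" flag.
TABLE_RE = re.compile(
    r"^(\|[^\n]+\|\r?\n)((?:\|:?[-]+:?)+\|)(\n(?:\|[^\n]+\|\r?\n?)*)?$",
    re.MULTILINE,
)


def split_row(line):
    """Turn '| a | b |' into ['a', 'b'] (hypothetical helper)."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def alignment(cell):
    """Map ':--', ':-:' or '--:' to a name; plain dashes give None."""
    left, right = cell.startswith(":"), cell.endswith(":")
    if left and right:
        return "center"
    if right:
        return "right"
    return "left" if left else None


def parse_first_table(text):
    """Return header, alignments and rows of the first table, or None."""
    match = TABLE_RE.search(text)
    if match is None:
        return None
    # Group 3 is optional, so fall back to "" when there is no body.
    header, align, body = match.groups(default="")
    return {
        "header": split_row(header),
        "align": [alignment(cell) for cell in split_row(align)],
        "rows": [split_row(line) for line in body.strip().splitlines()],
    }


sample = (
    "Some text before the table.\n"
    "\n"
    "| Tables | Are | Cool |\n"
    "|:-------|:---:|-----:|\n"
    "| col 3 is | r-l | $1600 |\n"
    "| col 2 is | centered | $12 |\n"
)
table = parse_first_table(sample)
print(table["header"])
print(table["align"])
print(table["rows"])
```

Because the body group is optional, a header plus alignment line with nothing after it still parses and simply yields an empty list of rows.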
Something that I did:
[|]?(\s+[A-Za-z0-9 -_*#@$%:;?!.,\/\\]+\s+)[|]?[|]?(\s+[A-Za-z0-9 -_*#@$%:;?!.,\/\\]+\s+)[|]?[|]?(\s+[A-Za-z0-9 -_*#@$%:;?!.,\/\\]+\s+)[|]?\r?\n?\|?:-+:\|?:-+:\|?:-+:\|?
Modifier: global
\|?:-+:\|?:-+:\|?:-+:\|?\r?\n?
Modifier: global
[|]?(\s+[A-Za-z0-9 -_*#@$%:;?!.,\/\\]+\s+)[|]?[|]?(\s+[A-Za-z0-9 -_*#@$%:;?!.,\/\\]+\s+)[|]?[|]?(\s+[A-Za-z0-9 -_*#@$%:;?!.,\/\\]+\s+)[|]?\r?\n?
Modifiers: global, multiline
This is the table used for parsing:
| Tables | Are | Cool |
|:-------------:|:-------------:|:-----:|
| col 3 is | r-l | $1600 |
| col 2 is | centered | $12 |
| zebra stripes | are neat | $1 |
I ended up skipping regex altogether and just hacked it together using conventional logic. It might not be as pretty or short as a regex-based solution, but at least I can maintain it easily.
I did find some regexes that might have fit this purpose, by the way. See MultiMarkdown.
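For anyone curious what the "conventional logic" route might look like, here is a rough Python sketch. It is not the code this answer refers to: the function names find_tables, split_row and is_alignment_cell are hypothetical, and the rules (a header line, an alignment line with the same number of cells, then at least one content line) are taken from the question.

```python
def split_row(line):
    """Split '| a | b |' into ['a', 'b']; the outer pipes are optional."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def is_alignment_cell(cell):
    """True for cells such as '---', ':---', '---:' or ':---:'."""
    core = cell[1:] if cell.startswith(":") else cell
    core = core[:-1] if core.endswith(":") else core
    return len(core) > 0 and set(core) == {"-"}


def find_tables(text):
    """Scan line by line and collect every table with at least one body row."""
    lines = text.splitlines()
    tables = []
    i = 0
    while i + 2 < len(lines):  # need header, alignment and one content line
        header, align = split_row(lines[i]), split_row(lines[i + 1])
        looks_like_table = (
            "|" in lines[i]
            and len(header) == len(align)
            and all(is_alignment_cell(cell) for cell in align)
        )
        if not looks_like_table:
            i += 1
            continue
        rows = []
        j = i + 2
        while j < len(lines) and "|" in lines[j]:
            rows.append(split_row(lines[j]))
            j += 1
        if rows:
            tables.append({"header": header, "align": align, "rows": rows})
            i = j  # continue after the table
        else:
            i += 1
    return tables


sample = (
    "Intro\n\n"
    "| Tables | Are | Cool |\n"
    "|:---|:---:|---:|\n"
    "| col 3 is | r-l | $1600 |\n"
    "| zebra stripes | are neat | $1 |\n"
    "\nOutro"
)
for found in find_tables(sample):
    print(found["header"], found["align"], len(found["rows"]))
```

It is longer than any of the regexes, but each rule lives in its own small function, which is where the maintainability comes from.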