I have 3 columns delimited by whitespace, but the second field is optionally enclosed in double quotes.
I want to extract the 1st field, the 2nd field (the value inside the double quotes) and the 3rd field. Sometimes the 2nd field is not enclosed in double quotes; in that case, just return the value as it is.
Sample Input
1a "2a 2.1a 2.2a" 3a
4b "5.5b 5.6b 5.7b" 6b
7c 8c 9c
Expected output
Matching information:
1st row match
\1 1a
\2 2a 2.1a 2.2a
\3 3a
2nd row match
\1 4b
\2 5.5b 5.6b 5.7b
\3 6b
3rd row match
\1 7c
\2 8c
\3 9c
I tried the regex below. It works for the first two inputs, but the third line is not matched. Can someone help me solve this issue?
Regex I tried:
([a-z0-9]+)\s+"([a-z0-9\s.]+)"\s+([a-z0-9]+)
Link:
https://regex101.com/r/rN4uB4/1
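To reproduce the problem outside regex101, here is a minimal sketch in Python. The choice of Python's built-in re module is only an assumption for illustration, since the question does not name a language. It applies the pattern above to the three sample lines:

```python
import re

# The pattern from the question: the double quotes around field 2 are mandatory.
PATTERN = re.compile(r'([a-z0-9]+)\s+"([a-z0-9\s.]+)"\s+([a-z0-9]+)')

lines = [
    '1a "2a 2.1a 2.2a" 3a',
    '4b "5.5b 5.6b 5.7b" 6b',
    '7c 8c 9c',
]

for line in lines:
    m = PATTERN.search(line)
    # The third line contains no quote characters, so search() returns None for it.
    print(line, '->', m.groups() if m else 'no match')
```

The first two lines produce the expected three groups. The third line fails because the two `"` tokens in the pattern are required, and there is no `"` anywhere in `7c 8c 9c`.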
You could simply make the quotation marks optional in your pattern. Following a token with ? tells the regex engine to match that token zero or one time.
([a-z0-9]+)\s+"?([a-z0-9\s.]+)"?\s+([a-z0-9]+)
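As a rough check, here is a Python sketch (again assuming the built-in re module; the helper name `parse` is hypothetical) that applies the optional-quote pattern to a whole line with `fullmatch`:

```python
import re

# Optional-quote version: each "? may match a quote or nothing.
PATTERN = re.compile(r'([a-z0-9]+)\s+"?([a-z0-9\s.]+)"?\s+([a-z0-9]+)')

def parse(line):
    """Hypothetical helper: return (field1, field2, field3), or None if no match."""
    m = PATTERN.fullmatch(line)
    return m.groups() if m else None

print(parse('1a "2a 2.1a 2.2a" 3a'))
print(parse('7c 8c 9c'))
```

All three sample lines now match. One side effect worth knowing: because each quote is optional on its own, an unbalanced line such as `1a "2a 3a` is also accepted (as `('1a', '2a', '3a')`).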
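If your regex flavor supports it (PCRE and Perl do; JavaScript and Python's built-in re module do not), you could use the branch reset feature. Inside (?|...), the alternatives share group numbers, so the quoted and the unquoted forms of the second field are both captured as group 2.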
([a-z0-9]+)\s+(?|"([^"]+)"|([a-z0-9]+))\s+([a-z0-9]+)
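Python's built-in re module rejects (?|...), so the following sketch emulates branch reset instead of using it: each alternative gets its own group and the code keeps whichever one participated. The helper name `parse` is hypothetical. (The third-party `regex` package documents branch reset support, so the pattern above may work there unchanged.)

```python
import re

# Emulated branch reset: group 2 = quoted value, group 3 = unquoted value.
PATTERN = re.compile(r'([a-z0-9]+)\s+(?:"([^"]+)"|([a-z0-9]+))\s+([a-z0-9]+)')

def parse(line):
    """Hypothetical helper: return (field1, field2, field3), or None if no match."""
    m = PATTERN.fullmatch(line)
    if m is None:
        return None
    # Exactly one of groups 2 and 3 takes part in any successful match.
    middle = m.group(2) if m.group(2) is not None else m.group(3)
    return m.group(1), middle, m.group(4)

print(parse('4b "5.5b 5.6b 5.7b" 6b'))
print(parse('7c 8c 9c'))
```

Unlike the optional-quote version, this requires the quotes to come in pairs, so `1a "2a 3a` returns None.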