In Octave, I am trying to find words that are followed either by whitespace only, or by a comma or a period and then whitespace.
The following is my code:
str = 'Hello, I am kjd#(@*#@m, aa.aa.aa.aa. It was nice meeting you.';
regexp(str, "\[a-zA-Z]+\[,.]?\s+", 'match')
This should return the words
Hello, I, am, It, was, nice, meeting, you.
However, it only returns was. I'm having a hard time figuring this out.
I've also tried this answer: https://stackoverflow.com/a/29174222/6213337, but it returns ans = {}(1x0).
Any ideas? Thanks.
Octave uses the PCRE regex flavor, so the pattern you need can be short, compact and still quite comprehensive:
str = 'Hello, I am kjd#(@*#@m, aa.aa.aa.aa. It was nice meeting you.';
matches = regexp(str, "(?<!\\S)\\p{L}++(?!\\p{P}\\S)", 'match')
See the regex and IDEONE demos.
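A note on quoting may help here. The sketch below assumes that Octave drops the backslash from an unrecognized escape such as "\s" or "\[" inside a double-quoted string, which would explain why the pattern in the question matched only was. The variable names (pat_single, pat_double, collapsed, words) are illustrative, and \p{L} / \p{P} assume a PCRE build with Unicode property support.

```octave
str = 'Hello, I am kjd#(@*#@m, aa.aa.aa.aa. It was nice meeting you.';

% Single quotes pass backslashes through untouched;
% double quotes need each backslash doubled.
pat_single = '(?<!\S)\p{L}++(?!\p{P}\S)';
pat_double = "(?<!\\S)\\p{L}++(?!\\p{P}\\S)";
same_pattern = strcmp(pat_single, pat_double);   % both spell the same regex

% Assumed effective form of the question's pattern once "\[" and "\s"
% lose their backslashes: the trailing "s+" now means "one or more letters s".
collapsed = regexp(str, '[a-zA-Z]+[,.]?s+', 'match');

% The pattern from the answer, written with single quotes.
words = regexp(str, pat_single, 'match');
```

Writing patterns in single quotes avoids the doubled backslashes and makes it harder to lose an escape by accident.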
The regex matches:
(?<!\S) - a negative lookbehind: checks that there is no non-whitespace character immediately before the current location in the string; if there is none, matching goes on.
\p{L}++ - one or more letters, matched possessively (no backtracking into them, so the next check is performed only once, after the last letter matched), that are NOT followed by...
(?!\p{P}\S) - any punctuation character and then a non-whitespace character ((?!...) is a negative lookahead that fails the match if its subpattern matches immediately to the right of the current location in the string).
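To see why the possessive quantifier matters, here is a minimal sketch comparing \p{L}++ with a plain \p{L}+ on the same input. The variable names (possessive, greedy, extra) are made up for illustration, and the snippet again assumes a PCRE build with Unicode property support.

```octave
str = 'Hello, I am kjd#(@*#@m, aa.aa.aa.aa. It was nice meeting you.';

% Possessive ++ : once the letters are consumed, none are given back,
% so a word followed by "punctuation + non-whitespace" is rejected whole.
possessive = regexp(str, '(?<!\S)\p{L}++(?!\p{P}\S)', 'match');

% Plain + : the engine may give back letters one at a time until the
% lookahead passes, which lets fragments of rejected words slip through.
greedy = regexp(str, '(?<!\S)\p{L}+(?!\p{P}\S)', 'match');

% Matches that only appear when backtracking is allowed.
extra = setdiff(greedy, possessive);
```

With the plain + the lookahead can be satisfied in the middle of a word, so the result should additionally contain the fragments kj (from kjd#(@*#@m,) and a (from aa.aa.aa.aa.); the possessive version rejects both tokens entirely.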