
Octave - Finding words in a string using regex

In Octave, I am trying to find words that are followed either by whitespace alone, or by a comma or a period and then whitespace.

The following is my code:

str = 'Hello, I am kjd#(@*#@m, aa.aa.aa.aa. It was nice meeting you.';
regexp(str, "\[a-zA-Z]+\[,.]?\s+", 'match')

This should return the words Hello, I, am, It, was, nice, meeting, you. However, it only returns was. I'm having a hard time figuring this out.

I've also tried this answer: https://stackoverflow.com/a/29174222/6213337, but it returns ans = {}(1x0).

Any ideas? Thanks.

CH123 asked Jan 22 '26 06:01


1 Answer

Octave uses the PCRE regex flavor, so the pattern you need can be short and still cover all of your cases:

str = 'Hello, I am kjd#(@*#@m, aa.aa.aa.aa. It was nice meeting you.';
words = regexp(str, "(?<!\\S)\\p{L}++(?!\\p{P}\\S)", 'match');
disp(words)  % show the cell array of matched words

See the regex and IDEONE demos.
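A note on the quoting in the snippet above: the pattern sits in a double-quoted string, which is why every backslash is doubled. The sketch below is only an illustration under one assumption, namely that Octave converts an unrecognized escape such as "\s" in a double-quoted string into a plain s (with a warning). If that holds, it would explain why the pattern in the question only found "was". The variable names are mine, not from the question.

```octave
str = 'Hello, I am kjd#(@*#@m, aa.aa.aa.aa. It was nice meeting you.';

% In double quotes "\\S" is the two characters \S, exactly like the
% single-quoted '\S', so these two spellings are the same pattern.
same = strcmp("(?<!\\S)\\p{L}++(?!\\p{P}\\S)", '(?<!\S)\p{L}++(?!\p{P}\S)');

% Assumption: "\[" and "\s" lose their backslash in a double-quoted string,
% so the question's pattern effectively turns into this one.
mangled = '[a-zA-Z]+[,.]?s+';
got = regexp(str, mangled, 'match');

disp(same);
disp(got);
```

Writing the pattern in single quotes avoids the doubling altogether, which tends to be easier to read for regexes with many backslashes.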

The regex matches:

  • (?<!\S) - a negative lookbehind: the current location must not be preceded by a non-whitespace character, i.e. the match has to begin at the start of the string or right after whitespace.
  • \p{L}++ - one or more letters, matched possessively (no backtracking), so the next check is performed only once, after the last letter matched.
  • (?!\p{P}\S) - a negative lookahead that fails the match if the letters are immediately followed by a punctuation character and then a non-whitespace character.
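
To see what each of the three parts contributes, here is a small sketch that runs the pattern on short test strings. The test strings, the variable names, and the greedy variant are made up for illustration. It assumes an Octave build whose PCRE supports Unicode properties such as \p{L}.

```octave
pat = '(?<!\S)\p{L}++(?!\p{P}\S)';

% (?<!\S): the "m" in "#@m" follows a non-whitespace character, so it is skipped.
a = regexp('x #@m y', pat, 'match');

% \p{L}++: after consuming "aa" the engine cannot give a letter back,
% so "aa.aa" yields nothing at all.
b = regexp('aa.aa ok', pat, 'match');

% For comparison, a plain greedy + backtracks to "a" and sneaks past the lookahead.
b_greedy = regexp('aa.aa ok', '(?<!\S)\p{L}+(?!\p{P}\S)', 'match');

% (?!\p{P}\S): punctuation is fine as long as whitespace or the end of the string follows.
c = regexp('you. Hello, bye', pat, 'match');

disp(a); disp(b); disp(b_greedy); disp(c);
```

The comparison between b and b_greedy is the reason for the possessive quantifier: without it, a stray "a" from "aa.aa" ends up in the result.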
Wiktor Stribiżew answered Jan 23 '26 20:01