How would you split at every and/ERT, but only when the next word does not contain "/V", in:
text <- c("faulty and/ERT something/VBN and/ERT else/VHGB and/ERT as/VVFIN and/ERT not else/VHGB propositions one and/ERT two/CDF and/ERT three/ABC")
# my try (doesn't work)
> strsplit(text, "(?<=and/ERT)\\s(?!./V.)", perl=TRUE)
                                    ^^^^
# Expected return
[[1]]
[1] "faulty and/ERT something/VBN and/ERT else/VHGB and/ERT as/VVFIN and/ERT"
[2] "not else/VHGB propositions one and/ERT"
[3] "two/CDF and/ERT"
[4] "three/ABC"
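A quick way to see why the attempt fails is to run it on the same text and count the pieces. This is a minimal sketch using base R's strsplit with perl = TRUE, as in the question. Because "./V." requires exactly one character between the space and "/V", it never matches a real word such as "something/VBN". The negative lookahead therefore always succeeds, and every "and/ERT " becomes a split point.

```r
text <- c("faulty and/ERT something/VBN and/ERT else/VHGB and/ERT as/VVFIN and/ERT not else/VHGB propositions one and/ERT two/CDF and/ERT three/ABC")

# The original pattern: "./V." allows exactly ONE character before "/V",
# so the lookahead body never matches and the split happens everywhere.
attempt <- strsplit(text, "(?<=and/ERT)\\s(?!./V.)", perl = TRUE)
print(attempt)

# 7 pieces (one split per "and/ERT ") instead of the expected 4
print(length(attempt[[1]]))
```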
Actually, you need to approach this in another way:
(?<=and/ERT)\\s(?!\\S+/V)
                  ^^^^
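You will need to use \\S+ instead of .*, because .* also matches spaces: it would find a /V two words ahead and wrongly prevent the split.
\\S+ matches one or more non-whitespace characters, so the lookahead only inspects the next word.
Lastly, the trailing . after /V is not needed and can be dropped.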
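The following sketch checks the \\S+ pattern against the expected output from the question and, for contrast, shows what the .* variant does. It uses only base R's strsplit with perl = TRUE. With .*, the lookahead after "and/ERT not" runs across spaces, sees "else/VHGB", and suppresses that split, so only 3 pieces come back.

```r
text <- c("faulty and/ERT something/VBN and/ERT else/VHGB and/ERT as/VVFIN and/ERT not else/VHGB propositions one and/ERT two/CDF and/ERT three/ABC")

expected <- c(
  "faulty and/ERT something/VBN and/ERT else/VHGB and/ERT as/VVFIN and/ERT",
  "not else/VHGB propositions one and/ERT",
  "two/CDF and/ERT",
  "three/ABC"
)

# \\S+ cannot cross a space, so only the single word after the split
# point is checked for "/V" (backtracking lets "something" + "/V" match).
fixed <- strsplit(text, "(?<=and/ERT)\\s(?!\\S+/V)", perl = TRUE)[[1]]
print(identical(fixed, expected))

# .* runs across spaces and finds "/V" further ahead, losing the
# split before "not".
greedy <- strsplit(text, "(?<=and/ERT)\\s(?!.*/V)", perl = TRUE)[[1]]
print(length(greedy))
```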
You made one small mistake, but it was enough to break the whole pattern:
(?<=and/ERT)\\s(?![^\\s/]+/V)
                  ^^^^^^^^
[^\\s/]+ matches one or more characters that are neither whitespace nor a forward slash (/).
By the way, the dot (.) after /V is not needed.
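A minimal check of this character-class version in base R, again with perl = TRUE as in the question. Because [^\\s/]+ stops at both spaces and slashes, the lookahead only looks at the leading part of the next word; on this input it gives the same four pieces as expected.

```r
text <- c("faulty and/ERT something/VBN and/ERT else/VHGB and/ERT as/VVFIN and/ERT not else/VHGB propositions one and/ERT two/CDF and/ERT three/ABC")

expected <- c(
  "faulty and/ERT something/VBN and/ERT else/VHGB and/ERT as/VVFIN and/ERT",
  "not else/VHGB propositions one and/ERT",
  "two/CDF and/ERT",
  "three/ABC"
)

# [^\\s/]+ : one or more characters that are not whitespace and not "/",
# immediately followed by "/V" means "the next word is a /V word".
fixed2 <- strsplit(text, "(?<=and/ERT)\\s(?![^\\s/]+/V)", perl = TRUE)[[1]]
print(fixed2)
print(identical(fixed2, expected))
```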
Edit: I have made some edits according to @smerny's comment and your edit.
Try this:
(?<=and/ERT)\\s(?![a-zA-Z]+/V)
The problem was that your pattern allowed exactly one character (.) before the /V and one after it, but your example has more than one character between the space and the /V.
[a-zA-Z]+/V makes sure that the only thing between the space and the /V is a single word made of letters. I believe this is your requirement based on your description and examples.
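The sketch below confirms the letters-only pattern on the question's text and then illustrates its one assumption. The second string uses a hypothetical token "x2/VBN" (not from the question) to show that a word containing a digit is not recognised by [a-zA-Z]+, so the split still happens there, whereas the \\S+ pattern would keep it together. If your real tokens are always letters only, the two behave the same.

```r
text <- c("faulty and/ERT something/VBN and/ERT else/VHGB and/ERT as/VVFIN and/ERT not else/VHGB propositions one and/ERT two/CDF and/ERT three/ABC")

expected <- c(
  "faulty and/ERT something/VBN and/ERT else/VHGB and/ERT as/VVFIN and/ERT",
  "not else/VHGB propositions one and/ERT",
  "two/CDF and/ERT",
  "three/ABC"
)

letters_only <- strsplit(text, "(?<=and/ERT)\\s(?![a-zA-Z]+/V)", perl = TRUE)[[1]]
print(identical(letters_only, expected))

# Hypothetical input: "x2/VBN" contains a digit before "/V".
text2 <- "a and/ERT x2/VBN and/ERT b/CD"

# [a-zA-Z]+ cannot match "x2", so the lookahead fails and the split happens.
by_letters <- strsplit(text2, "(?<=and/ERT)\\s(?![a-zA-Z]+/V)", perl = TRUE)[[1]]
# \\S+ can match "x2", so this split is suppressed.
by_nonspace <- strsplit(text2, "(?<=and/ERT)\\s(?!\\S+/V)", perl = TRUE)[[1]]
print(by_letters)
print(by_nonspace)
```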