Find repeated pattern in a string of characters using R

Question

I have a large text that contains expressions such as "aaaahahahahaha that was a good joke". After processing, I want the "aaaahahahahaha" to disappear or, at least, to be reduced to simply "ha".

At the moment, I am using this:

gsub('(.+?)\\1', '', str)

This works when the string with the pattern is at the beginning of the sentence, but not when it is located anywhere else. So:

str <- "aaaahahahahaha that was a good joke"
gsub('(.+?)\\1', '', str)
#[1] "ha that was a good joke"

But

str <- "that was aaaahahahahaha a good joke"
gsub('(.+?)\\1', '', str)
#[1] "that was aaaahahahahaha a good joke"

This question might be related to "find repeated pattern in python", but I can't find the equivalent in R.

I assume this is very simple and that I am missing something trivial, but regular expressions are not my strength and I have already tried a number of things that have not worked, so I was wondering if someone could help me. The question is: how can I find, and substitute, repeated patterns in a string of characters in R?

Thanks in advance for your time.

vks · Accepted Answer

\b(\S+?)\1\S*\b

Use this. See the demo:

https://regex101.com/r/sJ9gM7/46

In R, use \b(\S+?)\1\S*\b with the perl=TRUE option, doubling each backslash inside the string literal: '\\b(\\S+?)\\1\\S*\\b'.
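As a minimal sketch of how the pattern above might be applied to the two strings from the question: the helper name strip_repeats and the whitespace clean-up step are assumptions added for illustration, not part of the original answer.

```r
# Hypothetical helper; only the regex itself comes from the answer.
strip_repeats <- function(x) {
  # Backslashes are doubled because the pattern is an R string literal.
  # perl = TRUE selects the PCRE engine, as the answer recommends.
  gsub("\\b(\\S+?)\\1\\S*\\b", "", x, perl = TRUE)
}

examples <- c(
  "aaaahahahahaha that was a good joke",
  "that was aaaahahahahaha a good joke"
)

# Remove every word that starts with an immediately repeated unit.
cleaned <- strip_repeats(examples)

# Assumed extra step: collapse the leftover whitespace and trim the ends.
tidy <- trimws(gsub("\\s+", " ", cleaned))
print(tidy)
# [1] "that was a good joke" "that was a good joke"
```

Note that the pattern removes any whole word that begins with a unit repeated back to back, so an ordinary word such as "aardvark" or "mama" would also be dropped. Whether that is acceptable depends on the text being processed.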

Tags:

string

regex

r

Javier

1 Answer

vks
