Removing repeated characters in strings

Question

This question may be related to an earlier one.

Unfortunately, the solution given there doesn't work with my data.

I have the following vector example:

example<-c("ChildrenChildren", "Clothing and shoesClothing and shoes","Education, health and beautyEducation, health and beauty", "Leisure activities, travelingLeisure activities, traveling","LoansLoans","Loans and financial servicesLoans and financial services" ,"Personal transfersPersonal transfers" ,"Savings and investmentsSavings and investments","TransportationTransportation","Utility servicesUtility services")

I want the same strings with the repetition removed, that is:

  > result
 [1]   "Children" "Clothing and shoes" "Education, health and beauty"

Is that possible?

Cath · Accepted Answer

You can use sub for that, directly capturing the bit you want in the pattern part:

sub("(.+)\\1", "\\1", example)
 #[1] "Children"                      "Clothing and shoes"            "Education, health and beauty"  "Leisure activities, traveling" "Loans"                        
 #[6] "Loans and financial services"  "Personal transfers"            "Savings and investments"       "Transportation"                "Utility services"

(.+) captures one or more characters, and \1 (written "\\1" inside an R string, because the backslash itself must be escaped) refers back to whatever was just captured. So the pattern looks for "anything, twice in a row", and the replacement puts back that same "anything" just once.
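
One caveat worth illustrating: the pattern matches the first doubled run anywhere in a string, so on data that is not an exact double it can collapse an inner repetition. The following is a minimal sketch, assuming you only want to change elements that consist entirely of one piece repeated twice. The `^` and `$` anchors are an addition, not part of the answer above, and `x` is hypothetical test data.

```r
# Hypothetical input: one exact double and one ordinary word
# that happens to contain a repeated run ("an" + "an").
x <- c("LoansLoans", "banana")

# Unanchored: the first doubled run anywhere in the string is collapsed.
unanchored <- sub("(.+)\\1", "\\1", x)
print(unanchored)  # "Loans" "bana"

# Anchored: only strings that are entirely "something twice" are changed.
anchored <- sub("^(.+)\\1$", "\\1", x)
print(anchored)    # "Loans" "banana"
```

For the vector in the question every element is an exact double, so both forms should give the same result there.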

Spacedman · Answer

If all the strings are repeated, then they are twice as long as they need to be, so take the first half of each string:

> substr(example, 1, nchar(example)/2)
 [1] "Children"                      "Clothing and shoes"           
 [3] "Education, health and beauty"  "Leisure activities, traveling"
 [5] "Loans"                         "Loans and financial services" 
 [7] "Personal transfers"            "Savings and investments"      
 [9] "Transportation"                "Utility services"
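
Taking the first half relies on every element really being doubled; a string that is not doubled would be silently truncated. As a hedged sketch, the helper below (`dedupe_half` is a hypothetical name, not from the answer) halves a string only when its two halves are identical and leaves everything else untouched.

```r
# Hypothetical helper: halve only the strings that are exact doubles.
dedupe_half <- function(x) {
  n <- nchar(x)
  half <- n %/% 2
  first  <- substr(x, 1, half)
  second <- substr(x, half + 1, n)
  # An exact double has even, non-zero length and two identical halves.
  is_double <- n %% 2 == 0 & n > 0 & first == second
  ifelse(is_double, first, x)
}

res <- dedupe_half(c("LoansLoans", "Loans", "Utility servicesUtility services"))
print(res)  # "Loans" "Loans" "Utility services"
```

The check costs one extra `substr` call per element, which seems a reasonable price when the input may contain strings that were never duplicated.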

Tags: regex, r

Henry Navarro
