
Removing repeated characters in strings

Tags: regex, r

This question may be related to an earlier question.

Unfortunately, the solution given there doesn't work with my data.

I have the following vector example:

example<-c("ChildrenChildren", "Clothing and shoesClothing and shoes","Education, health and beautyEducation, health and beauty", "Leisure activities, travelingLeisure activities, traveling","LoansLoans","Loans and financial servicesLoans and financial services" ,"Personal transfersPersonal transfers" ,"Savings and investmentsSavings and investments","TransportationTransportation","Utility servicesUtility services")

I want the same strings without the repetition, that is:

  > result
 [1]   "Children" "Clothing and shoes" "Education, health and beauty"

Is that possible?

Henry Navarro asked Dec 02 '22 09:12



2 Answers

You can use sub for that, directly capturing the bit you want in the pattern part:

sub("(.+)\\1", "\\1", example)
 #[1] "Children"                      "Clothing and shoes"            "Education, health and beauty"  "Leisure activities, traveling" "Loans"                        
 #[6] "Loans and financial services"  "Personal transfers"            "Savings and investments"       "Transportation"                "Utility services"

(.+) captures one or more characters, and \\1 is a backreference to whatever was just captured. The pattern therefore matches "anything twice", and the replacement keeps that same "anything" only once.
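As a possible refinement, assuming each string should be rewritten only when it consists of exactly one block repeated twice, the pattern can be anchored with ^ and $. The sketch below is not part of the original answer: the vector x is made up for illustration, and perl = TRUE is set only to make the backreference handling explicit.

```r
# Illustrative input: one fully doubled string and one that merely
# contains a repeated fragment ("iss" + "iss").
x <- c("LoansLoans", "Mississippi")

# Unanchored: the first repeated fragment anywhere in the string is collapsed.
unanchored <- sub("(.+)\\1", "\\1", x, perl = TRUE)

# Anchored: only strings that are exactly "block + same block" are rewritten.
anchored <- sub("^(.+)\\1$", "\\1", x, perl = TRUE)

print(unanchored)  # "Loans" "Missippi"
print(anchored)    # "Loans" "Mississippi"
```

For the example vector in the question both patterns give the same result, since every element is a full duplication; the anchors only matter for data that mixes doubled and non-doubled strings.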

Cath answered Dec 04 '22 01:12



If all the strings are repeated, then they are twice as long as they need to be, so take the first half of each string:

> substr(example, 1, nchar(example)/2)
 [1] "Children"                      "Clothing and shoes"           
 [3] "Education, health and beauty"  "Leisure activities, traveling"
 [5] "Loans"                         "Loans and financial services" 
 [7] "Personal transfers"            "Savings and investments"      
 [9] "Transportation"                "Utility services"             
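
The approach above relies on every string being an exact duplication. If that is not guaranteed, a guarded variant can compare the two halves first and leave non-doubled strings alone. This is a minimal sketch; dedupe_half is a hypothetical helper name, not a base R function, and NA inputs are not handled.

```r
# Hypothetical helper: return the first half of a string only when the
# string has even length and both halves are identical; otherwise keep it.
dedupe_half <- function(x) {
  n <- nchar(x)
  half <- n %/% 2
  first <- substr(x, 1, half)
  second <- substr(x, half + 1, n)
  is_doubled <- n > 0 & n %% 2 == 0 & first == second
  ifelse(is_doubled, first, x)
}

res <- dedupe_half(c("LoansLoans", "Loans", "Utility servicesUtility services"))
print(res)  # "Loans" "Loans" "Utility services"
```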
Spacedman answered Dec 04 '22 01:12
