This question may be related to this question.
Unfortunately the solution given there doesn't work with my data.
I have the following vector example:
example <- c("ChildrenChildren", "Clothing and shoesClothing and shoes", "Education, health and beautyEducation, health and beauty", "Leisure activities, travelingLeisure activities, traveling", "LoansLoans", "Loans and financial servicesLoans and financial services", "Personal transfersPersonal transfers", "Savings and investmentsSavings and investments", "TransportationTransportation", "Utility servicesUtility services")
I want the same strings without the repetition, that is:
> result
[1] "Children" "Clothing and shoes" "Education, health and beauty"
Is that possible?
You can use sub for that, directly capturing the bit you want in the pattern part:
sub("(.+)\\1", "\\1", example)
#[1] "Children" "Clothing and shoes" "Education, health and beauty" "Leisure activities, traveling" "Loans"
#[6] "Loans and financial services" "Personal transfers" "Savings and investments" "Transportation" "Utility services"
(.+) captures a group of one or more characters, and \\1 is a backreference to whatever that group captured. The pattern therefore matches "anything twice in a row", and the replacement puts back that same "anything" just once.
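If some strings in the vector might not be fully duplicated, the unanchored pattern can also collapse a short repeat inside a word (for instance the "oo" in "Bookkeeping"). A minimal sketch of a stricter variant, assuming you only want to change strings that consist of exactly two identical halves, is to anchor the pattern with ^ and $. The vector mixed below is a made-up example, not data from the question:

```r
# Hypothetical input: three fully doubled strings and one ordinary word
mixed <- c("ChildrenChildren", "LoansLoans",
           "Utility servicesUtility services", "Bookkeeping")

# ^ and $ force the match to span the whole string, so only strings made of
# one chunk repeated exactly twice are rewritten; anything else is left as-is
dedup <- sub("^(.+)\\1$", "\\1", mixed)
print(dedup)
```

With the unanchored pattern, "Bookkeeping" would lose one "o"; with the anchors it is left untouched.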
If all the strings are repeated, then they are twice as long as they need to be, so take the first half of each string:
> substr(example, 1, nchar(example)/2)
[1] "Children" "Clothing and shoes"
[3] "Education, health and beauty" "Leisure activities, traveling"
[5] "Loans" "Loans and financial services"
[7] "Personal transfers" "Savings and investments"
[9] "Transportation" "Utility services"
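The substr approach relies on every string really being doubled. A small sketch of a guarded version, assuming you would rather leave non-doubled strings unchanged, compares the two halves before keeping the first one. The vector x and the helper variable names are illustrative only:

```r
# Hypothetical input: two doubled strings and one that is not doubled
x <- c("ChildrenChildren", "LoansLoans", "Transportation")

n    <- nchar(x)
half <- n %/% 2                      # integer length of the first half

first  <- substr(x, 1, half)         # first half of each string
second <- substr(x, half + 1, n)     # remainder of each string

# A string counts as doubled only if its length is even and both halves match
is_doubled <- n %% 2 == 0 & first == second

result <- ifelse(is_doubled, first, x)
print(result)
```

This keeps "Transportation" intact, whereas the plain substr call would cut it down to "Transpo".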