I have the following data:
temp <- c("AIR BAGS:FRONTAL", "SERVICE BRAKES HYDRAULIC:ANTILOCK",
          "PARKING BRAKE:CONVENTIONAL",
          "SEATS:FRONT ASSEMBLY:POWER ADJUST",
          "POWER TRAIN:AUTOMATIC TRANSMISSION",
          "SUSPENSION",
          "ENGINE AND ENGINE COOLING:ENGINE",
          "SERVICE BRAKES HYDRAULIC:ANTILOCK",
          "SUSPENSION:FRONT",
          "ENGINE AND ENGINE COOLING:ENGINE",
          "VISIBILITY:WINDSHIELD WIPER/WASHER:LINKAGES")
I would like to create a new vector that keeps only the text before the first ":" when a ":" is present, and the whole string when it is not.
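I have tried:
library(stringr)
temp = data.frame(matrix(unlist(str_split(temp, pattern = ":", n = 2)),
                         ncol = 2, byrow = TRUE))
but it does not work when a string has no ":". In that case str_split() returns a single piece instead of two, so the unlisted vector no longer pairs up into two columns and every row from that element onward is misaligned.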
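If you want to keep the split-based approach, the fix is to take only the first piece of each split rather than forcing everything into two columns. A minimal sketch in base R, using strsplit() in place of stringr's str_split(); the names pieces and first_part are just illustrative:

```r
temp <- c("AIR BAGS:FRONTAL", "SERVICE BRAKES HYDRAULIC:ANTILOCK",
          "PARKING BRAKE:CONVENTIONAL",
          "SEATS:FRONT ASSEMBLY:POWER ADJUST",
          "POWER TRAIN:AUTOMATIC TRANSMISSION",
          "SUSPENSION",
          "ENGINE AND ENGINE COOLING:ENGINE",
          "SERVICE BRAKES HYDRAULIC:ANTILOCK",
          "SUSPENSION:FRONT",
          "ENGINE AND ENGINE COOLING:ENGINE",
          "VISIBILITY:WINDSHIELD WIPER/WASHER:LINKAGES")

# strsplit() returns a list with one character vector per input string.
# A string without ":" gives a length-1 vector, which is what breaks a
# fixed two-column matrix.
pieces <- strsplit(temp, ":", fixed = TRUE)

# Keep only the first piece of each split, so every input string yields
# exactly one output string whether or not it contains a colon.
first_part <- vapply(pieces, function(p) p[1], character(1))
first_part
```

If you prefer to stay in stringr, str_split_fixed(temp, ":", n = 2) returns a character matrix padded with "" where a string has fewer pieces, so its first column gives the same result.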
I know this question is very similar to "truncate string from a certain character in R", which used:
sub("^[^.]*", "", x)
But I am not very familiar with regular expressions and have struggled to reverse that example to retain only the beginning of the string.
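For what it's worth, that linked pattern can be reversed fairly directly. "^[^.]*" matches the leading run of characters that are not a dot. Swapping "." for ":" and wrapping the run in a capture group lets you keep it instead of deleting it. This adaptation is my own sketch, not code from the linked question:

```r
temp <- c("SEATS:FRONT ASSEMBLY:POWER ADJUST", "SUSPENSION", "SUSPENSION:FRONT")

# ^([^:]*) captures everything from the start up to (not including) the
# first colon; .* then consumes the rest of the string (possibly nothing).
# "\\1" replaces the whole match with just the captured part.
kept <- sub("^([^:]*).*", "\\1", temp)
kept
# returns c("SEATS", "SUSPENSION", "SUSPENSION")
```

Because [^:]* can never cross a colon, this pattern does not depend on greedy vs non-greedy matching at all.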
You can solve this with a simple regex:
sub("(.*?):.*", "\\1", temp)
[1] "AIR BAGS" "SERVICE BRAKES HYDRAULIC" "PARKING BRAKE" "SEATS"
[5] "POWER TRAIN" "SUSPENSION" "ENGINE AND ENGINE COOLING" "SERVICE BRAKES HYDRAULIC"
[9] "SUSPENSION" "ENGINE AND ENGINE COOLING" "VISIBILITY"
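How the regex works:
"(.*?):.*"
Look for a repeated set of any characters (.*), but modify it with ? so that it is not greedy, and capture it with the parentheses. This must be followed by a colon and then any characters (repeated). The replacement "\\1" then substitutes the whole match with the first captured group, i.e. the text before the colon. Strings with no colon do not match at all, so sub() returns them unchanged.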
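The key point is that regex quantifiers are greedy by default: .* matches as many characters as it can. Made non-greedy, (.*?) matches as few characters as possible while still letting the rest of the pattern match, so it stops at the first colon, the character right after the parentheses. Without the ?, the group would extend to the last colon, e.g. "SEATS:FRONT ASSEMBLY" instead of "SEATS". The part after the colon (.*) is back to the default, i.e. greedy, and simply consumes the rest of the string.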
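To see the difference concretely, here is a small comparison on the element with two colons. The third pattern, ":.*", is an alternative I'm adding rather than part of the answer above: it needs no capture group, because the leftmost match of ":.*" begins at the first colon and runs to the end of the string.

```r
x <- "VISIBILITY:WINDSHIELD WIPER/WASHER:LINKAGES"

# Greedy: (.*) grabs as much as possible, so the colon that follows the
# group ends up being the LAST colon in the string.
greedy <- sub("(.*):.*", "\\1", x)    # "VISIBILITY:WINDSHIELD WIPER/WASHER"

# Non-greedy: (.*?) grabs as little as possible, so it stops at the
# FIRST colon.
lazy <- sub("(.*?):.*", "\\1", x)     # "VISIBILITY"

# Delete from the first colon to the end; a string without a colon has
# no match and is returned unchanged.
simple <- sub(":.*", "", x)                  # "VISIBILITY"
no_colon <- sub(":.*", "", "SUSPENSION")     # "SUSPENSION"
```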