I'm running through a character vector (approx. 10,000 entries) that has a lot of information in it I wish to discard, but quite a bit that I want to keep. The information I want to keep has to match a given string in another character vector. So, this would be the matching_points vector containing the arguments that satisfy the matching criteria:
matching_points <- "house|techno|pop|jazz|dreampop|artrock"
and this would be the vector I'd want to clean up:
music <- c("tropical house", "tech house", "funk", "hardcore", "hard rock", "pop", "dream pop", "free jazz")
and through the cleanup operation, I'd want the vector music to then look like this:
[1] "house" "house" "" "" "" "pop" "pop" "jazz"
It would be great if anyone had any idea how I can do this. I suspect there's a simple option that can be applied to the gsub process in order to invert it, i.e. keep the stuff that matches and replace everything else with "".
Definitions of sub & gsub: The sub function in R replaces the first match in a character string with new characters. The gsub function, in contrast, replaces all matches in a character string with new characters (for example, every "a" in a string with "c"). Sometimes we might want to replace multiple patterns with the same new character.
The search term can be a text fragment or a regular expression. The fixed option forces the function to treat the search term as a literal string, overriding any other instructions (useful when a search string could also be interpreted as a regular expression).
# gsub in R
> base <- "Diogenes the cynic searched Athens for an honest man."
The gsub() function uses the following basic syntax: gsub(pattern, replacement, x)
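To make the difference between sub and gsub concrete, here is a minimal sketch. The strings "banana" and "a.b.c" and all variable names are made up for illustration and do not come from the question.

```r
# Illustrative input only; not part of the original question.
x <- "banana"

first_only <- sub("a", "c", x)      # replaces only the first "a"
all_hits   <- gsub("a", "c", x)     # replaces every "a"
multi      <- gsub("a|n", "c", x)   # "|" lets one call cover several patterns

# fixed = TRUE treats the pattern as a literal string, so "." means a dot,
# not "any character" as it would in a regular expression.
literal_dot <- gsub(".", "-", "a.b.c", fixed = TRUE)
regex_dot   <- gsub(".", "-", "a.b.c")

print(c(first_only, all_hits, multi, literal_dot, regex_dot))
```

Note that both functions only replace what matches; neither has an argument that flips the behaviour to "keep the match and drop the rest", which is why an extraction approach is the more natural fit for the question.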
You can try stringr:
library(stringr)
str_extract(music, matching_points)
#[1] "house" "house" NA NA NA "pop" "pop" "jazz"
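Note that str_extract returns NA for the non-matching entries, whereas the question asked for "". If you would rather stay in base R, something along these lines should give the requested output. This is only a sketch using regexpr and regmatches; the variable names hits and cleaned are my own.

```r
matching_points <- "house|techno|pop|jazz|dreampop|artrock"
music <- c("tropical house", "tech house", "funk", "hardcore",
           "hard rock", "pop", "dream pop", "free jazz")

# regexpr() gives the position of the first match per element, or -1 if none.
hits <- regexpr(matching_points, music)

# Start with "" everywhere, then fill in the matched text where a match exists.
# regmatches() returns only the matched substrings, one per matching element.
cleaned <- rep("", length(music))
cleaned[hits > 0] <- regmatches(music, hits)

print(cleaned)
#[1] "house" "house" ""      ""      ""      "pop"   "pop"   "jazz"
```

With the stringr result, the same effect can be had by replacing the NA values with "" afterwards.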