
R - invert gsub: keep only matches with gsub argument [duplicate]

Tags: regex, r, gsub

I'm working through a character vector (approx. 10,000 entries) that contains a lot of information I want to discard, but also quite a bit that I want to keep. The information I want to keep has to match one of several terms, which I've combined into a single pattern string. This is the matching_points pattern containing the terms that satisfy the matching criteria:

matching_points <- "house|techno|pop|jazz|dreampop|artrock"

and this is the vector I'd want to clean up:

music <- c("tropical house", "tech house", "funk", "hardcore", "hard rock", "pop", "dream pop", "free jazz")

After the cleanup operation, I'd want the vector music to look like this:

[1] "house"  "house"  ""  ""  ""  "pop"  "pop"  "jazz" 

It would be great if anyone had an idea how I can do this. I suspect there's a simple option that can be applied to gsub in order to invert the process, i.e. keep the parts that match and replace everything else with "".

nikUoM asked Jun 15 '16 14:06



1 Answer

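You can try str_extract() from stringr. It returns the first match in each element, and NA (rather than "") where nothing matches: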

library(stringr) 
str_extract(music, matching_points)
#[1] "house" "house" NA      NA      NA      "pop"   "pop"   "jazz" 
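
If empty strings are wanted instead of NA, or stringr is not available, a base R sketch along these lines should work. It assumes only the first match per element is needed, as in the expected output in the question; the names m and cleaned are just illustrative and not from the original post.

```r
matching_points <- "house|techno|pop|jazz|dreampop|artrock"
music <- c("tropical house", "tech house", "funk", "hardcore",
           "hard rock", "pop", "dream pop", "free jazz")

# Position of the first match in each element (-1 where there is none)
m <- regexpr(matching_points, music)

# Start from all-empty strings, then fill in the matched substrings;
# regmatches() returns one string per element that actually matched
cleaned <- character(length(music))
cleaned[m != -1] <- regmatches(music, m)

cleaned
#[1] "house" "house" ""      ""      ""      "pop"   "pop"   "jazz"
```

Note that with this pattern "dream pop" yields "pop", because the alternative "dreampop" contains no space and so does not match it.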
Sotos answered Sep 27 '22 16:09