
Remove parentheses and text within from strings in R

Tags: regex, r

In R, I have a list of companies such as:

companies <- data.frame(Name = c("Company A Inc (COMPA)", "Company B (BEELINE)", "Company C Inc. (Coco)", "Company D Inc.", "Company E"))

I want to remove the parentheses and the text inside them, ending up with the following list:

            Name
1  Company A Inc
2      Company B
3 Company C Inc.
4 Company D Inc.
5      Company E

One approach I tried was to split the string and then use ldply:

library(plyr)  # provides ldply()
companies$Name <- as.character(companies$Name)
c <- strsplit(companies$Name, "\\(")
ldply(c)

But because not all company names have a parenthesized portion, it fails:

Error in list_to_dataframe(res, attr(.data, "split_labels"), .id, id_as_factor) :
  Results do not have equal lengths

I'm not married to the strsplit solution. Whatever removes that text and the parentheses would be fine.
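For reference, the length mismatch behind that error can be seen directly. The following is a minimal sketch in base R only; the names `parts` and `first_piece` are made up for illustration, and keeping just the first piece of each split is one possible workaround, not necessarily the best one.

```r
# Data copied from the question; stringsAsFactors = FALSE keeps Name as character
companies <- data.frame(
  Name = c("Company A Inc (COMPA)", "Company B (BEELINE)",
           "Company C Inc. (Coco)", "Company D Inc.", "Company E"),
  stringsAsFactors = FALSE
)

# Splitting on "(" gives two pieces where a parenthesis exists, one otherwise
parts <- strsplit(companies$Name, "\\(")
print(lengths(parts))

# Keep only the first piece of each element and trim the trailing space
first_piece <- trimws(vapply(parts, `[`, character(1), 1))
print(first_piece)
```

Because the split results have unequal lengths, they cannot be bound row by row into a data frame, which is what the error reports.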

aiolias asked Jun 11 '14 21:06



1 Answer

A gsub should work here:

gsub("\\s*\\([^\\)]+\\)", "", as.character(companies$Name))
# [1] "Company A Inc"  "Company B"      "Company C Inc."
# [4] "Company D Inc." "Company E"

Here we just replace each occurrence of "(...)" with an empty string, also removing any whitespace immediately before it. R makes the pattern look worse than it is: parentheses are special characters in regular expressions, so each literal one needs a backslash, and that backslash must itself be doubled inside an R string.
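To make the pieces of the pattern easier to follow, here is a small sketch that annotates each part and writes the cleaned names back into the data frame. The variable name `pattern` is introduced only for illustration, and the character class is written as `[^)]` rather than `[^\\)]`, a slight simplification that assumes `)` needs no escape inside square brackets.

```r
# Data copied from the question; stringsAsFactors = FALSE keeps Name as character
companies <- data.frame(
  Name = c("Company A Inc (COMPA)", "Company B (BEELINE)",
           "Company C Inc. (Coco)", "Company D Inc.", "Company E"),
  stringsAsFactors = FALSE
)

# \\s*   optional whitespace before the opening parenthesis
# \\(    a literal "("
# [^)]+  one or more characters that are not ")"
# \\)    a literal ")"
pattern <- "\\s*\\([^)]+\\)"

# Names without a parenthesized part are returned unchanged
companies$Name <- gsub(pattern, "", companies$Name)
print(companies)
```

Since gsub() leaves non-matching strings as they are, rows such as "Company D Inc." and "Company E" pass through untouched, which is what the strsplit approach could not handle.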

MrFlick answered Oct 06 '22 13:10