Remove certain words in string from column in dataframe in R

Question

I have a dataset in R that lists a number of company names, and I want to remove words like "Inc", "Company", "LLC", etc. as part of a clean-up effort. I have the following sample data:

sampleData

  Location             Company
1 New York, NY         XYZ Company
2 Chicago, IL          Consulting Firm LLC
3 Miami, FL            Smith & Co.

Words I do not want to include in my output:

stopwords = c("Inc","inc","co","Co","Inc.","Co.","LLC","Corporation","Corp","&")

I built the following function to break out each word, remove the stopwords, and then bring the words back together, but it is not iterating through each row of the dataset.

removeWords <- function(str, stopwords) {
  x <- unlist(strsplit(str, " "))
  paste(x[!x %in% stopwords], collapse = " ")
}

removeWords(sampleData$Company,stopwords)

The output for the above function looks like this:

[1] "XYZ Company Consulting Firm Smith"
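The single string appears to come from unlist(): strsplit() returns one character vector per row, and unlist() merges them all before paste() collapses the result. A minimal sketch of that behaviour, using a hypothetical two-element vector named companies in place of sampleData$Company:

```r
# Hypothetical stand-in for sampleData$Company
companies <- c("XYZ Company", "Consulting Firm LLC")

# strsplit() returns a list with one character vector per input element
pieces <- strsplit(companies, " ")
length(pieces)   # 2: one entry per row

# unlist() flattens that list, so the row boundaries are lost
flat <- unlist(pieces)
length(flat)     # 5: all words from both rows

# paste(..., collapse = " ") then joins every word into one string
joined <- paste(flat, collapse = " ")
```

Because the row boundaries are gone after unlist(), the function can only ever return one string, no matter how many rows are passed in.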

The output should be:

  Location             Company
1 New York, NY         XYZ Company
2 Chicago, IL          Consulting Firm
3 Miami, FL            Smith

Any help would be appreciated.

Hardik Gupta · Accepted Answer

You can use removeWords() from the 'tm' package. It accepts a whole character vector, so no row-by-row loop is needed.

library(tm)

stopwords <- readLines('stopwords.txt')   # Your stop words file, one word per line
x <- df$company                           # Company column data (sampleData$Company in the question)
x <- removeWords(x, stopwords)            # Remove the stop words from every element

df$company_new <- x                       # Add the cleaned vector as a new column and check
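
If installing 'tm' is not an option, the function from the question can be kept and applied to one string at a time. The following is a base R sketch, not part of the original answer. The helper name removeStopwords is hypothetical (chosen so it does not mask tm's removeWords()), the stopword vector is copied from the question, and the data frame is rebuilt from the sample shown there. Matching is exact per space-separated token, so "Company" is kept because it is not in the list.

```r
# Stopwords copied from the question
stopwords <- c("Inc", "inc", "co", "Co", "Inc.", "Co.", "LLC",
               "Corporation", "Corp", "&")

# Hypothetical helper: clean a single string by dropping exact-match tokens
removeStopwords <- function(str, stopwords) {
  x <- unlist(strsplit(str, " ", fixed = TRUE))
  paste(x[!x %in% stopwords], collapse = " ")
}

# Sample data rebuilt from the question
sampleData <- data.frame(
  Location = c("New York, NY", "Chicago, IL", "Miami, FL"),
  Company  = c("XYZ Company", "Consulting Firm LLC", "Smith & Co."),
  stringsAsFactors = FALSE
)

# vapply() calls the helper once per row and returns a character vector
sampleData$Company <- vapply(
  sampleData$Company, removeStopwords, character(1),
  stopwords = stopwords, USE.NAMES = FALSE
)
```

Printing sampleData afterwards should show the three cleaned company names in their original rows. vapply() is used instead of sapply() here so that the result is guaranteed to be a character vector with one element per row.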

Tags:

r

Shannon
