
R function doesn't loop through column but repeats first row result

I am trying to use the stemming function suggested in the corpus package's stemming vignette: https://cran.r-project.org/web/packages/corpus/vignettes/stemmer.html

However, when I run the function on the entire column, it just repeats the result for the first row down the rest of the rows. I suspect this has to do with the [[1]] inside the function below, and that the solution is something along the lines of a "for i in x" loop, but I'm not familiar enough with writing functions to know how to fix it.

df <- data.frame(x = 1:7, y= c("love", "lover", "lovely", "base", "snoop", "dawg", "pound"), stringsAsFactors=FALSE)

stem_hunspell <- function(term) {
    # look up the term in the dictionary
    stems <- hunspell::hunspell_stem(term)[[1]]

    if (length(stems) == 0) { # if there are no stems, use the original term
        stem <- term
    } else { # if there are multiple stems, use the last one
        stem <- stems[[length(stems)]]
    }

    stem
}

df[3] <- stem_hunspell(df$y)

Kreitzbe87 asked Jun 27 '26 23:06


1 Answer

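Your intuition is right.

hunspell_stem(term) returns a list of character vectors, one per element of term, so the list has length(term) elements. Your function keeps only [[1]], the result for the first word, and therefore returns a single string; assigning that string with df[3] <- ... then recycles it down every row.

Each vector appears to contain the word itself as the first element (only if it was found in the dictionary), followed by its stem as the second element if the word is not already a stem.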

> hunspell::hunspell_stem(df$y)
[[1]]
[1] "love"

[[2]]
[1] "lover" "love" 

[[3]]
[1] "lovely" "love"  

[[4]]
[1] "base"

[[5]]
[1] "snoop"

[[6]]
character(0)

[[7]]
[1] "pound"
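To see the repetition without needing hunspell installed, here is a minimal sketch that hard-codes a list shaped like the output above. The values are copied by hand from that output (a different dictionary could give different stems), and `stems_list` is a hypothetical name used only for illustration:

```r
# A list shaped like hunspell::hunspell_stem(df$y) above (values copied by hand)
stems_list <- list("love", c("lover", "love"), c("lovely", "love"),
                   "base", "snoop", character(0), "pound")

# The original function keeps only the first element of the list ...
first_only <- stems_list[[1]]
stem <- first_only[[length(first_only)]]  # a single string: "love"

# ... so assigning that one string to a column recycles it down every row
df <- data.frame(x = 1:7,
                 y = c("love", "lover", "lovely", "base", "snoop", "dawg", "pound"),
                 stringsAsFactors = FALSE)
df[3] <- stem
print(df)
```

Every row of the new third column ends up as "love", which is exactly the symptom described in the question.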

The function below returns the stem of each term, or the original term when no stem is found:

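stem_hunspell <- function(term) {
  # one character vector of candidate stems per input term
  stems <- hunspell::hunspell_stem(term)
  # preallocate the result, one entry per term
  output <- character(length(term))

  for (i in seq_along(term)) {
    stem <- stems[[i]]
    if (length(stem) == 0) {
      # no stem found: keep the original term
      output[i] <- term[i]
    } else {
      # multiple stems: use the last one, as in the vignette
      output[i] <- stem[length(stem)]
    }
  }
  return(output)
}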
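The explicit loop can also be written with vapply(), which is a common idiom for building one value per element. This is a sketch, not the vignette's code: the helper name `pick_stems` is hypothetical, and it takes the stems list as an argument (hand-copied from the output above) so it runs without hunspell. In real use you would pass `hunspell::hunspell_stem(term)` as the second argument.

```r
# Hypothetical helper: last stem for each term, or the term itself if none
pick_stems <- function(term, stems) {
  vapply(seq_along(term), function(i) {
    s <- stems[[i]]
    if (length(s) == 0) term[i] else s[length(s)]
  }, character(1))
}

# Same hand-copied stems as the hunspell output above
terms <- c("love", "lover", "lovely", "base", "snoop", "dawg", "pound")
stems_list <- list("love", c("lover", "love"), c("lovely", "love"),
                   "base", "snoop", character(0), "pound")

result <- pick_stems(terms, stems_list)
print(result)
# With hunspell installed, the equivalent call would be:
# pick_stems(df$y, hunspell::hunspell_stem(df$y))
```

vapply() with character(1) also checks that each call returns exactly one string, which guards against the kind of length mismatch that caused the original recycling.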

If you would rather get an empty string than the original term for words without a stem (such as "dawg"), the function becomes simpler:

stem_hunspell <- function(term) {
  stems <- hunspell::hunspell_stem(term)
  # output starts as "" everywhere, so terms without a stem stay empty
  output <- character(length(term))

  for (i in seq_along(term)) {
    stem <- stems[[i]]
    if (length(stem) > 0) {
      output[i] <- stem[length(stem)]
    }
  }
  return(output)
}
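As a quick check of that behaviour, here is a sketch of the simpler variant using vapply() over the same hand-copied stems list. The name `pick_stems_or_empty` is hypothetical; the empty string for "dawg" comes from the same rule as in the loop version, where character(n) initializes every element to "".

```r
# Hypothetical helper mirroring the simpler variant: no fallback to the term
pick_stems_or_empty <- function(stems) {
  vapply(stems, function(s) if (length(s) > 0) s[length(s)] else "", character(1))
}

# Same hand-copied stems as the hunspell output above
stems_list <- list("love", c("lover", "love"), c("lovely", "love"),
                   "base", "snoop", character(0), "pound")

result_empty <- pick_stems_or_empty(stems_list)
print(result_empty)
```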
Robin Gertenbach answered Jun 29 '26 14:06


