
Assigning numeric values based on the letters in a string in R

Tags:

r

I have a data.frame that is a single column with 235,886 rows. Each row corresponds to a single word of the English language.

E.g.

> words[10000:10005,1]

[1] anticontagionist anticontagious anticonventional anticonventionalism anticonvulsive
[6] anticor

What I'd like to do is convert each row to a number based on the letters in it. So, if "a" = 1, "b" = 2, "c" = 3, and "d" = 4, then "abcd" = 10. Does anyone know of a way to do that?

My ultimate goal is to have a function that scans the data.frame for a given numeric value and returns all the strings, i.e. words, with that value. So, continuing from the example above, if I asked for the value 9, this function would return "dad" and any other rows having a numeric value of 9.

BenL126 asked Feb 07 '23 11:02


1 Answer

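You can use a combination of strsplit and match: strsplit breaks each word into single characters, and match looks up each character's position in the built-in letters vector (so "a" is 1, "b" is 2, and so on). I've added a tolower so that uppercase letters are matched as well.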
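To make the individual steps concrete, here is a small trace of a single word through them. The word "Dad" and the variable names are only illustrative choices, not part of the original question.

```r
# Trace one word through the three steps (the word "Dad" is only an illustration).
word <- "Dad"

# Step 1: lowercase the word and split it into single characters.
chars <- strsplit(tolower(word), "", fixed = TRUE)[[1]]
chars
# [1] "d" "a" "d"

# Step 2: look up each character's position in the built-in letters vector.
positions <- match(chars, letters)
positions
# [1] 4 1 4

# Step 3: add the positions together.
sum(positions)
# [1] 9
```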

Here's a function that implements those steps:

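word_value <- function(words) {
  # Split each lowercased word into its individual characters
  temp <- strsplit(tolower(words), "", fixed = TRUE)
  # Sum each character's position in the alphabet (a = 1, ..., z = 26)
  vapply(temp, function(x) sum(match(x, letters)), integer(1L))
}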

Here's a sample vector:

myvec <- c("and", "dad", "cat", "fox", "mom", "add", "dan")

Test it out:

word_value(myvec)
# [1] 19  9 24 45 41  9 19

myvec[word_value(myvec) == 9]
# [1] "dad" "add"

myvec[word_value(myvec) > 20]
# [1] "cat" "fox" "mom"
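
For the lookup function described in the question, one possible sketch is to compute the values once, store them as a second column, and filter on that column. The small data.frame below is a hypothetical stand-in for the real 235,886-row one, and the names words_with_value, df and target are made up for this example.

```r
# Same helper as in the answer, repeated so this block runs on its own.
word_value <- function(words) {
  temp <- strsplit(tolower(words), "", fixed = TRUE)
  vapply(temp, function(x) sum(match(x, letters)), integer(1L))
}

# Hypothetical stand-in for the asker's one-column data.frame.
words <- data.frame(word = c("and", "dad", "cat", "fox", "mom", "add", "dan"),
                    stringsAsFactors = FALSE)

# Compute the values once and store them, so repeated lookups do not
# re-split every word.
words$value <- word_value(words[, 1])

# Hypothetical lookup helper: return every word whose value equals `target`.
# Rows with an NA value are skipped.
words_with_value <- function(df, target) {
  df[[1]][!is.na(df$value) & df$value == target]
}

words_with_value(words, 9)
# [1] "dad" "add"
```

Any word containing a character that is not in letters (a hyphen or an apostrophe, for example) gets NA from match and therefore an NA total, which is why the helper checks is.na before comparing.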

A5C1D2H2I1M1N2O1R2T1 answered Feb 16 '23 03:02