I have a data.frame that is a single column with 235,886 rows. Each row corresponds to a single word of the English language.
E.g.
> words[10000:10005,1]
[1] anticontagionist anticontagious anticonventional anticonventionalism anticonvulsive
[6] anticor
What I'd like to do is convert each row to a number based on the letters in it. So, if "a" = 1, "b" = 2, "c" = 3, and "d" = 4, then "abcd" = 10. Does anyone know of a way to do that?
My ultimate goal is to have a function that scans the data.frame for a given numeric value and returns all the strings, i.e. words, with that value. So, continuing from the example above, if I asked for the value 9, this function would return "dad" and any other rows having a numeric value of 9.
You can use a combination of strsplit and match. I've thrown a tolower in there so that capitalised words still match against letters, R's built-in vector of lower-case "a" to "z".
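To make the individual steps concrete, here is a rough walk-through for a single word. The variable names (word, chars, positions) are just illustrative and not part of any required interface:

```r
word <- "Dad"

# Lower-case the word and split it into single characters
chars <- strsplit(tolower(word), "", fixed = TRUE)[[1]]
chars
# [1] "d" "a" "d"

# Look up each character's position in the built-in `letters` vector
positions <- match(chars, letters)
positions
# [1] 4 1 4

# The word's value is the sum of those positions
sum(positions)
# [1] 9
```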
Here's a function that implements those steps:
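word_value <- function(words) {
  # Lower-case each word and split it into single characters
  temp <- strsplit(tolower(words), "", fixed = TRUE)
  # Map each letter to its alphabet position (a = 1, ..., z = 26) and sum
  vapply(temp, function(x) sum(match(x, letters)), integer(1L))
}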
Here's a sample vector:
myvec <- c("and", "dad", "cat", "fox", "mom", "add", "dan")
Test it out:
word_value(myvec)
# [1] 19 9 24 45 41 9 19
myvec[word_value(myvec) == 9]
# [1] "dad" "add"
myvec[word_value(myvec) > 20]
# [1] "cat" "fox" "mom"
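For the lookup described in the question, something along these lines might work. This is only a sketch: find_words and the small demo data frame are hypothetical names made up for illustration, and it assumes the words sit in the first column. word_value is repeated so the block runs on its own.

```r
word_value <- function(words) {
  temp <- strsplit(tolower(words), "", fixed = TRUE)
  vapply(temp, function(x) sum(match(x, letters)), integer(1L))
}

# Hypothetical helper: return every word in the first column of `df`
# whose letter sum equals `value`
find_words <- function(df, value) {
  w <- as.character(df[[1]])
  scores <- word_value(w)
  # Words containing characters outside a-z score NA; leave them out
  w[!is.na(scores) & scores == value]
}

# Small stand-in for the real 235,886-row data frame
demo <- data.frame(word = c("and", "dad", "cat", "add", "o'clock"),
                   stringsAsFactors = FALSE)

find_words(demo, 9)
# [1] "dad" "add"
```

If you plan to run many lookups, it is probably cheaper to compute the scores once, store them as an extra column, and subset on that column instead of rescoring every word on each call.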