I'm currently working through Project Euler problem 22, which poses the following challenge:
Using names.txt (right click and 'Save Link/Target As...'), a 46K text file containing over five-thousand first names, begin by sorting it into alphabetical order. Then working out the alphabetical value for each name, multiply this value by its alphabetical position in the list to obtain a name score.
For example, when the list is sorted into alphabetical order, COLIN, which is worth 3 + 15 + 12 + 9 + 14 = 53, is the 938th name in the list. So, COLIN would obtain a score of 938 × 53 = 49714.
What is the total of all the name scores in the file?
The file can be downloaded using the above link. I've written the code below to solve the problem:
rm(list=ls())
library(splitstackshape)
#read in data from http://projecteuler.net/problem=22
names=sort(t(read.table("names.txt",sep=",")))
#letters to numbers conversion vectors
from=LETTERS[seq(1,26)]
to=as.character(seq(1,26))
#function to replace all letters with corresponding numbers
gsub2 = function(pattern, replacement, x, ...){
  for(i in 1:length(pattern))
    x = gsub(pattern[i],paste(replacement[i]," ",sep=""), x, ...)
  x
}
#create df, run function, create row number var for later calculation
df=data.frame(names=names)
df$name.num = gsub2(from,to,df$names)
df$rownum=seq(1,nrow(df))
#split letter values, add across rows, multiply by row number to get name score and sum
df=concat.split(df,"name.num"," ")
df$name.sum=rowSums(df[,4:15],na.rm=TRUE)
df$name.score=df$name.sum*df$rownum
print(sum(df$name.score,na.rm=TRUE))
My result appears to be off by 158,055 (I get 871040227 where it should be 871198282). I've spot-checked parts of it, and it appears that the list of names is sorted correctly and that the name scores are computed correctly (for instance, I also get COLIN = 49714). I've also read other threads troubleshooting this problem on SO, but they're mostly in Python and the problems seem to be different from mine. My suspicion is that either the names.txt file is somehow not being read in correctly, or that the method I'm using to split df$name.num (concat.split from the splitstackshape package) is incorrect, though it seems to be working.
Any ideas?
Also, any suggestions on how to improve/simplify my code are more than welcome!
I used to have fun doing the Euler problems in R. Here's my solution to 22.
namesscore<-function(name) {
  score<-0;
  for(s in 1:nchar(name)) {
    score<-score + which(substr(name,s,s)==LETTERS[1:26])
  }
  score
}
names<-scan("prob022.txt", "character", sep=",", quote="\"", na.strings="")
name.pos <- rank(names)
name.val <- sapply(names,namesscore)
sum(name.pos*name.val)
# [1] 871198282
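There is a name "NA" in the list which may cause you problems. By default, read.table treats the string NA as a missing value, and sort() then drops missing values, so every name after it would move up one position. That is why the scan() call above passes na.strings="".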
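To see the effect on a small scale, here is a minimal sketch. It assumes a made-up three-name sample in place of names.txt; `txt`, `nms` and `name_value` are illustrative names, not part of either post. It reads the sample with na.strings = "" so the literal name "NA" survives, then scores the names with a vectorised helper instead of a loop.

```r
# Hypothetical three-name sample standing in for names.txt (not the real file)
txt <- '"MARY","NA","COLIN"'

# na.strings = "" keeps the literal name "NA" as a string rather than a missing value
nms <- sort(scan(text = txt, what = "character", sep = ",", quote = "\"",
                 na.strings = "", quiet = TRUE))

# Alphabetical value of one upper-case name: A = 1, ..., Z = 26
# (utf8ToInt("A") is 65, so subtracting 64 maps A to 1)
name_value <- function(name) sum(utf8ToInt(name) - 64)

vals  <- vapply(nms, name_value, numeric(1))
total <- sum(seq_along(nms) * vals)  # position in sorted list times value, summed
print(total)
```

If the same approach were applied to the real file, `length(nms)` and `anyNA(nms)` would be quick checks that no name was lost while reading.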