Project Euler #22, off by 158,055

Question

I'm currently working through Project Euler problem 22 which has the following challenge:

Using names.txt (right click and 'Save Link/Target As...'), a 46K text file containing over five-thousand first names, begin by sorting it into alphabetical order. Then working out the alphabetical value for each name, multiply this value by its alphabetical position in the list to obtain a name score.

For example, when the list is sorted into alphabetical order, COLIN, which is worth 3 + 15 + 12 + 9 + 14 = 53, is the 938th name in the list. So, COLIN would obtain a score of 938 × 53 = 49714.

What is the total of all the name scores in the file?

The file can be downloaded using the above link. I've written the below code to solve the problem:

rm(list=ls())
library(splitstackshape)

#read in data from http://projecteuler.net/problem=22
names=sort(t(read.table("names.txt",sep=",")))

#letters to numbers conversion vectors
from=LETTERS[seq(1,26)]
to=as.character(seq(1,26))

#function to replace all letters with corresponding numbers
gsub2 = function(pattern, replacement, x, ...){
  for(i in 1:length(pattern))
    x = gsub(pattern[i],paste(replacement[i]," ",sep=""), x, ...)
  x
}

#create df, run function, create row number var for later calculation
df=data.frame(names=names)
df$name.num = gsub2(from,to,df$names)
df$rownum=seq(1,nrow(df))

#split letter values, add across rows, multiply by row number to get name score and sum 
df=concat.split(df,"name.num"," ")
df$name.sum=rowSums(df[,4:15],na.rm=TRUE)
df$name.score=df$name.sum*df$rownum
print(sum(df$name.score,na.rm=TRUE))

My result appears to be off by 158,055 (I get 871040227 where it should be 871198282). I've spot-checked parts of it, and it appears that the list of names is sorted correctly and that the name scores are being computed correctly (for instance, I also get COLIN=49714). I've also read other threads troubleshooting this problem on SO, but they're mostly in Python and the problems seem to be different from mine. My suspicion is that either the names.txt file is somehow not being read in correctly, or that the method I'm using to split df$name.num (concat.split from the splitstackshape package) is incorrect, though it seems to be working.

Any ideas?

Also, any suggestions on how to improve/simplify my code are more than welcome!

MrFlick · Accepted Answer

I used to have fun doing the Euler problems in R. Here's my solution to 22.

namesscore<-function(name) {
    score<-0;
    for(s in 1:nchar(name)) {
        score<-score + which(substr(name,s,s)==LETTERS[1:26])
    }
    score
}
names<-scan("prob022.txt", "character", sep=",", quote="\"", na.strings="")
name.pos <- rank(names)
name.val <- sapply(names,namesscore)
sum(name.pos*name.val)
# [1] 871198282
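
The letter-by-letter loop can also be written without an explicit loop. Here is a minimal sketch, assuming every name contains only the upper-case letters A-Z (as the names in the file appear to); name_value is an illustrative name, not something from the code above:

```r
# Alphabetical value of one upper-case name: A = 1, ..., Z = 26.
# Assumes the name contains only the letters A-Z.
name_value <- function(name) {
  letters_in_name <- strsplit(name, "")[[1]]  # split into single characters
  sum(match(letters_in_name, LETTERS))        # position of each letter in A..Z
}

name_value("COLIN")        # 3 + 15 + 12 + 9 + 14 = 53
name_value("COLIN") * 938  # score from the problem statement: 49714
```

match() returns NA for any character outside A-Z, so an unexpected character would show up as an NA score rather than being silently ignored.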

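There is a name "NA" in the list, which may be causing your problem. By default read.table() treats the string "NA" as a missing value, and sort() drops missing values, so the name disappears and every name after it moves up one position. That is why I pass na.strings="" to scan() in the code above.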
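
To see the effect in isolation, here is a small sketch using a made-up three-name string in place of names.txt (the names and the variable names are only for illustration). It reads the data the same way as the question, once with the defaults and once with na.strings="":

```r
# Three-name stand-in for names.txt (hypothetical data, same quoting style).
raw <- '"MARY","NA","COLIN"'

# Default na.strings = "NA": the name "NA" is read as a missing value,
# and sort() then drops it.
default_read <- sort(t(read.table(text = raw, sep = ",")))
length(default_read)  # 2

# Treat only empty fields as missing, so "NA" survives as a name.
fixed_read <- sort(t(read.table(text = raw, sep = ",", na.strings = "")))
length(fixed_read)    # 3
```

If this is what is happening with your file, adding na.strings="" to your read.table() call should be enough to keep the name in the list.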

Tags:

r

splitstackshape

rmbaughman
