R - Converting Fractions in Text to Numeric

Tags:

string

r

I'm trying to convert, for example, '9¼"' to '9.25', but I can't seem to read the fraction correctly.

Here's the data I'm working with:

library(XML)

url <- "http://mockdraftable.com/players/2014/"
combine <- readHTMLTable(url, which = 1, header = FALSE, stringsAsFactors = FALSE)

names(combine) <- c("Name", "Pos", "Hght", "Wght", "Arms", "Hands",
                    "Dash40yd", "Dash20yd", "Dash10yd", "Bench", "Vert", "Broad", 
                    "Cone3", "ShortShuttle20")

For example, the Hands value in the first row is '9¼"'. How would I make combine$Hands become 9.25? The same goes for all of the other fractions, 1/8 through 7/8.

Any help would be appreciated.

Asked Feb 22 '15 at 21:02 by Frank B.



1 Answer

You can try transliterating the Unicode characters to ASCII directly while reading the HTML, by passing a custom cell-conversion function (the elFun argument) to readHTMLTable:

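library(stringi)
# elFun is applied to every table cell; "latin-ascii" rewrites characters such as ¼ as plain text
combine <- readHTMLTable(url, which = 1, header = FALSE, stringsAsFactors = FALSE,
                         elFun = function(node) stri_trans_general(xmlValue(node), "latin-ascii"))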
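Since the mockdraftable page may no longer be reachable, here is a small offline check of what the "latin-ascii" transliteration does to cells like the ones in the table. The sample strings are made up for illustration, not scraped:

```r
library(stringi)

# Made-up sample cells; "\u00bc" is the quarter character and "\u215e" is seven eighths,
# written as escapes so the script does not depend on the encoding of the file itself
raw <- c("9\u00bc\"", "31\u215e\"", "30\"")

# The same call the elFun above applies to every table cell
ascii <- stri_trans_general(raw, "latin-ascii")
print(ascii)
```

The fraction characters come out as plain digits and a slash (ICU's rules may also put a space in front of them), which is what lets a `\\d+` pattern pick out the whole part, numerator and denominator afterwards.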

You can then use @Metrics' suggestion to convert it to numbers.

For example, you could clean up the Arms data using @G. Grothendieck's function from this post:

library(XML)
library(stringi)
library(gsubfn)  # provides strapplyc()

# The calc function is by @G. Grothendieck: it turns c("31", "1", "2") into 31 + 1/2,
# c("30") into 30, and c("1", "2") into 1/2
calc <- function(s) {
        x <- c(if (length(s) == 2) 0, as.numeric(s), 0:1)
        x[1] + x[2] / x[3]
}

url <- "http://mockdraftable.com/players/2014/"

# Transliterate each cell to ASCII while reading the table
combine <- readHTMLTable(url, which = 1, header = FALSE, stringsAsFactors = FALSE,
                         elFun = function(node) stri_trans_general(xmlValue(node), "latin-ascii"))

names(combine) <- c("Name", "Pos", "Hght", "Wght", "Arms", "Hands",
                    "Dash40yd", "Dash20yd", "Dash10yd", "Bench", "Vert", "Broad", 
                    "Cone3", "ShortShuttle20")

# Strip the inch marks, pull out the runs of digits, then evaluate each as a mixed number
sapply(strapplyc(gsub('"', "", combine$Arms), "\\d+"), calc)

#[1] 30.000 31.500 30.000 31.750 31.875 29.875 31.000 31.000 30.250 33.000 32.500 31.625 32.875
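If you'd rather not load gsubfn just for strapplyc, base R's regmatches() with gregexpr() should give an equivalent list of digit runs. A minimal sketch on hypothetical Arms values in the form the transliteration produces (the sample strings are made up, not scraped):

```r
# The calc function by @G. Grothendieck, as in the answer above
calc <- function(s) {
  x <- c(if (length(s) == 2) 0, as.numeric(s), 0:1)
  x[1] + x[2] / x[3]
}

# Hypothetical Arms values, as they might look after the latin-ascii step
arms <- c("30\"", "31 1/2\"", "31 7/8\"", "29 7/8\"")

# Base-R stand-in for strapplyc(x, "\\d+"): one character vector of digit runs
# per string, e.g. "31 1/2\"" -> c("31", "1", "2")
digits <- regmatches(arms, gregexpr("[0-9]+", arms))

arms_num <- sapply(digits, calc)
print(arms_num)
# [1] 30.000 31.500 31.875 29.875
```

Because the pattern only matches digits, the inch marks are ignored here without a separate gsub() step.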

There may be some encoding issues depending on your machine (see the comments).
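If the transliteration misbehaves on your machine, another option (not part of the original answer, just a sketch) is to read the table without elFun and map the Unicode fraction characters to their values directly. The helper name fraction_to_numeric and the assumed cell format (a whole number, at most one fraction character, then an inch mark) are my own assumptions:

```r
# Values of the Unicode "vulgar fraction" characters for eighths. They are written
# as \u escapes so the mapping does not depend on the encoding of the script file.
eighths <- setNames(c(1/8, 1/4, 3/8, 1/2, 5/8, 3/4, 7/8),
                    c("\u215b", "\u00bc", "\u215c", "\u00bd",
                      "\u215d", "\u00be", "\u215e"))

# Hypothetical helper: assumes each cell is a whole number of inches, optionally
# followed by one fraction character and an inch mark, e.g. "9\u00bc\"" or "30\""
fraction_to_numeric <- function(x) {
  x <- enc2utf8(as.character(x))
  vapply(x, function(s) {
    if (is.na(s)) return(NA_real_)
    digits <- gsub("[^0-9]", "", s)                 # keep only the whole part
    whole <- if (digits == "") 0 else as.numeric(digits)
    # which fraction characters (if any) appear in this cell
    hit <- names(eighths)[vapply(names(eighths), grepl, logical(1),
                                 x = s, fixed = TRUE)]
    if (length(hit) > 0) return(whole + eighths[[hit[1]]])
    if (digits == "") NA_real_ else whole           # blank cell -> NA
  }, numeric(1), USE.NAMES = FALSE)
}

fraction_to_numeric(c("9\u00bc\"", "31\u215e\"", "30\"", "\u00bd\""))
# returns c(9.25, 31.875, 30, 0.5)
```

Applied to the scraped data this would be something like `combine$Hands <- fraction_to_numeric(combine$Hands)`, assuming readHTMLTable returned the cells as UTF-8 text.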

Answered Oct 04 '22 at 10:10 by NicE
