Convert written number to number in R

Does anybody know a function that converts a text representation of a number into an actual number, e.g. 'twenty thousand three hundred and five' into 20305? I have written-out numbers in data frame rows and want to convert them to numeric values.

The qdap package can replace numerals with words (e.g., 1001 becomes "one thousand one"), but not the other way around:

    library(qdap)
    replace_number("I like 346457 ice cream cones.")
    [1] "I like three hundred forty six thousand four hundred fifty seven ice cream cones."

Henk asked Aug 20 '13 10:08


2 Answers

Here's a start that should get you to hundreds of thousands.

    word2num <- function(word){
        wsplit <- strsplit(tolower(word)," ")[[1]]
        one_digits <- list(zero=0, one=1, two=2, three=3, four=4, five=5,
                           six=6, seven=7, eight=8, nine=9)
        teens <- list(eleven=11, twelve=12, thirteen=13, fourteen=14, fifteen=15,
                      sixteen=16, seventeen=17, eighteen=18, nineteen=19)
        ten_digits <- list(ten=10, twenty=20, thirty=30, forty=40, fifty=50,
                           sixty=60, seventy=70, eighty=80, ninety=90)
        doubles <- c(teens,ten_digits)
        out <- 0
        i <- 1
        while(i <= length(wsplit)){
            j <- 1
            if(i==1 && wsplit[i]=="hundred")
                temp <- 100
            else if(i==1 && wsplit[i]=="thousand")
                temp <- 1000
            else if(wsplit[i] %in% names(one_digits))
                temp <- as.numeric(one_digits[wsplit[i]])
            else if(wsplit[i] %in% names(teens))
                temp <- as.numeric(teens[wsplit[i]])
            else if(wsplit[i] %in% names(ten_digits))
                temp <- (as.numeric(ten_digits[wsplit[i]]))
            if(i < length(wsplit) && wsplit[i+1]=="hundred"){
                if(i>1 && wsplit[i-1] %in% c("hundred","thousand"))
                    out <- out + 100*temp
                else
                    out <- 100*(out + temp)
                j <- 2
            }
            else if(i < length(wsplit) && wsplit[i+1]=="thousand"){
                if(i>1 && wsplit[i-1] %in% c("hundred","thousand"))
                    out <- out + 1000*temp
                else
                    out <- 1000*(out + temp)
                j <- 2
            }
            else if(i < length(wsplit) && wsplit[i+1] %in% names(doubles)){
                temp <- temp*100
                out <- out + temp
            }
            else{
                out <- out + temp
            }
            i <- i + j
        }
        return(list(word,out))
    }

Results:

    > word2num("fifty seven")
    [[1]]
    [1] "fifty seven"

    [[2]]
    [1] 57

    > word2num("four fifty seven")
    [[1]]
    [1] "four fifty seven"

    [[2]]
    [1] 457

    > word2num("six thousand four fifty seven")
    [[1]]
    [1] "six thousand four fifty seven"

    [[2]]
    [1] 6457

    > word2num("forty six thousand four fifty seven")
    [[1]]
    [1] "forty six thousand four fifty seven"

    [[2]]
    [1] 46457

    > word2num("forty six thousand four hundred fifty seven")
    [[1]]
    [1] "forty six thousand four hundred fifty seven"

    [[2]]
    [1] 46457

    > word2num("three forty six thousand four hundred fifty seven")
    [[1]]
    [1] "three forty six thousand four hundred fifty seven"

    [[2]]
    [1] 346457

I can tell you already that this won't work for word2num("four hundred thousand fifty"), because it doesn't know how to handle consecutive "hundred" and "thousand" terms, but the algorithm could probably be modified to cope. Anyone should feel free to edit this if they have improvements, or to build on it in their own answer. I just thought this was a fun problem to play with (for a little while).
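One possible way to handle consecutive "hundred" and "thousand" terms is sketched below. It is not the code above but a different, commonly used approach: keep a running value for the current group, multiply it when "hundred" appears, and flush it into the total whenever a scale word such as "thousand" appears. The function name words_to_number and its behaviour for unknown words (returning NA) are assumptions made for illustration. Note that, unlike word2num, this sketch does not understand colloquial forms such as "four fifty seven".

```r
# Hypothetical helper, not part of the original answer: a group-and-flush
# parser for written-out English integers.
words_to_number <- function(text) {
  if (is.na(text)) return(NA_real_)

  units <- c(zero = 0, one = 1, two = 2, three = 3, four = 4, five = 5,
             six = 6, seven = 7, eight = 8, nine = 9, ten = 10,
             eleven = 11, twelve = 12, thirteen = 13, fourteen = 14,
             fifteen = 15, sixteen = 16, seventeen = 17, eighteen = 18,
             nineteen = 19, twenty = 20, thirty = 30, forty = 40,
             fifty = 50, sixty = 60, seventy = 70, eighty = 80, ninety = 90)
  scales <- c(thousand = 1e3, million = 1e6, billion = 1e9)

  # Lower-case, treat hyphens as spaces, split on whitespace, drop "and".
  tokens <- strsplit(trimws(gsub("-", " ", tolower(text))), "\\s+")[[1]]
  tokens <- tokens[tokens != "and"]
  if (length(tokens) == 0) return(NA_real_)

  total <- 0  # sum of the groups already closed by a scale word
  group <- 0  # value of the group currently being read
  for (tok in tokens) {
    if (tok %in% names(units)) {
      group <- group + units[[tok]]
    } else if (tok == "hundred") {
      group <- max(group, 1) * 100   # a bare "hundred" counts as 100
    } else if (tok %in% names(scales)) {
      total <- total + max(group, 1) * scales[[tok]]
      group <- 0
    } else {
      return(NA_real_)               # assumption: unknown words give NA
    }
  }
  total + group
}

# Applying it to a data frame column, as in the question.
df <- data.frame(written = c("twenty thousand three hundred and five",
                             "four hundred thousand fifty"),
                 stringsAsFactors = FALSE)
df$value <- vapply(df$written, words_to_number, numeric(1), USE.NAMES = FALSE)
```

Because the current group is reset after every scale word, "four hundred thousand fifty" is read as (4 * 100) * 1000 + 50 rather than tripping over the adjacent "hundred" and "thousand".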

Edit: Apparently Bill Venables has a package called english that may achieve this even better than the above code.

Thomas answered Sep 23 '22 21:09


I wrote an R package to do this a few years back, https://github.com/fsingletonthorn/words_to_numbers, which works for numbers up to the decillions.

    devtools::install_github("fsingletonthorn/words_to_numbers")
    library(wordstonumbers)

    example_input <- "twenty thousand three hundred and five"
    words_to_numbers(example_input)
    [1] "20305"

It also works for much more complex cases similar to those included in the qdap example:

    words_to_numbers('I like three hundred forty six thousand four hundred fifty seven ice cream cones.')
    [1] "I like 346457 ice cream cones."

FelixST answered Sep 21 '22 21:09