
Convert numbers written in words to numbers in R

Tags:

r

My challenge is to convert number words such as "ten" and "one" in an input sentence to the digits 10 and 1:

example_input <- "I have ten apple and one orange"

The numbers may change depending on the user's requirements, and the input sentence can be tokenized. The output I want is:

my_output_toget <- "I have 10 apple and 1 orange"
Sachin Hegde asked May 07 '19 08:05


3 Answers

We can pass a key/value list as the replacement in gsubfn: each matched word that is a name in the list is replaced with the corresponding number, and all other words are left unchanged.

library(english)
library(gsubfn)
gsubfn("\\w+", setNames(as.list(1:10), as.english(1:10)), example_input)
#[1] "I have 10 apple and 1 orange"
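
If installing packages is not an option, the same lookup idea can be sketched in base R. This is only a minimal illustration, not what gsubfn does internally; the names `number_words` and `replace_number_words` are made up for this example, and the table only covers one to ten.

```r
# Hypothetical lookup table: number words as names, digits as values.
number_words <- c(one = "1", two = "2", three = "3", four = "4", five = "5",
                  six = "6", seven = "7", eight = "8", nine = "9", ten = "10")

replace_number_words <- function(text, lookup = number_words) {
  for (word in names(lookup)) {
    # \\b word boundaries keep "one" from matching inside "someone"
    pattern <- paste0("\\b", word, "\\b")
    text <- gsub(pattern, lookup[[word]], text, ignore.case = TRUE, perl = TRUE)
  }
  text
}

example_input <- "I have ten apple and one orange"
replace_number_words(example_input)
#[1] "I have 10 apple and 1 orange"
```

The word boundaries matter: without them, "one" would also be replaced inside longer words.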
akrun answered Oct 04 '22 20:10


The textclean package is handy for this task:

library(textclean)
mgsub(example_input, replace_number(seq_len(10)), seq_len(10))

[1] "I have 10 apple and 1 orange"

You just need to adjust the seq_len() argument to the largest number that can appear in your data.

Some examples:

example_input <- c("I have one hundred apple and one orange")

mgsub(example_input, replace_number(seq_len(100)), seq_len(100))

[1] "I have 100 apple and 1 orange"

example_input <- c("I have one thousand apple and one orange")

mgsub(example_input, replace_number(seq_len(1000)), seq_len(1000))

[1] "I have 1000 apple and 1 orange"

If you don't know your maximum number beforehand, you can just choose a sufficiently big number.
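
When many patterns are replaced in one pass, the order in which they are tried matters, because "one" is a prefix of "one hundred". The base R sketch below illustrates that point only; it is not a description of textclean's internals, and `lookup` and `replace_in_order` are hypothetical names.

```r
# Hypothetical lookup: patterns as names, replacements as values.
lookup <- c("one" = "1", "ten" = "10", "one hundred" = "100")

replace_in_order <- function(text, lookup, longest_first = TRUE) {
  patterns <- names(lookup)
  if (longest_first) {
    # Try longer phrases first so "one hundred" wins over "one"
    patterns <- patterns[order(nchar(patterns), decreasing = TRUE)]
  }
  for (p in patterns) {
    text <- gsub(p, lookup[[p]], text, fixed = TRUE)
  }
  text
}

x <- "I have one hundred apple and one orange"
replace_in_order(x, lookup, longest_first = FALSE)
#[1] "I have 1 hundred apple and 1 orange"
replace_in_order(x, lookup, longest_first = TRUE)
#[1] "I have 100 apple and 1 orange"
```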

tmfmnk answered Oct 04 '22 19:10


I wrote an R package for this, https://github.com/fsingletonthorn/words_to_numbers, which should work for more use cases.

devtools::install_github("fsingletonthorn/words_to_numbers")

library(wordstonumbers)

example_input <- "I have ten apple and one orange"

words_to_numbers(example_input)

[1] "I have 10 apple and 1 orange"

It also works for much more complex cases, such as:


words_to_numbers("The Library of Babel (by Jorge Luis Borges) describes a library that contains all possible four-hundred and ten page books made with a character set of twenty five characters (twenty two letters, as well as spaces, periods, and commas), with eighty lines per book and forty characters per line.")
#> [1] "The Library of Babel (by Jorge Luis Borges) describes a library that contains all possible 410 page books made with a character set of 25 characters (22 letters, as well as spaces, periods, and commas), with 80 lines per book and 40 characters per line."

Or

words_to_numbers("300 billion, 2 hundred and 79 cats")
#> [1] "300000000279 cats"
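
For readers curious how a compound phrase like "four-hundred and ten" can turn into a single value, here is a rough base R sketch of the usual accumulate-and-scale idea. It is an assumption-laden illustration, not the algorithm the package uses: `phrase_to_number` and its lookup tables are hypothetical names, and it only handles a phrase made entirely of number words (no digits, no surrounding text).

```r
# Hypothetical lookup tables for the sketch.
small <- setNames(0:19, c("zero", "one", "two", "three", "four", "five", "six",
                          "seven", "eight", "nine", "ten", "eleven", "twelve",
                          "thirteen", "fourteen", "fifteen", "sixteen",
                          "seventeen", "eighteen", "nineteen"))
tens <- setNames(seq(20, 90, by = 10), c("twenty", "thirty", "forty", "fifty",
                                         "sixty", "seventy", "eighty", "ninety"))
scales <- c(thousand = 1e3, million = 1e6, billion = 1e9)

phrase_to_number <- function(phrase) {
  # Split on spaces and hyphens, then drop the filler word "and"
  tokens <- strsplit(tolower(phrase), "[ -]+")[[1]]
  tokens <- tokens[nzchar(tokens) & tokens != "and"]
  total <- 0    # value of completed thousand/million/billion groups
  current <- 0  # value of the group being built (always below 1000)
  for (tok in tokens) {
    if (tok %in% names(small)) {
      current <- current + small[[tok]]
    } else if (tok %in% names(tens)) {
      current <- current + tens[[tok]]
    } else if (tok == "hundred") {
      current <- max(current, 1) * 100
    } else if (tok %in% names(scales)) {
      total <- total + max(current, 1) * scales[[tok]]
      current <- 0
    } else {
      stop("unknown number word: ", tok)
    }
  }
  total + current
}

phrase_to_number("twenty five")
#[1] 25
phrase_to_number("four-hundred and ten")
#[1] 410
```

Finding where number phrases start and stop inside a sentence, and mixing digits with words as in the last example above, takes more work than this sketch covers.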
FelixST answered Oct 04 '22 19:10