I have a set of integer data in the range 1:10000. I need to bring them into the range 0:1.
For example, converting etc. (note that I don't want to scale the values).
Any suggestions how to do this on all the data at once?
I would simply do
x <- c(2, 14, 128, 1940, 140, 20000)
x/10^nchar(x)
## [1] 0.200 0.140 0.128 0.194 0.140 0.200
A much faster approach, which avoids the conversion to character, was offered by @Frank:
x/10^ceiling(log10(x))
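One edge case is worth checking with the log10() version: for exact powers of ten (1, 10, 100, ...), ceiling(log10(x)) is one less than the number of digits, so those values come out as 1 rather than 0.1. The following is a minimal sketch of a variant that counts digits with floor(log10(x)) + 1 instead. The helper name n_digits is hypothetical (it is not part of the original answers), and x is assumed to contain positive whole numbers only.

```r
# Hypothetical helper: number of decimal digits of positive whole numbers,
# computed arithmetically (no character conversion).
n_digits <- function(x) floor(log10(x)) + 1

x <- c(1, 10, 100, 2, 14, 128)

# ceiling() version: exact powers of ten are divided by themselves
ceil_version <- x / 10^ceiling(log10(x))

# floor() + 1 version: every value is divided by 10^(number of digits)
floor_version <- x / 10^n_digits(x)
```

For inputs that are not powers of ten the two versions should agree, so the choice only matters if such values can occur in the data.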
Benchmark
library(microbenchmark)
set.seed(123)
x <- sample(1e8, 1e6)
microbenchmark(
david = x/10^nchar(x),
davidfrank = x/10^ceiling(log10(x)),
richard1 = as.numeric(paste0(".", x)),
richard2 = as.numeric(sprintf(".%d", x))
)
# Unit: milliseconds
# expr min lq mean median uq max neval cld
# david 691.0513 822.6482 1052.2473 956.5541 1153.4779 2391.7856 100 b
# davidfrank 130.0522 164.3227 255.8397 197.3158 339.3224 576.2255 100 a
# richard1 1130.5160 1429.8314 1972.2624 1689.8454 2473.6409 4791.0558 100 c
# richard2 712.8357 926.8013 1181.5349 1103.1661 1315.4459 2753.6795 100 b
The non-mathy way would be to prepend the decimal point with paste0(), then coerce back to numeric.
x <- c(2, 14, 128, 1940, 140, 20000)
as.numeric(paste0(".", x))
# [1] 0.200 0.140 0.128 0.194 0.140 0.200
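A caveat with the string-based route, as far as I am aware: it relies on how R turns each number into text, and a double such as 100000 may be rendered in scientific notation, which would break the pasted result. A hedged sketch of one way around this is to convert to integer storage first, assuming the values are whole numbers that fit in R's integer type. The variable names x_dbl and x_int are illustrative only.

```r
x_dbl <- c(2, 14, 100000)     # stored as double
x_int <- as.integer(x_dbl)    # same values, stored as integer

# Integers are converted to text digit by digit, so the leading "."
# lands where intended.
pasted <- paste0(".", x_int)
result <- as.numeric(pasted)
```

The sprintf(".%d", x) form formats the value as an integer as well, so it should be usable in the same way.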
Update 1: There was some interest about the timings of the two originally posted methods. According to the following benchmarks, they seem to be about the same.
library(microbenchmark)
x <- 1:1e5
microbenchmark(
david = { david <- x/10^nchar(x) },
richard = { richard <- as.numeric(paste0(".", x)) }
)
# Unit: milliseconds
# expr min lq mean median uq max neval
# david 88.94391 89.18379 89.70962 89.40736 89.71012 99.68126 100
# richard 87.89776 88.17234 89.38383 88.44439 88.77052 105.06066 100
identical(richard, david)
# [1] TRUE
Update 2: I have also remembered that sprintf() is often faster than paste0(). We can also use the following.
as.numeric(sprintf(".%d", x))
Now, using the same x from above and comparing only these two choices, sprintf() shows a good improvement over paste0() (a median of roughly 61 ms versus 89 ms), as shown below.
microbenchmark(
paste0 = as.numeric(paste0(".", x)),
sprintf = as.numeric(sprintf(".%d", x))
)
# Unit: milliseconds
# expr min lq mean median uq max neval
# paste0 87.89413 88.41606 90.25795 88.82484 89.65674 107.8080 100
# sprintf 61.16524 61.23328 62.26202 61.29192 61.48316 79.1202 100