For example, consider the number 96. It can be written in following ways:
1. 96
2. 48 * 2
3. 24 * 2 * 2
4. 12 * 2 * 2 * 2
5. 6 * 2 * 2 * 2 * 2
6. 3 * 2 * 2 * 2 * 2 * 2
7. 4 * 3 * 2 * 2 * 2
8. 8 * 3 * 2 * 2
9. 6 * 4 * 2 * 2
10. 16 * 3 * 2
11. 4 * 4 * 3 * 2
12. 12 * 4 * 2
13. 8 * 6 * 2
14. 32 * 3
15. 8 * 4 * 3
16. 24 * 4
17. 6 * 4 * 4
18. 16 * 6
19. 12 * 8
I know this is related to partitions as any number written as the power, n, of a single prime, p, is simply the number of ways you can write n. For example, to find all of the factorizations of 2^5, we must find all ways to write 5. They are:
I found a wonderful article by Jerome Kelleher about partition generating algorithms here. I have adapted one of his python algorithms to R. The code is below:
library(partitions) ## using P(n) to determine number of partitions of an integer
IntegerPartitions <- function(n) {
a <- 0L:n
k <- 2L
a[2L] <- n
MyParts <- vector("list", length=P(n))
count <- 0L
while (!(k==1L)) {
x <- a[k-1L]+1L
y <- a[k]-1L
k <- k-1L
while (x<=y) {a[k] <- x; y <- y-x; k <- k+1L}
a[k] <- x+y
count <- count+1L
MyParts[[count]] <- a[1L:k]
}
MyParts
}
I attempted to extend this method to numbers with more one than one prime factor, but my code got very clunky. After wrestling with this idea for a while, I decided to try a different route. My new algorithm makes no use of generating partitions whatsoever. It is more of a "lookback" algorithm that takes advantage of factorizations that have already been generated. The code is below:
FactorRepresentations <- function(n) {
MyFacts <- EfficientFactorList(n)
MyReps <- lapply(1:n, function(x) x)
for (k in 4:n) {
if (isprime(k)) {next}
myset <- MyFacts[[k]]
mylist <- vector("list")
mylist[[1]] <- k
count <- 1L
for (j in 2:ceiling(length(myset)/2)) {
count <- count+1L
temp <- as.integer(k/myset[j])
myvec <- sort(c(myset[j], temp), decreasing=TRUE)
mylist[[count]] <- myvec
MyTempRep <- MyReps[[temp]]
if (isprime(temp) || temp==k) {next}
if (length(MyTempRep)>1) {
for (i in 1:length(MyTempRep)) {
count <- count+1L
myvec <- sort(c(myset[j], MyTempRep[[i]]), decreasing=TRUE)
mylist[[count]] <- myvec
}
}
}
MyReps[[k]] <- unique(mylist)
}
MyReps
}
The first function in the code above is simply a function that generates all factors. Here is the code if you are curious:
EfficientFactorList <- function(n) {
MyFactsList <- lapply(1:n, function(x) 1)
for (j in 2:n) {
for (r in seq.int(j, n, j)) {MyFactsList[[r]] <- c(MyFactsList[[r]], j)}
}
MyFactsList
}
My algorithm is just okay if you are only concerned with numbers less than 10,000 (it generates all factorizations for every number <= 10,000 in about 17 seconds), but it definitely doesn't scale well. I would like to find an algorithm that has the same premise of generating a list of all factorizations for every number less than or equal to n as some of the applications I have in mind will reference a given factorization multiple times, thus having it in a list should be faster than generating it on the fly everytime (I know there is a memory cost here).
In this article, we study an efficient method to calculate the prime factorization using O(n) space and O(log n) time complexity with pre-computation allowed. Prerequisites : Sieve of Eratosthenes, Least prime factor of numbers till n.
Your function EfficientFactorList
does a good job of efficiently grabbing the set of all factors for each number from 1 to n, so all that remains is getting the set of all factorizations. As you suggest, using the factorizations of smaller values to compute the factorizations for larger values seems like it could be efficient.
Consider a number k, with factors k_1, k_2, ..., k_n. A naive approach would be to combine the factorizations of k/k_1, k/k_2, ..., k/k_n, appending k_i to each factorization of k/k_i to yield a factorization of k. As a worked example, consider computing the factorizations of 16 (which has non-trivial factors 2, 4, and 8). 2 has factorization {2}, 4 has factorizations {4, 2*2}, and 8 has factorizations {8, 4*2, 2*2*2}, so we would compute the full set of factorizations by first computing {2*8, 4*4, 2*2*4, 8*2, 4*2*2, 2*2*2*2} and then taking the unique factorizations, {8*2, 4*4, 4*2*2, 2*2*2*2}. Adding 16 yields the final answer.
A more efficient approach is to notice that we don't need to append k_i to all the factorizations of k/k_i. For instance, we didn't need to add 2*2*4 from the factorization of 4 because this is already included from the factorization of 8. Similarly, we didn't need to add 2*8 from the factorization of 2 because this is already included from the factorization of 8. In general, we only need to include a factorization from k/k_i if all the values in the factorization are k_i or greater.
In code:
library(gmp)
all.fact <- function(n) {
facts <- EfficientFactorList(n)
facts[[1]] <- list(1)
for (x in 2:n) {
if (length(facts[[x]]) == 2) {
facts[[x]] <- list(x) # Prime number
} else {
x.facts <- facts[[x]][facts[[x]] != 1 & facts[[x]] <= (x^0.5+0.001)]
allSmaller <- lapply(x.facts, function(pf) lapply(facts[[x/pf]], function(y) {
if (all(y >= pf)) {
return(c(pf, y))
} else {
return(NULL)
}
}))
allSmaller <- do.call(c, allSmaller)
facts[[x]] <- c(x, allSmaller[!sapply(allSmaller, function(y) is.null(y))])
}
}
return(facts)
}
This is a good deal quicker than the posted code:
system.time(f1 <- FactorRepresentations(10000))
# user system elapsed
# 13.470 0.159 13.765
system.time(f2 <- all.fact(10000))
# user system elapsed
# 1.602 0.028 1.641
As a sanity check, it also returns the same number of factorizations for each number:
lf1 <- sapply(f1, length)
lf2 <- sapply(f2, length)
all.equal(lf1, lf2)
# [1] TRUE
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With