Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R Algorithm for generating all possible factorizations of a number

For example, consider the number 96. It can be written in following ways:

1. 96
2. 48 * 2
3. 24 * 2 * 2
4. 12 * 2 * 2 * 2
5. 6 * 2 * 2 * 2 * 2
6. 3 * 2 * 2 * 2 * 2 * 2
7. 4 * 3 * 2 * 2 * 2
8. 8 * 3 * 2 * 2
9. 6 * 4 * 2 * 2
10. 16 * 3 * 2
11. 4 * 4 * 3 * 2
12. 12 * 4 * 2
13. 8 * 6 * 2
14. 32 * 3
15. 8 * 4 * 3
16. 24 * 4
17. 6 * 4 * 4
18. 16 * 6
19. 12 * 8

I know this is related to partitions as any number written as the power, n, of a single prime, p, is simply the number of ways you can write n. For example, to find all of the factorizations of 2^5, we must find all ways to write 5. They are:

  1. 1+1+1+1+1 ==>> 2^1 * 2^1 * 2^1 * 2^1 * 2^1
  2. 1+1+1+2 ==>> 2^1 * 2^1 * 2^1 * 2^2
  3. 1+1+3 ==>> 2^1 * 2^1 * 2^3
  4. 1+2+2 ==>> 2^1 * 2^2 * 2^2
  5. 1+4 ==>> 2^1 * 2^4
  6. 2+3 ==>> 2^2 * 2^3
  7. 5 ==>> 2^5

I found a wonderful article by Jerome Kelleher about partition generating algorithms here. I have adapted one of his python algorithms to R. The code is below:

library(partitions)   ## using P(n) to determine number of partitions of an integer
IntegerPartitions <- function(n) {
    a <- 0L:n
    k <- 2L
    a[2L] <- n
    MyParts <- vector("list", length=P(n))
    count <- 0L
    while (!(k==1L)) {
        x <- a[k-1L]+1L
        y <- a[k]-1L
        k <- k-1L
        while (x<=y) {a[k] <- x; y <- y-x; k <- k+1L}
        a[k] <- x+y
        count <- count+1L
        MyParts[[count]] <- a[1L:k]
    }
    MyParts
}

I attempted to extend this method to numbers with more one than one prime factor, but my code got very clunky. After wrestling with this idea for a while, I decided to try a different route. My new algorithm makes no use of generating partitions whatsoever. It is more of a "lookback" algorithm that takes advantage of factorizations that have already been generated. The code is below:

FactorRepresentations <- function(n) {

MyFacts <- EfficientFactorList(n)
MyReps <- lapply(1:n, function(x) x)

    for (k in 4:n) {
        if (isprime(k)) {next}
        myset <- MyFacts[[k]]
        mylist <- vector("list")
        mylist[[1]] <- k
        count <- 1L
            for (j in 2:ceiling(length(myset)/2)) {
                count <- count+1L
                temp <- as.integer(k/myset[j])
                myvec <- sort(c(myset[j], temp), decreasing=TRUE)
                mylist[[count]] <- myvec
                MyTempRep <- MyReps[[temp]]

                if (isprime(temp) || temp==k) {next}

                if (length(MyTempRep)>1) {
                    for (i in 1:length(MyTempRep)) {
                        count <- count+1L
                        myvec <- sort(c(myset[j], MyTempRep[[i]]), decreasing=TRUE)
                        mylist[[count]] <- myvec
                    }
                }
            }
        MyReps[[k]] <- unique(mylist)
    }
    MyReps
}

The first function in the code above is simply a function that generates all factors. Here is the code if you are curious:

EfficientFactorList <- function(n) {
    MyFactsList <- lapply(1:n, function(x) 1)
    for (j in 2:n) {
        for (r in seq.int(j, n, j)) {MyFactsList[[r]] <- c(MyFactsList[[r]], j)}
    }
    MyFactsList
}

My algorithm is just okay if you are only concerned with numbers less than 10,000 (it generates all factorizations for every number <= 10,000 in about 17 seconds), but it definitely doesn't scale well. I would like to find an algorithm that has the same premise of generating a list of all factorizations for every number less than or equal to n as some of the applications I have in mind will reference a given factorization multiple times, thus having it in a list should be faster than generating it on the fly everytime (I know there is a memory cost here).

like image 795
Joseph Wood Avatar asked Apr 27 '15 20:04

Joseph Wood


People also ask

What is the Complexity of prime factorization?

In this article, we study an efficient method to calculate the prime factorization using O(n) space and O(log n) time complexity with pre-computation allowed. Prerequisites : Sieve of Eratosthenes, Least prime factor of numbers till n.


1 Answers

Your function EfficientFactorList does a good job of efficiently grabbing the set of all factors for each number from 1 to n, so all that remains is getting the set of all factorizations. As you suggest, using the factorizations of smaller values to compute the factorizations for larger values seems like it could be efficient.

Consider a number k, with factors k_1, k_2, ..., k_n. A naive approach would be to combine the factorizations of k/k_1, k/k_2, ..., k/k_n, appending k_i to each factorization of k/k_i to yield a factorization of k. As a worked example, consider computing the factorizations of 16 (which has non-trivial factors 2, 4, and 8). 2 has factorization {2}, 4 has factorizations {4, 2*2}, and 8 has factorizations {8, 4*2, 2*2*2}, so we would compute the full set of factorizations by first computing {2*8, 4*4, 2*2*4, 8*2, 4*2*2, 2*2*2*2} and then taking the unique factorizations, {8*2, 4*4, 4*2*2, 2*2*2*2}. Adding 16 yields the final answer.

A more efficient approach is to notice that we don't need to append k_i to all the factorizations of k/k_i. For instance, we didn't need to add 2*2*4 from the factorization of 4 because this is already included from the factorization of 8. Similarly, we didn't need to add 2*8 from the factorization of 2 because this is already included from the factorization of 8. In general, we only need to include a factorization from k/k_i if all the values in the factorization are k_i or greater.

In code:

library(gmp)
all.fact <- function(n) {
  facts <- EfficientFactorList(n)
  facts[[1]] <- list(1)
  for (x in 2:n) {
    if (length(facts[[x]]) == 2) {
      facts[[x]] <- list(x)  # Prime number
    } else {
      x.facts <- facts[[x]][facts[[x]] != 1 & facts[[x]] <= (x^0.5+0.001)]
      allSmaller <- lapply(x.facts, function(pf) lapply(facts[[x/pf]], function(y) {
        if (all(y >= pf)) {
            return(c(pf, y))
        } else {
            return(NULL)
        }
      }))
      allSmaller <- do.call(c, allSmaller)
      facts[[x]] <- c(x, allSmaller[!sapply(allSmaller, function(y) is.null(y))])
    }
  }
  return(facts)
}

This is a good deal quicker than the posted code:

system.time(f1 <- FactorRepresentations(10000))
#    user  system elapsed 
#  13.470   0.159  13.765 
system.time(f2 <- all.fact(10000))
#    user  system elapsed 
#   1.602   0.028   1.641 

As a sanity check, it also returns the same number of factorizations for each number:

lf1 <- sapply(f1, length)
lf2 <- sapply(f2, length)
all.equal(lf1, lf2)
# [1] TRUE
like image 175
josliber Avatar answered Nov 15 '22 04:11

josliber