R Algorithm for generating all possible factorizations of a number

Q: What is the Complexity of prime factorization?

In this article, we study an efficient method to calculate the prime factorization using O(n) space and O(log n) time complexity with pre-computation allowed. Prerequisites : Sieve of Eratosthenes, Least prime factor of numbers till n.

Tags:

algorithm

r

factorization

For example, consider the number 96. It can be written in following ways:

1. 96
2. 48 * 2
3. 24 * 2 * 2
4. 12 * 2 * 2 * 2
5. 6 * 2 * 2 * 2 * 2
6. 3 * 2 * 2 * 2 * 2 * 2
7. 4 * 3 * 2 * 2 * 2
8. 8 * 3 * 2 * 2
9. 6 * 4 * 2 * 2
10. 16 * 3 * 2
11. 4 * 4 * 3 * 2
12. 12 * 4 * 2
13. 8 * 6 * 2
14. 32 * 3
15. 8 * 4 * 3
16. 24 * 4
17. 6 * 4 * 4
18. 16 * 6
19. 12 * 8

I know this is related to partitions as any number written as the power, n, of a single prime, p, is simply the number of ways you can write n. For example, to find all of the factorizations of 2^5, we must find all ways to write 5. They are:

1+1+1+1+1 ==>> 2^1 * 2^1 * 2^1 * 2^1 * 2^1
1+1+1+2 ==>> 2^1 * 2^1 * 2^1 * 2^2
1+1+3 ==>> 2^1 * 2^1 * 2^3
1+2+2 ==>> 2^1 * 2^2 * 2^2
1+4 ==>> 2^1 * 2^4
2+3 ==>> 2^2 * 2^3
5 ==>> 2^5

I found a wonderful article by Jerome Kelleher about partition generating algorithms here. I have adapted one of his python algorithms to R. The code is below:

library(partitions)   ## using P(n) to determine number of partitions of an integer
IntegerPartitions <- function(n) {
    a <- 0L:n
    k <- 2L
    a[2L] <- n
    MyParts <- vector("list", length=P(n))
    count <- 0L
    while (!(k==1L)) {
        x <- a[k-1L]+1L
        y <- a[k]-1L
        k <- k-1L
        while (x<=y) {a[k] <- x; y <- y-x; k <- k+1L}
        a[k] <- x+y
        count <- count+1L
        MyParts[[count]] <- a[1L:k]
    }
    MyParts
}

I attempted to extend this method to numbers with more one than one prime factor, but my code got very clunky. After wrestling with this idea for a while, I decided to try a different route. My new algorithm makes no use of generating partitions whatsoever. It is more of a "lookback" algorithm that takes advantage of factorizations that have already been generated. The code is below:

FactorRepresentations <- function(n) {

MyFacts <- EfficientFactorList(n)
MyReps <- lapply(1:n, function(x) x)

    for (k in 4:n) {
        if (isprime(k)) {next}
        myset <- MyFacts[[k]]
        mylist <- vector("list")
        mylist[[1]] <- k
        count <- 1L
            for (j in 2:ceiling(length(myset)/2)) {
                count <- count+1L
                temp <- as.integer(k/myset[j])
                myvec <- sort(c(myset[j], temp), decreasing=TRUE)
                mylist[[count]] <- myvec
                MyTempRep <- MyReps[[temp]]

                if (isprime(temp) || temp==k) {next}

                if (length(MyTempRep)>1) {
                    for (i in 1:length(MyTempRep)) {
                        count <- count+1L
                        myvec <- sort(c(myset[j], MyTempRep[[i]]), decreasing=TRUE)
                        mylist[[count]] <- myvec
                    }
                }
            }
        MyReps[[k]] <- unique(mylist)
    }
    MyReps
}

The first function in the code above is simply a function that generates all factors. Here is the code if you are curious:

EfficientFactorList <- function(n) {
    MyFactsList <- lapply(1:n, function(x) 1)
    for (j in 2:n) {
        for (r in seq.int(j, n, j)) {MyFactsList[[r]] <- c(MyFactsList[[r]], j)}
    }
    MyFactsList
}

My algorithm is just okay if you are only concerned with numbers less than 10,000 (it generates all factorizations for every number <= 10,000 in about 17 seconds), but it definitely doesn't scale well. I would like to find an algorithm that has the same premise of generating a list of all factorizations for every number less than or equal to n as some of the applications I have in mind will reference a given factorization multiple times, thus having it in a list should be faster than generating it on the fly everytime (I know there is a memory cost here).

795

asked Apr 27 '15 20:04

Joseph Wood

1 Answers

Your function EfficientFactorList does a good job of efficiently grabbing the set of all factors for each number from 1 to n, so all that remains is getting the set of all factorizations. As you suggest, using the factorizations of smaller values to compute the factorizations for larger values seems like it could be efficient.

Consider a number k, with factors k_1, k_2, ..., k_n. A naive approach would be to combine the factorizations of k/k_1, k/k_2, ..., k/k_n, appending k_i to each factorization of k/k_i to yield a factorization of k. As a worked example, consider computing the factorizations of 16 (which has non-trivial factors 2, 4, and 8). 2 has factorization {2}, 4 has factorizations {4, 2*2}, and 8 has factorizations {8, 4*2, 2*2*2}, so we would compute the full set of factorizations by first computing {2*8, 4*4, 2*2*4, 8*2, 4*2*2, 2*2*2*2} and then taking the unique factorizations, {8*2, 4*4, 4*2*2, 2*2*2*2}. Adding 16 yields the final answer.

A more efficient approach is to notice that we don't need to append k_i to all the factorizations of k/k_i. For instance, we didn't need to add 2*2*4 from the factorization of 4 because this is already included from the factorization of 8. Similarly, we didn't need to add 2*8 from the factorization of 2 because this is already included from the factorization of 8. In general, we only need to include a factorization from k/k_i if all the values in the factorization are k_i or greater.

In code:

library(gmp)
all.fact <- function(n) {
  facts <- EfficientFactorList(n)
  facts[[1]] <- list(1)
  for (x in 2:n) {
    if (length(facts[[x]]) == 2) {
      facts[[x]] <- list(x)  # Prime number
    } else {
      x.facts <- facts[[x]][facts[[x]] != 1 & facts[[x]] <= (x^0.5+0.001)]
      allSmaller <- lapply(x.facts, function(pf) lapply(facts[[x/pf]], function(y) {
        if (all(y >= pf)) {
            return(c(pf, y))
        } else {
            return(NULL)
        }
      }))
      allSmaller <- do.call(c, allSmaller)
      facts[[x]] <- c(x, allSmaller[!sapply(allSmaller, function(y) is.null(y))])
    }
  }
  return(facts)
}

This is a good deal quicker than the posted code:

system.time(f1 <- FactorRepresentations(10000))
#    user  system elapsed 
#  13.470   0.159  13.765 
system.time(f2 <- all.fact(10000))
#    user  system elapsed 
#   1.602   0.028   1.641

As a sanity check, it also returns the same number of factorizations for each number:

lf1 <- sapply(f1, length)
lf2 <- sapply(f2, length)
all.equal(lf1, lf2)
# [1] TRUE

175

answered Nov 15 '22 04:11

josliber

Related questions
                            
                                Sweave to R markdown file conversion - code or converters available?
                            
                                How to accurately display SI prefix for numbers in y-axis scale of plot made with ggplot2 in R?
                            
                                The logic of passing a missing argument to the R-function rep
                            
                                How to use dplyr for programming
                            
                                Plot going off graph in gvisMotionChart
                            
                                How to minimize size of object of class "lm" without compromising it being passed to predict()
                            
                                subsetting by multi-column index/key in dplyr (have data.table soln)
                            
                                How to search PubMed or other databases using R
                            
                                Apply a function to a multi-dimensional array: R vs MATLAB
                            
                                How to fix prettytable to display chinese character properly
                            
                                Can my use of paste0() in R be corrected so that this function runs as fast as the original Python example?
                            
                                Include a specific chunk from one markdown document in another document
                            
                                Where can I find the limiting distribution of the Kolmogorov-Smirnov distance in R?
                            
                                Counting unique pairs of categorical variables in R [duplicate]
                            
                                RTextTools create_matrix returns non-character argument error
                            
                                Subsetting in H2O R
                            
                                Color path segments in ggvis / layer_paths
                            
                                Why is there a difference between length(f) and length(g) in this example?
                            
                                Make a landscape table in a word document
                            
                                R plot() produces date labels in Russian

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With