
Optimize log entropy calculation in sparse matrix

Tags: c++, c, math

I have a 3007 x 1644 term-document matrix. I'm trying to weight the term frequencies in each document, so I'm using the log-entropy formula from http://en.wikipedia.org/wiki/Latent_semantic_indexing#Term_Document_Matrix (see the entropy formula in the last row of the table).
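
Written out, the weighting I'm trying to compute is the following, as I read the linked table (tf_ij is the count of term i in document j, gf_i is its total count over all documents, and n is the number of documents):

p_{ij} = \frac{tf_{ij}}{gf_i}

g_i = 1 + \sum_{j=1}^{n} \frac{p_{ij} \log p_{ij}}{\log n}

a_{ij} = g_i \cdot \log(tf_{ij} + 1)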

I'm successfully doing this but my code is running for >7 minutes. Here's the code:

int N = mat.cols();
for (int i = 1; i <= mat.rows(); i++) {
    double gfi = sum(mat(i, colon()))(1, 1); // gf_i: total occurrences of term i
    double g = 0;
    if (gfi != 0) { // avoid divide-by-zero
        for (int j = 1; j <= N; j++) {
            double tfij = mat(i, j);                   // tf_ij: count of term i in document j
            double pij = gfi == 0 ? 0.0 : tfij / gfi;  // p_ij = tf_ij / gf_i
            pij = pij + 1;                             // avoid log(0)
            double G = (pij * log(pij)) / log(N);
            g = g + G;
        }
    }

    double gi = 1 - g; // global (entropy) weight for term i
    for (int j = 1; j <= N; j++) {
        double tfij = mat(i, j) + 1;  // avoid log(0)
        double aij = gi * log(tfij);  // a_ij = g_i * log(tf_ij + 1)
        mat(i, j) = aij;
    }
}

Does anyone have ideas on how I can optimize this to make it faster? Also, mat is a RealSparseMatrix from the amlpp matrix library.

UPDATE: The code runs on Linux Mint with 4 GB RAM and an AMD Athlon II dual-core CPU.

Running time before the change: > 7 min

After @Kerrek SB's answer: 4.1 sec

asked Dec 31 '13 by Emmanuel John




1 Answer

Here's a very naive rewrite that removes some redundancies:

int const N = mat.cols();
double const logN = log(N);

for (int i = 1; i <= mat.rows(); ++i)
{
    double const gfi = sum(mat(i, colon()))(1, 1);  // sum of occurrence of terms
    double g = 0;

    if (gfi != 0)
    {
        for (int j = 1; j <= N; ++j)
        {
            double const pij = mat(i, j) / gfi + 1;
            g += pij * log(pij);
        }
        g /= logN;
    }

    for (int j = 1; j <= N; ++j)
    {
        mat(i, j) = (1 - g) * log(mat(i, j) + 1);
    }
}

Also make sure that the matrix data structure is sane (e.g. a flat array accessed in strides; not a bunch of dynamically allocated rows).
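
For illustration, here is a generic sketch of what "a flat array accessed in strides" looks like. This is not the amlpp RealSparseMatrix layout (the FlatMatrix name and interface are just made up for the example), only the general idea for a dense row-major matrix:

#include <vector>

// Row-major flat storage: element (i, j) lives at index i * cols + j, so a whole
// row is contiguous in memory and an inner loop over j walks the array linearly.
struct FlatMatrix {
    int rows, cols;
    std::vector<double> data; // single allocation of size rows * cols

    FlatMatrix(int r, int c) : rows(r), cols(c), data(static_cast<size_t>(r) * c, 0.0) {}

    double& operator()(int i, int j)       { return data[static_cast<size_t>(i) * cols + j]; } // 0-based
    double  operator()(int i, int j) const { return data[static_cast<size_t>(i) * cols + j]; }
};

With separately allocated rows (e.g. std::vector<std::vector<double>> or double**), each row lives in a different part of the heap, so the same loops pay for extra pointer chasing and cache misses.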

Also, I think the first + 1 is a bit silly. You know that x -> x * log(x) is continuous at zero with limit zero, so you should write:

double const pij = mat(i, j) / gfi;
if (pij != 0) { g += pij * log(pij); }

In fact, you might even write the first inner for loop like this, avoiding a division when it isn't needed:

        for (int j = 1; j <= N; ++j)
        {
            if (double pij = mat(i, j))
            {
                pij /= gfi;
                g += pij * log(pij);
            }
        }
answered Oct 13 '22 by Kerrek SB