Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Handle large sized matrix in Java

Tags:

java

matrix

I currently need to do singular value decomposition with a matrix of size 48K x 50K.

I've tried JAMA but it only works for rows > columns. I've tried PCOLT, JBLAS but they return error when rows*columns > MAX_INT

Any suggestions what I should do?

Sorry if I made any mistakes in the above lines.

Thanks a lot in advance!

like image 897
Phu Tran Thanh Avatar asked Dec 30 '11 09:12

Phu Tran Thanh


3 Answers

I faced similar problems when performing SVD computations and my experience is: don't do this in Java. There are tools available which can do this more efficiently. If you really need Java, you might consider building an interface which calls the tool from inside your code. I ended up using R. I used it manually by storing the matrix in a file which could be read by R as a matrix.

By the way, if the matrix is sparse, there are various optimizations possible which would reduce memory usage and the size of the output file (if you choose to use one).

Otherwise, check out this thread to see if that helps: Handle large data structure in Java

like image 164
Pieter Avatar answered Oct 22 '22 23:10

Pieter


For really large blocks of memory, I tend to suggest using memory mapped files (perhaps this is what R does for you) You can do this in Java with some boiler plate code. Unfortunately Java doesn't directly support mapping more than 2 GB at a time, so you have to break it into sections.

import sun.misc.Cleaner;
import sun.nio.ch.DirectBuffer;

import java.io.Closeable;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.List;

public class LargeDoubleMatrix implements Closeable {
    private static final int MAPPING_SIZE = 1 << 30;
    private final RandomAccessFile raf;
    private final int width;
    private final int height;
    private final List<MappedByteBuffer> mappings = new ArrayList<MappedByteBuffer>();

    public LargeDoubleMatrix(String filename, int width, int height) throws IOException {
        this.raf = new RandomAccessFile(filename, "rw");
        try {
            this.width = width;
            this.height = height;
            long size = 8L * width * height;
            for (long offset = 0; offset < size; offset += MAPPING_SIZE) {
                long size2 = Math.min(size - offset, MAPPING_SIZE);
                mappings.add(raf.getChannel().map(FileChannel.MapMode.READ_WRITE, offset, size2));
            }
        } catch (IOException e) {
            raf.close();
            throw e;
        }
    }

    protected long position(int x, int y) {
        return (long) y * width + x;
    }

    public int width() {
        return width;
    }

    public int height() {
        return height;
    }

    public double get(int x, int y) {
        assert x >= 0 && x < width;
        assert y >= 0 && y < height;
        long p = position(x, y) * 8;
        int mapN = (int) (p / MAPPING_SIZE);
        int offN = (int) (p % MAPPING_SIZE);
        return mappings.get(mapN).getDouble(offN);
    }

    public void set(int x, int y, double d) {
        assert x >= 0 && x < width;
        assert y >= 0 && y < height;
        long p = position(x, y) * 8;
        int mapN = (int) (p / MAPPING_SIZE);
        int offN = (int) (p % MAPPING_SIZE);
        mappings.get(mapN).putDouble(offN, d);
    }

    public void close() throws IOException {
        for (MappedByteBuffer mapping : mappings)
            clean(mapping);
        raf.close();
    }

    private void clean(MappedByteBuffer mapping) {
        if (mapping == null) return;
        Cleaner cleaner = ((DirectBuffer) mapping).cleaner();
        if (cleaner != null) cleaner.clean();
    }
}

has this test which sets the diagonal value.

@Test
public  void getSetMatrix() throws IOException {
    long start = System.nanoTime();
    final long used0 = usedMemory();
    LargeDoubleMatrix matrix = new LargeDoubleMatrix("/tmp/ldm.test", 48*1000, 50*1000);
    for(int i=0;i<matrix.width();i++)
        matrix.set(i,i,i);
    for(int i=0;i<matrix.width();i++)
        assertEquals(i, matrix.get(i,i), 0.0);
    long time = System.nanoTime() - start;
    final long used = usedMemory() - used0;
    if (used==0)
        System.err.println("You need to use -XX:-UsedTLAB to see small changes in memory usage.");
    System.out.printf("Setting the diagonal took %,d ms, Heap used is %,d KB%n", time/1000/1000, used/1024);
    matrix.close();
}

private long usedMemory() {
    return Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
}

prints (when run with -XX:-UseTLAB )

Setting the diagonal took 60 ms, Heap used is 55 KB

Only the pages actually used are created. The files appears to be very large, but the space allocated is based on use.

$ ls -lh /tmp/ldm.test 
-rw-rw-r-- 1 peter peter 18G 2011-12-30 10:18 /tmp/ldm.test
$ du -sh /tmp/ldm.test 
222M    /tmp/ldm.test
like image 33
Peter Lawrey Avatar answered Oct 22 '22 23:10

Peter Lawrey


Step 1. Use a database to hold it.
Step 2. Use a multi-frontal / parallel algorithm.

This paper surveys SOTA methods for large SVD. Lanzcos algorithm on 3 processors took just over 10 minutes on a 32k X 32k matrix, but only for the smallest singular value. It's probably possible to deflate, and re-extract successive singular values -- I've always found Power Iteration with deflate good for this.

In short, make M X M_T and M_T X M and take the eigenvectors and eigenvalues to reconstruct the SVD matrices.

If you are prepared to accept approximations, this other paper is just one of many that deals with approximate algorithms. Many are based on some kind of downsampling of the columns, or of optimally representative submatrices, where you leverage the benefit of cubically smaller pieces to work with, plus parallelism.

Obviously these come with some distortion but maybe you can smooth it out for your result.

Lastly, you really need to use Strassen's method for doing the multiplications.

like image 2
Cris Stringfellow Avatar answered Oct 22 '22 22:10

Cris Stringfellow