
How can I efficiently initialize this sparse array in Mathematica?

I'm trying to solve a rather large linear programming problem in Mathematica, but for some reason the bottleneck is setting up the array of linear constraints.

My code for initializing the matrix looks like this:

AbsoluteTiming[S = SparseArray[{{i_, 1} -> iaa[[i]],
    {i_, j_} /; Divisible[a[[j]], aa[[i]]] -> 1.}, {2*n, n}]]

Here n is 4455, iaa is a list of reals, and a, aa are lists of integers. The output I get for this line is

{2652.014773,SparseArray[<111742>,{8910,4455}]}

In other words, it takes about 44 minutes to initialize this matrix, even though it has only 111,742 nonzero entries. For comparison, actually solving the linear program takes only 17 seconds. What gives?

Edit: Also, can anyone explain why this takes up so much memory as it is running? Because in user time, this calculation takes less than ten minutes... most of the computation time is spent paging through memory.

Is Mathematica for some reason storing this matrix as a non-sparse matrix while it is building it? Because that would be really really dumb.

user1015507 asked Oct 26 '11 22:10


1 Answer

You can surely do a lot better. Here is some code based on the low-level sparse array API posted here, which I reproduce to keep the code self-contained:

ClearAll[spart, getIC, getJR, getSparseData, getDefaultElement, makeSparseArray];
HoldPattern[spart[SparseArray[s___], p_]] := {s}[[p]];
getIC[s_SparseArray] := spart[s, 4][[2, 1]];
getJR[s_SparseArray] := Flatten@spart[s, 4][[2, 2]];
getSparseData[s_SparseArray] := spart[s, 4][[3]];
getDefaultElement[s_SparseArray] := spart[s, 3];
makeSparseArray[dims : {_, _}, jc : {__Integer}, ir : {__Integer}, data_List, defElem_: 0] := 
    SparseArray @@ {Automatic, dims,  defElem, {1, {jc, List /@ ir}, data}};
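To see what these accessors return, here is a small sketch on a 2×2 array. The array m is made up for illustration, and the sketch assumes the definitions above have already been evaluated. The internal layout is compressed sparse row: a list of row pointers, the column index of each stored element, and the stored values.

```mathematica
(* Hypothetical example: 5 at position {1, 2}, 7 at position {2, 1} *)
m = SparseArray[{{1, 2} -> 5, {2, 1} -> 7}, {2, 2}];

getIC[m]              (* row pointers: {0, 1, 2} *)
getJR[m]              (* column index of each stored element: {2, 1} *)
getSparseData[m]      (* stored values: {5, 7} *)
getDefaultElement[m]  (* 0 *)

(* Rebuilding from the pieces should reproduce the same array *)
m2 = makeSparseArray[Dimensions[m], getIC[m], getJR[m], getSparseData[m]];
Normal[m2] == Normal[m]  (* True *)
```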



Clear[formSparseDivisible];
formSparseDivisible[a_, aa_, iaa_, chunkSize_: 100] :=
  Module[{getDataChunkCode, i, start, ic, jr, sparseData, dims,  dataChunk, res},
    getDataChunkCode :=
      If[# === {}, {}, SparseArray[1 - Unitize@(Mod[#, aa] & /@ #)]] &@
        If[i*chunkSize >= Length[a],
           {},
           Take[a, {i*chunkSize + 1, Min[(i + 1)*chunkSize, Length[a]]}]];  
    i = 0;
    start = getDataChunkCode;
    i++;
    ic = getIC[start];
    jr = getJR[start];
    sparseData = getSparseData[start];
    dims = Dimensions[start];        
    While[True,
      dataChunk = getDataChunkCode;
      i++;
      If[dataChunk === {}, Break[]];
      ic = Join[ic, Rest@getIC[dataChunk] + Last@ic];
      jr = Join[jr, getJR[dataChunk]];
      sparseData = Join[sparseData, getSparseData[dataChunk]];
      dims[[1]] += First[Dimensions[dataChunk]];
    ];
    res = Transpose[makeSparseArray[dims, ic, jr, sparseData]];
    res[[All, 1]] = N@iaa;
    res]
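The row-pointer arithmetic inside the loop is the step that is easiest to get wrong, so here is a minimal sketch of it on two hypothetical chunks. The arrays c1 and c2 are made-up examples, and the helper definitions above are assumed to have been evaluated. Appending a chunk means dropping its leading 0 pointer and shifting the remaining pointers by the number of elements already stored.

```mathematica
c1 = SparseArray[{{1, 0, 1}, {0, 1, 0}}];  (* 2 rows, 3 nonzeros *)
c2 = SparseArray[{{0, 0, 1}}];             (* 1 row, 1 nonzero *)

(* Same joining steps as in the While loop above *)
ic = Join[getIC[c1], Rest@getIC[c2] + Last@getIC[c1]];  (* {0, 2, 3, 4} *)
jr = Join[getJR[c1], getJR[c2]];                         (* {1, 3, 2, 3} *)
data = Join[getSparseData[c1], getSparseData[c2]];      (* {1, 1, 1, 1} *)

(* The stitched array has the rows of c1 followed by the rows of c2 *)
stacked = makeSparseArray[{3, 3}, ic, jr, data];
Normal[stacked] == {{1, 0, 1}, {0, 1, 0}, {0, 0, 1}}  (* True *)
```

In formSparseDivisible each chunk row corresponds to one element of a, which is why the stitched array is transposed at the end.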

Now, here are the timings:

In[249]:= 
n = 1500;
iaa = aa = Range[2 n];
a = Range[n];
AbsoluteTiming[res = formSparseDivisible[a, aa, iaa, 100];]

Out[252]= {0.2656250, Null}

In[253]:= AbsoluteTiming[
  res1 = SparseArray[{{i_, 1} :> 
  iaa[[i]], {i_, j_} /; Divisible[a[[j]], aa[[i]]] -> 1.}, {2*n, n}];]

Out[253]= {29.1562500, Null}

So we get a roughly 100-fold speedup for this array size. And of course, the results are the same:

In[254]:= Normal@res1 == Normal@res
Out[254]= True

The main idea of the solution is to vectorize the problem (Mod), and build the resulting sparse array incrementally, in chunks, using the low-level API above.
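As a small illustration of the vectorized test (the lists aSmall and aaSmall are made up for this sketch): Mod threads over a list, so a single call checks one element of a against every element of aa, and Unitize turns the remainders into a 0/1 mask.

```mathematica
aaSmall = {1, 2, 3, 4};
aSmall = {4, 6};

Mod[6, aaSmall]                (* {0, 0, 0, 2} *)
1 - Unitize[Mod[6, aaSmall]]   (* {1, 1, 1, 0}: 1 where aaSmall[[i]] divides 6 *)

(* One row per element of aSmall; transposing gives the
   {Length[aaSmall], Length[aSmall]} layout used in the question *)
mask = Transpose[(1 - Unitize[Mod[#, aaSmall]] &) /@ aSmall];

(* Same result as testing every position with Divisible *)
mask == Table[Boole[Divisible[aSmall[[j]], aaSmall[[i]]]],
   {i, Length[aaSmall]}, {j, Length[aSmall]}]  (* True *)
```

The chunking in formSparseDivisible applies this same mask to a block of elements of a at a time and converts each block to a SparseArray before the next one is computed.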

EDIT

The code assumes that the lists have the right lengths: a should have length n, while aa and iaa should have length 2n. So, to compare with other answers, the test code has to be modified slightly (for a only):

n = 500;
iaa = RandomReal[{0, 1}, 2 n];
a = Range[ n]; aa = RandomInteger[{1, 4 n}, 2 n];


In[300]:= 
AbsoluteTiming[U=SparseArray[ReplacePart[Outer[Boole[Divisible[#1,#2]]&,
a[[1;;n]],aa],1->iaa]]\[Transpose]]
AbsoluteTiming[res = formSparseDivisible[a,aa,iaa,100]]

Out[300]= {0.8281250,SparseArray[<2838>,{1000,500}]}
Out[301]= {0.0156250,SparseArray[<2838>,{1000,500}]}

In[302]:= Normal@U==Normal@res
Out[302]= True

EDIT 2

A matrix of your desired size is built in about 3 seconds on my not very fast laptop (Mathematica 8), with fairly modest memory usage as well:

In[323]:= 
n=5000;
iaa=RandomReal[{0,1},2 n];
a=Range[ n];aa=RandomInteger[{1,4 n},2 n];
AbsoluteTiming[res = formSparseDivisible[a,aa,iaa,200]]

Out[326]= {3.0781250,SparseArray[<36484>,{10000,5000}]}
Leonid Shifrin answered Sep 29 '22 07:09