Distributed cross correlation matrix computation

Tags: algorithm, distributed-computing, apache-spark, distributed, cross-correlation

How can I calculate the Pearson cross-correlation matrix of a large (>10 TB) data set, possibly in a distributed manner? Any suggestions for an efficient distributed algorithm would be appreciated.
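For reference, a common way to distribute this relies on the fact that the Pearson matrix can be rebuilt from a few additive statistics. The sketch below is written under stated assumptions and is not taken from any implementation mentioned on this page. With n rows, column sums s_j and cross-product sums q_jk, the sample covariance is (q_jk - s_j * s_k / n) / (n - 1), and the correlation is cov_jk / sqrt(cov_jj * cov_kk). Every statistic is a plain sum, so each partition can compute its own and the partial results can simply be added. The plain-Python sketch stands in for a cluster: `partition_stats` plays the map step, `merge_stats` the reduce step, and `pearson_from_stats` the final step. All three names are hypothetical.

```python
import math
from functools import reduce


def partition_stats(rows, d):
    """Map step: sufficient statistics for one partition.

    Returns (n, s, q): n is the row count, s[j] the sum of column j,
    and q[j][k] the sum of x_j * x_k over this partition's rows.
    """
    n = 0
    s = [0.0] * d
    q = [[0.0] * d for _ in range(d)]
    for row in rows:
        n += 1
        for j in range(d):
            s[j] += row[j]
            for k in range(d):
                q[j][k] += row[j] * row[k]
    return n, s, q


def merge_stats(a, b):
    """Reduce step: all statistics are sums, so merging is element-wise addition."""
    n_a, s_a, q_a = a
    n_b, s_b, q_b = b
    s = [x + y for x, y in zip(s_a, s_b)]
    q = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(q_a, q_b)]
    return n_a + n_b, s, q


def pearson_from_stats(stats):
    """Final step on one node: convert the d x d sums into a correlation matrix.

    Assumes n > 1 and no constant columns (zero variance would divide by zero).
    """
    n, s, q = stats
    d = len(s)
    cov = [[(q[j][k] - s[j] * s[k] / n) / (n - 1) for k in range(d)] for j in range(d)]
    return [[cov[j][k] / math.sqrt(cov[j][j] * cov[k][k]) for k in range(d)] for j in range(d)]


# Two toy "partitions" of a 3-column data set; on a cluster each would sit on a different worker.
partitions = [
    [[1.0, 2.0, 3.0], [2.0, 4.0, 1.0]],
    [[3.0, 6.0, 2.0], [4.0, 8.0, 5.0]],
]
d = 3
total = reduce(merge_stats, (partition_stats(p, d) for p in partitions))
corr = pearson_from_stats(total)
```

Each partition's summary is d x d no matter how many rows it holds, so network traffic grows with the number of columns rather than with the size of the data set. Two caveats: the one-pass formula can lose precision when column means are large relative to their spread (shifting each column by a rough mean first helps), and when d is very large the d x d summaries themselves become the bottleneck.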

Update: I read the implementation of correlation in Apache Spark MLlib:

Pearson computation:
/home/d066537/codespark/spark/mllib/src/main/scala/org/apache/spark/mllib/stat/correlation/Correlation.scala
Covariance computation:
/home/d066537/codespark/spark/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala

But to me it looks like all the computation happens on one node, so it is not really distributed.

Please shed some light on this. I also ran it on a 3-node Spark cluster; the screenshots are below:

[Screenshot 1: entire computation timeline]
[Screenshot 2: details of one of the tasks]

As you can see from the second image, the data is pulled to one node and then the computation is done there. Am I right?
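On whether this is really distributed: in a scheme built on per-partition summaries, the expensive part (O(d^2) work per row) runs on the workers and only the d x d summaries travel over the network. The last merge and the conversion to a correlation matrix necessarily happen in one place, so a single-node stage at the end is expected. As far as I can tell from the RowMatrix source, Spark merges its partial Gramian matrices with `treeAggregate`, which combines partial results in rounds instead of sending every partition's result straight to the driver. The sketch below imitates that round-based merge in plain Python. `tree_reduce`, `add_summaries` and the toy summaries are hypothetical and are not Spark APIs.

```python
from functools import reduce


def add_summaries(a, b):
    """Merge two partial summaries; every entry is a sum, so merging is element-wise addition."""
    return [x + y for x, y in zip(a, b)]


def tree_reduce(summaries, combine, fan_in=2):
    """Combine partial results in rounds of `fan_in`, like a reduction tree.

    Returns (result, rounds). Merges within one round are independent, so on a
    cluster they could run on different workers; only the last round ends with
    a single value on one node.
    """
    level = list(summaries)
    if not level:
        raise ValueError("need at least one summary")
    rounds = 0
    while len(level) > 1:
        level = [reduce(combine, level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
        rounds += 1
    return level[0], rounds


# Toy per-partition summaries, flattened as [n, sum_col1, sum_col2] for 8 partitions.
summaries = [[2.0, float(i), float(2 * i)] for i in range(8)]
total, rounds = tree_reduce(summaries, add_summaries)
```

With 8 partitions and `fan_in=2` the merge takes 3 rounds (8, 4, 2, 1 values). If the Spark UI shows nearly all of the time in a single task rather than only at the end, it may also be worth checking how many partitions the input RDD has, since a single partition would put all of the per-row work on one executor.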

asked Feb 17 '17 17:02

Roshan Mehta

1 Answer

To start with, have a look at this to check whether things are going right. You may then refer to any of these implementations: MPI/OpenMP: Agomezl or Meismyles; MapReduce: Vangjee or Seawolf42. It would also be worth reading this before you proceed. On a different note, James's thesis provides some pointers if you are interested in computing correlations that are robust to outliers.
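When the number of columns is large, even the final d x d matrix may not fit comfortably on one node. A standard decomposition, not necessarily what the implementations linked above do, is to standardize every column in a first pass. After that, each block of the correlation matrix is Z_I^T Z_J / (n - 1) for column groups I and J. A tile depends only on its own columns, so different workers can compute and write tiles independently, and symmetry means only the upper-triangular tiles are needed. The sketch assumes column-oriented access for simplicity; with row-partitioned data each tile is itself a sum over row partitions and can be aggregated like any other Gram matrix. `standardize`, `correlation_tile` and `tiles` are hypothetical names.

```python
import math


def standardize(columns):
    """Pass 1: z-score each column (subtract the mean, divide by the sample standard deviation)."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        out.append([(x - mean) / sd for x in col])
    return out


def correlation_tile(z, rows_idx, cols_idx):
    """Pass 2: one block of the matrix, corr[j][k] = sum(z_j * z_k) / (n - 1).

    Needs only the standardized columns in rows_idx and cols_idx, so each tile
    can be assigned to a different worker.
    """
    n = len(z[0])
    return [[sum(a * b for a, b in zip(z[j], z[k])) / (n - 1) for k in cols_idx] for j in rows_idx]


def tiles(d, block):
    """Enumerate upper-triangular tile index pairs; the lower triangle follows by symmetry."""
    starts = range(0, d, block)
    for i in starts:
        for j in starts:
            if j >= i:
                yield list(range(i, min(i + block, d))), list(range(j, min(j + block, d)))


columns = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [3.0, 1.0, 2.0, 5.0],
    [4.0, 3.0, 2.0, 1.0],
]
z = standardize(columns)
result = {(tuple(r), tuple(c)): correlation_tile(z, r, c) for r, c in tiles(len(columns), 2)}
```

With 4 columns and a block size of 2 this produces 3 tiles instead of 4, and the saving approaches half as the number of blocks grows.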

answered Sep 19 '22 20:09

dangiankit
