I am trying to perform matrix multiplication using Apache Spark and Java.
I have 2 main questions:

1. How do I represent a matrix as a distributed data structure in Apache Spark?
2. How do I multiply two such matrices?
It all depends on the input data and dimensions, but generally speaking what you want is not an RDD but one of the distributed data structures from org.apache.spark.mllib.linalg.distributed. At the moment it provides four different implementations of DistributedMatrix:

IndexedRowMatrix - can be created directly from an RDD[IndexedRow], where an IndexedRow consists of a row index and an org.apache.spark.mllib.linalg.Vector:
    import org.apache.spark.mllib.linalg.{Vectors, Matrices}
    import org.apache.spark.mllib.linalg.distributed.{IndexedRowMatrix, IndexedRow}

    val rows = sc.parallelize(Seq(
      (0L, Array(1.0, 0.0, 0.0)),
      (1L, Array(0.0, 1.0, 0.0)),
      (2L, Array(0.0, 0.0, 1.0)))
    ).map{case (i, xs) => IndexedRow(i, Vectors.dense(xs))}

    val indexedRowMatrix = new IndexedRowMatrix(rows)
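If you later need to multiply two distributed matrices, it can help to check the inferred dimensions and convert to a type that supports the operation. A minimal sketch using the indexedRowMatrix built above (the conversions shown are standard IndexedRowMatrix methods, picked only for illustration):

    // Dimensions are inferred from the data: 3 x 3 here
    indexedRowMatrix.numRows()  // 3
    indexedRowMatrix.numCols()  // 3

    // Convert to other distributed representations when needed
    val asBlocks = indexedRowMatrix.toBlockMatrix()        // single block for this tiny example
    val asCoords = indexedRowMatrix.toCoordinateMatrix()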
RowMatrix - similar to IndexedRowMatrix but without meaningful row indices. Can be created directly from an RDD[org.apache.spark.mllib.linalg.Vector]:

    import org.apache.spark.mllib.linalg.distributed.RowMatrix

    val rowMatrix = new RowMatrix(rows.map(_.vector))
BlockMatrix - can be created from an RDD[((Int, Int), Matrix)], where the first element of the tuple contains the coordinates of the block and the second one is a local org.apache.spark.mllib.linalg.Matrix:

    import org.apache.spark.mllib.linalg.distributed.BlockMatrix

    val eye = Matrices.sparse(
      3, 3, Array(0, 1, 2, 3), Array(0, 1, 2), Array(1, 1, 1))

    val blocks = sc.parallelize(Seq(
      ((0, 0), eye), ((1, 1), eye), ((2, 2), eye)))

    val blockMatrix = new BlockMatrix(blocks, 3, 3, 9, 9)
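Before multiplying, it can be worth sanity-checking the block layout; a small sketch using the blockMatrix defined above:

    // Throws an exception if block sizes or coordinates are inconsistent
    blockMatrix.validate()

    // Overall dimensions of the 3 x 3 grid of 3 x 3 blocks
    blockMatrix.numRows()  // 9
    blockMatrix.numCols()  // 9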
CoordinateMatrix - can be created from an RDD[MatrixEntry], where a MatrixEntry consists of a row index, a column index and a value:

    import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}

    val entries = sc.parallelize(Seq(
      (0, 0, 3.0), (2, 0, -5.0), (3, 2, 1.0),
      (4, 1, 6.0), (6, 2, 2.0), (8, 1, 4.0))
    ).map{case (i, j, v) => MatrixEntry(i, j, v)}

    val coordinateMatrix = new CoordinateMatrix(entries, 9, 3)
The first two implementations support multiplication by a local Matrix:

    val localMatrix = Matrices.dense(3, 2, Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0))

    indexedRowMatrix.multiply(localMatrix).rows.collect
    // Array(IndexedRow(0,[1.0,4.0]), IndexedRow(1,[2.0,5.0]),
    //       IndexedRow(2,[3.0,6.0]))
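RowMatrix supports the same operation; a quick sketch reusing localMatrix and the rowMatrix defined above (row order in the collected result is not guaranteed, since the rows carry no indices):

    // Multiplication by a local matrix also works for RowMatrix;
    // the result is another RowMatrix backed by an RDD[Vector]
    rowMatrix.multiply(localMatrix).rows.collect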
The third implementation, BlockMatrix, can be multiplied by another BlockMatrix, as long as the number of columns per block in this matrix matches the number of rows per block of the other matrix. CoordinateMatrix doesn't support multiplication, but it is easy to create and to transform into the other types of distributed matrices:

    blockMatrix.multiply(coordinateMatrix.toBlockMatrix(3, 3))
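The product is itself a BlockMatrix; for a result this small it can be collected to the driver, or converted back to a row-oriented type for further distributed work. A minimal sketch, assuming the 9 x 3 product from the line above:

    val product = blockMatrix.multiply(coordinateMatrix.toBlockMatrix(3, 3))

    // Only safe for small results: materializes the full 9 x 3 matrix on the driver
    val local = product.toLocalMatrix()

    // Or stay distributed, e.g. as an IndexedRowMatrix
    val productRows = product.toIndexedRowMatrix()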
Each type has its own strengths and weaknesses, and there are some additional factors to consider when you use sparse or dense elements (Vectors or block Matrices). Multiplying by a local matrix is usually preferable, since it doesn't require expensive shuffling.
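To illustrate the sparse/dense choice: the same data can be stored either way, and for data with many zeros the sparse form is usually much cheaper to store and ship around. A small sketch (not part of the original answer):

    // Dense: stores every entry, including the zeros
    val dense = Vectors.dense(1.0, 0.0, 0.0)

    // Sparse: stores the size plus non-zero indices and values only
    val sparse = Vectors.sparse(3, Array(0), Array(1.0))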
You can find more details about each type in the MLlib Data Types guide.