I want to convert CSV data in coordinate format (COO) into a local matrix. Currently I'm first converting it to a CoordinateMatrix and then converting that to a LocalMatrix. But is there a better way to do this?
Example data:
0,5,5.486978435
0,3,0.438472867
0,0,6.128832321
0,7,5.295923198
0,1,7.738270234
Code:
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}

// build a MatrixEntry from each (rowIndex, colIndex, value) line
val loadG = sqlContext.read.option("header", "false").csv("file.csv").rdd
  .map(row => MatrixEntry(row.getString(0).toLong, row.getString(1).toLong, row.getString(2).toDouble))
val G = new CoordinateMatrix(loadG)
val matrixG = G.toBlockMatrix().toLocalMatrix()
A LocalMatrix will be stored on a single machine and hence makes no use of Spark's strengths; in other words, using Spark here is a bit wasteful, although still possible. The easiest way to get the CSV file into a LocalMatrix is to first read the CSV with plain Scala, not Spark:
import scala.io.Source

// parse each "row,col,value" line into an (Int, Int, Double) tuple
val entries = Source.fromFile("data.csv").getLines()
  .map(_.split(","))
  .map(a => (a(0).toInt, a(1).toInt, a(2).toDouble))
  .toSeq
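Note that Source.fromFile leaves the file handle open. On Scala 2.13+, a variant using scala.util.Using closes it automatically; this is just a sketch of the same parsing logic wrapped in a managed resource (note the eager .toList, since getLines() is lazy and the file is closed when the block exits):

import scala.io.Source
import scala.util.Using

val entries = Using.resource(Source.fromFile("data.csv")) { src =>
  src.getLines()
    .map(_.split(","))
    .map(a => (a(0).toInt, a(1).toInt, a(2).toDouble))
    .toList // materialize before the file is closed
}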
The SparseMatrix variant of LocalMatrix has a method for building a matrix from COO-formatted data. The number of rows and columns needs to be specified to use it. Since the matrix is sparse, the dimensions should in most cases be given by hand (the data may have no entry in the last row or column), but it is possible to derive them from the largest indices in the data as follows:
val numRows = entries.map(_._1).max + 1
val numCols = entries.map(_._2).max + 1
Then create the matrix:
import org.apache.spark.mllib.linalg.SparseMatrix

val matrixG = SparseMatrix.fromCOO(numRows, numCols, entries)
The matrix will be stored in CSC format on the machine. Printing matrixG for the example input above yields the following output:
1 x 8 CSCMatrix
(0,0) 6.128832321
(0,1) 7.738270234
(0,3) 0.438472867
(0,5) 5.486978435
(0,7) 5.295923198
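To see concretely what CSC storage means, the backing arrays of the result can be inspected; colPtrs, rowIndices, and values are public fields of org.apache.spark.mllib.linalg.SparseMatrix (a small sketch, continuing from the matrixG built above):

// CSC layout: values in column-major order, with rowIndices giving the
// row of each value and colPtrs marking where each column's entries start
println(matrixG.colPtrs.mkString(", "))
println(matrixG.rowIndices.mkString(", "))
println(matrixG.values.mkString(", "))

If a dense local matrix is needed instead, matrixG.toDense converts the result to a DenseMatrix.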