This is an extension to an existing question: Convert table into matrix by column names
I am using the final answer: https://stackoverflow.com/a/2133898/1287275
The original CSV file has about 1.5M rows and three columns: row index, column index, and value. All numbers are long integers. The underlying matrix is sparse, about 220K x 220K, with an average of about 7 values per row.
The original read.table works just fine.
x <- read.table("/users/wallace/Hadoop_Local/reference/DiscoveryData6Mo.csv", header=TRUE);
My problem comes when I do the reshape command.
reshape(x, idvar="page_id", timevar="reco", direction="wide")
The CPU hits 100% and there it sits forever. The machine (a Mac) has more memory than R is using. I don't see why it should take so long to construct a sparse matrix.
I am using the default matrix package. I haven't installed anything extra. I just downloaded R a few days ago, so I should have the latest version.
Suggestions?
Thanks, Wallace
To convert a table into a matrix in R, we can use the apply function with the as.matrix.noquote function.
The data.matrix() function in R converts a data frame into a numeric matrix: it converts all the values of the data frame to numeric mode and then binds them together as a matrix.
We use the colnames() function to rename the columns of a matrix in R. To learn more about colnames(), open its help page in RStudio with help(colnames) or ?colnames.
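As a minimal sketch of the two functions just described, the following converts a small data frame with data.matrix() and then renames the columns with colnames(). The data frame, its values, and the new column names are made up for illustration and do not come from the question.

```r
# Hypothetical toy data frame: one integer column and one factor column.
toy <- data.frame(a = c(1L, 2L, 3L),
                  b = c("x", "y", "x"),
                  stringsAsFactors = TRUE)

# data.matrix() replaces the factor column with its integer codes.
m <- data.matrix(toy)

# Rename the matrix columns in place.
colnames(m) <- c("id", "group")

print(m)
```

Note that the factor column is replaced by its internal integer codes, not by anything parsed from the labels, so this is only appropriate when those codes are what you want.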
I would use the sparseMatrix function from the Matrix package. The typical usage is sparseMatrix(i, j, x), where i, j, and x are three vectors of the same length: respectively, the row indices, column indices, and values of the non-zero elements in the matrix. Here is an example where I have tried to match variable names and dimensions to your specifications:
num.pages <- 220000
num.recos <- 230000
N <- 1500000
df <- data.frame(page_id = sample.int(num.pages, N, replace=TRUE),
                 reco    = sample.int(num.recos, N, replace=TRUE),
                 value   = runif(N))
head(df)
# page_id reco value
# 1 33688 48648 0.3141030
# 2 78750 188489 0.5591290
# 3 158870 13157 0.2249552
# 4 38492 56856 0.1664589
# 5 70338 138006 0.7575681
# 6 160827 68844 0.8375410
library("Matrix")
mat <- sparseMatrix(i = df$page_id,
                    j = df$reco,
                    x = df$value,
                    dims = c(num.pages, num.recos))
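To make the behaviour easier to check by eye, here is a small sketch of the same call on a toy triplet table. The data below is invented for illustration and only the column names are borrowed from the question. It also shows two things worth knowing: by default sparseMatrix adds up the values of repeated (i, j) pairs, and a dense 220K x 220K numeric result (which is roughly what a wide reshape would have to build) would be far too large to hold, assuming 8 bytes per cell.

```r
library(Matrix)

# Hypothetical toy triplets; the pair (3, 1) deliberately appears twice.
small <- data.frame(page_id = c(1, 1, 3, 3),
                    reco    = c(2, 4, 1, 1),
                    value   = c(10, 20, 5, 7))

m <- sparseMatrix(i = small$page_id,
                  j = small$reco,
                  x = small$value,
                  dims = c(3, 4))

print(m)

# The two values supplied for (3, 1) are summed into one entry.
dup_value <- m[3, 1]

# Count of stored non-zero entries.
stored <- nnzero(m)

# Back-of-envelope size of a dense 220K x 220K numeric matrix in GiB,
# assuming 8 bytes per cell.
dense_gib <- 220000 * 220000 * 8 / 2^30
```

If your real data contains repeated (page_id, reco) pairs and summing is not what you want, aggregate or de-duplicate the data frame before building the matrix.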