
Uncommon error message converting Matrix to Sparse in R

I'm trying to run a LASSO on our dataset, and to do so I need to convert the non-numeric variables to numeric, ideally via a sparse matrix. However, whenever I call Matrix(), I get this error:

Error in asMethod(object) : invalid class 'NA' to dup_mMatrix_as_geMatrix

I thought this was due to NAs in my data, so I ran na.omit and got the same error. I tried again with a small subset of my data and got the same error again:

> sparsecombined <- Matrix(combined1[1:10,],sparse=TRUE)
Error in asMethod(object) : invalid class 'NA' to dup_mMatrix_as_geMatrix

This is the data set I tried to convert with that last line of code:

[Image: screenshot of the data set, not available]

Is there anything that jumps out that might prevent sparse conversion?

Prem asked Oct 18 '22 13:10


2 Answers

The easiest way to incorporate categorical variables into a LASSO is to use my glmnetUtils package, which provides a formula/data frame interface to glmnet.

library(glmnetUtils)

glmnet(ArrDelay ~ ArrTime + uniqueCarrier + TailNum + Origin + Dest,
       data=combined1, sparse=TRUE)

This automatically handles categorical variables via one-hot encoding (also known as dummy variables). With sparse=TRUE, it builds the model matrix as a sparse matrix.
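For readers who want to see what that encoding step looks like by hand, here is a minimal sketch using sparse.model.matrix() from the Matrix package. It is not necessarily what glmnetUtils does internally, and the `flights` data frame and its values are made up for illustration; only the column names echo the formula above.

```r
library(Matrix)

# Hypothetical toy data standing in for the asker's combined1
flights <- data.frame(
  ArrDelay      = c(5, -3, 12, 0),
  ArrTime       = c(1230, 945, 1810, 2015),
  UniqueCarrier = factor(c("AA", "UA", "AA", "DL")),
  Origin        = factor(c("JFK", "SFO", "ORD", "JFK"))
)

# Expand factors into dummy columns and store the result sparsely.
# "- 1" drops the intercept column.
x <- sparse.model.matrix(ArrDelay ~ . - 1, data = flights)
y <- flights$ArrDelay

print(colnames(x))  # numeric column plus one dummy column per encoded level
print(class(x))     # a sparse matrix class from the Matrix package

# x and y could then be passed to glmnet::glmnet(x, y), which accepts
# sparse matrices of this kind.
```

The formula interface in the answer saves you from building x and y yourself, which is the main convenience it offers.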

Hong Ooi answered Oct 21 '22 06:10


I think the error occurs because your data contains non-numeric columns.

First convert non-numeric columns such as UniqueCarrier to binary vectors using one-hot encoding, and only then convert the matrix to sparse.

Here is my code that I used for that conversion:

# Convert a comma-separated text column (e.g. Genre or Director) into binary
# indicator columns: 1 marks the presence of a term, 0 its absence.
library(tm)
library(slam)

convertToBinary <- function(category) {
  # Split each string on commas, allowing optional whitespace around them
  parts <- strsplit(as.character(category), "\\s*,\\s*")

  # Join multi-word entries (e.g. director names) with underscores so each
  # stays a single token, then rebuild one space-separated string per row
  genreVector <- vapply(parts,
                        function(p) paste(gsub(" ", "_", p), collapse = " "),
                        character(1))

  genreCorpus <- Corpus(VectorSource(genreVector))
  dtm <- DocumentTermMatrix(genreCorpus)

  # as.matrix() returns the full document-term matrix; in recent tm versions
  # inspect() is meant for printing and may show only a sample
  binaryGenreVector <- as.matrix(dtm)

  return(binaryGenreVector)
}

directorBinary = convertToBinary(x$Director)
directorBinaryDF = as.data.frame(directorBinary)
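For a data set like the asker's, the same two steps (encode first, then convert) can also be sketched without tm, using base R's model.matrix() followed by Matrix(). This is only an illustration under assumptions: the `df` data frame and its values are hypothetical, and the real data will have more columns.

```r
library(Matrix)

# Hypothetical toy data with one numeric and one character column
df <- data.frame(
  ArrTime       = c(1230, 945, 1810),
  UniqueCarrier = c("AA", "UA", "AA"),
  stringsAsFactors = FALSE
)

# Step 0: find the columns that are not numeric
non_numeric <- names(df)[!sapply(df, is.numeric)]
print(non_numeric)

# Step 1: turn them into factors and expand into 0/1 indicator columns
df[non_numeric] <- lapply(df[non_numeric], factor)
dense <- model.matrix(~ . - 1, data = df)

# Step 2: only now convert the all-numeric matrix to sparse form
sparse <- Matrix(dense, sparse = TRUE)
print(sparse)
```

Because `dense` is purely numeric by the time Matrix() sees it, the conversion no longer has to deal with character data.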

See also nograpes' answer to the question "recommenderlab, Error in asMethod(object) : invalid class 'NA' to dup_mMatrix_as_geMatrix".

DRozen answered Oct 21 '22 04:10