
Create_Analytics in RTextTools

I am trying to classify text documents into a number of categories. The code below works fine:

matrix[[i]] <- create_matrix(trainingdata[[i]][,1], language="english", removeNumbers=FALSE, stemWords=FALSE, weighting=weightTf, minWordLength=3)
container[[i]] <- create_container(matrix[[i]], trainingdata[[i]][,2], trainSize=1:50, testSize=51:100)
models[[i]] <- train_models(container[[i]], algorithms=c("MAXENT","SVM"))
results[[i]] <- classify_models(container[[i]], models[[i]])

When I run the code below to get precision, recall, and accuracy values:

analytic[[i]]  <- create_analytics(container[[i]], results[[i]])

I get the following error:

Error in `row.names<-.data.frame`(`*tmp*`, value = c(NA_real_, NA_real_ : 
  duplicate 'row.names' are not allowed

My categories are in text format. If I convert the categories to numeric, the above code works fine.

Is there a workaround that keeps the categories in text format and still gives precision, recall, and accuracy values?

My objective is to get precision, recall, and accuracy values and a confusion matrix for a multi-class classifier. Is there any other package that provides these values for a multi-class text classifier (one vs. all)?

Prasanna Nandakumar asked May 09 '14 09:05



1 Answer

As user3294343 commented, converting the class field to a factor and then to numeric worked for me:

doc_matrix <- create_matrix(dataset.arff$text, language="english", removeNumbers=TRUE, stemWords=TRUE, removeSparseTerms=.998)
container <- create_container(doc_matrix, as.numeric(factor(dataset.arff$"@@class@@")), trainSize=1:1500, testSize=1501:1999, virgin=FALSE)

That solved the error for me.
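The text labels do not have to be lost after this conversion, because the factor keeps them as its levels. The following is a minimal base R sketch, not RTextTools code: the label vector and the predicted codes are made-up stand-ins for the class column and for the output of a classifier trained on the numeric codes. It shows how the codes might be mapped back to text and how a confusion matrix with per-class precision and recall could be computed by hand.

```r
# Hypothetical text labels standing in for the class column.
labels_text <- c("sports", "politics", "tech", "sports", "tech", "politics")

# factor() records the distinct labels; as.numeric() returns their integer codes.
labels_factor <- factor(labels_text)
labels_code   <- as.numeric(labels_factor)  # the values passed to create_container()

# Hypothetical predicted codes, as a classifier trained on the codes might return.
predicted_code <- c(2, 1, 3, 2, 3, 3)

# Map the codes back to the original text labels through the factor levels.
predicted_text <- levels(labels_factor)[predicted_code]

# Confusion matrix with text labels: rows are actual, columns are predicted.
confusion <- table(
  actual    = labels_factor,
  predicted = factor(predicted_text, levels = levels(labels_factor))
)

# Per-class precision and recall, plus overall accuracy.
true_pos  <- diag(confusion)
precision <- true_pos / colSums(confusion)
recall    <- true_pos / rowSums(confusion)
accuracy  <- sum(true_pos) / sum(confusion)

print(confusion)
print(round(cbind(precision, recall), 3))
print(accuracy)
```

Keeping `labels_factor` around is the important part: its levels are the lookup table between the numeric codes used for training and the text categories used for reporting.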

dsg answered Oct 06 '22 00:10
