I am trying to classify text documents into a number of categories. The code below works fine:
matrix[[i]] <- create_matrix(trainingdata[[i]][,1], language="english",removeNumbers=FALSE,stemWords=FALSE,weighting=weightTf,minWordLength=3)
container[[i]] <- create_container(matrix[[i]],trainingdata[[i]][,2],trainSize=1:50,testSize=51:100)
models[[i]] <- train_models(container[[i]], algorithms=c("MAXENT","SVM"))
results[[i]] <- classify_models(container[[i]],models[[i]])
When I run the code below to get precision, recall, and accuracy values:
analytic[[i]] <- create_analytics(container[[i]], results[[i]])
I get the following error:
Error in `row.names<-.data.frame`(`*tmp*`, value = c(NA_real_, NA_real_ :
duplicate 'row.names' are not allowed
My categories are in text format. If I convert them to numeric, the above code works fine.
Is there a workaround to keep the categories in text format and still get precision, recall, and accuracy values?
My objective is to get precision, recall, and accuracy values and a confusion matrix for a multi-class classifier. Is there any other package that provides these values for a multi-class text classifier (one vs. all)?
As user3294343 commented, converting my class field to a factor and then to numeric worked for me:
doc_matrix <- create_matrix(dataset.arff$text, language="english", removeNumbers=TRUE, stemWords=TRUE, removeSparseTerms=.998)
container <- create_container(doc_matrix, as.numeric(factor(dataset.arff$"@@class@@")), trainSize=1:1500, testSize=1501:1999, virgin=FALSE)
That solved the error for me.
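If you still want the text categories in the final report, one option is to keep the factor around so the numeric codes can be mapped back to their labels afterwards. Below is a minimal base R sketch of that idea, which also builds a confusion matrix and per-class precision and recall by hand. It does not use RTextTools at all: the `labels` and `predicted_codes` vectors are made-up stand-ins for your class column and for whatever numeric labels your classifier returns, so treat the names and data as assumptions.

```r
# Hypothetical text labels standing in for the class column
labels <- c("sports", "politics", "tech", "sports",
            "tech", "politics", "sports", "tech")

# Keep the factor so the integer codes can be mapped back to text later
label_factor <- factor(labels)
label_codes  <- as.numeric(label_factor)  # numeric codes, as passed to create_container()
label_levels <- levels(label_factor)      # lookup table: code -> text label

# Hypothetical predicted codes standing in for a classifier's numeric output
predicted_codes <- c(2, 1, 3, 2, 1, 1, 2, 3)

# Map both actual and predicted codes back to text labels
actual    <- factor(label_levels[label_codes],     levels = label_levels)
predicted <- factor(label_levels[predicted_codes], levels = label_levels)

# Confusion matrix: rows are actual classes, columns are predicted classes
confusion <- table(actual = actual, predicted = predicted)

# Per-class precision and recall, plus overall accuracy
precision <- diag(confusion) / colSums(confusion)
recall    <- diag(confusion) / rowSums(confusion)
accuracy  <- sum(diag(confusion)) / sum(confusion)

print(confusion)
print(round(cbind(precision, recall), 3))
print(accuracy)
```

Using the same `levels` for both factors keeps the table square even if some class is never predicted, which is what makes `diag()` line up with the right classes.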