Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

xgboost error message about numerical variable and label

Tags:

r

xgboost

gbm

I use the xgboost function in R, and I get the following error message

bst <- xgboost(data = germanvar, label = train$Creditability, max.depth = 2, eta = 1,nround = 2, objective = "binary:logistic")

Error in xgb.get.DMatrix(data, label, missing, weight) : 
  xgboost only support numerical matrix input,
           use 'data.matrix' to transform the data.
In addition: Warning message:
In xgb.get.DMatrix(data, label, missing, weight) :
  xgboost: label will be ignored.

Following is my full code.

credit<-read.csv("http://freakonometrics.free.fr/german_credit.csv", header=TRUE)
library(caret)
set.seed(1000)
intrain<-createDataPartition(y=credit$Creditability, p=0.7, list=FALSE) 
train<-credit[intrain, ]
test<-credit[-intrain, ]


germanvar<-train[,2:21]
str(germanvar)
bst <- xgboost(data = germanvar, label = train$Creditability, max.depth = 2, eta = 1,
               nround = 2, objective = "binary:logistic")

Data has a mixture of continuous and categorical variables.

However, because of the error message that only continuous variables can be used, all the variables were recognized as continuous, but the error message reappears.

How can I solve this problem???

like image 328
신익수 Avatar asked Oct 28 '25 16:10

신익수


1 Answers

So if you have categorical variables that are represented as numbers, it is not an ideal representation. But with deep enough trees you can get away with it. The trees will partition it eventually. I don't prefer that approach but it keeps you columns minimal, and can succeed given the right setup.

Note that xgboost takes numeric matrix as data, and numeric vector as label.

NOT INTEGERS :)

The following code will train with the inputs cast properly

credit<-read.csv("http://freakonometrics.free.fr/german_credit.csv", header=TRUE)
library(caret)
set.seed(1000)
intrain<-createDataPartition(y=credit$Creditability, p=0.7, list=FALSE) 
train<-credit[intrain, ]
test<-credit[-intrain, ]


germanvar<-train[,2:21]
label <- as.numeric(train$Creditability) ## make it a numeric NOT integer
data <-  as.matrix(germanvar)  # to matrix
mode(data) <- 'double'  # to numeric i.e double precision


bst <- xgboost(data = data, label = label, max.depth = 2, eta = 1,
               nround = 2, objective = "binary:logistic")
like image 106
T. Scharf Avatar answered Oct 31 '25 05:10

T. Scharf