
xgboost predict method returns the same predicted value for all rows

I've created an xgboost classifier in Python:

train is a pandas dataframe with 100k rows and 50 features as columns. target is a pandas series

import xgboost as xgb

xgb_classifier = xgb.XGBClassifier(nthread=-1, max_depth=3, silent=0,
                                   objective='reg:linear', n_estimators=100)
xgb_classifier = xgb_classifier.fit(train, target)

predictions = xgb_classifier.predict(test)

However, after training, when I use this classifier to predict values, the entire results array is the same number. Any idea why this is happening?

Data clarification: ~50 numerical features with a numerical target

I've also tried RandomForestRegressor from sklearn with the same data, and it gives realistic predictions. Could this be a legitimate bug in the xgboost implementation?
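For reference, a minimal sketch of that comparison (assuming the same train, target and test objects as above; the RandomForestRegressor settings here are placeholders):

from sklearn.ensemble import RandomForestRegressor

# Same data, different model: this comparison produced varied,
# realistic predictions, unlike the xgboost model above.
rf = RandomForestRegressor(n_estimators=100, n_jobs=-1, random_state=0)
rf.fit(train, target)
rf_predictions = rf.predict(test)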

asked Nov 02 '15 by mistakeNot

People also ask

How does XGBoost predict?

There are 2 predictors in XGBoost (3 if you have the one-api plugin enabled), namely cpu_predictor and gpu_predictor. The default option is auto, so that XGBoost can employ some heuristics for saving GPU memory during training. They might produce slightly different outputs due to floating-point errors.
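As a rough sketch, the predictor can be pinned explicitly via the low-level xgb.train API (using the question's train/target data; note that recent xgboost releases replace the predictor parameter with device='cpu'/'cuda'):

import xgboost as xgb

# Pin the predictor explicitly instead of relying on the default 'auto'.
# reg:squarederror is the current name of the reg:linear objective.
params = {'objective': 'reg:squarederror', 'predictor': 'cpu_predictor'}
booster = xgb.train(params, xgb.DMatrix(train, label=target), num_boost_round=50)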

What is the output of XGBoost?

Output is a 4-dim array with (rows, groups, columns + 1, columns + 1) as its shape. As in the predict-contributions case, whether approx_contribs is used does not change the output shape. If strict_shape is set to False, it can have 3 or 4 dims depending on the underlying model.
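That 4-dim shape describes predict(..., pred_interactions=True); a small sketch, reusing the booster trained in the sketch above:

# Feature-interaction contributions; with strict_shape=True the result is
# always 4-dimensional: (rows, groups, columns + 1, columns + 1).
interactions = booster.predict(xgb.DMatrix(train), pred_interactions=True,
                               strict_shape=True)
print(interactions.shape)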

What is the initial prediction for all observations in XGBoost?

XGBoost starts by making the same initial prediction (the base_score parameter, 0.5 by default) for every sample; each subsequent tree then corrects the residuals of that prediction.
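A minimal sketch of overriding that initial prediction, assuming the numerical target from the question:

import xgboost as xgb

# Start boosting from the mean of the target instead of the default 0.5.
reg = xgb.XGBRegressor(n_estimators=100, max_depth=3,
                       base_score=float(target.mean()))
reg.fit(train, target)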

What is DMatrix in XGBoost?

DMatrix is an internal data structure used by XGBoost, optimized for both memory efficiency and training speed. You can construct a DMatrix from many different data sources, such as a file path (os.PathLike/string), a NumPy array, or a pandas DataFrame.
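For illustration, a minimal sketch of building a DMatrix from the question's pandas objects and training with the low-level API:

import xgboost as xgb

# Wrap the pandas objects in a DMatrix and train via xgb.train.
dtrain = xgb.DMatrix(train, label=target)
params = {'objective': 'reg:squarederror', 'max_depth': 3}
bst = xgb.train(params, dtrain, num_boost_round=100)
preds = bst.predict(xgb.DMatrix(test))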


2 Answers

This question has received several responses, both in this thread and elsewhere.

I was having a similar issue with both XGBoost and LGBM. For me, the solution was to increase the size of the training dataset.

I was training on a local machine using a random sample (~0.5%) of a large sparse dataset (200,000 rows and 7,000 columns) because I did not have enough local memory for the algorithm. It turned out that the array of predicted values was just an array of the average value of the target variable, which suggests the model was underfitting. One solution to an underfitting model is to train it on more data, so I reran the analysis on a machine with more memory and the issue was resolved: the prediction array was no longer an array of average target values. On the other hand, the issue could simply have been that the slice of predicted values I was looking at was predicted from training data with very little information (e.g. mostly 0's and NaN's). For training data with very little information, predicting the average value of the target feature is a reasonable outcome.
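A quick way to check for this failure mode (names follow the question; the tolerance is arbitrary):

import numpy as np

# If every prediction is (numerically) the mean of the training target,
# the model has effectively learned nothing beyond the base prediction.
print(np.unique(predictions))
print(float(target.mean()))
print('collapsed to target mean:',
      np.allclose(predictions, target.mean(), atol=1e-3))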

None of the other suggested solutions I came across were helpful for me. To summarize, the suggestions included: 1) check whether gamma is too high; 2) make sure your target labels are not included in your training dataset; 3) check whether max_depth is too small. (A rough sketch of these checks follows below.)
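A rough sketch of those checks, assuming the scikit-learn-style API used in the question (the parameter values are placeholders):

import xgboost as xgb

# 1) Lower gamma (less pruning) and 3) allow deeper trees, then retrain.
model = xgb.XGBRegressor(gamma=0, max_depth=6, n_estimators=100)
model.fit(train, target)

# 2) Make sure the target is not also present as a feature column.
leaked = [c for c in train.columns if train[c].equals(target)]
print('columns identical to the target:', leaked)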

answered Oct 20 '22 by Blane


One possible reason is that you're applying too high a penalty through the gamma parameter. Compare the mean of your training response variable with the predictions and check whether they are close. If so, the model is restricting the predictions too much in order to keep train-rmse and val-rmse as close as possible. The higher the value of gamma, the simpler the prediction, so you end up with something like the mean of the training set as the prediction, i.e. a naive prediction.
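A small sketch of that check: compare the training-target mean against the predictions, and watch how the prediction spread changes as gamma is relaxed (the parameter values here are arbitrary):

import xgboost as xgb

print('training target mean:', float(target.mean()))

# With a large gamma the trees are pruned back to almost nothing and the
# predictions collapse toward a constant; lowering gamma restores variance.
for gamma in [10.0, 1.0, 0.0]:
    model = xgb.XGBRegressor(gamma=gamma, max_depth=3, n_estimators=100)
    model.fit(train, target)
    preds = model.predict(test)
    print(f'gamma={gamma}: prediction std = {preds.std():.4f}')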

answered Oct 20 '22 by Shahidur