Convert Python xgboost DMatrix to numpy ndarray or pandas DataFrame

I'm following an xgboost example from their main Git repository: https://github.com/dmlc/xgboost/blob/master/demo/guide-python/basic_walkthrough.py#L64

In this example they read the data files directly into a DMatrix:

import xgboost as xgb

dtrain = xgb.DMatrix('../data/agaricus.txt.train')
dtest = xgb.DMatrix('../data/agaricus.txt.test')

I looked at the DMatrix code, and there seems to be no way to take a quick look at how the data is structured, as we normally do in pandas with pandas.DataFrame.head().

The xgboost documentation mentions that a numpy.ndarray can be converted to an xgboost.DMatrix. Can we somehow convert it back, from xgboost.DMatrix to numpy.ndarray, or perhaps to a pandas DataFrame? I don't see a way to do this in their code, but perhaps someone knows one?

Or is there a way to take a quick look at what the data in an xgboost.DMatrix looks like?

Thanks in advance, Howard

asked May 18 '16 20:05

howard

2 Answers

To elaborate on @jcaine's answer, you can use sklearn to load the files, then convert them to ordinary numpy arrays:

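from sklearn.datasets import load_svmlight_file

# load_svmlight_file returns a tuple: (features as a scipy CSR matrix, labels)
train_data = load_svmlight_file('demo/data/agaricus.txt.train')
X = train_data[0].toarray()  # dense numpy.ndarray of features
y = train_data[1]            # 1-D numpy.ndarray of labels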

I haven't found a way to convert directly from a DMatrix to numpy arrays yet.
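
The arrays returned by load_svmlight_file can also be wrapped in a pandas DataFrame to get the head() view the question asks about. This is only a minimal sketch, assuming pandas and scikit-learn are installed; the three-row sample file and the column names f0..f3 are made up for illustration and stand in for the agaricus data:

```python
import os
import tempfile

import pandas as pd
from sklearn.datasets import load_svmlight_file

# Hypothetical three-row libsvm file standing in for agaricus.txt.train.
# Each line is "<label> <index>:<value> ...".
sample = "1 1:1 3:1\n0 2:1\n1 1:1 2:1 4:1\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.libsvm")
    with open(path, "w") as f:
        f.write(sample)
    X_sparse, y = load_svmlight_file(path)

# Densify the CSR matrix and wrap it in a DataFrame.
# The column names are invented here; the file itself carries no names.
df = pd.DataFrame(
    X_sparse.toarray(),
    columns=[f"f{i}" for i in range(X_sparse.shape[1])],
)
df.insert(0, "label", y)

print(df.head())
```

Calling toarray() materialises every zero, so for a wide, sparse dataset it may be safer to slice a few rows of the CSR matrix first and densify only those.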

answered Sep 24 '22 12:09

Peter

Howard,

I believe that xgb.DMatrix assumes the libsvm data format. You can load this data into a sparse CSR matrix using scikit-learn's load_svmlight_file: http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_svmlight_file.html

You can then partition the response variable and the features using the example at the bottom of the page.

answered Sep 25 '22 12:09

jcaine

Tags: python, pandas, numpy, xgboost
