I am referring to this link, Feature Transformation using tree ensembles, for context.
Specifically, in the part of the code below from that example, method (1), using the boosting trees to generate features and then training a logistic regression (LR) on them, outperforms method (2), using the boosting trees by themselves.
My question: why does generating features with the boosting trees and then training LR on them outperform using the boosting trees alone?
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

grd = GradientBoostingClassifier(n_estimators=n_estimator)
grd_enc = OneHotEncoder()
grd_lm = LogisticRegression()
# Fit the boosting model, then one-hot encode the leaf index each sample reaches in every tree
grd.fit(X_train, y_train)
grd_enc.fit(grd.apply(X_train)[:, :, 0])
# Train LR on the encoded leaves, using a separate slice of the training data (X_train_lr)
grd_lm.fit(grd_enc.transform(grd.apply(X_train_lr)[:, :, 0]), y_train_lr)
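For completeness, a minimal sketch of how that stacked model is then scored, following the pattern of the linked example; X_test and y_test are assumed to be a held-out split, and the names are only illustrative:

from sklearn.metrics import roc_curve

# Push the test set through the same trees + encoder, then score with the LR
y_pred_grd_lm = grd_lm.predict_proba(
    grd_enc.transform(grd.apply(X_test)[:, :, 0]))[:, 1]
fpr_grd_lm, tpr_grd_lm, _ = roc_curve(y_test, y_pred_grd_lm)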
XGBoost provides a scikit-learn-compatible interface, so it is easy to use alongside scikit-learn. Since it is a boosted ensemble of trees, it usually scores better than any single individual model.
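As a rough sketch of that scikit-learn-style usage (assuming the xgboost package is installed; the parameters and variable names are illustrative, reusing X_train and y_train from above):

from xgboost import XGBClassifier

xgb = XGBClassifier(n_estimators=100, max_depth=3)
xgb.fit(X_train, y_train)
# apply() returns the leaf index per tree, so the same leaf-encoding trick can be reused
leaf_indices = xgb.apply(X_train)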
Gradient boosting can be used for both regression and classification. The power of gradient boosting machines comes from the fact that they are not limited to binary classification: they handle multi-class classification and regression problems as well, as the sketch below shows.
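A minimal sketch using scikit-learn's built-in toy datasets (the dataset choice here is just for illustration):

from sklearn.datasets import load_diabetes, load_iris
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

# Regression: predict a continuous target
X_reg, y_reg = load_diabetes(return_X_y=True)
reg = GradientBoostingRegressor(n_estimators=100).fit(X_reg, y_reg)

# Multi-class classification: iris has three classes
X_clf, y_clf = load_iris(return_X_y=True)
clf = GradientBoostingClassifier(n_estimators=100).fit(X_clf, y_clf)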
Interesting sources are paper_1 and paper_2, along with the additional references cited in them.
So to answer your questions: