I'm implementing logistic regression. I managed to get probabilities out of it, and I can make predictions for a 2-class classification task.
My question is:
For my final model, I have the weights and the training data. There are 2 features, so my weight vector has 2 rows.
How do I plot this? I saw this post, but I don't quite understand the answer. Do I need a contour plot?
For the gradient, m, consider two distinct points on the decision boundary, $(x_{a1}, x_{a2})$ and $(x_{b1}, x_{b2})$, so that $m = \frac{x_{b2} - x_{a2}}{x_{b1} - x_{a1}}$. Along the boundary line,

$$0 = w_1 x_{b1} + w_2 x_{b2} + b - (w_1 x_{a1} + w_2 x_{a2} + b) \;\Rightarrow\; -w_2 (x_{b2} - x_{a2}) = w_1 (x_{b1} - x_{a1}) \;\Rightarrow\; m = -\frac{w_1}{w_2}.$$
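In code, the boundary $w_1 x_1 + w_2 x_2 + b = 0$ rearranges to $x_2 = -(w_1 x_1 + b)/w_2$, so the line can be plotted directly from the fitted weights. Here is a minimal sketch assuming a scikit-learn LogisticRegression fit on two features; the synthetic data is only for illustration:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Illustrative two-feature, two-class data; substitute your own X and y.
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
clf = LogisticRegression().fit(X, y)

w1, w2 = clf.coef_[0]      # the 2-row weight vector from the question
b = clf.intercept_[0]      # the bias term

# Boundary: w1*x1 + w2*x2 + b = 0  =>  x2 = -(w1*x1 + b) / w2
x1 = np.linspace(X[:, 0].min(), X[:, 0].max(), 100)
x2 = -(w1 * x1 + b) / w2

plt.scatter(X[:, 0], X[:, 1], c=y, cmap="RdBu", edgecolor="white")
plt.plot(x1, x2, "k--", label="decision boundary")
plt.legend()
plt.show()

The dashed line has gradient $-w_1/w_2$, matching the derivation above.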
Single-line decision boundary: the basic strategy for drawing a decision boundary on a scatter plot is to find a single line that separates the data points into regions signifying different classes.
The fundamental application of logistic regression is to determine a decision boundary for a binary classification problem. Although the baseline case is a binary boundary, the approach extends naturally to multi-class classification.
The decision boundary is the line or surface that separates the samples into their different classes, and depending on the model it can be linear or nonlinear. In the case of a logistic regression model, the decision boundary is a straight line.
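To see why it is straight: the model predicts $P(y = 1 \mid x) = \sigma(w^\top x + b)$, where $\sigma(z) = 1/(1 + e^{-z})$. The boundary sits where both classes are equally likely, i.e.

$$\sigma(w^\top x + b) = 0.5 \iff w^\top x + b = 0,$$

which is linear in $x$: a line for two features, a hyperplane in general.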
An advantage of the logistic regression classifier is that once you fit it, you can get probabilities for any sample vector. That may be more interesting to plot. Here's an example using scikit-learn:
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="white")
First, generate the data and fit the classifier to the training set:
# Fit on the first 100 samples, keeping the rest as a test set.
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, weights=[.5, .5], random_state=15)
clf = LogisticRegression().fit(X[:100], y[:100])
Next, make a continuous grid of values and evaluate the probability of each (x, y) point in the grid:
xx, yy = np.mgrid[-5:5:.01, -5:5:.01]
grid = np.c_[xx.ravel(), yy.ravel()]
probs = clf.predict_proba(grid)[:, 1].reshape(xx.shape)
Now, plot the probability grid as a contour map and additionally show the test set samples on top of it:
f, ax = plt.subplots(figsize=(8, 6))
contour = ax.contourf(xx, yy, probs, 25, cmap="RdBu",
                      vmin=0, vmax=1)
ax_c = f.colorbar(contour)
ax_c.set_label("$P(y = 1)$")
ax_c.set_ticks([0, .25, .5, .75, 1])

ax.scatter(X[100:, 0], X[100:, 1], c=y[100:], s=50,
           cmap="RdBu", vmin=-.2, vmax=1.2,
           edgecolor="white", linewidth=1)

ax.set(aspect="equal",
       xlim=(-5, 5), ylim=(-5, 5),
       xlabel="$X_1$", ylabel="$X_2$")
Logistic regression lets you classify new samples at whatever threshold you want, so it doesn't inherently have one "decision boundary." But, of course, a common decision rule is p = .5. We can draw just that contour level using the code above:
f, ax = plt.subplots(figsize=(8, 6))
ax.contour(xx, yy, probs, levels=[.5], cmap="Greys", vmin=0, vmax=.6)

ax.scatter(X[100:, 0], X[100:, 1], c=y[100:], s=50,
           cmap="RdBu", vmin=-.2, vmax=1.2,
           edgecolor="white", linewidth=1)

ax.set(aspect="equal",
       xlim=(-5, 5), ylim=(-5, 5),
       xlabel="$X_1$", ylabel="$X_2$")
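As a quick check, that p = .5 contour should coincide with the straight line $x_2 = -(w_1 x_1 + b)/w_2$ derived from the weights earlier. A small sketch, reusing clf and ax from the code above, overlays it:

# Overlay the closed-form boundary on the same axes; it should sit
# exactly on the p = .5 contour.
w1, w2 = clf.coef_[0]
b = clf.intercept_[0]
xs = np.linspace(-5, 5, 100)
ax.plot(xs, -(w1 * xs + b) / w2, "k:", label="closed-form boundary")
ax.legend()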