How do you handle graphs like this: 
using scikit-learn's LogisticRegression model? Is there a way to handle these sorts of problems easily using scikit-learn and a standard X, y input that maps to a graph like this?
A promising approach, if you really want to use logistic regression in this particular setting, would be to transform your coordinates from the Cartesian system to the polar system. From the visualization, it seems that in that system your data will be (almost) linearly separable.
This can be done as described here: Python conversion between coordinates
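As a rough sketch of the polar idea (not the asker's actual data): the snippet below generates two concentric rings with make_circles as a stand-in, converts each point to (radius, angle), and cross-validates a plain LogisticRegression on the transformed features. The helper name to_polar and the random_state values are assumptions made for illustration.

```python
import numpy as np
from sklearn import datasets, linear_model, model_selection

# Synthetic stand-in for the rings in the question (an assumption)
X, y = datasets.make_circles(n_samples=200, factor=0.5, noise=0.05,
                             random_state=0)

def to_polar(points):
    """Map rows of (x, y) to rows of (radius, angle). Hypothetical helper."""
    radius = np.hypot(points[:, 0], points[:, 1])
    angle = np.arctan2(points[:, 1], points[:, 0])
    return np.column_stack([radius, angle])

X_polar = to_polar(X)

# In polar coordinates the two rings differ mainly in radius,
# so a linear decision boundary can separate them.
clf = linear_model.LogisticRegression()
polar_scores = model_selection.cross_val_score(clf, X_polar, y, cv=5)
print(polar_scores.mean())
```

This only works this neatly when the rings are centred on the origin; for real data you would probably want to subtract the centre of the rings first.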
There have been a couple of answers already, but none of them has mentioned any preprocessing of the data. So I will show both ways of looking at your problem.
First up, I'll look at some manifold learning to transform your data into another space.
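# Do some imports that I'll be using
import numpy as np  # needed for the decision-region plot further down
from sklearn import datasets, manifold, linear_model
from sklearn import model_selection, ensemble, metrics
from matplotlib import pyplot as plt
# IPython/Jupyter magic; drop this line in a plain script
%matplotlib inline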
# Make some data that looks like yours
X, y = datasets.make_circles(n_samples=200, factor=.5,
                             noise=.05)
First of all, let's look at your current problem:
plt.scatter(X[:, 0], X[:, 1], c=y)
clf = linear_model.LogisticRegression()
scores = model_selection.cross_val_score(clf, X, y)
print(scores.mean())
Outputs:

0.440433749257
So you can see this data looks like yours, and we get a terrible cross-validated accuracy with logistic regression. If you're really attached to logistic regression, what we can do is project your data into a different space using some sort of manifold learning, for example:
Xd = manifold.LocallyLinearEmbedding().fit_transform(X)
plt.scatter(Xd[:, 0], Xd[:, 1], c=y)
clf = linear_model.LogisticRegression()
scores = model_selection.cross_val_score(clf, Xd, y)
print(scores.mean())
Outputs:

1.0
So you can see that your data is now perfectly linearly separable after the LocallyLinearEmbedding, and we get a much better classifier accuracy!
The other option available to you, which other people have already mentioned, is to use a different model. While there are many options available, I'm just going to show an example using RandomForestClassifier. I'm only going to train on half the data so we can evaluate the accuracy on an unbiased held-out set. I only used CV previously because it's quick and easy!
clf = ensemble.RandomForestClassifier().fit(X[:100], y[:100])
print(metrics.accuracy_score(y[100:], clf.predict(X[100:])))
Outputs:
0.97
So we're getting a good accuracy! If you're interested to see what's going on, we can lift some code from one of the awesome scikit-learn tutorials.
plot_step = 0.02
x_min, x_max = X[:, 0].min() - .1, X[:, 0].max() + .1
y_min, y_max = X[:, 1].min() - .1, X[:, 1].max() + .1
xx, yy = np.meshgrid(np.arange(x_min, x_max, plot_step),
                     np.arange(y_min, y_max, plot_step))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
cs = plt.contourf(xx, yy, Z, alpha=0.5)
plt.scatter(X[:, 0], X[:, 1], c=y)
Outputs:

So this shows the areas of your space that are being classified into each class using the Random Forest model.
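A side note on the half-and-half split used for the random forest: slicing X[:100] is fine here because make_circles shuffles its output by default, but data that arrives sorted by class would give a lopsided split. A minimal sketch of the same evaluation with train_test_split, assuming the same synthetic data (the variable names and random_state values are illustrative only):

```python
from sklearn import datasets, ensemble, metrics, model_selection

# Same kind of synthetic data as above (an assumption, not the asker's data)
X, y = datasets.make_circles(n_samples=200, factor=0.5, noise=0.05,
                             random_state=0)

# Shuffle and split 50/50, keeping the class balance equal in both halves
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)

forest = ensemble.RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Accuracy on the half the model never saw during training
holdout_accuracy = metrics.accuracy_score(y_test, forest.predict(X_test))
print(holdout_accuracy)
```

The exact number will vary with the seed and the scikit-learn version, so treat it as a ballpark rather than something to compare digit for digit with the 0.97 above.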
Two ways to solve the same problem. I leave working out which is best as an exercise to the reader...