As far as I know, there is no ready-made package that computes the AIC for k-means in Python, so I am trying to calculate it by hand to find the optimal number of clusters in my dataset (I'm using k-means for clustering).
I'm following the equation from Wikipedia:
AIC = 2k - 2*ln(L), where k is the number of estimated parameters and L is the maximised likelihood.
Below is my current code:
from sklearn import cluster

range_n_clusters = range(2, 10)
for n_clusters in range_n_clusters:
    model = cluster.KMeans(n_clusters=n_clusters, init='k-means++', n_init=10,
                           max_iter=300, tol=0.0001, verbose=0,
                           random_state=None, copy_x=True)
    model.fit(X)
    centers = model.cluster_centers_
    labels = model.labels_
    likelihood = ?????  # this is the part I don't know how to compute
    aic = 2 * len(X.columns) - 2 * likelihood  # using the number of features as k
    print(aic)
Any pointers on how to calculate the likelihood value?
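For reference, one common way to get a likelihood for plain k-means is to treat each cluster as a spherical Gaussian with a shared variance estimated from the within-cluster sum of squares (this is an assumption; k-means itself does not define a likelihood). A rough sketch, where kmeans_log_likelihood is just an illustrative helper name:

import numpy as np

def kmeans_log_likelihood(X, centers, labels):
    # Approximate log-likelihood of a fitted k-means model, treating each
    # cluster as a spherical Gaussian with one pooled variance (assumption).
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    # Within-cluster sum of squared distances (same quantity as model.inertia_)
    sse = ((X - centers[labels]) ** 2).sum()
    var = sse / (n * d)  # pooled variance shared by all clusters
    log_lik = 0.0
    for j in range(centers.shape[0]):
        mask = labels == j
        n_j = mask.sum()
        if n_j == 0:
            continue
        cluster_sse = ((X[mask] - centers[j]) ** 2).sum()
        # Mixing-weight term plus the spherical Gaussian density of the cluster's points
        log_lik += (n_j * np.log(n_j / n)
                    - 0.5 * n_j * d * np.log(2 * np.pi * var)
                    - 0.5 * cluster_sse / var)
    return log_lik

Note that the k in 2k should then be the number of fitted parameters (roughly n_clusters * n_features centre coordinates plus the mixing weights and the variance), not the number of columns in X.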
UPDATE: using a Gaussian Mixture Model to calculate the AIC:
Isn't the AIC plot supposed to look like a curve, rather than a straight line?
My plotting code:
import matplotlib.pyplot as plt
from sklearn import mixture

def aic(X):
    range_n_clusters = range(2, 10)
    aic_list = []
    for n_clusters in range_n_clusters:
        # Fit a GMM initialised from k-means and record its AIC
        model = mixture.GaussianMixture(n_components=n_clusters, init_params='kmeans')
        model.fit(X)
        aic_list.append(model.aic(X))
    plt.plot(range_n_clusters, aic_list, marker='o')
    plt.show()
I'm assuming you're using scikit-learn for this. In that case, there is a model closely related to k-means called the Gaussian mixture model. A Gaussian mixture model can be initialised from a k-means clustering and then fits a Gaussian component around each k-means centre. This gives you a probability density over your input data, and the advantage is that you can compute the likelihood, and therefore the AIC, directly.
So you can do:
from sklearn.mixture import GaussianMixture

# n_clusters is the candidate number of clusters/components
model = GaussianMixture(n_components=n_clusters, init_params='kmeans')
model.fit(X)
print(model.aic(X))
Easy as Py.
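If you then want to choose the number of clusters, you can fit one model per candidate count and keep the one with the lowest AIC. A minimal sketch (assuming X holds your data as an array or DataFrame; the variable names are just for illustration):

import numpy as np
from sklearn.mixture import GaussianMixture

candidate_counts = range(2, 10)
aics = []
for n in candidate_counts:
    # init_params='kmeans' seeds the mixture from a k-means clustering
    gmm = GaussianMixture(n_components=n, init_params='kmeans', random_state=0)
    gmm.fit(X)
    aics.append(gmm.aic(X))

# Lower AIC is better
best_n = list(candidate_counts)[int(np.argmin(aics))]
print('Lowest AIC at n_components =', best_n)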