 

Python: K-modes explanation

Tags:

python

I am using the kmodes Python library. Can someone explain what the parameters mean?

Link: https://github.com/nicodv/kmodes#huang97

from kmodes import kmodes
km = kmodes.KModes(n_clusters=4, init='Huang', n_init=5, verbose=1)

I know n_clusters is the number of clusters to group the data into, but what are the other parameters?

wwjdm asked Mar 07 '17 03:03



1 Answer

From the source code:

    Parameters
    ----------
    n_clusters : int, optional, default: 8
        The number of clusters to form as well as the number of
        centroids to generate.
    max_iter : int, default: 300
        Maximum number of iterations of the k-modes algorithm for a
        single run.
    cat_dissim : func, default: matching_dissim
        Dissimilarity function used by the algorithm for categorical variables.
        Defaults to the matching dissimilarity function.
    init : {'Huang', 'Cao', 'random' or an ndarray}, default: 'Cao'
        Method for initialization:
        'Huang': Method in Huang [1997, 1998]
        'Cao': Method in Cao et al. [2009]
        'random': choose 'n_clusters' observations (rows) at random from
        data for the initial centroids.
        If an ndarray is passed, it should be of shape (n_clusters, n_features)
        and gives the initial centroids.
    n_init : int, default: 10
        Number of times the k-modes algorithm will be run with different
        centroid seeds. The final results will be the best output of
        n_init consecutive runs in terms of cost.
    verbose : int, optional
        Verbosity mode.

So init is just the method used to choose the starting centroids, while n_init is the number of times the whole algorithm is run from different starting centroids; the run with the lowest cost is kept as the final result.
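The difference between init, n_init and max_iter is easiest to see in a toy implementation. The sketch below is stdlib-only and is NOT the kmodes library's code: it only supports init='random' or a list of explicit starting centroids (Huang and Cao initialisation are omitted), and the function names kmodes, single_run, assign and column_modes are hypothetical. Each run iterates at most max_iter times, n_init independent runs are made, and the lowest-cost run is kept. The verbose levels (1 = per-run cost, 2 = per-iteration centroids) are this sketch's own choice.

```python
import random
from collections import Counter
from typing import List, Optional, Sequence, Tuple, Union

Row = Tuple[str, ...]


def matching_dissim(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of attributes on which two rows disagree."""
    return sum(x != y for x, y in zip(a, b))


def assign(data: List[Row], centroids: List[Row]) -> List[int]:
    """Index of the nearest centroid for every row."""
    return [min(range(len(centroids)), key=lambda k: matching_dissim(row, centroids[k]))
            for row in data]


def column_modes(rows: List[Row]) -> Row:
    """Mode of a cluster: the most frequent category in each column."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def single_run(data, n_clusters, init, max_iter, rng, verbose):
    # Initialisation: random rows (like init='random') or user-supplied centroids.
    if isinstance(init, str):
        if init != "random":
            raise ValueError("this sketch only implements init='random'")
        centroids = [tuple(r) for r in rng.sample(data, n_clusters)]
    else:
        centroids = [tuple(c) for c in init]
    labels: Optional[List[int]] = None
    for it in range(max_iter):  # max_iter caps the iterations of a single run
        new_labels = assign(data, centroids)
        if new_labels == labels:
            break  # converged: no row changed cluster
        labels = new_labels
        for k in range(n_clusters):
            members = [row for row, lab in zip(data, labels) if lab == k]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[k] = column_modes(members)
        if verbose >= 2:
            print(f"  iteration {it + 1}: centroids = {centroids}")
    labels = assign(data, centroids)  # labels consistent with the final centroids
    cost = sum(matching_dissim(row, centroids[lab]) for row, lab in zip(data, labels))
    return centroids, labels, cost


def kmodes(data, n_clusters, init: Union[str, List[Row]] = "random",
           n_init=10, max_iter=100, seed=None, verbose=0):
    rng = random.Random(seed)
    # Fixed starting centroids always give the same result, so this sketch runs once.
    runs = n_init if isinstance(init, str) else 1
    best = None
    for run in range(runs):
        result = single_run(data, n_clusters, init, max_iter, rng, verbose)
        if verbose >= 1:
            print(f"Run {run + 1}/{runs}: cost = {result[2]}")
        if best is None or result[2] < best[2]:  # keep the lowest-cost run
            best = result
    return best


data = [("red", "small", "round"), ("red", "small", "square"), ("red", "large", "round"),
        ("blue", "large", "square"), ("blue", "large", "round"), ("blue", "small", "square")]
centroids, labels, cost = kmodes(data, n_clusters=2, n_init=5, seed=0, verbose=1)
print(labels, cost)
```

Because each run can converge to a different local optimum depending on its starting centroids, running several times (n_init) and keeping the cheapest result makes the outcome less sensitive to an unlucky initialisation.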

verbose just controls how much progress output is printed to stdout (e.g. which stage the algorithm is at).

Andrew Guy answered Nov 22 '22 03:11
