overplot multiple sets of data with hexbin

I am doing some KMeans clustering on a large and really dense data set and I am trying to figure out the best way to visualize the clusters.

In 2D, hexbin looks like it would do a good job, but I am unable to overplot the clusters on the same figure. I want to call hexbin on each cluster separately, with a different colormap for each, but for some reason this does not seem to work. The image shows what I get when I try to plot a second and third set of data.

Any suggestions on how to go about this?

[Image: hexbin plot after adding a second and third data set]

After some fiddling, I was able to make this with Seaborn's kdeplot:

[Image: clusters plotted with Seaborn's kdeplot]

asked Jul 20 '15 18:07

Labibah

1 Answer

Personally I think your kdeplot solution is quite good (although I would work a bit on the parts where clusters intersect). In any case, to answer your question: you can pass a minimum count to hexbin through mincnt, which leaves all empty cells transparent. Here's a small function that produces random clusters, for anyone who wants to experiment (judging by the comments, your question drew a lot of interest, so feel free to use it):

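import numpy as np
import matplotlib.pyplot as plt

# Building random clusters
def cluster(number):
    # x values: normal samples rescaled linearly onto the interval [a, b]
    def clusterAroundX(a,b,number):
        x = np.random.normal(size=(number,))
        return (x-x.min())*(b-a)/(x.max()-x.min())+a
    # y values: scattered around the line y = m*x + b, with a spread that
    # shrinks as x moves away from the middle of its range
    def clusterAroundY(x,m,b):
        y = x.copy()
        half   = (x.max()-x.min())/2
        middle = half+x.min()
        for i in range(x.shape[0]):
            std = (x.max()-x.min())/(2+10*(np.abs(middle-x[i])/half))
            y[i] = np.random.normal(x[i]*m+b,std)
        return y + np.abs(y.min())  # shift upward by |min(y)|
    # random slope in [-7, 7) and random integer intercept in [0, 50)
    m,b = np.random.randint(-700,700)/100,np.random.randint(0,50)
    print(m,b)
    # random x interval [f, l] for this cluster
    f = np.random.randint(0,30)
    l = f + np.random.randint(10,50)
    x = clusterAroundX(f,l,number)
    y = clusterAroundY(x,m,b)
    return x,y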

Using this code I've produced a few clusters and plotted them with scatter (I usually use this for my own cluster analysis, but I guess I should take a look at seaborn), hexbin, imshow (switch to pcolormesh for more control) and contourf:

clusters = 5
samples  = 300
xs,ys = [],[]
for i in range(clusters):
    x,y = cluster(samples)
    xs.append(x)
    ys.append(y)

# SCATTERPLOT
alpha = 1
for i in range(clusters):
    x,y = xs[i],ys[i]
    color = (np.random.randint(0,255)/255,np.random.randint(0,255)/255,np.random.randint(0,255)/255)
    plt.scatter(x,y,c = color,s=90,alpha=alpha)
plt.show()

# HEXBIN
# Hexbin has a drawback here: by default each call bins over its own data range,
# so the hexagons differ in size between clusters unless a shared extent is passed.
alpha = 1
cmaps = ['Reds','Blues','Purples','Oranges','Greys']
for i in range(clusters):
    x,y = xs[i],ys[i]
    plt.hexbin(x,y,gridsize=20,cmap=cmaps.pop(),mincnt=1)
plt.show()

# IMSHOW
alpha = 1
cmaps = ['Reds','Blues','Purples','Oranges','Greys']
xmin,xmax = min([i.min() for i in xs]), max([i.max() for i in xs])
ymin,ymax = min([i.min() for i in ys]), max([i.max() for i in ys])
nums = 30
xsize,ysize  = (xmax-xmin)/nums,(ymax-ymin)/nums
im = [np.zeros((nums+1,nums+1)) for i in range(len(xs))]
def addIm(im,x,y):
    for i,j in zip(x,y):
        im[i,j] = im[i,j]+1
    return im
for i in range(len(xs)):
    xo,yo = np.int_((xs[i]-xmin)/xsize),np.int_((ys[i]-ymin)/ysize)
    #im[i][xo,yo] = im[i][xo,yo]+1
    im[i] = addIm(im[i],xo,yo)
    im[i] = np.ma.masked_array(im[i],mask=(im[i]==0))
for i in range(clusters):
    # REPLACE BY pcolormesh if you need more control over image locations.
    plt.imshow(im[i].T,origin='lower',interpolation='nearest',cmap=cmaps.pop())
plt.show()

# CONTOURF
cmaps = ['Reds','Blues','Purples','Oranges','Greys']
for i in range(clusters):
    # Filled contours of the same binned counts (contourf has no interpolation option).
    plt.contourf(im[i].T,origin='lower',cmap=cmaps.pop())
plt.show()
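As an aside on the imshow section: the explicit loop in addIm can be replaced by np.histogram2d, and pcolormesh then draws the cells in data coordinates rather than pixel indices. The following is only a minimal sketch under stated assumptions: the two random clusters stand in for xs and ys, and the 30-bin grid and the output file name are illustrative choices, not part of the code above.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs anywhere; remove to use plt.show()
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Two made-up clusters standing in for xs, ys (assumption for illustration).
xs = [rng.normal(0, 1, 300), rng.normal(3, 1, 300)]
ys = [rng.normal(0, 1, 300), rng.normal(2, 1, 300)]
cmaps = ['Reds', 'Blues']

nums = 30
# Shared bin edges, so every cluster is binned on the same grid.
xedges = np.linspace(min(x.min() for x in xs), max(x.max() for x in xs), nums + 1)
yedges = np.linspace(min(y.min() for y in ys), max(y.max() for y in ys), nums + 1)

fig, ax = plt.subplots()
counts = []
for x, y, cmap in zip(xs, ys, cmaps):
    # h[i, j] counts the points in x-bin i and y-bin j.
    h, _, _ = np.histogram2d(x, y, bins=[xedges, yedges])
    counts.append(h)
    # Mask empty cells so they are not drawn and lower layers stay visible.
    masked = np.ma.masked_equal(h, 0)
    # pcolormesh expects rows to run along y, hence the transpose.
    ax.pcolormesh(xedges, yedges, masked.T, cmap=cmap)

fig.savefig("clusters_pcolormesh.png")
```

Because all clusters share xedges and yedges, cells from different clusters line up exactly, and the axes show the original data coordinates.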

The results of the scatter, hexbin, imshow and contourf plots are the following:

[Image: scatterplot clusters]

[Image: hexbin clusters]

[Image: imshow clusters]

[Image: contourf clusters]
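Coming back to hexbin itself: when extent is not given, each call derives its bin limits from the data passed to it, which is why the hexagons end up with different sizes per cluster. Passing the same gridsize and the same extent to every call should put all clusters on one common hexagonal grid, while mincnt=1 keeps the empty cells undrawn. A minimal sketch, assuming two made-up clusters in place of xs and ys (the random data and the output file name are illustrative only):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs anywhere; remove to use plt.show()
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
# Two made-up clusters standing in for xs, ys (assumption for illustration).
xs = [rng.normal(0, 1, 500), rng.normal(4, 2, 500)]
ys = [rng.normal(0, 1, 500), rng.normal(3, 2, 500)]
cmaps = ['Reds', 'Blues']

# One extent covering all clusters: (xmin, xmax, ymin, ymax).
extent = (min(x.min() for x in xs), max(x.max() for x in xs),
          min(y.min() for y in ys), max(y.max() for y in ys))

fig, ax = plt.subplots()
layers = []
for x, y, cmap in zip(xs, ys, cmaps):
    # Same gridsize and extent for every call; mincnt=1 skips empty hexagons.
    pc = ax.hexbin(x, y, gridsize=20, extent=extent, mincnt=1, cmap=cmap)
    layers.append(pc)

fig.savefig("clusters_hexbin.png")
```

Where clusters overlap, the layer drawn last hides the ones beneath it; adding an alpha value to the hexbin call is one way to keep the overlap visible.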

answered Sep 22 '22 17:09

armatita

Tags:

python

matplotlib

cluster-analysis

seaborn

scatter-plot
