Python: Creating a 2D histogram from a numpy matrix

Tags:

I'm new to python.

I have a numpy matrix, of dimensions 42x42, with values in the range 0-996. I want to create a 2D histogram using this data. I've been looking at tutorials, but they all seem to show how to create 2D histograms from random data and not a numpy matrix.

So far, I have imported:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import colors

I'm not sure if these are correct imports, I'm just trying to pick up what I can from tutorials I see.

I have the numpy matrix M with all of the values in it (as described above). In the end, i want it to look something like this:

2D histogram

obviously, my data will be different, so my plot should look different. Can anyone give me a hand?

Edit: For my purposes, Hooked's example below, using matshow, is exactly what I'm looking for.

607

asked Nov 26 '14 18:11

Kestrel

4 Answers

If you have the raw data from the counts, you could use plt.hexbin to create the plots for you (IMHO this is better than a square lattice): Adapted from the example of hexbin:

import numpy as np
import matplotlib.pyplot as plt

n = 100000
x = np.random.standard_normal(n)
y = 2.0 + 3.0 * x + 4.0 * np.random.standard_normal(n)
plt.hexbin(x,y)

plt.show()

enter image description here

If you already have the Z-values in a matrix as you mention, just use plt.imshow or plt.matshow:

XB = np.linspace(-1,1,20)
YB = np.linspace(-1,1,20)
X,Y = np.meshgrid(XB,YB)
Z = np.exp(-(X**2+Y**2))
plt.imshow(Z,interpolation='none')

enter image description here

194

answered Sep 22 '22 20:09

Hooked

If you have not only the 2D histogram matrix but also the underlying (x, y) data, then you could make a scatter plot of the (x, y) points and color each point according to its binned count value in the 2D-histogram matrix:

import numpy as np
import matplotlib.pyplot as plt

n = 10000
x = np.random.standard_normal(n)
y = 2.0 + 3.0 * x + 4.0 * np.random.standard_normal(n)
xedges, yedges = np.linspace(-4, 4, 42), np.linspace(-25, 25, 42)
hist, xedges, yedges = np.histogram2d(x, y, (xedges, yedges))
xidx = np.clip(np.digitize(x, xedges), 0, hist.shape[0]-1)
yidx = np.clip(np.digitize(y, yedges), 0, hist.shape[1]-1)
c = hist[xidx, yidx]
plt.scatter(x, y, c=c)

plt.show()

Example scatter plot of 2D histogram

answered Sep 19 '22 20:09

unutbu

I'm a big fan of the 'scatter histogram', but I don't think the other solutions fully do them justice. Here is a module that implements them. The major advantage of the scatter_hist2d function compared to the other solutions is that it sorts the points by the hist data (see the mode argument). This means that the result looks more like a traditional histogram (i.e., you don't get the chaotic overlap of markers in different bins).

The scatter-histogram

MCVE for this figure (using the hist_scatter module):

import numpy as np
import matplotlib.pyplot as plt
from hist_scatter import scatter_hist2d

fig = plt.figure(figsize=[5, 4])
ax = plt.gca()

x = randgen.randn(npoint)
y = 2 + 3 * x + 4 * randgen.randn(npoint)

scat = scatter_hist2d(x, y,
                      bins=[np.linspace(-4, 4, 42),
                            np.linspace(-25, 25, 42)],
                      s=5,
                      cmap=plt.get_cmap('viridis'))
ax.axhline(0, color='k', linestyle='--', zorder=3, linewidth=0.5)
ax.axvline(0, color='k', linestyle='--', zorder=3, linewidth=0.5)
plt.colorbar(scat)

Room for improvement?

The primary drawback of this approach is that the points in the densest areas overlap the points in lower density areas, leading to somewhat of a misrepresentation of the areas of each bin. I spent quite a bit of time exploring two approaches for resolving this:

using smaller markers for higher density bins
applying a 'clipping' mask to each bin

The first one gives results that are way too crazy. The second one looks nice -- especially if you only clip bins that have >~20 points -- but it is extremely slow (this figure took about a minute).

So, ultimately I've decided that by carefully selecting the marker size and bin size (s and bins), you can get results that are visually pleasing and not too bad in terms of misrepresenting the data. After all, these 2D histograms are usually intended to be visual aids to the underlying data, not strictly quantitative representations of it. Therefore, I think this approach is far superior to 'traditional 2D histograms' (e.g., plt.hist2d or plt.hexbin), and I presume that if you've found this page you're also not a fan of traditional (single color) scatter plots.

If I were king of science, I'd make sure all 2D histograms did something like this for the rest of forever.

UPDATE

I added a scatter_hexbin function to the module.

Comparison of different 2-D histogram types.

answered Sep 20 '22 20:09

farenorth

@unutbu's answer contains a mistake: xidx and yidx are calculated the wrong way (at least on my data sample). The correct way should be:

xidx = np.clip(np.digitize(x, xedges) - 1, 0, hist.shape[0] - 1)
yidx = np.clip(np.digitize(y, yedges) - 1, 0, hist.shape[1] - 1)

As the return dimension of np.digitize that we are interested in is between 1 and len(xedges) - 1, but the c = hist[xidx, yidx] needs indices between 0 and hist.shape - 1.

Below is the comparison of results. As you can see you get similar but not the same result.

import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure()
ax1 = fig.add_subplot(211)
ax2 = fig.add_subplot(212)

n = 10000
x = np.random.standard_normal(n)
y = 2.0 + 3.0 * x + 4.0 * np.random.standard_normal(n)
xedges, yedges = np.linspace(-4, 4, 42), np.linspace(-25, 25, 42)
hist, xedges, yedges = np.histogram2d(x, y, (xedges, yedges))

xidx = np.clip(np.digitize(x, xedges), 0, hist.shape[0] - 1)
yidx = np.clip(np.digitize(y, yedges), 0, hist.shape[1] - 1)
c = hist[xidx, yidx]
old = ax1.scatter(x, y, c=c, cmap='jet')

xidx = np.clip(np.digitize(x, xedges) - 1, 0, hist.shape[0] - 1)
yidx = np.clip(np.digitize(y, yedges) - 1, 0, hist.shape[1] - 1)

c = hist[xidx, yidx]
new = ax2.scatter(x, y, c=c, cmap='jet')


plt.show()

Same Same but Different

answered Sep 22 '22 20:09

TheoryX

Related questions
                            
                                using import inside class
                            
                                Pandas round is not working for DataFrame
                            
                                Python Gmail API 'not JSON serializable'
                            
                                Tensorflow Deep MNIST: Resource exhausted: OOM when allocating tensor with shape[10000,32,28,28]
                            
                                How to get parent folder name of current directory?
                            
                                How to remove special characters except space from a file in python?
                            
                                Install PyTorch from requirements.txt
                            
                                How can I parse HTML with html5lib, and query the parsed HTML with XPath?
                            
                                Python list comprehension overriding value
                            
                                Decorator that prints function call details (parameters names and effective values)?
                            
                                How to run sudo with Paramiko? (Python)
                            
                                Get IP address of url in python? [duplicate]
                            
                                Run Python Script on Selected File
                            
                                Filtering out certain bytes in python
                            
                                Scrapy:How to print request referrer
                            
                                Python generator objects: __sizeof__()
                            
                                Copy list and append an element in one line
                            
                                Python pandas to_excel 'utf8' codec can't decode byte
                            
                                Is there a way to construct an object using PyYAML construct_mapping after all nodes complete loading?
                            
                                Sorting a List by frequency of occurrence in a list

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Python: Creating a 2D histogram from a numpy matrix

Tags:

python

matplotlib

matrix

numpy

histogram