Scatter plot with a huge amount of data

Tags: python, matplotlib, numpy

I would like to use Matplotlib to generate a scatter plot with a huge amount of data (about 3 million points). I have three vectors of the same length, and I currently plot them as follows:

import matplotlib.pyplot as plt
import matplotlib.cm as cm

# delta, vf and dS are three 1-D arrays of equal length (about 3 million points each)
fig = plt.figure()
fig.subplots_adjust(bottom=0.2)
ax = fig.add_subplot(111)
ax.scatter(delta, vf, c=dS, alpha=0.7, cmap=cm.Paired)

Nothing special, but the plot takes too long to generate (I'm working on a MacBook Pro with 4 GB of RAM, Python 2.7 and Matplotlib 1.0). Is there any way to improve the speed?

350

asked Nov 02 '10 21:11

Nicola Vianello

2 Answers

Unless your graphic is huge, many of those 3 million points are going to overlap. (A 400x600 image only has 240K dots...)
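To make the overlap argument concrete, here is a minimal sketch that counts how many cells of a 600x400 grid are hit by at least one of 3 million points. The helper name occupied_pixels and the normally distributed test data are assumptions for illustration only, and the sketch treats every point as exactly one pixel, so real markers (which are larger) would overlap even more.

```python
import numpy as np

def occupied_pixels(x, y, width=600, height=400):
    """Count the cells of a width x height grid that contain at least one point.

    Hypothetical helper: the grid spans the data range and each point is
    treated as a single pixel.
    """
    counts, _, _ = np.histogram2d(x, y, bins=(width, height))
    return int(np.count_nonzero(counts))

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
N = 3 * 10**6
x = rng.normal(size=N)          # stand-in data, not the asker's vectors
y = rng.normal(size=N)

n_occupied = occupied_pixels(x, y)
print(n_occupied, "of", 600 * 400, "pixels are occupied by", N, "points")
```

However the points are distributed, the count can never exceed 240,000, so most of the 3 million markers cannot add visible detail at that resolution.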

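So the easiest thing to do would be to take a random sample of, say, 1000 points from your data. Sample indices rather than values, so that the same points are picked from all three vectors:

import random
idx = random.sample(range(len(delta)), 1000)

and just plot delta[idx], vf[idx] and dS[idx].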

For example:

import matplotlib.pyplot as plt
import matplotlib.cm as cm
import numpy as np
import random

fig = plt.figure()
fig.subplots_adjust(bottom=0.2)
ax = fig.add_subplot(111)

# Stand-in data: three vectors with 3 million entries each
N = 3 * 10**6
delta = np.random.normal(size=N)
vf = np.random.normal(size=N)
dS = np.random.normal(size=N)

# Pick 1000 distinct indices so the three vectors stay aligned
idx = random.sample(range(N), 1000)

plt.scatter(delta[idx], vf[idx], c=dS[idx], alpha=0.7, cmap=cm.Paired)
plt.show()


Or, if you need to pay more attention to outliers, you could bin your data (for example with np.histogramdd) and then compose a sample that has a representative from each occupied bin.

Unfortunately, I don't think the histogram functions offer an easy way to associate bins with individual data points. A simple but approximate solution is to use the lower edge of each occupied bin as a proxy for the points in it:

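# Bin edges for the two plot axes and for the colour variable
xedges = np.linspace(-10, 10, 100)
yedges = np.linspace(-10, 10, 100)
zedges = np.linspace(-10, 10, 10)
hist, edges = np.histogramdd((delta, vf, dS), (xedges, yedges, zedges))
# Indices of every bin that contains at least one point
xidx, yidx, zidx = np.where(hist > 0)
plt.scatter(xedges[xidx], yedges[yidx], c=zedges[zidx], alpha=0.7, cmap=cm.Paired)
plt.show()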
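If the representatives should be actual data points rather than bin edges, one possible approach (not part of the original answer) is to label every point with its bin using np.digitize and then keep the first point found in each occupied bin via np.unique with return_index=True. The function name one_point_per_bin and the 100x100 grid are assumptions for this sketch, and it bins only the two plotted coordinates.

```python
import numpy as np

def one_point_per_bin(x, y, nbins=100):
    """Return the index of one data point from every occupied cell
    of an nbins x nbins grid spanning the data range (hypothetical helper)."""
    xedges = np.linspace(x.min(), x.max(), nbins + 1)
    yedges = np.linspace(y.min(), y.max(), nbins + 1)
    # Digitizing against the interior edges yields bin numbers 0 .. nbins-1
    ix = np.digitize(x, xedges[1:-1])
    iy = np.digitize(y, yedges[1:-1])
    # Combine both bin numbers into a single cell label per point
    cell = ix * nbins + iy
    # return_index gives the position of the first point seen in each cell
    _, first = np.unique(cell, return_index=True)
    return first

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
N = 3 * 10**6
delta = rng.normal(size=N)      # stand-in data, not the asker's vectors
vf = rng.normal(size=N)
dS = rng.normal(size=N)

idx = one_point_per_bin(delta, vf)
print(len(idx), "representative points selected out of", N)
```

The resulting idx can be used exactly like the random sample, for instance plt.scatter(delta[idx], vf[idx], c=dS[idx], alpha=0.7, cmap=cm.Paired). Because every occupied cell contributes a point, isolated outliers survive the thinning, while dense regions are reduced to at most one point per cell.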


192

answered Oct 12 '22 12:10

unutbu

What about trying pyplot.hexbin? It generates a sort of heatmap based on point density in a set number of bins.
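A minimal sketch of that suggestion, assuming normally distributed stand-in data in place of the asker's vectors. The gridsize of 100 and the "viridis" colormap are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
N = 3 * 10**6
delta = rng.normal(size=N)      # stand-in data, not the asker's vectors
vf = rng.normal(size=N)

fig, ax = plt.subplots()
# Each hexagon is coloured by the number of points that fall into it;
# mincnt=1 leaves empty hexagons blank
hb = ax.hexbin(delta, vf, gridsize=100, cmap="viridis", mincnt=1)
fig.colorbar(hb, ax=ax, label="points per bin")
# plt.show()  # uncomment to display the figure interactively
```

By default hexbin colours by point count, so the third vector is not shown. Passing C=dS together with reduce_C_function=np.mean should colour each hexagon by the mean dS of its points instead, although that path is likely to be noticeably slower on millions of points.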

answered Oct 12 '22 11:10

conjectures
