Drawing a correlation graph in matplotlib

Tags:

python

graph

matplotlib

data-visualization

Suppose I have a data set of discrete vectors with n=2:

DATA = [
    ('a', 4),
    ('b', 5),
    ('c', 5),
    ('d', 4),
    ('e', 2),
    ('f', 5),
]

How can I plot that data set with matplotlib so as to visualize any correlation between the two variables?

Any simple code examples would be great.

440

asked Nov 16 '11 15:11

Yuval Adam

2 Answers

Joe Kington has the correct answer, but your DATA is probably more complicated than is represented here. It might have multiple values at 'a'. The way Joe builds the x-axis values is quick but only works for a list of unique values. There may be a faster way to do this, but this is how I accomplished it:

import matplotlib.pyplot as plt

def assignIDs(values):
    '''Take a list of strings, and for each unique value assign a number.
    Returns a map for "unique-val"->id.
    '''
    # "values" instead of "list" so the builtin list type is not shadowed
    sortedList = sorted(values)

    #taken from
    #http://stackoverflow.com/questions/480214/how-do-you-remove-duplicates-from-a-list-in-python-whilst-preserving-order/480227#480227
    seen = set()
    seen_add = seen.add
    # seen_add(x) returns None, so "not seen_add(x)" is always True and only records x
    uniqueList = [x for x in sortedList if x not in seen and not seen_add(x)]

    return dict(zip(uniqueList, range(len(uniqueList))))

def plotData(inData,color):
    x,y = zip(*inData)

    xMap = assignIDs(x)
    xAsInts = [xMap[i] for i in x]


    plt.scatter(xAsInts,y,color=color)
    # list() keeps this working on Python 3, where values()/keys() return views
    plt.xticks(list(xMap.values()),list(xMap.keys()))


DATA = [
    ('a', 4),
    ('b', 5),
    ('c', 5),
    ('d', 4),
    ('e', 2),
    ('f', 5),
]


DATA2 = [
    ('a', 3),
    ('b', 4),
    ('c', 4),
    ('d', 3),
    ('e', 1),
    ('f', 4),
    ('a', 5),
    ('b', 7),
    ('c', 7),
    ('d', 6),
    ('e', 4),
    ('f', 7),
]

plotData(DATA,'blue')
plotData(DATA2,'red')

plt.gcf().savefig("correlation.png")

My DATA2 set has two values for every x-axis value. It's plotted in red below:

[Figure: scatter plot of DATA (blue) and DATA2 (red)]
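A shorter way to build the same label-to-integer mapping, offered as a sketch and not as a replacement: because assignIDs sorts before removing duplicates, sorting a set of the labels gives the same order. The name assign_ids_short is made up for this illustration.

```python
def assign_ids_short(labels):
    """Map each unique label to its position in sorted order."""
    return {label: i for i, label in enumerate(sorted(set(labels)))}


# Labels with repeats, as in DATA2
labels = ['a', 'b', 'c', 'a', 'b', 'c']
x_map = assign_ids_short(labels)
x_as_ints = [x_map[label] for label in labels]

print(x_map)      # {'a': 0, 'b': 1, 'c': 2}
print(x_as_ints)  # [0, 1, 2, 0, 1, 2]
```

Repeated labels land on the same integer, which is what lets several y values share one x position.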

EDIT

The question you asked is very broad. I searched 'correlation', and Wikipedia had a good discussion on Pearson's product-moment coefficient, which measures how strong the linear relationship between two variables is (its sign matches the sign of the fitted slope, but its magnitude is not the slope). Keep in mind that this value is only a guide and in no way tells you whether a linear fit is a reasonable assumption; see the notes on correlation and linearity on that Wikipedia page. Here is an updated plotData method, which uses numpy.linalg.lstsq to do the linear regression and numpy.corrcoef to calculate Pearson's r:

import matplotlib.pyplot as plt
import numpy as np

def plotData(inData,color):
    x,y = zip(*inData)

    xMap = assignIDs(x)
    xAsInts = np.array([xMap[i] for i in x])

    pearR = np.corrcoef(xAsInts,y)[1,0]
    # least squares from:
    # http://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.lstsq.html
    A = np.vstack([xAsInts,np.ones(len(xAsInts))]).T
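    # rcond=None selects the current default cutoff and avoids a FutureWarning on older NumPy
    m,c = np.linalg.lstsq(A,np.array(y),rcond=None)[0]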

    plt.scatter(xAsInts,y,label='Data '+color,color=color)
    plt.plot(xAsInts,xAsInts*m+c,color=color,
             label="Fit %6s, r = %6.2e"%(color,pearR))
    plt.xticks(list(xMap.values()),list(xMap.keys()))
    plt.legend(loc=3)

The new figure is:

[Figure: scatter plots of DATA and DATA2 with their linear fits; Pearson's r is shown in the legend]
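To make the difference between r and the fitted slope concrete, here is a dependency-free sketch. The function names pearson_r and ls_slope are my own, and plain Python stands in for numpy.corrcoef and numpy.linalg.lstsq. Rescaling y changes the slope but leaves r untouched.

```python
import math


def pearson_r(xs, ys):
    """Pearson's product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ls_slope(xs, ys):
    """Least-squares slope of y regressed on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# x positions 0..5 stand for the labels 'a'..'f' from DATA
xs = [0, 1, 2, 3, 4, 5]
ys = [4, 5, 5, 4, 2, 5]
ys_scaled = [10 * y for y in ys]

print(f"{pearson_r(xs, ys):.3f} {ls_slope(xs, ys):.3f}")                # -0.229 -0.143
print(f"{pearson_r(xs, ys_scaled):.3f} {ls_slope(xs, ys_scaled):.3f}")  # -0.229 -1.429
```

An r this close to zero on six points says there is little evidence of a linear trend in DATA, whatever the slope of the drawn line looks like.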

Flattening each direction and looking at the individual distributions might also be useful, and there are examples of doing this in matplotlib.
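As a rough, text-only stand-in for those histogram examples, the sketch below flattens the question's DATA along each direction with collections.Counter. This is not the matplotlib example the answer points to, only a quick way to see the same marginal counts.

```python
from collections import Counter

DATA = [('a', 4), ('b', 5), ('c', 5), ('d', 4), ('e', 2), ('f', 5)]

labels, values = zip(*DATA)

# Marginal distribution along x: how many points fall on each label
x_counts = Counter(labels)
# Marginal distribution along y: how often each value occurs
y_counts = Counter(values)

# Crude text histogram of the y marginal
for value in sorted(y_counts):
    print(value, '#' * y_counts[value])
```

The same counts could be handed to plt.bar to draw the marginals next to the scatter plot.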

If a linear approximation is useful, which you can judge qualitatively by just looking at the fit, you might want to subtract out this trend before flattening the y direction. This would help show whether you have a Gaussian random distribution about a linear trend.
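A minimal sketch of that detrending step, assuming the x values have already been mapped to integers as in plotData. The helper name detrend is hypothetical, and the fit is written out in plain Python instead of numpy.linalg.lstsq so the arithmetic is visible.

```python
def detrend(xs, ys):
    """Fit y = m*x + c by least squares and return (m, c, residuals)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    c = mean_y - m * mean_x
    # What is left of y once the linear trend is subtracted out
    residuals = [y - (m * x + c) for x, y in zip(xs, ys)]
    return m, c, residuals


# x positions 0..5 stand for the labels 'a'..'f' from DATA
xs = [0, 1, 2, 3, 4, 5]
ys = [4, 5, 5, 4, 2, 5]
m, c, residuals = detrend(xs, ys)
```

Passing residuals to plt.hist would then show the spread about the trend. With only six points, such a histogram says little about whether that spread is Gaussian.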

118

answered Oct 05 '22 09:10

Yann

I'm a bit confused... There are several ways to do something along those lines. The first two that come to mind are a simple stem plot or a scatter plot.

Are you just wanting to plot things using a stem plot like this?

import matplotlib.pyplot as plt
data = [
    ('a', 4),
    ('b', 5),
    ('c', 5),
    ('d', 4),
    ('e', 2),
    ('f', 5),
]
labels, y = zip(*data)

x = range(len(y))
plt.stem(x, y)
plt.xticks(x, labels)
plt.axis([-1, 6, 0, 6])
plt.show()

Or a scatter plot like this:

import matplotlib.pyplot as plt
data = [
    ('a', 4),
    ('b', 5),
    ('c', 5),
    ('d', 4),
    ('e', 2),
    ('f', 5),
]
labels, y = zip(*data)

x = range(len(y))
plt.plot(x, y, 'o')
plt.xticks(x, labels)
plt.axis([-1, 6, 0, 6])
plt.show()

Or something else entirely?

answered Oct 05 '22 08:10

Joe Kington
