Some background first:
I want to plot the Mel-Frequency Cepstral Coefficients (MFCCs) of various songs and compare them. I calculate MFCCs throughout a song and then average them to get one array of 13 coefficients. I want this array to be represented as one point on a graph.
I'm new to Python and very new to any form of plotting (though I've seen some recommendations to use matplotlib).
I want to be able to visualize this data. Any thoughts on how I might go about doing this?
With three attributes or dimensions in the data, you can visualize them with a pair-wise scatter plot, using color or hue to separate the values of a categorical dimension. Such a plot lets you check correlations and patterns and compare across groups.
Firstly, if you want to represent an array of 13 coefficients as a single point in your graph, you need to reduce the 13 coefficients to the number of dimensions in your graph, as yan king yin pointed out in his comment. To project your data into 2 dimensions, you can either compute relevant indicators yourself (such as the maximum, minimum, or standard deviation) or apply a dimensionality reduction method such as PCA. Whether to do so, and how, is another topic.
After that, plotting is easy; see the pyplot documentation: http://matplotlib.org/api/pyplot_api.html
Here is example code for this approach:
import matplotlib.pyplot as plt
import numpy as np
#fake example data
song1 = np.asarray([1, 2, 3, 4, 5, 6, 2, 35, 4, 1])
song2 = song1*2
song3 = song1*1.5
#list of arrays containing all data
data = [song1, song2, song3]
#calculate 2d indicators
def indic(data):
    # alternatively you can calculate any other indicators
    maxima = np.max(data, axis=1)
    minima = np.min(data, axis=1)
    return maxima, minima

x, y = indic(data)
plt.scatter(x, y, marker='x')
plt.show()
The result is a scatter plot with one marker per song, with the maximum on the x axis and the minimum on the y axis.
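Separately from the max/min indicators, the PCA option mentioned earlier could be sketched roughly as follows. This is only a minimal illustration using plain NumPy (an SVD of the centered data), not a recommendation of a particular library. The function name pca_2d, the variable mfcc_means, and the random fake data are assumptions made up for this example; in practice each row would be the 13 averaged coefficients of one song.

```python
import numpy as np


def pca_2d(features):
    """Project rows of `features` onto their first two principal components.

    `features` is assumed to have shape (n_songs, n_coefficients).
    """
    features = np.asarray(features, dtype=float)
    # PCA works on centered data, so subtract the per-coefficient mean.
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # Keep the two strongest directions and project every song onto them.
    return centered @ vt[:2].T


# Fake example data (assumption): 5 songs with 13 averaged coefficients each.
rng = np.random.default_rng(0)
mfcc_means = rng.normal(size=(5, 13))

points = pca_2d(mfcc_means)  # shape (5, 2): one 2D point per song
```

Each row of points can then be passed to plt.scatter(points[:, 0], points[:, 1]) in the same way as the x and y values above. Keep in mind that the two axes are linear combinations of all coefficients, so they are harder to interpret than a hand-picked indicator.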
That said, I want to suggest another solution to your underlying problem, namely plotting multidimensional data. I recommend using something like a parallel coordinates plot, which can be constructed with the same fake data:
import pandas as pd
pd.DataFrame(data).T.plot()
plt.show()
The result shows all coefficients for each song along the x axis and their values along the y axis. It looks as follows:
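The one-liner above draws an ordinary line plot that resembles a parallel coordinates plot. pandas also ships a dedicated helper, pandas.plotting.parallel_coordinates, which expects one row per song plus a label column. A rough sketch with the same fake data is shown here; the column names c0, c1, ... and the label column name "song" are made up for this example.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import parallel_coordinates

# Same fake example data as above: one row of values per song.
song1 = np.asarray([1, 2, 3, 4, 5, 6, 2, 35, 4, 1])
values = np.vstack([song1, song1 * 2, song1 * 1.5])

# One row per song, one column per coefficient (column names are made up).
frame = pd.DataFrame(values, columns=[f"c{i}" for i in range(values.shape[1])])
# Label column used to color and name each line in the legend.
frame["song"] = ["song1", "song2", "song3"]

ax = parallel_coordinates(frame, class_column="song", colormap="viridis")
ax.set_xlabel("coefficient")
ax.set_ylabel("value")
# Call plt.show() afterwards to display the figure.
```

Compared with the plain line plot, this version adds a vertical axis line for every coefficient and a legend entry per song, which can make it easier to tell songs apart once there are more than a handful of them.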
UPDATE:
In the meantime I have discovered the Python Image Gallery, which contains two nice examples of high-dimensional visualization with reference code: