This is a follow-up to a previous question regarding data analysis with pandas. I now want to plot my data, which looks like this:
PrEST ID  Gene   Sequence            Ratio1   Ratio2  Ratio3
HPRR12    ATF1   TTPSAXXXXXXXXXTTTK  6.3222   4.0558  4.958
HPRR23    CREB1  KIXXXXXXXXPGVPR     NaN      NaN     NaN
HPRR23    CREB1  ILNXXXXXXXXGVPR     0.22691  2.077   NaN
HPRR15    ELK4   IEGDCEXXXXXXXGGK    1.177    NaN     12.073
HPRR15    ELK4   SPXXXXXXXXXXXSVIK   8.66     14.755  NaN
HPRR15    ELK4   IEGDCXXXXXXXVSSSSK  15.745   7.9122  9.5966
... except there are many more rows, and I don't actually want to plot the ratios themselves but other values calculated from them; that doesn't matter for my plotting problem. I have a DataFrame that looks more or less like the data above, and what I want is a plot like the one my previous, non-pandas version of this script produces:
[Image: plot produced by the previous, non-pandas version of the script]
... where the red triangles mark values that fall outside the cutoff used to set the y-axis maximum. The IDs are blacked out, but you should be able to see what I'm after. Copy number is essentially the ratios with a calculation applied on top, so it is just another number per row, like the ratios shown in the data above.
I have tried to find similar questions and solutions in the documentation, but found none. Most people seem to need to do this with dates, for which there seem to be ready-made plotting functions, which doesn't help me (I think). Any help greatly appreciated!
Pandas integrates tightly with Matplotlib. You can plot data directly from your DataFrame using the plot() method. To plot several columns on the same axes, pass a list of column names to the y argument of plot().
To plot a single column, select it first, for example df['Ratio1'], and call plot() on the result. The plot() method works on both Series and DataFrame.
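As a rough illustration of the two calls described above, here is a minimal sketch. The values are made up, the column names only loosely follow the question's data, and the Agg backend and output file name are assumptions chosen so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values, loosely shaped like the ratio columns in the question.
df = pd.DataFrame({
    "Gene": ["ATF1", "CREB1", "ELK4"],
    "Ratio1": [6.3, 0.2, 1.2],
    "Ratio2": [4.1, 2.1, 14.8],
    "Ratio3": [5.0, 3.3, 12.1],
})

# A list passed to y draws several columns on the same Axes.
ax = df.plot(x="Gene", y=["Ratio1", "Ratio2"], style="o")

# Selecting one column gives a Series, which has its own plot() method;
# ax=ax draws it onto the Axes created above.
df["Ratio3"].plot(ax=ax, style="^", label="Ratio3")

ax.legend()
ax.figure.savefig("ratios.png")  # hypothetical output file name
```

DataFrame.plot() returns the Matplotlib Axes, so anything beyond the basics (tick labels, limits, extra markers) can be adjusted on that object afterwards.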
Skipping some of the finer points of plotting, to get a plot along these lines I suggest you use matplotlib directly to handle the plotting, and cycle the colors manually. You can use something like:
import itertools

import matplotlib.pyplot as plt
import pandas as pd

# example data
df = pd.DataFrame(
    {'id': [1, 2, 3, 3],
     'labels': ['HPRR1234', 'HPRR4321', 'HPRR2345', 'HPRR2345'],
     'g': ['KRAS', 'KRAS', 'ELK4', 'ELK4'],
     'r1': [15, 9, 15, 1],
     'r2': [14, 8, 7, 0],
     'r3': [14, 16, 9, 12]})

# extra padding between the x tick marks and their labels
plt.rcParams['xtick.major.pad'] = 8

# plotting styles to cycle through
marker = itertools.cycle((',', '+', '.', 'o', '*'))
color = itertools.cycle(('b', 'g', 'r', 'c', 'm', 'y', 'k'))

# plot each ratio column as unconnected markers (ls='')
fig = plt.figure()
ax = fig.add_subplot(111)
for col in ('r1', 'r2', 'r3'):
    # next(it) works on Python 2.6+ and Python 3; it.next() is Python 2 only
    ax.plot(df['id'], df[col], ls='', ms=10, mew=2,
            marker=next(marker), color=next(color))

# set the tick labels
ax.xaxis.set_ticks(df['id'])
ax.xaxis.set_ticklabels(df['labels'])
plt.setp(ax.get_xticklabels(), rotation='vertical', fontsize=12)
plt.tight_layout()
fig.savefig("example.pdf")
If you have many rows, you will probably want more colors, but this shows at least the concept.
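For more series than the seven single-letter colors, one option is to sample colors from a colormap. The same loop can also cover the cutoff behaviour described in the question, drawing values above the cutoff as red triangles pinned at the cutoff. This is only a sketch: the data, the cutoff of 20, the choice of the viridis colormap and the output file name are all assumptions for illustration, not the original script's values.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up data; two values deliberately exceed the cutoff.
df = pd.DataFrame({
    "labels": ["HPRR1234", "HPRR4321", "HPRR2345"],
    "r1": [15, 9, 1],
    "r2": [14, 8, 30],
    "r3": [14, 16, 45],
})
value_cols = ["r1", "r2", "r3"]
cutoff = 20  # hypothetical y-axis cutoff

# One color per column, sampled evenly from a colormap.
colors = plt.cm.viridis(np.linspace(0, 1, len(value_cols)))

fig, ax = plt.subplots()
x = np.arange(len(df))
n_clipped = 0
for col, color in zip(value_cols, colors):
    values = df[col].to_numpy(dtype=float)
    # NaN compares False on both tests, so missing values are simply skipped.
    inside = values <= cutoff
    outside = values > cutoff
    ax.plot(x[inside], values[inside], ls="", marker="o", ms=8,
            color=color, label=col)
    # Values above the cutoff become red triangles drawn at the cutoff itself.
    ax.plot(x[outside], np.full(outside.sum(), cutoff), ls="", marker="^",
            ms=10, color="red")
    n_clipped += int(outside.sum())

ax.set_ylim(0, cutoff * 1.05)  # a little headroom so the triangles stay visible
ax.set_xticks(x)
ax.set_xticklabels(df["labels"], rotation="vertical")
ax.legend()
fig.tight_layout()
fig.savefig("cutoff_example.png")  # hypothetical output file name
```

Plotting against row positions (0, 1, 2, ...) rather than an id column keeps one tick per row; rows that should share an x position would need an explicit id column as in the example above.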