
Pandas: plot multiple columns to same x value

This is a follow-up to a previous question about data analysis with pandas. I now want to plot my data, which looks like this:

PrEST ID    Gene    Sequence        Ratio1    Ratio2    Ratio3
HPRR12  ATF1    TTPSAXXXXXXXXXTTTK  6.3222    4.0558    4.958   
HPRR23  CREB1   KIXXXXXXXXPGVPR     NaN       NaN       NaN     
HPRR23  CREB1   ILNXXXXXXXXGVPR     0.22691   2.077     NaN
HPRR15  ELK4    IEGDCEXXXXXXXGGK    1.177     NaN       12.073  
HPRR15  ELK4    SPXXXXXXXXXXXSVIK   8.66      14.755    NaN
HPRR15  ELK4    IEGDCXXXXXXXVSSSSK  15.745    7.9122    9.5966  

... except that there are many more rows. I don't actually want to plot the ratios themselves but other values calculated from them; that makes no difference to my plotting problem. I have a dataframe that looks more or less like the data above, and what I want is this:

  • Each row (3 ratios) should be plotted against the row's ID, as points
  • All rows with the same ID should be plotted to the same x value / ID, but with another colour
  • The x ticks should be the IDs, and (if possible) the corresponding gene as well (so some genes will appear on several x ticks, as they have multiple IDs mapping to them)

Below is an image that my previous, non-pandas version of this script produces:

[Image: plot produced by the previous, non-pandas script]

... where the red triangles mark values beyond the cutoff that sets the y-axis maximum. The IDs are blacked out, but you should be able to see what I'm after. The copy number on the y-axis is derived from the ratios by a further calculation, so it is simply a different number from the ratios shown in the data above.

I have tried to find similar questions and solutions in the documentation, but found none. Most people seem to need to do this with dates, for which there seem to be ready-made plotting functions, which doesn't help me (I think). Any help greatly appreciated!

erikfas asked Jan 14 '14 09:01



1 Answer

Skipping some of the finer points of plotting, to get:

  • Each row (3 ratios) should be plotted against the row's ID, as points
  • All rows with the same ID should be plotted to the same x value / ID, but with another colour
  • The x ticks should be the IDs, and (if possible) the corresponding gene as well (so some genes will appear on several x ticks, as they have multiple IDs mapping to them)

I suggest you try using matplotlib to handle the plotting, and manually cycle the colors. You can use something like:

import itertools

import matplotlib.pyplot as plt
import pandas as pd

# example data: 'id' is the numeric x position, 'labels' the text shown on the tick
df = pd.DataFrame(
    {'id': [1, 2, 3, 3],
     'labels': ['HPRR1234', 'HPRR4321', 'HPRR2345', 'HPRR2345'],
     'g': ['KRAS', 'KRAS', 'ELK4', 'ELK4'],
     'r1': [15, 9, 15, 1],
     'r2': [14, 8, 7, 0],
     'r3': [14, 16, 9, 12]})
# extra setup: leave some room between the axis and the tick labels
plt.rcParams['xtick.major.pad'] = 8
# plotting styles, cycled so each ratio column gets its own marker and color
marker = itertools.cycle((',', '+', '.', 'o', '*'))
color = itertools.cycle(('b', 'g', 'r', 'c', 'm', 'y', 'k'))
# plot each ratio column as unconnected points (ls='') against the id
fig = plt.figure()
ax = fig.add_subplot(111)
for col in ('r1', 'r2', 'r3'):
    # next(iterator) works on Python 2.6+ and 3; iterator.next() is Python 2 only
    ax.plot(df['id'], df[col], ls='', ms=10, mew=2,
            marker=next(marker), color=next(color))
# set the tick positions and labels
ax.xaxis.set_ticks(df['id'])
ax.xaxis.set_ticklabels(df['labels'])
plt.setp(ax.get_xticklabels(), rotation='vertical', fontsize=12)
plt.tight_layout()
fig.savefig("example.pdf")

If you have many rows, you will probably want more colors, but this shows at least the concept.
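
The example above colors the points by ratio column. If you would rather give each row sharing an ID its own color, as the question asks, one possible variation is sketched below. It is only a sketch under a few assumptions: the column names are taken from the table in the question, the sample values are copied from it, and the `colors` list and the two-line "ID / gene" tick label are illustrative choices of mine. It uses `pd.factorize` to give every unique ID one x position and `groupby(...).cumcount()` to number the rows within each ID.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs headless
import matplotlib.pyplot as plt
import pandas as pd

nan = float("nan")
# Sample data copied from the question's table (the real frame has more rows)
df = pd.DataFrame({
    "PrEST ID": ["HPRR12", "HPRR23", "HPRR23", "HPRR15", "HPRR15", "HPRR15"],
    "Gene": ["ATF1", "CREB1", "CREB1", "ELK4", "ELK4", "ELK4"],
    "Ratio1": [6.3222, nan, 0.22691, 1.177, 8.66, 15.745],
    "Ratio2": [4.0558, nan, 2.077, nan, 14.755, 7.9122],
    "Ratio3": [4.958, nan, nan, 12.073, nan, 9.5966],
})
ratio_cols = ["Ratio1", "Ratio2", "Ratio3"]

# One integer x position per unique ID, in order of first appearance
x_pos, unique_ids = pd.factorize(df["PrEST ID"])
# Position of each row within its ID group (0, 1, 2, ...); used to pick a color
row_in_group = df.groupby("PrEST ID").cumcount()
# Illustrative color list; it wraps around if an ID has more rows than colors
colors = ["b", "g", "r", "c", "m", "y", "k"]

fig, ax = plt.subplots()
for x, k, (_, row) in zip(x_pos, row_in_group, df.iterrows()):
    values = row[ratio_cols].astype(float)  # NaN entries are simply not drawn
    ax.plot([x] * len(ratio_cols), values, ls="", marker="o",
            color=colors[k % len(colors)])

# Tick label: the ID with its gene on a second line
genes = df.drop_duplicates("PrEST ID").set_index("PrEST ID")["Gene"]
labels = ["{}\n{}".format(i, genes[i]) for i in unique_ids]
ax.set_xticks(range(len(unique_ids)))
ax.set_xticklabels(labels)
ax.margins(x=0.2)  # keep the outermost IDs away from the plot border
fig.tight_layout()
```

Call `fig.savefig(...)` afterwards as in the example above. Plotting row by row is slow for very large frames, but it keeps the mapping from row to color easy to follow.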

Noah Hafner answered Nov 07 '22 19:11