seaborn heatmap using pandas dataframe

I am struggling to massage a dataframe in pandas into the correct format for seaborn's heatmap (or matplotlib really) to make a heatmap.

My current dataframe (called data_yule) is:

     Unnamed: 0  SymmetricDivision         test  MutProb      value
3             3                1.0  sackin_yule    0.100  -4.180864
8             8                1.0  sackin_yule    0.050  -9.175349
13           13                1.0  sackin_yule    0.010 -11.408114
18           18                1.0  sackin_yule    0.005 -10.502450
23           23                1.0  sackin_yule    0.001  -8.027475
28           28                0.8  sackin_yule    0.100  -0.722602
33           33                0.8  sackin_yule    0.050  -6.996394
38           38                0.8  sackin_yule    0.010 -10.536340
43           43                0.8  sackin_yule    0.005  -9.544065
48           48                0.8  sackin_yule    0.001  -7.196407
53           53                0.6  sackin_yule    0.100  -0.392256
58           58                0.6  sackin_yule    0.050  -6.621639
63           63                0.6  sackin_yule    0.010  -9.551801
68           68                0.6  sackin_yule    0.005  -9.292469
73           73                0.6  sackin_yule    0.001  -6.760559
78           78                0.4  sackin_yule    0.100  -0.652147
83           83                0.4  sackin_yule    0.050  -6.885229
88           88                0.4  sackin_yule    0.010  -9.455776
93           93                0.4  sackin_yule    0.005  -8.936463
98           98                0.4  sackin_yule    0.001  -6.473629
103         103                0.2  sackin_yule    0.100  -0.964818
108         108                0.2  sackin_yule    0.050  -6.051482
113         113                0.2  sackin_yule    0.010  -9.784686
118         118                0.2  sackin_yule    0.005  -8.571063
123         123                0.2  sackin_yule    0.001  -6.146121

and my attempt using matplotlib was:

plt.pcolor(data_yule.SymmetricDivision, data_yule.MutProb, data_yule.value)

which threw the error:

ValueError: not enough values to unpack (expected 2, got 1)

and the seaborn attempt was:

sns.heatmap(data_yule.SymmetricDivision, data_yule.MutProb, data_yule.value)

which threw:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

It seems trivial, as both functions want a rectangular dataset, but I'm clearly missing something.

cancerconnector asked Jun 13 '16 12:06


2 Answers

Use pandas.DataFrame.pivot (no aggregation; each index/column pair must be unique) or pandas.DataFrame.pivot_table (aggregates the values= column when pairs repeat) to reshape the dataframe from long to wide form. The index will be on the y-axis, and the columns will be on the x-axis. See "Reshaping and pivot tables" in the pandas user guide for an overview.

In [96]: result
Out[96]: 
MutProb               0.001      0.005      0.010     0.050     0.100
SymmetricDivision                                                    
0.2               -6.146121  -8.571063  -9.784686 -6.051482 -0.964818
0.4               -6.473629  -8.936463  -9.455776 -6.885229 -0.652147
0.6               -6.760559  -9.292469  -9.551801 -6.621639 -0.392256
0.8               -7.196407  -9.544065 -10.536340 -6.996394 -0.722602
1.0               -8.027475 -10.502450 -11.408114 -9.175349 -4.180864
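If the long-form data contained repeated (SymmetricDivision, MutProb) pairs, pivot would raise a ValueError and pivot_table would be the one to reach for. A minimal sketch of the difference, assuming a small made-up frame (the name df_dup and its numbers are illustrative and not from the question):

```python
import pandas as pd

# Hypothetical long-form data with one duplicated (SymmetricDivision, MutProb) pair
df_dup = pd.DataFrame({
    "SymmetricDivision": [0.2, 0.2, 0.2, 0.4, 0.4],
    "MutProb": [0.1, 0.1, 0.05, 0.1, 0.05],
    "value": [-1.0, -3.0, -6.0, -0.5, -7.0],
})

# pivot() refuses duplicate index/column pairs
try:
    df_dup.pivot(index="SymmetricDivision", columns="MutProb", values="value")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table() aggregates the duplicates instead (here with the mean)
wide = df_dup.pivot_table(index="SymmetricDivision", columns="MutProb",
                          values="value", aggfunc="mean")
print(pivot_failed)
print(wide)
```

For the data in the question every pair occurs exactly once, so plain pivot is enough.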

Then you can pass the 2D array (or DataFrame) to seaborn.heatmap or plt.pcolor:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# load the sample data
df = pd.DataFrame({
    'MutProb': [0.1, 0.05, 0.01, 0.005, 0.001] * 5,
    'SymmetricDivision': [1.0] * 5 + [0.8] * 5 + [0.6] * 5 + [0.4] * 5 + [0.2] * 5,
    'test': ['sackin_yule'] * 25,
    'value': [
        -4.1808639999999997, -9.1753490000000006, -11.408113999999999, -10.50245, -8.0274750000000008,
        -0.72260200000000008, -6.9963940000000004, -10.536339999999999, -9.5440649999999998, -7.1964070000000007,
        -0.39225599999999999, -6.6216390000000001, -9.5518009999999993, -9.2924690000000005, -6.7605589999999998,
        -0.65214700000000003, -6.8852289999999989, -9.4557760000000002, -8.9364629999999998, -6.4736289999999999,
        -0.96481800000000006, -6.051482, -9.7846860000000007, -8.5710630000000005, -6.1461209999999999,
    ],
})

# pivot the dataframe from long to wide form
result = df.pivot(index='SymmetricDivision', columns='MutProb', values='value')

sns.heatmap(result, annot=True, fmt="g", cmap='viridis')
plt.show()

This yields an annotated 5×5 heatmap with SymmetricDivision on the y-axis and MutProb on the x-axis.
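The same wide table can also be drawn with matplotlib alone. A minimal sketch, assuming a 2×2 stand-in for `result` (values copied from the pivoted table above) and using `pcolormesh`, a close relative of `plt.pcolor`; the tick handling is one possible choice, not the only one:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Small stand-in for the pivoted `result` (a 2x2 corner of the table above)
result = pd.DataFrame(
    [[-6.146121, -8.571063], [-6.473629, -8.936463]],
    index=pd.Index([0.2, 0.4], name="SymmetricDivision"),
    columns=pd.Index([0.001, 0.005], name="MutProb"),
)

fig, ax = plt.subplots()
# pcolormesh takes the 2D array of values; rows map to y, columns to x
mesh = ax.pcolormesh(result.to_numpy(), cmap="viridis")
fig.colorbar(mesh, ax=ax)

# Put ticks at the cell centres and label them with the pivoted index/columns
ax.set_xticks(np.arange(result.shape[1]) + 0.5)
ax.set_xticklabels(list(result.columns))
ax.set_yticks(np.arange(result.shape[0]) + 0.5)
ax.set_yticklabels(list(result.index))
ax.set_xlabel(result.columns.name)
ax.set_ylabel(result.index.name)
```

Call plt.show() afterwards to display the figure. Unlike seaborn.heatmap, which puts the first row at the top, this draws the first row at the bottom unless the y-axis is inverted.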

unutbu answered Oct 16 '22 22:10


Another option is to use .groupby() and .unstack():

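# wrap the chain in parentheses so it can span several lines
df_m = (
    df.groupby(["SymmetricDivision", "MutProb"])["value"]
      .mean()             # aggregate the values; .size() would only count rows
      .unstack(level=0)   # SymmetricDivision becomes the columns
)

sns.heatmap(df_m)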
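To check that the groupby route gives the same table as pivot, here is a minimal sketch on a made-up four-row miniature of the question's data. It assumes the value column is selected and aggregated with mean (with unique pairs the mean is just the value itself), and it unstacks MutProb so the layout matches the pivot call:

```python
import pandas as pd

# Hypothetical miniature of the question's long-form data
df = pd.DataFrame({
    "SymmetricDivision": [1.0, 1.0, 0.8, 0.8],
    "MutProb": [0.1, 0.05, 0.1, 0.05],
    "value": [-4.180864, -9.175349, -0.722602, -6.996394],
})

# groupby + unstack: pick the value column, aggregate, move MutProb to the columns
via_groupby = (
    df.groupby(["SymmetricDivision", "MutProb"])["value"]
      .mean()
      .unstack(level="MutProb")
)

# the pivot equivalent
via_pivot = df.pivot(index="SymmetricDivision", columns="MutProb", values="value")

print(via_groupby)
print(via_pivot)
```

Using .unstack(level=0) instead moves SymmetricDivision to the columns, which transposes the heatmap. Note that .size() counts rows per group rather than reading the value column, so it suits count heatmaps, not this one.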

Antiez answered Oct 16 '22 23:10