Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Plot multiple columns of pandas DataFrame using Seaborn

suppose I have DataFrame with columns ['X_Axis','col_2','col_3',...,'col_n',]

I need to plot the first column on X-Axis and rest on Y-Axis. FYI : all the values have been grouped according to X-Axis, the X-Axis values range from 0-25 and all other column values have been normalized to the scale of 0 - 1. I want it on same graph plot, not subplots.

Preferred : FactorPlot , normal line graph.

like image 425
Rakmo Avatar asked Jul 06 '17 06:07

Rakmo


People also ask

How do you plot multiple columns in Seaborn?

As we have used the groupby function to show the barplot multiple columns. Just specify the three parameters x, y, and hue to generate the bar plot in multiple columns. So, let's begin with adding the python modules for plotting the multiple bars of the plot.

How do I plot multiple columns in pandas?

Pandas has a tight integration with Matplotlib. You can plot data directly from your DataFrame using the plot() method. To plot multiple data columns in single frame we simply have to pass the list of columns to the y argument of the plot function.

How do you plot multiple plots in Seaborn?

In Seaborn, we will plot multiple graphs in a single window in two ways. First with the help of Facetgrid() function and other by implicit with the help of matplotlib. data: Tidy dataframe where each column is a variable and each row is an observation.

How do I plot specific columns in pandas?

To plot a specific column, use the selection method of the subset data tutorial in combination with the plot() method. Hence, the plot() method works on both Series and DataFrame .


2 Answers

  • Some seaborn plots will accept a wide dataframe, sns.pointplot(data=df, x='X_Axis', y='col_2'), but not sns.pointplot(data=df, x='X_Axis', y=['col_2', 'col_3']), so it's better to reshape the DataFrame.
  • Reshape the DataFrame from wide to long with pandas.DataFrame.melt.
    • Converting the dataframe from a wide to long form is standard for all seaborn plots, not just the examples shown.
  • Tested in python 3.8.12, pandas 1.3.4, matplotlib 3.4.3, seaborn 0.11.2

Sample DataFrame

import pandas as pd
import seaborn as sns

df = pd.DataFrame({'X_Axis':[1,3,5,7,10,20],
                   'col_2':[.4,.5,.4,.5,.5,.4],
                   'col_3':[.7,.8,.9,.4,.2,.3],
                   'col_4':[.1,.3,.5,.7,.1,.0],
                   'col_5':[.5,.3,.6,.9,.2,.4]})

# display(df)
   X_Axis  col_2  col_3  col_4  col_5
0       1    0.4    0.7    0.1    0.5
1       3    0.5    0.8    0.3    0.3
2       5    0.4    0.9    0.5    0.6
3       7    0.5    0.4    0.7    0.9
4      10    0.5    0.2    0.1    0.2
5      20    0.4    0.3    0.0    0.4

# convert to long (tidy) form
dfm = df.melt('X_Axis', var_name='cols', value_name='vals')

# display(dfm.head())
   X_Axis   cols  vals
0       1  col_2   0.4
1       3  col_2   0.5
2       5  col_2   0.4
3       7  col_2   0.5
4      10  col_2   0.5

Current Plot Methods

catplot: figure-level

Use seaborn.catplot with kind= (e.g. kind='point' to reproduce the FactorPlot default):

g = sns.catplot(x="X_Axis", y="vals", hue='cols', data=dfm, kind='point')

enter image description here

pointplot: axes-level

sns.pointplot(x="X_Axis", y="vals", hue='cols', data=dfm)

enter image description here

Original

factorplot: was renamed to catplot v0.9.0 (July 2018)

New versions of seaborn get warning:

The factorplot function has been renamed to catplot. The original name will be removed in a future release. Please update your code. Note that the default kind in factorplot ('point') has changed 'strip' in catplot.

g = sns.factorplot(x="X_Axis", y="vals", hue='cols', data=dfm)

# using pd.melt instead of pd.DataFrame.melt for pandas < 0.20.0
# dfm = pd.melt(df, 'X_Axis', var_name='cols',  value_name='vals')
# g = sns.factorplot(x="X_Axis", y="vals", hue='cols', data=dfm)

graph

like image 118
jezrael Avatar answered Oct 22 '22 03:10

jezrael


in addition to mighty @jezrael for those who come from google if you intend to plot lines with the index of the original dataframe just do as follows:

df = pd.DataFrame({'col_2':[.4,.5,.4,.5,.5,.4],
                   'col_3':[.7,.8,.9,.4,.2,.3],
                   'col_4':[.1,.3,.5,.7,.1,.0],
                   'col_5':[.5,.3,.6,.9,.2,.4]})

# resetting index before melting to save the current index in 'index' column...
df = df.reset_index().melt('index', var_name='cols',  value_name='vals')
g = sns.catplot(x="index", y="vals", hue='cols', data=df, kind='point')
like image 9
adir abargil Avatar answered Oct 22 '22 03:10

adir abargil