Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Plotly: How to make a line plot from a pandas dataframe with a long or wide format?

(This is a self-answered post to help others shorten their answers to plotly questions by not having to explain how plotly best handles data of long and wide format)


I'd like to build a plotly figure based on a pandas dataframe in as few lines as possible. I know you can do that using plotly.express, but this fails for what I would call a standard pandas dataframe; an index describing row order, and column names describing the names of a value in a dataframe:

Sample dataframe:

    a           b           c
0   100.000000  100.000000  100.000000
1   98.493705   99.421400   101.651437
2   96.067026   98.992487   102.917373
3   95.200286   98.313601   102.822664
4   96.691675   97.674699   102.378682

An attempt:

fig=px.line(x=df.index, y = df.columns)

This raises an error:

ValueError: All arguments should have the same length. The length of argument y is 3, whereas the length of previous arguments ['x'] is 100`

like image 576
vestland Avatar asked May 25 '20 23:05

vestland


People also ask

What is the method of DataFrame to plot a line chart?

line() function is used to plot series or DataFrame as lines. This function is useful to plot lines using DataFrame's values as coordinates. Columns to use for the horizontal axis.

Is plotly better than Matplotlib?

Plotly has several advantages over matplotlib. One of the main advantages is that only a few lines of codes are necessary to create aesthetically pleasing, interactive plots. The interactivity also offers a number of advantages over static matplotlib plots: Saves time when initially exploring your dataset.

How to plot a line chart using PANDAS?

Let’s now see the steps to plot a line chart using Pandas. To start, prepare your data for the line chart. Here is an example of a dataset that captures the unemployment rate over time: Now create the DataFrame based on the above data: This is how the DataFrame would look like: Finally, plot the DataFrame by adding the following syntax:

How to use Plotly with pandas in Python?

Cufflinks is a Python library that helps us use Plotly with Pandas in a native Pandas-like syntax. It also adds more impressive chart types than we normally see in Pandas dataframes. We can import it and configure the global theme and other options once and use the iplot API that is added to every dataframe instead of its default plot method.

How to plot data from a Dataframe?

The dataframe contains the information on units sold of products A and B by a retailer from 2015 to 2020. 1. Line Plot of column values If you use df.plot.line () without any arguments, it plots all the numerical columns as separate lines. You can see that all the numerical columns in the dataframe are plotted as a line each in the resulting plot.

How to make beautiful charts with pandas DataFrames?

When working with Pandas dataframes, we primarily use its default plot method to create graphics. But these graphs aren’t styled enough to present it nicely. A quick trick is to change the plotting backend to Plotly and have beautiful charts.


3 Answers

Here you've tried to use a pandas dataframe of a wide format as a source for px.line. And plotly.express is designed to be used with dataframes of a long format, often referred to as tidy data (and please take a look at that. No one explains it better that Wickham). Many, particularly those injured by years of battling with Excel, often find it easier to organize data in a wide format. So what's the difference?

Wide format:

  • data is presented with each different data variable in a separate column
  • each column has only one data type
  • missing values are often represented by np.nan
  • works best with plotly.graphobjects (go)
  • lines are often added to a figure using fid.add_traces()
  • colors are normally assigned to each trace

Example:

            a          b           c
0   -1.085631    0.997345   0.282978
1   -2.591925    0.418745   1.934415
2   -5.018605   -0.010167   3.200351
3   -5.885345   -0.689054   3.105642
4   -4.393955   -1.327956   2.661660
5   -4.828307    0.877975   4.848446
6   -3.824253    1.264161   5.585815
7   -2.333521    0.328327   6.761644
8   -3.587401   -0.309424   7.668749
9   -5.016082   -0.449493   6.806994

Long format:

  • data is presented with one column containing all the values and another column listing the context of the value
  • missing values are simply not included in the dataset.
  • works best with plotly.express (px)
  • colors are set by a default color cycle and are assigned to each unique variable

Example:

    id  variable    value
0   0   a        -1.085631
1   1   a        -2.591925
2   2   a        -5.018605
3   3   a        -5.885345
4   4   a        -4.393955
... ... ... ...
295 95  c        -4.259035
296 96  c        -5.333802
297 97  c        -6.211415
298 98  c        -4.335615
299 99  c        -3.515854

How to go from wide to long?

df = pd.melt(df, id_vars='id', value_vars=df.columns[:-1])

The two snippets below will produce the very same plot:

enter image description here

How to use px to plot long data?

fig = px.line(df, x='id', y='value', color='variable')

How to use go to plot wide data?

colors = px.colors.qualitative.Plotly
fig = go.Figure()
fig.add_traces(go.Scatter(x=df['id'], y = df['a'], mode = 'lines', line=dict(color=colors[0])))
fig.add_traces(go.Scatter(x=df['id'], y = df['b'], mode = 'lines', line=dict(color=colors[1])))
fig.add_traces(go.Scatter(x=df['id'], y = df['c'], mode = 'lines', line=dict(color=colors[2])))
fig.show()

By the looks of it, go is more complicated and offers perhaps more flexibility? Well, yes. And no. You can easily build a figure using px and add any go object you'd like!

Complete go snippet:

import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

# dataframe of a wide format
np.random.seed(123)
X = np.random.randn(100,3)  
df=pd.DataFrame(X, columns=['a','b','c'])
df=df.cumsum()
df['id']=df.index

# plotly.graph_objects
colors = px.colors.qualitative.Plotly
fig = go.Figure()
fig.add_traces(go.Scatter(x=df['id'], y = df['a'], mode = 'lines', line=dict(color=colors[0])))
fig.add_traces(go.Scatter(x=df['id'], y = df['b'], mode = 'lines', line=dict(color=colors[1])))
fig.add_traces(go.Scatter(x=df['id'], y = df['c'], mode = 'lines', line=dict(color=colors[2])))
fig.show()

Complete px snippet:

import numpy as np
import pandas as pd
import plotly.express as px
from plotly.offline import iplot

# dataframe of a wide format
np.random.seed(123)
X = np.random.randn(100,3)  
df=pd.DataFrame(X, columns=['a','b','c'])
df=df.cumsum()
df['id']=df.index

# dataframe of a long format
df = pd.melt(df, id_vars='id', value_vars=df.columns[:-1])

# plotly express
fig = px.line(df, x='id', y='value', color='variable')
fig.show()
like image 114
vestland Avatar answered Oct 16 '22 14:10

vestland


I'm going to add this as answer so it will be on evidence. First of all thank you @vestland for this. It's a question that come over and over so it's good to have this addressed and it could be easier to flag duplicated question.

Plotly Express now accepts wide-form and mixed-form data as you can check in this post.

like image 36
rpanai Avatar answered Oct 16 '22 14:10

rpanai


You can change the pandas plotting backend to use plotly:

import pandas as pd
pd.options.plotting.backend = "plotly"

Then, to get a fig all you need to write is:

fig = df.plot()

result of fig.show()

fig.show() displays the above image.

like image 21
Tudor B Avatar answered Oct 16 '22 16:10

Tudor B