How to create a min-max lineplot by month

I have time series data of retail beef ad counts, and I want to make a stacked line chart that shows, on a three-week average basis, the average number of ads that grocers posted per store last week. The choice of plot is motivated by the context of the problem and by the desired plot shown further down. I aggregated the data for plotting and tried to make the line chart, but the result is not informative and is hard to read. How can I achieve this in matplotlib, and what should I change in my current attempt?

Reproducible data and current attempt

Here is the minimal reproducible data and the code that I used in my current attempt:

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import seaborn as sns
from datetime import timedelta, datetime

url = 'https://gist.githubusercontent.com/adamFlyn/96e68902d8f71ad62a4d3cda135507ad/raw/4761264cbd55c81cf003a4219fea6a24740d7ce9/df.csv'

df = pd.read_csv(url, parse_dates=['date'])
df.drop(columns=['Unnamed: 0'], inplace=True)

df_grp = df.groupby(['date', 'retail_item']).agg({'number_of_ads': 'sum'})
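# share of each item in the total ads per date; transform keeps the original index aligned
df_grp["percentage"] = 100 * df_grp['number_of_ads'] / df_grp.groupby(level=0)['number_of_ads'].transform('sum')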
df_grp = df_grp.reset_index(level=[0,1])

for item in df_grp['retail_item'].unique():
    dd = df_grp[df_grp['retail_item'] == item].groupby(['date', 'percentage'])[['number_of_ads']].sum().reset_index(level=[0,1])
    dd['weekly_change'] = dd[['percentage']].rolling(7).mean()
    fig, ax = plt.subplots(figsize=(8, 6), dpi=144)
    sns.lineplot(x=dd.index, y='weekly_change', data=dd, ax=ax)
    ax.set_xlim(dd.index.min(), dd.index.max())
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %Y'))
plt.gcf().autofmt_xdate()
plt.style.use('ggplot')
plt.xticks(rotation=90)
plt.show()

Current result

[image: output of the current attempt]

This is not the line chart I expected. I want to reproduce the plot from this site. Is that doable?

[image: plot from the referenced site]

Desired plot

Here is an example of the desired plot that I want to make from this minimal reproducible data:

[image: desired plot]

I don't know how to change my current attempt to get the desired plot above. Is there a way to do this in matplotlib? Any help would be appreciated. Thanks.

asked Sep 25 '20 15:09

Adam

1 Answer

Also see "How to create a min-max plot by month with fill_between?".
See the in-line comments for details.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import calendar

#################################################################
# setup from question
url = 'https://gist.githubusercontent.com/adamFlyn/96e68902d8f71ad62a4d3cda135507ad/raw/4761264cbd55c81cf003a4219fea6a24740d7ce9/df.csv'
df = pd.read_csv(url, parse_dates=['date'])
df.drop(columns=['Unnamed: 0'], inplace=True)
df_grp = df.groupby(['date', 'retail_item']).agg({'number_of_ads': 'sum'})
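# share of each item in the total ads per date; transform keeps the original index aligned
df_grp["percentage"] = 100 * df_grp['number_of_ads'] / df_grp.groupby(level=0)['number_of_ads'].transform('sum')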
df_grp = df_grp.reset_index(level=[0,1])
#################################################################

# create a month map from long to abbreviated calendar names
month_map = dict(zip(calendar.month_name[1:], calendar.month_abbr[1:]))

# add a month column holding the abbreviated month name
df_grp['month'] = df_grp.date.dt.month_name().map(month_map)

# set month as an ordered categorical so the months are plotted in calendar order
df_grp.month = pd.Categorical(df_grp.month, categories=list(month_map.values()), ordered=True)

# use groupby to aggregate max, min and mean, then reshape to long form:
# one row per (retail_item, month, statistic) with the statistic name in 'mm'
dfmm = (
    df_grp.groupby(['retail_item', 'month'])['percentage']
    .agg(['max', 'min', 'mean'])
    .stack()
    .reset_index(level=[2])
    .rename(columns={'level_2': 'mm', 0: 'vals'})
    .reset_index()
)

# create a palette map for line colors
cmap = {'min': 'k', 'max': 'k', 'mean': 'b'}

# iterate through each retail item and plot the corresponding data
for g, d in dfmm.groupby('retail_item'):
    plt.figure(figsize=(7, 4))
    sns.lineplot(x='month', y='vals', hue='mm', data=d, palette=cmap)

    # select only min or max data for fill_between
    y1 = d[d.mm == 'max']
    y2 = d[d.mm == 'min']
    plt.fill_between(x=y1.month, y1=y1.vals, y2=y2.vals, color='gainsboro')
    
    # add lines for specific years
    for year in [2016, 2018, 2020]:
        data = df_grp[(df_grp.date.dt.year == year) & (df_grp.retail_item == g)]
        sns.lineplot(x='month', y='percentage', ci=None, data=data, label=year)
    
    plt.ylim(0, 100)
    plt.margins(0, 0)
    plt.legend(bbox_to_anchor=(1., 1), loc='upper left')
    
    plt.ylabel('Percentage of Ads')
    plt.title(g)
    plt.show()

[image: resulting plot]
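The aggregation and shading steps above can be tried without downloading the gist. The following is a minimal sketch on synthetic data: the column names mirror the question, but the values, the `stats` and `month_order` names, and the numeric x positions are assumptions made for illustration, not part of the original answer.

```python
import calendar

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic data (invented values): one observation per month for 2018-2020.
df = pd.DataFrame({"date": pd.date_range("2018-01-01", periods=36, freq="MS")})
df["percentage"] = df["date"].dt.month + 10.0 * (df["date"].dt.year - 2018)

# Abbreviated month names as an ordered categorical, so groups sort Jan..Dec.
month_order = list(calendar.month_abbr[1:])
df["month"] = pd.Categorical(
    df["date"].dt.month.map(lambda m: calendar.month_abbr[m]),
    categories=month_order,
    ordered=True,
)

# One row per month with the min, mean and max across all years.
stats = df.groupby("month", observed=False)["percentage"].agg(["min", "mean", "max"])

# Shade the min-max range and draw the mean on top of it.
fig, ax = plt.subplots(figsize=(7, 4))
x = range(len(stats))
ax.fill_between(x, stats["min"], stats["max"], color="gainsboro", label="min-max range")
ax.plot(x, stats["mean"], color="b", label="mean")
ax.set_xticks(list(x))
ax.set_xticklabels(stats.index)
ax.set_ylabel("Percentage of Ads")
ax.legend()
```

Keeping the statistics in wide form (one column each for min, mean and max) makes the `fill_between` call direct; the long form used in the answer is more convenient when the lines are drawn with seaborn's `hue`.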

answered Oct 07 '22 20:10

Trenton McKinney
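The question mentions a three-week average, while the attempt uses `rolling(7)`, which averages seven consecutive rows rather than a calendar span. A possible alternative is a time-based window, sketched below on synthetic data. The item names, values and the `three_week_avg` column are hypothetical, and the sketch assumes a `date` column that can be sorted within each item.

```python
import pandas as pd

# Synthetic weekly data (invented values); column names mirror the question.
dates = pd.date_range("2020-01-03", periods=6, freq="7D")
df = pd.DataFrame({
    "date": list(dates) * 2,
    "retail_item": ["chuck roast"] * 6 + ["ribeye"] * 6,
    "percentage": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0,
                   90.0, 80.0, 70.0, 60.0, 50.0, 40.0],
})

# A time-based rolling window needs dates in increasing order within each group.
df = df.sort_values(["retail_item", "date"]).reset_index(drop=True)

# "21D" averages every observation in the 21 days ending at each date,
# computed separately for each retail item.
rolled = (
    df.set_index("date")
      .groupby("retail_item")["percentage"]
      .rolling("21D")
      .mean()
      .rename("three_week_avg")
      .reset_index()
)
print(rolled)
```

With a time-based window the first rows of each item are averaged over fewer observations instead of being left empty, which may or may not be what the plot should show.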

Tags: python, pandas, matplotlib, seaborn
