
Seaborn: Avoid plotting missing values (line plot)

I want a line plot that shows a visible gap wherever a piece of data is missing.

However, the code below draws a line straight across the missing values, creating a potentially misleading chart:

import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt

# load csv
df = pd.read_csv('data.csv')
# plot a graph
g = sns.lineplot(x="Date", y="Data", data=df)
plt.show()

What should I change in my code to avoid filling missing values?

The CSV looks like this:

Date,Data
01-12-03,100
01-01-04,
01-02-04,
01-03-04,
01-04-04,
01-05-04,39
01-06-04,
01-07-04,
01-08-04,53
01-09-04,
01-10-04,
01-11-04,
01-12-04,
01-01-05,28
   ...
01-04-18,14
01-05-18,12
01-06-18,8
01-07-18,8

link to .csv: https://drive.google.com/file/d/1s-RJfAFYD90m4SrFDzIba7EQP4C-J0yO/view?usp=sharing

asked Aug 30 '18 13:08 by Степан Смирнов


2 Answers

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns

# Make example data
s = """2018-01-01
2018-01-02,100
2018-01-03,105
2018-01-04
2018-01-05,95
2018-01-06,90
2018-01-07,80
2018-01-08
2018-01-09"""
df = pd.DataFrame([row.split(",") for row in s.split("\n")], columns=["Date", "Data"])
df = df.replace("", np.nan)
df["Date"] = pd.to_datetime(df["Date"])
df["Data"] = df["Data"].astype(float)

Three options:

1) Use pandas or matplotlib.
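
A minimal sketch of option 1 with plain matplotlib, assuming the same example values as the frame above (rebuilt here so the snippet runs on its own). matplotlib breaks a line wherever y is NaN, and the marker is added so that a point with NaNs on both sides would still be visible.

```python
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

# Same values as the example frame; NaN marks the missing days.
df = pd.DataFrame({
    "Date": pd.date_range("2018-01-01", periods=9, freq="D"),
    "Data": [np.nan, 100, 105, np.nan, 95, 90, 80, np.nan, np.nan],
})

fig, ax = plt.subplots(figsize=(10, 5))
# The NaN rows are passed through unchanged; matplotlib leaves a gap there.
(line,) = ax.plot(df["Date"], df["Data"], marker="o")
ax.set_xlabel("Date")
ax.set_ylabel("Data")
fig.autofmt_xdate()  # tilt the date labels so they do not overlap
```

Call plt.show() afterwards to display the figure.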

2) If you need seaborn: pointplot is not really meant for this, but because your dates are regularly spaced you can use it out of the box.

fig, ax = plt.subplots(figsize=(10, 5))

plot = sns.pointplot(
    ax=ax,
    data=df, x="Date", y="Data"
)

ax.set_xticklabels([])

plt.show()


3) If you need seaborn and specifically lineplot: from reading the source code, lineplot appears to drop NaNs from the DataFrame before plotting, so it cannot leave gaps on its own. As a workaround, you can use the hue argument to put each contiguous section into its own group. The sections are numbered by the running count of NaNs.

fig, ax = plt.subplots(figsize=(10, 5))

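plot = sns.lineplot(
    ax=ax,
    data=df, x="Date", y="Data",
    # Running count of NaNs: each contiguous run of valid points gets its own hue level
    hue=df["Data"].isna().cumsum(),
    # One black entry per hue level, so all sections look like a single line
    palette=["black"] * sum(df["Data"].isna()),
    legend=False, markers=True
)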
ax.set_xticklabels([])

plt.show()


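Unfortunately, the markers argument does not appear to take effect here at the moment, so a date with NaNs on both sides stays invisible unless you add markers some other way. In seaborn, markers is tied to the style variable, so passing matplotlib's marker keyword (for example marker="o") may be worth trying instead.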
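
The numbering used for hue in option 3 can be inspected on its own. This is a small sketch using only pandas, assuming the same example values as above; the column name segment and the variables segments and n_levels are made up for illustration.

```python
import numpy as np
import pandas as pd

# Same values as the example frame; NaN marks the missing days.
df = pd.DataFrame({
    "Date": pd.date_range("2018-01-01", periods=9, freq="D"),
    "Data": [np.nan, 100, 105, np.nan, 95, 90, 80, np.nan, np.nan],
})

# Running count of NaNs seen so far. The count only changes on NaN rows,
# so every contiguous run of valid points shares one label.
df["segment"] = df["Data"].isna().cumsum()

# Values that end up in each segment once the NaN rows are dropped.
valid = df.dropna(subset=["Data"])
segments = valid.groupby("segment")["Data"].apply(list).to_dict()
print(segments)

# Number of distinct labels over all rows, next to the number of NaNs.
n_levels = df["segment"].nunique()
print(n_levels, int(df["Data"].isna().sum()))
```

For this frame the valid rows fall into two segments, labelled 1 (100, 105) and 2 (95, 90, 80), and both counts are 4. The counts agree only because the first row is NaN: if the data starts with a valid value, the labels start at 0 and there is one more label than there are NaNs. In that case a palette sized with sum(df["Data"].isna()) would be one entry short, so sizing it from the number of distinct labels is the safer choice.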

answered Sep 16 '22 20:09 by Denziloe


Try replacing the NaN values with np.inf: seaborn doesn't draw those points, and doesn't connect the points before them with the points after.
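
A sketch of the replacement step, assuming example values like those in the first answer; df_inf is a name made up here. Whether the resulting plot really shows gaps is the claim of this answer and may depend on the seaborn version, so check the output on your own data.

```python
import numpy as np
import pandas as pd

# Example frame with missing days stored as NaN.
df = pd.DataFrame({
    "Date": pd.date_range("2018-01-01", periods=9, freq="D"),
    "Data": [np.nan, 100, 105, np.nan, 95, 90, 80, np.nan, np.nan],
})

# Replace NaN with +inf in a copy, leaving the original frame untouched.
df_inf = df.assign(Data=df["Data"].fillna(np.inf))
```

df_inf can then be passed to sns.lineplot(x="Date", y="Data", data=df_inf) as in the question. Keep the original df for any calculations, since sums and means over the column would become infinite after the replacement.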

answered Sep 20 '22 20:09 by Dzmitry Lazerka