I want a line plot to indicate where a piece of data is missing, for example by leaving a gap in the line.
However, the code below fills the missing data, creating a potentially misleading chart:
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
# load csv
df=pd.read_csv('data.csv')
# plot a graph
g = sns.lineplot(x="Date", y="Data", data=df)
plt.show()
What should I change in my code to avoid filling missing values?
The CSV looks like this:
Date,Data
01-12-03,100
01-01-04,
01-02-04,
01-03-04,
01-04-04,
01-05-04,39
01-06-04,
01-07-04,
01-08-04,53
01-09-04,
01-10-04,
01-11-04,
01-12-04,
01-01-05,28
...
01-04-18,14
01-05-18,12
01-06-18,8
01-07-18,8
link to .csv: https://drive.google.com/file/d/1s-RJfAFYD90m4SrFDzIba7EQP4C-J0yO/view?usp=sharing
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
# Make example data
s = """2018-01-01
2018-01-02,100
2018-01-03,105
2018-01-04
2018-01-05,95
2018-01-06,90
2018-01-07,80
2018-01-08
2018-01-09"""
df = pd.DataFrame([row.split(",") for row in s.split("\n")], columns=["Date", "Data"])
df = df.replace("", np.nan)
df["Date"] = pd.to_datetime(df["Date"])
df["Data"] = df["Data"].astype(float)
Three options:
1) Use pandas or matplotlib. Both leave a gap in the line wherever the value is NaN.
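As a minimal sketch of option 1 with plain matplotlib, assuming the same example data as above (rebuilt inline here so the block is self-contained): matplotlib breaks the line at every NaN rather than joining the neighbouring points. The `marker="o"` argument is an addition for illustration, so that isolated values remain visible.

```python
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

# Same values as the example data: NaN marks a missing observation.
df = pd.DataFrame({
    "Date": pd.date_range("2018-01-01", periods=9, freq="D"),
    "Data": [np.nan, 100, 105, np.nan, 95, 90, 80, np.nan, np.nan],
})

fig, ax = plt.subplots(figsize=(10, 5))
# The NaNs are passed through untouched; the line is simply not drawn there.
(line,) = ax.plot(df["Date"], df["Data"], marker="o")
ax.set_xlabel("Date")
ax.set_ylabel("Data")
# plt.show()  # uncomment to display the figure interactively
```

pandas' own `df.plot(x="Date", y="Data")` should behave the same way, since its line plots also leave gaps at NaNs.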
2) If you need seaborn: this is not what it is for, but for regularly spaced dates like yours you can use pointplot out of the box.
fig, ax = plt.subplots(figsize=(10, 5))
plot = sns.pointplot(
    ax=ax,
    data=df, x="Date", y="Data"
)
ax.set_xticklabels([])
plt.show()
3) If you need seaborn and you need lineplot: I've looked at the source code and it looks like lineplot drops NaNs from the DataFrame before plotting, so unfortunately it's not possible to do it properly. You could use some advanced hackery, though, and use the hue argument to put the separate sections in separate buckets. We number the sections using the occurrences of NaNs.
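fig, ax = plt.subplots(figsize=(10, 5))
# Number the sections: the counter increases at every NaN, so each run of
# consecutive valid points shares one hue value.
sections = df["Data"].isna().cumsum()
plot = sns.lineplot(
    ax=ax,
    data=df, x="Date", y="Data",
    hue=sections,
    # One color per hue level, so that every section is drawn in black.
    palette=["black"] * sections.nunique(),
    legend=False, markers=True
)
ax.set_xticklabels([])
plt.show()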
Unfortunately, the markers argument appears to be broken at the moment, so you will need to work around it if you want to see dates that have NaNs on either side (a single isolated point has no line segment to draw).
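To see how the section numbering behind the hue trick works, here is a small sketch on the same example values. The names `section` and `segments` are illustrative only. The cumulative count of NaNs goes up by one at each missing value, so every run of consecutive valid points ends up with its own label.

```python
import numpy as np
import pandas as pd

data = pd.Series([np.nan, 100, 105, np.nan, 95, 90, 80, np.nan, np.nan])

# Running count of NaNs seen so far; it is constant within a run of valid points.
section = data.isna().cumsum()
print(section.tolist())  # [1, 1, 1, 2, 2, 2, 2, 3, 4]

# Group only the valid points by their section label.
valid = data.notna()
segments = {int(k): v.tolist() for k, v in data[valid].groupby(section[valid])}
print(segments)  # {1: [100.0, 105.0], 2: [95.0, 90.0, 80.0]}
```

Labels 3 and 4 belong only to NaN rows, which is why they produce no visible line.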
Try setting the NaN values to np.inf: seaborn doesn't draw those points, and doesn't connect the points before them with the points after.