I have a pandas dataframe where each observation has a date (as a column of entries in datetime[64] format). These dates are spread over a period of about 5 years. I would like to plot a kernel-density plot of the dates of all the observations, with the years labelled on the x-axis.
I have figured out how to create a time-delta relative to some reference date and then create a density plot of the number of hours/days/years between each observation and the reference date:
df['relativeDate'].astype('timedelta64[D]').plot(kind='kde')
But this isn't exactly what I want: If I convert to year-deltas, then the x-axis is right but I lose the within-year variation. But if I take a smaller unit of time like hour or day, the x-axis labels are much harder to interpret.
What's the simplest way to make this work in Pandas?
Creating a Univariate Seaborn KdeplotThe seaborn. kdeplot() function is used to plot the data against a single/univariate variable. It represents the probability distribution of the data values as the area under the plotted curve. In the above example, we have generated some random data values using the numpy.
A kernel density estimate (KDE) plot is a method for visualizing the distribution of observations in a dataset, analogous to a histogram. KDE represents the data using a continuous probability density curve in one or more dimensions.
A density plot is a representation of the distribution of a numeric variable. It uses a kernel density estimate to show the probability density function of the variable (see more). It is a smoothed version of the histogram and is used in the same concept.
Inspired by @JohnE 's answer, an alternative approach to convert date to numeric value is to use .toordinal()
.
import pandas as pd
import numpy as np
# simulate some artificial data
# ===============================
np.random.seed(0)
dates = pd.date_range('2010-01-01', periods=31, freq='D')
df = pd.DataFrame(np.random.choice(dates,100), columns=['dates'])
# use toordinal() to get datenum
df['ordinal'] = [x.toordinal() for x in df.dates]
print(df)
dates ordinal
0 2010-01-13 733785
1 2010-01-16 733788
2 2010-01-22 733794
3 2010-01-01 733773
4 2010-01-04 733776
5 2010-01-28 733800
6 2010-01-04 733776
7 2010-01-08 733780
8 2010-01-10 733782
9 2010-01-20 733792
.. ... ...
90 2010-01-19 733791
91 2010-01-28 733800
92 2010-01-01 733773
93 2010-01-15 733787
94 2010-01-04 733776
95 2010-01-22 733794
96 2010-01-13 733785
97 2010-01-26 733798
98 2010-01-11 733783
99 2010-01-21 733793
[100 rows x 2 columns]
# plot non-parametric kde on numeric datenum
ax = df['ordinal'].plot(kind='kde')
# rename the xticks with labels
x_ticks = ax.get_xticks()
ax.set_xticks(x_ticks[::2])
xlabels = [datetime.datetime.fromordinal(int(x)).strftime('%Y-%m-%d') for x in x_ticks[::2]]
ax.set_xticklabels(xlabels)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With