Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to plot kernel density plot of dates in Pandas?

I have a pandas dataframe where each observation has a date (as a column of entries in datetime[64] format). These dates are spread over a period of about 5 years. I would like to plot a kernel-density plot of the dates of all the observations, with the years labelled on the x-axis.

I have figured out how to create a time-delta relative to some reference date and then create a density plot of the number of hours/days/years between each observation and the reference date:

df['relativeDate'].astype('timedelta64[D]').plot(kind='kde')

But this isn't exactly what I want: If I convert to year-deltas, then the x-axis is right but I lose the within-year variation. But if I take a smaller unit of time like hour or day, the x-axis labels are much harder to interpret.

What's the simplest way to make this work in Pandas?

like image 219
bhackinen Avatar asked Jul 10 '15 19:07

bhackinen


People also ask

How do you plot a KDE plot in Python?

Creating a Univariate Seaborn KdeplotThe seaborn. kdeplot() function is used to plot the data against a single/univariate variable. It represents the probability distribution of the data values as the area under the plotted curve. In the above example, we have generated some random data values using the numpy.

What does a KDE plot show?

A kernel density estimate (KDE) plot is a method for visualizing the distribution of observations in a dataset, analogous to a histogram. KDE represents the data using a continuous probability density curve in one or more dimensions.

What does a kernel density plot show?

A density plot is a representation of the distribution of a numeric variable. It uses a kernel density estimate to show the probability density function of the variable (see more). It is a smoothed version of the histogram and is used in the same concept.


1 Answers

Inspired by @JohnE 's answer, an alternative approach to convert date to numeric value is to use .toordinal().

import pandas as pd
import numpy as np

# simulate some artificial data
# ===============================
np.random.seed(0)
dates = pd.date_range('2010-01-01', periods=31, freq='D')
df = pd.DataFrame(np.random.choice(dates,100), columns=['dates'])
# use toordinal() to get datenum
df['ordinal'] = [x.toordinal() for x in df.dates]

print(df)

        dates  ordinal
0  2010-01-13   733785
1  2010-01-16   733788
2  2010-01-22   733794
3  2010-01-01   733773
4  2010-01-04   733776
5  2010-01-28   733800
6  2010-01-04   733776
7  2010-01-08   733780
8  2010-01-10   733782
9  2010-01-20   733792
..        ...      ...
90 2010-01-19   733791
91 2010-01-28   733800
92 2010-01-01   733773
93 2010-01-15   733787
94 2010-01-04   733776
95 2010-01-22   733794
96 2010-01-13   733785
97 2010-01-26   733798
98 2010-01-11   733783
99 2010-01-21   733793

[100 rows x 2 columns]    

# plot non-parametric kde on numeric datenum
ax = df['ordinal'].plot(kind='kde')
# rename the xticks with labels
x_ticks = ax.get_xticks()
ax.set_xticks(x_ticks[::2])
xlabels = [datetime.datetime.fromordinal(int(x)).strftime('%Y-%m-%d') for x in x_ticks[::2]]
ax.set_xticklabels(xlabels)

enter image description here

like image 174
Jianxun Li Avatar answered Sep 20 '22 15:09

Jianxun Li