
How to plot Pandas datetime series in Seaborn distplot?

I have a pandas dataframe with a datetime column. I would like to plot the distribution of the rows according to that date column, but I'm currently getting an unhelpful error. I have:

df['Date'] = pd.to_datetime(df['Date'], errors='raise')
s = sns.distplot(df['Date'])

which throws the error:

TypeError: ufunc add cannot use operands with types dtype('<M8[ns]') and dtype('<M8[ns]')

If I change the column I'm plotting to numeric data then it all works fine. How can I get the datetime column to behave nicely? I can't really find much about what I think I need in the docs. Any and all help appreciated.

Below is the result of df.head(2); I have removed some columns for security reasons:

               Date                 
2812         2016-03-05
2813         2016-03-05

Apparently the column (when taken as a series) has properties

Name: Date, dtype: datetime64[ns]
Joseph Whiting asked Jul 25 '16 12:07


People also ask

What is the difference between Displot and Distplot?

displot() is the newer replacement for distplot(), with better capabilities; distplot() has been deprecated since seaborn 0.11.

Is Distplot deprecated?

This function has been deprecated and will be removed in seaborn v0.14.0. It has been replaced by histplot() and displot(), two functions with a modern API and many more capabilities.


1 Answer

I came across this question while having the same problem myself. As mentioned in the comments, seaborn's distplot does not seem to support datetime values. Unfortunately, I could not find anything in the official documentation to confirm this.
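A minimal sketch of what the error message itself is saying, using plain NumPy rather than seaborn. It is only an assumption that distplot hits this same operation internally; the snippet just shows that adding two datetime64 values is undefined, while subtracting them is fine.

```python
import numpy as np

dates = np.array(["2016-03-05", "2016-02-05"], dtype="datetime64[ns]")

# Subtracting two datetimes is well defined and gives a timedelta.
diff = dates[0] - dates[1]

# Adding two datetimes has no meaning, so NumPy refuses the operation.
try:
    dates + dates
    add_failed = False
    message = ""
except TypeError as exc:
    add_failed = True
    message = str(exc)

print(diff, add_failed, message)
```

This is why both options below start by turning the dates into something numeric or by counting them directly.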

I found two ways to deal with this problem. Neither is perfect, but they are the best I found.

Option 1: Convert dates to numbers

Convert the dates to some numeric metric and work with that. distplot works with numbers, so if each date is represented by a number we will be okay. The mapping between dates and numbers is similar to using a MinMax scaler. For example, we can set "2017-01-01" as 0 and "2020-06-06" as 1, and map all dates between them to values in the range [0,1].

Which range of numbers to use depends on the range of your data; the unit could be days, months, years, etc.
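Here is a small sketch of the [0,1] mapping described above, assuming the column is a real datetime64 Series. The middle date is made up for illustration; only the two endpoints come from the example.

```python
import pandas as pd

# Endpoints from the example above; the middle date is a hypothetical sample.
dates = pd.to_datetime(pd.Series(["2017-01-01", "2018-09-19", "2020-06-06"]))

lo, hi = dates.min(), dates.max()

# Timedelta / Timedelta gives a plain float, so the result is unitless:
# the earliest date maps to 0.0 and the latest to 1.0.
scaled = (dates - lo) / (hi - lo)

print(scaled)
```

The scaled column is ordinary float data, so it can be passed to distplot like any other numeric column.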

I'll demonstrate this approach with this toy example.

import pandas as pd
import datetime as dt

original_dates = ["2016-03-05", "2016-03-05", "2016-02-05", "2016-02-05", "2016-02-05", "2014-03-05"]
dates_list = [dt.datetime.strptime(date, '%Y-%m-%d').date() for date in original_dates]

df = pd.DataFrame({"Date":dates_list})

Now the dataframe is as follows:

         Date
0  2016-03-05
1  2016-03-05
2  2016-02-05
3  2016-02-05
4  2016-02-05
5  2014-03-05

(This is not the best way to get dates into a dataframe, of course, but it doesn't matter how you do it.)

Now I create a new column which holds the difference in days from the minimum date:

df["NewDate"] = df["Date"] - dt.date(2014,3,5)
df["NewDate"] = df["NewDate"].apply(lambda x: x.days)

result:

         Date  NewDate
0  2016-03-05      731
1  2016-03-05      731
2  2016-02-05      702
3  2016-02-05      702
4  2016-02-05      702
5  2014-03-05        0

Notice that I hard-coded the minimum date. You can find the minimum programmatically instead of hard-coding it; I just wanted to get through this part as fast as possible.
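One possible way to avoid the hard-coded date, sketched under the assumption that the column is parsed with pd.to_datetime (so it is datetime64 rather than Python date objects):

```python
import pandas as pd

original_dates = ["2016-03-05", "2016-03-05", "2016-02-05",
                  "2016-02-05", "2016-02-05", "2014-03-05"]
df = pd.DataFrame({"Date": pd.to_datetime(original_dates)})

# Offset of each row from the earliest date in the column, in whole days.
df["NewDate"] = (df["Date"] - df["Date"].min()).dt.days

print(df)
```

With a datetime64 column the .dt.days accessor replaces the apply/lambda step, and the values match the table above.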

Now we can use distplot on our new column:

import seaborn as sns
sns.set()
ax = sns.distplot(df['NewDate'])

Output:

[Image: Seaborn distplot with dates]

As you can see, the x-axis shows days rather than dates. For my own problem it was okay to show it that way. If you want to show dates, an extra step is needed; see the question "Show xticks which are function of x-axis, not directly the data it self. Example with dates (pandas, matplotlib)".

As I said earlier, I scaled by the difference in days, but you can do the same with months or years, depending on the data.
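A rough sketch of that extra step, assuming the x-axis holds day offsets from a known origin date as in the NewDate column. Both helper names are hypothetical; the only library call relied on is matplotlib's FuncFormatter.

```python
import pandas as pd


def day_offset_to_label(offset_days, origin, fmt="%Y-%m-%d"):
    """Turn a day offset (as plotted on the x-axis) back into a date string."""
    stamp = pd.Timestamp(origin) + pd.Timedelta(days=float(offset_days))
    return stamp.strftime(fmt)


def relabel_xaxis_as_dates(ax, origin):
    """Keep the numeric tick positions of `ax`, but display them as dates."""
    # Imported lazily so the helper above can be used without matplotlib.
    from matplotlib.ticker import FuncFormatter

    ax.xaxis.set_major_formatter(
        FuncFormatter(lambda x, pos: day_offset_to_label(x, origin))
    )
    return ax


print(day_offset_to_label(731, "2014-03-05"))
```

After ax = sns.distplot(df['NewDate']), calling relabel_xaxis_as_dates(ax, "2014-03-05") should leave the plot unchanged and only swap the tick text; rotating the labels may be needed to keep them readable.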

Option 2: Use a histogram directly, without seaborn's distplot

The question "Can Pandas plot a histogram of dates?" has an answer showing how to plot a histogram of dates using pandas's groupby.

It's not the same as distplot, but it can be a close enough solution (distplot is ultimately based on matplotlib's hist).
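A minimal sketch of the groupby idea, not the linked answer's exact code. It assumes a datetime64 column and bins by calendar month; the bin size is my choice for this toy data.

```python
import pandas as pd

original_dates = ["2016-03-05", "2016-03-05", "2016-02-05",
                  "2016-02-05", "2016-02-05", "2014-03-05"]
df = pd.DataFrame({"Date": pd.to_datetime(original_dates)})

# Count rows per calendar month; each group becomes one histogram bar.
counts = df.groupby(df["Date"].dt.to_period("M")).size()

print(counts)
```

Calling counts.plot(kind="bar") would then draw the bars with the months as labels. Months with no rows are simply absent from counts, so gaps in time are not drawn to scale.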

Roim answered Sep 18 '22 08:09
