Is there a way to compute the median of a datetime column and return it in datetime format? I want to calculate the median of a pandas column in Python whose dtype is datetime64[ns]. Below is a sample of the column:
df['date'].head()
0 2017-05-08 13:25:13.342
1 2017-05-08 16:37:45.545
2 2017-01-12 11:08:04.021
3 2016-12-01 09:06:29.912
4 2016-06-08 03:16:40.422
Name: recency, dtype: datetime64[ns]
My aim is to get the median in the same datetime format as the date column above.
I tried converting to a NumPy array:
median_ = np.median(np.array(df['date']))
But that throws the error:
TypeError: ufunc add cannot use operands with types dtype('<M8[ns]') and dtype('<M8[ns]')
Converting to int64, calculating the median, and then attempting to convert the result back to datetime does not work either:
df['date'].astype('int64').median().astype('datetime64[ns]')
The median is the value that divides a vector of data into two equal halves. In R, to find the median of every column we can use the apply function: if a data frame df contains only numerical columns, the median of each column can be calculated as apply(df, 2, median).
How about just taking the middle value?
dates = list(df.sort_values('date')['date'])
print(dates[len(dates)//2])
If the table is sorted you can even skip a line.
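The middle-value idea above can be sketched as follows. This is only an illustration: the data frame is rebuilt from the sample rows in the question, and `middle_value` is a hypothetical helper name, not a pandas function. For an even number of rows it returns the upper of the two middle values rather than averaging them.

```python
import pandas as pd

# Sample values copied from the question; the frame itself is illustrative.
df = pd.DataFrame({"date": pd.to_datetime([
    "2017-05-08 13:25:13.342",
    "2017-05-08 16:37:45.545",
    "2017-01-12 11:08:04.021",
    "2016-12-01 09:06:29.912",
    "2016-06-08 03:16:40.422",
])})

def middle_value(series):
    # Hypothetical helper: drop missing values, sort, take the element at n // 2.
    ordered = series.dropna().sort_values()
    return ordered.iloc[len(ordered) // 2]

middle = middle_value(df["date"])
print(middle)
```

Because no arithmetic is done on the timestamps, the value returned is always one of the original entries.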
You can also try quantile(0.5):
df['date'].astype('datetime64[ns]').quantile(0.5, interpolation="midpoint")
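A self-contained sketch of the quantile approach, assuming the sample rows from the question (the variable names `dates` and `median_ts` are illustrative):

```python
import pandas as pd

# Sample values copied from the question.
dates = pd.Series(pd.to_datetime([
    "2017-05-08 13:25:13.342",
    "2017-05-08 16:37:45.545",
    "2017-01-12 11:08:04.021",
    "2016-12-01 09:06:29.912",
    "2016-06-08 03:16:40.422",
]), name="date")

# quantile(0.5) is the median; "midpoint" averages the two middle
# values when the number of rows is even.
median_ts = dates.quantile(0.5, interpolation="midpoint")
print(type(median_ts).__name__, median_ts)
```

The result comes back as a pandas Timestamp, so no manual conversion from integers is needed.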
You are close: median() returns a float, so convert it to an int first:
import math
median = math.floor(df['date'].astype('int64').median())
Then convert the int representing the date into datetime64:
result = np.datetime64(median, "ns") #unit: nanosecond
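Put together on the sample rows from the question, the int64 round trip might look like the sketch below. The names `as_ns`, `median_ns` and `result_ts` are illustrative. One caveat worth hedging: the median passes through a float, and a float64 cannot represent every nanosecond count exactly, so the result may differ from the true median by a fraction of a microsecond.

```python
import math

import numpy as np
import pandas as pd

# Sample values copied from the question.
dates = pd.Series(pd.to_datetime([
    "2017-05-08 13:25:13.342",
    "2017-05-08 16:37:45.545",
    "2017-01-12 11:08:04.021",
    "2016-12-01 09:06:29.912",
    "2016-06-08 03:16:40.422",
]), name="date")

as_ns = dates.astype("int64")            # nanoseconds since the Unix epoch
median_ns = math.floor(as_ns.median())   # median() returns a float; floor gives an int
result = np.datetime64(median_ns, "ns")  # unit: nanosecond
result_ts = pd.Timestamp(result)         # optional: back to a pandas Timestamp
print(result_ts)
```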