
Count unique dates in pandas dataframe

Tags:

python

pandas

I have a dataframe of surface weather observations (fzraHrObs) organized by a station identifier code and date. fzraHrObs has several columns of weather data. The station code and date (datetime objects) look like:

usaf      dat
716270    2014-11-23 12:00:00
          2015-12-20 08:00:00
          2015-12-20 09:00:00
          2015-12-21 04:00:00
          2015-12-28 03:00:00
716280    2015-12-19 08:00:00
          2015-12-19 08:00:00

I would like to get a count of the number of unique dates (days) per year for each station - i.e. the number of days of obs per year at each station. In my example above this would give me:

    usaf      Year     Count
    716270    2014     1
              2015     3
    716280    2014     0
              2015     1

I've tried using groupby and grouping by station, year, and date: grouped = fzraHrObs['dat'].groupby([fzraHrObs['usaf'], fzraHrObs.dat.dt.year, fzraHrObs.dat.dt.date])

Calling count, size, nunique, etc. on this just gives me the number of obs on each date, not the number of distinct dates per year. Any suggestions on getting what I want here?

MeteoMtl asked Aug 10 '16 14:08



2 Answers

You could do something like this: convert each timestamp to its date, group by usaf and year, and then count the number of unique values in each group:

import pandas as pd
# .dt.date drops the time of day; .dt.year supplies the second grouping key
df.dat.dt.date.groupby([df.usaf, df.dat.dt.year]).nunique()

#   usaf   dat 
# 716270  2014    1
#         2015    3
# 716280  2015    1
# Name: dat, dtype: int64
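
For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the sample rows from the question and applies the same idea. The frame name df and the level name year are assumptions for illustration, and dt.normalize() is swapped in for the date conversion (it keeps the datetime dtype while zeroing the time of day), so treat it as one possible variant rather than the exact code above.

```python
import pandas as pd

# Sample rows copied from the question; "df" stands in for fzraHrObs.
df = pd.DataFrame({
    "usaf": [716270] * 5 + [716280] * 2,
    "dat": pd.to_datetime([
        "2014-11-23 12:00:00",
        "2015-12-20 08:00:00",
        "2015-12-20 09:00:00",
        "2015-12-21 04:00:00",
        "2015-12-28 03:00:00",
        "2015-12-19 08:00:00",
        "2015-12-19 08:00:00",
    ]),
})

# Set every timestamp to midnight so observations on the same day compare equal.
days = df["dat"].dt.normalize()

# Group the days by station and year ("year" is a hypothetical level name),
# then count the distinct days in each group.
counts = days.groupby([df["usaf"], df["dat"].dt.year.rename("year")]).nunique()
print(counts)
```

The result is a Series indexed by (usaf, year), so a single value can be looked up with counts.loc[(716270, 2015)].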
Psidom answered Sep 25 '22 10:09


The following should work:

df.groupby(['usaf', df.dat.dt.year])['dat'].apply(lambda s: s.dt.date.nunique())

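What I did differently is group by two levels only (station and year), then use the nunique method of the pandas Series to count the number of unique dates in each group. Grouping by date as well puts every date in its own group, which is why the counts came back per date.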
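
Note that the desired output in the question also lists a zero for 716280 in 2014, and a groupby only returns the station/year pairs that actually occur. One possible way to add the missing pairs, sketched here under the assumption that every year seen at any station should appear for every station, is to reindex against the full product of the index levels. The names df, year, full_index and filled are illustrative only.

```python
import pandas as pd

# Sample rows copied from the question; "df" stands in for fzraHrObs.
df = pd.DataFrame({
    "usaf": [716270] * 5 + [716280] * 2,
    "dat": pd.to_datetime([
        "2014-11-23 12:00:00",
        "2015-12-20 08:00:00",
        "2015-12-20 09:00:00",
        "2015-12-21 04:00:00",
        "2015-12-28 03:00:00",
        "2015-12-19 08:00:00",
        "2015-12-19 08:00:00",
    ]),
})

# Unique days per (station, year), using the two-level grouping from this answer.
counts = (
    df.groupby(["usaf", df["dat"].dt.year.rename("year")])["dat"]
      .apply(lambda s: s.dt.date.nunique())
)

# Build every station/year combination and fill the ones with no observations with 0.
full_index = pd.MultiIndex.from_product(counts.index.levels, names=counts.index.names)
filled = counts.reindex(full_index, fill_value=0)
print(filled)
```

If a station should only be reported for years in which it was actually operating, this fill would overstate its coverage, so the plain groupby result may be the better choice there.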

IanS answered Sep 26 '22 10:09