
Counting frequency of values by date using pandas

Let's suppose I have following Time Series:

Timestamp              Category
2014-10-16 15:05:17    Facebook
2014-10-16 14:56:37    Vimeo
2014-10-16 14:25:16    Facebook
2014-10-16 14:15:32    Facebook
2014-10-16 13:41:01    Facebook
2014-10-16 12:50:30    Orkut
2014-10-16 12:28:54    Facebook
2014-10-16 12:26:56    Facebook
2014-10-16 12:25:12    Facebook
...
2014-10-08 15:52:49    Youtube
2014-10-08 15:04:50    Youtube
2014-10-08 15:03:48    Vimeo
2014-10-08 15:02:27    Youtube
2014-10-08 15:01:56    DailyMotion
2014-10-08 13:27:28    Facebook
2014-10-08 13:01:08    Vimeo
2014-10-08 12:52:06    Facebook
2014-10-08 12:43:27    Facebook
Name: summary, Length: 600

I would like to count the occurrences of each category (i.e. each unique value/factor in the time series) per week and year.

Example:

    Week/Year      Category      Count
    1/2014         Facebook      12
    1/2014         Google        5
    1/2014         Youtube       2
...    
    2/2014         Facebook      2
    2/2014         Google        5
    2/2014         Youtube       20
...

How can this be achieved using Python pandas?

jcborges asked Jan 07 '15



3 Answers

It might be easiest to turn your Series into a DataFrame and use Pandas' groupby functionality (if you already have a DataFrame then skip straight to adding another column below).

If your Series is called s, then turn it into a DataFrame like so:

>>> df = pd.DataFrame({'Timestamp': s.index, 'Category': s.values})
>>> df
       Category           Timestamp
0      Facebook 2014-10-16 15:05:17
1         Vimeo 2014-10-16 14:56:37
2      Facebook 2014-10-16 14:25:16
...

Now add another column for the week and year (one way is to use apply and build a week/year string; note that `Timestamp.week` was deprecated and removed in pandas 2.0, so `isocalendar()` is used here):

>>> df['Week/Year'] = df['Timestamp'].apply(lambda x: "%d/%d" % (x.isocalendar()[1], x.year))
>>> df
             Timestamp     Category Week/Year
0  2014-10-16 15:05:17     Facebook   42/2014
1  2014-10-16 14:56:37        Vimeo   42/2014
2  2014-10-16 14:25:16     Facebook   42/2014
...

Finally, group by 'Week/Year' and 'Category' and aggregate with size() to get the counts. For the data in your question this produces the following:

>>> df.groupby(['Week/Year', 'Category']).size()
Week/Year  Category   
41/2014    DailyMotion    1
           Facebook       3
           Vimeo          2
           Youtube        3
42/2014    Facebook       7
           Orkut          1
           Vimeo          1
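The result above is a Series with a MultiIndex (week/year, category). If you prefer the flat three-column Week/Year–Category–Count table shown in the question, `reset_index` will flatten it. A minimal sketch, using a small made-up subset of the question's data:

```python
import pandas as pd

# Toy stand-in for the question's data (rows assumed for illustration)
df = pd.DataFrame({
    'Timestamp': pd.to_datetime(['2014-10-16 15:05:17',
                                 '2014-10-16 14:56:37',
                                 '2014-10-08 13:27:28']),
    'Category': ['Facebook', 'Vimeo', 'Facebook'],
})
df['Week/Year'] = df['Timestamp'].apply(
    lambda x: "%d/%d" % (x.isocalendar()[1], x.year))

# size() counts rows per group; reset_index flattens the MultiIndex
# into ordinary columns, naming the counts column 'Count'
counts = df.groupby(['Week/Year', 'Category']).size().reset_index(name='Count')
print(counts)
```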
Alex Riley answered Oct 12 '22


Convert your Timestamp column to a week number, then group by that week number and take value_counts of the categorical variable, like so:

df.groupby('week_num').Category.value_counts()

This assumes that a new column week_num has already been created from the Timestamp column.
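A sketch of what that could look like end to end (the column name week_num comes from the answer; the sample data is made up for illustration):

```python
import pandas as pd

# Toy data standing in for the question's series
df = pd.DataFrame({
    'Timestamp': pd.to_datetime(['2014-10-16 15:05:17',
                                 '2014-10-16 14:25:16',
                                 '2014-10-08 13:27:28']),
    'Category': ['Facebook', 'Facebook', 'Vimeo'],
})

# .dt.isocalendar().week gives the ISO week number for each row
df['week_num'] = df['Timestamp'].dt.isocalendar().week

# Within each week, count how often each category occurs
counts = df.groupby('week_num').Category.value_counts()
print(counts)
```

Note that this groups by week number alone, so week 42 of 2014 and week 42 of 2015 would fall into the same bucket; include the year in the key (as the other answers do) if the data spans multiple years.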

cwharland answered Oct 12 '22


To be a little more clear, you do not need to create a new column called 'week_num' first:

df.groupby(by=lambda x: "%d/%d" % (x.isocalendar()[1], x.year)).Category.value_counts()

The callable passed to by is applied to each timestamp in the index to derive a week/year key, which is then grouped on. This assumes the DataFrame is indexed by the timestamps (as the original Series is); also note that week and year are attributes of a Timestamp, not methods, so the original x.week() and x.year() calls would fail.
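For example, with a toy Series whose DatetimeIndex mirrors the question's data (values assumed for illustration):

```python
import pandas as pd

# Series of categories indexed by timestamps, like the question's data
s = pd.Series(['Facebook', 'Vimeo', 'Facebook'],
              index=pd.to_datetime(['2014-10-16 15:05:17',
                                    '2014-10-16 14:56:37',
                                    '2014-10-08 13:27:28']),
              name='Category')

# The lambda receives each index Timestamp and returns a "week/year"
# string, which becomes the grouping key
counts = s.groupby(by=lambda x: "%d/%d" % (x.isocalendar()[1], x.year)).value_counts()
print(counts)
```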

Leon answered Oct 12 '22