
summing the number of occurrences per day pandas

I have a data set like so in a pandas dataframe:

                                   score
    timestamp
    2013-06-29 00:52:28+00:00  -0.420070
    2013-06-29 00:51:53+00:00  -0.445720
    2013-06-28 16:40:43+00:00   0.508161
    2013-06-28 15:10:30+00:00   0.921474
    2013-06-28 15:10:17+00:00   0.876710

I need to count the number of measurements that occur each day, so I am looking for something like this:

                count
    timestamp
    2013-06-29      2
    2013-06-28      3

I do not care about the sentiment column; I just want the count of occurrences per day.

myusuf3 asked Jul 17 '13 17:07



1 Answer

If your timestamp index is a DatetimeIndex:

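    import io
    import pandas as pd

    content = '''\
    timestamp  score
    2013-06-29 00:52:28+00:00        -0.420070
    2013-06-29 00:51:53+00:00        -0.445720
    2013-06-28 16:40:43+00:00         0.508161
    2013-06-28 15:10:30+00:00         0.921474
    2013-06-28 15:10:17+00:00         0.876710
    '''

    # io.StringIO lets pandas read the str like a file
    # (on Python 3, io.BytesIO would reject a str).
    # A regex separator requires the Python parsing engine.
    df = pd.read_table(io.StringIO(content), sep=r'\s{2,}', engine='python',
                       parse_dates=[0], index_col=[0])

    print(df)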

so df looks like this:

                            score
    timestamp
    2013-06-29 00:52:28 -0.420070
    2013-06-29 00:51:53 -0.445720
    2013-06-28 16:40:43  0.508161
    2013-06-28 15:10:30  0.921474
    2013-06-28 15:10:17  0.876710

    print(type(df.index))
    # <class 'pandas.tseries.index.DatetimeIndex'>
    # (current pandas reports pandas.core.indexes.datetimes.DatetimeIndex)

You can use:

    print(df.groupby(df.index.date).count())

which yields

                score
    2013-06-28      3
    2013-06-29      2
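One caveat: count() tallies non-null values per column, so a row with a missing score would not be counted. If what you want is the number of rows per day, size() may be closer. The following is only a sketch; the frame is rebuilt by hand, and the NaN is an assumption introduced to show the difference (it is not in the question's data).

```python
import numpy as np
import pandas as pd

# Same timestamps as in the question; the second score is replaced
# with NaN purely for illustration (an assumption, not original data).
idx = pd.to_datetime([
    "2013-06-29 00:52:28+00:00",
    "2013-06-29 00:51:53+00:00",
    "2013-06-28 16:40:43+00:00",
    "2013-06-28 15:10:30+00:00",
    "2013-06-28 15:10:17+00:00",
])
df = pd.DataFrame(
    {"score": [-0.420070, np.nan, 0.508161, 0.921474, 0.876710]},
    index=idx,
)
df.index.name = "timestamp"

# size() counts rows in each group, whether or not score is missing.
per_day_rows = df.groupby(df.index.date).size()

# count() counts only the non-null score values in each group.
per_day_valid = df.groupby(df.index.date)["score"].count()

print(per_day_rows)
print(per_day_valid)
```

With no missing values the two give the same numbers, so the choice only matters when the column you are counting can contain NaN.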

Note the importance of the parse_dates parameter. Without it, the index would just be a pandas.core.index.Index object holding strings, in which case you could not use df.index.date.

So the answer depends on the type(df.index), which you have not shown...
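If the index turns out to hold plain strings, one option is to convert it first. This is a minimal sketch, assuming the strings are in the format shown in the question; the frame is constructed by hand here, and the "count" name is just chosen to match the desired output.

```python
import pandas as pd

# Assumed starting point: timestamps were read in as plain strings.
df = pd.DataFrame(
    {"score": [-0.420070, -0.445720, 0.508161, 0.921474, 0.876710]},
    index=[
        "2013-06-29 00:52:28+00:00",
        "2013-06-29 00:51:53+00:00",
        "2013-06-28 16:40:43+00:00",
        "2013-06-28 15:10:30+00:00",
        "2013-06-28 15:10:17+00:00",
    ],
)
df.index.name = "timestamp"

# Convert only when the index is not already a DatetimeIndex.
if not isinstance(df.index, pd.DatetimeIndex):
    df.index = pd.to_datetime(df.index, utc=True)

# Group by calendar date and count the rows in each group.
counts = df.groupby(df.index.date).size().rename("count")
print(counts)
```

Checking isinstance(df.index, pd.DatetimeIndex) first keeps the conversion harmless when the index was already parsed.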

unutbu answered Sep 21 '22 01:09