I have a data set like so in a pandas dataframe:
                               score
timestamp
2013-06-29 00:52:28+00:00  -0.420070
2013-06-29 00:51:53+00:00  -0.445720
2013-06-28 16:40:43+00:00   0.508161
2013-06-28 15:10:30+00:00   0.921474
2013-06-28 15:10:17+00:00   0.876710
I need the number of measurements that occur on each day, so I am looking for something like this:
            count
timestamp
2013-06-29      2
2013-06-28      3
I do not care about the sentiment column; I only want the count of occurrences per day.
How do you count the number of occurrences in a data frame? To count how often each value occurs in a column, you can use the pandas value_counts() method. For example, df['condition'].value_counts() returns the frequency of each unique value in the column "condition".
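As a minimal sketch of the value_counts() call described above: the data here is made up for illustration, and only the column name "condition" comes from the text.

```python
import pandas as pd

# Hypothetical data; only the column name "condition" is taken from the text.
df = pd.DataFrame({"condition": ["a", "b", "a", "c", "a", "b"]})

# Frequency of each unique value, sorted with the most frequent value first.
counts = df["condition"].value_counts()
print(counts)
```

The result is a Series indexed by the unique values, so counts["a"] looks up the frequency of a single value.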
Pandas DataFrame sum() method: The sum() method adds all values in each column and returns the sum for each column. With axis='columns', sum() instead adds across the columns and returns the sum of each row.
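The two directions of sum() can be compared on a small frame. This is only an illustrative sketch; the column names x and y and their values are invented.

```python
import pandas as pd

# Hypothetical frame with two numeric columns.
df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

# Default: one sum per column.
col_sums = df.sum()

# axis="columns": add across the columns, giving one sum per row.
row_sums = df.sum(axis="columns")

print(col_sums)
print(row_sums)
```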
The values property returns a NumPy representation of the DataFrame. Only the values are returned; the axis labels are removed. A DataFrame whose columns all share one type (e.g., int64) results in an array of that same type.
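A short sketch of what values returns, using two invented frames: one with integer columns only, and one that mixes integers and floats.

```python
import pandas as pd

# Hypothetical frames for illustration.
same = pd.DataFrame({"a": [1, 2], "b": [3, 4]})        # all integer columns
mixed = pd.DataFrame({"a": [1, 2], "b": [0.5, 1.5]})   # integer and float columns

same_arr = same.values    # integer array; row and column labels are gone
mixed_arr = mixed.values  # upcast to a float array that can hold both columns

print(same_arr.dtype, same_arr.shape)
print(mixed_arr.dtype, mixed_arr.shape)
```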
count() should be used when you want the number of valid (non-null) values in columns, for example per group of a specified column. value_counts() should be used to find the frequencies of the values in a Series.
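The difference between the two can be seen on a small example. The frame below is hypothetical; the names group and score are chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with some missing scores.
df = pd.DataFrame({
    "group": ["x", "x", "y", "y", "y"],
    "score": [1.0, np.nan, 2.0, 3.0, np.nan],
})

# count(): number of non-null scores within each group.
valid_per_group = df.groupby("group")["score"].count()

# value_counts(): how often each value of the Series occurs.
group_freq = df["group"].value_counts()

print(valid_per_group)
print(group_freq)
```

Here count() ignores the missing scores, while value_counts() on the group column counts every row.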
If your timestamp index is a DatetimeIndex:
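import io
import pandas as pd

content = '''\
timestamp                  score
2013-06-29 00:52:28+00:00  -0.420070
2013-06-29 00:51:53+00:00  -0.445720
2013-06-28 16:40:43+00:00  0.508161
2013-06-28 15:10:30+00:00  0.921474
2013-06-28 15:10:17+00:00  0.876710
'''

# content is a str, so wrap it in StringIO (BytesIO would need bytes).
# Columns are separated by two or more spaces; a regex separator needs the python engine.
df = pd.read_table(io.StringIO(content), sep=r'\s{2,}', engine='python',
                   parse_dates=[0], index_col=[0])
print(df)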
so df looks like this:
                        score
timestamp
2013-06-29 00:52:28 -0.420070
2013-06-29 00:51:53 -0.445720
2013-06-28 16:40:43  0.508161
2013-06-28 15:10:30  0.921474
2013-06-28 15:10:17  0.876710

print(type(df.index))
# <class 'pandas.tseries.index.DatetimeIndex'>  (the module path varies by pandas version)
You can use:
print(df.groupby(df.index.date).count())
which yields
            score
2013-06-28      3
2013-06-29      2
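Note that count() only counts non-null values in each column. If rows with a missing score should be counted as well, size() is one possible alternative. The following is a sketch under that assumption; it rebuilds the sample data directly, with the time zone offsets omitted for brevity.

```python
import pandas as pd

# Sample data from the question; time zone offsets omitted for brevity.
index = pd.to_datetime([
    "2013-06-29 00:52:28",
    "2013-06-29 00:51:53",
    "2013-06-28 16:40:43",
    "2013-06-28 15:10:30",
    "2013-06-28 15:10:17",
])
df = pd.DataFrame(
    {"score": [-0.420070, -0.445720, 0.508161, 0.921474, 0.876710]},
    index=index,
)
df.index.name = "timestamp"

# size() counts rows per calendar day, whether or not score is null.
per_day = df.groupby(df.index.date).size()
print(per_day)
```

The result is a Series rather than a DataFrame, since size() does not depend on any particular column.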
Note the importance of the parse_dates parameter. Without it, the index would just be a pandas.core.index.Index object, in which case you could not use df.index.date.
So the answer depends on the type(df.index), which you have not shown...
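If the index turns out to be a plain Index of strings, one possible approach is to convert it with pd.to_datetime before grouping. This is a sketch that assumes the strings look like the timestamps in the question; the frame is rebuilt by hand here.

```python
import pandas as pd

# Assumption: the index holds timestamp strings rather than parsed datetimes.
df = pd.DataFrame(
    {"score": [-0.420070, -0.445720, 0.508161, 0.921474, 0.876710]},
    index=[
        "2013-06-29 00:52:28+00:00",
        "2013-06-29 00:51:53+00:00",
        "2013-06-28 16:40:43+00:00",
        "2013-06-28 15:10:30+00:00",
        "2013-06-28 15:10:17+00:00",
    ],
)
was_datetime = isinstance(df.index, pd.DatetimeIndex)  # False: plain Index of strings

# Convert the strings to a time-zone-aware DatetimeIndex, then group by date.
df.index = pd.to_datetime(df.index, utc=True)
per_day = df.groupby(df.index.date).count()
print(per_day)
```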