Python: get a frequency count based on two columns (variables) in a pandas DataFrame

Hello, I have the following DataFrame:

    Group           Size

    Short          Small
    Short          Small
    Moderate       Medium
    Moderate       Small
    Tall           Large

I want to count how many times each distinct row appears in the DataFrame, like this:

    Group           Size      Time

    Short          Small        2
    Moderate       Medium       1 
    Moderate       Small        1
    Tall           Large        1

emax asked Oct 21 '15 23:10


People also ask

How do I count the number of occurrences in a column in pandas?

Calling size() or count() on the result of pandas.DataFrame.groupby() gives the number of occurrences of each value in a particular column of the DataFrame.

How do you count the frequency of elements in pandas DataFrame?

In pandas you can get the frequency of each value in a DataFrame column with the Series.value_counts() method. Alternatively, if you have a SQL background, you can get the same result with groupby() and count().

How can I get the frequency counts of each item in one or more columns in a DataFrame?

After grouping a DataFrame on one or more columns, you can apply size() or count() to the resulting groupby object to get the frequency counts. Note that count() counts the non-null values in each remaining column, while size() counts rows.


3 Answers

You can use groupby's size:

In [11]: df.groupby(["Group", "Size"]).size()
Out[11]:
Group     Size
Moderate  Medium    1
          Small     1
Short     Small     2
Tall      Large     1
dtype: int64

In [12]: df.groupby(["Group", "Size"]).size().reset_index(name="Time")
Out[12]:
      Group    Size  Time
0  Moderate  Medium     1
1  Moderate   Small     1
2     Short   Small     2
3      Tall   Large     1
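
For anyone who wants to run this outside an IPython session, here is a minimal self-contained sketch. The sample data is copied from the question; passing sort=False is my own addition (not part of the answer above) so that the groups come out in order of first appearance, which matches the layout the question asks for.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Group": ["Short", "Short", "Moderate", "Moderate", "Tall"],
    "Size": ["Small", "Small", "Medium", "Small", "Large"],
})

# size() counts the rows in each (Group, Size) group.
# sort=False keeps groups in order of first appearance instead of sorting keys.
# reset_index(name="Time") flattens the result and names the count column.
counts = (
    df.groupby(["Group", "Size"], sort=False)
      .size()
      .reset_index(name="Time")
)
print(counts)
```

Drop sort=False if you prefer the groups sorted by key, as in the output shown above.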

Andy Hayden answered Oct 20 '22 12:10


Update: since pandas 1.1, DataFrame.value_counts accepts multiple columns:

df.value_counts(["Group", "Size"])
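
A short sketch of how that could be turned into the table from the question, assuming pandas 1.1 or later. The sample data is copied from the question, and the column name "Time" is taken from the desired output. Note that value_counts sorts by count in descending order by default, so the row order will generally differ from a groupby result.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Group": ["Short", "Short", "Moderate", "Moderate", "Tall"],
    "Size": ["Small", "Small", "Medium", "Small", "Large"],
})

# value_counts returns a Series indexed by the unique (Group, Size) pairs,
# with the most frequent pair first.
# reset_index(name="Time") turns it into a flat DataFrame.
counts = df.value_counts(["Group", "Size"]).reset_index(name="Time")
print(counts)
```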

You can also try pd.crosstab()

Group           Size

Short          Small
Short          Small
Moderate       Medium
Moderate       Small
Tall           Large

pd.crosstab(df.Group,df.Size)


Size      Large  Medium  Small
Group                         
Moderate      0       1      1
Short         0       0      2
Tall          1       0      0

EDIT: To get your desired output (this needs numpy imported as np):

pd.crosstab(df.Group,df.Size).replace(0,np.nan).\
     stack().reset_index().rename(columns={0:'Time'})
Out[591]: 
      Group    Size  Time
0  Moderate  Medium   1.0
1  Moderate   Small   1.0
2     Short   Small   2.0
3      Tall   Large   1.0
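
The replace(0, np.nan) step is what turns the counts into floats. As a possible variant (my own sketch, not part of the original answer), you could stack first and then filter out the zero counts, which keeps the Time column as integers and avoids the numpy import. The sample data is copied from the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Group": ["Short", "Short", "Moderate", "Moderate", "Tall"],
    "Size": ["Small", "Small", "Medium", "Small", "Large"],
})

# Wide table of counts: one row per Group, one column per Size.
table = pd.crosstab(df["Group"], df["Size"])

# stack() converts the wide table to a long Series indexed by (Group, Size).
long_counts = table.stack()

# Keep only combinations that actually occur; the counts stay integers.
counts = long_counts[long_counts > 0].reset_index(name="Time")
print(counts)
```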

BENY answered Oct 20 '22 12:10


Another possibility is to use .pivot_table() with aggfunc='size':

df_solution = df.pivot_table(index=['Group','Size'], aggfunc='size')
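
A runnable sketch of the same idea, with the sample data copied from the question. It assumes that pivot_table with aggfunc='size' and no values argument returns a Series indexed by (Group, Size), which is the behaviour I have seen; if your pandas version returns something else, the reset_index step would need adjusting.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Group": ["Short", "Short", "Moderate", "Moderate", "Tall"],
    "Size": ["Small", "Small", "Medium", "Small", "Large"],
})

# Count rows per (Group, Size) pair.
# Assumption: the result is a Series with a (Group, Size) MultiIndex.
df_solution = df.pivot_table(index=["Group", "Size"], aggfunc="size")

# Flatten into the Group / Size / Time layout from the question.
counts = df_solution.reset_index(name="Time")
print(counts)
```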

asantz96 answered Oct 20 '22 14:10