
How to get the number of the most frequent value in a column?

I have a data frame and I would like to know how many times the most frequent value occurs in a given column.

I tried to do it in the following way:

items_counts = df['item'].value_counts()
max_item = items_counts.max()

As a result I get:

ValueError: cannot convert float NaN to integer

As far as I understand, the first line gives me a Series in which the values from the column are used as the index and their frequencies are used as the values. So I just need to find the largest value in that Series, but for some reason it does not work. Does anybody know how this problem can be solved?

Roman asked Feb 28 '13 15:02



2 Answers

It looks like you may have some nulls in the column. You can drop them with df = df.dropna(subset=['item']). Then df['item'].value_counts().max() should give you the highest count, and df['item'].value_counts().idxmax() should give you the most frequent value itself.
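Put together, the steps above might look like the following minimal sketch. The sample data is made up for illustration; only the 'item' column name comes from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with a couple of missing entries.
df = pd.DataFrame({"item": ["a", "b", "b", np.nan, "b", "a", np.nan]})

# Drop rows where 'item' is missing before counting.
clean = df.dropna(subset=["item"])

item_counts = clean["item"].value_counts()
max_count = item_counts.max()         # how often the most frequent value occurs
most_frequent = item_counts.idxmax()  # the most frequent value itself

print(most_frequent, max_count)
```

In recent pandas versions value_counts() already skips NaN by default (dropna=True), so the explicit dropna step is mostly a safeguard there.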

beardc answered Sep 20 '22 09:09


To add to @jonathanrocher's answer, you could use mode on a pandas DataFrame. It returns the most frequent value(s) along the rows or columns, with more than one row of results if there is a tie:

import pandas as pd
import numpy as np
df = pd.DataFrame({"a": [1,2,2,4,2], "b": [np.nan, np.nan, np.nan, 3, 3]})

In [2]: df.mode()
Out[2]: 
   a    b
0  2  3.0
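
Since the question asks for the number of occurrences rather than the value, here is a rough sketch of one way to get the count from mode. It reuses the data frame above; the variable names top_value and top_count are just for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 2, 4, 2], "b": [np.nan, np.nan, np.nan, 3, 3]})

# Series.mode() returns a Series because ties are possible; take the first entry.
top_value = df["a"].mode().iloc[0]

# Count how many rows hold that value.
top_count = int((df["a"] == top_value).sum())

print(top_value, top_count)
```

If several values tie for most frequent, mode() returns all of them, and each has the same count, so taking the first entry is enough for this purpose.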
Anton Protopopov answered Sep 20 '22 09:09