 

Frequency table for a single variable

One last newbie pandas question for the day: how do I generate a frequency table for a single Series?

For example:

```
my_series = pandas.Series([1,2,2,3,3,3])
pandas.magical_frequency_function( my_series )

>> {
     1 : 1,
     2 : 2,
     3 : 3
   }
```

Lots of googling has led me to Series.describe() and pandas.crosstab, but neither of these does quite what I need: one variable, counts by category. Oh, and it'd be nice if it worked for different data types: strings, ints, etc.

Abe asked Aug 31 '12 00:08





2 Answers

Maybe .value_counts()?

```
>>> import pandas
>>> my_series = pandas.Series([1,2,2,3,3,3, "fred", 1.8, 1.8])
>>> my_series
0       1
1       2
2       2
3       3
4       3
5       3
6    fred
7     1.8
8     1.8
>>> counts = my_series.value_counts()
>>> counts
3       3
2       2
1.8     2
fred    1
1       1
>>> len(counts)
5
>>> sum(counts)
9
>>> counts["fred"]
1
>>> dict(counts)
{1.8: 2, 2: 2, 3: 3, 1: 1, 'fred': 1}
```
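A minimal sketch building on value_counts(), assuming a reasonably recent pandas (the variable names are illustrative, not from the answer): normalize=True gives proportions instead of raw counts, sort_index() orders the table by value rather than by count, to_dict() produces the plain dict shape the question asked for, and dropna controls whether missing values get counted.

```python
import pandas as pd

# The question's example: one variable with repeated values.
s = pd.Series([1, 2, 2, 3, 3, 3])

# Absolute frequencies, most frequent value first.
counts = s.value_counts()

# Relative frequencies (proportions summing to 1.0).
proportions = s.value_counts(normalize=True)

# Reorder by value instead of by count, then convert to a plain dict.
freq_dict = counts.sort_index().to_dict()  # equal to {1: 1, 2: 2, 3: 3}

# Works the same for strings; missing values are skipped unless dropna=False.
colors = pd.Series(["red", "blue", "red", None])
print(colors.value_counts().sum())              # 3
print(colors.value_counts(dropna=False).sum())  # 4
```

The default dropna=True is why a Series with gaps can report fewer total counts than its length.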
DSM answered Oct 21 '22 22:10



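You can use a list comprehension over a DataFrame's columns to count the frequencies in each column. Note that select_dtypes and .columns belong to DataFrame, not Series, so my_series below has to be a DataFrame: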

```
[my_series[c].value_counts() for c in list(my_series.select_dtypes(include=['O']).columns)]
```

Breakdown:

```
my_series.select_dtypes(include=['O'])
```

Selects only the object-dtype ('O') columns, which typically hold strings or other categorical-style data.

```
list(my_series.select_dtypes(include=['O']).columns)
```

Turns the column labels from above into a list.

```
[my_series[c].value_counts() for c in list(my_series.select_dtypes(include=['O']).columns)]
```

Iterates through that list and applies value_counts() to each column.
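Here is a self-contained sketch of the same idea with a small, made-up DataFrame (df and its columns are hypothetical). It keeps the tables in a dict keyed by column name instead of a list, so each table stays labelled. The string column is created with dtype=object explicitly; that is an assumption made because newer pandas versions may infer a dedicated string dtype that an include=['O'] filter might not select.

```python
import pandas as pd

# Hypothetical frame: one object (string) column and one integer column.
# dtype=object is explicit because newer pandas versions may infer a
# dedicated string dtype that include=["O"] might not select.
df = pd.DataFrame({
    "color": pd.Series(["red", "blue", "red", "green"], dtype=object),
    "size": [1, 2, 2, 3],
})

# Same selection as the answer: only object-dtype columns.
object_cols = df.select_dtypes(include=["O"]).columns

# Dict comprehension instead of a list, so each table keeps its column name.
tables = {c: df[c].value_counts() for c in object_cols}

print(list(tables))            # ['color']
print(tables["color"]["red"])  # 2
```

The integer column "size" is left out by the dtype filter; drop the select_dtypes step and iterate over df.columns if you want a table for every column.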

Shankar ARUL answered Oct 21 '22 23:10
