One last newbie pandas question for the day: How do I generate a frequency table for a single Series?
For example:
my_series = pandas.Series([1,2,2,3,3,3])
pandas.magical_frequency_function( my_series )
>> { 1 : 1, 2 : 2, 3 : 3 }
Lots of googling has led me to Series.describe() and pandas.crosstab, but neither of these does quite what I need: one variable, counts by categories. Oh, and it'd be nice if it worked for different data types: strings, ints, etc.
A frequency table is simply a two-column table (a “t-chart”) that lists each possible outcome alongside the frequency with which it was observed in a sample.
A frequency distribution of a single variable is called a univariate frequency distribution.
Maybe .value_counts()?
>>> import pandas
>>> my_series = pandas.Series([1,2,2,3,3,3, "fred", 1.8, 1.8])
>>> my_series
0       1
1       2
2       2
3       3
4       3
5       3
6    fred
7     1.8
8     1.8
>>> counts = my_series.value_counts()
>>> counts
3       3
2       2
1.8     2
fred    1
1       1
>>> len(counts)
5
>>> sum(counts)
9
>>> counts["fred"]
1
>>> dict(counts)
{1.8: 2, 2: 2, 3: 3, 1: 1, 'fred': 1}
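As a script rather than a REPL session, the same idea might look like the sketch below, using the data from the question. The variable names freq and shares are just illustrative, and the sort_index() and normalize=True steps are optional extras that the answer above does not use.

```python
import pandas as pd

my_series = pd.Series([1, 2, 2, 3, 3, 3])

# value_counts() returns a Series whose index holds the distinct values
# and whose values hold how often each one occurs (most frequent first).
counts = my_series.value_counts()

# Order by the value itself instead of by count, then convert to a dict.
freq = counts.sort_index().to_dict()
print(freq)

# Relative frequencies (fractions that sum to 1) instead of raw counts.
shares = my_series.value_counts(normalize=True).sort_index()
print(shares.to_dict())
```

Because value_counts() works on the values themselves, the same calls should apply to a Series of strings or mixed types, although sort_index() may fail on an index that mixes strings and numbers.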
You can use a list comprehension over a DataFrame (called df here, since select_dtypes is a DataFrame method and is not available on a Series) to count frequencies for each of its columns:
[df[c].value_counts() for c in list(df.select_dtypes(include=['O']).columns)]
Breakdown:
df.select_dtypes(include=['O'])
Selects just the object-dtype columns (typically strings, i.e. the categorical data)
list(df.select_dtypes(include=['O']).columns)
Turns the column names from above into a list
[df[c].value_counts() for c in list(df.select_dtypes(include=['O']).columns)]
Iterates through the list above and applies value_counts() to each of those columns
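Here is a minimal, self-contained sketch of that approach. The DataFrame and its column names are made up for illustration, and the string columns are created with dtype=object explicitly so that include=['O'] should match them regardless of how a given pandas version infers string dtypes.

```python
import pandas as pd

# Hypothetical example data: two object-dtype columns and one numeric column.
df = pd.DataFrame({
    "color": pd.Series(["red", "blue", "red", "red"], dtype=object),
    "size": pd.Series(["S", "M", "M", "M"], dtype=object),
    "price": [1.0, 2.5, 1.0, 3.0],
})

# Names of the object-dtype columns only; "price" is excluded.
object_cols = df.select_dtypes(include=["O"]).columns

# One frequency table (a Series of counts) per object-dtype column.
tables = {col: df[col].value_counts() for col in object_cols}

for col, table in tables.items():
    print(col, table.to_dict())
```

A dict keyed by column name is used here instead of a list so that each table can be looked up by name; the list comprehension from the answer would work just as well if only the order matters.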