pandas value_counts applied to each column

I have a dataframe with numerous columns (≈30) from an external source (a CSV file), but several of them have no value or always contain the same value. I would therefore like to quickly see the value_counts for each column. How can I do that?

For example

  Id, temp, name
1 34, null, mark
2 22, null, mark
3 34, null, mark

would return an object stating that

Id: 34 -> 2, 22 -> 1
temp: null -> 3
name: mark -> 3

That way I would know that temp is irrelevant and name is not interesting (always the same).

For the dataframe,

import pandas as pd

df = pd.DataFrame(data=[[34, 'null', 'mark'], [22, 'null', 'mark'], [34, 'null', 'mark']], columns=['id', 'temp', 'name'], index=[1, 2, 3])

the following code

# Print a header and the value counts for each column (Python 3 print syntax)
for c in df.columns:
    print("---- %s ---" % c)
    print(df[c].value_counts())

will produce the following result:

---- id ---
34    2
22    1
dtype: int64
---- temp ---
null    3
dtype: int64
---- name ---
mark    3
dtype: int64
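The same loop can also collect the counts instead of printing them, which makes it easier to flag the uninformative columns the question asks about. This is only a minimal sketch and not part of the answer above: the names counts, summary and constant_columns are illustrative, and treating "at most one distinct value" as "not interesting" is an assumption about what the asker wants.

```python
import pandas as pd

df = pd.DataFrame(
    data=[[34, 'null', 'mark'], [22, 'null', 'mark'], [34, 'null', 'mark']],
    columns=['id', 'temp', 'name'],
    index=[1, 2, 3],
)

# One value_counts() Series per column, keyed by column name.
# dropna=False also counts real missing values (NaN), not only the string 'null'.
counts = {c: df[c].value_counts(dropna=False) for c in df.columns}

# Stack the per-column results into one Series indexed by (column, value).
summary = pd.concat(counts)

# Assumption: a column with at most one distinct value carries no information.
n_distinct = df.nunique(dropna=False)
constant_columns = n_distinct[n_distinct <= 1].index.tolist()

print(summary)
print(constant_columns)
```

On the example frame, constant_columns contains temp and name, the two columns the question describes as irrelevant or not interesting.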

A nice way to do this and get back a nicely formatted Series is to combine pandas.Series.value_counts and pandas.DataFrame.stack.

For the DataFrame

import pandas

df = pandas.DataFrame(data=[[34, 'null', 'mark'], [22, 'null', 'mark'], [34, 'null', 'mark']], columns=['id', 'temp', 'name'], index=[1, 2, 3])

You can do something like

df.apply(lambda x: x.value_counts()).T.stack()

In this code, df.apply(lambda x: x.value_counts()) applies value_counts to every column and collects the results in a new DataFrame, so you end up with a DataFrame with the same columns and one row for every distinct value found in any column (and a lot of NaN entries, one wherever a value doesn't appear in a column).

After that, T transposes the DataFrame (so its index holds the original column names and its columns hold the possible values), and stack turns the columns of the DataFrame into a new level of a MultiIndex and drops all the NaN values, making the whole thing a Series. The counts come out as float64 because the intermediate NaN entries force a float dtype.

The result of this is

id    22      1
      34      2
temp  null    3
name  mark    3
dtype: float64
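The three steps described above can also be run one at a time to inspect each intermediate object. This is a hedged sketch rather than the answer's exact code: the variable names are illustrative, the values are cast to str first (my assumption, so that all value labels share one type), and dropna() is called explicitly so the result does not rely on how stack itself treats missing values. The final astype(int) turns the counts back into integers.

```python
import pandas as pd

df = pd.DataFrame(
    data=[[34, 'null', 'mark'], [22, 'null', 'mark'], [34, 'null', 'mark']],
    columns=['id', 'temp', 'name'],
    index=[1, 2, 3],
)

# Assumption of this sketch: cast to str so every value label has the same type.
as_text = df.astype(str)

# Step 1: one column of counts per original column, NaN where a value is absent.
per_column = as_text.apply(lambda x: x.value_counts())

# Step 2: transpose so the original column names become the row index.
transposed = per_column.T

# Step 3: stack into a Series with a (column, value) MultiIndex.
# The explicit dropna() removes the NaN placeholders regardless of stack's defaults.
stacked = transposed.stack().dropna().astype(int)

print(per_column)
print(stacked)
```

Printing per_column shows the NaN-filled table that the explanation refers to, and stacked holds the same counts as the result above, as integers.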

Question by Edouard; 2 answers, by tanemaki and Martín Fixman.
