Pandas: Counting the proportion of zeros in rows and columns of dataframe

I have the code below. It surprises me that it works for the columns but not for the rows.

import numpy as np
import pandas as pd
from numpy import size

def summarizing_data_variables(df):
    numberRows=size(df['ID'])
    numberColumns=size(df.columns)
    summaryVariables=np.empty([numberColumns,2], dtype =  np.dtype('a50'))    
    cont=-1    
    for column in df.columns:
        cont=cont+1
        summaryVariables[cont][0]=column
        summaryVariables[cont][1]=size(df[df[column].isin([0])][column])/(1.0*numberRows)
    print summaryVariables

def summarizing_data_users(df):
    print "Sumarizing users..."   
    numberRows=size(df['ID'])
    numberColumns=size(df.columns)      
    summaryVariables=np.empty([numberRows,2], dtype =  np.dtype('a50'))    
    cont=-1

    for row in df['ID']:
        cont=cont+1
        summaryVariables[cont][0]=row
        dft=df[df['ID']==row]
        proportionZeros=(size(dft[dft.isin([0])])-1)/(1.0*(numberColumns-1)) # The -1 is used to not count the ID column
        summaryVariables[cont][1]=proportionZeros
    print summaryVariables


if __name__ == '__main__':

    df = pd.DataFrame([[1, 2, 3], [2, 5, 0.0],[3,4,5]])
    df.columns=['ID','var1','var2']
    print df

    summarizing_data_variables(df)
    summarizing_data_users(df)

The output is this:

   ID  var1  var2
0   1     2     3
1   2     5     0
2   3     4     5
[['ID' '0.0']
 ['var1' '0.0']
 ['var2' '0.333333333333']]
Sumarizing users...
[['1' '1.0']
 ['2' '1.0']
 ['3' '1.0']]

For the users, I was expecting this:

Sumarizing users...
[['1' '0.0']
 ['2' '0.5']
 ['3' '0.0']]

It seems that the problem is in this line:

dft[dft.isin([0])]

It does not constrain dft to the "True" values like in the first case.

Can you help me with this? (1) How can I correct the users (rows) part, i.e. the second function above? (2) Is this the most efficient way to do this? My database is very big.

EDIT:

In the function summarizing_data_variables(df) I try to evaluate the proportion of zeros in each column. In the example above, the variable ID has no zeros (thus the proportion is zero), the variable var1 has no zeros (thus the proportion is also zero), and the variable var2 has a zero in the second row (thus the proportion is 1/3). I keep these values in a 2D numpy.array where the first column is the label of the dataframe column and the second column is the evaluated proportion.

In the function summarizing_data_users I want to do the same, but for each row. However, it is NOT working.
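A likely explanation for the 1.0 values, sketched on the question's own toy data: indexing a DataFrame with a boolean Series drops the rows where the mask is False, but indexing it with a boolean DataFrame (which is what dft[dft.isin([0])] does) keeps the original shape and only puts NaN in the masked cells. The size therefore never changes in the row version. The names by_series, masked and value_cols below are illustrative and not part of the original code.

```python
import pandas as pd

# Same toy frame as in the question.
df = pd.DataFrame([[1, 2, 3], [2, 5, 0.0], [3, 4, 5]],
                  columns=["ID", "var1", "var2"])

# Column case: a boolean Series selects rows, so the result shrinks.
by_series = df[df["var2"].isin([0])]["var2"]
print(by_series.size)        # number of zeros in var2

# Row case: a boolean DataFrame only masks cells; the shape is unchanged.
dft = df[df["ID"] == 2]
masked = dft[dft.isin([0])]
print(masked.shape)          # still one row and three columns

# Counting the True cells of the mask gives the number of zeros instead.
value_cols = ["var1", "var2"]  # every column except ID
zeros_in_row = int(dft[value_cols].isin([0]).sum(axis=1).iloc[0])
proportion = zeros_in_row / float(len(value_cols))
print(proportion)
```

With the shape fixed at three cells, (3 - 1) / 2 evaluates to 1.0 for every row, which matches the output shown in the question.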

Try this instead of the first function:

print(df[df == 0].count(axis=1)/len(df.columns))

UPDATE (correction):

print('rows')
print(df[df == 0].count(axis=1)/len(df.columns))
print('cols')
print(df[df == 0].count(axis=0)/len(df.index))

Input data (I've decided to add a few rows; judging by the output below, ID is used as the index):

ID  var1  var2
1      2     3
2      5     0
3      4     5
4     10    10
5      1     0

Output:

rows
ID
1    0.0
2    0.5
3    0.0
4    0.0
5    0.5
dtype: float64
cols
var1    0.0
var2    0.4
dtype: float64
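For anyone who wants to reproduce the output above end to end, here is a minimal sketch. It assumes that ID is set as the index (the answer does not show how the data was loaded), and it adds a variation that is not in the answer itself: taking the mean of the boolean frame df == 0, which gives the proportion directly.

```python
import pandas as pd

# Same five rows as above; ID is assumed to be the index, which is
# consistent with only var1 and var2 appearing under 'cols'.
df = pd.DataFrame(
    {"ID": [1, 2, 3, 4, 5],
     "var1": [2, 5, 4, 10, 1],
     "var2": [3, 0, 5, 10, 0]}
).set_index("ID")

# Expressions from the answer: non-zero cells become NaN, count() skips NaN.
answer_rows = df[df == 0].count(axis=1) / len(df.columns)
answer_cols = df[df == 0].count(axis=0) / len(df.index)

# Variation: the mean of a boolean frame is the share of True cells.
is_zero = df == 0
row_prop = is_zero.mean(axis=1)   # proportion of zeros per row
col_prop = is_zero.mean(axis=0)   # proportion of zeros per column

print(row_prop)
print(col_prop)
```

Both forms are vectorized, so neither needs a Python-level loop over rows. That is the part most likely to matter on a large frame.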

My favorite way of getting the number of non-zeros in each column is

df.astype(bool).sum(axis=0)

For the number of non-zeros in each row use

df.astype(bool).sum(axis=1)

Note:

If you have NaNs in your df, you should convert them to zero first; otherwise each one will be counted as 1 (non-zero).

df.fillna(0).astype(bool).sum(axis=1)
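Since the question asks for the proportion of zeros rather than the count of non-zeros, the counts above can be converted as in the following sketch. The small frame is hypothetical data made up for illustration. Note that filling NaN with 0 means a missing value is treated as a zero, which may or may not be what you want.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value, for illustration only.
df = pd.DataFrame({"var1": [2, 5, np.nan], "var2": [3, 0, 0]})

# Without fillna, NaN converts to True and is counted as a non-zero.
naive_nonzero = df.astype(bool).sum(axis=1)

# With fillna(0), NaN is treated as a zero.
nonzero_per_row = df.fillna(0).astype(bool).sum(axis=1)

# Proportion of zeros per row = 1 - (non-zeros / number of columns).
zero_prop_per_row = 1 - nonzero_per_row / float(df.shape[1])

print(naive_nonzero)
print(zero_prop_per_row)
```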

Tags:

pandas

python-2.7

DanielTheRocketMan

2 Answers

MaxU - stop WAR against UA

Kevin Chou
