
pandas/Python: how to count the number of records or rows in a DataFrame

Obviously new to pandas. How can I simply count the number of records in a DataFrame?

I would have thought something as simple as this would do it, and I can't seem to find the answer in searches, probably because it is too simple.

cnt = df.count
print cnt

The above code actually just prints the whole df.

IcemanBerlin asked Jul 04 '13 11:07



People also ask

How do I count the number of rows in a DataFrame in Python?

TL;DR: use len(df). len() returns the number of items (the length) of a list object (it also works for dictionary, string, tuple, or range objects). So, to get the row count of a DataFrame, simply use len(df).

How do I count rows and columns in pandas?

To get the number of rows and columns, we can use len(df.axes[0]) and len(df.axes[1]) respectively.


2 Answers

To get the number of rows in a dataframe use:

df.shape[0] 

(and df.shape[1] to get the number of columns).

As an alternative you can use

len(df) 

or

len(df.index) 

(and len(df.columns) for the columns)

shape is more versatile and more convenient than len(), especially for interactive work (just needs to be added at the end), but len is a bit faster (see also this answer).
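As a minimal sketch of the options above (the small frame and its column names are made up for illustration), all of the row-count expressions agree with one another:

```python
import pandas as pd

# Hypothetical example frame: 4 rows, 2 columns.
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": ["a", "b", "c", "d"]})

# shape is a (rows, columns) tuple, so it can be unpacked directly.
n_rows, n_cols = df.shape
print(n_rows, n_cols)          # 4 2

# len() on the frame or on its index gives the row count.
print(len(df), len(df.index))  # 4 4

# len() on the columns gives the column count.
print(len(df.columns))         # 2
```

Unpacking df.shape is handy when both dimensions are needed at once; len(df) reads more naturally when only the row count matters.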

Avoid count() for this purpose: it returns the number of non-NA/null observations along the requested axis, not the number of rows.

len(df.index) is faster

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(24).reshape(8, 3), columns=['A', 'B', 'C'])
df['A'] = df['A'].astype(float)  # NaN needs a float column
df.loc[5, 'A'] = np.nan          # .loc avoids chained assignment
df
# Out:
#     A   B   C
# 0   0   1   2
# 1   3   4   5
# 2   6   7   8
# 3   9  10  11
# 4  12  13  14
# 5 NaN  16  17
# 6  18  19  20
# 7  21  22  23

%timeit df.shape[0]
# 100000 loops, best of 3: 4.22 µs per loop

%timeit len(df)
# 100000 loops, best of 3: 2.26 µs per loop

%timeit len(df.index)
# 1000000 loops, best of 3: 1.46 µs per loop
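%timeit is an IPython magic, so the timings above cannot be reproduced in a plain script as written. A rough equivalent using the standard-library timeit module might look like the sketch below; the candidates dictionary and the loop count are illustrative choices, and the absolute numbers will vary with machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(24).reshape(8, 3), columns=["A", "B", "C"])

# Each candidate is wrapped in a lambda so timeit can call it repeatedly.
candidates = {
    "df.shape[0]": lambda: df.shape[0],
    "len(df)": lambda: len(df),
    "len(df.index)": lambda: len(df.index),
}

n_calls = 10_000  # illustrative loop count
timings = {
    name: timeit.timeit(fn, number=n_calls) / n_calls
    for name, fn in candidates.items()
}

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e6:.2f} us per call")
```

All three take on the order of microseconds, so the difference rarely matters outside tight loops.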

df.__len__ is just a call to len(df.index)

import inspect

print(inspect.getsource(pd.DataFrame.__len__))
# Out:
#     def __len__(self):
#         """Returns length of info axis, but here we use the index """
#         return len(self.index)

Why you should not use count()

df.count()
# Out:
# A    7
# B    8
# C    8
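To make the difference concrete, here is a small sketch (the frame is invented for illustration) comparing len(df) with df.count() when a value is missing. It also shows what df.count evaluates to when the parentheses are left off, as in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in column "A".
df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [4, 5, 6]})

row_count = len(df)    # counts every row, missing values or not
non_null = df.count()  # Series of non-NA counts, one entry per column

print(row_count)                     # 3
print(non_null["A"], non_null["B"])  # 2 3

# Without parentheses, df.count is the bound method itself, not a number.
# Printing it shows the method's repr, which embeds the frame's repr.
print(callable(df.count))            # True
```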
user2314737 answered Sep 29 '22 21:09



In regard to your question... counting one field? I decided to make it a question, but I hope it helps...

Say I have the following DataFrame

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.normal(0, 1, (5, 2)), columns=["A", "B"])

You could count a single column by

df.A.count()
# or
df['A'].count()

Both evaluate to 5.

The cool thing (or one of many w.r.t. pandas) is that if you have NA values, count takes that into consideration and leaves them out of the total.

So if I did

df.loc[1::2, 'A'] = np.nan  # set every second row of A to NaN, starting at row 1
df.count()

The result would be

A    3
B    5
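Since count() skips missing values, the gap between the row count and a column's count is the number of NA entries in that column. A minimal sketch of that relationship, using fixed made-up values instead of random ones so the numbers are reproducible:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column "A" has NaN in rows 1 and 3, "B" is complete.
df = pd.DataFrame({
    "A": [0.5, np.nan, 1.5, np.nan, 2.5],
    "B": [1.0, 2.0, 3.0, 4.0, 5.0],
})

total_rows = len(df)              # all rows, regardless of NaN
non_null_a = df["A"].count()      # non-NA entries in A only
missing_a = df["A"].isna().sum()  # NA entries in A

print(total_rows, non_null_a, missing_a)  # 5 3 2
```

So count() is the right tool when the question is how many usable values a column holds, and len(df) when the question is how many rows there are.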
tshauck answered Sep 29 '22 21:09
