Obviously new to pandas. How can I simply count the number of records in a DataFrame?
I would have thought something as simple as this would do it, and I can't seem to find the answer in searches, probably because it is too simple.
cnt = df.count
print cnt
The above code actually just prints the whole df.
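A likely explanation, sketched with a made-up three-row frame rather than the asker's data: `df.count` without parentheses is the bound method itself, not a result, and printing a bound method shows the repr of the object it belongs to. Calling it returns per-column counts, while `len(df)` gives the number of records.

```python
import pandas as pd

# Hypothetical toy frame standing in for the asker's data.
df = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "z"]})

cnt = df.count            # no parentheses: this is the method object, not a count
print(callable(cnt))      # the bound method can still be called later

per_column = df.count()   # calling it returns a Series of non-null counts per column
print(per_column)

n_rows = len(df)          # the number of records
print(n_rows)
```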
TL;DR: use len(df). len() returns the number of items (the length) of an object such as a list; it also works for dictionaries, strings, tuples and range objects. So, to get the row count of a DataFrame, simply use len(df).
To get the number of rows and columns, we can also use len(df.axes[0]) and len(df.axes[1]) respectively.
To get the number of rows in a DataFrame, use df.shape[0] (and df.shape[1] to get the number of columns).
As an alternative, you can use len(df) or len(df.index) (and len(df.columns) for the columns).
shape is more versatile and more convenient than len(), especially for interactive work (it just needs to be added at the end), but len is a bit faster (see also this answer).
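A minimal sketch of how these spellings line up, using the same 8x3 frame as the timing example further down. The variable names `n_rows` and `n_cols` are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(24).reshape(8, 3), columns=["A", "B", "C"])

# shape is a (rows, columns) tuple, so both counts can be unpacked at once.
n_rows, n_cols = df.shape

# Every row-count spelling agrees, and so does every column-count spelling.
assert n_rows == len(df) == len(df.index) == len(df.axes[0])
assert n_cols == len(df.columns) == len(df.axes[1])

print(n_rows, n_cols)
```

Unpacking `df.shape` is handy when both numbers are needed; `len(df)` is the shortest option when only the row count matters.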
To avoid: count(), because it returns the number of non-NA/null observations over the requested axis rather than the number of rows.
len(df.index) is the fastest of the options:
import pandas as pd
import numpy as np

df = pd.DataFrame(np.arange(24).reshape(8, 3), columns=['A', 'B', 'C'])
df['A'] = df['A'].astype(float)  # NaN needs a float column
df.loc[5, 'A'] = np.nan          # .loc instead of the chained df['A'][5] = ...
df
# Out:
#       A   B   C
# 0   0.0   1   2
# 1   3.0   4   5
# 2   6.0   7   8
# 3   9.0  10  11
# 4  12.0  13  14
# 5   NaN  16  17
# 6  18.0  19  20
# 7  21.0  22  23

%timeit df.shape[0]
# 100000 loops, best of 3: 4.22 µs per loop
%timeit len(df)
# 100000 loops, best of 3: 2.26 µs per loop
%timeit len(df.index)
# 1000000 loops, best of 3: 1.46 µs per loop
df.__len__ is just a call to len(df.index):
import inspect
print(inspect.getsource(pd.DataFrame.__len__))
# Out:
# def __len__(self):
#     """Returns length of info axis, but here we use the index """
#     return len(self.index)
Why you should not use count()
df.count()
# Out:
# A    7
# B    8
# C    8
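To make the difference concrete, here is a small sketch that rebuilds a frame like the one above (created as floats from the start, which is my own simplification) and compares the row count with what count() returns on each axis.

```python
import numpy as np
import pandas as pd

# Float data so that a NaN can be stored without changing the column dtype.
df = pd.DataFrame(np.arange(24, dtype=float).reshape(8, 3), columns=["A", "B", "C"])
df.loc[5, "A"] = np.nan

row_count = len(df.index)      # number of rows, NaN or not
non_null = df.count()          # per-column non-null counts (default axis=0)
per_row = df.count(axis=1)     # per-row non-null counts

print(row_count)
print(non_null)
print(per_row.loc[5])
```

Column A reports one fewer than the row count because of the single NaN, which is why count() is a poor stand-in for the number of records.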
Regarding your question: are you counting a single field? I've phrased this as a question, but I hope it helps.
Say I have the following DataFrame
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.normal(0, 1, (5, 2)), columns=["A", "B"])
You could count a single column by
df.A.count()
# or
df['A'].count()
Both evaluate to 5.
The cool thing (or one of many w.r.t. pandas) is that if you have NA values, count takes that into consideration.
So if I did
df.loc[1::2, 'A'] = np.nan  # .loc avoids chained assignment; np.NAN was removed in NumPy 2.0
df.count()
The result would be
A    3
B    5
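As a rough sketch of the same idea for a single column: the non-null count plus the missing count adds up to the row count. The seeded generator and the explicit row labels `[1, 3]` are my own choices here so the example is reproducible; they are not from the answer above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)   # arbitrary seed, only for reproducibility
df = pd.DataFrame(rng.normal(0, 1, (5, 2)), columns=["A", "B"])
df.loc[[1, 3], "A"] = np.nan     # blank out every second value of A

non_null = df["A"].count()       # values that are present
missing = df["A"].isna().sum()   # values that are NaN
total = len(df)                  # all rows, present or not

print(non_null, missing, total)
```

So count() is the right tool when you want to know how many usable values a field has, and len(df) when you want the number of records.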