Pandas dataframe get first row of each group

People also ask

How do you get the first group of Groupby pandas?

Using pandas groupby nth(0): to get the first row of each group, pass 0 as the argument to the nth() function.

How do you get the index of Groupby pandas?

To group rows on an index, pass the index name of the DataFrame to groupby(). DataFrame.groupby() accepts a string or a list of strings naming the columns or index levels to group by.
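
As a minimal sketch of grouping on an index: the frame `df_idx` and the index name `key` below are illustrative assumptions, not taken from the original question.

```python
import pandas as pd

# Hypothetical frame whose index is named "key"
df_idx = pd.DataFrame(
    {"value": [10, 20, 30, 40]},
    index=pd.Index(["a", "a", "b", "b"], name="key"),
)

# Group by the index name, or equivalently by index level position
by_name = df_idx.groupby("key")["value"].sum()
by_level = df_idx.groupby(level=0)["value"].sum()
```

Both calls should produce the same per-key sums; `level=0` is handy when the index has no name.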

How do you get the first row in a DataFrame?

Select the first row of a DataFrame with head(): df.head(1) returns the first row as a one-row DataFrame, which you can then print or process further.
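
A small sketch of the difference between `head(1)` and positional indexing; the frame `df_small` is a made-up example.

```python
import pandas as pd

# Hypothetical two-row frame
df_small = pd.DataFrame({"id": [1, 2], "value": ["first", "second"]})

first_as_frame = df_small.head(1)   # one-row DataFrame
first_as_series = df_small.iloc[0]  # the same row as a Series
```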

What is first () in pandas?

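pandas DataFrame.first() method: DataFrame.first(offset) returns the initial rows of a time series that fall within the given date offset (for example '3D'), so the index has to be a DatetimeIndex for it to work as expected. It is deprecated in recent pandas releases and is unrelated to GroupBy.first(), which returns the first non-null value of each column per group.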

>>> df.groupby('id').first()
     value
id        
1    first
2    first
3    first
4   second
5    first
6    first
7   fourth
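
The snippets in this answer never define `df`. The frame below is an assumption, chosen only because it reproduces the outputs shown here; treat it as a sketch of the setup rather than the asker's actual data.

```python
import pandas as pd

# Assumed sample data, consistent with the outputs shown in this answer
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7],
    "value": ["first", "second", "second", "first", "second",
              "first", "third", "fourth", "fifth", "second", "fifth",
              "first", "first", "second", "third", "fourth", "fifth"],
})

# First non-null value of each column per group, indexed by id
first_rows = df.groupby("id").first()
print(first_rows)
```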

If you need id as column:

>>> df.groupby('id').first().reset_index()
   id   value
0   1   first
1   2   first
2   3   first
3   4  second
4   5   first
5   6   first
6   7  fourth
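
As an alternative sketch, passing `as_index=False` to `groupby()` keeps `id` as a regular column without a separate `reset_index()` call. The sample frame is the same assumed data as elsewhere in this answer.

```python
import pandas as pd

# Assumed sample data, consistent with the outputs shown in this answer
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7],
    "value": ["first", "second", "second", "first", "second",
              "first", "third", "fourth", "fifth", "second", "fifth",
              "first", "first", "second", "third", "fourth", "fifth"],
})

# id stays a column; the result gets a fresh 0..n-1 index
flat = df.groupby("id", as_index=False).first()
```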

To get the first n records of each group, you can use head():

>>> df.groupby('id').head(2).reset_index(drop=True)
    id   value
0    1   first
1    1  second
2    2   first
3    2  second
4    3   first
5    3   third
6    4  second
7    4   fifth
8    5   first
9    6   first
10   6  second
11   7  fourth
12   7   fifth
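
Unlike `first()`, `head()` returns the original rows with their original row labels, which is why the call above ends with `reset_index(drop=True)`. A sketch on the same assumed sample data:

```python
import pandas as pd

# Assumed sample data, consistent with the outputs shown in this answer
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7],
    "value": ["first", "second", "second", "first", "second",
              "first", "third", "fourth", "fifth", "second", "fifth",
              "first", "first", "second", "third", "fourth", "fifth"],
})

# Up to two rows per group, original row labels preserved
top2 = df.groupby("id").head(2)
print(top2.index.tolist())
```

Groups with fewer than two rows (id 5 here) simply contribute the rows they have.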

nth() is zero-indexed, so this gives you the second row of each group (nth(0) gives the first row; unlike first(), it does not skip NaN values):

df.groupby('id').nth(1)

Documentation: http://pandas.pydata.org/pandas-docs/stable/groupby.html#taking-the-nth-row-of-each-group
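
A sketch of `nth(1)` on the assumed sample data. Depending on the pandas version, the result is indexed either by the group key or by the original row labels, so the example only inspects the values.

```python
import pandas as pd

# Assumed sample data, consistent with the outputs shown in this answer
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7],
    "value": ["first", "second", "second", "first", "second",
              "first", "third", "fourth", "fifth", "second", "fifth",
              "first", "first", "second", "third", "fourth", "fifth"],
})

# Second row of each group; groups with a single row (id 5) are omitted
second_rows = df.groupby("id").nth(1)
print(second_rows["value"].tolist())
```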

I'd suggest using .nth(0) rather than .first() if you need the first row of each group.

The difference is in how they handle NaNs: .nth(0) returns the first row of each group regardless of the values in it, while .first() returns the first non-NaN value in each column, so with several columns its result may combine values from different rows.

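E.g. if your dataset is:

import numpy as np
import pandas as pd

# np.nan is the supported spelling; the np.NaN alias was removed in NumPy 2.0
df = pd.DataFrame({'id' : [1,1,1,2,2,3,3,3,3,4,4],
            'value'  : ["first","second","third", np.nan,
                        "second","first","second","third",
                        "fourth","first","second"]})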

>>> df.groupby('id').nth(0)
    value
id        
1    first
2    NaN
3    first
4    first

And

>>> df.groupby('id').first()
    value
id        
1    first
2    second
3    first
4    first
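
The effect is easier to see with more than one value column. In the sketch below (the frame `df_nan` and its column names are made up for illustration), `first()` assembles a row per group from different source rows, while `head(1)` returns the actual first row with its NaNs intact.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with NaNs in different columns of the first rows
df_nan = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "a": [np.nan, "a2", "a3", "a4"],
    "b": ["b1", "b2", np.nan, "b4"],
})

# First non-null value per column:
#   id 1 -> a from its second row, b from its first row
#   id 2 -> a from its first row, b from its second row
firsts = df_nan.groupby("id").first()

# The literal first row of each group, NaNs included
true_first_rows = df_nan.groupby("id").head(1)
```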

If you only need the first row of each group, you can also use drop_duplicates. Note that its default is keep='first'.

df.drop_duplicates('id')
Out[1027]: 
    id   value
0    1   first
3    2   first
5    3   first
9    4  second
11   5   first
12   6   first
15   7  fourth
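
A sketch of both `keep` options on the assumed sample data. Like `head()`, `drop_duplicates` keeps the original row labels and does not skip NaN values.

```python
import pandas as pd

# Assumed sample data, consistent with the outputs shown in this answer
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7],
    "value": ["first", "second", "second", "first", "second",
              "first", "third", "fourth", "fifth", "second", "fifth",
              "first", "first", "second", "third", "fourth", "fifth"],
})

first_per_id = df.drop_duplicates("id")              # keep="first" is the default
last_per_id = df.drop_duplicates("id", keep="last")  # last row of each id instead
```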

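If you want the top n rows of each group after sorting (here, the three most populous counties in each state), maybe this is what you want: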

import pandas as pd
idx = pd.MultiIndex.from_product([['state1','state2'],   ['county1','county2','county3','county4']])
df = pd.DataFrame({'pop': [12,15,65,42,78,67,55,31]}, index=idx)

                pop
state1 county1   12
       county2   15
       county3   65
       county4   42
state2 county1   78
       county2   67
       county3   55
       county4   31

df.groupby(level=0, group_keys=False).apply(lambda x: x.sort_values('pop', ascending=False)).groupby(level=0).head(3)

Out[29]:
                pop
state1 county3   65
       county4   42
       county2   15
state2 county1   78
       county2   67
       county3   55

