
Is there a way to grab the last item of a group?

Say I have a DataFrame

import pandas as pd

data = {'Column 1': [1, 1, 2, 2, 2, 3, 4, 4, 4, 4],
        'Column 2': [1, 2, 1, 2, 3, 1, 1, 2, 3, 4],
        'Column 3': [1, 2, 1, 4, 3, 6, 1, 2, 7, 5]}

df = pd.DataFrame(data=data)

I want to grab rows 2, 5, 6 and 10 (counting from 1), because these are the last rows for each value in Column 1. Say Column 1 is an ID and Column 2 is a running count within that ID. I need to pick the row with the maximum Column 2 for each value of Column 1, and keep the Column 3 value from that same row, so that the Column 2 and Column 3 pairs stay intact.

So I go from

1  1  1
1  2  2
2  1  1
2  2  4
2  3  3
3  1  6
4  1  1
4  2  2
4  3  7
4  4  5

to

1  2  2
2  3  3
3  1  6
4  4  5

If I do

df.groupby(['Column 1']).max()

I do not get what I want, because it takes the maximum of Column 2 and of Column 3 independently.
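To make the problem concrete, here is a small sketch using the same data as above. It shows that max() is applied column by column, so a result row can mix values that came from different original rows. The variable name maxed is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Column 1': [1, 1, 2, 2, 2, 3, 4, 4, 4, 4],
    'Column 2': [1, 2, 1, 2, 3, 1, 1, 2, 3, 4],
    'Column 3': [1, 2, 1, 4, 3, 6, 1, 2, 7, 5],
})

# max() is computed per column, so the values in one result row
# need not come from the same original row.
maxed = df.groupby('Column 1').max()
print(maxed)

# Group 2: the largest Column 2 is 3 (originally paired with Column 3 == 3),
# but the largest Column 3 in that group is 4, which sits on a different row.
print(maxed.loc[2].tolist())  # [3, 4]
```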

nielsen asked May 08 '20 16:05



4 Answers

groupby/tail

df.groupby('Column 1').tail(1)

   Column 1  Column 2  Column 3
1         1         2         2
4         2         3         3
5         3         1         6
9         4         4         5
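Note that tail(1) picks the last row by position, so it relies on the rows already being in the desired order. A minimal sketch, assuming the rows might arrive in arbitrary order (the shuffle here is a hypothetical stand-in for that), is to sort first:

```python
import pandas as pd

df = pd.DataFrame({
    'Column 1': [1, 1, 2, 2, 2, 3, 4, 4, 4, 4],
    'Column 2': [1, 2, 1, 2, 3, 1, 1, 2, 3, 4],
    'Column 3': [1, 2, 1, 4, 3, 6, 1, 2, 7, 5],
})

# Hypothetical scenario: the rows arrive in arbitrary order.
shuffled = df.sample(frac=1, random_state=0)

# Sort so the largest Column 2 comes last within each Column 1 group,
# then keep the final row of every group.
last_rows = (
    shuffled.sort_values(['Column 1', 'Column 2'])
            .groupby('Column 1')
            .tail(1)
)
print(last_rows)
```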
piRSquared answered Nov 14 '22 23:11


Use drop_duplicates

df_final = df.drop_duplicates('Column 1', keep='last')

Out[9]:
   Column 1  Column 2  Column 3
1         1         2         2
4         2         3         3
5         3         1         6
9         4         4         5
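The result above keeps the original index labels (1, 4, 5, 9). If a fresh 0..n-1 index is preferred, one option, assuming pandas 1.0 or later, is the ignore_index parameter:

```python
import pandas as pd

df = pd.DataFrame({
    'Column 1': [1, 1, 2, 2, 2, 3, 4, 4, 4, 4],
    'Column 2': [1, 2, 1, 2, 3, 1, 1, 2, 3, 4],
    'Column 3': [1, 2, 1, 4, 3, 6, 1, 2, 7, 5],
})

# keep='last' retains the final occurrence of each Column 1 value;
# ignore_index=True relabels the surviving rows 0..n-1.
df_final = df.drop_duplicates('Column 1', keep='last', ignore_index=True)
print(df_final)
```

On older pandas versions, calling .reset_index(drop=True) on the result should give the same relabelling.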
Andy L. answered Nov 14 '22 21:11


Use GroupBy.nth:

In [198]: df.groupby('Column 1', as_index=False).nth([-1])
Out[198]: 
   Column 1  Column 2  Column 3
1         1         2         2
4         2         3         3
5         3         1         6
9         4         4         5
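Selecting by position works here because the maximum of Column 2 happens to be the last row of each group. If that cannot be assumed, a possible alternative (not one of the answers on this page) is to look up the row holding the maximum with idxmax. This is a sketch that assumes the index labels are unique; on ties, idxmax returns the first occurrence.

```python
import pandas as pd

df = pd.DataFrame({
    'Column 1': [1, 1, 2, 2, 2, 3, 4, 4, 4, 4],
    'Column 2': [1, 2, 1, 2, 3, 1, 1, 2, 3, 4],
    'Column 3': [1, 2, 1, 4, 3, 6, 1, 2, 7, 5],
})

# For each Column 1 group, find the index label of the row
# with the largest Column 2.
idx = df.groupby('Column 1')['Column 2'].idxmax()

# Selecting whole rows by label keeps each Column 2 / Column 3 pair intact.
result = df.loc[idx]
print(result)
```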
Mayank Porwal answered Nov 14 '22 22:11


If your DataFrame is already ordered and Column 2 increases within each group, we don't need groupby; we can use boolean indexing with Series.shift:

df_filtered = df.loc[~df['Column 2'].lt(df['Column 2'].shift(-1))]
print(df_filtered)
   Column 1  Column 2  Column 3
1         1         2         2
4         2         3         3
5         3         1         6
9         4         4         5
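The comparison above depends on Column 2 restarting at each new group. A variant sketch that only assumes the rows are grouped contiguously by Column 1 is to compare Column 1 with its own next value; the name is_last is just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Column 1': [1, 1, 2, 2, 2, 3, 4, 4, 4, 4],
    'Column 2': [1, 2, 1, 2, 3, 1, 1, 2, 3, 4],
    'Column 3': [1, 2, 1, 4, 3, 6, 1, 2, 7, 5],
})

# A row is the last of its group when the next row has a different
# Column 1 value. For the final row, shift(-1) yields NaN, which never
# compares equal, so that row is kept as well.
is_last = df['Column 1'].ne(df['Column 1'].shift(-1))

df_filtered = df.loc[is_last]
print(df_filtered)
```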
ansev answered Nov 14 '22 22:11