Say I have a DataFrame:
import pandas as pd
data = {'Column 1': [1, 1, 2, 2, 2, 3, 4, 4, 4, 4],
        'Column 2': [1, 2, 1, 2, 3, 1, 1, 2, 3, 4],
        'Column 3': [1, 2, 1, 4, 3, 6, 1, 2, 7, 5]}
df = pd.DataFrame(data=data)
I want to grab rows 2, 5, 6 and 10 (index labels 1, 4, 5 and 9), because these are the last rows for each value in Column 1. Think of Column 1 as an ID and Column 2 as a running number within that ID. For each ID I need the row with the maximum Column 2 value, together with its Column 3 value, so that the Column 2/Column 3 pairs stay intact.
So I go from
Column 1 Column 2 Column 3
1 1 1
1 2 2
2 1 1
2 2 4
2 3 3
3 1 6
4 1 1
4 2 2
4 3 7
4 4 5
to
Column 1 Column 2 Column 3
1 2 2
2 3 3
3 1 6
4 4 5
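If I do
df.groupby(['Column 1']).max()
I do not get what I want, because max() is applied to Column 2 and Column 3 independently. For ID 2, for example, it returns Column 2 = 3 and Column 3 = 4, a pair that never appears in the data.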
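To make the problem concrete, here is a small sketch that rebuilds the example frame and compares groupby().max() with an idxmax-based lookup. The idxmax approach is an alternative added here for illustration, not one of the answers that follow, and it assumes the index labels are unique.

```python
import pandas as pd

data = {'Column 1': [1, 1, 2, 2, 2, 3, 4, 4, 4, 4],
        'Column 2': [1, 2, 1, 2, 3, 1, 1, 2, 3, 4],
        'Column 3': [1, 2, 1, 4, 3, 6, 1, 2, 7, 5]}
df = pd.DataFrame(data=data)

# max() is computed per column, so the Column 2/Column 3 pairs can come apart
maxed = df.groupby('Column 1').max()
print(maxed)  # ID 2 shows Column 2 = 3 but Column 3 = 4

# idxmax() gives, for each ID, the index label of the row with the largest
# Column 2; selecting those labels keeps whole rows intact (assumes unique labels)
best_rows = df.loc[df.groupby('Column 1')['Column 2'].idxmax()]
print(best_rows)
```

Unlike the "last row" answers, this picks the maximum Column 2 row even if the rows are not sorted.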
Use groupby/tail:
df.groupby('Column 1').tail(1)
Column 1 Column 2 Column 3
1 1 2 2
4 2 3 3
5 3 1 6
9 4 4 5
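tail(1) keeps the last row of each group in the order the rows currently appear, so it only returns the maximum Column 2 row when the frame is already sorted. A minimal sketch, assuming unsorted input: sample() is used here only to scramble the example, and the sort step is an added precaution, not part of the answer above.

```python
import pandas as pd

data = {'Column 1': [1, 1, 2, 2, 2, 3, 4, 4, 4, 4],
        'Column 2': [1, 2, 1, 2, 3, 1, 1, 2, 3, 4],
        'Column 3': [1, 2, 1, 4, 3, 6, 1, 2, 7, 5]}
# Shuffle the rows to simulate input that is not already ordered
df = pd.DataFrame(data=data).sample(frac=1, random_state=0)

# Sort by ID, then by Column 2, so that "last row" means "largest Column 2"
last_rows = df.sort_values(['Column 1', 'Column 2']).groupby('Column 1').tail(1)
print(last_rows.sort_index())
```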
Use drop_duplicates:
df_final = df.drop_duplicates('Column 1', keep='last')
print(df_final)
Column 1 Column 2 Column 3
1 1 2 2
4 2 3 3
5 3 1 6
9 4 4 5
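As a quick sanity check, the sketch below confirms that drop_duplicates(keep='last') selects the same rows as groupby/tail. It also shows ignore_index=True (available in DataFrame.drop_duplicates since pandas 1.0), which relabels the result 0..n-1 if you don't need the original row labels.

```python
import pandas as pd

data = {'Column 1': [1, 1, 2, 2, 2, 3, 4, 4, 4, 4],
        'Column 2': [1, 2, 1, 2, 3, 1, 1, 2, 3, 4],
        'Column 3': [1, 2, 1, 4, 3, 6, 1, 2, 7, 5]}
df = pd.DataFrame(data=data)

# keep='last' keeps the final occurrence of each Column 1 value
by_dedup = df.drop_duplicates('Column 1', keep='last')
by_tail = df.groupby('Column 1').tail(1)
print(by_dedup.equals(by_tail))  # True: both keep index labels 1, 4, 5 and 9

# ignore_index=True renumbers the result instead of keeping the original labels
renumbered = df.drop_duplicates('Column 1', keep='last', ignore_index=True)
print(renumbered)
```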
Use GroupBy.nth:
df.groupby('Column 1', as_index=False).nth([-1])
Column 1 Column 2 Column 3
1 1 2 2
4 2 3 3
5 3 1 6
9 4 4 5
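A related method, GroupBy.last(), looks similar but is not a drop-in replacement here: it takes the last non-null value of each column separately, so a missing value can pull in a Column 3 value from an earlier row. A small sketch using hypothetical data with one NaN, compared against tail(1):

```python
import numpy as np
import pandas as pd

# Hypothetical data: the last row for ID 1 has a missing Column 3 value
df = pd.DataFrame({'Column 1': [1, 1, 2],
                   'Column 2': [1, 2, 1],
                   'Column 3': [1.0, np.nan, 6.0]})

# last() skips NaN column by column, so ID 1 gets Column 2 = 2 but Column 3 = 1.0
print(df.groupby('Column 1').last())

# tail(1) keeps the actual last row, NaN included, so the pair stays intact
print(df.groupby('Column 1').tail(1))
```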
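If your DataFrame is already ordered, we don't need groupby; we can use boolean indexing with Series.shift, keeping every row whose Column 2 value is not smaller than the next row's (this relies on Column 2 dropping back down whenever a new ID starts, as it does here):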
df_filtered = df.loc[~df['Column 2'].lt(df['Column 2'].shift(-1))]
print(df_filtered)
Column 1 Column 2 Column 3
1 1 2 2
4 2 3 3
5 3 1 6
9 4 4 5
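The Column 2 shift trick breaks if the numbering does not restart at a lower value for each new ID. A sketch of a variant (suggested here, not from the answer above) that compares Column 1 with the next row instead, still assuming the rows are sorted by Column 1 and Column 2; the data is hypothetical and chosen so that Column 2 keeps increasing across IDs:

```python
import pandas as pd

# Hypothetical data where Column 2 does not restart for the next ID
df = pd.DataFrame({'Column 1': [1, 1, 2, 2],
                   'Column 2': [1, 2, 3, 4],
                   'Column 3': [5, 6, 7, 8]})

# Original trick: misses ID 1, because 2 < 3 even though the ID changes there
by_col2 = df.loc[~df['Column 2'].lt(df['Column 2'].shift(-1))]

# Variant: a row is the last of its ID when the next row has a different ID
# (the final row is compared against NaN, which ne() treats as different)
by_col1 = df.loc[df['Column 1'].ne(df['Column 1'].shift(-1))]

print(by_col2)  # only index 3
print(by_col1)  # index 1 and 3
```

On sorted data this variant selects the same rows as drop_duplicates('Column 1', keep='last').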