I want to keep the last n rows of each group, sorted by a variable var_to_sort, using pandas.
This is how I do it now: I sort the dataframe below by date, group it by names, and then use tail(n) to get the last n rows within each group.
from datetime import date
import pandas as pd

data = [
['tom', date(2018,2,1), "I want this"],
['tom', date(2018,1,1), "Don't want"],
['nick', date(2019,4,1), "Don't want"],
['nick', date(2019,5,1), "I want this"]]
# Create the pandas DataFrame
df = pd.DataFrame(data)
df.columns = ["names", "date", "result"]
# sort it
df.sort_values("date", inplace=True)
df.groupby("names").tail(1)
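The same sort-then-tail pattern can be wrapped in a small helper that works for any n and any sort column. This is a minimal sketch; the function name last_n_per_group is hypothetical and not part of pandas. It uses kind="mergesort" (a stable sort) so that rows with equal sort keys keep their original relative order.

```python
from datetime import date

import pandas as pd


def last_n_per_group(df, by, var_to_sort, n):
    """Hypothetical helper: keep the last n rows of each `by` group
    after sorting the whole frame by `var_to_sort` (ascending)."""
    # mergesort is stable, so tied values keep their original order.
    return df.sort_values(var_to_sort, kind="mergesort").groupby(by).tail(n)


data = [
    ["tom", date(2018, 2, 1), "I want this"],
    ["tom", date(2018, 1, 1), "Don't want"],
    ["nick", date(2019, 4, 1), "Don't want"],
    ["nick", date(2019, 5, 1), "I want this"],
]
df = pd.DataFrame(data, columns=["names", "date", "result"])

print(last_n_per_group(df, "names", "date", 1))  # latest row per name
print(last_n_per_group(df, "names", "date", 2))  # two latest rows per name
```

Because the sort is applied before groupby, tail(n) returns rows in date order rather than in the original row order.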
Is there a more efficient way to do this? What if the dataset is already indexed by "date" or by ["date", "names"]?
I think your solution is good. It is also possible to use sort_values without inplace=True, which lets you chain the calls together.
For the other questions:
from datetime import date
import pandas as pd

data = [
['tom', date(2018,2,1), "I want this"],
['tom', date(2018,1,1), "Don't want"],
['nick', date(2019,4,1), "Don't want"],
['nick', date(2019,5,1), "I want this"]]
# Create the pandas DataFrame
df = pd.DataFrame(data)
df.columns = ["names", "date", "result"]
df1 = df.sort_values("date").groupby("names").tail(1)
print (df1)
names date result
0 tom 2018-02-01 I want this
3 nick 2019-05-01 I want this
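If several rows in a group share the same sort key, which one tail returns depends on how the tie is ordered after sorting. The default kind for sort_values is quicksort, which pandas does not document as stable; passing kind="mergesort" keeps tied rows in their original order, so the row that appears later in the source frame is the one tail(1) keeps. A small sketch with made-up tie data (the "first tie" / "second tie" values are illustrative only):

```python
import pandas as pd

# Two "tom" rows share the same date; a stable sort keeps them in original order.
df = pd.DataFrame(
    {
        "names": ["tom", "tom", "nick"],
        "date": pd.to_datetime(["2018-02-01", "2018-02-01", "2019-05-01"]),
        "result": ["first tie", "second tie", "only row"],
    }
)

# Chained form: stable sort, group, keep the last row of each group.
last = df.sort_values("date", kind="mergesort").groupby("names").tail(1)
print(last)
```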
df2 = df.set_index('date')
print (df2)
names result
date
2018-02-01 tom I want this
2018-01-01 tom Don't want
2019-04-01 nick Don't want
2019-05-01 nick I want this
df22 = df2.sort_index().groupby("names").tail(1)
print (df22)
names result
date
2018-02-01 tom I want this
2019-05-01 nick I want this
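When the frame is already indexed by date, sorting is only needed if that index is not already in ascending order, which can be checked with Index.is_monotonic_increasing. A minimal sketch, assuming the dates are converted to a DatetimeIndex with pd.to_datetime (the original data uses datetime.date objects):

```python
import pandas as pd

df2 = pd.DataFrame(
    {
        "date": pd.to_datetime(["2018-02-01", "2018-01-01", "2019-04-01", "2019-05-01"]),
        "names": ["tom", "tom", "nick", "nick"],
        "result": ["I want this", "Don't want", "Don't want", "I want this"],
    }
).set_index("date")

# Skip the sort when the date index is already ascending.
if not df2.index.is_monotonic_increasing:
    df2 = df2.sort_index()

# "names" is still an ordinary column, so groupby can use it directly.
df22 = df2.groupby("names").tail(1)
print(df22)
```

This check is still a full pass over the index, so the saving is only that the sort itself is avoided when the data already arrives in date order.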
df3 = df.set_index(['date','names'])
print (df3)
result
date names
2018-02-01 tom I want this
2018-01-01 tom Don't want
2019-04-01 nick Don't want
2019-05-01 nick I want this
df33 = df3.sort_index().groupby(level=1).tail(1)
print (df33)
result
date names
2018-02-01 tom I want this
2019-05-01 nick I want this
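With a MultiIndex, the level can also be referenced by name instead of by position, which keeps the code correct if the level order changes. A sketch of the same query using level="names" in groupby and level="date" in sort_index:

```python
from datetime import date

import pandas as pd

data = [
    ["tom", date(2018, 2, 1), "I want this"],
    ["tom", date(2018, 1, 1), "Don't want"],
    ["nick", date(2019, 4, 1), "Don't want"],
    ["nick", date(2019, 5, 1), "I want this"],
]
df3 = pd.DataFrame(data, columns=["names", "date", "result"]).set_index(["date", "names"])

# Sort by the "date" level, then keep the last row per "names" level value.
df33 = df3.sort_index(level="date").groupby(level="names").tail(1)
print(df33)
```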