 

.head() and .tail() with negative indexes on pandas GroupBy object

I'm having trouble selecting all but the last element in each group of a pandas GroupBy object built from a DataFrame:

import pandas as pd

x = pd.DataFrame([['a', 1], ['b', 1], ['a', 2], ['b', 2], ['a', 3], ['b', 3]],
                 columns=['A', 'B'])
g = x.groupby('A')

As expected (according to the documentation), g.head(1) returns

   A  B
0  a  1
1  b  1

whereas g.head(-1) returns an empty DataFrame.

From the behavior of x.head(-1), I'd expect it to return

   A  B
0  a  1
1  b  1
2  a  2
3  b  2

i.e. dropping the last element of each group and then merging the remaining rows back into a single DataFrame. If this is just a bug in pandas, I'd be grateful to anyone who can suggest an alternative approach.

whoever asked Nov 18 '15 14:11



1 Answer

As noted in the comments, these haven't (yet) been implemented in pandas. However, you can use cumcount to implement them efficiently:

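def negative_head(g, n):
    # cumcount(ascending=False) numbers rows from the end of each group
    # (the last row is 0), so ">= n" keeps all but the last n rows.
    return g._selected_obj[g.cumcount(ascending=False) >= n]

def negative_tail(g, n):
    # cumcount() numbers rows from the start of each group
    # (the first row is 0), so ">= n" keeps all but the first n rows.
    return g._selected_obj[g.cumcount() >= n]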

In [11]: negative_head(g, 1)  # instead of g.head(-1)
Out[11]:
   B
0  1
1  1
2  2
3  2
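
The helpers above rely on g._selected_obj, which is a private attribute and may behave differently between pandas versions. As a minimal sketch of the same cumcount idea using only public calls, the mask can be applied to the original DataFrame instead. The names drop_last_per_group and drop_first_per_group are hypothetical, introduced here for illustration only:

```python
import pandas as pd

x = pd.DataFrame(
    [['a', 1], ['b', 1], ['a', 2], ['b', 2], ['a', 3], ['b', 3]],
    columns=['A', 'B'],
)

def drop_last_per_group(df, by, n):
    """Hypothetical helper: keep all but the last n rows of each group."""
    # Number rows from the end of each group; the last row gets 0.
    from_end = df.groupby(by).cumcount(ascending=False)
    return df[from_end >= n]

def drop_first_per_group(df, by, n):
    """Hypothetical helper: keep all but the first n rows of each group."""
    # Number rows from the start of each group; the first row gets 0.
    from_start = df.groupby(by).cumcount()
    return df[from_start >= n]

head_result = drop_last_per_group(x, 'A', 1)   # stands in for g.head(-1)
tail_result = drop_first_per_group(x, 'A', 1)  # stands in for g.tail(-1)
print(head_result)
print(tail_result)
```

Because the boolean mask is indexed like the original DataFrame, the result keeps the original row order and index, and the grouping column A stays in the output.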
Andy Hayden answered Oct 24 '22 14:10
