.head() and .tail() with negative indexes on pandas GroupBy object

Tags:

python

pandas

group-by

I'm having trouble keeping all but the last element of each group in a groupby object of a pandas.DataFrame:

import pandas as pd

x = pd.DataFrame([['a', 1], ['b', 1], ['a', 2], ['b', 2], ['a', 3], ['b', 3]],
                 columns=['A', 'B'])
g = x.groupby('A')

As expected (according to the documentation), g.head(1) returns

   A  B
0  a  1
1  b  1

whereas g.head(-1) returns an empty DataFrame.

From the behavior of x.head(-1) I'd expect it to return

   A  B
0  a  1
1  b  1
2  a  2
3  b  2

i.e. dropping the last element of each group and then merging the remaining rows back into one DataFrame. If that's just a bug in pandas, I'd be grateful to anyone who suggests an alternative approach.
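
To make the expectation concrete, here is a minimal sketch (not part of the original post) that applies the plain DataFrame.head(-1) to each group by hand and stitches the pieces back together in the original row order. The names `per_group` and `expected` are illustrative only, and the loop is meant to show the intended result rather than to be an efficient solution.

```python
import pandas as pd

x = pd.DataFrame([['a', 1], ['b', 1], ['a', 2], ['b', 2], ['a', 3], ['b', 3]],
                 columns=['A', 'B'])

# On a plain DataFrame, head(-1) keeps every row except the last one.
all_but_last_row = x.head(-1)

# Do the same inside each group: iterating over a groupby yields
# (key, sub-DataFrame) pairs, so head(-1) drops the last row of each group.
per_group = [group.head(-1) for _, group in x.groupby('A')]

# Concatenate the pieces and restore the original row order.
expected = pd.concat(per_group).sort_index()
print(expected)
```

The printed frame should contain the rows with index 0 to 3, which is the result asked for above.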

723

asked Nov 18 '15 14:11

whoever

1 Answer

As noted in the comments, these haven't (yet) been implemented in pandas. However, you can use cumcount to implement them efficiently:

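def negative_head(g, n):
    # cumcount(ascending=False) counts from the end of each group (last row is 0),
    # so this keeps every row except the last n of each group
    return g._selected_obj[g.cumcount(ascending=False) >= n]

def negative_tail(g, n):
    # cumcount() counts from the start of each group (first row is 0),
    # so this keeps every row except the first n of each group
    return g._selected_obj[g.cumcount() >= n]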

In [11]: negative_head(g, 1)  # instead of g.head(-1)
Out[11]:
   B
0  1
1  1
2  2
3  2
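
The helpers above rely on `_selected_obj`, which is a private attribute of the GroupBy object and could behave differently between pandas versions. A minimal variant of the same cumcount idea that only touches public API is sketched below; it indexes the original DataFrame directly, so the grouping column is kept in the result. The function names `drop_last_per_group` and `drop_first_per_group` are made up for this sketch.

```python
import pandas as pd

x = pd.DataFrame([['a', 1], ['b', 1], ['a', 2], ['b', 2], ['a', 3], ['b', 3]],
                 columns=['A', 'B'])

def drop_last_per_group(df, by, n):
    # Position of each row counted from the end of its group (last row is 0).
    from_end = df.groupby(by).cumcount(ascending=False)
    # Rows with fewer than n rows after them are the last n of their group.
    return df[from_end >= n]

def drop_first_per_group(df, by, n):
    # Position of each row counted from the start of its group (first row is 0).
    from_start = df.groupby(by).cumcount()
    return df[from_start >= n]

kept = drop_last_per_group(x, 'A', 1)   # intended equivalent of g.head(-1)
rest = drop_first_per_group(x, 'A', 1)  # intended equivalent of g.tail(-1)
print(kept)
print(rest)
```

The current pandas documentation for GroupBy.head and GroupBy.tail also describes negative values of n, so on a recent install g.head(-1) may work directly; it is worth checking against the version you have.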

168

answered Oct 24 '22 14:10

Andy Hayden
