How to keep only the consecutive values in a Pandas dataframe using Python

Tags:

I have a dataframe that looks like the this:

Enter image description here

I want to keep only the consecutive years in each group, such as the following figure where the year of 2005 in group A and year of 2009 and 2011 in group B are deleted.

Enter image description here

I created a column of the year difference by using df['year_diff']=df.groupby(['group'])['Year'].diff(), and then only kept the rows where the year difference was equal to 1.

However, this method will also delete the first row in each consecutive year group since the year difference of the first row will be NAN. For example, the year of 2000 will be deleted from group 2000-2005. Is there a way that I can do to avoid this problem?

779

asked May 20 '19 13:05

yihao ren

2 Answers

`shift`

Get the year diffs as OP first did. Then check if equal to 1 or the previous value is 1

yd = df.Year.groupby(df.group).diff().eq(1)
df[yd | yd.shift(-1)]

   group  Year
0      A  2000
1      A  2001
2      A  2002
3      A  2003
5      A  2007
6      A  2008
7      A  2009
8      A  2010
9      A  2011
10     B  2005
11     B  2006
12     B  2007
15     B  2013
16     B  2014
17     B  2015
18     B  2016
19     B  2017

Setup

Thx jez

a = [('A',x) for x in range(2000, 2012) if x not in [2004,2006]]
b = [('B',x) for x in range(2005, 2018) if x not in [2008,2010,2012]]
df = pd.DataFrame(a + b, columns=['group','Year'])

200

answered Oct 16 '22 21:10

piRSquared

If I understand correctly, using diff and cumsum create the additional group key, then groupby it and your group columns, and drop the count equal to 1.

df[df.g.groupby([df.g,df.Year.diff().ne(1).cumsum()]).transform('count').ne(1)]

Out[317]:
    g  Year
0   A  2000
1   A  2001
2   A  2002
3   A  2003
5   A  2007
6   A  2008
7   A  2009
8   A  2010
9   A  2011
10  B  2005
11  B  2006
12  B  2007
15  B  2013
16  B  2014
17  B  2015
18  B  2016
19  B  2017

Data

df=pd.DataFrame({'g':list('AAAAAAAAAABBBBBBBBBB',
                 'Year':[2000,2001,2002,2003,2005,2007,2008,2009,2010,2011,2005,2006,2007,2009,2011,2013,2014,2015,2016,2017])]})

answered Oct 16 '22 22:10

BENY

Related questions
                            
                                Possible Combination of Parentheses in a Matrix Chain Application
                            
                                Converting a DateTime Index value to an Index Number
                            
                                Implementing ROC Curves for K-NN machine learning algorithm using python and Scikit Learn
                            
                                Pickling dict in Python
                            
                                Sorting pandas dataframe by weekdays
                            
                                numpy find the max value in a row and return back to it's column index
                            
                                How to debug Tensorflow segmentation fault in model.fit()?
                            
                                Difference between multiprocessing.cpu_count and os.cpu_count
                            
                                What does the 'm' in a Python ABI tag mean?
                            
                                What is the difference between MLP implementation from scratch and in PyTorch?
                            
                                How to redirect -progress option output of ffmpeg to stderr?
                            
                                How to add calculated column to Dataframe counting frequency in column in pandas
                            
                                What is a time complexity of move_to_end operation for OrderedDict in Python 3?
                            
                                Multivariate polynomial regression with Python
                            
                                How to join nearby bounding boxes in OpenCV Python
                            
                                How to plot a vertical line at the x-axis range median position using plotly in Python API?
                            
                                Count of values grouped per month, year - Pandas
                            
                                Python: Dynamically import module's code from string with importlib
                            
                                gyp ERR! stack Error: Can't find Python executable
                            
                                Multiple aggregated Counting in Pandas

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How to keep only the consecutive values in a Pandas dataframe using Python

Tags:

python

pandas

dataframe