I have a Python pandas dataframe with daily data that looks like:
Date Open High Low Close Adj Close Volume
0 2009-12-31 31.709999 31.840000 31.299999 31.309999 23.547892 1957700
1 2010-01-04 31.480000 31.840000 31.330000 31.469999 23.668222 3472500
2 2010-01-05 31.549999 31.770000 31.400000 31.639999 23.796082 3458700
3 2010-01-06 31.600000 31.889999 31.430000 31.559999 23.735907 3745800
4 2010-01-07 31.549999 31.700001 31.049999 31.230000 23.487726 7232100
How can I save the first and last entry for every year? If the last entry for a year is let's say October 31st, I would like to keep this one as the last entry for the year.
Sort by Date if the rows are not already in sorted order i.e., something like:
df = df.sort_values(by='Date')
Group by year and only retain the first and last elements of each group:
df.groupby(pd.DatetimeIndex(df.Date).to_period('Y')).nth([0,-1])
Output using example data frame from question:
Date Open High Low Close Adj Close Volume
Date
2009 2009-12-31 31.709999 31.840000 31.299999 31.309999 23.547892 1957700
2010 2010-01-04 31.480000 31.840000 31.330000 31.469999 23.668222 3472500
2010 2010-01-07 31.549999 31.700001 31.049999 31.230000 23.487726 7232100
Note: If there is only one entry per year like there is in the example, (2009), that row will only be in the output once, not twice, but if used on real data that would not be an issue anyway.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With