I have a DataFrame with a month column and a year column. Both contain strings, e.g. 'September' and '2013'. How do I select all rows between September 2013 and May 2008 in one line?
df1 = stats_month_census_2[(stats_month_census_2['year'] <= '2013')
& (stats_month_census_2['year'] >= '2008')]
df2 = df1[...]
After the code above, I was going to do the same thing again, but I am having a hard time coming up with clever code to simply get rid of the rows later than September 2013 (October to December) and earlier than May 2008. I could hard-code this easily, but there must be a more Pythonic way of doing this...
Or, if you are looking for rows that fall between 2008 and 2013, as you asked in the post ("select all rows between September 2013 and May 2008"), you can use pandas.Series.between:
Dataset borrowed from @jezrael.
DataFrame for demonstration purposes:
>>> stats_month_census_2
year month data
0 2008 April 1
1 2008 May 3
2 2008 June 4
3 2013 September 6
4 2013 October 5
5 2014 November 6
6 2014 December 7
Using pandas.Series.between()
>>> stats_month_census_2[stats_month_census_2['year'].between(2008, 2013, inclusive="both")]
year month data
0 2008 April 1
1 2008 May 3
2 2008 June 4
3 2013 September 6
4 2013 October 5
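Filtering on year alone keeps October 2013 (row 4 in the output above), which the question wants to drop. A minimal sketch that respects the month boundaries without building datetimes, assuming the same year and month columns (year may be int, as in the demo, or string, as in the question): turn each row into a sortable integer YYYYMM key and filter that with between.

```python
import pandas as pd

# Demo frame matching the one above; "year" may be int or str
df = pd.DataFrame({
    "year": [2008, 2008, 2008, 2013, 2013, 2014, 2014],
    "month": ["April", "May", "June", "September", "October", "November", "December"],
    "data": [1, 3, 4, 6, 5, 6, 7],
})

# Month name -> month number (1-12), e.g. "May" -> 5
month_num = pd.to_datetime(df["month"], format="%B").dt.month

# Sortable integer key YYYYMM, e.g. May 2008 -> 200805
key = df["year"].astype(int) * 100 + month_num

# Keep rows from May 2008 through September 2013, both ends included
result = df[key.between(200805, 201309)]
print(result)
```

The astype(int) call is what lets the same line work when year is stored as strings; comparing a string column directly with integers would raise a TypeError.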
If the year column is in datetime format (see the note at the end on how to convert it), you can pass date strings instead:
>>> stats_month_census_2[stats_month_census_2['year'].between('2008-05', '2013-09', inclusive="both")]
year month data
1 2008-05-01 May 3
2 2008-06-01 June 4
3 2013-09-01 September 6
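A self-contained sketch of the same filter with explicit pd.Timestamp bounds, assuming the year column already holds datetimes (first day of each month). The variable names start and end are illustrative only.

```python
import pandas as pd

# Demo frame whose "year" column already holds datetimes
df = pd.DataFrame({
    "year": pd.to_datetime(["2008-04-01", "2008-05-01", "2008-06-01", "2013-09-01",
                            "2013-10-01", "2014-11-01", "2014-12-01"]),
    "month": ["April", "May", "June", "September", "October", "November", "December"],
    "data": [1, 3, 4, 6, 5, 6, 7],
})

start = pd.Timestamp("2008-05-01")
end = pd.Timestamp("2013-09-01")

# inclusive="both" keeps rows equal to either bound (string spelling used since pandas 1.3)
result = df[df["year"].between(start, end, inclusive="both")]
print(result)
```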
Using DataFrame.query:
>>> stats_month_census_2.query('"2008-05" <= year <= "2013-09"')
year month data
1 2008-05-01 May 3
2 2008-06-01 June 4
3 2013-09-01 September 6
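If the bounds live in Python variables rather than being hard-coded in the query string, query can refer to them with the @ prefix. A minimal sketch, assuming a datetime year column; start and end are illustrative names.

```python
import pandas as pd

# Demo frame whose "year" column already holds datetimes
df = pd.DataFrame({
    "year": pd.to_datetime(["2008-04-01", "2008-05-01", "2008-06-01", "2013-09-01",
                            "2013-10-01", "2014-11-01", "2014-12-01"]),
    "month": ["April", "May", "June", "September", "October", "November", "December"],
    "data": [1, 3, 4, 6, 5, 6, 7],
})

start = pd.Timestamp("2008-05-01")
end = pd.Timestamp("2013-09-01")

# "@name" makes query look up a variable from the surrounding Python scope
result = df.query("@start <= year <= @end")
print(result)
```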
Using the isin method to select the rows between two dates:
>>> stats_month_census_2[stats_month_census_2['year'].isin(pd.date_range('2008-05-01', '2013-09-01'))]
year month data
1 2008-05-01 May 3
2 2008-06-01 June 4
3 2013-09-01 September 6
Or you can even pass shorter date strings, like below:
>>> stats_month_census_2[stats_month_census_2['year'].isin(pd.date_range('2008-05', '2013-09'))]
year month data
1 2008-05-01 May 3
2 2008-06-01 June 4
3 2013-09-01 September 6
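pd.date_range defaults to a daily frequency, so the calls above build about 1,950 daily timestamps just to match a handful of month starts. Because every date in this example falls on the first of the month, a month-start frequency (freq="MS") should give the same result with far fewer values. This sketch assumes all dates are on day 1; any other day would silently fail to match.

```python
import pandas as pd

# Demo frame whose "year" column already holds datetimes (all on day 1)
df = pd.DataFrame({
    "year": pd.to_datetime(["2008-04-01", "2008-05-01", "2008-06-01", "2013-09-01",
                            "2013-10-01", "2014-11-01", "2014-12-01"]),
    "month": ["April", "May", "June", "September", "October", "November", "December"],
    "data": [1, 3, 4, 6, 5, 6, 7],
})

# One timestamp per month start: 2008-05-01, 2008-06-01, ..., 2013-09-01
months = pd.date_range("2008-05", "2013-09", freq="MS")

# isin matches exact timestamps only, hence the day-1 assumption
result = df[df["year"].isin(months)]
print(result)
```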
Using .loc to slice between the index labels of the start and end dates:
Start = stats_month_census_2[stats_month_census_2['year'] =='2008-05'].index[0]
End = stats_month_census_2[stats_month_census_2['year']=='2013-09'].index[0]
>>> stats_month_census_2.loc[Start:End]
year month data
1 2008-05-01 May 3
2 2008-06-01 June 4
3 2013-09-01 September 6
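The Start/End lookup above assumes both boundary dates actually exist in the data (otherwise .index[0] raises an IndexError) and that the rows are in chronological order. As an alternative sketch, assuming a datetime year column, you can move the dates into a sorted DatetimeIndex and let .loc slice by date strings directly:

```python
import pandas as pd

# Demo frame whose "year" column already holds datetimes
df = pd.DataFrame({
    "year": pd.to_datetime(["2008-04-01", "2008-05-01", "2008-06-01", "2013-09-01",
                            "2013-10-01", "2014-11-01", "2014-12-01"]),
    "month": ["April", "May", "June", "September", "October", "November", "December"],
    "data": [1, 3, 4, 6, 5, 6, 7],
})

# Move the dates into a sorted DatetimeIndex so .loc can slice by date strings
indexed = df.set_index("year").sort_index()

# Partial-string slicing: "2013-09" as the end bound covers all of September 2013
result = indexed.loc["2008-05":"2013-09"]
print(result)
```

Neither bound has to be present in the data for this slice to work.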
Note: Out of curiosity, as @jezrael asked in a comment, I'm adding how to convert the year column into datetime format.
The example DataFrame below has two separate columns, year and month: year holds only the year, and month holds the month name as a literal string. So we first convert the month name into an integer, then combine year and month, with the day set to 1 for every row, using pandas' pd.to_datetime method.
df
year month data
0 2008 April 1
1 2008 May 3
2 2008 June 4
3 2013 September 6
4 2013 October 5
5 2014 November 6
6 2014 December 7
Above is the raw DataFrame before the datetime conversion. I'm taking the approach below, which I picked up over time on SO itself.
1- First, convert the month names into integers and assign them to a new column called Month, so we can use it for the conversion later.
df['Month'] = pd.to_datetime(df.month, format='%B').dt.month
2- Finally, convert the year column directly into a proper datetime by assigning the result back to the year column itself, so the conversion is effectively done in place.
df['year'] = pd.to_datetime(df[['year', 'Month']].assign(Day=1))
Now the year column of the desired DataFrame is in datetime form:
print(df)
year month data Month
0 2008-04-01 April 1 4
1 2008-05-01 May 3 5
2 2008-06-01 June 4 6
3 2013-09-01 September 6 9
4 2013-10-01 October 5 10
5 2014-11-01 November 6 11
6 2014-12-01 December 7 12
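Putting the two steps together with one of the filters, here is an end-to-end sketch that starts from string columns as described in the question. The astype(int) step is an added safeguard (an assumption, not part of the steps above) so that pd.to_datetime receives numeric year values.

```python
import pandas as pd

# Raw frame as described in the question: both columns hold strings
df = pd.DataFrame({
    "year": ["2008", "2008", "2008", "2013", "2013", "2014", "2014"],
    "month": ["April", "May", "June", "September", "October", "November", "December"],
    "data": [1, 3, 4, 6, 5, 6, 7],
})

# Step 1: month name -> month number in a helper column
df["Month"] = pd.to_datetime(df["month"], format="%B").dt.month

# Step 2: assemble year/Month/Day into a datetime and overwrite "year"
parts = df[["year", "Month"]].astype(int).assign(Day=1)
df["year"] = pd.to_datetime(parts)

# Any of the filters above now works, e.g. Series.between
result = df[df["year"].between(pd.Timestamp("2008-05-01"), pd.Timestamp("2013-09-01"))]
print(result)
```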