How do I perform ordered selection on multiple Columns by Value

I have a dataframe with a month column and a year column. Both contain strings, e.g. 'September' and '2013'. How do I select all rows between September 2013 and May 2008 in one line?

df1 = stats_month_census_2[(stats_month_census_2['year'] <= '2013')
                 & (stats_month_census_2['year'] >= '2008')]

df2 = df1[...]

After the code above, I was going to filter a second time, but I am having a hard time coming up with clean code that simply drops the rows later than September 2013 (October to December 2013) and earlier than May 2008. I could hard-code this easily, but there must be a more Pythonic way of doing it.

Odisseo asked Dec 24 '22 02:12


1 Answer

If you only need the rows whose year falls between 2008 and 2013, you can use pandas.Series.between. Note that this filters on the year alone, so it does not yet apply the month bounds from "select all rows between September 2013 and May 2008":

Dataset borrowed from @jezrael.

DataFrame for demonstration purposes:

>>> stats_month_census_2
   year      month  data
0  2008      April     1
1  2008        May     3
2  2008       June     4
3  2013  September     6
4  2013    October     5
5  2014   November     6
6  2014   December     7

Using pandas.Series.between()

>>> # between() includes both endpoints by default
>>> stats_month_census_2[stats_month_census_2['year'].between(2008, 2013)]
   year      month  data
0  2008      April     1
1  2008        May     3
2  2008       June     4
3  2013  September     6
4  2013    October     5
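The year-only filter above still keeps April 2008 and October 2013. As a minimal sketch of one way to apply the month bounds without converting to datetime, the year and month can be combined into a single sortable integer. This is not part of the original answer: the names `month_num`, `key`, `lower` and `upper` are illustrative, the columns are assumed to hold strings as in the question, and the month names are assumed to be English so that `%B` can parse them.

```python
import pandas as pd

# Demonstration data with year and month stored as strings, as in the question.
df = pd.DataFrame({
    "year": ["2008", "2008", "2008", "2013", "2013", "2014", "2014"],
    "month": ["April", "May", "June", "September", "October", "November", "December"],
    "data": [1, 3, 4, 6, 5, 6, 7],
})

# Map month names to 1..12, then build one sortable integer per row.
month_num = pd.to_datetime(df["month"], format="%B").dt.month
key = df["year"].astype(int) * 12 + month_num

# May 2008 and September 2013 expressed with the same key (hypothetical names).
lower = 2008 * 12 + 5
upper = 2013 * 12 + 9

# between() is inclusive on both ends by default.
selected = df[key.between(lower, upper)]
print(selected)
```

Because the key grows with both year and month, a single between() call covers both columns at once.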

If the year column already holds datetime values (see the conversion note at the end of this answer), you can pass date strings as bounds:

>>> stats_month_census_2[stats_month_census_2['year'].between('2008-05', '2013-09')]
        year      month  data
1 2008-05-01        May     3
2 2008-06-01       June     4
3 2013-09-01  September     6

Using DataFrame.query:

>>> stats_month_census_2.query('"2008-05" <= year <= "2013-09"')
        year      month  data
1 2008-05-01        May     3
2 2008-06-01       June     4
3 2013-09-01  September     6
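When the bounds come from variables rather than literals, query can reference them with the `@` prefix. The following is a small sketch under the assumption that the year column is already datetime; the variable names `start` and `end` are illustrative only.

```python
import pandas as pd

# Demonstration frame with the year column already converted to datetime.
df = pd.DataFrame({
    "year": pd.to_datetime([
        "2008-04-01", "2008-05-01", "2008-06-01", "2013-09-01",
        "2013-10-01", "2014-11-01", "2014-12-01",
    ]),
    "month": ["April", "May", "June", "September", "October", "November", "December"],
    "data": [1, 3, 4, 6, 5, 6, 7],
})

# Hypothetical bounds held in local variables.
start = pd.Timestamp("2008-05-01")
end = pd.Timestamp("2013-09-01")

# @name refers to a Python variable in the calling scope.
result = df.query("@start <= year <= @end")
print(result)
```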

Using the isin method with pd.date_range to select the rows between two dates:

>>> stats_month_census_2[stats_month_census_2['year'].isin(pd.date_range('2008-05-01', '2013-09-01'))]
        year      month  data
1 2008-05-01        May     3
2 2008-06-01       June     4
3 2013-09-01  September     6

Partial date strings also work:

>>> stats_month_census_2[stats_month_census_2['year'].isin(pd.date_range('2008-05', '2013-09'))]
        year      month  data
1 2008-05-01        May     3
2 2008-06-01       June     4
3 2013-09-01  September     6

Using loc to slice by index label between the first rows that match the start and end dates (this assumes the rows are already in chronological order):

start = stats_month_census_2[stats_month_census_2['year'] == '2008-05'].index[0]
end = stats_month_census_2[stats_month_census_2['year'] == '2013-09'].index[0]

>>> stats_month_census_2.loc[start:end]
        year      month  data
1 2008-05-01        May     3
2 2008-06-01       June     4
3 2013-09-01  September     6
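Slicing by row position only works while the rows happen to be in date order. As a hedged alternative sketch, not taken from the original answer, the datetime column can be made the index and sorted, after which loc accepts partial date strings directly. The frame below is deliberately shuffled, and `by_date` and `window` are illustrative names.

```python
import pandas as pd

# Same demonstration rows, deliberately out of chronological order.
df = pd.DataFrame({
    "year": pd.to_datetime([
        "2013-10-01", "2008-05-01", "2014-12-01", "2008-04-01",
        "2013-09-01", "2008-06-01", "2014-11-01",
    ]),
    "month": ["October", "May", "December", "April", "September", "June", "November"],
    "data": [5, 3, 7, 1, 6, 4, 6],
})

# Index by date and sort so that label slicing is well defined.
by_date = df.set_index("year").sort_index()

# Partial-string slicing on a DatetimeIndex; both ends are inclusive,
# and "2013-09" covers the whole of September 2013.
window = by_date.loc["2008-05":"2013-09"]
print(window)
```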

Note: since @jezrael asked in a comment, here is how to convert the year column to datetime format:

In the example DataFrame below, year and month are two separate columns: year holds only the year, and month holds the month name as a string. So we first convert the month names to integers, then combine year and month, assigning 1 as the day for every row, using the pandas pd.to_datetime method.

df
   year      month  data
0  2008      April     1
1  2008        May     3
2  2008       June     4
3  2013  September     6
4  2013    October     5
5  2014   November     6
6  2014   December     7

Above is the raw DataFrame before the datetime conversion. I'm taking the approach below, which I learned over time via SO itself.

1- First, convert the month names to integers and assign them to a new column called Month, so we can use it for the conversion later.

df['Month'] = pd.to_datetime(df.month, format='%B').dt.month

2- Second, convert the year column to a proper datetime format by assigning the result directly back to the year column, which is effectively an in-place conversion.

df['year'] = pd.to_datetime(df[['year', 'Month']].assign(Day=1))

Now the year column of the resulting DataFrame is in datetime form:

print(df)
        year      month  data  Month
0 2008-04-01      April     1      4
1 2008-05-01        May     3      5
2 2008-06-01       June     4      6
3 2013-09-01  September     6      9
4 2013-10-01    October     5     10
5 2014-11-01   November     6     11
6 2014-12-01   December     7     12
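Putting the conversion and the selection together, a minimal end-to-end sketch might look like the following. It assumes string columns as described in the question and English month names; the explicit `astype(int)` and the names `parts` and `result` are additions for illustration, not part of the steps above.

```python
import pandas as pd

# Raw frame: year and month as strings, as described in the question.
df = pd.DataFrame({
    "year": ["2008", "2008", "2008", "2013", "2013", "2014", "2014"],
    "month": ["April", "May", "June", "September", "October", "November", "December"],
    "data": [1, 3, 4, 6, 5, 6, 7],
})

# Step 1: month names -> month numbers.
df["Month"] = pd.to_datetime(df["month"], format="%B").dt.month

# Step 2: assemble year/Month/Day columns into real dates (day fixed to 1)
# and write them back into the year column.
parts = df[["year", "Month"]].astype(int).assign(Day=1)
df["year"] = pd.to_datetime(parts)

# Select May 2008 through September 2013, inclusive on both ends.
result = df[df["year"].between(pd.Timestamp("2008-05-01"), pd.Timestamp("2013-09-01"))]
print(result)
```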
Karn Kumar answered Apr 05 '23 23:04