How to get last 5 months of data of a Pandas DataFrame?

Q: How can you look at the last 5 records in a DataFrame?

Method 1: Using tail() method DataFrame. tail(n) to get the last n rows of the DataFrame. It takes one optional argument n (number of rows you want to get from the end). By default n = 5, it return the last 5 rows if the value of n is not passed to the method.

Q: How do I select a specific date range in pandas?

In order to select rows between two dates in pandas DataFrame, first, create a boolean mask using mask = (df['InsertedDates'] > start_date) & (df['InsertedDates'] <= end_date) to represent the start and end of the date range. Then you select the DataFrame that lies within the range using the DataFrame. loc[] method.

Q: How do you drop the last 5 rows in pandas?

Drop Last Row of Pandas DataFrame Using iloc[] iloc[] you can drop the rows from DataFrame and use -1 to drop the last row. For example use df. iloc[:-1,:] to select all rows except the last one and then assign it back to the original variable which ideally drops the last row from DataFrame.

Tags:

python

pandas

dataframe

I have a Pandas dataframe that was generated by fetching data from a KDB database, using QPython.

First, the Date column is returned as a strange dtype: dtype('<M8[ns]')

df = conn.sync("select Date, Open, High, Low, Close from stocktable", pandas=True)
df["Date"].dtype
# dtype('<M8[ns]')

However, when I inspect the contents of the column, the bottom row displays the dtype as datetime.

0      2017-04-17
1      2017-04-13
2      2017-04-12
3      2017-04-11
4      2017-04-10
5      2017-04-07
6      2017-04-06
7      2017-04-05
8      2017-04-04
9      2017-04-03
10     2017-03-31
11     2017-03-30
          ...    

3180   2004-08-27
3181   2004-08-26
3182   2004-08-25
3183   2004-08-24
3184   2004-08-23
3185   2004-08-20
3186   2004-08-19
Name: Date, dtype: datetime64[ns]

Also, method last() is not working correctly. I ask for the last 5 months worth of data, but all of the data is returned.

# Expected to only return last 5 months of data, but returns it all.
df.set_index("Date").last("5M")

How do I get the last rows of this DataFrame?

389

asked Apr 28 '17 11:04

Leonel

3 Answers

Solved it. The issue was that the data returned by KDB was sorted in DESC order, which confused method last().

The solution is to either add a sort clause to the query (in the Q language, it's with a backtick followed by the keyword xasc)

df = conn.sync("`Date xasc select Date, Open, High, Low, Close from stocktable", pandas=True) \
     .last("5M")

Or else, to sort the data in the Pandas dataframe itself.

df_sorted = stocktable.dataframe() \
    .sort_values(by="Date",ascending=True) \
    .set_index("Date")
    .last("5M")

answered Sep 28 '22 08:09

Leonel

It works properly for me.

Demo:

In [71]: from pandas_datareader import data as web

In [72]: df = web.DataReader('AAPL', 'yahoo', '2010-04-01')

In [73]: df
Out[73]:
                  Open        High         Low       Close     Volume   Adj Close
Date
2010-04-01  237.410000  238.730003  232.750000  235.969994  150786300   30.572166
2010-04-05  234.980011  238.509998  234.769993  238.489998  171126900   30.898657
2010-04-06  238.200005  240.239998  237.000004  239.540009  111754300   31.034696
2010-04-07  239.549995  241.920010  238.659988  240.600006  157125500   31.172029
2010-04-08  240.440002  241.540001  238.040001  239.950005  143247300   31.087815
2010-04-09  241.430012  241.889996  240.460003  241.789993   83545700   31.326203
2010-04-12  242.199989  243.069996  241.809994  242.290005   83256600   31.390984
2010-04-13  241.860008  242.800003  241.110004  242.430008   76552700   31.409123
2010-04-14  245.280006  245.810005  244.069992  245.690002  101019100   31.831486
2010-04-15  245.779991  249.029999  245.509998  248.920010   94196200   32.249965
...                ...         ...         ...         ...        ...         ...
2017-04-13  141.910004  142.380005  141.050003  141.050003   17652900  141.050003
2017-04-17  141.479996  141.880005  140.869995  141.830002   16424000  141.830002
2017-04-18  141.410004  142.039993  141.110001  141.199997   14660800  141.199997
2017-04-19  141.880005  142.000000  140.449997  140.679993   17271300  140.679993
2017-04-20  141.220001  142.919998  141.160004  142.440002   23251100  142.440002
2017-04-21  142.440002  142.679993  141.850006  142.270004   17245200  142.270004
2017-04-24  143.500000  143.949997  143.179993  143.639999   17099200  143.639999
2017-04-25  143.910004  144.899994  143.869995  144.529999   18290300  144.529999
2017-04-26  144.470001  144.600006  143.380005  143.679993   19769400  143.679993
2017-04-27  143.919998  144.160004  143.309998  143.789993   14106100  143.789993

[1781 rows x 6 columns]

In [74]: df.last('5M')
Out[74]:
                  Open        High         Low       Close    Volume   Adj Close
Date
2016-12-01  110.370003  110.940002  109.029999  109.489998  37086900  109.017344
2016-12-02  109.169998  110.089996  108.849998  109.900002  26528000  109.425578
2016-12-05  110.000000  110.029999  108.250000  109.110001  34324500  108.638987
2016-12-06  109.500000  110.360001  109.190002  109.949997  26195500  109.475358
2016-12-07  109.260002  111.190002  109.160004  111.029999  29998700  110.550697
2016-12-08  110.860001  112.430000  110.599998  112.120003  27068300  111.635996
2016-12-09  112.309998  114.699997  112.309998  113.949997  34402600  113.458090
2016-12-12  113.290001  115.000000  112.489998  113.300003  26374400  112.810902
2016-12-13  113.839996  115.919998  113.750000  115.190002  43733800  114.692743
2016-12-14  115.040001  116.199997  114.980003  115.190002  34031800  114.692743
...                ...         ...         ...         ...       ...         ...
2017-04-13  141.910004  142.380005  141.050003  141.050003  17652900  141.050003
2017-04-17  141.479996  141.880005  140.869995  141.830002  16424000  141.830002
2017-04-18  141.410004  142.039993  141.110001  141.199997  14660800  141.199997
2017-04-19  141.880005  142.000000  140.449997  140.679993  17271300  140.679993
2017-04-20  141.220001  142.919998  141.160004  142.440002  23251100  142.440002
2017-04-21  142.440002  142.679993  141.850006  142.270004  17245200  142.270004
2017-04-24  143.500000  143.949997  143.179993  143.639999  17099200  143.639999
2017-04-25  143.910004  144.899994  143.869995  144.529999  18290300  144.529999
2017-04-26  144.470001  144.600006  143.380005  143.679993  19769400  143.679993
2017-04-27  143.919998  144.160004  143.309998  143.789993  14106100  143.789993

[101 rows x 6 columns]

answered Sep 28 '22 09:09

MaxU - stop WAR against UA

For me it works nice:

rng = pd.date_range('2017-04-03', periods=10, freq='20D')
df = pd.DataFrame({'Date': rng, 'a': range(10)})  
print (df)
        Date  a
0 2017-04-03  0
1 2017-04-23  1
2 2017-05-13  2
3 2017-06-02  3
4 2017-06-22  4
5 2017-07-12  5
6 2017-08-01  6
7 2017-08-21  7
8 2017-09-10  8
9 2017-09-30  9

df = df.set_index('Date').last('5M')
print (df)
            a
Date         
2017-05-13  2
2017-06-02  3
2017-06-22  4
2017-07-12  5
2017-08-01  6
2017-08-21  7
2017-09-10  8
2017-09-30  9

It works nice with duplicates also, only is necessary sort DateTime column:

rng = pd.date_range('2017-04-03', periods=10, freq='20D')
df = pd.DataFrame({'Date': rng, 'a': range(10)})  
df = pd.concat([df,df], ignore_index=True).sort_values('Date')
print (df)
         Date  a
0  2017-04-03  0
10 2017-04-03  0
1  2017-04-23  1
11 2017-04-23  1
2  2017-05-13  2
12 2017-05-13  2
3  2017-06-02  3
13 2017-06-02  3
4  2017-06-22  4
14 2017-06-22  4
5  2017-07-12  5
15 2017-07-12  5
6  2017-08-01  6
16 2017-08-01  6
17 2017-08-21  7
7  2017-08-21  7
18 2017-09-10  8
8  2017-09-10  8
9  2017-09-30  9
19 2017-09-30  9

df = df.set_index('Date').last('5M')
print (df)
            a
Date         
2017-05-13  2
2017-05-13  2
2017-06-02  3
2017-06-02  3
2017-06-22  4
2017-06-22  4
2017-07-12  5
2017-07-12  5
2017-08-01  6
2017-08-01  6
2017-08-21  7
2017-08-21  7
2017-09-10  8
2017-09-10  8
2017-09-30  9
2017-09-30  9

rng = pd.date_range('2017-04-03', periods=10, freq='20D')
df = pd.DataFrame({'Date': rng, 'a': range(10)})  
df = pd.concat([df,df], ignore_index=True)
print (df)
         Date  a
0  2017-04-03  0
1  2017-04-23  1
2  2017-05-13  2
3  2017-06-02  3
4  2017-06-22  4
5  2017-07-12  5
6  2017-08-01  6
7  2017-08-21  7
8  2017-09-10  8
9  2017-09-30  9
10 2017-04-03  0
11 2017-04-23  1
12 2017-05-13  2
13 2017-06-02  3
14 2017-06-22  4
15 2017-07-12  5
16 2017-08-01  6
17 2017-08-21  7
18 2017-09-10  8
19 2017-09-30  9

df = df.set_index('Date').last('5M')
print (df)
            a
Date         
2017-05-13  2
2017-06-02  3
2017-06-22  4
2017-07-12  5
2017-08-01  6
2017-08-21  7
2017-09-10  8
2017-09-30  9

answered Sep 28 '22 09:09

jezrael

Related questions
                            
                                Display Django form fields on the "same line"
                            
                                Python-PPTX: Changing table style or adding borders to cells
                            
                                Pandas group hourly data into daily sums with date index
                            
                                Python Pillow - ValueError: Decompressed Data Too Large
                            
                                Access superclass' property setter in subclass
                            
                                Grandchild inheriting from Parent class - Python
                            
                                How to sort edges in networkx based on their weight
                            
                                Django JWT auth: How to get user data?
                            
                                How to delete numpy nan from a list of strings in Python?
                            
                                Scrapy: How to output items in a specific json format
                            
                                spaCy needs a file that is not there: strings.json
                            
                                Index of multiple minimum elements in a list [duplicate]
                            
                                Why do we still need parser like BeautifulSoup if we can use Selenium?
                            
                                flask-admin: How to make columns read-only according to other columns' value?
                            
                                keras usage of the Activation layer instead of activation parameter
                            
                                Can defaultdict accept callables dependent on the given missing key?
                            
                                Python Jupyter Notebook - Unable to open CSV file through a path
                            
                                Python string formation using list comprehension
                            
                                Tensorflow maxpool with dynamic ksize
                            
                                Python Pretty table with color output

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How to get last 5 months of data of a Pandas DataFrame?

Tags:

python

pandas

dataframe

Leonel

People also ask

3 Answers

Leonel

MaxU - stop WAR against UA

jezrael

Recent Activity

Donate For Us