I have searched here and on Google and found many examples that remove old dates relative to a fixed date, but I cannot figure out how to remove rows relative to today's date, which changes every day. In the example below, how would I remove everything older than today's date (only one row should be removed) and then save the result? The real source file gets new data every day, and I will need to keep removing everything older than "today".
from datetime import datetime
import pandas as pd
data = {'date': ['2001-04-10 18:47:05.069722', '2018-05-16 18:47:05.119994', '2018-05-16 18:47:05.178768', '2018-05-16 18:47:05.230071', '2018-05-16 18:47:05.230071', '2018-05-16 18:47:05.280592', '2018-05-16 18:47:05.332662', '2018-05-16 18:47:05.385109', '2018-05-16 18:47:05.436523', '2018-05-16 18:47:05.486877'],
'battle_deaths': [34, 25, 26, 15, 15, 14, 26, 25, 62, 41]}
df = pd.DataFrame(data, columns = ['date', 'battle_deaths'])
df
date battle_deaths
0 2001-04-10 18:47:05.069722 34
1 2018-05-16 18:47:05.119994 25
2 2018-05-16 18:47:05.178768 26
3 2018-05-16 18:47:05.230071 15
4 2018-05-16 18:47:05.230071 15
5 2018-05-16 18:47:05.280592 14
6 2018-05-16 18:47:05.332662 26
7 2018-05-16 18:47:05.385109 25
8 2018-05-16 18:47:05.436523 62
9 2018-05-16 18:47:05.486877 41
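Convert the date column with pd.to_datetime and compare it against the start of today. Note that pd.to_datetime('today') returns the current date and time, so call .normalize() to truncate it to midnight; otherwise rows logged earlier today would also be dropped. Filter with the resulting boolean mask:
df[pd.to_datetime(df.date, errors='coerce') >= pd.to_datetime('today').normalize()]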
date battle_deaths
1 2018-05-16 18:47:05.119994 25
2 2018-05-16 18:47:05.178768 26
3 2018-05-16 18:47:05.230071 15
4 2018-05-16 18:47:05.230071 15
5 2018-05-16 18:47:05.280592 14
6 2018-05-16 18:47:05.332662 26
7 2018-05-16 18:47:05.385109 25
8 2018-05-16 18:47:05.436523 62
9 2018-05-16 18:47:05.486877 41
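This drops row 0, the 2001 entry. Filtering returns a new DataFrame, so assign it back to df to keep the result and write it out (for example with DataFrame.to_csv) to save it.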
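Below is a minimal sketch of the daily clean-up the question asks for, assuming the data lives in a CSV file with a date column. The helper names drop_rows_before_today and prune_csv and the file name battles.csv are hypothetical. The optional today argument exists only so the cutoff can be pinned for reproducible runs; by default it uses the current local date.

```python
import os
import tempfile

import pandas as pd


def drop_rows_before_today(df, column="date", today=None):
    """Return the rows of df dated on or after the start of `today`.

    `today` defaults to the current local date; passing a fixed value
    makes the result reproducible.
    """
    # Truncate to midnight so rows logged earlier on the same day are kept.
    cutoff = pd.Timestamp(today if today is not None else "today").normalize()
    # Unparseable values become NaT, and NaT >= cutoff is False, so they are dropped.
    dates = pd.to_datetime(df[column], errors="coerce")
    return df[dates >= cutoff]


def prune_csv(path, column="date", today=None):
    """Hypothetical helper: read a CSV, drop old rows, overwrite the file."""
    df = pd.read_csv(path)
    kept = drop_rows_before_today(df, column=column, today=today)
    kept.to_csv(path, index=False)  # save back without the pandas index
    return kept


# A three-row subset of the question's data.
data = {
    "date": [
        "2001-04-10 18:47:05.069722",
        "2018-05-16 18:47:05.119994",
        "2018-05-16 18:47:05.486877",
    ],
    "battle_deaths": [34, 25, 41],
}

# Pin "today" to a time later than the rows; they survive thanks to normalize().
in_memory = drop_rows_before_today(pd.DataFrame(data), today="2018-05-16 20:00")

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "battles.csv")  # hypothetical file name
    pd.DataFrame(data).to_csv(csv_path, index=False)
    kept = prune_csv(csv_path, today="2018-05-16")
    reloaded = pd.read_csv(csv_path)  # what the next day's run would read

print(reloaded)
```

Calling prune_csv(path) without a today argument uses the current local date, so running it once a day (for example from a scheduler) keeps only rows from today onward.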