 

Pandas - Python, deleting rows based on Date column

I'm trying to delete rows of a DataFrame based on a single date column, [Delivery Date].

I need to delete rows that are older than 6 months, except rows whose year is 1970.

I've created 2 variables:

from datetime import date, timedelta
sixmonthago = date.today() - timedelta(188)

import time
nineteen_seventy = time.strptime('01-01-70', '%d-%m-%y')

but I don't know how to delete rows based on these two variables, using the [Delivery Date] column.

Could anyone provide the correct solution?

asked Feb 20 '15 12:02 by Colin O'Brien




2 Answers

You can just filter them out:

df[(df['Delivery Date'].dt.year == 1970) | (df['Delivery Date'] >= pd.Timestamp(sixmonthago))]

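This keeps all rows where the year is 1970 or the date is no more than 6 months old. It returns a new DataFrame, so assign it back (df = df[...]) to actually remove the other rows. Wrapping sixmonthago in pd.Timestamp avoids comparing a datetime64 column with a plain datetime.date, which recent pandas versions reject.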
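A minimal, self-contained sketch of this filter, assuming the column may arrive as strings (for example after read_csv), so it is converted with pd.to_datetime first; the .dt accessor only works on datetime columns. The sample rows and the fixed "today" of 2015-02-20 are made up for illustration so the result is reproducible.

```python
from datetime import date, timedelta

import pandas as pd

# Hypothetical sample data; dates stored as strings, as they often are after read_csv
df = pd.DataFrame({
    "Delivery Date": ["1970-01-01", "2014-01-15", "2014-12-01", "2015-02-01"],
    "Order": ["a", "b", "c", "d"],
})

# The .dt accessor only works on datetime64 columns, so convert first
df["Delivery Date"] = pd.to_datetime(df["Delivery Date"])

# Fixed "today" for reproducibility; use date.today() in real code
sixmonthago = date(2015, 2, 20) - timedelta(188)
cutoff = pd.Timestamp(sixmonthago)  # 2014-08-16

# Keep rows from 1970 or no older than the cutoff; everything else is dropped
keep = (df["Delivery Date"].dt.year == 1970) | (df["Delivery Date"] >= cutoff)
df = df[keep]

print(df["Order"].tolist())  # ['a', 'c', 'd']
```

Row "b" is the only one removed: it is neither from 1970 nor newer than the cutoff.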

You can use boolean indexing and pass multiple conditions to filter the df. To combine conditions you need the element-wise operators, so | instead of or (and & instead of and), with parentheses around each condition because of operator precedence.
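A small sketch of why `or` and unparenthesized conditions fail, using a toy integer Series rather than dates (an assumption made only to keep the example short); the helper raises_value_error is hypothetical, written just for this demonstration.

```python
import pandas as pd

s = pd.Series([1, 2, 3])

def raises_value_error(fn):
    """Hypothetical helper: return True if calling fn() raises ValueError."""
    try:
        fn()
    except ValueError:
        return True
    return False

# `or` asks for a single truth value for the whole Series, which pandas refuses
print(raises_value_error(lambda: (s == 1) or (s == 3)))  # True

# Without parentheses, | binds tighter than ==, so this parses as the
# chained comparison s == (1 | s) == 3, which again needs a single bool
print(raises_value_error(lambda: s == 1 | s == 3))  # True

# Element-wise | with parenthesized conditions gives the intended mask
print(((s == 1) | (s == 3)).tolist())  # [True, False, True]
```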

See the pandas docs for a fuller explanation of boolean indexing.

answered Oct 04 '22 07:10 by EdChum



Make sure the calculation itself is accurate for "6 months prior". You probably don't want to hardcode 188 days, since months vary in length.

from datetime import date
from dateutil.relativedelta import relativedelta

# http://stackoverflow.com/questions/546321/how-do-i-calculate-the-date-six-months-from-the-current-date-using-the-datetime
six_months = date.today() - relativedelta(months=6)  # the date six calendar months ago
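To make the difference concrete, here is a sketch comparing the hardcoded 188 days with relativedelta, using fixed example dates chosen only for illustration. relativedelta comes from the third-party python-dateutil package, which pandas itself depends on.

```python
from datetime import date, timedelta

from dateutil.relativedelta import relativedelta

# Fixed reference dates so the comparison is reproducible
feb_20 = date(2015, 2, 20)
aug_31 = date(2015, 8, 31)

# A fixed day count drifts away from the calendar
print(feb_20 - timedelta(188))           # 2014-08-16
# relativedelta moves back six calendar months
print(feb_20 - relativedelta(months=6))  # 2014-08-20
# ...and clamps to the last valid day when the target month is shorter
print(aug_31 - relativedelta(months=6))  # 2015-02-28
```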

Then you can apply the following logic.

import time
import pandas as pd

nineteen_seventy = time.strptime('01-01-70', '%d-%m-%y')

# Keep rows from 1970, plus rows no older than six months
df = df[(df['Delivery Date'].dt.year == nineteen_seventy.tm_year) | (df['Delivery Date'] >= pd.Timestamp(six_months))]

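If you truly want to drop rows instead of filtering, select the rows to delete (not from 1970 and older than six months, so & rather than |) and pass their index to drop:

df = df.drop(df[(df['Delivery Date'].dt.year != nineteen_seventy.tm_year) & (df['Delivery Date'] < pd.Timestamp(six_months))].index)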
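The delete condition is the negation of the keep condition: "1970 or recent" becomes "not 1970 and old". A sketch with made-up rows and a fixed reference date, checking that dropping by index gives the same result as keeping the complement of the mask:

```python
from datetime import date

import pandas as pd
from dateutil.relativedelta import relativedelta

# Hypothetical sample data with a non-default index
df = pd.DataFrame(
    {"Delivery Date": pd.to_datetime(["1970-01-01", "2014-01-15", "2014-12-01"])},
    index=[10, 11, 12],
)
six_months = pd.Timestamp(date(2015, 2, 20) - relativedelta(months=6))

# Rows to delete: not from 1970 AND older than the cutoff
to_delete = (df["Delivery Date"].dt.year != 1970) & (df["Delivery Date"] < six_months)

dropped = df.drop(df[to_delete].index)  # delete by index label
filtered = df[~to_delete]               # keep the complement

print(dropped.index.tolist())    # [10, 12]
print(dropped.equals(filtered))  # True
```

Dropping by label assumes the index is unique; with duplicate labels, drop removes every row sharing a label, so the plain boolean mask is the safer default.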
answered Oct 04 '22 07:10 by unique_beast
