
Comparing Pandas Dataframe Rows & Dropping rows with overlapping dates

Tags:

python

pandas

I have a dataframe filled with trades taken from a trading strategy. The logic in the trading strategy needs to be updated to ensure that a trade isn't taken if the strategy is already in a trade - but that's a different problem. The trade data for many previous trades is read into a dataframe from a csv file.

Here's my problem for the data I have: I need to do a row-by-row comparison of the dataframe to determine whether the EntryDate of row X is earlier than the ExitDate of row X-1.

A sample of my data:

Row 1:
EntryDate  ExitDate
2012-07-25 2012-07-27 

Row 2:
EntryDate  ExitDate
2012-07-26 2012-07-29

Row 2 needs to be deleted because it is a trade that should not have occurred.

I'm having trouble identifying which rows are duplicates and then dropping them. I tried the approach in answer 3 of this question with some luck but it isn't ideal because I have to manually iterate through the dataframe and read each row's data. My current approach is below and is ugly as can be. I check the dates, and then add them to a new dataframe. Additionally, this approach gives me multiple duplicates in the final dataframe.

for i in range(0,len(df)+1):
    if i+1 == len(df): break #to keep from going past last row
    ExitDate = df['ExitDate'].irow(i)
    EntryNextTrade = df['EntryDate'].irow(i+1)

    if EntryNextTrade>ExitDate: 
        line={'EntryDate':EntryDate,'ExitDate':ExitDate}
        df_trades=df_trades.append(line,ignore_index=True)

Any thoughts or ideas on how to more efficiently accomplish this?

You can click here to see a sampling of my data if you want to try to reproduce my actual dataframe.

Eric D. Brown asked Oct 16 '13 16:10



1 Answer

You should use some kind of boolean mask to do this kind of operation.

One way is to create a dummy column for the next trade:

df['EntryNextTrade'] = df['EntryDate'].shift(-1)

Use this to create the mask:

msk = df['EntryNextTrade'] > df['ExitDate']

And use loc to look at the subDataFrame where msk is True, and only the specified columns:

df.loc[msk, ['EntryDate', 'ExitDate']]
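
A shifted-column mask only ever compares adjacent rows, so it can misjudge a trade that follows one you have already decided to drop. Here is a rough sketch of two variants, not a definitive implementation: a vectorised mask that compares each EntryDate with the previous row's ExitDate, and a small loop that compares each trade with the last trade actually kept. The helper name drop_overlapping and every sample row after the first two are made up for illustration, and I am assuming that a trade entered on the same day the previous one exits does not count as overlapping.

```python
import pandas as pd

# Hypothetical sample: the first two rows are from the question,
# the last two are invented to show a chained overlap.
df = pd.DataFrame({
    "EntryDate": pd.to_datetime(
        ["2012-07-25", "2012-07-26", "2012-07-28", "2012-08-01"]),
    "ExitDate": pd.to_datetime(
        ["2012-07-27", "2012-07-29", "2012-07-30", "2012-08-03"]),
})

# Variant 1: vectorised, adjacent rows only.
# shift() moves values down one row, so prev_exit holds the previous
# row's ExitDate (NaT for the first row, which is always kept).
prev_exit = df["ExitDate"].shift()
keep = prev_exit.isna() | (df["EntryDate"] >= prev_exit)
pairwise = df.loc[keep, ["EntryDate", "ExitDate"]]


# Variant 2: compare each trade with the last trade that was kept.
def drop_overlapping(trades):
    """Return the trades that start on or after the exit of the last kept trade."""
    trades = trades.sort_values("EntryDate")
    kept_labels = []
    last_exit = None
    for label, entry, exit_ in zip(
            trades.index, trades["EntryDate"], trades["ExitDate"]):
        if last_exit is None or entry >= last_exit:
            kept_labels.append(label)
            last_exit = exit_  # only kept trades move the boundary forward
    return trades.loc[kept_labels]


sequential = drop_overlapping(df)

print(pairwise)
print(sequential)
```

On this sample the two variants disagree about the third row: it overlaps the second row (which is dropped) but not the first, so only the loop keeps it. If your data never has such chains, the vectorised mask is enough and will be much faster on a large frame.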
Andy Hayden answered Nov 05 '22 09:11