Comparing Pandas Dataframe Rows & Dropping rows with overlapping dates

Tags:

I have a dataframe filled with trades taken from a trading strategy. The logic in the trading strategy needs to be updated to ensure that trade isn't taken if the strategy is already in a trade - but that's a different problem. The trade data for many previous trades is read into a dataframe from a csv file.

Here's my problem for the data I have: I need to do a row-by-row comparison of the dataframe to determine if Entrydate of rowX is less than ExitDate rowX-1.

A sample of my data:

Row 1:
EntryDate  ExitDate
2012-07-25 2012-07-27 

Row 2:
EntryDate  ExitDate
2012-07-26 2012-07-29

Row 2 needs to be deleted because it is a trade that should not have occurred.

I'm having trouble identifying which rows are duplicates and then dropping them. I tried the approach in answer 3 of this question with some luck but it isn't ideal because I have to manually iterate through the dataframe and read each row's data. My current approach is below and is ugly as can be. I check the dates, and then add them to a new dataframe. Additionally, this approach gives me multiple duplicates in the final dataframe.

for i in range(0,len(df)+1):
    if i+1 == len(df): break #to keep from going past last row
    ExitDate = df['ExitDate'].irow(i)
    EntryNextTrade = df['EntryDate'].irow(i+1)

    if EntryNextTrade>ExitDate: 
        line={'EntryDate':EntryDate,'ExitDate':ExitDate}
        df_trades=df_trades.append(line,ignore_index=True)

Any thoughts or ideas on how to more efficiently accomplish this?

You can click here to see a sampling of my data if you want to try to reproduce my actual dataframe.

313

asked Oct 16 '13 16:10

Eric D. Brown

1 Answers

You should use some kind of boolean mask to do this kind of operation.

One way is to create a dummy column for the next trade:

df['EntryNextTrade'] = df['EntryDate'].shift()

Use this to create the mask:

msk = df['EntryNextTrade'] > df'[ExitDate']

And use loc to look at the subDataFrame where msk is True, and only the specified columns:

df.loc[msk, ['EntryDate', 'ExitDate']]

153

answered Nov 05 '22 09:11

Andy Hayden

Related questions
                            
                                Resizing window doesn't resize contents in tkinter
                            
                                Complicated (for me) reshaping from wide to long in Pandas
                            
                                Parsing nested HTML list with BeautifulSoup
                            
                                sqlalchemy connect to server, with not specify database
                            
                                python unable to import module
                            
                                Forking python, defunct child
                            
                                Python: find out whether a list of integers is coherent
                            
                                Python, How to extend Decimal class to add helpful methods
                            
                                Harvesting the power of highly-parallel computers with python scientific code [closed]
                            
                                Tkinter importing without *?
                            
                                COM: excelApplication.Application.Quit() preserves the process
                            
                                Is there any tool to translate Lisp code into Python? [closed]
                            
                                filling numpy array with random element from another array
                            
                                CSVs in Python with newline in quotes [duplicate]
                            
                                Making the `nosetests` script select folder by Python version
                            
                                Regex re.sub list in a file
                            
                                Python: why not (a, b, c) = (*x, 3)
                            
                                Django db error: could not identify an equality operator for type json when trying to annotate a model with jsonfield
                            
                                Using Mutagen to process all accepted file types
                            
                                Sphinx autodoc functions within module

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Comparing Pandas Dataframe Rows & Dropping rows with overlapping dates

Tags:

python

pandas

Eric D. Brown

People also ask

1 Answers

Andy Hayden

Recent Activity

Donate For Us