Pandas: how to create a new DataFrame with a start and end, even if they are on different rows

Tags: python, pandas

I have a pandas DataFrame with 2 columns. Some MessageIDs end on the same row they start on, i.e. the NewMessageID on that row is already the final one, as in index row 0 below. Others, like index row 2, don't end until index row 4 (4 -> 18 on row 2, then 18 -> 22 on row 4). I am looking for a clever way to collapse these chains into a new DataFrame.

df
    MessageID   NewMessageID
0   28          10
1   21          9
2   4           18
3   3           6
4   18          22
5   99          102
6   102         118
7   1           20

I am looking for an output like:

df1
    Start  Finish
0   28     10 
1   21     9
2   4      22
3   3      6
4   99     118
5   1      20


asked Sep 06 '19 18:09

sectechguy

2 Answers

I have another solution, because I noticed that the most up-voted solution does not work when more than two rows need to be linked. I added one more connection, 22 -> 23, to show that it handles such chains.

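import pandas as pd


def main():
    b = pd.DataFrame()
    b['MessageID'] = [28, 21, 4, 3, 18, 99, 22, 102, 1]
    b['NewMessageID'] = [10, 9, 18, 6, 22, 102, 23, 118, 20]
    b = b.rename(columns={'MessageID': 'Start', 'NewMessageID': 'End'})
    rows_to_drop = []
    for i, row in b.iterrows():
        recursion(i, row, b, rows_to_drop)
    # Remove the rows that were merged into an earlier chain
    b.drop(index=rows_to_drop, inplace=True)
    print(b)


def recursion(i, row, b, rows_to_drop):
    # Look for a row whose Start continues this row's End
    exists = b[b['Start'] == row['End']]
    if not exists.empty and i not in rows_to_drop and exists.index[0] not in rows_to_drop:
        # Extend the chain; .iloc[0] takes the scalar instead of a one-element Series
        b.at[i, 'End'] = exists['End'].iloc[0]
        # The continuation row is now part of row i, so mark it for removal
        rows_to_drop.append(exists.index[0])
        # Rescan with fresh rows so longer chains keep extending (4 -> 18 -> 22 -> 23)
        for _i, _row in b.iterrows():
            recursion(_i, _row, b, rows_to_drop)


if __name__ == '__main__':
    main()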

Output:

   Start  End
0     28   10
1     21    9
2      4   23
3      3    6
5     99  118
8      1   20

It is clearly suboptimal, since it iterates over the DataFrame row by row and rescans the whole frame after every merge, but it does the trick and should be efficient enough for relatively small datasets.

It has one more upside: it preserves the input order.
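If the repeated rescans become too slow, one alternative (a sketch, not part of the answer above) is to start from the rows that begin a chain and follow the MessageID -> NewMessageID mapping one link at a time for all chains at once. It assumes every MessageID appears at most once and that the links contain no cycles (a cycle would make the loop run forever); `collapse_chains` is a hypothetical helper name.

```python
import pandas as pd


def collapse_chains(df):
    """Hypothetical helper: collapse MessageID -> NewMessageID chains into Start/Finish.

    Assumes every MessageID occurs at most once and the links contain no cycles.
    """
    # Lookup table: MessageID -> the NewMessageID it continues into
    nxt = dict(zip(df['MessageID'], df['NewMessageID']))
    # A chain starts at a MessageID that is not some other row's NewMessageID
    heads = df[~df['MessageID'].isin(df['NewMessageID'])]
    finish = heads['NewMessageID']
    # Advance every chain that can still continue by one link per pass;
    # a chain of k rows needs k - 1 passes
    while finish.isin(set(nxt)).any():
        finish = finish.map(lambda x: nxt.get(x, x))
    # Chain starts keep their input order; reset the index to 0..n-1
    return pd.DataFrame({'Start': heads['MessageID'], 'Finish': finish}).reset_index(drop=True)


df = pd.DataFrame({'MessageID': [28, 21, 4, 3, 18, 99, 102, 1],
                   'NewMessageID': [10, 9, 18, 6, 22, 102, 118, 20]})
print(collapse_chains(df))
```

On the question's data this yields the same Start/Finish pairs as df1, and with the extra 22 -> 23 link the 4 chain ends at 23. The number of passes over the data grows with the length of the longest chain rather than with the number of merges.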

answered Oct 07 '22 10:10

Epion

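Merge the DataFrame with itself to create df2, linking each row's NewMessageID to the row whose MessageID continues it. Then drop the rows from the original df that take part in such a link (their MessageID appears in the NewMessageID column, or their NewMessageID appears in the MessageID column). Keep the two outer columns of df2, rename them to match df, and append one to the other.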

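import pandas as pd

df = pd.DataFrame({'MessageID': [28, 21, 4, 3, 18, 99, 102, 1],
                   'NewMessageID': [10, 9, 18, 6, 22, 102, 118, 20]})

# Self-join: match each row's NewMessageID to the row whose MessageID continues it
df2 = df.merge(df, left_on='NewMessageID', right_on='MessageID')
# Keep the start of the left row and the end of the right row
df2 = df2[['MessageID_x', 'NewMessageID_y']]
df2.columns = ['MessageID', 'NewMessageID']

# Keep only the rows that are not part of any link
df = df[~df['MessageID'].isin(df['NewMessageID']) & ~df['NewMessageID'].isin(df['MessageID'])]

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
output = pd.concat([df, df2])
print(output)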


   MessageID  NewMessageID
0         28            10
1         21             9
3          3             6
7          1            20
0          4            22
1         99           118

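The limitation pointed out in the other answer is easy to reproduce. The sketch below reruns the same self-merge on the question's data plus the extra link 22 -> 23 used there, with pd.concat in place of DataFrame.append:

```python
import pandas as pd

# Question data plus one extra link, 22 -> 23, making 4 -> 18 -> 22 -> 23 a three-row chain
df = pd.DataFrame({'MessageID': [28, 21, 4, 3, 18, 99, 22, 102, 1],
                   'NewMessageID': [10, 9, 18, 6, 22, 102, 23, 118, 20]})

# One self-merge links each row only to its immediate successor
df2 = df.merge(df, left_on='NewMessageID', right_on='MessageID')
df2 = df2[['MessageID_x', 'NewMessageID_y']]
df2.columns = ['MessageID', 'NewMessageID']

df = df[~df['MessageID'].isin(df['NewMessageID']) & ~df['NewMessageID'].isin(df['MessageID'])]
output = pd.concat([df, df2])
print(output)
# The chain comes out as 4 -> 22 plus a spurious 18 -> 23, instead of a single 4 -> 23
```

Because a single merge only looks one link ahead, chains of three or more rows need the linking step repeated until no end value continues into another row.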

answered Oct 07 '22 09:10

Chris
