I have a pandas dataframe with 2 columns. Some of the MessageIDs end on the same row they start on, with their NewMessageID, as in index row 0 below. But others, like index row 2, don't end until index row 4. I am looking for a clever way to simplify the output into a new dataframe.
df
    MessageID   NewMessageID
0   28          10
1   21          9
2   4           18
3   3           6
4   18          22
5   99          102
6   102         118
7   1           20
I am looking for an output like:
df1
    Start  Finish
0   28     10 
1   21     9
2   4      22
3   3      6
4   99     118
5   1      20 
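For reference, the sample input above can be reproduced with a snippet like this (values copied straight from the table; the variable name df is just the one used above):

import pandas as pd

df = pd.DataFrame({'MessageID': [28, 21, 4, 3, 18, 99, 102, 1],
                   'NewMessageID': [10, 9, 18, 6, 22, 102, 118, 20]})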
I have yet another solution, since I noticed that the most up-voted solution will not work when there are more than two rows to be linked. I added another connection, from 22 -> 23, to show that it works in that scenario.
import pandas as pd

def main():
    b = pd.DataFrame()
    b['MessageID'] = [28, 21, 4, 3, 18, 99, 22, 102, 1]
    b['NewMessageID'] = [10, 9, 18, 6, 22, 102, 23, 118, 20]
    b = b.rename(columns={'MessageID': 'Start', 'NewMessageID': 'End'})
    rows_to_drop = []
    for i, row in b.iterrows():
        recursion(i, row, b, rows_to_drop)
    b.drop(index=rows_to_drop, inplace=True)
    print(b)

def recursion(i, row, b, rows_to_drop):
    # Find the row whose Start equals this row's End, i.e. the next link in the chain.
    exists = b[b['Start'] == row['End']]
    if not exists.empty and i not in rows_to_drop and exists.index[0] not in rows_to_drop:
        # Extend this row's End to the linked row's End, then mark the linked row for removal.
        b.at[i, 'End'] = exists['End'].iloc[0]
        rows_to_drop.append(exists.index[0])
        # Re-scan, since the new End may itself chain onto another row.
        for _i, _row in b.iterrows():
            recursion(_i, _row, b, rows_to_drop)

main()
Output:
   Start  End
0     28   10
1     21    9
2      4   23
3      3    6
5     99  118
8      1   20
This is clearly suboptimal - we are iterating over a DataFrame - but it should do the trick and be efficient enough for relatively small datasets.
Another upside is that the input order is maintained.
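For larger frames, one way to avoid the repeated DataFrame scans is to follow the chains through a plain dict first. A rough sketch of that idea, assuming the IDs form simple chains (no cycles, no repeated MessageID); the helper name collapse_chains is just illustrative:

import pandas as pd

def collapse_chains(df):
    # Map each MessageID to its NewMessageID, then walk each chain to its final ID.
    nxt = dict(zip(df['MessageID'], df['NewMessageID']))
    heads = set(df['MessageID']) - set(df['NewMessageID'])  # IDs that start a chain
    rows = []
    for start in df['MessageID']:          # iterate in input order
        if start not in heads:
            continue                       # skip the middle/tail rows of a chain
        end = nxt[start]
        while end in nxt:                  # keep following until the End is not a Start
            end = nxt[end]
        rows.append((start, end))
    return pd.DataFrame(rows, columns=['Start', 'Finish'])

On the nine-row sample above (with the extra 22 -> 23 link) this should produce the same six chains as the recursive version, just with the columns named Start and Finish.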
Join the frame on itself to create df2, drop the rows from the original df that have values in common between the two columns, keep the outer two columns of df2, rename them to match df, and append one frame to the other.
import pandas as pd

df = pd.DataFrame({'MessageID': [28, 21, 4, 3, 18, 99, 102, 1],
                   'NewMessageID': [10, 9, 18, 6, 22, 102, 118, 20]})
# Self-join: pair each row with the row whose MessageID equals its NewMessageID.
df2 = df.merge(df, left_on='NewMessageID', right_on='MessageID')
df2 = df2[['MessageID_x', 'NewMessageID_y']]
df2.columns = ['MessageID', 'NewMessageID']
# Drop the rows of df that are part of a longer chain, then append the collapsed chains.
df = df[(~df['MessageID'].isin(df['NewMessageID'])) & (~df['NewMessageID'].isin(df['MessageID']))]
output = pd.concat([df, df2])
print(output)
   MessageID  NewMessageID
0         28            10
1         21             9
3          3             6
7          1            20
0          4            22
1         99           118
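If you also want the column names and clean index from the desired df1, a small follow-up on the output frame built above should get you most of the way there (note the row order differs from the question's example):

df1 = output.rename(columns={'MessageID': 'Start', 'NewMessageID': 'Finish'}).reset_index(drop=True)
print(df1)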