How to extract specific substrings into new rows using regex?

I have a dataframe that contains the full chat between the user and the customer agent. I would like to extract just the user's messages and create one new row per message, keeping the same ticket ID:

import re
import pandas as pd

ticket_id = pd.DataFrame(["1", "2"]).rename(columns={0: "Ticket-ID"})
full_chat = pd.DataFrame([
    "User foo foo foo 12:12 PM, Agent bar bar bar 12:12 PM, User foo foo 12:13 PM, "
    "Agent bar bar 12:13 PM, User foo 12:14 PM, Agent bar 12:14 PM",

    "User bar bar bar 12:12 PM, Agent foo foo foo 12:12 PM, User bar bar 12:13 PM",
]).rename(columns={0: "Full-Chat"})

# Pair each ticket with its chat by row position
merge_chat = pd.merge(ticket_id, full_chat, left_index=True, right_index=True, how='outer')


def _split_row(text):
    cleaned_text = text.lower()

    # Capture the text between "user " and the next hh:mm timestamp
    lines = re.findall(r"\b\w*user\b\ (.*?)\ *\d\d:\d\d*", cleaned_text)

    # Prints the words of each user message; the function itself returns None
    for line in lines:
        print(line.split())

print(merge_chat["Full-Chat"].apply(_split_row))

I would like the result to look like this:

Ticket-ID      Full-Chat
1              foo foo foo
1              foo foo
1              foo
2              bar bar bar
2              bar bar
JLo asked Mar 17 '26 10:03


1 Answer

IIUC (if I understand correctly), first turn each chat into a list of the user's messages:

import re

merge_chat['Full-Chat'] = merge_chat['Full-Chat'].apply(lambda i: re.findall(r"\b\w*user\b\ (.*?)\ *\d\d:\d\d*", i.lower()))
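To see what that pattern captures, here is a commented sketch of a slightly simplified version of it. It is an assumption-laden illustration, not the answer's exact regex: it drops the \w* prefix (so only the exact word "user" matches) and ends at a plain hh:mm timestamp. The names USER_MSG and chat are hypothetical.

```python
import re

# Hypothetical sample: the first chat from the question, already lowercased
chat = (
    "user foo foo foo 12:12 pm, agent bar bar bar 12:12 pm, user foo foo 12:13 pm, "
    "agent bar bar 12:13 pm, user foo 12:14 pm, agent bar 12:14 pm"
)

USER_MSG = re.compile(
    r"""
    \buser\        # the speaker tag "user" followed by one (escaped) space
    (.*?)          # lazily capture the message text ...
    \ *            # ... leaving out any spaces before the timestamp
    \d\d:\d\d      # ... and stop at the first hh:mm timestamp
    """,
    re.VERBOSE,
)

# With exactly one capture group, findall returns only the captured text
messages = USER_MSG.findall(chat)
print(messages)  # ['foo foo foo', 'foo foo', 'foo']
```

Because (.*?) is lazy, each match ends at the first timestamp after "user", so agent messages are never swallowed into a user message.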

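From pandas 0.25.0 onwards,

merge_chat.explode(column='Full-Chat')

gives you the result directly: one row per message, with the Ticket-ID repeated. The exploded rows keep their original index labels (0, 0, 0, 1, 1).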
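Putting both steps together on the question's sample data gives the runnable sketch below. It builds merge_chat directly instead of through pd.merge, and it passes ignore_index=True, a parameter that exists only from pandas 1.1.0, so the exploded rows are renumbered 0 to n-1 instead of repeating the original index.

```python
import re
import pandas as pd

merge_chat = pd.DataFrame({
    "Ticket-ID": ["1", "2"],
    "Full-Chat": [
        "User foo foo foo 12:12 PM, Agent bar bar bar 12:12 PM, User foo foo 12:13 PM, "
        "Agent bar bar 12:13 PM, User foo 12:14 PM, Agent bar 12:14 PM",
        "User bar bar bar 12:12 PM, Agent foo foo foo 12:12 PM, User bar bar 12:13 PM",
    ],
})

pattern = r"\b\w*user\b\ (.*?)\ *\d\d:\d\d*"

# Step 1: replace each chat with the list of the user's messages
merge_chat["Full-Chat"] = merge_chat["Full-Chat"].apply(
    lambda chat: re.findall(pattern, chat.lower())
)

# Step 2: one row per list element, renumbered 0..n-1 (needs pandas >= 1.1)
result = merge_chat.explode("Full-Chat", ignore_index=True)
print(result)
```

On pandas 0.25.x or 1.0.x, drop ignore_index=True and call result.reset_index(drop=True) instead.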

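In versions prior to 0.25.0, spread the lists into columns and stack them back into rows:

# One column per message; shorter lists are padded with missing values
df = pd.DataFrame(merge_chat['Full-Chat'].tolist(), index=merge_chat['Ticket-ID']).stack()
# stack() returns a Series indexed by (Ticket-ID, message position) and, by default
# in these versions, skips the padding. Drop the position level, then turn
# Ticket-ID back into a column.
df = df.reset_index(level=1, drop=True).reset_index()
df.rename(columns={0: 'Full-Chat'}, inplace=True)
df

   Ticket-ID    Full-Chat
0         1  foo foo foo
1         1      foo foo
2         1          foo
3         2  bar bar bar
4         2      bar bar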
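If you need a single version of the code for pandas both before and after 0.25.0, Series.str.extractall (which predates explode by several releases) is one alternative: it returns one row per regex match, indexed by (original row, match number). The sketch below reuses the same pattern, joins the matches back to the ticket IDs by index, and again builds merge_chat directly; it is one possible approach, not the answer's method.

```python
import pandas as pd

merge_chat = pd.DataFrame({
    "Ticket-ID": ["1", "2"],
    "Full-Chat": [
        "User foo foo foo 12:12 PM, Agent bar bar bar 12:12 PM, User foo foo 12:13 PM, "
        "Agent bar bar 12:13 PM, User foo 12:14 PM, Agent bar 12:14 PM",
        "User bar bar bar 12:12 PM, Agent foo foo foo 12:12 PM, User bar bar 12:13 PM",
    ],
})

messages = (
    merge_chat["Full-Chat"]
    .str.lower()
    # One row per match; the index becomes (original row, "match" number)
    .str.extractall(r"\b\w*user\b\ (.*?)\ *\d\d:\d\d*")
    # Drop the match counter so the index points back at the original row
    .reset_index(level="match", drop=True)
    # An unnamed capture group produces a column labelled 0
    .rename(columns={0: "Full-Chat"})
)

# Left join on the index repeats each Ticket-ID once per extracted message
result = merge_chat[["Ticket-ID"]].join(messages).reset_index(drop=True)
print(result)
```

Because it is a left join, a ticket with no user messages still keeps one row, with a missing value in Full-Chat.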
Suraj Motaparthy answered Mar 20 '26 00:03


