
Pandas dataframe Merge text rows group by ID

I have a dataframe as follows:

ID    Date          Text  
1     01/01/2019    abcd
1     01/01/2019    pqrs
2     01/02/2019    abcd
2     01/02/2019    xyze

I want to concatenate the Text values for each ID using a group-by in Python, so that the result looks like this:

ID    Date        Text
1     01/01/2019  abcdpqrs
2     01/02/2019  abcdxyze

I want to do this in Python.

I have tried the following code snippets, but neither worked:

  1. groups = groupby(dataset_new, key=ID(1))

  2. dataset_new.group_by{row['Reference']}.values.each do |group| puts [group.first['Reference'], group.map{|r| r['Text']} * ' '] * ' | ' end

I also tried to merge the text in Excel using formulas, but that did not give the required result either.

Parag asked Oct 16 '22 07:10


1 Answer

Try groupby with sum; on a string column, sum concatenates the values in each group. Judging from your error message and the output of df.info(), the Text column seems to contain mixed dtypes and NaN values. I suggest replacing NaN with an empty string using fillna('') and then converting every element in the column to a string using astype(str).

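import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 2, 2],
                   'Date': ['01/01/2019', '01/01/2019', '01/02/2019', '01/02/2019'],
                   'Text': ['abcd', 'pqrs', 'abcd', 'xyze']})

# Replace NaN with '' and make sure every element is a string
df['Text'] = df['Text'].fillna('').astype(str)
# Summing strings within each (ID, Date) group concatenates them
df_grouped = df.groupby(['ID', 'Date'])['Text'].sum()
print(df_grouped)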

This should return

ID  Date      
1   01/01/2019    abcdpqrs
2   01/02/2019    abcdxyze
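
If you would rather get a regular DataFrame with ID, Date and Text as columns (as in the layout in the question), a variation like the following might work. This is only a sketch: the third group, with a NaN and a number in Text, is made-up sample data added to mimic the mixed dtypes described above, and ''.join is used here in place of sum as an alternative way to concatenate.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the rows from the question plus one group
# containing a NaN and a non-string value (an assumption for illustration).
df = pd.DataFrame({
    'ID': [1, 1, 2, 2, 3, 3],
    'Date': ['01/01/2019', '01/01/2019', '01/02/2019', '01/02/2019',
             '01/03/2019', '01/03/2019'],
    'Text': ['abcd', 'pqrs', 'abcd', 'xyze', np.nan, 42],
})

# Normalise the column so every element is a plain string.
df['Text'] = df['Text'].fillna('').astype(str)

# ''.join concatenates the strings of each group in row order;
# reset_index() turns the ID/Date index back into ordinary columns.
result = df.groupby(['ID', 'Date'])['Text'].agg(''.join).reset_index()
print(result)
```

To put a separator between the pieces, ' '.join (or any other separator string) can be used instead of ''.join.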

WolfgangK answered Oct 21 '22 07:10