Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

more than one value with comma in dataframe pandas

Tags:

python

pandas

colum 1, colum2 a,b,c 30 b,c,f 40 a,g,z 50 . . . Using above dataframe with col1,2 I'd like to have dataframe as below dataframe with col3, 4. Additionally, col1 consists of values with commas. col4 consists of sum of col2 following col3. column3, column4 a 80 b 70 c 70 f 40 g 50 z 50

like image 811
Ken Kim Avatar asked Jul 26 '18 05:07

Ken Kim


1 Answers

Use:

df = (df.set_index('colum2')['colum1']
        .str.split(',', expand=True)
        .stack()
        .reset_index(name='column3')
        .groupby('column3', as_index=False)['colum2']
        .sum()
        .rename(columns={'colum2':'column4'})
      )
print (df)
  column3  column4
0       a       80
1       b       70
2       c       70
3       f       40
4       g       50
5       z       50

Explanation:

  1. First set_index by column colum2
  2. Create DataFrame by split
  3. Reshape by stack
  4. Create index by columns by reset_index
  5. groupby and aggregate sum
  6. Last rename column if necessary

Another solution:

from itertools import chain

a = df['colum1'].str.split(',')
lens = a.str.len()

df = pd.DataFrame({
    'column3' : list(chain.from_iterable(a)), 
    'column4' : df['colum2'].repeat(lens)
}).groupby('column3', as_index=False)['column4'].sum()

print (df)
  column3  column4
0       a       80
1       b       70
2       c       70
3       f       40
4       g       50
5       z       50

Explanation:

  1. Create lists by split
  2. Get lengths of lsits by len
  3. Last repeat columns and flatten colum1
  4. groupby and aggregate sum
like image 88
jezrael Avatar answered Oct 03 '22 20:10

jezrael