Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Insert new rows into pandas dataframe depending on range of 2 columns

I have a df like:

ID       value1   start     end
 1         100     1        2       
 1         200     2        4
 2         50      1        1
 2         30      3        5 

I want to fill in new rows for each value between start and end

The final df should look like this where period is each unique value between start and end

    ID       value1      period     
     1         100     1        
     1         100     2            
     1         200     2 
     1         200     3 
     1         200     4       
     2         50      1        
     2         30      3    
     2         30      4 
     2         30      5     
like image 362
user2963882 Avatar asked Oct 28 '22 18:10

user2963882


2 Answers

After using for loop create the list of column with range , this become an unnesting problem

df['New']=[list(range(x,y+1)) for x , y in zip(df.start,df.end)]
Yourdf=unnesting(df,['New'])
Yourdf
   New  ID  value1  start  end
0    1   1     100      1    2
0    2   1     100      1    2
1    2   1     200      2    4
1    3   1     200      2    4
1    4   1     200      2    4
2    1   2      50      1    1
3    3   2      30      3    5
3    4   2      30      3    5
3    5   2      30      3    5
like image 86
BENY Avatar answered Nov 15 '22 06:11

BENY


Or use the below, list comprehension:

>>> df['period']=[','.join(map(str,range(x,y+1))) for x,y in zip(df['start'],df['end'])]
>>> df.set_index(['ID','value1','start','end']).stack().str.split(',', expand=True).stack().unstack(-2).reset_index(-1,drop=True).reset_index()
   ID  value1  start  end period
0   1     100      1    2      1
1   1     100      1    2      2
2   1     200      2    4      2
3   1     200      2    4      3
4   1     200      2    4      4
5   2      30      3    5      3
6   2      30      3    5      4
7   2      30      3    5      5
8   2      50      1    1      1
>>> 

If you want to drop columns:

>>> df['period']=[','.join(map(str,range(x,y+1))) for x,y in zip(df.pop('start'),df.pop('end'))]
>>> df.set_index(['ID','value1']).stack().str.split(',', expand=True).stack().unstack(-2).reset_index(-1,drop=True).reset_index()
   ID  value1 period
0   1     100      1
1   1     100      2
2   1     200      2
3   1     200      3
4   1     200      4
5   2      30      3
6   2      30      4
7   2      30      5
8   2      50      1
>>> 
like image 39
U12-Forward Avatar answered Nov 15 '22 06:11

U12-Forward