I have a df like:
ID value1 start end
1 100 1 2
1 200 2 4
2 50 1 1
2 30 3 5
I want to fill in new rows for each value between start and end
The final df should look like this where period is each unique value between start and end
ID value1 period
1 100 1
1 100 2
1 200 2
1 200 3
1 200 4
2 50 1
2 30 3
2 30 4
2 30 5
After using for loop create the list of column with range
, this become an unnesting problem
df['New']=[list(range(x,y+1)) for x , y in zip(df.start,df.end)]
Yourdf=unnesting(df,['New'])
Yourdf
New ID value1 start end
0 1 1 100 1 2
0 2 1 100 1 2
1 2 1 200 2 4
1 3 1 200 2 4
1 4 1 200 2 4
2 1 2 50 1 1
3 3 2 30 3 5
3 4 2 30 3 5
3 5 2 30 3 5
Or use the below, list comprehension:
>>> df['period']=[','.join(map(str,range(x,y+1))) for x,y in zip(df['start'],df['end'])]
>>> df.set_index(['ID','value1','start','end']).stack().str.split(',', expand=True).stack().unstack(-2).reset_index(-1,drop=True).reset_index()
ID value1 start end period
0 1 100 1 2 1
1 1 100 1 2 2
2 1 200 2 4 2
3 1 200 2 4 3
4 1 200 2 4 4
5 2 30 3 5 3
6 2 30 3 5 4
7 2 30 3 5 5
8 2 50 1 1 1
>>>
If you want to drop columns:
>>> df['period']=[','.join(map(str,range(x,y+1))) for x,y in zip(df.pop('start'),df.pop('end'))]
>>> df.set_index(['ID','value1']).stack().str.split(',', expand=True).stack().unstack(-2).reset_index(-1,drop=True).reset_index()
ID value1 period
0 1 100 1
1 1 100 2
2 1 200 2
3 1 200 3
4 1 200 4
5 2 30 3
6 2 30 4
7 2 30 5
8 2 50 1
>>>
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With