Lets say I start with a dataframe that has some data and a column of quantities:
In: df=pd.DataFrame({'first-name':['Jan','Leilani'],'Qty':[2,4]})
Out: Qty first-name
2 Jan
4 Leilani
I want to create a dataframe that copies and labels the data into new lines a number of times equal to the quantity on each line. Here is what the output should look like:
Qty first-name position
2 Jan 1
2 Jan 2
4 Leilani 1
4 Leilani 2
4 Leilani 3
4 Leilani 4
I can do this using python like so:
l=[]
x=0
for idx in df.index:
x=0
for _ in range(df.loc[idx]['Qty']):
x+=1
tempSrs=df.loc[idx]
tempSrs['position']=x
l.append(tempSrs)
outDf=pd.DataFrame(l)
This is very slow. Is there a way to do this using pandas functions? This is effectively an "unpivot", which in pandas is "melt", but I wasn't able to figure out how to use the melt function to accomplish this.
Thanks,
With repeat
and cumcount
Newdf=df.reindex(df.index.repeat(df.Qty))
Newdf['position']=Newdf.groupby(level=0).cumcount()+1
Newdf
Out[931]:
Qty first-name position
0 2 jan 1
0 2 jan 2
1 4 jay 1
1 4 jay 2
1 4 jay 3
1 4 jay 4
The differences are:
loc
instead of reindex
(same thing)assign
instead of =
assignment (assign
produces a copy)lambda
to assign
to embed groupby
logicdf.loc[df.index.repeat(df.Qty)].assign(
position=lambda d: d.groupby('first-name').cumcount() + 1
)
Qty first-name position
0 2 jan 1
0 2 jan 2
1 4 jay 1
1 4 jay 2
1 4 jay 3
1 4 jay 4
np.arange
q = df.Qty.values
r = np.arange(q.sum()) - np.append(0, q[:-1]).cumsum().repeat(q) + 1
df.loc[df.index.repeat(q)].assign(position=r)
Qty first-name position
0 2 jan 1
0 2 jan 2
1 4 jay 1
1 4 jay 2
1 4 jay 3
1 4 jay 4
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With