
Create n rows per id | Pandas

I have a DataFrame df as follows:

id lob addr addr2
a1 001 1234 0
a1 001 1233 0
a3 003 1221 0
a4 009 1234 0

I want every id to have exactly n rows (let's take n = 4), padding with extra rows in which the other columns are null/NaN. So the table above should be transformed to:

id lob addr addr2
a1 001 1234 0
a1 001 1233 0
a1 na na na
a1 na na na
a3 003 1221 0
a3 na na na
a3 na na na
a3 na na na
a4 009 1234 0
a4 na na na
a4 na na na
a4 na na na

How can I achieve this? I will have anywhere from 500 to 700 ids at the time of execution, and n will always be 70 (so each id should have 70 rows).

I considered a loop that adds a row, groups by id, checks whether the count is below 70, and repeats, but that would perform a lot of unnecessary operations.

Harsha asked Apr 12 '21 17:04


1 Answer

Here's a solution that uses Counter to count the existing rows for each id, works out how many extra rows each id needs, and then appends those rows to the data:

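from collections import Counter
import pandas as pd

n = 4  # target number of rows per id
id_count = Counter(df["id"])
# Repeat each id once for every row it is missing (ids that already have n or more rows get none):
id_values = [i for i, count in id_count.items() for _ in range(n - count)]
# Build the padding rows; the columns other than id become NaN when combined with df:
new_data = pd.DataFrame({"id": id_values})
# DataFrame.append was removed in pandas 2.0, so use pd.concat instead.
# A stable sort keeps the original rows ahead of the padding rows within each id:
df = pd.concat([df, new_data], ignore_index=True).sort_values(by="id", kind="stable")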
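
An alternative sketch that avoids building the padding rows by hand: number the existing rows within each id, then reindex against every (id, position) pair that should exist. The sample data is taken from the question; storing lob as a string and the helper names pos, full_index and padded are assumptions made for illustration. Note that this variant would also drop rows beyond the first n for any id that has more than n rows.

```python
import pandas as pd

# Sample data from the question; lob is assumed to be a string column.
df = pd.DataFrame({
    "id": ["a1", "a1", "a3", "a4"],
    "lob": ["001", "001", "003", "009"],
    "addr": [1234, 1233, 1221, 1234],
    "addr2": [0, 0, 0, 0],
})

n = 4  # target number of rows per id

# Number the existing rows within each id: 0, 1, 2, ...
pos = df.groupby("id").cumcount().rename("pos")

# Every (id, position) pair the result should contain.
full_index = pd.MultiIndex.from_product(
    [df["id"].unique(), range(n)], names=["id", "pos"]
)

# Reindexing adds an all-NaN row for each missing (id, position) pair.
padded = (
    df.set_index(["id", pos])
      .reindex(full_index)
      .reset_index(level="pos", drop=True)
      .reset_index()
)
```

With 500 to 700 ids and n = 70 the full index has at most 49,000 entries, so the reindex should stay cheap. Columns that receive NaN (such as addr) are upcast to float.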
Toby Petty answered Sep 30 '22 12:09