I have a sheet I need to transform from data being stored horizontally to vertically, i.e. from:
| Company ID | DoB | Name | DoB | Name | DoB | Name |
|---|---|---|---|---|---|---|
| ID 1 | DoB 1 | Name 1 | DoB 2 | Name 2 | DoB 3 | Name 3 |
| ID 2 | DoB 4 | Name 4 | DoB 5 | Name 5 |
To:
| Company ID | DoB | Name |
|---|---|---|
| ID 1 | DoB 1 | Name 1 |
| ID 1 | DoB 2 | Name 2 |
| ID 1 | DoB 3 | Name 3 |
| ID 2 | DoB 4 | Name 4 |
| ID 2 | DoB 5 | Name 5 |
The data is structured such that the DoB/name entities always occur periodically on rows as illustrated above. The number of entities stored horizontally on each row can vary from none to 16.
How would one go about solving this in Python/Pandas (or something else)? This is a one-time thing, so performance is not really an issue.
Thankful for any help!
You can use:
out = (df
.set_index('Company ID')
.pipe(lambda d: d.set_axis(
pd.MultiIndex
.from_arrays([d.columns,
d.groupby(level=0, axis=1).cumcount().add(1)]),
axis=1)
)
.stack()
.reset_index().drop(columns='level_1')
)
Alternative:
out = pd.concat([g.stack().droplevel(1).rename(k)
for k,g in df.set_index('Company ID').groupby(level=0, axis=1)],
axis=1).reset_index()
output:
Company ID DoB Name
0 ID 1 DoB 1 Name 1
1 ID 1 DoB 2 Name 2
2 ID 1 DoB 3 Name 3
3 ID 2 DoB 4 Name 4
4 ID 2 DoB 5 Name 5
I would use stack in a for loop here:
res = pd.DataFrame()
for col in df.columns.drop('Company ID').unique():
new_df = ( df.set_index('Company ID')[col].stack().reset_index().
rename({0:col}, axis=1)[['Company ID', col]] )
res = pd.concat([res, new_df], axis=1)
# drop duplicated columns coming from stack
res = res.loc[:, ~res.columns.duplicated()]
Output:
Company ID DoB Name
0 ID 1 DoB 1 Name 1
1 ID 1 DoB 2 Name 2
2 ID 1 DoB 3 Name 3
3 ID 2 DoB 4 Name 4
4 ID 2 DoB 5 Name 5
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With