Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to repeat rows by their value by multiple columns and divide back

Let's say I have this dataframe:

> df <- data.frame(A=1:5, B=c(0, 0, 3, 0, 0), C=c(1, 0, 0, 1, 0), D=c(0, 2, 0, 0, 1))
> df
  A B C D
1 1 0 1 0
2 2 0 0 2
3 3 3 0 0
4 4 0 1 0
5 5 0 0 1

How would I go about converting it to:

  A B C D
1 1 0 1 0
2 2 0 0 1
3 2 0 0 1
4 3 1 0 0
5 3 1 0 0
6 3 1 0 0
7 4 0 1 0
8 5 0 0 1

As you can see there are value 2 and 3, I want to repeat them by that length and change the values back to 1. How would I do that?

I also want to duplicate the the A column as you can see.

I tried:

replace(df[rep(rownames(df), select(df, -A)),], 2, 1)

But it gives me an error.

like image 530
U12-Forward Avatar asked Oct 13 '21 03:10

U12-Forward


People also ask

How do you repeat rows in a Dataframe?

repeat(3) will create a list where each index value will be repeated 3 times and df. iloc[df. index. repeat(3),:] will help generate a dataframe with the rows as exactly returned by this list.

How do I split one row into multiple columns in Excel?

Click in a cell, or select multiple cells that you want to split. Under Table Tools, on the Layout tab, in the Merge group, click Split Cells. Enter the number of columns or rows that you want to split the selected cells into.


Video Answer


2 Answers

One option would be to get max value from columns B, C and D using pmax, use uncount to repeat the rows. Use pmin to replace the values greater than 1 to 1.

library(dplyr)
library(tidyr)

df %>%
  mutate(repeat_row = pmax(B, C, D)) %>%
  uncount(repeat_row) %>%
  mutate(across(-A, pmin, 1))

#  A B C D
#1 1 0 1 0
#2 2 0 0 1
#3 2 0 0 1
#4 3 1 0 0
#5 3 1 0 0
#6 3 1 0 0
#7 4 0 1 0
#8 5 0 0 1
like image 162
Ronak Shah Avatar answered Nov 15 '22 00:11

Ronak Shah


Apparently, there's just one value > 0 in columns B to D, so we can exploit the partial rowSums for a replicate call on columns B to D binarized using > 0. So that we can use this in Map, we transpose twice. Rest is cosmetics.

t(do.call(cbind, Map(replicate,
                     rowSums(df[-1]), 
                     as.data.frame(t(cbind(df[1], df[-1] > 0)))))) |>
  as.data.frame() |>
  setNames(names(df))
#   A B C D
# 1 1 0 1 0
# 2 2 0 0 1
# 3 2 0 0 1
# 4 3 1 0 0
# 5 3 1 0 0
# 6 3 1 0 0
# 7 4 0 1 0
# 8 5 0 0 1

Note: R>=4.1 used.

like image 39
jay.sf Avatar answered Nov 15 '22 00:11

jay.sf